EBoard 38: Sorting and algorithm analysis

Approximate overview

  • Admin
  • Background: What is CS?
  • Background: Analyzing algorithms
  • Finding extreme values
  • The problem of sorting
  • Algorithm design time
  • Examples of sorting algorithms
  • Analysis of sorting algorithms
  • Schemification of sorting algorithms

Administrative stuff

Minute of Silence

In memory of the students in Michigan who lost their lives yesterday.

Introductory notes

  • Please fill out the WGMC survey (forthcoming)
  • Happy December! (And Chappy Channukah!)
  • I am in a meeting today.

Projects

Group Apply: Spam filter

  • “Remember, Spam rhymes with Sam”

Upcoming activities

Token Events

  • Build a Professional Website with Wix. Dec 1 at 4pm. See email from CLS.
  • December 3 at 7:30 in Sebring Lewis.
  • “The Meet” this weekend (Swim and Dive): Friday night and Saturday, lots of swimming and lots of diving.
    • Make money timing!
  • Debate Tuesday at 7pm in the Kernel of the HSSC.
    • Young Adult Fiction

Upcoming work

  • Readings for Friday (our last set of readings)
  • MP8: Transforming Web Pages
    • Our last MP.
    • Due Sunday.
    • Presentations Monday.
    • Remember: Working code at (almost) all times. Start small, then add features.
    • Remember: Log your time (in fifteen-minute increments).

Q&A

Why can’t I fill out the end-of-course evaluation?

Because I believe the EOCE should be filled out at the end of the course, on the last day of class. Please plan to bring a laptop that day, if you are able.

What grade will I receive on the labs and readings?

Reasonable attempt: 1.

Not submitted or clearly a placeholder: 0.

Can you tell me more about SoLA 5, which takes place during finals week?

No new topics!

A chance to catch up on any LAs you missed.

Likely to have a much larger window of opportunity (e.g., Monday night through Friday afternoon).

When would you like a review session?

Sunday: None.

Monday (reading session): Many

Do you permit incompletes?

I generally follow College policies on incompletes (e.g., you must fill out a form and complete the work by the College’s specified date; there may be limits on the number of incompletes you take).

I require that you give me the form by the last day of class.

See prior eboards for further notes.

How do you implement match?

Grind up carbon, magnesium, sand, and a little clay.

Place on the end of a piece of cardboard or wood.

Sam will write something about the more CS-y match later.

Should we still log time for work that does not end up in our project?

Yes.

What should Monday’s presentation look like?

It’s up to you. Your goal is to show off what you’ve done (and answer questions).

You can do this live in DrRacket.

You can do this less live in some presentaton software.

Please send me stuff in advance.

Five minutes.

Background: What is CS?

  • Solving problems using computers, including societal problems.
  • The study of algorithms and data structures
    • Algorithms are sets of instructions (intended to solve problems)
    • Data structures speak to the organization of data (e.g., list vs. vector)

Computer scientists study algorithms by …

  • Writing them! (More precisely, identifying problems, generalizing them, and finding algorithms that might solve them.)
  • Analyze them to understand their running time or other resource usage, often without running them (e.g., memory).
  • Prove them correct (or incorrect) mathematically.
  • Applying them to sets of data.
  • Validating them using the scientific method (running experiments on them)
  • Understand them by running them by hand (tracing)
  • Implement them in a computer language.
  • Consider the ethical and social implications of the algorithms.
  • And more … e.g., build languages or hardware systems to provide a platform for implementing algorithms.

Background: Analyzing algorithms

Computer scientists analyze algorithms in many different ways.

  • Readability.
  • Generality.
  • Correctness (essential), mathematically or expermentally
  • Most frequently, resource usage.
    • How long does it take to run an algorithm on input of certain size?
    • “Long” can be wall-clock time or it can be “number of steps”.

Analyzing time

  • We don’t necessarily see a clear function.
  • However, we can often see a clear function if we look at the upper bound values.
  • We try to look for patterns of the bounding function.
  • Common patterns we see
    • Constant, such as car, cdr, vector-ref
    • Logarithmic functions go up by one each time we double the size of the input. Find in a binary search tree, binary search in a vector.
    • Linear: As the input size doubles, the number of steps approximately doubles. map, since it has to work with every element of the list. list-ref, since it has to step through the list until it hits the nth element.
    • nlogn: Between linear and quadratic, much closer to linear
    • Quadratic: When you double the input size, you approximately quadruple the number of steps.
    • Exponential algorithms (e.g., 2^n): When you add one to the input size, you double the work. There are problems for which no known non-exponential algorithm exists; whether such algorithms can exist remains an open question. E.g., traveling salescritter is an n! algorithms

Finding extreme values

;;; (alphabetically-first string) -> string
;;;   strings: both (listof string?) nonempty?
;;; Find the alphabetically first string in the list.
(define alphabetically-first-1
  (lambda (strings)
    (counter-increment! AFC 'alphabetically-first-1)
    (cond
      [(null? (cdr strings))
       (car strings)]
      [(string-ci<=? (car strings) (alphabetically-first-1 (cdr strings)))
       (car strings)]
      [else
       (alphabetically-first-1 (cdr strings))])))

(define alphabetically-first-2
  (lambda (strings)
    (counter-increment! AFC 'alphabetically-first-2)
    (if (null? (cdr strings))
        (car strings)
        (first-of-two (car strings)
                      (alphabetically-first-2 (cdr strings))))))

(define alphabetically-first alphabetically-first-2)
  
;;; (first-of-two str1 str2) -> string?
;;;   str1 : string?
;;;   str2 : string?
;;; Find the alphabetically first of str1 and str2.
(define first-of-two
  (lambda (str1 str2)
    (if (string-ci<=? str1 str2)
        str1
        str2)))

(struct counter-kernel (name counts)
  #:reflection-name 'counter)

;;; (counter name) -> counter?
;;;   name : string
;;; Create a new counter that can be used for the other counter procedures.
(define counter 
  (lambda (name)
    (let ([counts (make-hash)])
      (hash-set! counts "" 0)
      (counter-kernel name counts))))

;;; (counter-get tallies procname) -> integer?
;;;   tallies : counter?
;;;   procname : symbol?
;;; Get the number of times that counter-increment! has been called
;;; with procname since (a) the counter was created or (b) counter-reset!
;;; has been called with that procedure name.
(define counter-get
  (lambda (tallies procname)
    ; hash-ref lets us supply a default value
    (hash-ref (counter-kernel-counts tallies) procname 0)))

;;; (counter-increment! tallies procname) -> void?
;;;   tallies : counter?
;;;   procname : symbol?
;;; Increment the count for a particular procedure.
(define counter-increment!
  (lambda (tallies procname)
    (let ([counts (counter-kernel-counts tallies)])
      (hash-set! counts procname (+ 1 (hash-ref counts procname 0)))
      (hash-set! counts "" (+ 1 (hash-ref counts "" 0))))))

;;; (counter-reset! tallies procname) -> void?
;;;   tallies : counter?
;;;   procname : symbol?
;;; Purpose:
;;;   Reset the counter for the given procedure.
(define counter-reset!
  (lambda (tallies procname)
    (let ([counts (counter-kernel-counts tallies)])
      (hash-set! counts "" (- (hash-ref counts "" 0)
                              (hash-ref counts procname 0)))
      (hash-remove! counts procname))))

;;; (counter-reset-all! tallies) -> void?
;;;   tallies : counter?
;;; Reset the elements of the counter counter.
(define counter-reset-all!
  (lambda (tallies)
    (hash-clear! (counter-kernel-counts tallies))))

;;; (display-counter tallies) -> void?
;;;   tallies : counter?
;;; Print all the values in a counter in a semi-useful way.
(define display-counter
  (lambda (tallies)
    (let ([counts (counter-kernel-counts tallies)])
      (display "Counts for ")
      (display (counter-kernel-name tallies))
      (newline)
      (display "  Total: ")
      (display (hash-ref counts "" 0))
      (newline)
      (hash-for-each counts
                     (lambda (proc count)
                       (when (not (eq? proc ""))
                         (display "  ")
                         (display proc)
                         (display ": ")
                         (display count)
                         (newline)))))))

(define AFC (counter "experiments with alphabetically-first"))

;;; (af-experiment! n) -> void?
;;;   n -> integer? non-negative?
;;; Conduct an experiment with the varous versions alphabetically-first, 
;;;   using (a) a list of length n ordered alphabetically and (b) a list
;;;   of length n ordered reverse-alphabetically.  
;;; n must be less than or equal to 26.
;;; The results of the experiment are printed on the screen.
(define af-experiment!
  (let ([letters (map (o (section make-string 2 <>)
                         integer->char
                         (section + <> (char->integer #\a)))
                      (range 26))])
    (lambda (n)
      (display n) (display " ") (display "ALPHABETICAL") (newline)
      (counter-reset-all! AFC)
      (alphabetically-first-1 (take letters n))
      (alphabetically-first-2 (take letters n))
      (display-counter AFC)
      (display n) (display " ") (display "REVERSE ALPHABETICAL") (newline)
      (counter-reset-all! AFC)
      (alphabetically-first-1 (reverse (take letters n)))
      (alphabetically-first-2 (reverse (take letters n)))
      (display-counter AFC))))

If the words are in order from alphabetically-first to alphabetically-last, af1 and af2 take the same number of steps (linear).

If the words are in order from alphabetically last to alphabetically first, af1 is a lot worse than af2.

If there are ten elements in the list and ordered backwards, af2 takes about ten comparisons, af1 takes about 1000. If there are twenty, af2 takes about twenty comparisons, af1 takes about 1 million.

What is the growth of af1?

  • When you double the input size, the number of steps is multiplied by itself.
  • If you add one to the input size, the number of steps seems to double, which makes it exponential.

Why does it do that?

  • There are a lot of comparisons to do.
  • It feels like we start over, even after finding the first.
  • When we call af1 on a list of size n, it does TWO recursive calls on lists of size n-1
  • You get a doubling every time you add one thing. BAD/DANGEROUS!

Other notes

  • The helper makes things more efficient

We can do without the helper

(define alphabetically-first-3
  (lambda (strings)
    (counter-increment! AFC 'alphabetically-first-3)
    (cond
      [(null? (cdr strings))
       (car strings)]
      [else
       (let ([first (car strings)]
             [assistants-result (alphabetically-first-3 (cdr strings))])
         (if (string-ci<=? firsts assistants-result)
             first
             assisstants-result))])))

Is equivalently fast to af2.

Side note

When you call length to determine whether the length is 1, we get something similar, although not as bad.

10 + 9 + 8 + 7 + ….

The problem of sorting

Given a collection of things, put them in order.

Easy solution: (sort things less-equal?)

But we should be able to write our own sort function.

What is a collection?

  • It could be a vector, in which case we rearrange it.
  • It could be a list.
  • (Also hash, struct, file, tree)

Algorithm design time

Come up with as many algorithms for sorting as you can (for vectors or lists) in five minutes.

Examples of sorting algorithms

In a group of three or four (or maybe five), design at least two sorting algorithms, one for vectors and one for lists.

Algorithm 1: Oh, works on lists (selection sort)

  • Find the greatest element in the list using the algorithm above. (linear)
  • Do it again, consing to the front
    • Note: Consing in the right order is important: If you use tail recursion, you’ll probably cons the largest remaining onto so-far. If you use direct recursion, you’ll cons the smallest remaining onto the recursive call.

Algorithm 2: UM: Universal Masters, works on vectors (bubble sort)

  • Look at neighboring elements, swap if out of order (linear)
  • We have to walk down the line once for every element
  • So approximately n^2 (quadratic)

Algorithm 3: Variant of UM (selection sort)

  • Instead of swapping neighbors, just find the largest and swap to the end.

Algorithm 4: We Are Everybody (quicksort)

  • Find the average
  • Split into smaller than average and greater than average
  • Sort the two halves
  • Shove ‘em back together.

Analysis of sorting algorithms

We’re unlikely to get to this.

Schemification of sorting algorithms

There’s no way we’re getting to this.