CSC152 2004F, Class 49: Lists!

Admin:
* Discussion / Moment of Silence
* About exam 3
* About final
* No homework for tomorrow
* Presents

Overview:
* Quicksort
* Lists, in general
* Iterating lists: Cursors and Positions
* Kinds of lists: Simple, Ordered, and Sorted

/Review/
* Designing sorting algorithms
  * Given a collection (e.g., a Vector or array), put the elements in order
  * "In place" vs. "not in place"
  * Stable: If a preceded b before the sort and a = b, then a still precedes b
* Sorting algorithms we've discussed:
  * Insertion sort: O(n^2)
  * Selection sort: O(n^2)
  * Merge sort: O(n log n)
    * Divide the vector in half
    * Sort the two halves
    * Merge the two sorted halves together
  * Heap sort: O(n log n)

/Today: Quicksort/
* Problems with merge sort and heap sort?
  * Heap sort is not stable
  * Merge sort can be stable (not a problem)
  * Both seem to be "not in place" sorting algorithms: potentially inefficient in terms of memory
* Design goal: Can we have a fast sorting algorithm that doesn't require us to make a new Vector or array?
* One solution: Quicksort (due to C.A.R. Hoare)
* For speed: Do divide and conquer
* Strategy: Don't divide by position, divide by value
  * Small stuff gets moved to the left
  * Large stuff gets moved to the right
  * By "small" and "large", we mean "everything in the left half is less than or equal to everything in the right half"
* If we can do that, quicksort is:
  (1) Partition: rearrange the subvector so that small stuff is at the left and large stuff is at the right
  (2) Sort the two halves
  (3) Know that the whole thing is now sorted, because everything in the sorted left half is less than or equal to everything in the sorted right half

Running time:
* Define t(x) as "the time quicksort takes on x elements"
* Hope that "partition" takes time c*x, so t(x) = c*x + 2*t(x/2)
      t(n) = 1*c*n + 2*t(n/2)
  expand t(n/2):
      t(n) = 1*c*n + 2*(c*(n/2) + 2*t((n/2)/2))
  simplify:
      t(n) = 2*c*n + 4*t(n/4)
  expand t(n/4):
      t(n) = 2*c*n + 4*(c*(n/4) + 2*t((n/4)/2))
  simplify:
      t(n) = 3*c*n + 8*t(n/8)
  expand t(n/8):
      t(n) = 3*c*n + 8*(c*(n/8) + 2*t((n/8)/2))
  simplify:
      t(n) = 4*c*n + 16*t(n/16)
  pattern:
      t(n) = k*c*n + 2^k*t(n/(2^k))    // works for k = 1, 2, 3, 4, ...
  let k = log_2(n):  // Experience
      t(n) = log_2(n)*c*n + n*t(n/n)
  simplify:
      t(n) = c*n*log_2(n) + n*t(1)
  assume t(1) is some constant, d:
      t(n) = c*n*log_2(n) + n*d
* So t(n) is in O(n log n) PROVIDED WE CAN REARRANGE ELEMENTS QUICKLY

Partitioning:
* Suppose we know the median
* Start two cursors: one at the left, one at the right
* Repeatedly:
  * Advance the left cursor until you hit a large element (bigger than the median)
  * Retreat the right cursor until you hit a small element (less than or equal to the median)
  * Swap the two elements
* How do we identify the median? It turns out to be hard.
* Solution: Luck!
  * Pick *some* element of the vector.
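The two-cursor partition and the "pick *some* element" pivot strategy can be sketched in Java. This is a minimal sketch, not the course's code: it sorts a plain int array rather than a Vector, the names (Quicksort, partition, quicksort) are illustrative, and it takes the leftmost element as the pivot (the "luck" approach).

```java
import java.util.Arrays;

public class Quicksort {
    // Hoare-style two-cursor partition of a[lo..hi]: advance i past small
    // elements, retreat j past large elements, swap when both stop.
    // Returns a boundary j such that every element of a[lo..j] is <= every
    // element of a[j+1..hi].
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];      // "luck": hope the first element is near the median
        int i = lo - 1;
        int j = hi + 1;
        while (true) {
            do { i++; } while (a[i] < pivot);   // advance left cursor
            do { j--; } while (a[j] > pivot);   // retreat right cursor
            if (i >= j) return j;               // cursors crossed: done
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;  // swap misplaced pair
        }
    }

    // Sort a[lo..hi] in place: partition, then sort the two halves.
    static void quicksort(int[] a, int lo, int hi) {
        if (lo < hi) {
            int p = partition(a, lo, hi);
            quicksort(a, lo, p);
            quicksort(a, p + 1, hi);
        }
    }

    public static void main(String[] args) {
        int[] a = { 5, 2, 9, 1, 5, 6 };
        quicksort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a)); // [1, 2, 5, 5, 6, 9]
    }
}
```

One detail differs slightly from the description above: each cursor also stops on elements *equal* to the pivot, a standard guard that keeps the cursors from running off the ends of the array. Like most in-place quicksorts, this one is not stable.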
    Assume it's the median
* Partition using the strategy above
* Sort as before
* If we're really lucky, we get the middle element
* If we're unlucky, we get the largest or the smallest element
* Usually we're somewhere in the middle
* If we're always perfectly lucky: O(n log n)
* If we're always perfectly unlucky: O(n^2)
* Experimentally: O(n log n)
* Experimentally: Faster than any of the other sorting algorithms

/On to lists!/
Scheme lists: The PM (from the PAMI)
* Philosophy:
  * Collections
  * Typically add to the front
  * Get the first and all but the first (typically to "visit each element in order")
  * Delete the first
  * Dynamic

Generalized lists:
* Philosophy:
  * Collection of values
  * That you can visit from first to last (iterate)
  * Dynamic (add and delete)
* Variations:
  * Simple: Add is "somewhere"; items may not maintain order
  * Ordered: Add is "here"; items maintain order [The Standard]
  * Sorted: Add is "in the right place"; items are iterated in order (from smallest to largest)
    * Like a priority queue without removal

Problem: How do you iterate (visit elements in turn) over lists? How do we talk about this idea in terms of objects?
Two typical solutions:
* Cursors <---
* Positions

Cursor operations:

Simple list operations:
* add
* delete
* iterate
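The add/delete/iterate operations and the cursor idea can be sketched in Java. This is a guess at one possible design, not the course's interface: the names SimpleList and Cursor, the singly-linked representation, and the choice to add at the front ("somewhere") are all assumptions, shown only to make the operations concrete.

```java
import java.util.NoSuchElementException;

// A simple list: add puts the element "somewhere" (here, the front),
// so iteration order need not match insertion order.
public class SimpleList<T> {
    private static class Node<T> {
        T value;
        Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    private Node<T> front = null;

    // Add an element somewhere (this sketch chooses the front).
    public void add(T value) {
        front = new Node<>(value, front);
    }

    // Delete the first element equal to v, if any.
    public void delete(T v) {
        if (front == null) return;
        if (front.value.equals(v)) { front = front.next; return; }
        for (Node<T> n = front; n.next != null; n = n.next) {
            if (n.next.value.equals(v)) { n.next = n.next.next; return; }
        }
    }

    // A cursor marks a position in the iteration from first to last.
    public class Cursor {
        private Node<T> current = front;
        public boolean hasNext() { return current != null; }
        public T next() {
            if (current == null) throw new NoSuchElementException();
            T value = current.value;
            current = current.next;
            return value;
        }
    }

    public Cursor cursor() { return new Cursor(); }
}
```

A client iterates by asking the list for a cursor and calling next() until hasNext() is false. This sketch puts delete on the list itself; some designs instead let a cursor delete the element at its current position.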