CSC153 2004S, Class 27: "Fast" Sorting Algorithms
Admin:
* Questions on exam 2?
* When you say "Don't bother to document," do we have to show testing? Yes.
* When you say "Dbtd," do we have to write the six P's? No.
* When you say "Dbtd," do we have to write a short paragraph? No.
* When you say "Dbtd," do we have to write internal comments? Probably.
* With the section one, can you have more than one question mark? Yes.
* Read Experiments in Java, Chapter 1, for Monday.
* Food and "Drink" to celebrate our last formal day of Scheme
* Yes, we're finally doing fast sorting algorithms
* Unless, of course, Saul and Erik find new and creative ways to distract me.
Overview:
* Consider how to apply the "divide and conquer" technique to sorting
* Key ideas in merge sort.
* Key ideas in Quicksort
* Lab
All the sorting algorithms we discussed are O(n^2)
* Insertion sort - Repeatedly insert values into a "sorted" list
* Selection sort - Repeatedly select largest/smallest
* Bubble sort - Repeatedly swap elements (thereby bubbling the largest element tot he end)
Can we do better?
* Approach one: Try other techniques.
Let's try the "divide and conquer" technique
* Divide the list/vector in half (how?)
* First half / second half
* Sort each half
* How? (Recursively)
* Then do what?
* Merge the two together by repeatedly comparing the top elements
Running time: t(n) =
* Split the vector in two parts: 1 +
* Sorting the two halves: t(n/2) + t(n/2)
* Merge: n
Solve the recurrence relation
t(x) = 1 + x + 2*t(x/2)
t(n) = 1 + n + 2*t(n/2)
t(n) = 1 + n + 2*(1 + n/2 + 2*t(n/4))
t(n) = 1 + n + 2 + n + 4*t(n/4)
t(n) = 3 + 2n + 4*t(n/4)
t(n) = 3 + 2n + 4*(1 + n/4 + 2*t(n/8))
t(n) = 3 + 2n + 4 + n + 8*t(n/8))
t(n) = 7 + 3n + 8*t(n/8))
Generalize
t(n) = 2^k-1 + kn + 2^k*t(n/2^k)
Let k = log_2n
t(n) = 2^(log_2n)-1 + log_2n*n + 2^log_2n*t(n/2^log_2n)
t(n) = n-1 + log_2n*n + n*t(n/n)
t(n) = n-1 + nlog_2n + n*t(1)
Assume t(1) = 1
t(n) is in O(nlogn)
That is faster than O(n^2)
Merge sort: The basic/hard idea is merging the two sorted lists
Quicksort
* Divide the list/vector in half (how?)
* Small values and large values O(n)
* Sort each half (recursively) 2*t(n/2)
* Then do what?
* Shove 'em back together O(1)
"Expected" O(nlogn) behavior
* Sometimes O(n^2), but not regularly
Can we do better? O(nlogn)
* Approach one: Try other techniques. Failed.
* Prove that you can't.
* You cannot do better than O(nlogn) for compare-and-swap sorting
* Use Quicksort
Other sorting algorithms that can be faster
* Bucket sort: Set up "buckets" for different values, shove thigns into each bucket, sort each bucket using some simple sort
O(n+m), where n is number of elements and m is number of buckets
* Radix sort
Arrange by rightmost bit, then next rightmost bit, then ...
O(n*#ofbits)