CSC152 2005F, Class 54: Faster Sorts Admin: * Coffee! * No office hours today * Go to Shoba's talk tomorrow (4:15) EC * About the final * Wed 9 p.m., Carols, EC * Cool concert, tonight, 7:30 * Basketball, Tomorrow, 7:00, EC * Other events? Overview: * Sorting, revisited * Strategies for improving algorithms? * Merge sort * Top-down * Bottom-up * Analysis * Quicksort * Partitioning * Analysis /Review/ * What is sorting? Putting something in order * What is "something"? Traditionally, Values in a Vector, Array, or List * How do we determine the order? Traditionally, with a comparator * Techniques * Selection sort: Repeatedly find largest/smallest value and put at end * Insertion sort: Move along and insert each value into a sorted portion * Bubble sort: Look at neighboring values and swap when out of order; big values "bubble" to the top (or small values "bubble" to the front) * Eric hates 'em all because they are all sloooooooooooooooooooooooooooooooooooooooooooooow, that is O(n^2) * Heapsort: Shove 'em into a heap and take 'em out again /Goal: Develop Faster Sorting Algorithms/ * How do we improve the speed of an algorithm? * Divide and conquer * Store previously computed values ("dynamic programming") * Probably not helpful in designing a sorting algorithm * Choose a better data structure (implementation) * Divide and conquer sorting (1) Divide the vector into two halves n (2) Recursively sort the halves 2*t(n/2) (3) Combine the results n t(1) = c t(n) = 2n + 2*t(n/2) t(n) is in O(nlog2(n)) * Different d/c sorting algorithms divide and combine differently * In merge sort, we divide positionally Elements 0 ... n/2 are in the first half, n/2+1 ... n-1 are in the second half * If we have an odd number of elements, just pick a side to put the extra element in * Example: Start with * 3 1 6 4 2 8 5 0 * Split into 3 1 6 4 and 2 8 5 0 * Sort into 1 3 4 6 and 0 2 5 8 * Now what? Repeatedly look at the smallest remaining element from each list and grab the smaller of the two * When a list runs out, just copy the remainder of the other Why is the last step not insertion sort? * (1) Copying values into a new Vector, rather than in place * (2) Finding the correct space for a value is O(1), rather than O(n) Some people rephrase merge sort to emphasize merging * Merge neighboring singletons into sorted pairs * Merge neighboring pairs into sorted quadruplets * Merge neighboring quadruplets into sorted octuplets * ... * Merge neighboring k-lets into 2k-lets * At each "phase", we make a new vector of size N in O(N) steps * log_2(N) such phases --- C.A.R. Hoare Are there other ways to divide the Vector? Can we improve the sorting algorithm by rearranging the values? * If we put the small elements at the left and the large elements at the right, then after recursive sorting we can just "tack 'em together" In Scheme (define sort (lambda (lst compare) (if (null? lst) null (append (sort (select (small compare) lst)) (select (large compare) lst)))))) Unfortunately, determining what is large and what is small is, well, difficult * Technique: Pick a "random" element from the vector, assume that's the middle value; anything smaller is "small", anything larger is "large" * It turns out that this random selection works very well in practice * If you're lucky, O(nlogn) * If you're unlucky, O(n^2) * In practice, O(nlogn), with a smaller constant than the other nlogn sorting algorithms Algorithm called Quicksort Another advantage of Quicksort: You can Quicksort "in place"