CSC152 2004F, Class 46: nlogn sorts

Admin:
* Math/CS lunch!
* Who will not be here on Wednesday? (Everyone will be here, except Leigh and maybe Kyle)

Overview:
* Review
* Divide and conquer strategies
* Mergesort
  * Merging
  * Alternative formulation
  * Running time analysis
* Quicksort
  * Partitioning
  * Running time analysis

/Review/
* We were discussing sorting
* Insertion sort and selection sort
  * Both are O(n^2)
  * Key idea in insertion sort: Insert values from one side (unsorted) into the other (sorted)
  * Key idea in selection sort: Repeatedly select the smallest or largest
* Question: Is either stable (that is, does it preserve the order of equal values)?
  * Insertion sort should be, provided we insert from left to right and we don't keep moving over equal elements
  * Selection sort can be, provided you're careful in how you find the largest or smallest
* Can we do better than O(n^2)?

/Detour: Improving Algorithms/
* Once you come up with an algorithm, you always try to find something faster.
* What happens when you fail? What happens when all of your really smart friends fail?
* You form a theorem about the required number of steps:
  * Theorem: If your sorting algorithm is based on comparing and swapping elements, you must spend at least c*nlogn steps
  * Proof sketch: Counting. If you don't do that many comparisons, there will be objects never compared, directly or indirectly, so their relative order cannot be known
* Compare: the NP-complete problems have no known polynomial-time solution

Sample values: 17a, 14, 8a, 10, 13, 4a, 19, 4b, 8b, 17b, 17c, 17d

/Better Sorting Algorithms/
* Heap sort: Sort by adding elements to a heap (and then removing them in order)
* Is heap sort stable? NO
* How fast is heap sort?
  * Insert n elements into a heap: O(nlogn)
  * Remove n elements from a heap: O(nlogn)
  * Because the two "steps" are in sequence, it's just O(nlogn)
* Goal: Can we write a sorting algorithm that's fast and stable?
* How do we improve algorithms?
Sometimes through divide and conquer
* Given a vector to sort, how might we divide it in half?
  * Find the middle index.
  * First subvector: values at positions start to middle
  * Second subvector: values at positions middle+1 to end

    public void mergesort(Vector v) {
      mergesort(v, 0, v.size() - 1);
    } // mergesort(Vector)

    public void mergesort(Vector v, int start, int finish) {
      // A one or zero element subvector is already sorted
      if (finish <= start) return;
      // Find the midpoint
      int mid = (start + finish) / 2;
      // Sort the two halves (recursively)
      mergesort(v, start, mid);
      mergesort(v, mid + 1, finish);
      // What else? Merge the two sorted halves
      merge(v, start, mid, finish);
    } // mergesort(Vector, int, int)

17a, 14, 8a, 10, 13, 4a, | 19, 4b, 8b, 17b, 17c, 17d
...
4a, 8a, 10, 13, 14, 17a, | 4b, 8b, 17b, 17c, 17d, 19

If we think about this as an out-of-place sorting algorithm ...
* 4a, 8a, 10, 13, 14, 17a
* 4b, 8b, 17b, 17c, 17d, 19

How do we merge these two vectors into one sorted vector? Near the end of the merge:
* First: (exhausted)
* Second: 17b, 17c, 17d, 19 remain, and are copied to the end
* Result: 4a, 4b, 8a, 8b, 10, 13, 14, 17a, 17b, 17c, 17d, 19

Strategy:
* Repeatedly grab the smallest of the remaining elements (which must be the first value in one of the two vectors)
* O(n) steps

Alternate formulation: Iterative merge sort
* Merge neighboring elements into sorted pairs
* Merge neighboring pairs into sorted quadruplets
* Merge neighboring quadruplets into sorted octuplets
* Merge neighboring octuplets into sorted sixteen-element runs
* ...

17a, 14, 8a, 10, 13, 4a, 19, 4b, 8b, 17b, 17c, 17d
14, 17a, | 8a, 10, | 4a, 13, | 4b, 19, | 8b, 17b, | 17c, 17d
8a, 10, 14, 17a, | 4a, 4b, 13, 19, | 8b, 17b, 17c, 17d
4a, 4b, 8a, 10, 13, 14, 17a, 19, | 8b, 17b, 17c, 17d
...

    for (int size = 1; size < n; size = size * 2)
      for (int pos = 0; pos + size < n; pos = pos + size * 2)
        merge(values at positions pos to pos+size-1
              with values at positions pos+size to pos+2*size-1)

Running time of iterative?
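The "repeatedly grab the smallest remaining element" merge strategy above can be sketched in Java. This is an illustrative, out-of-place version on int arrays (the class names and array types are assumptions, not from the class); note the `<=` comparison, which prefers the left array on ties and so keeps the merge stable.

```java
import java.util.Arrays;

public class MergeDemo {
    // Merge two sorted arrays into one sorted array in O(n) time.
    // On ties we take from the left array first, which keeps the merge stable.
    static int[] merge(int[] left, int[] right) {
        int[] result = new int[left.length + right.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {
            // Grab the smaller of the two front elements
            result[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        }
        // One array is exhausted; copy whatever remains in the other
        while (i < left.length) result[k++] = left[i++];
        while (j < right.length) result[k++] = right[j++];
        return result;
    }

    public static void main(String[] args) {
        // The two sorted halves from the example (tags dropped)
        int[] a = {4, 8, 10, 13, 14, 17};
        int[] b = {4, 8, 17, 17, 17, 19};
        // prints [4, 4, 8, 8, 10, 13, 14, 17, 17, 17, 17, 19]
        System.out.println(Arrays.toString(merge(a, b)));
    }
}
```

Each element is copied exactly once, which is where the O(n) bound on a single merge comes from.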
* One "level" of merges takes O(n) (the inner for loop)
  * Each value gets copied once
* We repeat that "level" for each size between 1 and n, but we double the size each time, so there are O(logn) levels (the outer for loop)
* Total: O(nlogn)

Alternate analysis of the recursive algorithm: Write a recurrence and solve it

Let t(x) be "the amount of time merge sort takes on a vector of size x"
* t(x) = 2*t(x/2) + x
  That is,
    it takes t(x/2) to sort the first half,
    it takes t(x/2) to sort the second half, and
    it takes x to merge the two halves
* t(n) = 2*t(n/2) + n                  <--- Expand t(n/2)
* t(n) = 2*(2*t(n/4) + n/2) + n             Simplify
* t(n) = 4*t(n/4) + 2n                 <--- Expand t(n/4)
* t(n) = 4*(2*t(n/8) + n/4) + 2n            Simplify
* t(n) = 8*t(n/8) + 3n
* In general, t(n) = 2^k*t(n/2^k) + kn      Let k = log_2(n)
* t(n) = n*t(n/n) + nlog_2(n)               Simplify
* t(n) = n*t(1) + nlog_2(n)                 Suppose t(1) is a constant, c
* t(n) = c*n + nlog_2(n)
* t(n) is in O(nlogn)

Tomorrow: Attempt to integrate
Wednesday: Sorting, concluded
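The recursive algorithm sketched in the notes can be made concrete as a runnable Java class. This is a sketch under my own assumptions (int arrays and an explicit scratch buffer rather than the class's Vector), not the code the class wrote:

```java
import java.util.Arrays;

public class MergeSort {
    // Sort the whole array using recursive merge sort, O(nlogn).
    static void mergesort(int[] v) {
        mergesort(v, 0, v.length - 1);
    }

    // Sort the subarray v[start..finish].
    static void mergesort(int[] v, int start, int finish) {
        if (finish <= start) return;       // 0 or 1 elements: already sorted
        int mid = (start + finish) / 2;    // find the midpoint
        mergesort(v, start, mid);          // sort the first half
        mergesort(v, mid + 1, finish);     // sort the second half
        merge(v, start, mid, finish);      // merge the two sorted halves
    }

    // Merge the sorted runs v[start..mid] and v[mid+1..finish].
    static void merge(int[] v, int start, int mid, int finish) {
        int[] scratch = new int[finish - start + 1];
        int i = start, j = mid + 1, k = 0;
        while (i <= mid && j <= finish) {
            // <= keeps equal elements in their original order (stable)
            scratch[k++] = (v[i] <= v[j]) ? v[i++] : v[j++];
        }
        while (i <= mid) scratch[k++] = v[i++];
        while (j <= finish) scratch[k++] = v[j++];
        System.arraycopy(scratch, 0, v, start, scratch.length);
    }

    public static void main(String[] args) {
        // The class's sample values, with the letter tags dropped
        int[] data = {17, 14, 8, 10, 13, 4, 19, 4, 8, 17, 17, 17};
        mergesort(data);
        // prints [4, 4, 8, 8, 10, 13, 14, 17, 17, 17, 17, 19]
        System.out.println(Arrays.toString(data));
    }
}
```

Each call does linear work in the merge plus two half-size recursive calls, which is exactly the recurrence t(n) = 2*t(n/2) + n solved above.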