CSC152 2004F, Class 46: nlogn sorts
Admin:
* Math/CS lunch!
* Who will not be here on Wednesday? (Everyone will be here, except Leigh and maybe Kyle)
Overview:
* Review
* Divide and conquer strategies
* Mergesort
* Merging
* Alternative formulation
* Running time analysis
* Quicksort
* Partitioning
* Running time analysis
/Review/
* We were discussing sorting
* Insertion sort and selection sort
* Both are O(n^2)
* Key idea in insertion sort: Insert values from one side (unsorted) into the other (sorted)
* Key idea in selection sort: Repeatedly select the smallest or largest
* Question: Is either stable (that is, does it preserve the order of equal values)?
* Insertion sort should be, provided we insert from left to right and we don't keep moving over equal elements
* Selection sort can be, provided you're careful in how you find the largest or smallest
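A minimal sketch of a stable insertion sort (on an int array rather than a Vector, for brevity; the class name and helper are ours). The key is the strict `>` in the inner loop: we shift only strictly greater elements, so equal values keep their original order.

```java
import java.util.Arrays;

public class StableInsertion {
    // Insert each value into the sorted prefix to its left,
    // shifting only strictly greater elements right.  Because we
    // never move past an equal element, the sort is stable.
    public static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int val = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > val) { // > (not >=) for stability
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = val;
        }
    }

    public static void main(String[] args) {
        int[] sample = {17, 14, 8, 10, 13, 4, 19, 4};
        insertionSort(sample);
        System.out.println(Arrays.toString(sample));
    }
}
```

Changing `>` to `>=` would still sort correctly but would reverse equal elements, losing stability.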
* Can we do better than O(n^2)?
/Detour: Improving Algorithms/
* Once you come up with an algorithm, you always try to find something faster.
* What happens when you fail?
* What happens when all of your really smart friends fail?
* You form a theorem about the required number of steps:
* Theorem: If your sorting algorithm is based upon comparing and swapping elements, you must spend at least c*nlogn steps
* Proof sketch: Counting. A comparison-based sort must distinguish all n! possible orderings; each comparison has only two outcomes, so you need at least log_2(n!), which is approximately nlogn, comparisons
* (Compare: the NP-complete problems have no known polynomial-time solutions; the best known algorithms for them take roughly O(2^n) steps)
Sample values: 17a, 14, 8a, 10, 13, 4a, 19, 4b, 8b, 17b, 17c, 17d
/Better Sorting Algorithms/
* Heap sort: Sort by adding elements to a heap (and then removing them in order)
* Is heap sort stable? NO
* How fast is heap sort:
Insert n elements into a heap : O(nlogn)
Remove n elements from a heap : O(nlogn)
Because the two "steps" are in sequence, it's just O(nlogn)
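The two heap-sort steps can be sketched with Java's library heap, `java.util.PriorityQueue` (a binary min-heap); the class name here is ours. Note that `PriorityQueue` makes no promise about the order in which equal elements come out, which is exactly why heap sort is not stable.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class HeapSortSketch {
    // Heap sort: add all n elements to a heap (O(nlogn)), then
    // remove them smallest-first (O(nlogn)).  Sequential steps,
    // so the whole thing is O(nlogn).
    public static void heapSort(int[] a) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int v : a) {
            heap.add(v);       // each add is O(log n)
        }
        for (int i = 0; i < a.length; i++) {
            a[i] = heap.remove(); // each remove is O(log n)
        }
    }

    public static void main(String[] args) {
        int[] sample = {17, 14, 8, 10, 13, 4, 19};
        heapSort(sample);
        System.out.println(Arrays.toString(sample));
    }
}
```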
* Goal: Can we write a sorting algorithm that's fast and stable?
* How do we improve algorithms? Sometimes through divide and conquer
* Given a vector to sort, how might we divide it in half?
* Find the middle index.
* First subvector: values at positions start to middle
* Second subvector: values at positions middle+1 to end
public void mergesort(Vector v) {
  mergesort(v, 0, v.size()-1);
}

public void mergesort(Vector v, int start, int finish) {
  // A one- or zero-element subvector is already sorted
  if (finish <= start)
    return;
  // Find the midpoint
  int mid = (start + finish) / 2;
  // Sort the two halves
  mergesort(v, start, mid);
  mergesort(v, mid+1, finish);
  // What else?  Merge the two sorted halves back together
  merge(v, start, mid, finish);
}
17a, 14, 8a, 10, 13, 4a, | 19, 4b, 8b, 17b, 17c, 17d
...
4a, 8a, 10, 13, 14, 17a, | 4b, 8b, 17b, 17c, 17d, 19
If we think about this as an out-of-place sorting algorithm ...
* 4a, 8a, 10, 13, 14, 17a
* 4b, 8b, 17b, 17c, 17d, 19
How do we merge these two vectors into one sorted vector?
* Snapshot when the first vector runs out:
* First: (exhausted)
* Second: 17b, 17c, 17d, 19 (remaining)
* Result so far: 4a, 4b, 8a, 8b, 10, 13, 14, 17a
* Copy whatever remains of the second vector:
* Result: 4a, 4b, 8a, 8b, 10, 13, 14, 17a, 17b, 17c, 17d, 19
Strategy:
* Repeatedly grab the smallest of the remaining elements (which must be the first value in one of the two vectors)
* O(n) steps
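The strategy above can be sketched as a merge of two sorted int arrays (the class used Vectors; arrays keep the sketch short, and the class name is ours). Using `<=` when the front elements tie takes from the first array first, which is what keeps merge sort stable.

```java
import java.util.Arrays;

public class MergeSketch {
    // Merge two sorted arrays: repeatedly take the smaller of the
    // two front elements.  Each element is copied exactly once,
    // so the merge takes O(n) steps.
    public static int[] merge(int[] left, int[] right) {
        int[] result = new int[left.length + right.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {
            if (left[i] <= right[j]) {   // <= keeps ties stable
                result[k++] = left[i++];
            } else {
                result[k++] = right[j++];
            }
        }
        // One side is exhausted; copy the rest of the other
        while (i < left.length) result[k++] = left[i++];
        while (j < right.length) result[k++] = right[j++];
        return result;
    }

    public static void main(String[] args) {
        int[] merged = merge(new int[]{4, 8, 10, 13, 14, 17},
                             new int[]{4, 8, 17, 17, 17, 19});
        System.out.println(Arrays.toString(merged));
    }
}
```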
Alternate formulation: Iterative merge sort
* Merge neighboring elements into pairs
* Merge neighboring pairs into quadruplets
* Merge neighboring quadruplets into octuplets
* Merge neighboring octuplets into runs of sixteen
...
17a, 14, 8a, 10, 13, 4a, 19, 4b, 8b, 17b, 17c, 17d
14, 17a, | 8a, 10, | 4a, 13, | 4b, 19, | 8b, 17b, | 17c, 17d
8a, 10, 14, 17a, | 4a, 4b, 13, 19, | 8b, 17b, 17c, 17d
4a, 4b, 8a, 10, 13, 14, 17a, 19, | 8b, 17b, 17c, 17d
....
for (int size = 1; size < n; size = size*2)
  for (int pos = 0; pos + size < n; pos = pos + size*2)
    merge(values at positions pos to pos+size-1 with
          values at positions pos+size to min(pos+2*size-1, n-1))
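A runnable version of the iterative (bottom-up) merge sort sketched above, again on int arrays with names of our choosing. The `min` on the right boundary handles the last, possibly shorter, run when n is not a power of two.

```java
import java.util.Arrays;

public class IterativeMergesort {
    // Bottom-up merge sort: merge runs of size 1, then 2, then 4, ...
    public static void sort(int[] a) {
        int n = a.length;
        int[] buf = new int[n];
        for (int size = 1; size < n; size *= 2) {
            for (int pos = 0; pos + size < n; pos += 2 * size) {
                int mid = pos + size - 1;
                int end = Math.min(pos + 2 * size - 1, n - 1);
                merge(a, buf, pos, mid, end);
            }
        }
    }

    // Merge a[lo..mid] with a[mid+1..hi] using buf as scratch
    // space; <= keeps equal elements in order (stable).
    private static void merge(int[] a, int[] buf, int lo, int mid, int hi) {
        System.arraycopy(a, lo, buf, lo, hi - lo + 1);
        int i = lo, j = mid + 1, k = lo;
        while (i <= mid && j <= hi) {
            if (buf[i] <= buf[j]) a[k++] = buf[i++];
            else a[k++] = buf[j++];
        }
        while (i <= mid) a[k++] = buf[i++];
        while (j <= hi) a[k++] = buf[j++];
    }

    public static void main(String[] args) {
        int[] sample = {17, 14, 8, 10, 13, 4, 19, 4, 8, 17, 17, 17};
        sort(sample);
        System.out.println(Arrays.toString(sample));
    }
}
```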
Running time of iterative?
* One "level" of merges takes O(n) ; Inner for loop
* Each value gets copied once
* We repeat that "level" for each size between 1 and n, but we double the size each time O(logn) ; Outer for loop
* O(nlogn)
Alternate analysis of recursive algorithm: Write expressions and solve them
Let t(x) be "the amount of time merge sort takes on a vector of size x"
* t(x) = 2*t(x/2) + x
That is, it takes t(x/2) to sort the first half
it takes t(x/2) to sort the second half
it takes x to merge the two halves
* t(n) = 2*t(n/2) + n <---
Expand t(n/2)
* t(n) = 2*(2*t(n/4) + n/2) + n
Simplify
* t(n) = 4*t(n/4) + 2n <---
Expand t(n/4)
* t(n) = 4*(2*t(n/8) + n/4) + 2n
Simplify
* t(n) = 8*t(n/8) + 3n <---
* t(n) = 2^k*t(n/2^k) + kn
Let k = log_2n
* t(n) = n*t(n/n) + nlog_2n
Simplify
* t(n) = n*t(1) + nlog_2n
Suppose t(1) is a constant, c
* t(n) is c*n + nlog_2n
* t(n) is in O(nlog_2n)
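As a sanity check on the derivation, a tiny program (ours, not from class) can evaluate the recurrence directly for powers of two and compare it against the closed form c*n + n*log_2(n):

```java
public class RecurrenceCheck {
    // t(n) = 2*t(n/2) + n, with t(1) = c
    public static long t(long n, long c) {
        if (n == 1) return c;
        return 2 * t(n / 2, c) + n;
    }

    public static void main(String[] args) {
        long c = 1;
        for (long n = 1; n <= 1024; n *= 2) {
            // log_2(n) for a power of two, via the bit position
            long log2 = 63 - Long.numberOfLeadingZeros(n);
            long closed = c * n + n * log2;       // c*n + n*log_2(n)
            System.out.println(n + ": recurrence=" + t(n, c)
                               + ", closed form=" + closed);
        }
    }
}
```

For every power of two the two columns agree, matching the expansion above.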
Tomorrow: Attempt to integrate
Wednesday: Sorting, concluded