# Class 46: O(n log n) Sorts

Back to Quadratic Sorts. On to Project Discussion.

Held: Monday, 22 November 2004

Summary: Today we consider two important divide-and-conquer sorting algorithms, merge sort and Quicksort.

Notes:

• How many of you plan to miss class on Wednesday?

Overview:

• Review
• Divide and conquer sorting
• Merge sort
• Iterative merge sort
• Asymptotic analysis
• Quick sort
• Asymptotic analysis

## Review

• We've seen two main sorting algorithms so far
• In insertion sort, the key operation is inserting a value into a sorted subvector.
• In selection sort, the key operation is selecting the largest value.
• Both algorithms are O(n²).
• What do we need to do to ensure that both are stable (that they maintain order)?

## Improving Sorts

• Can we do better? Certainly. We've seen that heap sort is O(n log n).
• Is heap sort stable?
• Are there other alternatives to heap sort?
• Certainly: There are at least two other O(n log n) sorts.
• Both are based on the divide-and-conquer strategy.
• In merge sort, we divide in half by position.
• In Quicksort, we divide in half by value.
• We'll look at both in some depth.

## Merge Sort

• The merge sort algorithm begins sensibly for a divide-and-conquer algorithm:
```
if the portion of interest has one or fewer elements
  we're done
otherwise
  divide the portion of interest into two halves
  sort each half
  ...
```
• For example, given the vector (5, 4, 3, 1, 6, 2)
• We would sort the (5, 4, 3) part, giving us (3, 4, 5)
• We would sort the (1, 6, 2) part, giving us (1, 2, 6)
• What do we do now? We must merge the two halves together
• 1 is smaller than 3, so it is the first value in our result
• 2 is smaller than 3, so it is the second value
• 3 is smaller than 6, so it is the third value
• 4 is smaller than 6, so it is the fourth value
• 5 is smaller than 6, so it is the fifth value
• only 6 remains, so it is the sixth value
• To merge two lists, we repeatedly compare the first value of each list and take the smaller of the two.
• A problem: Where do you put the result of merging?
• Answer: Merging cannot easily be done in place; you must merge into a separate copy.
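The recursive algorithm above, together with the merge step from the example, might look like the following Python sketch (the names `merge_sort` and `merge` are my own, not from the course):

```python
def merge(left, right):
    """Merge two sorted lists by repeatedly taking the smaller first value."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Using <= keeps the sort stable: on ties, take from the left half first.
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One list is exhausted; whatever remains of the other is already sorted.
    result.extend(left[i:])
    result.extend(right[j:])
    return result

def merge_sort(values):
    """Top-down merge sort; returns a new sorted list (not in place)."""
    # Base case: a portion with one or fewer elements is already sorted.
    if len(values) <= 1:
        return values[:]
    # Divide by position: split into two halves, sort each, then merge.
    mid = len(values) // 2
    return merge(merge_sort(values[:mid]), merge_sort(values[mid:]))
```

On the example vector, `merge_sort([5, 4, 3, 1, 6, 2])` produces `[1, 2, 3, 4, 5, 6]`. Note that `merge` builds a new list rather than working in place, matching the observation above.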

### An Iterative Merge Sort

• Instead of doing merge sort in a top-down method, recursively dividing and conquering, we can work bottom up.
• First, merge neighboring elements into sorted pairs.
• Next, merge neighboring pairs into sorted four-tuples.
• Next, merge neighboring four-tuples into sorted eight-tuples.
• And so on and so forth.
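The bottom-up steps above can be sketched in Python as follows (a sketch of my own, not the course's code; it reuses the same pairwise-merge idea):

```python
def merge_sort_iterative(values):
    """Bottom-up merge sort: merge neighboring runs of width 1, 2, 4, ..."""
    result = values[:]
    width = 1
    while width < len(result):
        merged = []
        # Merge each pair of neighboring runs of the current width.
        for start in range(0, len(result), 2 * width):
            left = result[start:start + width]
            right = result[start + width:start + 2 * width]
            i = j = 0
            while i < len(left) and j < len(right):
                if left[i] <= right[j]:
                    merged.append(left[i])
                    i += 1
                else:
                    merged.append(right[j])
                    j += 1
            merged.extend(left[i:])
            merged.extend(right[j:])
        result = merged
        width *= 2  # runs double in size each pass
    return result
```

Each pass over the list doubles the size of the sorted runs, so after about log₂n passes the whole list is sorted.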

### Running Time Analysis

• It's easier to analyze the iterative version.
• Doing all the sets of merges for one size takes O(n) (each step of each merge puts one element in "the right place" in its sublist)
• After log₂n such "level merges", we have a sorted list of size n.
• Hence, merge sort is O(n log n).

## Quicksort

• The naive way to divide elements in merge sort leads to the need to do complex and out-of-place merging.
• Perhaps we can divide the elements more sensibly.
• In Quicksort, we partition the subvector into two parts: the "small" values and the "large" values.
• Every "small" value is smaller than every "large" value.
• The "small" values are at the left (lower indices) and the large values are at the right (higher indices)
• We then sort the small values in place and the large values in place
• The result is therefore sorted.
• For example, given the vector (5, 4, 3, 1, 6, 2)
• We rearrange the elements to (2, 1, 3, 4, 6, 5), where (2,1,3) are the small elements and (4, 6, 5) are the large elements.
• We sort the two halves.
• The result is (1, 2, 3, 4, 5, 6)
• A problem: How do we determine large and small? It turns out that random selection works pretty well.
• Pick some element of the vector at random and treat it as the middle value.
• A problem: How do you partition?
• Step through from left to right and right to left, swapping whenever you encounter elements out of place.
• A subtlety: If you can have repeated values, how do you decide when you're done?
• Normal solution: Don't include the pivot (randomly selected element) in recursive sort
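A Python sketch of the idea (my own illustration, not the course's code; for simplicity it uses a single left-to-right sweep to partition rather than the two-pointer sweep described above, but it keeps the key points: a random pivot, and the pivot excluded from the recursive sorts):

```python
import random

def quicksort(values, low=0, high=None):
    """In-place Quicksort with a randomly chosen pivot."""
    if high is None:
        high = len(values) - 1
    # A portion with one or fewer elements is already sorted.
    if low >= high:
        return
    # Pick a random pivot and move it out of the way (to the end).
    pivot_index = random.randint(low, high)
    values[pivot_index], values[high] = values[high], values[pivot_index]
    pivot = values[high]
    # Partition: values smaller than the pivot go to the left.
    boundary = low
    for i in range(low, high):
        if values[i] < pivot:
            values[i], values[boundary] = values[boundary], values[i]
            boundary += 1
    # Put the pivot between the two parts; it is now in its final position.
    values[boundary], values[high] = values[high], values[boundary]
    # Recursively sort each part, excluding the pivot itself.
    quicksort(values, low, boundary - 1)
    quicksort(values, boundary + 1, high)
```

Because the pivot lands in its final position after partitioning, excluding it from both recursive calls also sidesteps the repeated-values subtlety noted above.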

### Running Time Analysis

• It's easiest to assume that we divide in half each time.
• We come up with a recurrence relation. It takes about n steps to partition.
• To sort a list of n elements takes n steps for partitioning and some amount of time to sort each sublist of n/2 elements.
• In "math", we would write t(x) = x + 2*t(x/2), where t(x) is "time to sort x elements".
• Can we solve this?
• t(n) = n + 2*t(n/2)
• t(n) = n + 2*(n/2 + 2*t(n/4)) -- substituted n/2 + 2*t(n/4) for t(n/2)
• t(n) = n + n + 2*2*t(n/4) -- simplified
• t(n) = 2*n + 4*t(n/4) -- simplified again
• t(n) = 2*n + 4*(n/4 + 2*t(n/8)) -- substituted n/4 + 2*t(n/8) for t(n/4)
• t(n) = 2*n + n + 4*2*t(n/8) -- simplified
• t(n) = 3*n + 8*t(n/8) -- simplified again
• t(n) = k*n + 2^k*t(n/2^k) -- generalized
• t(n) = (log₂n)*n + n*t(n/n) -- substituted log₂n for k, since 2^(log₂n) = n
• t(n) = n*log₂n + n*t(1) -- simplified
• t(n) is in O(n log₂n)


Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Wed Dec 8 10:37:28 2004.
The source to the document was last modified on Thu Aug 26 20:22:24 2004.
This document may be found at `http://www.cs.grinnell.edu/~rebelsky/Courses/CS152/2004F/Outlines/outline.46.html`.


Samuel A. Rebelsky, rebelsky@grinnell.edu