Held: Monday, May 1, 2006
Summary:
Today we consider two important divide-and-conquer sorting algorithms, merge sort and Quicksort.
Related Pages:
Notes:
- Are there questions on Exam 3?
- In support of today's protest, I will hold only half a class today.
Overview:
- Review.
- Divide-and-Conquer Sorting.
- Merge Sort.
- Iterative Merge Sort.
- Asymptotic Analysis of Merge Sort.
- Quicksort.
- We've seen three main sorting algorithms so far:
- In bubble sort, the key operation is swapping neighboring values that are out of order.
- In insertion sort, the key operation is inserting a value into a sorted subvector.
- In selection sort, the key operation is selecting the largest value.
- All three algorithms are O(n^2).
- Can we do better? Certainly. We've seen that heap sort is O(n log n).
- But heap sort is not stable.
- Are there other alternatives to heap sort?
- Certainly: There are at least two other O(n log n) sorts.
- Both are based on the divide-and-conquer strategy.
- In merge sort, we divide in half by position.
- In Quicksort, we divide in half by value.
- We'll look at both in some depth.
- The merge sort algorithm begins sensibly for a divide-and-conquer algorithm:
if the portion of interest has one or fewer elements
we're done
otherwise
divide the portion of interest into two halves
sort each half
...
- For example, given the vector (5, 4, 3, 1, 6, 2)
- We would sort the (5, 4, 3) part, giving us (3, 4, 5)
- We would sort the (1, 6, 2) part, giving us (1, 2, 6)
- What do we do now? We must merge the two halves together.
- 1 is smaller than 3, so it is the first value in our result.
- 2 is smaller than 3, so it is the second value.
- 3 is smaller than 6, so it is the third value.
- 4 is smaller than 6, so it is the fourth value.
- 5 is smaller than 6, so it is the fifth value.
- Only 6 remains, so it is the sixth value.
- To merge two lists, we repeatedly compare the first value of each list and take the smaller of the two.
- A problem: Where do you put the result of merging?
- Answer: Merge sort cannot easily be done in place; you need a second vector to hold the merged result.
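The recursive strategy above can be sketched as follows. (These notes don't name a course language, so this is a Python sketch; the names `merge` and `merge_sort` are mine. For simplicity it returns a new list rather than sorting in place.)

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    result = []
    i = j = 0
    # Repeatedly compare the first remaining value of each list
    # and take the smaller of the two.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:       # <= keeps equal values in order (stable)
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One list is exhausted; the remainder of the other is already sorted.
    result.extend(left[i:])
    result.extend(right[j:])
    return result

def merge_sort(values):
    """Sort by dividing in half by position, sorting each half, and merging."""
    if len(values) <= 1:              # one or fewer elements: we're done
        return values
    mid = len(values) // 2
    return merge(merge_sort(values[:mid]), merge_sort(values[mid:]))
```

On the example vector, `merge_sort([5, 4, 3, 1, 6, 2])` splits into (5, 4, 3) and (1, 6, 2), sorts each half, and merges them to give [1, 2, 3, 4, 5, 6].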
- Instead of doing merge sort in a top-down method, recursively dividing and conquering, we can work bottom up.
- First, merge neighboring elements into sorted pairs.
- Next, merge neighboring pairs into sorted four-tuples.
- Next, merge neighboring four-tuples into sorted eight-tuples.
- And so on and so forth.
- It's easier to analyze the iterative version.
- Doing all the sets of merges for one size takes O(n) (each step of each merge puts one element in "the right place" in its sublist).
- After log n "level merges", we have a sorted list of size n.
- Hence, merge sort is O(n log n).
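The bottom-up version might look like the following Python sketch (my own naming; the notes don't give code). Runs of width 1, then 2, then 4, and so on are merged until one run covers the whole list.

```python
def merge_sort_bottom_up(values):
    """Iterative merge sort: merge neighboring runs of width 1, 2, 4, ..."""
    result = list(values)
    width = 1
    while width < len(result):
        merged = []
        # Merge each pair of neighboring runs of the current width.
        for lo in range(0, len(result), 2 * width):
            left = result[lo:lo + width]
            right = result[lo + width:lo + 2 * width]
            i = j = 0
            while i < len(left) and j < len(right):
                if left[i] <= right[j]:   # <= preserves stability
                    merged.append(left[i]); i += 1
                else:
                    merged.append(right[j]); j += 1
            merged.extend(left[i:])
            merged.extend(right[j:])
        result = merged
        width *= 2                        # pairs, then four-tuples, then ...
    return result
```

Each pass over the list is O(n), and the width doubles each pass, so there are about log n passes, matching the analysis above.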
- Dividing naively by position, as merge sort does, leads to a complex, out-of-place merging step.
- Perhaps we can divide the elements more sensibly.
- In Quicksort, we partition the subvector into two parts: the "small" values and the "large" values.
- Every "small" value is smaller than every "large" value.
- The "small" values are at the left (lower indices) and the large values are at the right (higher indices)
- We then sort the small values in place and the large values in place
- The result is therefore sorted.
- For example, given the vector (5, 4, 3, 1, 6, 2)
- We rearrange the elements to (2, 1, 3, 4, 6, 5), where (2,1,3) are the small elements and (4, 6, 5) are the large elements.
- We sort the two halves.
- The result is (1, 2, 3, 4, 5, 6)
- A problem: How do we determine large and small? It turns out that random selection works pretty well.
- Pick some element of the vector at random and treat it as the middle value.
- A problem: How do you partition?
- Step through from left to right and right to left, swapping whenever you encounter elements out of place.
- A subtlety: If you can have repeated values, how do you decide when you're done?
- Normal solution: Don't include the pivot (the randomly selected element) in the recursive sorts.
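A Python sketch of the whole idea (my own code, not from the notes). For brevity it uses the simpler single-scan (Lomuto-style) partition rather than the two-directional scan described above, but it keeps the key points: a random pivot, small values to the left, large values to the right, and the pivot itself excluded from the recursive sorts.

```python
import random

def quicksort(vec, lo=0, hi=None):
    """Sort vec[lo..hi] in place by partitioning around a random pivot."""
    if hi is None:
        hi = len(vec) - 1
    if lo >= hi:                          # one or fewer elements: done
        return
    # Pick a random element as the pivot and park it at the right end.
    p = random.randint(lo, hi)
    vec[p], vec[hi] = vec[hi], vec[p]
    pivot = vec[hi]
    # Partition: everything smaller than the pivot moves to the left.
    i = lo
    for j in range(lo, hi):
        if vec[j] < pivot:
            vec[i], vec[j] = vec[j], vec[i]
            i += 1
    # Put the pivot between the small and large parts ...
    vec[i], vec[hi] = vec[hi], vec[i]
    # ... and, since it is now in its final place, leave it out of
    # both recursive calls (this also handles repeated values safely).
    quicksort(vec, lo, i - 1)
    quicksort(vec, i + 1, hi)
```

For example, starting from [5, 4, 3, 1, 6, 2], each call partitions its portion into small and large values and then sorts each part in place, ending with [1, 2, 3, 4, 5, 6].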