Fundamentals of Computer Science I (CS151.02 2007S)
[Skip to Body]
Primary:
[Front Door]
[Syllabus]
[Glance]
[Search]

[Academic Honesty]
[Instructions]
Current:
[Outline]
[EBoard]
[Reading]
[Lab]
[Assignment]
Groupings:
[EBoards]
[Examples]
[Exams]
[Handouts]
[Homework]
[Labs]
[Outlines]
[Projects]
[Readings]
Reference:
[Scheme Report (R5RS)]
[Scheme Reference]
[DrScheme Manual]
Related Courses:
[CSC151 2006F (Rebelsky)]
[CSC151.01 2007S (Davis)]
[CSCS151 2005S (Stone)]
This reading is also available in PDF.
Summary: In a recent reading, you explored merge sort, a comparatively efficient algorithm for sorting lists or vectors. In this reading, we consider one of the more interesting sorting algorithms, Quicksort.
Contents:
As you may recall, the two key ideas in merge sort are: (1) use the technique known as of divide and conquer to divide the list into two halves (and then sort the two halves); (2) merge the halves back together. As we saw in both the reading and the corresponding lab, we can divide the list in almost any way.
Are there better sorting algorithms than merge sort? If our primary activity is to compare values, we cannot do better than some constant times nlog_{2}n steps in the sorting algorithm. However, that hasn't stopped computer scientists from exploring alternatives to merge sort. (In the next course, you'll learn some reasons that we might want such alternatives.)
One way to develop an alternative to merge sort is to split the
values in the list in a more sensible way. For example, instead of
splitting into about half the elements
and the remaining
elements
, we might choose the to divide into the smaller
elements
and the larger elements
.
Why would this strategy be better? Well, if we know that every small element precedes every large element, then we can significantly simplify the merge step: We just need to append the two sorted lists together, with the equal elements in the middle.
(define newsort (lambda (lst precedes?) (if (or (null? lst) (null? (cdr lst))) lst (let ((smaller (smallelements lst precedes?)) (same (equalelements lst precedes?)) (larger (largeelements lst precedes?))) (append (newsort smaller) same (newsort larger))))))
You'll note that we used precedes?
rather than the
mayprecede?
that we used in previous sorting algorithms.
That's because we really want to segment out the strictly larger and
strictly smaller values. (Other variants of this sorting algorithm
might include the equal elements in one side or the others. However,
such versions require consideration of some subtleties that we'd
like to avoid.)
It sounds like a great idea, doesn't it? Instead of split
and merge
, we can sort by writing smallelements
,
equal elements
,
and largeelements
. So, how do we write those procedures?
A statistician might tell us that the small elements are all the elements
less than the median
and that the large elements are the
elements greater than the median
. That's a pretty good starting
definition. Now, how do we find the median? Usually, we sort the values
and look at the middle position. Whoops. If we need the median to sort,
and we need to sort to get the median, we're stuck in an overly recursive
situation.
So, what do we do? A computer scientist named C. A. R. Hoare had an
interesting suggestion: Randomly pick some element of the list and
use that as a simulated median. That is, anything smaller than
that element is small
and anything larger than that element is
large
. Because it's not the median, we need another name for
that element. Traditionally, we call it the pivot. Is using
a randomlyselected pivot a good strategy? You need more statistics
than most of us know to prove formally that it works well. However,
practice suggests that it works very well.
We can now write smallelements
, equalelements
,
largeelements
by including the pivot.
(define smallelements (lambda (lst precedes? pivot) (cond ((null? lst) null) ((precedes? (car lst) pivot) (cons (car lst) (smallelements (cdr lst) precedes? pivot))) (else (smallelements (cdr lst) precedes? pivot))))) (define largeelements (lambda (lst precedes? pivot) (cond ((null? lst) null) ((precedes? pivot (car lst)) (cons (car lst) (largeelements (cdr lst) precedes? pivot))) (else (largeelements (cdr lst) precedes? pivot))))) (define equalelements (lambda (lst precedes? pivot) (cond ((null? lst) null) ((and (not (precedes? pivot (car lst))) (not (precedes? (car lst) pivot))) (cons (car lst) (equalelements (cdr lst) precedes? pivot))) (else (equalelements (cdr lst) precedes? pivot)))))
You may note that equalelements
does not use
equal?
to compare the pivot to the car. Why not? Because
our ordering may be more subtle. For example, we may be comparing people
by height (one of the many values we store for each person), and two
different people can still be equal in terms of height.
Once we've defined these three procedures, we then have to update our main sorting algorithm to find the pivot and to use it in identifying small, equal, and large elements. Since we use the sublists only once, we won't even bother naming them. We'll call the new algorithm Quicksort, since that's what Hoare called it.
(define Quicksort (lambda (lst precedes?) (if (or (null? lst) (null? (cdr lst))) lst (let ((pivot (randomelement lst))) (append (Quicksort (smallelements lst precedes? pivot) precedes?) (equalelements lst precedes? pivot) (Quicksort (largeelements lst precedes? pivot) precedes?))))))
How do we select a random element from the list? We've done so many times before that the code should be self explanatory.
(define randomelement (lambda (lst) (listref lst (random (length lst)))))
Are we done? In one sense, yes, we have a working sorting procedure. However, good design practice suggests that we look for ways to simplify or otherwise clean up our code. What are the main principles? (1) Don't name anything you don't need to name. (2) Don't duplicate code.
The only thing we've named is the pivot. We've used it three times, which argues
for naming it. More importantly, since the pivot is a randomly chosen
value, our code will not work the same (nor will it work correctly)
if we substitute (randomelement lst)
for each of the
instances of pivot
. Hence, the naming is a good strategy.
Do we have any duplicated code? Yes, smallvalues
,
equalvalues
, and
largevalues
are very similar. Each scans a list of values
and selects those that meet a particular predicate (in the first case,
those less than the pivot, in the second, those equal to the pivot,
in the third, those greater than
to the pivot). Hence, we might want to write a select
procedure that extracts the similar code.
;;; Procedure: ;;; select ;;; Parameters: ;;; pred?, a unary predicate ;;; lst, a list ;;; Purpose: ;;; Create a list of all values in lst for which pred? holds. ;;; Produces: ;;; selected, a list ;;; Preconditions: ;;; pred? can be applied to each element of lst. ;;; Postconditions: ;;; Every element in selected is in lst. ;;; pred? holds for every element of selected. ;;; If there's a value in lst for which pred? holds, then the value is in ;;; selected. (define select (lambda (pred? lst) (cond ((null? lst) null) ((pred? (car lst)) (cons (car lst) (select pred? (cdr lst)))) (else (select pred? (cdr lst))))))
Now, we can write Quicksort without relying on the helpers
smallelements
and largeelements
.
(define Quicksort (lambda (lst precedes?) (if (or (null? lst) (null? (cdr lst))) lst (let ((pivot (randomelement lst))) (append (Quicksort (select (lambda (val) (precedes? val pivot)) lst) precedes?) (select (lambda (val) (and (not (precedes? val pivot)) (not (precedes? pivot val)))) lst) (Quicksort (select (lambda (val) (precedes? pivot val)) lst) precedes?))))))
Can we make this even shorter? Well, the selection of large elements looks
a lot like a use of leftsection
and the selection of small
elements is a lot like a use of rightsection
.
The selection of equal elements we may just have to leave in its more
complex form.
Now we can write something even more concise.
(define Quicksort (lambda (lst precedes?) (if (or (null? lst) (null? (cdr lst))) lst (let ((pivot (randomelement lst))) (append (Quicksort (select (rs precedes? pivot) lst) precedes?) (select (lambda (val) (and (not (precedes? val pivot)) (not (precedes? pivot val)))) lst) (Quicksort (select (ls precedes? pivot) lst) precedes?))))))
Some designers might focus not on the duplication of code between
smallelements
, equalelements
, and
largeelements
and instead
focus on the issue that all we're really doing is splitting lst
into three lists. They might suggest that instead of writing
select
, we write a partition
procedure that
breaks a list into three parts.
;;; Procedure: ;;; partition ;;; Parameters: ;;; lst, a list ;;; pivot, a value ;;; precedes?, a binary predicate ;;; Purpose: ;;; Partition lst into three lists, ;;; one for which (precedes? val pivot) holds, ;;; one for which (precedes? pivot val) holds, and ;;; one for which neither holds. ;;; Produces: ;;; (smallerelements equalelements largerelements), A two element list ;;; Preconditions: ;;; precedes? can be applied to pivot and any value of lst. ;;; Postconditions: ;;; (append smallerelements equalelements largerelements) ;;; is a permutation of lst. ;;; (precedes? (listref smallerelements i) pivot) ;;; holds for every i, 0 < i < (length smallerelements). ;;; (precedes? pivot (listref largerelements j)) ;;; holds for every j, 0 < j < (length largerelements). ;;; Neither (precedes? (listref equalelements k) pivot) ;;; nor (precedes? pivot (listref equalelements k)) ;;; holds for eery k, 0 < k < (length equalelements) (define partition (lambda (lst pivot precedes?) (letrec ((kernel (lambda (remaining smallerelements equalelements largerelements) (cond ((null? remaining) (list smallerelements equalelements largerelements)) ((precedes? (car remaining) pivot) (kernel (cdr remaining) (cons (car remaining) smallerelements) equalelements largerelements)) ((precedes? pivot (car remaining)) (kernel (cdr remaining) smallerelements equalelements (cons (car remaining) largerelements))) (else (kernel (cdr remaining) smallerelements (cons (car remaining) equalelements) largerelements)))))) (kernel lst null null null))))
Here's yet another version of Quicksort that uses this procedure.
(define Quicksort (lambda (lst precedes?) (display lst) (newline) (if (or (null? lst) (null? (cdr lst))) lst (let* ((pivot (randomelement lst)) (parts (partition lst pivot precedes?))) (append (Quicksort (car parts) precedes?) (cadr parts) (Quicksort (caddr parts) precedes?))))))
You've now seen four versions of Quicksort. Which one is best? The last one is probably the most efficient, since it only scans the list once to identify the small elements and large elements. The other three scan the list twice.
Different readers find different versions clearer or less clear. You'll need to decide which you like the most from that perspective (and you might even want to think about how you'd express the criteria by which you make your decision).
History/Readings/Quicksort.html
.
[Skip to Body]
Primary:
[Front Door]
[Syllabus]
[Glance]
[Search]

[Academic Honesty]
[Instructions]
Current:
[Outline]
[EBoard]
[Reading]
[Lab]
[Assignment]
Groupings:
[EBoards]
[Examples]
[Exams]
[Handouts]
[Homework]
[Labs]
[Outlines]
[Projects]
[Readings]
Reference:
[Scheme Report (R5RS)]
[Scheme Reference]
[DrScheme Manual]
Related Courses:
[CSC151 2006F (Rebelsky)]
[CSC151.01 2007S (Davis)]
[CSCS151 2005S (Stone)]
Disclaimer:
I usually create these pages on the fly
, which means that I rarely
proofread them and they may contain bad grammar and incorrect details.
It also means that I tend to update them regularly (see the history for
more details). Feel free to contact me with any suggestions for changes.
This document was generated by
Siteweaver on Thu Sep 13 20:55:13 2007.
The source to the document was last modified on Tue Apr 24 08:37:40 2007.
This document may be found at http://www.cs.grinnell.edu/~rebelsky/Courses/CS151/2007S/Readings/Quicksort.html
.
You may wish to validate this document's HTML ; ;
Samuel A. Rebelsky, rebelsky@grinnell.eduhttp://creativecommons.org/licenses/bync/2.5/
or send a letter to Creative Commons, 543 Howard Street, 5th Floor,
San Francisco, California, 94105, USA.