Fundamentals of Computer Science I (CS151.02 2007S)
[Skip to Body]
Primary:
[Front Door]
[Syllabus]
[Glance]
[Search]

[Academic Honesty]
[Instructions]
Current:
[Outline]
[EBoard]
[Reading]
[Lab]
[Assignment]
Groupings:
[EBoards]
[Examples]
[Exams]
[Handouts]
[Homework]
[Labs]
[Outlines]
[Projects]
[Readings]
Reference:
[Scheme Report (R5RS)]
[Scheme Reference]
[DrScheme Manual]
Related Courses:
[CSC151 2006F (Rebelsky)]
[CSC151.01 2007S (Davis)]
[CSCS151 2005S (Stone)]
This reading is also available in PDF.
Summary: We consider a variety of techniques used to put a list or vector in order, using a binary comparison operation to determine the ordering of pairs of elements.
Contents:
Sorting a collection of values  arranging them in a fixed order, usually alphabetical or numerical  is one of the most common computing applications. When the number of values is even moderately large, sorting is such a tiresome, errorprone, and timeconsuming process for human beings that the programmer should automate it whenever possible. For this reason, computer scientists have studied this application with extreme care and thoroughness.
One of the clear results of their investigations is that no one algorithm for sorting is best in all cases. Which approach is best depends on whether one is sorting a small collection or a large one, on whether the individual elements occupy a lot of storage (so that moving them around in memory is timeconsuming), on how easy or hard it is to compare two elements to figure out which one should precede the other, and so on. In this course we'll be looking at two of the most generally useful algorithms for sorting: insertion sort, which is the subject of this reading, and merge sort, which we'll talk about in another reading. In our general exploration of sorting, we may also discuss other sorting algorithms.
Imagine first that we're given a collection of values and a rule for
arranging them. The values might actually be stored either in a list or
in a vector; let's assume first that they are in a list. The rule for
arranging them typically takes the form of a predicate with two parameters that
can be applied to any two values in the set to determine whether the
first of them could precede the second when the values have been sorted.
(For example, if one wants to sort a set of real numbers into ascending
numerical order, the rule should be the predicate <=
;
if one wants to sort a set of strings into alphabetical order, ignoring
case, the rule should be stringci<=?
, and so on.)
Insertion sort works by taking the values one by one and inserting each
one into a new list that it constructs, constantly maintaining the
condition that the elements of the new list are in the desired order with
respect to one another. Clearly, this condition will not be maintained if
each element is added to the new list at the beginning, using
cons
; instead, the insertion sort algorithm adds each element at a
carefully selected position within the new list, placing the new element
after each previously placed element that precedes it according
to the given precedence rule, but before every such element that
it precedes. The following procedure, insertnumber
, adds a new
element to a list in exactly this way. For the moment, we'll assume that
the elements of the list are real numbers and than we want to sort them
into ascending order; <=
is therefore used as the ordering
predicate.
We begin with the procedure used to insert a new value into a list that is already in order.
;;; Procedure: ;;; insertnumber ;;; Parameters: ;;; newelement, a real numbers ;;; sorted, a list of real numbers ;;; Purpose: ;;; Insert newelement into sorted. ;;; Produces: ;;; newls, a new list of real numbers ;;; Preconditions: ;;; sorted is a list of numbers arranged in increasing order. That is, ;;; (<= (listref sorted i) (listref sorted (+ i 1))) ;;; for all reasonable values of i. [Unverified] ;;; newelement is a number. [Unverified] ;;; Postconditions: ;;; newls is a list of numbers arranged in increasing order. ;;; newls is a permutation of (cons newelement sorted). (define insertnumber (lambda (newelement sorted) (cond ((null? sorted) (list newelement)) ((<= newelement (car sorted)) (cons newelement sorted)) (else (cons (car sorted) (insertnumber newelement (cdr sorted)))))))
In English: If the list into which the new element is to be inserted is empty, return a list containing only the new element. If the new element can precede the first element of the existing list, then, since the existing list is assumed to be sorted already, it must also be able to precede every element of the existing list, so attach the new element onto the front of the existing list and return the result. Otherwise, we haven't yet found the place, so issue a recursive call to insert the new element into the cdr of the current list, then reattach its car at the beginning of the result.
The preceding version of the insertnumber
procedure is not
tailrecursive. When dealing with long lists, you may want to use the
following tailrecursive version, which uses memory more economically.
(A tailrecursive
procedure is one in which the recursive call is
not followed by additional work; implementations of Scheme use less memory
for tail recursion than they use for recursion in which the recursive call
is folled by additional work.)
(define insertnumber2 (lambda (newelement sorted) (let kernel ((rest sorted) (bypassed null)) (cond ((null? rest) (reverse (cons newelement bypassed))) ((<= newelement (car rest)) (append (reverse (cons newelement bypassed)) rest)) (else (kernel (cdr rest) (cons (car rest) bypassed)))))))
Of course, our lists typically have more than just numbers in them.
In the associated laboratory you
will experiment with generalized forms of insert
. How do
we generalize? We make the operation that compares values a parameter
to the procedure. We typically call that procedure mayprecede?
.
Here's one possible version of insert
that accepts the
additional parameter.
(define insert (lambda (newvalue sorted mayprecede?) (let kernel ((rest sorted) (bypassed null)) (cond ((null? rest) (reverse (cons newvalue bypassed))) ((mayprecede? newvalue (car rest)) (append (reverse (cons newvalue bypassed)) rest)) (else (kernel (cdr rest) (cons (car rest) bypassed)))))))
Now let's return to the overall process of sorting an entire list. The insertion sort algorithm simply takes up the elements of the list to be sorted one by one and inserts each one into a new list, initially empty:
;;; Procedure: ;;; insertionsortnumbers ;;; Parameters: ;;; numbers, a list of real numbers ;;; Purpose: ;;; Sorts numbers ;;; Produces: ;;; sorted, a list of real numbers ;;; Preconditions: ;;; (none) ;;; Postconditions: ;;; sorted is a list of real numbers. ;;; sorted is organized in increasing order. That is, ;;; (listref sorted i) is less than or equal to ;;; (listref sorted (+ i 1)) for all reasonable values ;;; of i. ;;; sorted is a permutation of numbers. (define insertionsortnumbers (lambda (numbers) (let helper ((unsorted numbers) ; The remaining unsorted values (sorted null)) ; The sorted values (if (null? unsorted) sorted (helper (cdr unsorted) (insertnumber (car unsorted) sorted))))))
Finally, let's consider the rather different case in which the values that we want to arrange are presented as a vector and the goal of the sorting algorithm is to overwrite the old arrangement of those values with a new, sorted arrangement of the same values. This type of sorting is often called inplace sorting.
Instead of constructing a new vector, we partition the original vector into two subvectors: a sorted subvector, in which all of the elements are in the correct order relative to one another, and an unsorted subvector in which the elements are still in their original positions. The two subvectors are not actually separated; instead, we just keep track of a boundary between them inside the original vector. Items to the left of the boundary are in the sorted subvector; items to its right, in the unsorted one. Initially the boundary is at the left end of the vector. The plan is to shift it, one position at a time, to the right end. When it arrives, the entire vector has been sorted.
Here's the plan for the main algorithm.
;;; Procedure: ;;; insertionsort! ;;; Paramters: ;;; mayprecede?, a binary predicate ;;; vec, a vector ;;; Purpose: ;;; Sorts the vector. ;;; Produces: ;;; [Nothing; sorts in place] ;;; Preconditions: ;;; vec is a vector. ;;; mayprecede? can be applied to any two elements of vec. ;;; mayprecede? is transitive. ;;; Postconditions: ;;; The final state of vec is a permutation of the original state. ;;; vec is sorted. That is, ;;; (mayprecede? (vectorref vec i) (vectorref vec (+ i 1))) ;;; for all reasonable values of i. (define insertionsort! (lambda (mayprecede? vec) (let ((len (vectorlength vec))) (let kernel ((boundary 1)) ; The index of the first unsorted value (if (< boundary len) ; If we have elements left to sort (begin (insert! (vectorref vec boundary) vec boundary mayprecede?) (kernel (+ boundary 1))))))))
The insert!
procedure takes four parameters: an element
to be inserted into the sorted part of the vector, the vector itself,
the current boundary position, and the comparison procedure. The new
element can be inserted at any position up to and including the current
boundary position, but it must be placed in the correct order relative
to elements to the left of that boundary. This means that any elements
that should follow the new one should be shifted one position to the
right in order to make room for the new one. (Elements that precede
the new one can keep their current positions.)
;;; Procedure: ;;; insert! ;;; Parameters: ;;; newelement, a value ;;; vec, a vector of values ;;; boundary, an index into the vector ;;; mayprecede?, a binary predicate ;;; Purpose: ;;; Insert newelement into the portion of vec between 0 and ;;; boundary, inclusive. ;;; Produces: ;;; Nothing; called for side effects. ;;; Preconditions: ;;; 0 <= boundary < (vectorlength vec) ;;; The elements in positions 0..boundary1 of vec are sorted. ;;; That is, (mayprecede? (vectorref vec i) (vectorref vec (+ i 1))) ;;; for all 0 < i < boundary2. ;;; Postconditions: ;;; The elements in positions 0..boundary of vec are sorted. ;;; That is, (mayprecede? (vectorref vec i) (vectorref vec (+ i 1))) ;;; for all 0 < i < boundary. ;;; The elements in positions 0..boundary of vec after insert! finishes ;;; are a permutation of newelement and the elements that were in ;;; positions 0..(boundary1) before the procedure started. (define insert! (lambda (newelement vec boundary mayprecede?) (let kernel ((pos boundary)) (cond ; If we've reached the left end of the vector, we've run out of ; elements to shift. Insert the new element. ((zero? pos) (vectorset! vec pos newelement)) ; If we've reached a point at which the element to the left ; is smaller, we insert the new element here. ((mayprecede? (vectorref vec ( pos 1)) newelement) (vectorset! vec pos newelement)) ; Otherwise, we shift the current element to the right and ; continue. (else (vectorset! vec pos (vectorref vec ( pos 1))) (kernel ( pos 1)))))))
How does this work? We assume that there's a space
at position
pos
of the vector. (That is, that we can safely insert
something there without removing anything from the vector.) We know
that the condition holds at the beginning from the description. That is,
the postcondition specifically ignores the value that was in the
boundary position. We also know that the conditional holds from the way
insert!
was called from insertionsort!
, since
the boundary is initially the position of the value we insert.
Now, what do we do? If the position is at the left end of the vector,
there's nothing smaller in the vector, so we just put the new value there.
If the thing to the left of the current position is smaller, we know we've
reached the right place, so we put the value there. In every other case,
the value to the left is larger than the value we want to insert, so we
shift that value right (into the pos
position) and continue
working one position to the left. Since we've copied the value to the right,
it is remains safe to insert something in the position just vacated
(that is, pos1
).
You will have a chance to explore this procedure further in the laboratory.
http://www.cs.grinnell.edu/~rebelsky/Courses/CS151/History/Readings/sorting.html
.
[Skip to Body]
Primary:
[Front Door]
[Syllabus]
[Glance]
[Search]

[Academic Honesty]
[Instructions]
Current:
[Outline]
[EBoard]
[Reading]
[Lab]
[Assignment]
Groupings:
[EBoards]
[Examples]
[Exams]
[Handouts]
[Homework]
[Labs]
[Outlines]
[Projects]
[Readings]
Reference:
[Scheme Report (R5RS)]
[Scheme Reference]
[DrScheme Manual]
Related Courses:
[CSC151 2006F (Rebelsky)]
[CSC151.01 2007S (Davis)]
[CSCS151 2005S (Stone)]
Disclaimer:
I usually create these pages on the fly
, which means that I rarely
proofread them and they may contain bad grammar and incorrect details.
It also means that I tend to update them regularly (see the history for
more details). Feel free to contact me with any suggestions for changes.
This document was generated by
Siteweaver on Thu Sep 13 20:55:17 2007.
The source to the document was last modified on Sun Apr 22 08:56:57 2007.
This document may be found at http://www.cs.grinnell.edu/~rebelsky/Courses/CS151/2007S/Readings/sorting.html
.
You may wish to validate this document's HTML ; ;
Samuel A. Rebelsky, rebelsky@grinnell.eduhttp://creativecommons.org/licenses/bync/2.5/
or send a letter to Creative Commons, 543 Howard Street, 5th Floor,
San Francisco, California, 94105, USA.