Fundamentals of Computer Science I: Media Computing (CS151.02 2007F)

Merge Sort


Summary: In this laboratory, we consider merge sort, a more efficient technique for sorting lists of values.

Exercise 0: Preparation

Add the code from the end of this lab to your definitions pane.

Exercises

Exercise 1: Merging

a. Write an expression to merge the lists (1 2 3) and (1 1.5 2.3).

b. Write an expression to merge two identical lists of numbers. For example, you might merge the lists (1 2 3 5 8 13 21) and (1 2 3 5 8 13 21)

c. Write an expression to merge two lists of strings. (You may choose the strings yourself. Each list should have at least three elements.)

d. Assume that we represent names as lists of the form (last-name first-name). Write an expression to merge the following two lists

(define mathstats-faculty
  (list (list "Chamberland" "Marc")
        (list "French" "Chris")
        (list "Jager" "Leah")
        (list "Kuiper" "Shonda")
        (list "Moore" "Emily")
        (list "Moore" "Tom")
        (list "Mosley" "Holly")
        (list "Roldan-Santos" "Christian")
        (list "Romano" "David")
        (list "Shuman" "Karen")
        (list "Wolf" "Royce")))

(define more-faculty
  (list (list "Moore" "Chuck")
        (list "Moore" "Ed")
        (list "Moore" "Gordon")
        (list "Moore" "Roger")))

Exercise 2: Reflecting on Merging

a. What will happen if you call merge with unsorted lists as the list parameters?

b. Check your answer by experimentation. To help you understand what is happening, you may wish to modify merge so that it displays the values of sorted1 and sorted2.

c. What will happen if you call merge with sorted lists of very different lengths as the first two parameters?

d. Check your answer by experimentation.

Exercise 3: Splitting

Use split to split:

a. A list of numbers of length 6

b. A list of numbers of length 5

c. A list of strings of length 6

Exercise 4: Sorting

a. Run merge-sort on a list you design of fifteen integers.

b. Run new-merge-sort on a list you design of ten strings.

c. Add the following lines to the repeat-merge helper in new-merge-sort. (Add them directly after the lambda.)

                 (display list-of-lists) (newline)

d. What output do you expect to get if you rerun new-merge-sort on the list from step b?

e. Check your answer experimentally.

f. Rerun new-merge-sort on a list of twenty integers.

Exercise 5: Special Cases

a. Run both versions of merge sort on the empty list.

b. Run both versions of merge sort on a one-element list.

c. Run both versions of merge sort on a list with duplicate elements.

Exercise 6: Is It Sorted?

As you've probably noticed, there are two key postconditions of a procedure that sorts lists: The result is a permutation of the original list and the result is sorted. We're fortunate that the unit test framework lets us test permutations (with test-permutation!). Hence, if we wanted to test merge sort in the unit test framework, we might write

(define some-list ...)
(test-permutation! (merge-sort some-list pred?) some-list)

However, we still need a way to make sure that the result is sorted, particularly if the result is very long.

Write a procedure, (sorted? lst may-precede?) that checks whether or not lst is sorted by may-precede?.

For example,

> (sorted? (list 1 3 5 7 9) <=)
#t
> (sorted? (list 1 3 5 4 7 9) <=)
#f
> (sorted? (list "alpha" "beta" "gamma") string-ci<=?)

Note that we can use that procedure in a test suite for merge sort with

(test! (sorted? (merge-sort some-list may-precede?) may-precede?) #t)

For Those with Extra Time

Extra 1: Splitting, Revisited

Some computer scientists prefer to define split something like the following.

(define split
  (lambda (ls)
    (let kernel ((rest ls)
                 (left null)
                 (right null))
      (if (null? rest)
          (list left right)
          (kernel (cdr rest) (cons (car rest) right) left)))))

a. How does this procedure split the list?

b. Why might you prefer one version of split over the other?

Some Useful Procedures

;;; Procedure:
;;;   merge
;;; Parameters:
;;;   sorted1, a sorted list.
;;;   sorted2, a sorted list.
;;;   may-precede?, a binary predicate that compares values.
;;; Purpose:
;;;   Merge the two lists.
;;; Produces:
;;;   sorted, a sorted list.
;;; Preconditions:
;;;   may-precede? can be applied to any two values from
;;;     sorted1 and/or sorted2.
;;;   may-precede? represents a transitive operation.
;;;   sorted1 is sorted by may-precede? That is, for each i such that
;;;     0 <= i < (length sorted1)
;;;       (may-precede? (list-ref sorted1 i) (list-ref sorted1 (+ i 1)))
;;;   sorted2 is sorted by may-precede? That is, for each i such that
;;;     0 <= j < (length sorted2)
;;;       (may-precede? (list-ref sorted2 j) (list-ref sorted2 (+ j 1)))
;;; Postconditions:
;;;   sorted is sorted by may-precede?.
;;;     For each k, 0 <= k < (length sorted)
;;;       (may-precede? (list-ref sorted k) (list-ref sorted (+ k 1)))
;;;   sorted is a permutation of (append sorted1 sorted2)
;;;   Does not affect sorted1 or sorted2.
;;;   sorted may share cons cells with sorted1 or sorted2.
(define merge
  (lambda (sorted1 sorted2 may-precede?)
    (cond
      ; If the first list is empty, return the second
      ((null? sorted1) sorted2)
      ; If the second list is empty, return the first
      ((null? sorted2) sorted1)
      ; If the first element of the first list is smaller or equal,
      ; make it the first element of the result and recurse.
      ((may-precede? (car sorted1) (car sorted2))
       (cons (car sorted1) 
             (merge (cdr sorted1) sorted2 may-precede?)))
      ; Otherwise, do something similar using the first element
      ; of the second list
      (else
       (cons (car sorted2) 
             (merge sorted1 (cdr sorted2) may-precede?))))))

;;; Procedure:
;;;   split
;;; Parameters:
;;;   lst, a list
;;; Purpose:
;;;   Split a list into two nearly-equal halves.
;;; Produces:
;;;   halves, a list of two lists
;;; Preconditions:
;;;   lst is a list.
;;; Postconditions:
;;;   halves is a list of length two.
;;;   Each element of halves is a list (which we'll refer to as
;;;     firsthalf and secondhalf).
;;;   lst is a permutation of (append firsthalf secondhalf).
;;;   The lengths of firsthalf and secondhalf differ by at most 1.
;;;   Does not modify lst.
;;;   Either firsthalf or secondhalf may share cons cells with lst.
(define split
  (lambda (lst)
    ;;; kernel
    ;;;   Remove the first count elements of a list.  Return the
    ;;;   pair consisting of the removed elements (in order) and
    ;;;   the remaining elements.
    (let kernel ((remaining lst) ; Elements remaining to be used
                 (removed null)  ; Accumulated initial elements 
                 (count          ; How many elements left to use
                  (quotient (length lst) 2)))
      ; If no elements remain to be used,
      (if (= count 0)
          ; The first half is in removed and the second half
          ; consists of any remaining elements.
          (list (reverse removed) remaining)
          ; Otherwise, use up one more element.
          (kernel (cdr remaining)
                  (cons (car remaining) removed)
                  (- count 1))))))

(define merge-sort
  (lambda (stuff may-precede?)
    ; If there are only zero or one elements in the list,
    ; the list is already sorted.
    (if (or (null? stuff) (null? (cdr stuff)))
        stuff
        ; Otherwise, 
        ;   split the list in half,
        ;   sort each half,
        ;   and then merge the sorted halves.
        (let* ((halves (split stuff))
               (some (car halves))
               (rest (cadr halves)))
          (merge (merge-sort some may-precede?)
                 (merge-sort rest may-precede?)
                 may-precede?)))))

(define new-merge-sort
  (lambda (lst may-precede?)
    (letrec (
             ; Merge neighboring pairs in a list of lists
             (merge-pairs
              (lambda (list-of-lists)
                (cond
                  ; Base case: Empty list.
                  ((null? list-of-lists) null)
                  ; Base case: Single-element list (nothing to merge)
                  ((null? (cdr list-of-lists)) list-of-lists)
                  ; Recursive case: Merge first two and continue
                  (else (cons (merge (car list-of-lists) (cadr list-of-lists)
                                     may-precede?)
                              (merge-pairs (cddr list-of-lists)))))))
             ; Repeatedly merge pairs
             (repeat-merge
              (lambda (list-of-lists)
                ; Show what's happening
                ; (write list-of-lists) (newline)
                ; If there's only one list in the list of lists
                (if (null? (cdr list-of-lists))
                    ; Use that list
                    (car list-of-lists)
                    ; Otherwise, merge neighboring pairs and start again.
                    (repeat-merge (merge-pairs list-of-lists))))))
      (repeat-merge (map list lst)))))

Creative Commons License

Samuel A. Rebelsky, rebelsky@grinnell.edu

Copyright © 2007 Janet Davis, Matthew Kluber, and Samuel A. Rebelsky. (Selected materials copyright by John David Stone and Henry Walker and used by permission.)

This material is based upon work partially supported by the National Science Foundation under Grant No. CCLI-0633090. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/2.5/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.