
Merge Sort


Summary: In a recent reading and the corresponding laboratory, we've explored the basics of sorting using insertion sort. In this reading, we turn to another, faster, sorting algorithm, merge sort.

The Costs of Insertion Sort

In looking at algorithms, we often ask ourselves how many “steps” the algorithm typically uses. Rather than looking at every kind of step, we tend to focus on particular kinds of steps, such as the number of times we have to call vector-set! or the number of values we look at.

Let's try to look at how much effort the insertion sort algorithm expends in sorting a list of n values, starting from a random initial arrangement. Recall that insertion sort uses two lists: a growing collection of sorted values and a shrinking collection of values left to examine. At each step, it inserts a value into the collection of sorted values.

In the worst case, the value we're inserting belongs after all the values in the list we're inserting it into. In that case, we end up comparing it to every element of that list. Since the sorted list grows from size 0 to size n-1 over the course of the n insertions, its average size is about n/2, so the number of comparisons is approximately n²/2. (More precisely, it's the sum 0 + 1 + 2 + ... + (n-1), which works out to n*(n-1)/2.)

We get this effect, for example, when sorting a list of integers that is already in order. We insert the smallest, then the next smallest (which goes to the end), then the next smallest (which goes to the end), and so on and so forth.

> (map (lambda (n) (/ (* n (- n 1)) 2)) (list 10 20 40 80))
(45 190 780 3160)
> (analyze (list-insertion-sort (iota 10) <=) may-precede?)
may-precede?: 45
Total: 45
(0 1 2 3 4 5 6 7 8 9)
> (analyze (list-insertion-sort (iota 20) <=) may-precede?)
may-precede?: 190
Total: 190
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19)
> (analyze (list-insertion-sort (iota 40) <=) may-precede?)
may-precede?: 780
Total: 780
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 .....
> (analyze (list-insertion-sort (iota 80) <=) may-precede?)
may-precede?: 3160
Total: 3160
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 .....

In the best case, the value we're inserting precedes all the values in the list we're inserting it into. In that case, we compare it only to the first element of the list. Since we insert into a non-empty list n-1 times, we make approximately n-1 comparisons.

We get this effect, for example, when we take a list of integers arranged from smallest to largest and sort it into the order largest to smallest.

> (analyze (list-insertion-sort (iota 10) >=) may-precede?)
may-precede?: 9
Total: 9
(9 8 7 6 5 4 3 2 1 0)
> (analyze (list-insertion-sort (iota 20) >=) may-precede?)
may-precede?: 19
Total: 19
(19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0)
> (analyze (list-insertion-sort (iota 40) >=) may-precede?)
may-precede?: 39
Total: 39
(39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 1 .....
> (analyze (list-insertion-sort (iota 80) >=) may-precede?)
may-precede?: 79
Total: 79
(79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 5 .....

Since there's such a big difference between the worst case and the best case, we should probably consider the “average” case. There are, unfortunately, a variety of definitions of “average”. We'll choose a simple one. In particular, we'll say that, on average, the insert routine needs to look through about half of the elements in the sorted part of the data structure to find the correct insertion point for each new value. The size of that sorted part increases linearly from 0 to n, so its average size is n/2, and the average number of comparisons needed to insert one element is n/4. Taking all the insertions together, insertion sort performs about n²/4 comparisons to sort the entire list. That is, we do an average of n/4 comparisons for each insertion, and we do n insertions, giving n²/4.

This function grows much more quickly than the size of the input list. For example, if we have 10 elements, we do about 25 comparisons. If we have 20 elements, we do about 100 comparisons. If we have 40 elements, we do about 400 comparisons. And, if we have 100 elements, we do about 2500 comparisons.
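
We can compute those predictions in the same style as the exact worst-case counts above. (The precise numbers matter less than how quickly they grow.)

> (map (lambda (n) (/ (* n n) 4)) (list 10 20 40 100))
(25 100 400 2500)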

Does that really happen? Let's try it with some “random” lists.

> (define rnum (lambda (n) (if (= 0 n) null (cons (random 1000) (rnum (- n 1)))) .....
> (define random-numbers (rnum 10))
> random-numbers
(577 127 39 10 966 95 649 110 449 805)
> (analyze (list-insertion-sort random-numbers <=) may-precede?)
may-precede?: 35
Total: 35
(10 39 95 110 127 449 577 649 805 966)
> (define random-numbers (rnum 10))
> random-numbers
(347 730 321 367 267 38 886 555 836 623)
> (analyze (list-insertion-sort random-numbers <=) may-precede?)
may-precede?: 34
Total: 34
(38 267 321 347 367 555 623 730 836 886)
> (analyze (list-insertion-sort (rnum 20) <=) may-precede?)
may-precede?: 89
Total: 89
(8 14 16 46 132 201 289 323 415 483 533 537 579 714 875 875 906 942 975 978)
> (analyze (list-insertion-sort (rnum 20) <=) may-precede?)
may-precede?: 129
Total: 129
(8 138 232 308 366 370 377 419 443 452 483 521 553 593 615 864 922 947 960 999)
> (analyze (list-insertion-sort (rnum 40) <=) may-precede?)
may-precede?: 464
Total: 464
(22 36 45 70 95 209 226 227 275 288 298 336 338 344 349 374 435 448 494 496 528  .....
> (analyze (list-insertion-sort (rnum 40) <=) may-precede?)
may-precede?: 344
Total: 344
(3 8 36 55 77 90 146 152 158 193 196 280 294 302 339 362 396 409 440 442 469 500 .....
> (analyze (list-insertion-sort (rnum 100) <=) may-precede?)
may-precede?: 2429
Total: 2429
(7 11 22 30 32 36 45 67 104 112 129 131 161 172 173 174 206 207 223 239 245 252  .....
> (analyze (list-insertion-sort (rnum 100) <=) may-precede?)
may-precede?: 2648
Total: 2648
(1 6 7 7 13 20 24 40 43 44 47 67 68 99 112 120 124 135 148 150 150 179 185 215 2 .....

So, it's not exactly n²/4 steps for some lists (and we wouldn't expect it to be), but it's close enough for us to be confident that the growth is fairly similar to n²/4 steps.

Because the number of comparisons grows so quickly, insertion sort becomes quite slow on larger lists (say, those with more than 1000 values). Hence, it is valuable to find a sorting method that performs fewer comparisons per value in the list, even if it takes more effort to preprocess the list or to write the procedure. In this reading, we explore one such procedure.

Divide and Conquer

What techniques do we know for making algorithms faster? As we saw in the case of binary search, it is often profitable to divide an input in half. We call this technique divide-and-conquer. The strategy works somewhat differently in different domains. For binary search, after dividing the list in half, we could throw away one half and recurse on the other. Clearly, for sorting, we cannot throw away part of the list. However, we can still rely on the idea of dividing in half. That is, we'll divide the list into two halves, sort them, and then do something with the two result lists.

Here's a sketch of the algorithm in Scheme:

(define new-sort
  (lambda (stuff may-precede?)
    ; If there are only zero or one elements in the list,
    ; the list is already sorted.
    (if (or (null? stuff) (null? (cdr stuff)))
        stuff
        ; Otherwise, split the list in half
        (let* ((halves (split stuff))
               (firsthalf (car halves))
               (secondhalf (cadr halves))
               ; And sort each half
               (sortedfirst (new-sort firsthalf may-precede?))
               (sortedsecond (new-sort secondhalf may-precede?)))
           ; Do some more stuff
           ???))))

Merging

But what do we do once we've sorted the two sublists? We need to put them back into one list. Traditionally, we refer to the process of joining two sorted lists as merging. It is relatively easy to merge two lists: You repeatedly take whichever element of the two lists should come first. When do you stop? When you run out of elements in one of the lists, in which case you use the elements of the remaining list. Putting it all together, we get the following:

;;; Procedure:
;;;   merge
;;; Parameters:
;;;   sorted1, a sorted list.
;;;   sorted2, a sorted list.
;;;   may-precede?, a binary predicate that compares values.
;;; Purpose:
;;;   Merge the two lists.
;;; Produces:
;;;   sorted, a sorted list.
;;; Preconditions:
;;;   may-precede? can be applied to any two values from
;;;     sorted1 and/or sorted2.
;;;   may-precede? represents a transitive operation.
;;;   sorted1 is sorted by may-precede? That is, for each i such that
;;;     0 <= i < (- (length sorted1) 1),
;;;       (may-precede? (list-ref sorted1 i) (list-ref sorted1 (+ i 1)))
;;;   sorted2 is sorted by may-precede? That is, for each j such that
;;;     0 <= j < (- (length sorted2) 1),
;;;       (may-precede? (list-ref sorted2 j) (list-ref sorted2 (+ j 1)))
;;; Postconditions:
;;;   sorted is sorted by may-precede?.
;;;     For each k, 0 <= k < (- (length sorted) 1),
;;;       (may-precede? (list-ref sorted k) (list-ref sorted (+ k 1)))
;;;   sorted is a permutation of (append sorted1 sorted2)
;;;   Does not affect sorted1 or sorted2.
;;;   sorted may share cons cells with sorted1 or sorted2.
(define merge
  (lambda (sorted1 sorted2 may-precede?)
    (cond
      ; If the first list is empty, return the second
      ((null? sorted1) sorted2)
      ; If the second list is empty, return the first
      ((null? sorted2) sorted1)
      ; If the first element of the first list is smaller or equal,
      ; make it the first element of the result and recurse.
      ((may-precede? (car sorted1) (car sorted2))
       (cons (car sorted1) 
             (merge (cdr sorted1) sorted2 may-precede?)))
      ; Otherwise, do something similar using the first element
      ; of the second list
      (else
       (cons (car sorted2) 
             (merge sorted1 (cdr sorted2) may-precede?))))))
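
For example, here is merge at work on two small sorted lists. (You can check by hand that it takes whichever front element may precede the other at each step.)

> (merge (list 1 3 5 9) (list 2 4 6) <=)
(1 2 3 4 5 6 9)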

Splitting

We know how to sort if we can split a list into two parts, sort the smaller lists, and merge the sorted lists. We can sort the smaller lists recursively. We've just figured out how to merge the two sorted lists. All that we have left to do is to figure out how to split a list into two parts. One easy way is to get the length of the list and then cdr down it for half the elements, accumulating the skipped elements as you go. Since it's easiest to accumulate a list in reverse order, we re-reverse it when we're done. (Merge sort doesn't really care that they're in the original order, but perhaps we want to use split for other purposes.)

;;; Procedure:
;;;   split
;;; Parameters:
;;;   lst, a list
;;; Purpose:
;;;   Split a list into two nearly-equal halves.
;;; Produces:
;;;   halves, a list of two lists
;;; Preconditions:
;;;   lst is a list.
;;; Postconditions:
;;;   halves is a list of length two.
;;;   Each element of halves is a list (which we'll refer to as
;;;     firsthalf and secondhalf).
;;;   lst is a permutation of (append firsthalf secondhalf).
;;;   The lengths of firsthalf and secondhalf differ by at most 1.
;;;   Does not modify lst.
;;;   Either firsthalf or secondhalf may share cons cells with lst.
(define split
  (lambda (lst)
    ;;; kernel
    ;;;   Remove the first count elements of the list.  Return a
    ;;;   two-element list consisting of the removed elements (in
    ;;;   order) and the remaining elements.
    (let kernel ((remaining lst) ; Elements remaining to be used
                 (removed null)  ; Accumulated initial elements 
                 (count          ; How many elements left to use
                  (quotient (length lst) 2)))
      ; If we've removed enough elements for the first half,
      (if (= count 0)
          ; The first half is in removed and the second half
          ; consists of any remaining elements.
          (list (reverse removed) remaining)
          ; Otherwise, use up one more element.
          (kernel (cdr remaining)
                  (cons (car remaining) removed)
                  (- count 1))))))
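
For example, split divides a five-element list into a first half of two elements and a second half of three.

> (split (list 1 2 3 4 5))
((1 2) (3 4 5))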

In the corresponding lab, you'll have an opportunity to consider other ways to split the list. In that lab, you'll work with a slightly changed version of the code.

Putting It Together

We saw most of the merge-sort procedure above, but with a bit of code left to fill in. Here's a new version, with that code filled in (and a few other changes).

(define merge-sort
  (lambda (stuff may-precede?)
    ; If there are only zero or one elements in the list,
    ; the list is already sorted.
    (if (or (null? stuff) (null? (cdr stuff)))
        stuff
        ; Otherwise, 
        ;   split the list in half,
        ;   sort each half,
        ;   and then merge the sorted halves.
        (let* ((halves (split stuff))
               (some (car halves))
               (rest (cadr halves)))
          (merge (merge-sort some may-precede?)
                 (merge-sort rest may-precede?)
                 may-precede?)))))
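
Let's try it on a short list of numbers.

> (merge-sort (list 20 42 35 10 69) <=)
(10 20 35 42 69)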

An Alternative: From Small Lists to Large Lists

There's an awful lot of recursion going on in merge sort as we repeatedly split the list again and again and again until we reach lists of length one. Rather than doing all that recursion, we can start by building all the lists of length one and then repeatedly merging pairs of neighboring lists. For example, suppose we start with sixteen values, each in a list by itself.

((20) (42) (35) (10) (69) (92) (77) (27) (67) (62) (1) (66) (5) (45) (25) (90))

When we merge neighbors, we get sorted lists of two elements. At some places, such as when we merge (20) and (42), the elements stay in their original order. At other places, such as when we merge (35) and (10), we need to swap the order to build sorted lists of two elements.

((20 42) (10 35) (69 92) (27 77) (62 67) (1 66) (5 45) (25 90))

Now we can merge these sorted lists of two elements into sorted lists of four elements. For example, when we merge (20 42) and (10 35), we first take the 10 from the second list, then the 20 from the first list, then the 35 from the second list, then the 42 that is all that's left.

((10 20 35 42) (27 69 77 92) (1 62 66 67) (5 25 45 90))

We can merge these sorted lists of four elements into sorted lists of eight elements.

((10 20 27 35 42 69 77 92) (1 5 25 45 62 66 67 90))

Finally, we can merge these sorted lists of eight elements into one sorted list of sixteen elements.

((1 5 10 20 25 27 35 42 45 62 66 67 69 77 90 92))

Now we have a list of one list, so we take the car to extract the list.

(1 5 10 20 25 27 35 42 45 62 66 67 69 77 90 92)

Translating this technique into code is fairly easy. We use one helper, merge-pairs, to merge neighboring pairs of lists. We use a second helper, repeat-merge, to call merge-pairs repeatedly until there are no more pairs to merge.

(define new-merge-sort
  (lambda (lst may-precede?)
    (letrec (
             ; Merge neighboring pairs in a list of lists
             (merge-pairs
              (lambda (list-of-lists)
                (cond
                  ; Base case: Empty list.
                  ((null? list-of-lists) null)
                  ; Base case: Single-element list (nothing to merge)
                  ((null? (cdr list-of-lists)) list-of-lists)
                  ; Recursive case: Merge first two and continue
                  (else (cons (merge (car list-of-lists) (cadr list-of-lists)
                                     may-precede?)
                              (merge-pairs (cddr list-of-lists)))))))
             ; Repeatedly merge pairs
             (repeat-merge
              (lambda (list-of-lists)
                ; Show what's happening
                ; (write list-of-lists) (newline)
                ; If there's only one list in the list of lists
                (if (null? (cdr list-of-lists))
                    ; Use that list
                    (car list-of-lists)
                    ; Otherwise, merge neighboring pairs and start again.
                    (repeat-merge (merge-pairs list-of-lists))))))
      ; An empty list is already sorted; otherwise, build a list of
      ; one-element lists and merge repeatedly.
      (if (null? lst)
          null
          (repeat-merge (map list lst))))))
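
Let's check that new-merge-sort behaves like merge-sort on a small list. (If you uncomment the write line in repeat-merge, you can also watch the list of lists shrink, just as in the sixteen-value example above.)

> (new-merge-sort (list 20 42 35 10) <=)
(10 20 35 42)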

The Costs of Merge Sort

At the beginning of this reading, we saw that insertion sort takes approximately n2/4 steps to sort a list of n elements. How long does merge sort take? We'll look at new-merge-sort, since it's easier to analyze. However, since it does essentially the same thing as the original merge-sort, just in a slightly different order, the running time will be similar.

We'll do our analysis in a few steps. First, we will consider the number of steps in each call to merge-pairs. Next, we will consider the number of times repeat-merge calls merge-pairs. Finally, we'll put the two together. To make things easier, we'll assume that n (the number of elements in the list) is a power of two.

Initially, repeat-merge calls merge-pairs on n lists of length 1 to merge them into n/2 lists of length 2. Building a list of length 2 takes approximately two steps, so merge-pairs takes approximately n steps to do its first set of merges.

Next, repeat-merge calls merge-pairs on n/2 lists of length 2 to merge them into n/4 lists of length 4. Building a merged list of length 4 takes approximately four steps, so merge-pairs takes approximately n steps to build n/4 lists of length 4.

In general, repeat-merge calls merge-pairs to merge n/2ᵏ lists of length 2ᵏ into n/2ᵏ⁺¹ lists of length 2ᵏ⁺¹. A little math suggests that this once again takes approximately n steps.

So far, so good. Now, how many times do we call merge-pairs? We go from lists of length 1, to lists of length 2, to lists of length 4, to lists of length 8, ..., to lists of length n/4, to lists of length n/2, to one list of length n. How many times did we call merge-pairs? The number of times we need to multiply 2 by itself to get n. As we've noted before, the formal name for that value is log₂ n.
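
If you'd like to convince yourself, here is a small helper (ours, not part of merge sort itself) that counts how many rounds of pairwise merging it takes to get from n one-element lists down to a single list. For n a power of two, the answer is exactly log₂ n.

; How many times can we halve n (rounding up) before reaching 1?
; This is the number of rounds of merging that repeat-merge performs.
(define rounds-of-merging
  (lambda (n)
    (if (<= n 1)
        0
        (+ 1 (rounds-of-merging (quotient (+ n 1) 2))))))

For example, (rounds-of-merging 16) is 4, matching the four rounds of merging in the sixteen-value example above.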

To conclude, merge sort repeats a phase of approximately n steps log₂ n times. Hence, it takes approximately n log₂ n steps.

Is this much better than insertion sort, which took approximately n²/4 steps? Here's a chart that will help you compare various running times.

        n   log₂ n         n²      n²/4   n log₂ n
       10      3.3        100        25         33
       20      4.3        400       100         86
       30      4.9        900       225        147
       40      5.3      1,600       400        212
       80      6.3      6,400     1,600        506
      100      6.6     10,000     2,500        660
      500      9.0    250,000    62,500      4,483
    1,000     10.0  1,000,000   250,000     10,000

As you can see, although the two sorting algorithms start out taking approximately the same time, as the length of the list grows, the relative cost of using insertion sort becomes a bigger and bigger ratio of the cost of using merge sort.
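
If you'd like to recompute the log₂ n column yourself, recall that Scheme's log procedure computes natural logarithms, so log₂ n is (/ (log n) (log 2)). Here's a quick sketch:

(map (lambda (n) (/ (log n) (log 2)))
     (list 10 20 30 40 80 100 500 1000))

Rounding each result to one decimal place gives the second column of the chart; multiplying each by n gives (approximately) the rightmost column.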

Does merge sort really take about n log₂ n steps? Let's see.

> (define rnum (lambda (n) (if (= 0 n) null (cons (random 1000) (rnum (- n 1)))) .....
> (analyze (merge-sort (iota 10) <=) may-precede?)
may-precede?: 15
Total: 15
(0 1 2 3 4 5 6 7 8 9)
> (analyze (merge-sort (iota 10) >=) may-precede?)
may-precede?: 19
Total: 19
(9 8 7 6 5 4 3 2 1 0)
> (analyze (merge-sort (rnum 10) <=) may-precede?)
may-precede?: 23
Total: 23
(22 246 251 297 440 568 587 760 864 905)
> (analyze (merge-sort (rnum 10) <=) may-precede?)
may-precede?: 22
Total: 22
(14 75 346 424 424 546 613 776 780 885)
> (analyze (merge-sort (iota 20) <=) may-precede?)
may-precede?: 40
Total: 40
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19)
> (analyze (merge-sort (iota 20) >=) may-precede?)
may-precede?: 48
Total: 48
(19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0)
> (analyze (merge-sort (rnum 20) <=) may-precede?)
may-precede?: 63
Total: 63
(18 33 193 218 240 269 347 431 556 583 615 630 700 719 757 786 807 809 811 972)
> (analyze (merge-sort (rnum 20) <=) may-precede?)
may-precede?: 64
Total: 64
(19 53 72 83 165 230 247 406 409 436 440 448 503 623 653 691 805 843 886 971)
> (analyze (merge-sort (iota 40) <=) may-precede?)
may-precede?: 100
Total: 100
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 .....
> (analyze (merge-sort (iota 40) >=) may-precede?)
may-precede?: 116
Total: 116
(39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 1 .....
> (analyze (merge-sort (rnum 40) <=) may-precede?)
may-precede?: 164
Total: 164
(16 43 102 146 161 224 259 281 310 325 332 347 412 412 417 446 454 594 603 627 6 .....
> (analyze (merge-sort (rnum 40) <=) may-precede?)
may-precede?: 165
Total: 165
(14 62 66 66 96 135 141 162 179 183 218 252 308 318 333 350 357 401 411 414 424  .....
> (analyze (merge-sort (iota 100) <=) may-precede?)
may-precede?: 316
Total: 316
(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 .....
> (analyze (merge-sort (iota 100) >=) may-precede?)
may-precede?: 356
Total: 356
(99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 7 .....
> (analyze (merge-sort (rnum 100) <=) may-precede?)
may-precede?: 543
Total: 543
(2 12 23 25 28 29 40 61 71 72 82 115 131 138 145 145 149 150 176 200 203 210 218 .....
> (analyze (merge-sort (rnum 100) <=) may-precede?)
may-precede?: 548
Total: 548
(31 32 35 41 43 46 50 85 90 92 100 111 150 173 178 186 207 223 226 234 234 240 2 .....
> (analyze (list-insertion-sort (rnum 500) <=) may-precede?)
may-precede?: 61063
Total: 61063
(4 4 5 6 7 8 9 10 13 13 16 17 18 19 23 24 25 27 28 32 34 36 37 38 39 41 44 44 45 .....
> (analyze (merge-sort (rnum 500) <=) may-precede?)
may-precede?: 3870
Total: 3870
(1 1 1 6 8 8 8 11 11 15 15 16 19 22 22 23 24 25 26 27 29 33 34 35 37 38 43 45 45 .....
> (analyze (list-insertion-sort (rnum 1000) <=) may-precede?)
may-precede?: 241527
Total: 241527
(0 0 2 2 2 4 4 6 8 8 12 13 13 17 18 19 21 21 21 24 25 25 25 25 26 27 28 28 29 29 .....
> (analyze (merge-sort (rnum 1000) <=) may-precede?)
may-precede?: 8662
Total: 8662
(0 1 7 7 11 13 14 16 16 16 17 17 17 18 18 19 20 21 21 22 22 23 23 23 24 24 24 25 .....

So, it does a bit better than predicted for already sorted (or backwards-sorted) lists. (Can you figure out why?) For random lists, the estimate is pretty good. For large lists, merge sort clearly beats insertion sort.

Documenting Merge Sort

You may have noted that we have not yet written the documentation for merge sort. Why not? Because it's basically the same as the documentation for any other sorting routine.

Creative Commons License

Samuel A. Rebelsky, rebelsky@grinnell.edu

Copyright (c) 2007-10 Janet Davis, Matthew Kluber, Samuel A. Rebelsky, and Jerod Weinman. (Selected materials copyright by John David Stone and Henry Walker and used by permission.)

This material is based upon work partially supported by the National Science Foundation under Grant No. CCLI-0633090. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/2.5/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.