Algorithms and OOD (CSC 207 2014F) : Assignments

Assignment 8: Sorting Out Sorting


Due: 10:30 p.m., Wednesday, 5 November 2014

Summary: In this assignment, you will experimentally explore the efficiency of a variety of sorting algorithms, as well as variants of those algorithms.

Purposes: To help you better understand the core sorting algorithms. To improve your design skills. To help you understand the effects of different factors in the efficiency of algorithms.

Collaboration: Please work with your assigned group. You may discuss this assignment with anyone, provided you credit such discussions when you submit the assignment.

Wrapper (Prologue): Individually read through this assignment and make sure that you understand what is required. Then use the form available at http://bit.ly/207hw8pro to indicate (a) how long you think this assignment will take and (b) what you think will be the most challenging aspect of this assignment.

Wrapper (Epilogue): When you are done with the assignment, fill out the form available at http://bit.ly/207hw8epi to indicate (a) how long the assignment took, (b) what the most challenging part of the assignment was, and (c) something important you learned from doing the assignment. If you find that the assignment took much less or much more time than you expected, also include (d) a note as to what might have led to that difference.

Submitting: Please put all of your work in a GitHub repository named csc207-hw8. Email the address of that repository to grader-207-01@cs.grinnell.edu. Please use a subject of “CSC207 2014F Assignment 08 (Your Names)”.

Warning: So that this assignment is a learning experience for everyone, we may spend class time publicly critiquing your work.

Preparation

a. Fork the repository at https://github.com/Grinnell-CSC207/sorting. (If you've already forked the repository, figure out how to pull recent changes into your copy.)

b. Rename your fork of the repository to csc207-hw8. (You can use the "Settings" button to make that change.)

c. Clone the repository to your disk.

d. Find your notes from the various labs on sorting.

Background

In many cases, the sorting algorithms I provided are designed for clarity more than efficiency. As the legendary Professor Runner has noted, asymptotic behavior is only part of the picture. We'd rather have an algorithm that runs in 0.0002*n log n seconds than one that runs in 10*n log n seconds. So, even after doing the informal asymptotic analysis, it's worthwhile to gather precise timing information on different algorithms with different kinds of input.

If you look at the repository for this assignment, you'll see that there are some files that may be useful for exploring the running time of sorting routines, particularly SimpleTimer.java, SorterAnalyzer.java, and SampleAnalysis.java.

There are also tests that you can use to ensure that a sorting routine is correct. We don't care how fast a sorting routine is if it's not correct.

If you skim the code for SampleAnalysis.java, you'll see that it has a fairly straightforward structure: It builds an array of Sorter objects, an array of ArrayBuilder objects, and arrays of corresponding names for those objects. It then calls SorterAnalyzer.combinedAnalysis to do the real analysis.

SorterAnalyzer.combinedAnalysis, in turn, has a loop that pairs the first sorter with each builder, and calls SorterAnalyzer.compoundAnalysis using that pair and a variety of array sizes.

SorterAnalyzer.compoundAnalysis, in turn, is supposed to repeatedly call SorterAnalyzer.basicAnalysis and gather statistics. Right now, all it does is call SorterAnalyzer.basicAnalysis once.

Finally, SorterAnalyzer.basicAnalysis uses the builder to build an array, starts a timer, sorts the array using the sorter, stops the timer, and returns the elapsed time.
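In outline, that build/start/sort/stop sequence looks something like the following sketch. The repository's actual Sorter, ArrayBuilder, and SimpleTimer interfaces aren't reproduced here, so this standalone version (the class name BasicAnalysisSketch and the IntFunction-based builder are illustrative, not the repository's API) uses Arrays.sort as a stand-in sorter and System.currentTimeMillis as a stand-in timer; only the overall shape is meant to match.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.IntFunction;

public class BasicAnalysisSketch {
  /**
   * Build an array of the given size, time one sort of it, and return
   * the elapsed time in milliseconds. (The repository's version uses
   * SimpleTimer and its own Sorter/ArrayBuilder interfaces; here both
   * are stand-ins so the sketch runs on its own.)
   */
  public static long basicAnalysis(IntFunction<Integer[]> builder,
                                   Comparator<Integer> order, int size) {
    Integer[] values = builder.apply(size);        // build the array
    long start = System.currentTimeMillis();       // start the timer
    Arrays.sort(values, order);                    // stand-in for sorter.sort
    return System.currentTimeMillis() - start;     // stop; report elapsed time
  }

  public static void main(String[] args) {
    IntFunction<Integer[]> randomBuilder = (size) -> {
      Integer[] vals = new Integer[size];
      java.util.Random rand = new java.util.Random();
      for (int i = 0; i < size; i++) {
        vals[i] = rand.nextInt(size);
      }
      return vals;
    };
    long time =
        basicAnalysis(randomBuilder, Comparator.naturalOrder(), 100000);
    System.out.println("Elapsed: " + time + " ms");
  }
}
```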

You may have noted that a few things are missing in the analysis. Your first step will be to add those missing things.

Part 1: Finish the Analysis Suite

a. SorterAnalyzer.compoundAnalysis does not call SorterAnalyzer.basicAnalysis multiple times, even though it should. And because it does the basic analysis only once, it does not gather statistics. Update compoundAnalysis so that it calls the basic procedure the appropriate number of times and gathers useful statistics (minimally, the minimum, average, and maximum runtimes). You will also need to update combinedAnalysis so that it prints out all of the statistics.
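The statistics-gathering part of that update might look something like the following sketch, where the trials array stands in for the results of repeated calls to basicAnalysis. (The class and method names here are illustrative; your code will live inside compoundAnalysis itself.)

```java
public class CompoundStatsSketch {
  /**
   * Given the observed times from repeated trials, return the
   * minimum, average, and maximum as a three-element array.
   */
  public static long[] stats(long[] trials) {
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    long total = 0;
    for (long t : trials) {
      if (t < min) { min = t; }
      if (t > max) { max = t; }
      total += t;
    }
    return new long[] { min, total / trials.length, max };
  }
}
```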

b. SorterAnalyzer.combinedAnalysis only uses the first sorting algorithm. Update it so that it runs all sorting algorithms (with all builders and with all sizes).

c. Right now, we only have two array builders. Create at least two more, one of which builds an array that is "mostly" in order (e.g., build one in order and then randomly swap 10% of the elements) and one of which builds an array that is in reverse order.
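For example, the two new builders might look something like the following sketch. (The repository's ArrayBuilder interface isn't reproduced here, so these are plain static methods; the class name BuildersSketch is illustrative.)

```java
import java.util.Random;

public class BuildersSketch {
  /**
   * Build an array in increasing order, then randomly swap about
   * 10% of the elements so that it is only "mostly" sorted.
   */
  public static Integer[] mostlySorted(int size) {
    Integer[] vals = new Integer[size];
    for (int i = 0; i < size; i++) {
      vals[i] = i;
    }
    Random rand = new Random();
    for (int s = 0; s < size / 10; s++) {
      int i = rand.nextInt(size);
      int j = rand.nextInt(size);
      Integer tmp = vals[i];
      vals[i] = vals[j];
      vals[j] = tmp;
    }
    return vals;
  }

  /** Build an array in strictly decreasing order. */
  public static Integer[] reverseOrder(int size) {
    Integer[] vals = new Integer[size];
    for (int i = 0; i < size; i++) {
      vals[i] = size - 1 - i;
    }
    return vals;
  }
}
```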

d. At some point, you should make sure that we have a wider range of array sizes.

Part 2: Finish Basic Sorting Algorithms

In various labs, you were asked to finish some of the sorting algorithms. You may or may not have done so. It's time to finish them now.

a. Write the Utils.merge method, which is needed for the merge sort algorithm. (You'll see four merge methods. Three of the four directly or indirectly call the fourth, so you should just implement that one.)
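A sketch of the core merge is below. Note that the actual signature of the fourth merge in Utils.java may differ from this one; the sketch just illustrates the standard technique of walking two sorted ranges and copying the smaller front element at each step.

```java
import java.util.Comparator;

public class MergeSketch {
  /**
   * Merge the sorted ranges vals[lo1..hi1) and vals[lo2..hi2) into
   * target, starting at position start. Using <= when the elements
   * are equal keeps the merge (and hence the sort) stable.
   */
  public static <T> void merge(Comparator<T> order, T[] vals,
                               int lo1, int hi1, int lo2, int hi2,
                               T[] target, int start) {
    int i = lo1;
    int j = lo2;
    int t = start;
    while (i < hi1 && j < hi2) {
      if (order.compare(vals[i], vals[j]) <= 0) {
        target[t++] = vals[i++];
      } else {
        target[t++] = vals[j++];
      }
    }
    // Copy whatever remains of either range.
    while (i < hi1) { target[t++] = vals[i++]; }
    while (j < hi2) { target[t++] = vals[j++]; }
  }
}
```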

b. Write the NewQuicksorter.partition method, which is needed to finish the Quicksort algorithm. (Note: You have likely written that method recently, even if you did not write it for the Quicksort lab.)
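As a reminder of the technique, here is one common partitioning scheme as a standalone sketch. The real NewQuicksorter.partition receives its pivot from selectPivot and its signature may differ; this sketch simply uses the first element of the subarray as the pivot.

```java
import java.util.Comparator;

public class PartitionSketch {
  /**
   * Partition vals[lb..ub) around the pivot in vals[lb], placing the
   * pivot in its final position and returning that index. Afterwards,
   * everything left of the returned index is less than the pivot and
   * everything right of it is at least the pivot.
   */
  public static <T> int partition(T[] vals, Comparator<T> order,
                                  int lb, int ub) {
    T pivot = vals[lb];
    int small = lb;                 // invariant: vals[lb+1..small] < pivot
    for (int i = lb + 1; i < ub; i++) {
      if (order.compare(vals[i], pivot) < 0) {
        small++;
        T tmp = vals[small];
        vals[small] = vals[i];
        vals[i] = tmp;
      }
    }
    // Swap the pivot into its final position.
    T tmp = vals[lb];
    vals[lb] = vals[small];
    vals[small] = tmp;
    return small;
  }
}
```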

c. Test your two algorithms to make sure that they work with those updates.

Part 3: Variants of Insertion Sort

a. The insert step of insertion sort repeatedly calls Utils.swap. It turns out that procedure calls involve some overhead. Create a new version of InsertionSorter that inlines the calls to swap. (That is, replace the call with three or four lines that swap the two array elements.) Test your new version to ensure that it is correct.
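The inner loop of that version might look something like the following sketch (as a standalone method rather than a member of your InsertionSorter variant; the class name is illustrative).

```java
import java.util.Comparator;

public class InlineInsertionSketch {
  /**
   * Insertion sort with the call to Utils.swap inlined: the
   * three-line exchange replaces the procedure call, avoiding
   * the overhead of a call on every comparison that swaps.
   */
  public static <T> void sort(T[] vals, Comparator<T> order) {
    for (int i = 1; i < vals.length; i++) {
      for (int j = i;
           j > 0 && order.compare(vals[j - 1], vals[j]) > 0;
           j--) {
        // Inlined swap of vals[j-1] and vals[j].
        T tmp = vals[j - 1];
        vals[j - 1] = vals[j];
        vals[j] = tmp;
      }
    }
  }
}
```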

b. It's somewhat wasteful to swap the value down one position at a time. We can instead shift values to the right until we find the correct position for the value and write it there. Create another version of InsertionSorter that uses the shift policy. Update the invariant analysis to handle that new policy. Test your new version to make sure that it is correct.
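The shift policy might look something like this sketch (again as a standalone method with an illustrative class name): note that each shifted element is written once, rather than taking part in a three-assignment swap.

```java
import java.util.Comparator;

public class ShiftInsertionSketch {
  /**
   * Insertion sort using the shift policy: hold the value to insert,
   * shift each larger element one position to the right, and then
   * write the held value into the gap that opens up.
   */
  public static <T> void sort(T[] vals, Comparator<T> order) {
    for (int i = 1; i < vals.length; i++) {
      T toInsert = vals[i];
      int j = i;
      while (j > 0 && order.compare(vals[j - 1], toInsert) > 0) {
        vals[j] = vals[j - 1];   // shift right
        j--;
      }
      vals[j] = toInsert;        // write into the gap
    }
  }
}
```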

c. Write a short main class, based on SampleAnalysis, that compares the running times of the three variants of insertion sort. You should put notes about your observations of comparative running times in the code for the new analyst.

Part 4: Variants of Merge Sort

a. Our current version of merge sort has a lot of overhead building new arrays. Ideally, we should be able to create just one scratch array. The merge sort algorithm can then work something like this:

// Sort all the elements
MergeSort(values)
  MergeSort(values, lb, ub, new scratch array)
// Sort elements in positions [lb..ub), using scratch for temporary
// storage.
MergeSort(values, lb, ub, scratch)
  If the subarray is small
    do nothing
  Otherwise
    Sort the first half (using scratch as necessary)
    Sort the second half (using scratch as necessary)
    Copy the sorted halves into scratch
    Merge back into values

Implement this approach in a new class (e.g., MergeSorterB).
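One hedged sketch of that pseudocode, written as a standalone class rather than as a member of the repository's sorter hierarchy (the class name and the System.arraycopy-based merge are illustrative choices, not requirements):

```java
import java.util.Comparator;

public class ScratchMergeSketch {
  /** Sort all the elements, allocating a single scratch array up front. */
  public static <T> void sort(T[] vals, Comparator<T> order) {
    sort(vals, 0, vals.length, vals.clone(), order);
  }

  /** Sort vals[lb..ub), using scratch for temporary storage. */
  static <T> void sort(T[] vals, int lb, int ub, T[] scratch,
                       Comparator<T> order) {
    if (ub - lb <= 1) {
      return;                               // small subarray: do nothing
    }
    int mid = lb + (ub - lb) / 2;
    sort(vals, lb, mid, scratch, order);    // sort the first half
    sort(vals, mid, ub, scratch, order);    // sort the second half
    // Copy the sorted halves into scratch, then merge back into vals.
    System.arraycopy(vals, lb, scratch, lb, ub - lb);
    int i = lb;
    int j = mid;
    int t = lb;
    while (i < mid && j < ub) {
      vals[t++] = (order.compare(scratch[i], scratch[j]) <= 0)
                  ? scratch[i++]
                  : scratch[j++];
    }
    while (i < mid) { vals[t++] = scratch[i++]; }
    while (j < ub) { vals[t++] = scratch[j++]; }
  }
}
```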

b. While we phrase merge sort as a recursive algorithm, it can also be written iteratively. First, we merge singletons into pairs. Then we merge pairs into sorted quadruples. Then we merge the sorted quadruples into sorted octuples. And so on and so forth. (The reading on merge sort has further details.) Create a new class, IterativeMergeSorter, that implements this approach.
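The bottom-up structure might look something like the following sketch (a standalone class with an illustrative name; your IterativeMergeSorter will plug into the repository's Sorter hierarchy instead).

```java
import java.util.Comparator;

public class IterativeMergeSketch {
  /**
   * Bottom-up merge sort: merge runs of size 1 into sorted pairs,
   * pairs into sorted quadruples, quadruples into octuples, and so on.
   */
  public static <T> void sort(T[] vals, Comparator<T> order) {
    T[] scratch = vals.clone();
    for (int size = 1; size < vals.length; size *= 2) {
      // Merge each adjacent pair of runs of the current size.
      for (int lb = 0; lb + size < vals.length; lb += 2 * size) {
        int mid = lb + size;
        int ub = Math.min(lb + 2 * size, vals.length);
        // Merge sorted runs vals[lb..mid) and vals[mid..ub).
        System.arraycopy(vals, lb, scratch, lb, ub - lb);
        int i = lb;
        int j = mid;
        int t = lb;
        while (i < mid && j < ub) {
          vals[t++] = (order.compare(scratch[i], scratch[j]) <= 0)
                      ? scratch[i++]
                      : scratch[j++];
        }
        while (i < mid) { vals[t++] = scratch[i++]; }
        while (j < ub) { vals[t++] = scratch[j++]; }
      }
    }
  }
}
```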

c. Write a short main class, based on SampleAnalysis, that compares the running times of the three variants of merge sort (the original and the two others). You should put notes about your observations of comparative running times in the code for the new analyst.

Part 5: Variants of Quicksort

One of the most important factors in Quicksort is the selection of the pivot. So let's explore that selection.

a. Create a new version of NewQuicksorter that selects the middle element of the subarray as the pivot, rather than the first element. Your code should look something like the following:

public class NewerQuicksorter<T>
  extends NewQuicksorter<T>
{
  /**
   * Select the middle element of the subarray as the pivot.
   */
  @Override
  public T selectPivot(T[] vals, Comparator<T> order, int lb, int ub)
  {
    // STUB
    return vals[lb];
  } // selectPivot(T[], Comparator<T>, int, int)
} // NewerQuicksorter

b. Create another new version of NewQuicksorter.java that chooses a random element as the pivot. (You should again use the approach of subclassing and then just overriding the selectPivot method.)

c. Create yet another new version of NewQuicksorter.java that takes the median of three randomly selected elements as the pivot. (And, yes, subclassing and overriding remains a good approach.)
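The median-of-three computation might look something like this sketch (written as a standalone static method with an illustrative class name; in your code it would be the body of a selectPivot override in a NewQuicksorter subclass).

```java
import java.util.Comparator;
import java.util.Random;

public class MedianPivotSketch {
  static final Random rand = new Random();

  /**
   * Choose the median of three randomly selected elements of
   * vals[lb..ub) as the pivot.
   */
  public static <T> T selectPivot(T[] vals, Comparator<T> order,
                                  int lb, int ub) {
    T a = vals[lb + rand.nextInt(ub - lb)];
    T b = vals[lb + rand.nextInt(ub - lb)];
    T c = vals[lb + rand.nextInt(ub - lb)];
    // Return the middle of a, b, and c.
    if (order.compare(a, b) > 0) {    // ensure a <= b
      T tmp = a;
      a = b;
      b = tmp;
    }
    if (order.compare(b, c) > 0) {    // b becomes min(b, c)
      b = c;
    }
    return (order.compare(a, b) > 0) ? a : b;   // median is max(a, b)
  }
}
```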

d. Write a short main class, based on SampleAnalysis, that compares the running times of the four variants of Quicksort (using the first element as the pivot, using the middle element as the pivot, using a random element as the pivot, and using the median of three random values as the pivot). You should put notes about your observations of comparative running times in the code for the new analyst.

Part 6: The Overhead of Generics

a. Pick one of the better sorting methods you identified above (e.g., the best insertion sort, the best merge sort, or the best Quicksort) and rewrite it so that it works with arrays of int values. (Do not convert to objects; do not use comparators; work directly with the ints and operators like <.)
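For instance, if Quicksort with a middle-element pivot turned out to be one of your better methods, the int-specialized version might look something like this sketch (class name illustrative; substitute whichever algorithm actually won in your experiments).

```java
public class IntQuicksortSketch {
  /**
   * Quicksort specialized to int[]: no boxing, no Comparator,
   * direct comparisons with <.
   */
  public static void sort(int[] vals) {
    sort(vals, 0, vals.length);
  }

  /** Sort vals[lb..ub). */
  static void sort(int[] vals, int lb, int ub) {
    if (ub - lb <= 1) {
      return;
    }
    // Use the middle element as the pivot, moved to the front.
    int mid = lb + (ub - lb) / 2;
    int tmp = vals[lb];
    vals[lb] = vals[mid];
    vals[mid] = tmp;
    int pivot = vals[lb];
    // Partition: vals[lb+1..small] < pivot.
    int small = lb;
    for (int i = lb + 1; i < ub; i++) {
      if (vals[i] < pivot) {
        small++;
        tmp = vals[small];
        vals[small] = vals[i];
        vals[i] = tmp;
      }
    }
    tmp = vals[lb];
    vals[lb] = vals[small];
    vals[small] = tmp;
    sort(vals, lb, small);
    sort(vals, small + 1, ub);
  }
}
```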

b. Do some experimental analysis to see how much benefit you get from writing a custom method for these values. (Unfortunately, since we're not working with objects, you probably can't use any of the procedures in SorterAnalyzer. You'll have to write your own equivalents.)