CSC161 2010F, Class 40: Sorting, Revisited
Overview:
* Templates for sorting algorithms.
* Review: Some sorting algorithms.
* Choosing a sorting algorithm.
* Implementing Mergesort
* Assignment 7.
* No Lab?
Admin:
* For those of you who are planning to declare CS majors ...
We'd like to encourage you to think about study abroad.
* We hope to have prospectives in class on Friday.
Think about what you want to ask them.
* ETHEL concert tonight.
* EC for Thursday extra tomorrow: How did you get placed in Math and
Stats and CS? (4:15 refreshments; 4:30 talk)
* Reading for Friday: K&R 7.5-7.7.
In implementing sorting algorithms, we want to strive for genericity
- algorithms that will work for multiple types so that
(a) We can easily make our algorithm work for a new type
(b) We can easily choose which algorithm to use for a particular probelm.
Ideally, call all of the algorithms "sort"
* But C doesn't allow this
* So we call them all PREFIX(sort)
E.g. int_sort or float_sort or string_sort or student_sort
* We use the same name for all implementations
Simple declaration:
void PREFIX(sort) (TYPE values[], int size);
K&R describe a different kind of genericity (and have a different
signature).
* We pass in the comparison method as a parameter
void sort(char *values[], int size, int compare(char *v1, char*v2));
* Assumes a "compare" function that returns
negative for v1 precedes v2
0 for v1 equals v2
positive for v1 follows v2
* We've traditionally used a boolean may_precede
void sort(char *values[], int size, int may_precede(char *v1, char*v2));
* Which do you prefer?
* may_precede sounds clearer
* compare sounds clearer
* compare lets you more easily ask a wider variety of questions
We'll stick with may_precede for now
How do we turn the K&R code into something that will work for
arbitrary types, not just strings?
void PREFIX(sort)(TYPE values[], int size, int may_precede(TYPE v1, TYPE v2));
Use a generic array!
void *values, int element_size;
---
What ways do we know to sort?
* Merge sort - Divide into two halves, sort each half, merge 'em back together
O(nlogn)
* Quicksort - Pick a random "pivot", divide into smaller/larger, sort each part
O(nlogn)
* Insertion sort - Split into unsorted and sorted parts and repeatedly inset value from unsorted into sorted
O(n*n)
* Selection sort - Select largest/smallest element and move to end
O(n*n)
* Bubble sort - Repeatedly swap neighboring elements; larger values
bubble up, smaller values bubble down
O(n*n)
* Shell sort - Like insertion sort, but starting with bigger "gaps" between
values.
O(n*n)
* Bucket sort - split into "buckets".
Depends
You need to choose a sorting algorithm. What criteria do you use to
decide which one to use?
* Consider the structure of the data; it may be partially sorted already
We have a huge sorted array, and we want to add comparatively few
things and re-sort
We have a huge sorted array, we've changed a few values, and we need
to re-sort.
* Running time. We'd like faster routines!
* Use of space.
* Locality of reference.
* Best, worst, average case performance.
* Stability.
If A comes before B in original and A=B, then A comes before B in sorted
* Why?
* If we have old/new and want to preserve that, even for equal
elements.
* If we want to sort by both novelty and something else
First sort by novelty
Then stable sort by something else
* In general, gives you the opportunity to sort by two keys
* Sort by first anme
* Then sort by last name
* Effectively sorts by last+first
Jane Smith
Joe Smith
Jenna Smith
Joe Doe
Jane Doe
Jack Smith
Jane Daniels
Jack Smith
Jane Daniels
Jane Doe
Jane Smith
Jenna Smith
Joe Doe
Joe Smith
Quicksort is not stable
Let's write mergesort