CSC161 2010F, Class 40: Sorting, Revisited Overview: * Templates for sorting algorithms. * Review: Some sorting algorithms. * Choosing a sorting algorithm. * Implementing Mergesort * Assignment 7. * No Lab? Admin: * For those of you who are planning to declare CS majors ... We'd like to encourage you to think about study abroad. * We hope to have prospectives in class on Friday. Think about what you want to ask them. * ETHEL concert tonight. * EC for Thursday extra tomorrow: How did you get placed in Math and Stats and CS? (4:15 refreshments; 4:30 talk) * Reading for Friday: K&R 7.5-7.7. In implementing sorting algorithms, we want to strive for genericity - algorithms that will work for multiple types so that (a) We can easily make our algorithm work for a new type (b) We can easily choose which algorithm to use for a particular probelm. Ideally, call all of the algorithms "sort" * But C doesn't allow this * So we call them all PREFIX(sort) E.g. int_sort or float_sort or string_sort or student_sort * We use the same name for all implementations Simple declaration: void PREFIX(sort) (TYPE values[], int size); K&R describe a different kind of genericity (and have a different signature). * We pass in the comparison method as a parameter void sort(char *values[], int size, int compare(char *v1, char*v2)); * Assumes a "compare" function that returns negative for v1 precedes v2 0 for v1 equals v2 positive for v1 follows v2 * We've traditionally used a boolean may_precede void sort(char *values[], int size, int may_precede(char *v1, char*v2)); * Which do you prefer? * may_precede sounds clearer * compare sounds clearer * compare lets you more easily ask a wider variety of questions We'll stick with may_precede for now How do we turn the K&R code into something that will work for arbitrary types, not just strings? void PREFIX(sort)(TYPE values[], int size, int may_precede(TYPE v1, TYPE v2)); Use a generic array! void *values, int element_size; --- What ways do we know to sort? * Merge sort - Divide into two halves, sort each half, merge 'em back together O(nlogn) * Quicksort - Pick a random "pivot", divide into smaller/larger, sort each part O(nlogn) * Insertion sort - Split into unsorted and sorted parts and repeatedly inset value from unsorted into sorted O(n*n) * Selection sort - Select largest/smallest element and move to end O(n*n) * Bubble sort - Repeatedly swap neighboring elements; larger values bubble up, smaller values bubble down O(n*n) * Shell sort - Like insertion sort, but starting with bigger "gaps" between values. O(n*n) * Bucket sort - split into "buckets". Depends You need to choose a sorting algorithm. What criteria do you use to decide which one to use? * Consider the structure of the data; it may be partially sorted already We have a huge sorted array, and we want to add comparatively few things and re-sort We have a huge sorted array, we've changed a few values, and we need to re-sort. * Running time. We'd like faster routines! * Use of space. * Locality of reference. * Best, worst, average case performance. * Stability. If A comes before B in original and A=B, then A comes before B in sorted * Why? * If we have old/new and want to preserve that, even for equal elements. * If we want to sort by both novelty and something else First sort by novelty Then stable sort by something else * In general, gives you the opportunity to sort by two keys * Sort by first anme * Then sort by last name * Effectively sorts by last+first Jane Smith Joe Smith Jenna Smith Joe Doe Jane Doe Jack Smith Jane Daniels Jack Smith Jane Daniels Jane Doe Jane Smith Jenna Smith Joe Doe Joe Smith Quicksort is not stable Let's write mergesort