CSC152 2006S, Class 32: Heaps and Heapsort

Admin:
* No readings for break! No coding for break! No nothing!
* Of course, those of you who have trouble with Java might want to try to learn a little more by, say, doing the labs again.
* Grammar question
  * Do you say "There is a quarter and two dimes in the piggy bank" or "There are a quarter and two dimes in the piggy bank"?
* Today is another collaborative class (design, not coding).
* Have a great break!

Overview:
* Detour: Thinking about running times
* Heaps: Applying divide-and-conquer to data structure design
  * Detour: Tree terminology
  * Heap basics
  * Heap operations
  * Implementation details

Detour: Running Times
* Programmers generally prefer faster algorithms.
* Our get() method is relatively slow - it must search the *whole* structure.
* On the other hand, our put() method is fast - it just shoves the new value on the end.
* Can we make the algorithm faster?
* When searching in 151, we went from sequential search to binary search by placing some restrictions on the order of values.
  * Binary search assumes that the vector of values you're searching is ordered.
  * At each step, you look in the middle of the remaining things and throw away the unnecessary half.
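The binary search idea above can be sketched in Java. This is a minimal illustration (the class name and method are made up for this example, not part of our course code):

```java
// A sketch of binary search over a sorted int array.
// Returns the index of target, or -1 if it is not present.
public class BinarySearch {
    public static int search(int[] sorted, int target) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;   // look in the middle
            if (sorted[mid] == target) {
                return mid;
            } else if (sorted[mid] < target) {
                lo = mid + 1;               // throw away the left half
            } else {
                hi = mid - 1;               // throw away the right half
            }
        }
        return -1;                          // not found
    }

    public static void main(String[] args) {
        int[] values = {1, 3, 5, 7, 9, 11};
        System.out.println(search(values, 7));   // prints 3
        System.out.println(search(values, 4));   // prints -1
    }
}
```

Each iteration of the loop discards half the remaining candidates, which is where the log_2(N) bound comes from.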
* The curve for sequential search is linear: If you have N things in the array, you may have to look at all N things.
* The curve for binary search is logarithmic: If you have N things in the array, you have to look at approximately log_2(N) things.
* If you double the number of things in the vector, sequential search takes twice as long; binary search takes one more step.
* We can't use this technique for priority queues, because keeping the structure sorted requires N steps per insertion.
* Idea: Instead of making the algorithm divide and conquer, make the data structure do so.

Trees: The divide-and-conquer data structures
* Trees, like linear linked structures, have links between "nodes".
* Each node may have multiple links.
* In /binary trees/, each node has two links, traditionally called "left" and "right".
* Terminology on real white board: parent, child, root, leaf, interior node, depth, level.
* Computer scientists like (these) trees because?
  * We're stuck in front of computers so much that they're the only nature we see.
  * Trees can be organized by what's on the left and what's on the right.
  * If we design our trees well, they are relatively "shallow".
    * A well-designed tree with N nodes has depth approximately log_2(N).
    * If an algorithm only needs to visit one node on each level, it's wicked fast.
* How can we organize the tree so that it's easy to find the smallest thing?
  * Put it on the top (at the root)? peek() takes one step.
  * Put it at the far left?
easy to find the smallest by going to the far left; perhaps easy to remove.

Problem: Design a structure that is easy to keep shallow and that makes insertion and removal easy.
* Use a binary search tree and rebalance (Christine and Emily)
  * Binary search tree: Smaller things to the left, larger to the right.
* Keep the smallest at the top (Ian)

Heap property: The value associated with a node is smaller than or equal to the values associated with its children.
Completeness: Every node at every level but the last has two children, and all nodes at the last level are leaves.
* Complete trees have sizes: 1, 3, 7, 15, 31, 63, ...
* A complete tree of N levels has 2^N - 1 nodes: Proof by induction.
* A complete tree with M nodes has log_2(M + 1) levels.
* Completeness is too strict. Near-completeness: Complete everywhere but the last level; on the last level, all the leaves are shoved to the left.

A heap is a nearly-complete binary tree with the heap property.

get() = remove smallest
* Grab the thing at the root.
* Put the last thing at the last level at the root.
* Repeatedly swap the value down with its smaller child.

put(val)
* Put the new value at the rightmost position of the last level.
* Repeatedly swap the value up with its larger parent.

get() takes about 3 + log_2(N) steps.
put() takes about 1 + log_2(N) steps.

1 billion puts followed by 1 billion (American) gets: log_2(1 billion) is about 30, so each heap operation takes roughly 30 steps.
* Original version: BIG (about a billion billion steps, since each get scans the whole structure)
* New (heap) version: < 100 billion steps
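The put() and get() steps above can be sketched with the standard array encoding of a nearly-complete tree: the children of the node at position i live at positions 2i+1 and 2i+2, and its parent at (i-1)/2. This is a minimal sketch of a min-heap of ints (the class name IntHeap is made up for illustration):

```java
import java.util.ArrayList;

// A sketch of a min-heap of ints, stored as an ArrayList using the usual
// array encoding of a nearly-complete binary tree: node i's children are
// at 2*i + 1 and 2*i + 2, and its parent is at (i - 1) / 2.
public class IntHeap {
    private final ArrayList<Integer> values = new ArrayList<>();

    /** Put val at the rightmost position of the last level, then swap up. */
    public void put(int val) {
        values.add(val);
        int i = values.size() - 1;
        while (i > 0 && values.get((i - 1) / 2) > values.get(i)) {
            swap(i, (i - 1) / 2);            // parent is larger; swap up
            i = (i - 1) / 2;
        }
    }

    /** Grab the root, move the last value to the root, then repeatedly
     *  swap it down with its smaller child. */
    public int get() {
        int smallest = values.get(0);
        int last = values.remove(values.size() - 1);
        if (!values.isEmpty()) {
            values.set(0, last);
            int i = 0;
            while (2 * i + 1 < values.size()) {        // has a left child
                int child = 2 * i + 1;
                int right = 2 * i + 2;
                if (right < values.size() && values.get(right) < values.get(child)) {
                    child = right;                     // pick the smaller child
                }
                if (values.get(i) <= values.get(child)) {
                    break;                             // heap property restored
                }
                swap(i, child);
                i = child;
            }
        }
        return smallest;
    }

    private void swap(int i, int j) {
        int tmp = values.get(i);
        values.set(i, values.get(j));
        values.set(j, tmp);
    }

    public static void main(String[] args) {
        IntHeap heap = new IntHeap();
        heap.put(5);
        heap.put(1);
        heap.put(3);
        System.out.println(heap.get());   // prints 1
        System.out.println(heap.get());   // prints 3
        System.out.println(heap.get());   // prints 5
    }
}
```

Because near-completeness keeps the tree log_2(N) levels deep, both loops run at most about log_2(N) times, matching the step counts above.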