CSC152 2006S, Class 32: Heaps and Heapsort
Admin:
* No readings for break! No coding for break! No nothing!
* Of course, those of you who have trouble with Java might want
to try to learn a little more by, say, doing the labs again.
* Grammar question
* Do you say "There is a quarter and two dimes in the piggybank" or
"There are a quarter and two dimes in the piggybank"?
* Today is another collaborative class (design, not coding).
* Have a great break!
Overview:
* Detour: Thinking about running times
* Heaps: Applying divide-and-conquer to data structure design
* Detour: Tree terminology
* Heaps basics
* Heap operations
* Implementation details
Detour: Running Times
* Programmers generally prefer faster algorithms
* Our get() method is relatively slow - it must search the *whole* structure
* On the other hand, our put() method is fast - it just shoves the new value on the end
* Can we make the algorithm faster?
* When searching in 151, we went from sequential search to binary search by making some restrictions on the order of values
* Binary search assumes that the vector of values you're searching is ordered
* At each step, you look in the middle of the remaining things and throw away the unnecessary part.
* The curve for sequential search is linear: If you have N things in the array, you may have to look at all N things
* The curve for binary search is logarithmic: If you have N things in the array, you have to look at approximately log_2(N) things
* If you double the number of things in the vector, sequential search takes twice as long, binary search takes one more step
* We can't use this technique for priority queues, because keeping the vector sorted makes each put() take up to N steps
* Idea: Instead of making the algorithm divide and conquer, make the data structure do so
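The binary search from 151 mentioned above can be sketched in Java roughly like this (a sketch over a sorted int array; the class and method names are made up for illustration):

```java
// Binary search over a sorted array: at each step, look in the middle
// of the remaining portion and throw away the half that can't
// contain the target.
public class Search {
    public static int binarySearch(int[] sorted, int target) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] == target) {
                return mid;               // found it
            } else if (sorted[mid] < target) {
                lo = mid + 1;             // throw away the left half
            } else {
                hi = mid - 1;             // throw away the right half
            }
        }
        return -1; // not found
    }
}
```

Doubling the array adds just one more trip through the loop, which is why the curve is logarithmic.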
Trees: The divide-and-conquer data structures
* Trees, like linear linked structures, have links between "Nodes"
* Each node may have multiple links
* In /binary trees/, each node has two links, traditionally called "left" and "right"
* Terminology on real white board: parent, child, root, leaf, interior node, depth, level
* Computer scientists like (these) trees because?
* We're stuck in front of computers so much that they're the only nature we see
* Trees can be organized by what's on the left and what's on the right
* If we design our trees well, they are relatively "shallow"
* A well-designed tree with N nodes has depth log_2(N)
* If the algorithm only needs to visit one node on each level, it's wicked fast
* How can we organize the tree so that it's easy to find the smallest thing?
* Put it on the top (at the root)? peek() takes one step
* Put it at the far left? easy to find smallest by going to the far left; perhaps easy to remove
Problem: Design a structure that is easy to keep shallow and that makes insertion and removal easy
* Use a binary search tree and rebalance (Christine and Emily)
* Binary search tree: Smaller things to the left, larger to the right
* Keep the smallest at the top (Ian)
Property that the value associated with a node is smaller than or equal to values associated with children: Heap property
Property that every node at every level but the last has two children,
and all nodes at the last level are leaves: Completeness
* Complete trees have sizes: 1, 3, 7, 15, 31, 63
A complete tree of N levels has size 2^N-1: Proof by induction
A complete tree with M nodes has log_2(M+1) levels
Completeness is too strict; relax it to near-completeness: Complete
everywhere but the last level. On the last level, all the leaves
are shoved to the left.
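The size formula can be checked with a quick loop (a throwaway sketch; `TreeSizes` is a made-up name):

```java
// A complete tree of L levels has 2^L - 1 nodes:
// each new level doubles the node count and adds one.
public class TreeSizes {
    public static int sizeOfCompleteTree(int levels) {
        return (1 << levels) - 1; // 2^levels - 1
    }

    public static void main(String[] args) {
        // Reproduces the sequence above: 1, 3, 7, 15, 31, 63
        for (int l = 1; l <= 6; l++) {
            System.out.print(sizeOfCompleteTree(l) + " ");
        }
    }
}
```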
A heap is a nearly-complete binary tree with the heap property.
get() = remove smallest
* grab the thing at the root
* move the last thing on the last level to the root
* repeatedly swap it down with its smaller child
put(val)
* Put the new value at the rightmost position of the last level
* repeatedly swap the value up with its parent while the parent is larger
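The two operations above can be sketched using the usual trick of storing a nearly-complete tree in an array: node i's children sit at 2i+1 and 2i+2 and its parent at (i-1)/2, so near-completeness comes for free. (This is a sketch of the standard technique, not the class's official code; `IntHeap` is a made-up name, and it stores plain ints rather than prioritized values.)

```java
import java.util.Arrays;

// A minimal int min-heap: nearly-complete binary tree in an array,
// smallest value at the root.
public class IntHeap {
    private int[] data = new int[16];
    private int size = 0;

    // put: add at the rightmost position of the last level,
    // then repeatedly swap up while the parent is larger.
    public void put(int val) {
        if (size == data.length) {
            data = Arrays.copyOf(data, 2 * size);
        }
        int i = size++;
        data[i] = val;
        while (i > 0 && data[(i - 1) / 2] > data[i]) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    // get: grab the root, move the last thing to the root,
    // then repeatedly swap it down with its smaller child.
    public int get() {
        if (size == 0) {
            throw new IllegalStateException("empty heap");
        }
        int smallest = data[0];
        data[0] = data[--size];
        int i = 0;
        while (2 * i + 1 < size) {
            int child = 2 * i + 1;
            if (child + 1 < size && data[child + 1] < data[child]) {
                child++; // pick the smaller of the two children
            }
            if (data[i] <= data[child]) {
                break;   // heap property restored
            }
            swap(i, child);
            i = child;
        }
        return smallest;
    }

    private void swap(int i, int j) {
        int tmp = data[i];
        data[i] = data[j];
        data[j] = tmp;
    }
}
```

Both loops walk one node per level, so each operation does about log_2(N) swaps.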
get() takes 3 + log_2(N) steps
put() takes 1 + log_2(N) steps
1 billion puts followed by 1 billion (American) gets:
* Original version: BIG, about a billion billion (10^18) steps - each get() scans the whole structure
* New (heap) version: < 100 billion steps
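The rough arithmetic behind those totals can be sketched like so (assuming, as above, that the original get() takes about N steps while the heap's operations take a few plus log_2(N)):

```java
// Back-of-the-envelope step counts for 10^9 puts and 10^9 gets.
public class StepCount {
    public static void main(String[] args) {
        double n = 1e9;

        // Original version: puts are cheap, but each get() scans
        // the whole structure, so the gets alone cost about N * N steps.
        double original = n * n; // 10^18: a billion billion

        // Heap version: put() takes about 1 + log_2(N) steps,
        // get() about 3 + log_2(N), and log_2(10^9) is about 30.
        double log2n = Math.log(n) / Math.log(2);
        double heap = n * (1 + log2n) + n * (3 + log2n);

        System.out.println("original: " + original);
        System.out.println("heap:     " + heap);
    }
}
```

The heap total lands well under 100 billion, matching the claim above.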