CSC152, Class 36: Heaps and Heap Sort
Admin:
* Missing: BillupsM
* Syllabus rearranged
* No homework for Friday
* Yesterday's eboard attempts to summarize the whiteboard
* New project stuff assigned
* Questions on the exam?
* Will we have at least five points of "Sam makes spelling mistakes"? Almost certainly.
* Should TestNABQ.java compile? Yes.
* Will Sam be in his office tomorrow? He hopes so.
Overview:
* Implementing trees
* Tree terminology
* Heap basics
* Implementing heap operations
* Storing heaps in arrays
* Heap sort
/Trees/
* A "two-dimensional" version of lists
* A binary tree is either
* Empty
* Combination of value, left subtree, right subtree
* A general tree is either
* Empty
* Combination of value and 0 or more subtrees
* One possible implementation:
public class TreeNode
{
Object value;
TreeNode left;
TreeNode right;
...
}
* Terminology:
* "Binary" no more than two subtrees
* "Root" the starting point of the tree (like car)
* "Leaf" no subtrees
* "Complete Binary Tree": Every non-leaf node has exactly two subtrees
* "Depth" - Distance between the root and a value
* Depth of the root is 0
* Depth of the roots of each subtree of the root is 1
* Depth of the roots of their subtrees is 2
* Etc.
* "Level" - The set of all values at a particular depth
* Claveria's Claim: If there are N values in a tree and the tree is complete (or nearly complete), then the maximum depth is approximately log_2(N)
* Proof? Take 218
* Proof:
* 1 values, 0 depth
* 3 values: 1 depth
* 7 values: 2 depth
* 15 values: 3 depth
* 31 values: 4 depth
* When you see a function increase by one each time N doubles, it is logarithmic (by defn)
* Moral: You can put a lot of values in a fairly shallow tree.
* Moral: If you can make your operations depend on the depth of the tree, you win.
Heap: A tree-based implementation of priority queues
Rules:
* (1) A heap is a nearly-complete binary tree. Every level but the last is complete. Everything at the last level is "shoved to the left"
* (2) The heap property states that every value is of higher priority than the values in the subtrees
* Interesting fact: To check the global heap property, we only need to check local heap property (value vs. values at two subtrees)
What are the methods we need to implement for priority queues?
* peek(): Find the highest-priority (smallest) value
Easy: Look at the root
Time: O(1)
* get(): Find and remove the highest-priority (smallest) value
* put(Object value): Add something
Failed idea for put(): Start at the top and work your way down
New idea for put(): Put it at the right end of the lowest level
and clean up after yourself
* Warning! How do we find "the right end of the lowest level"
* To "clean up after yourself" (that is, restore the heap property), repeatedly swap a value with the larger value above it.
* Time: O(logn)
Idea for get()?
* Failed: Repeatedly pick the smallest of the subtree values and swap it up
* Problem: Leaves a "hole" at an unexpected place; no longer nearly-complete
* Idea from before: FIRST worry about near-complete; then worry about heap
* Take the rightmost element on the last level and shove it at the root
* Repeatedly swap down with the smaller of the subtrees
* Time O(logn)
How do we find "the righmost value on the lower level?"
* Idea: Store our heaps differently!
* Instead: Store our heaps in an array
* Count the values in the heap (starting at 0) across each level. The "count" of a value is its index in the array
* Add a new node at index "number of things"
* How do you get the index of the subtree (for swapping down)?
left subtree: 2*myindex + 1
right subtree: 2*myindex + 2
* How do you get the index of the supertree (for swapping up)?
roundDown((myindex-1)/2)
* Sorting algorithm:
* Shove everything into a heap
* Read it off in order
* Cost O(nlogn)
Clever hack: It is possible to turn an array into a heap "in place"
Friday: Dictionaries (like arrays, but indexed by strings)