Held: Wednesday, 3 November 2004
Summary:
Today we consider heaps, an important implementation of priority
queues, and their use in sorting values.
Related Pages:
Assignments
Notes:
- I've made some changes in the syllabus. Let me know if you have questions or comments.
- Are there final questions on the exam?
- No homework for Friday b/c you are working on the exam (and the project).
Overview:
- Trees.
- Tree Terminology.
- Heaps.
- Implementing Heaps.
- We can use the divide and conquer to design this data structure.
- In algorithms, we divide the problem up into parts.
- In data structures, we can build structures that naturally have
two halves.
- This divided structure
looks
something like a tree (okay, an
upside-down tree or a
unisex family tree). Because each node has two subtrees, we
call it a binary tree.
- We might express that technique with the following Java class
public class BinaryTreeNode
{
// +--------+--------------------------------------------------
// | Fields |
// +--------+
/** The value associated with this node. */
private Object value;
/**
* Half of the remaining elements. Set to null if there are
* no other elements.
*/
private BinaryTreeNode left;
/**
* The other half of the remaining elements. Set to null
* if there are no other elements.
*/
private BinaryTreeNode right;
/**
* The comparator used to determine priorities.
*/
private Comparator prioritize;
// ...
} // BinaryTreeNode
- Trees are common data structures. We'll revisit them a few
times in the coming weeks.
- Heaps are a kind of tree. Hence, it is important that we consider
some basic tree terminology
- The root is the top or beginning of the tree.
- A node is a part of the tree. (While this has the same
name as the nodes we often use to implement trees and lists, you should
think of it as independent of implementation.)
- Most nodes have one or more children.
- Each node other than the root has a parent.
- Nodes without children are called leaves.
- Nodes with children are called interior nodes.
- The level of a node is the number of steps from root to
that node.
- The root is at level 0.
- The direct children of the root are at level 1.
- The children of those nodes are at level 2.
- ...
- The depth of a tree is the largest level of any node in the
tree.
- The size of a tree is the number of nodes in the tree.
- In a binary tree, no node has more than two children.
- The children are typically designated as left and
right.
- In a complete tree, every level is full (all the interior
nodes have the maximum number of children).
- We'll return to trees in the weeks to come. For today, we'll stick
with the simple heaps we've just defined.
- Here's a technique for deleting the highest-priority element:
- Remove that element. This creates a hole at the root of the
tree.
- Put highest-priority root of a subtree in that hole,
creating a hole in that subtree.
- Recurse on the subtree.
- This also gives us a method for adding elements:
- Find a hole at the bottom of the tree.
- Put the element there.
- If it has a higher priority than the value above it, swap
them (again and again and again).
- How long do the operations take?
- How can we make the operations more efficient? (What properties of
the tree help?)
- The structures we just described are similar to a traditional
implementation of priority queues known as the heap.
- Heaps are a particular form of binary tree designed to provide
quick access to the highest-priority element in the tree.
- Unlike our structures, heaps must be balanced (it's part of the
definition).
- A heap is
- a binary tree,
- that is nearly complete in that
- at most one node has one child (the rest have zero or two)
- the nodes on the last level are at the left-hand-side of the
level (that is, if a node has one or two children, all of its left
siblings have two children)
- and that has the heap property: the value stored in
each node is of higher priority (lower value) than the values stored below it.
- An essential aspect of heaps is the heap property.
- We can talk about a global heap property (that the value stored at
one node in the tree is of higher priority than anything stored
below it.
- We can also speak about a local heap property (that the value stored
at one node is of higher priority than the two values stored directly
below it).
- If the local heap property holds everywhere in the tree, then the
global heap property holds everywhere in the tree.
- Consider the path; we can only be getting larger.
- How do we modify heaps? Through insertion (
put
) and deletion (get
).
- How do we do insertion while maintaining the two key properties?
- We'll maintain one property (near-completeness) and then look
to restore the other (heap order).
- It turns out that we add and delete in slightly different ways
than we suggested above if we want to maintain near-completeness.
- When we add an element to the heap, we just need to put the element at
the end of the lowest level.
- If that level is full, we put it at the left end of the next level.
- After removing the smallest element from the heap, we have a
hole
at the top of the heap. We put the last element in the last level in
that hole (which may sound like an odd idea, but at least it maintains
the near-completeness property).
- Both of these operations may violate the heap order (in fact, the second
one is almost guaranteed to violate heap order), so we need to rearrange
the tree.
- This rearrangement is often called percolating. When we put
an element at the bottom, we percolate up. When we put an
element at the top, we percolate down.
- To add an element to a heap we,
- put the element at the end of the lowest level and
- percolate up.
- Percolating up is fairly simple
- At each step, we compare the percolated node to its parent.
- If the percolated node is smaller (violating the heap property), we swap
the two and continue up the tree.
- Otherwise, we're done, since the heap property is no longer violated.
- When we do the swap,
- the subtree that contains the old parent is clearly in heap order
(the old parent was an ancestor to all the nodes in that
subtree, and therefore smaller) and
- the present node is clearly smaller
than both of its new subtrees (it's smaller than the old parent, and
the old parent was smaller than everything else below it).
- Eventually, we stop (either because we no longer violate heap property
or because we reach the root).
- Here's an example, based on inserting the values 5, 6, 4, 4, 7, 2
- How much time does this take? Well, the depth of a complete binary
tree with n nodes is O(log2n), and the algorithm may
require swapping up from leaf to root, so the running time is also
O(log2n).
- When considering lists and other structures, we found ways to implement
the structures with both arrays and nodes.
- It seems likely that we can implement heaps with special nodes
(as we did at the beginning of class).
- Can we also implement heaps with arrays?
- It turns out to be relatively easy to implement binary heaps and
other binary trees, particularly complete binary trees with arrays.
- How?
- This provides a very convenient way of figuring out where children belong.
- The root of the tree is in location 0.
- The left child of an element stored at location i can
be found in location 2*i+1.
- The right child of an element stored at location i can
be found in location 2*i+2 (also representable as
2*(i+1)).
- The parent of an element stored at location i can be
found at location
floor
((i-1)/2).
- Can we prove all this? Yes, but that's an exercise for another day.
- These properties make it simple to move the cursor around the tree
and to get values.
- Note that we have an interesting
double indirection
here.
- We've decided to implement priority queues with this divide-and-conquer
structure.
- We've decided to implement this divide-and-conquer structure with
arrays.
- That is, we've given an implementation of an implementation of a
data structure.
- We can use the heap structure to provide a fairly simple and quick
sorting algorithm. To sort a set of n elements,
- insert them into a heap, one-by-one.
- remove them from the heap in order.
- What's the running time?
- There are n insertions. Each takes O(log2n) time by our
prior analysis.
- There are n
delete least
operations.
Each takes O(log2n) by our prior analysis.
- Hence, we've developed yet another O(nlog2n)
sorting algorithm.
- Can we do this in place (provided, of course, that our original information
was in an array)? You'll need to think about it.
- Most people implement heap sort with array-based heaps. Some even
define heap sort completely in terms of the array operations, and
forget the origins.
- Here's the core of heap sort.
for (int i = 1; i < stuff.length; ++i) {
percolateUp(stuff[i]);
} // for
- You still need to extract the values from the heap.