CSC152 2005S, Class 33: Heaps and Heap Sort
Admin:
* Welcome back
* Please submit your homework via email by 5 p.m. today
* Or a protocol or a strategy or otherwise solve the problem
* Exam distributed later this week, due Friday the 15th
Overview:
* Review: Linear Structures and Priority Queues
* Applying "Divide and Conquer" to data structures
* Terminology
* Heaps
* Implementing heaps
* Heapsort
Review: What is a linear structure?
* "Data like in a row" (is that right?)
* "Data like in a sequence where you, uh, don't have more than one piece of data at any given level or perhaps all of your data at any given level or perhaps I should stop now."
* When describing an abstract data type, what do we talk about?
* The purpose: What you want it to do
* The methods: What it does in slightly more formal notation
* The application: What you want to use it for
* The data structures used to implement it
* Purpose of Linear Structures: Gather data "in order"
* Key methods:
* put: Add a piece of data
* get: Remove the next piece of data "in order" (policy)
* Other useful methods
* peek: get without removing
* isEmpty: To check precondition for get and peek
* Applications: Vary
* Implementations: Depends on the particular linear structure, but usually "linked nodes" or an array
A priority queue is a linear structure with the policy:
* Return the smallest thing next
(determined by compareTo or compare)
How can we implement priority queues?
* As a sorted linked-structure
* put is O(n) b/c we have to look at everything to find the right place
* get is O(1)
* As an unsorted array
* put is O(1) to shove it at the end
* get is O(n) b/c we have to look at everything to determine smallest
* As a sorted array
* O(n) to add (you'll have to shift stuff over to make room)
* O(n) to get (you'll have to shift stuff over to remove room)
* Whoops
Can we do better?
/Doing Better: Divide and Conquer/
* In algorithms: Given a large collection of data, divide the collection in half, process one or both halves, and combine your results
* Binary trees
* Binary search
* In data structures: Given a large collection of data, reprseent the collection as a combination of two "half collections"
* Represent each half collection as two halves
* Etc.
* This data structure is typically called a tree
/Terminology for Trees/
* The thing that joins the top-level parts is the "root"
* Each part is called a "subtree" (or child)
* The individual parts of the tree are called "nodes"
* The nodes with no subtrees are called "leaves"
* The distance from the root to a node is the "level" of the node
* The greatest level of any leaf is the "depth" of a tree
* In a "binary" tree, each node has 0, 1, or 2 subtrees
* In a "complete" tree, all the leaves are at the same level
* Complete trees are relatively shallow
* A complete tree with k levels has 1 + 2 + 4 + ... + 2^(k-1) + 2^k = 2^(k+1)-1 nodes
* A complete tree with n nodes has depth approximately log_2(n)
/Heaps/
* A heap is a binary tree that
(a) is nearly complete: Leaves may be at two subsequent levels, but "shoved to the left" at the final level
(b) Has the "heap property": The value in a node is smaller than or equal to the values of its children
Why are heaps useful? Because they're good for priority queues
* The root is the smallest thing
* Heaps are shallow (depth is O(log_2(n))), so if we can make put and get dependent only on depth, we have fast put and get
How do you "put" into a heap?
* Two key issues:
* Need to maintain/restore near-completeness
* Need to maintain/restore heap property
* Solution: Put to maintain near-completeness (at the right end of the lowest level), then "swap up" to restore heap property
Adding to end of lowest level: O(1)
Swapping up: O(depth) = O(log_2(n))
How do you "get" from a heap?
* The root is the smallest thing, so remove it
* Shove the rightmost thing at the last level in the root
* Swap down: Repeatedly swap with smaller of children (provided you're larger than the smaller child)
* Removing O(1)
* Mvoing rightmost O(1)
* Swapping down O(log_2(n))
Also gives us a cool sorting routine:
* To sort a buncha stuff:
* Put everything into the heap O(n*log_2(n))
* Remove it in order O(n*log_2(n))