CSC152 2005S, Class 33: Heaps and Heap Sort Admin: * Welcome back * Please submit your homework via email by 5 p.m. today * Or a protocol or a strategy or otherwise solve the problem * Exam distributed later this week, due Friday the 15th Overview: * Review: Linear Structures and Priority Queues * Applying "Divide and Conquer" to data structures * Terminology * Heaps * Implementing heaps * Heapsort Review: What is a linear structure? * "Data like in a row" (is that right?) * "Data like in a sequence where you, uh, don't have more than one piece of data at any given level or perhaps all of your data at any given level or perhaps I should stop now." * When describing an abstract data type, what do we talk about? * The purpose: What you want it to do * The methods: What it does in slightly more formal notation * The application: What you want to use it for * The data structures used to implement it * Purpose of Linear Structures: Gather data "in order" * Key methods: * put: Add a piece of data * get: Remove the next piece of data "in order" (policy) * Other useful methods * peek: get without removing * isEmpty: To check precondition for get and peek * Applications: Vary * Implementations: Depends on the particular linear structure, but usually "linked nodes" or an array A priority queue is a linear structure with the policy: * Return the smallest thing next (determined by compareTo or compare) How can we implement priority queues? * As a sorted linked-structure * put is O(n) b/c we have to look at everything to find the right place * get is O(1) * As an unsorted array * put is O(1) to shove it at the end * get is O(n) b/c we have to look at everything to determine smallest * As a sorted array * O(n) to add (you'll have to shift stuff over to make room) * O(n) to get (you'll have to shift stuff over to remove room) * Whoops Can we do better? /Doing Better: Divide and Conquer/ * In algorithms: Given a large collection of data, divide the collection in half, process one or both halves, and combine your results * Binary trees * Binary search * In data structures: Given a large collection of data, reprseent the collection as a combination of two "half collections" * Represent each half collection as two halves * Etc. * This data structure is typically called a tree /Terminology for Trees/ * The thing that joins the top-level parts is the "root" * Each part is called a "subtree" (or child) * The individual parts of the tree are called "nodes" * The nodes with no subtrees are called "leaves" * The distance from the root to a node is the "level" of the node * The greatest level of any leaf is the "depth" of a tree * In a "binary" tree, each node has 0, 1, or 2 subtrees * In a "complete" tree, all the leaves are at the same level * Complete trees are relatively shallow * A complete tree with k levels has 1 + 2 + 4 + ... + 2^(k-1) + 2^k = 2^(k+1)-1 nodes * A complete tree with n nodes has depth approximately log_2(n) /Heaps/ * A heap is a binary tree that (a) is nearly complete: Leaves may be at two subsequent levels, but "shoved to the left" at the final level (b) Has the "heap property": The value in a node is smaller than or equal to the values of its children Why are heaps useful? Because they're good for priority queues * The root is the smallest thing * Heaps are shallow (depth is O(log_2(n))), so if we can make put and get dependent only on depth, we have fast put and get How do you "put" into a heap? * Two key issues: * Need to maintain/restore near-completeness * Need to maintain/restore heap property * Solution: Put to maintain near-completeness (at the right end of the lowest level), then "swap up" to restore heap property Adding to end of lowest level: O(1) Swapping up: O(depth) = O(log_2(n)) How do you "get" from a heap? * The root is the smallest thing, so remove it * Shove the rightmost thing at the last level in the root * Swap down: Repeatedly swap with smaller of children (provided you're larger than the smaller child) * Removing O(1) * Mvoing rightmost O(1) * Swapping down O(log_2(n)) Also gives us a cool sorting routine: * To sort a buncha stuff: * Put everything into the heap O(n*log_2(n)) * Remove it in order O(n*log_2(n))