CSC152 2006S, Class 38: Dictionaries (2): Binary Search Trees Admin: * Are there questions on the exam? Please keep them short today. * The hint about dealing with removal in the random vector can be found in the work we did in the first day on priority queues. * Talk at 4:30 p.m. "Worthy Questions". EC. * EC for percussion ensemble at 2 p.m. * No reading for tomorrow. * I'll take notes for today in the eboard. Overview: * Classifying Data Structures and ADTs * Why Trees? * BST Basics * Insertion * Size Info * Dictionaries ---- Classifying Data Structures and ADTs * Set or Bag * No real relationship between elements. * Probably not appropriate for implementing dictionaries * Linear (Queues and Stacks and Lists) * One-to-one relationship between elements * Can be used to implement dictionaries. * Tree * One-to-many relationship between elements * Hierarchical structure? * Graph * Many-to-many relationship Why Trees? * Early days of CS: Faster way to search files * Contrast to linear/sequential search * Works * But inefficient * Particularly inefficient when you have to read from a tape or disks, which is slow. * General goals: * Dynamic * Efficient to search Binary Search Trees * Policy on structure * Everything in the left subtree is smaller than the root * Everything in the right subtree is larger than the root * Same applies for subtrees. * Example G / \ E K / \ / \ A F H P * Easy to search: Look at current node and then search in appropriate subtree. * These trees are binary: Nodes have a maximum of two children. * The height of a tree: The length of the longest path from a leaf to a root node. * Reminder: Leaf is a node with no children. * Reminder: Computer scientists are backwards - We draw trees upside-down * Important lesson: The order in which you insert values affects the structure of the tree * Three terms: * Full trees - Leaves on only one level, all other nodes have two children. (Sam calls these "Complete") * Balanced trees - Leaf nodes are on no more than two different levels (Sam says "A tree is balanced if (1) the depth of the left subtree differs by no more than one from the depth of the right subtree; (2) both subtrees are balanced.") * Complete - Leaves on two levels, leaves on last level are all at the left (Sam calls these "Nearly complete") Inserting elements into binary search tree * use same general strategy as searching; when you run out of tree, put it there. * Example: A E F G H K P A \ E \ F \ G \ H \ K \ P * Whoops! Not very balanced! * Strategy? If you get them sorted, put the middle element at the root, and then recurse on left and right subtrees. * What if the values aren't in order? * Sort them into an array * Build a tree and then rebalance it Other properties of trees * The number of nodes on level k is 2^k * The minimum height of a tree with n nodes is the height of the full tree * About floor(log_2(n)) * The maximum height of a tree with n nodes is n-1 Dictionaries * Basic methods: * void put(K key, V value) * V get(K key) * Also useful * remove(K key) * Observation: Key is somewhat arbitrary (at least unrelated to the value except by the dictionary) * How do we represent binary search trees? * A student suggests "in an array" * Some students remember the technique described in the exam (good students) left child of key/value pair is at 2*i+1 right child of key/value pair is at 2*i+2 * But this technique wastes a lot of space for not-very-balanced trees Alternative Implementation * Use a linked structure with references! Question: What is the association between the key and the value? * Not clear; Depends on the use of the dictionary * But keys need to be unique How do we remove from binary search trees? * Background question: How do we remove from heaps? * Grab the top thing * Take the last thing and put it at the root * Swap down * We do something much different with BSTs * If it's a leaf, we're done * What about interior nodes? * Replace with rightmost child of left subtree * (Or leftmost child of right subtree) * You then need to "slide up" the right child of that node. [Extended example that is hard to replicate in the eboard]