EBoard 10: BSTs, revisited
Approximate overview
- Admin
- Tree basics & terminology
- Trees, formalized
- Binary trees
- Traversing trees
- Binary search trees
- The Recurrence Formula Theorem
Administrative stuff
- I forgot my hearing aids again.
- Random assignment through magic cards.
- Advance notice: Prof. Eikmeier will be teaching balanced trees on
Monday and Wednesday next week.
- Enough people had trouble with assignment 3 that I’m giving you
another week.
- Problem 2 was complex, and Scheme is less familiar.
- Problem 1 should have been more straightforward, given that
you’ve already implemented Quicksort in Java. (At least
I hope you have.)
- I’m also trying to make assignment four (also due Thursday) much shorter.
- If you finished assignment 3, you’ll have a quieter week.
Upcoming token-generating activities
- Football, 1pm, Saturday
- Grinnell Vegans Potluck, Saturday, 7pm. Email or DM the person
who is hosting.
Upcoming other activities
Upcoming work
- Think about how to add something to a BST and to remove something.
- Read sections 5.1–5.5 and 6.1–6.2. That finishes book 1.
- Assignment 3 due next Thursday.
- Assignment 4 due next Thursday.
Q&A
If I offered up a token as sacrifice for a late assignment 3, does it come
back?
Yes.
Trees, introduced
Talk with your group about the following questions.
Is “Tree” an ADT (Abstract Data Type) or a Data Structure? Both?
Neither?
    Detour: List is an ADT (we are organizing elements for iteration
    and we support the following operations: …). LinkedList and ArrayList
    are Data Structures (in a LinkedList, each element has a link to the
    next; in an ArrayList, the elements are stored in an array).
- ADT: Almost every one. “When I think about a tree, I think about
the general structure and what it can provide, rather than how
I implement it.” (You can store binary trees in arrays or as
pointers.)
- Data Structure: We conceive of them as linked structures (and draw
them that way), and we use them to implement ADTs, such as using
BSTs to implement dictionaries.
A conclusion: Maybe the distinction isn't all that useful, but I appreciate
that you all said ADT.
Definition of lists
- null is a list.
- if x is an X and L is a list of X’s, then cons(x, L) is a list of X’s.
Basic definition
We will denote “a tree of values of type X” as Tree.
- If x is an X,
Leaf(x) is a Tree
- If x is an X and T1 … Tn are all Trees, then so is
`Node(x,T1,T2,...,Tn)`.
Empty leaves
We will denote “a tree of values of type X” as Tree.
- Leaf is a Tree
- If x is an X and T1 … Tn are all Trees, then so is
`Node(x,T1,T2,...,Tn)`.
Values only at leaves
- If x is an X, Leaf(x) is a Tree
- If T1 … Tn are all Trees, then so is Node(T1,T2,...,Tn).
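As a concrete illustration, here is a minimal Java sketch of the first
definition (values at both leaves and interior nodes). The names Tree, Leaf,
Node, and countLeaves are my own, not from the notes; the other two
definitions would just change which variants carry values.

```java
import java.util.List;

public class TreeDemo {
    // First definition: leaves and interior nodes both carry a value.
    interface Tree<X> {}
    record Leaf<X>(X value) implements Tree<X> {}
    record Node<X>(X value, List<Tree<X>> children) implements Tree<X> {}

    // Functions over trees follow the recursive definition directly:
    // one case per variant.
    static <X> int countLeaves(Tree<X> t) {
        if (t instanceof Leaf) return 1;
        Node<X> n = (Node<X>) t;
        int total = 0;
        for (Tree<X> child : n.children()) total += countLeaves(child);
        return total;
    }

    public static void main(String[] args) {
        Tree<String> t = new Node<>("root",
            List.of(new Leaf<>("a"),
                    new Node<>("b", List.of(new Leaf<>("c"), new Leaf<>("d")))));
        System.out.println(countLeaves(t)); // prints 3
    }
}
```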
Summary
- There are at least three common ways people think of trees.
- They are all usually defined recursively
- They differ in how one thinks of the leaves and interior nodes.
(Do leaves have values or are they NULLs?) (Do nodes have values?)
- How you implement your algorithm is affected by how you think of
your tree.
Terminology
Are there any of these terms that you don’t know?
Binary trees
Root
Node
Arity: The number of children a node has.
Leaf
Parent
Child
Subtree
Path
Depth (of node): Distance from the root.
Height (of tree): Depth of the furthest node.
Level
Ancestor - Generalization of parent
Descendent - Generalization of child. It's a property. ("a is a
     descendent of b".) (For all a s.t. a is a descendent of b …)
Forest - A collection of trees
Complete tree - All internal nodes have all children.
BSTs
A binary search tree (BST) is a tree (usually binary) of values
(or key/value pairs) with the property that
- The keys of all the nodes in the left subtree are less than the key
of the root.
- The keys of all the nodes in the right subtree are greater than the key
of the root.
- The same property applies for all subtrees.
- We assume each key appears only once.
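A minimal sketch of that property in code (the names are my own; int keys,
no attached values, and no balancing):

```java
public class BstDemo {
    // A node holds a key and up to two children.
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Smaller keys go into the left subtree, larger keys into the right;
    // equal keys are ignored, matching the "each key appears only once"
    // assumption.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        return root;
    }

    // Each comparison lets us discard one subtree entirely.
    static boolean contains(Node root, int key) {
        if (root == null) return false;
        if (key == root.key) return true;
        return key < root.key ? contains(root.left, key)
                              : contains(root.right, key);
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k : new int[] {5, 2, 8, 1, 9}) root = insert(root, k);
        System.out.println(contains(root, 8)); // prints true
        System.out.println(contains(root, 3)); // prints false
    }
}
```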
Detour: Dictionaries
Dictionary (also known as “Table”, “Hash”, and “Map”) is an ADT in which
we associate values with keys. Traditionally a Dictionary<K,V> implements
two primary methods:
set(K key, V value) - Remember that value is associated with key
get(K key) - Retrieve the last value associated with key
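In Java, the ADT might be rendered as an interface, with any number of data
structures behind it. This sketch (my own names) backs it with
java.util.TreeMap, which happens to be a balanced BST (a red-black tree):

```java
import java.util.TreeMap;

public class DictDemo {
    // The Dictionary ADT: two primary operations.
    interface Dictionary<K, V> {
        void set(K key, V value); // remember that value is associated with key
        V get(K key);             // retrieve the last value associated with key
    }

    // One possible data structure behind the ADT: a balanced BST.
    static class TreeDictionary<K, V> implements Dictionary<K, V> {
        private final TreeMap<K, V> map = new TreeMap<>();
        public void set(K key, V value) { map.put(key, value); }
        public V get(K key) { return map.get(key); }
    }

    public static void main(String[] args) {
        Dictionary<String, Integer> d = new TreeDictionary<>();
        d.set("a", 1);
        d.set("a", 2); // set overwrites: get returns the *last* value
        System.out.println(d.get("a")); // prints 2
    }
}
```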
Detour: Why BSTs?
Group discussions
Given that a balanced BST is still O(log n) for set and get, while a
hash table is expected amortized O(1), why would I use BSTs rather than
hash tables to implement dictionaries?
Some notes …
- Expected = usually
- Amortized = (add all costs) then divide by all operations.
- Hash tables sometimes have to get larger. That's expensive.
    But if we spread that cost over all of the operations (and we
    expand sensibly), the per-operation cost stays small.
Some answers …
- BSTs have a natural ordering, so if we want to iterate over the
    elements in increasing order, a BST makes that easy.
- Also for finding “nearby” elements in the ordering.
- Next largest thing is leftmost part of right tree.
- Next smaller thing is rightmost part of left tree.
- It seems that hash tables waste space; for hash tables to work well,
we generally need about half of the cells to be empty. (Depending
on how you store your BSTs, the pointers may make up for the empty
cells.)
- There is some debate about which is easier to iterate.
- Amortized time is not acceptable in all cases. (Slow predictability
beats fast amortized unpredictability.)
- It can be hard to come up with good hash functions.
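The "next largest" observation above translates directly into code
(hypothetical names; this handles only the case where the node has a
right subtree):

```java
public class SuccessorDemo {
    static class Node {
        int key;
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key; this.left = left; this.right = right;
        }
    }

    // The next-largest key after n (when n has a right subtree) is the
    // leftmost node of that right subtree: go right once, then left as
    // far as possible.
    static Node nextLargest(Node n) {
        Node cur = n.right;
        if (cur == null) return null; // successor lies elsewhere (an ancestor)
        while (cur.left != null) cur = cur.left;
        return cur;
    }

    public static void main(String[] args) {
        // BST: 5 has right child 8, which has left child 6.
        Node five = new Node(5, null,
            new Node(8, new Node(6, null, null), null));
        System.out.println(nextLargest(five).key); // prints 6
    }
}
```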
On the other hand …
- BSTs do require comparable/orderable elements.
- And they are slower.
Traversing trees
Group discussions
There are approximately eight standard orderings for traversing binary trees.
What are they?
- Breadth-first: Visit all of the values on each level before progressing
to the next.
- Left-to-right vs Right-to-left
- Top-to-bottom vs Bottom-to-top
- That’s four!
- Depth-first: Visit all of one subtree before visiting the other subtree
- Preorder: Visit/process/print parent before children
- Inorder: Visit/process/print one subtree , then parent, then other subtree
- Postorder: Visit/process/print both subtrees before parent
    - Left-to-right or right-to-left
- That’s six!
- Generally, we do top-down, left-to-right breadth-first or one of the
  three left-to-right depth-first orders.
How do you implement breadth-first, top-down, left-to-right iteratively?
- add the root to a queue
- While the queue is not empty …
- Grab the first thing in the queue
- Process it
- Add its left child (if it has one)
- Add its right child (if it has one)
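A sketch of those steps in Java (my own names; java.util.ArrayDeque serves
as the queue):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class BfsDemo {
    static class Node {
        int value;
        Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value; this.left = left; this.right = right;
        }
    }

    // Breadth-first, top-down, left-to-right: the queue makes us finish
    // each level before starting the next.
    static List<Integer> breadthFirst(Node root) {
        List<Integer> out = new ArrayList<>();
        Queue<Node> queue = new ArrayDeque<>();
        if (root != null) queue.add(root);            // add the root
        while (!queue.isEmpty()) {
            Node n = queue.remove();                  // grab the first thing
            out.add(n.value);                         // process it
            if (n.left != null) queue.add(n.left);    // add its left child
            if (n.right != null) queue.add(n.right);  // add its right child
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node(1,
            new Node(2, new Node(4, null, null), new Node(5, null, null)),
            new Node(3, null, null));
        System.out.println(breadthFirst(root)); // prints [1, 2, 3, 4, 5]
    }
}
```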
How do you implement depth-first, preorder-left-to-right iteratively?
- add the root to a stack
- While the stack is not empty
    - Grab the top thing on the stack
- Process it
- Add its right child (if it has one)
- Add its left child (if it has one)
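The same shape, with a stack in place of the queue (again, my own names);
pushing the right child before the left means the left child is popped, and
hence processed, first:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class DfsDemo {
    static class Node {
        int value;
        Node left, right;
        Node(int value, Node left, Node right) {
            this.value = value; this.left = left; this.right = right;
        }
    }

    // Depth-first, preorder, left-to-right, without recursion.
    static List<Integer> preorder(Node root) {
        List<Integer> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        if (root != null) stack.push(root);           // add the root
        while (!stack.isEmpty()) {
            Node n = stack.pop();                     // grab the top thing
            out.add(n.value);                         // process it
            if (n.right != null) stack.push(n.right); // right child first ...
            if (n.left != null) stack.push(n.left);   // ... so left comes off first
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node(1,
            new Node(2, new Node(4, null, null), new Node(5, null, null)),
            new Node(3, null, null));
        System.out.println(preorder(root)); // prints [1, 2, 4, 5, 3]
    }
}
```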
You should be able to write generic code for traversal (but I’m not going
to make you do so).
The Recurrence Formula Theorem
- A general way to express T(n) <= a*T(n/b) + O(n^d)
- I don't know why c doesn't appear. (A constant c of extra work is
  the d = 0 case, since c is in O(n^0).)
- We want to have a formula for the value of T(n). As you’d expect,
it depends on a, b, and d.
- There are three basic possibilities.
  - Case 1: T(n) is in O((n^d) log n) if a = b^d
  - Case 2: T(n) is in O(n^d) if a < b^d
  - Case 3: T(n) is in O(n^(logb(a))) if a > b^d
- Does this match our analyses from earlier?
Let’s try it
- T(n) <= 2T(n/2) + n. a = 2, b = 2, d = 1. Case 1: O(n log n).
- T(n) <= 2T(n/2) + c. a = 2, b = 2, d = 0. Case 3: O(n^(log2(2))), so O(n).
- T(n) <= 2T(n/2) + n^2. a = 2, b = 2, d = 2. Case 2: O(n^2).
Proving the theorem
- How might/do we prove this?
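One standard approach is a recursion tree: count the non-recursive work done
at each level of the recursion and sum over levels. A sketch (in LaTeX, using
the a, b, d from above):

```latex
% Level i of the recursion has a^i subproblems, each of size n/b^i,
% so the non-recursive work at level i is
%   a^i \cdot O\!\left((n/b^i)^d\right) = O\!\left(n^d (a/b^d)^i\right).
% Summing over all \log_b n levels gives a geometric series:
T(n) \le c \sum_{i=0}^{\log_b n} n^d \left(\frac{a}{b^d}\right)^i
% Case 1 (a = b^d): every level contributes about n^d, and there are
%   \log_b n levels, so T(n) = O(n^d \log n).
% Case 2 (a < b^d): the ratio is below 1, the top level dominates,
%   so T(n) = O(n^d).
% Case 3 (a > b^d): the bottom level dominates, so
%   T(n) = O(a^{\log_b n}) = O(n^{\log_b a}).
```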