CSC-152 98S : Outline of Class 46: Dictionaries and Hash Tables

Dictionaries

We've talked somewhat abaout dictionaries in a previous class. Recall that dictionaries are a lot like arrays that can be indexed by objects instead of numbers.

How can we implement dictionaries?

We could define a Pair object that joins a key and value.

We could make an unordered array, vector, or list of those objects.

To insert an element, we put it at the beginning or end of the array, vector or list. This is generally O(1), depending on the structure we use.
To look up an element, we step through the pairs one by one, comparing the key we are searching for to the key in each pair. This is O(n), where n is the number of elements.

We could make an ordered array of those objects (an ordered array being one that is always sorted).

To insert an element, we put it at the appropriate place, shifting as necessary. This is O(n) and is again limited by the size of the array.
For lookup an element, we do binary search on the sorted array.

Another strategy for building dictionaries is to use search trees. These are binary trees that are ordered in a particular way. Everything to the left of a node is smaller than or equal to the node. Everything to the right is greater than or equal to the node.

To find an element in a search tree, you compare the object you're looking for to the current node. If the object you're looking for is smaller, you follow the left branch. If the object you're looking for is bigger, choose the right branch.

We can use a similar process to insert an element.

If we can make the tree relatively balanced, this gives an O(log_2(n)) algorithm. Unfortunately, keeping the tree balanced is difficult. That is a topic for another course.

Hash Tables

Surprisingly, if you're willing to sacrifice some space and increase your constant, it is possible to build an expected O(1) dictionary.

How? By using an array, and numbering your keys in such a way that

all numbers are between 0 and array.length-1
no two keys have the same number (or at least few have the same number).

If there are no collisions, the system is simple

To insert a value, determine the number corresponding to the key and put it in that place of the array. This is O(1+cost of finding that number).
To lookup a value, determine the number corresponding to the key and look in the appropriate cell. This is O(1+cost of finding that number).

Implementations of dictionaries using this strategy are called hash tables.

The function used to convert an object to a number is the hash function.

To better understand hash tables, we need to consider

The hash functions we might develop.
What to do about collisions.

The goal in developing a hash function is to come up with a function that is unlikely to map two objects to the same position.

Now, this isn't possible (particularly if we have more objects than positions).
We'll discuss what to do about two objects mapping to the same position later.

Hence, we sometimes accept a situation in which the hash function distributes the objects more or less uniformly.

It is worth some experimentation to come up with such a function.

In addition, we should consider the cost of computing the hash function. We'd like something that is relatively low cost (not just constant time, but not too many steps within that constant).

We'd also like a function that does (or can) give us a relatively large range of numbers, so that we can get fewer collisionss by increasing the size of the hash table.

We might want to make the size of the table a parameter to the hash function.
We might strive for a hash function that uses the range of positive integers, and mod it by the size of the table.

[Instructions] [Search] [Current] [Changes] [Syllabus] [Handouts] [Outlines] [Labs] [Assignments] [Examples] [Bailey Docs] [SamR Docs] [Tutorial] [API]

Disclaimer Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.

This page may be found at http://www.math.grin.edu/~rebelsky/Courses/CS152/98S/home/rebelsky/public_html/Courses/CS152/98S/Outlines/outline.46.html

Source text last modified Tue Jan 12 11:52:31 1999.

This page generated on Mon Jan 25 09:50:00 1999 by SiteWeaver. Validate this page.

Contact our webmaster at rebelsky@math.grin.edu

Outline of Class 46: Dictionaries and Hash Tables

Administrivia

Dictionaries

A Simple Implementation

Search Trees

Hash Tables

Hash Functions