CS152 2004F : Class 40: Dictionaries (3): Hash Tables

Class 40: Dictionaries (3): Hash Tables

Held: Wednesday, 10 November 2004

Summary: Today we consider one of the most important implementations of dictionaries, Hash tables.

Related Pages:

Overview:

The Idea of Hashing.

Hash Functions.

Hashing in Java.

Handling Hashing Conflicts.

Hash Tables

As you may have noticed, although dictionaries are a lot like arrays (except that you index dictionaries by object and arrays by number).

However, in terms of implementation, that claim hasn't held.

The unordered list implementation has O(n) set and O(n) get.
The ordered array implementation has O(n) set and O(log₂n) get.
The binary search tree implementation has O(logn) set and O(logn) get, if we're lucky.

The corresponding array operations are O(1).

Is there a dictionary implementation with O(1) get and set?

Surprisingly, if you're willing to sacrifice some space and increase your constant, it is possible to build a dictionary that is likely to have O(1) get and set.

How? Well, we know that arrays provide O(1) get and set, so use arrays.

How do we use an array? We "number" the keys in such a way that

all numbers are between 0 and array.length-1
no two keys have the same number (or at least few have the same number).

If there are no collisions (keys with the same number), the system is simple

To set a value, determine the number corresponding to the key and add it in that place of the array. This is O(1+cost of computing that number).
To get a value, determine the number corresponding to the key and look in the appropriate cell. This is O(1+cost of finding that number).

Implementations of dictionaries using this strategy are called hash tables.

The function used to convert an object to a number is the hash function.

To better understand hash tables, we need to consider

The hash functions we might develop.
What to do about collisions (when two keys map to the same index).

Hash Functions

The goal in developing a hash function is to come up with a function that is unlikely to map two objects to the same position.

Now, this isn't possible (particularly if we have more objects than positions).
We'll discuss what to do about two objects mapping to the same position later.

Hence, we sometimes accept a situation in which the hash function distributes the objects more or less uniformly.

It is worth some experimentation to come up with such a function.

In addition, we should consider the cost of computing the hash function. We'd like something that is relatively low cost (not just constant time, but not too many steps within that constant).

We'd also like a function that does (or can) give us a relatively large range of numbers, so that we can get fewer collisions by increasing the size of the hash table.

We might want to make the size of the table a parameter to the hash function.
We might strive for a hash function that uses the range of positive integers, and mod it by the size of the table.

What are some hash functions you might use for strings?

Sum the ASCII values in the string
constant1*firstLetter + constant2*secondLetter
...

An Exercise in Hashing

Let's try an exercise. We'll come up with a hash value for everybody's first name. We'll then put things in the hash table.

We'll use sum the values of the letters in the name.

We'll use the following table:

A: 1   F: 6   K: 11  P: 16  U: 21  Z: 26
B: 2   G: 7   L: 12  Q: 17  V: 22
C: 3   H: 8   M: 13  R: 18  W: 23
D: 4   I: 9   N: 14  S: 19  X: 24
E: 5   J: 10  O: 15  T: 20  Y: 25

There are about 10 of you in the class. Typically, our hash tables are somewhat bigger than the size of the collection we're working with, so we'll use a hash table of size 20.

Once you've computed your result, mod it by 20 (compute the remainder after dividing by 20

For my first name (Samuel),

The sum is 19 (S) + 1 (A) + 13 (M) + 21 (U) + 5 (E) + 12 (L) = 71

The index is therefore 11.

Is this a good hash function?

Why?
Why not?

Hashing in Java

Hash tables are so useful that Java includes them as a standard library class, java.util.Hashtable.

Let's look over the

documentation for java.util.Hashtable.

Why are there three constructors?

What methods are there other than get and set?

Where's the hash function?

Answer: Every class is expected to provide its own.

Handling Conflicting Keys

Since we typically have many more possible keys than places in the array, we will eventually hit a point in which two keys hash to the same value.

What should we do when this conflict happens?

It is not reasonable to disallow the second key, since it's a different key.

We could put a list of key/value pairs, rather than a singleton, at every position.

We could step through the array until we find an empty space.

Removing Elements from Hash Tables

Our analysis of Hash Tables to date has been based on two simple operations: get and set.

What happens if we want to remove elements? This operation can significantly complicate matters.

If we've chosen the shift into a blank space technique for resolving collisions, what do we do when it comes time to remove elements?

Do we shift everything back? If so, think about how far we may have to look.
Do we leave the thing there as a blank? We might then then remove it later when it's convenient to do so.
Do we do something totally different?

Note also that there are different ways of specifying remove. We might remove the element with a particular key. We might instead remove elements based on their value. The second is obviously a much slower operation than the first (unless we've developed a special way to handle that problem - see if you can think of one).

[Skip to Body]
Primary: [Front Door] [Current] [Glance] - [Honesty] [Links] [Syllabus]
Groupings: [EBoards] [Examples] [Exams] [Handouts] [Homework] [Labs] [Outlines] [Readings] [Reference]
Misc: [SamR] [Java 1.4.2 API] [EIJ] [CS152 2000F] [CS153 2004S]

Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Wed Dec 8 10:37:25 2004.
The source to the document was last modified on Thu Aug 26 20:22:24 2004.
This document may be found at http://www.cs.grinnell.edu/~rebelsky/Courses/CS152/2004F/Outlines/outline.40.html.

You may wish to validate this document's HTML ; ; Check with Bobby

Samuel A. Rebelsky, rebelsky@grinnell.edu