Fork and clone the repository at https://github.com/Grinnell-CSC207/lab-hashtables..
Driver: A
Scan through the code for probed hash tables so that you understand what methods are available and what approaches are used. Make notes on areas that are likely to be problematic.
a. What methods are not yet implemented?
b. What parts of the code are likely to have problems? Why?
Driver: A
As you may have noted, the code for set
does not check to see if the key is already being used. The introductory notes therefore observe that this is a potential bug that should be fixed.
a. Why is the failure to check whether the key is already used a potential bug? What effect might that failure to check have on the behavior of the program?
b. Fill in the body of repeatedSetExpt
with some experiments that help you see this effect.
c. If you’d like, you can start with the following simple set of experiments.
htab.reportBasicCalls(true);
htab.set("alpha", "alpha");
htab.dump(pen);
htab.set("beta", "beta");
htab.dump(pen);
htab.set("bravo", "bravo");
htab.dump(pen);
htab.set("beta", "max");
htab.dump(pen);
htab.reportBasicCalls(false);
checkGet(pen, htab, "bravo");
pen.println();
d. Check the end of the lab for a description of the potential bug that we hope you observed.
e. Correct the bug.
Driver: B
As you may have noted, the code assumes that find
returns a cell that has a pair with a matching key.
a. Is that the case? Why or why not?
b. In HashTableExpt
, uncomment the call to matchingKeysExpt
to see what happens when two keys hash to the same location.
c. Squash that second bug by updating get
to check whether the key in the given cell matches the expected key.
Driver: B
As you may have noted, the code makes no attempt to deal with collisions. Hey, it’s even in the “bugs to squash” section. (Some of you may have been tempted to fix the bug in a previous exercise.)
a. Uncomment the call to setExpt
so that you can see other potential effects of the unimplemented collision detection.
b. Update find
to use linear probing. If the current cell is full, and the keys don’t match, try the cell that is PROBE_OFFSET
away from the current cells (modulo the capacity of the table). For example, if the table capacity is 20, the hash code gives us position 3, and the offset is 6, you should look in positions 3, 9 (3 + 6), 15, 1 (21 mod 20), 7, ….
Driver: B
Here’s a subtle bug that many programmers miss: For some combinations of table capacity and offset, we may cycle back before we’ve looked at every cell in the table. For example, if the table capacity is 20, the hash code gives us position 3, and the offset is 5, we would look in positions 3, 8, 13, 18, 3 (23 mod 20), 8, 13, 18, …. So, even if the table is only 20% full, we might miss the empty cells.
How do we fix this problem? There are at least three approaches.
Which do you prefer? Be prepared to explain your decision.
We’ll come back and implement one of these choices later.
Driver: A
As the previous exercise reminded us, at some point we have to figure out how to expand the table. Most frequently, we expand the table when it reaches a certain percentage full. (That percentage is given by LOAD_FACTOR
in the current implementation.)
Here’s a sketch of a technique some students use to expand the table. (It’s a sketch, in part, because I haven’t checked that the code compiles correctly.)
newCapacity = this.pairs.length * 2 + random(20);
// Create a new table of that capacity
this.pairs = new Object[newCapacity];
// Move all the values from the old table to their appropriate
// location in the new table.
for (int i = 0; i < old.length; i++) {
this.pairs[i] = old[i];
} // for
a. What do you think about this approach? (Don’t critique the failure to use Arrays.copyElts
, or whatever it’s called.)
b. Try adding these lines to expand
and resolve any syntax errors. Shrink the initial capacity of the array a bit so that we know expand
gets called. Run the setExpt
method to see whether this technique for expansion works successfully.
c. Correct any errors that you’ve identified.
d. You may have noted that one approach we came up with for cycle problem associated with linear hashing problem was to make sure that the table capacity is not a multiple of the probe offset. Update your code to ensure this result. (Since we’ve made the probe offset a prime, all you have to do is to make sure that the table capacity is not a multiple of the probe offset.)
In case you’ve forgotten: The cycle problem is that we might cycle back to the same index without visiting every valid index. For example, if the probe offset is 3, the table has size 9, and we start at position 2, we’ll look at positions 2, 5, 8, and then 2 again. So, we’ll never see most of the positions in the table.
Driver: B
We’re making good progress in our implementation of hash tables. What’s next? We should add support for removing elements. Unfortunately, as we’ve learned, removing elements is often the most difficult aspect of implementing data structures. So let’s think a bit about how we might approach the problem.
a. Write some experiments that allow us to see what happens when we remove elements.
b. Here’s one approach to removing elements: Find the index of the key/value pair. Put a null
there to mark it as unused. Decrement the size field.
What do you think about this approach?
c. Implement the approach and see what happens.
d. Hopefully, you’ve found that this is a dangerous approach, since it breaks important assumptions for linear probing. Come up with an alternate approach and discuss it with your teacher or class mentor.
Driver: A
Let’s take a short break from thinking about how to remove elements and consider another issue. The containsKey
method is implemented a bit strangely.
a. Read the code for containsKey
.
b. Do you expect this approach to work? Why or why not?
c. Conduct a few experiments to check your conclusion.
d. Rewrite containsKey
to use a more sensible approach.
Driver: A
Here’s a potentially better mechanism for removing elements: Find the index of the key/value pair. Set the key to null. Decrement the size field.
This approach is likely to significantly affect the find
method, which now has to skip over the cell when searching but should probably return the cell if the search fails. Why? Because we use find
for two reasons: to look things up for get
and containsKey
, and to find an empty location for use with set
. For the observers, it’s okay that we look until we reach the end. However, for set
, we really want to use the first available spot.
Implement this approach, updating find
, set
, get
, and anything else you deem appropriate.
If you find that you have extra time, implement the iterator.
The bug that you should have observed is that when we use set
to
update the value associated with a key, the size of the table still
increases. It should not.
For this exercise, you need not deal with the other conflicts.