Algorithms and OOD (CSC 207 2014F) : Labs
Primary: [Front Door] [Schedule] - [Academic Honesty] [Disabilities] [Email] - [Learning Outcomes] [FAQ] [Teaching & Learning] [Grading] [Rubric] - [Calendar]
Current: [Assignment] [EBoard] [Lab] [Outline] [Reading]
Sections: [Assignments] [EBoards] [Examples] [Handouts] [Labs] [Outlines] [Readings]
Reference: [Student-Curated Resources] [Java 8 API] [Java 8 Tutorials] [Code Conventions]
Related Courses: [CSC 152 2006S (Rebelsky)] [CSC 207 2014S (Rebelsky)] [CSC 207 2014F (Walker)] [CSC 207 2011S (Weinman)]
Misc: [Submit Questions] - [SamR] [Glimmer Labs] [CS@Grinnell] [Grinnell] - [Issue Tracker (Course)] [Issue Tracker (Textbook)]
Summary: In this laboratory, you will explore some issues in the design and implementation of hash tables.
Fork and clone the repository at https://github.com/Grinnell-CSC207/hashtables.
Scan through the code so that you understand what methods are available and what approaches are used. Make notes on areas that are likely to be problematic.
a. What methods are not yet implemented?
b. What parts of the code are likely to have problems? Why?
As you may have noted, the code for set
does
not check to see if the key is already being used. The introductory
notes therefore observe that this is a potential bug that should be
fixed.
a. Why is the failure to check whether the key is already used a potential bug? What effect might that failure to check have on the behavior of the program?
b. Fill in the body of repeatedSetExpt
with
some experiments that help you see this effect.
c. If you'd like, you can start with the following simple set of experiments.
dict.reportBasicCalls(true); dict.set("alpha", "alpha"); dict.dump(pen); dict.set("beta", "beta"); dict.dump(pen); dict.set("bravo", "bravo"); dict.dump(pen); dict.set("beta", "bravo"); dict.dump(pen); dict.reportBasicCalls(false);
d. Correct the bug.
As you may have noted, the code assumes that find
returns a cell with a matching key.
a. Is that the case? Why or why not?
b. In HashTableExpt
, uncomment the call to
matchingKeysExpt
to see what happens when
two keys hash to the same location.
c. Squash that second bug by updating get
to check whether the key in the given cell matches the expected key.
As you may have noted, the code makes no attempt to deal with collisions. Hey, it's even in the “bugs to squash” section. (And, in some cases, you may have been tempted to fix the bug in a previous exercise.)
a. Uncomment the call to setExpt
so that
you can see other potential effects of the unimplemented collision
detection.
b. Update find
to use linear probing. If
the current cell is full, and the keys don't match, try the cell that
is PROBE_OFFSET
away from the current cells (modulo the
capacity of the table). For example, if the table capacity is 20, the hash
code gives us position 3, and the offset is 6, you should look in
positions 3, 9 (3 + 6), 15, 1 (21 mod 20), 7, ....
Here's a subtle bug that many programmers miss: For some combinations of table capacity and offset, we may cycle back before we've looked at every cell in the table. For example, if the table capacity is 20, the hash code gives us position 3, and the offset is 5, we would look in positions 3, 8, 13, 18, 3 (23 mod 20), 8, 13, 18, .... So, even if the table is only 20% full, we might miss the empty cells.
How do we fix this problem? There are at least three approaches.
Which do you prefer? Be prepared to explain your decision.
We'll come back and implement one of these choices later.
As the previous exercise reminded us, at some point we have to figure out
how to expand the table. Most frequently, we expand the table when it
reaches a certain percentage full. (That percentage is by
LOAD_FACTOR
in the current implementation.)
Here's a sketch of a technique some students use to expand the table. (It's a sketch, in part, because I haven't checked that the code compiles correctly.)
newCapacity = this.pairs.length * 2 + random(20); // Create a new table of that capacity this.pairs = new Object[newCapacity]; // Move all the values from the old table to their appropriate // location in the new table. for (int i = 0; i < old.length; i++) { this.pairs[i] = old[i]; } // for
a. What do you think about this approach? (Don't critique the failure
to use Arrays.copyElts
, or whatever it's called.)
b. Try adding these lines to expand
and
resolve any syntax errors. Shrink the initial capacity of the array a
bit so that we know expand
gets called. Run the
setExpt
method to see whether this technique
for expansion works successfully.
c. Correct any errors that you've identified.
d. You may have noted that one approach we came up with for cycle problem associated with linear hashing problem was to make sure that the table capacity is not a multiple of the probe offset. Update your code to ensure this result. (Since we've made the probe offset a prime, all you have to do is to make sure that the table capacity is not a multiple of the probe offset.)
In case you've forgotten: The cycle problem is that we might cycle back to the same index without visiting every valid index. For example, if the probe offset is 3, the table has size 9, and we start at position 2, we'll look at positions 2, 5, 8, and then 2 again. So, we'll never see most of the positions int he table.
We're making good progress in our implementation of hash tables. What's next? We should add support for removing elements. Unfortunately, as we've learned, removing elements is often the most difficult aspect of implementing data structures. So let's think a bit about how we might approach the problem.
a. Write some experiments that allow us to see what happens when we remove elements.
b. Here's one approach to removing elements: Find the index of the
key/value pair. Put a null
there to mark it as unused.
Decrement the size field.
What do you think about this approach?
c. Implement the approach and see what happens.
d. Hopefully, you've found that this is a dangerous approach, since it breaks important assumptions for linear probing. Come up with an alternate approach and discuss it with your teacher or class mentor.
Let's take a short break from thinking about how to remove elements
and consider another issue. The containsKey
method is implemented a bit strangely.
a. Read the code for containsKey
.
b. Do you expect this approach to work? Why or why not?
c. Conduct a few experiments to check your conclusion.
d. Rewrite containsKey
to use a more
sensible approach.
Here's a potentially better mechanism for removing elements: Find the index of the key/value pair. Set the key to null. Decrement the size field.
This approach is likely to significantly affect the
find
method, which now has to skip over.
the cell when searching. but should probably return the cell if the
search fails. Why? Because we use find
for two reasons: to look things up for get
and containsKey
, and to find an empty
location for use with set
. For the
observers, it's okay that we look until we reach the end. However,
for set
, we really want to use the first
available spot.
Implement this approach, updating find
,
set
, get
, and anything
else you deem appropriate.
If you find that you have extra time, try any of the following problems.
As we noted in our early discussions of dictionaries, there are at least three models for iterating over dictionaries:
Iterating over the values corresponds most closely to our traditional sense of iterators - they give us the values stored in the structure without information of how they are stored. For example, when we iterate over an array, we don't get information about where in the array each value is stored.
Implement an iterator over the values in a hash table.
Of course, it turns out that there are a variety of use cases for getting the keys in a dictionary, not least of which is printing key/value pairs. (Yes, iterating over key/value pairs would also work, but that may require us to create a new data type for the pairs.) We might also want to remove all elements whose key matches a certain criterion.
Implement an iterator over the keys in a hash table.