Exam 3: ADTs, Algorithms, and Data Structures

This examination is in draft form and will remain in draft form until class time on Wednesday, 30 April 2014.

Assigned: Monday, 28 April 2014

Electronic version due: 10:30 p.m., Wednesday, 7 May 2014.

Printed version due: 10:00 a.m., Friday, 9 May 2014.

Preliminaries

Exam Format

This is a take-home examination. You may use any time or times you deem appropriate to complete the exam, provided you return it to me by the due date.

There are four problems on this examination. You must do your best to answer all of them. The problems are not necessarily of equal difficulty. Problems may include subproblems. If you complete four problems correctly or mostly correctly, you will earn an A. If you complete three problems correctly or mostly correctly, you will earn a B. If you complete two problems correctly or mostly correctly, you will earn a C. If you complete one problem correctly or mostly correctly, you will earn a D. If you complete fewer than one problem correctly or mostly correctly, you will earn an F. If you do not attempt the examination, you will earn a 0. Partially correct solutions may or may not earn you a partial grade, depending on the discretion of the grader.

Read the entire examination before you begin.

I expect that someone who has mastered the material and works at a moderate rate should have little trouble completing the exam in a reasonable amount of time. In particular, this exam is likely to take you about six hours, depending on how well you've learned the topics and how fast you work. (When I do the problems, I will report how long each one took me.)

Academic Honesty

This examination is open book, open notes, open mind, open computer, and open Web. However, it is closed person. That means you should not talk to other people about the exam. Other than as restricted by that limitation, you should feel free to use all reasonable resources available to you.

As always, you are expected to turn in your own work. If you find ideas in a book or on the Web, be sure to cite them appropriately, even if you do not directly copy code. If you use code that you wrote for a previous lab or homework, cite that lab or homework and the other members of your group. If you use code that you found on the course Web site, be sure to cite that code. You need not cite the code provided in the body of the examination or on the GitHub site for the examination.

Although you may use the Web for this exam, you may not post your answers to this examination on the Web. (You certainly should not post them to a public GitHub repository.) And, in case it's not clear, you may not ask others (in person, via email, via IM, via IRC, by posting a “please help” message, or in any other way) to put answers on the Web.

Because different students may be taking the exam at different times, you are not permitted to discuss the exam with anyone until after I have returned it. If you must say something about the exam, you are allowed to say “This is among the hardest exams I have ever taken. If you don't start it early, you will have no chance of finishing the exam.” You may also summarize these policies. You may not tell other students which problems you've finished. You may not tell other students how long you've spent on the exam.

You must include both of the following statements on the cover sheet of the examination.

I have neither received nor given inappropriate assistance on this examination.
I am not aware of any other students who have given or received inappropriate assistance on this examination.

Please sign and date each statement separately. Note that the statements must be true; if you are unable to sign either statement, please talk to me at your earliest convenience. You need not reveal the particulars of the dishonesty, simply that it happened. Note also that inappropriate assistance is assistance from (or to) anyone other than Professor Rebelsky (that's me).

Presenting Your Work

You must present your exam to me in two forms, physically and electronically. If you fail to turn in both versions, you are unlikely to receive credit for the examination.

Physical copy: You must write all of your answers using the computer, print them out, number the pages; put your name on the top of every page, write, sign, and date each of the academic honesty statements (provided you are able to do so); and hand me the printed copy or put it under my office door. If you fail to name and number the printed pages, you may suffer a penalty. If you fail to turn in a legible version of the exam, you are also likely to suffer some sort of penalty.

Electronic copy: You must also submit an electronic copy of your exam. You should create the electronic version by making a tarball of any relevant code and emailing me the tarball. Here are the normal steps for making a tarball.

Remove any cruft (needless files) from your directory structure. I don't want to see .class files, editor backups, or anything similar. If you see a bin directory, please remove that directory, too. Please leave your .git directory.
Switch to the parent directory of your exam directory.
Issue the command tar cvzf username.tgz directory.
If you have done things correctly, you should see the file username.tgz.

Make sure that the tar file contains the appropriate contents using the command tar tvf username.tgz. For example, if I were to check my tarball, I might see something like the following.

> tar tvf rebelsky.tgz
rebelsky/
rebelsky/Makefile
rebelsky/README.md
rebelsky/.git/
rebelsky/.git/CONFIG_EDITMSG
rebelsky/.git/config
...
rebelsky/.gitignore
rebelsky/Problem1/
rebelsky/Problem1/Makefile
rebelsky/Problem1/merge-invariants.md
rebelsky/Problem1/merge-invariants.pdf
rebelsky/Problems23/
rebelsky/Problems23/.classpath
rebelsky/Problems23/.project
rebelsky/Problems23/src/
rebelsky/Problems23/src/BST.java
rebelsky/Problems23/src/BSTExpt.java
rebelsky/Problems23/src/BSTTrace.java
...

I've added a Makefile to the repository. If you edit that Makefile properly and type make, it should automatically build the tarball. You should still issue the tar tvf command to make sure it contains the right data.

Code: In many problems, I ask you to write code. Unless I specify otherwise in a problem, you should write working code and include examples that show that you've tested the code. You should do your best to format that code to the class formatting standards. I am likely to subtract at least half the value of any problem in which your code does not adhere closely to the standards. I will assign a zero to any work in which you appear not to have used the class formatter.

Documentation: You should document classes, interfaces, fields, and methods using Javadoc-style comments. You might also specify preconditions and postconditions for each method.

Care: Just as you should be careful and precise when you write code and documentation, so should you be careful and precise when you write prose. Please check your spelling and grammar. Since I should be equally careful, the whole class will receive one point of extra credit for each error in spelling or grammar you identify on this exam. I will limit that form of extra credit to five points.

Partial Credit: I may give partial credit for partially correct answers. I am best able to give such partial credit if you include a clear set of work that shows how you derived your answer. You ensure the best possible grade for yourself by clearly indicating what part of your answer is work and what part is your final answer.

Getting Help

I may not be available at the time you take the exam. If you feel that a question is badly worded or impossible to answer, note the issue you have observed and attempt to reword the question in such a way that it is answerable. You should also feel free to send me electronic mail at any time of day.

I will also reserve time at the start of each class before the exam is due to discuss any general questions you have on the exam.

Preparation

Clone the repository using the following command, which will help ensure that you have the correct directory structure. Note that I've set things up so that you can treat each problem as a separate Eclipse project. Please do not make one project for the whole exam.. Note also that the command will clone the repository into a directory that corresponds to your user name.

$ git clone https://github.com/Grinnell-CSC207/exam3-2014S *username*

Problems

Problem 1: Invariants for `merge`

Topics: Sorting, Loop Invariants, Arrays

Marge and E. Gram have been exploring merge sort, and, while their merge routine works correctly most of the time, it sometimes fails. They think they should start again. Like all good programmers, the first thing they do is write the documentation and procedure signature.

  /**
   * Merge the values in subarrays of a1 and a2 into a new array.
   * The subarray of a1 takes on indices lb1 (inclusive) to ub1 (exclusive).
   * The subarray of a2 takes on indices lb2 (inclusive) to ub2 (exclusive).
   *
   * @return
   *    m, an array
   *
   * @pre
   *    0 <= lb1 <= ub1 <= a1.length.
   * @pre
   *    0 <= lb2 <= ub2 <= a2.length.
   * @pre
   *    For all i, lb1 < i < ub1, order.compare(a1[i-1],a1[i]) <= 0
   * @pre
   *    For all i, lb2 < i < ub2, order.compare(a2[i-1],a2[i]) <= 0
   * @post
   *    For all i, 0 < i < m.length, order.compare(m[i-1],m[i]) <= 0
   * @post
   *    m is a permutation of the concatenation of the given subarrays of 
   *    a1 and a2.
   */
  public static <T> T[] merge(Comparator<T> order, 
                              T[] a1, int lb1, int ub1,
                              T[] a2, int lb2, int ub2)

Lupe reminds them that the next thing to do is to write the loop invariants. But they are having trouble doing so. They know that they need to have indices for a1, a2, and m. They realize that those indices implicitly break each array into multiple sections. They know that when the loop terminates, the invariant plus the state of the system needs to be enough to guarantee both postconditions (that is, that the array is sorted and that it is a permutation of the concatenation of the given arrays). They know that the typical merge copies one element at a time, so they need the invariant to show that it's correct to copy into the expected location.

a. Sketch the loop invariants for the merge procedure. ASCII art would be great, since it's what we normally use, but if you want to use a drawing program or even scan a hand-drawn sketch, that's fine.

b. Express your invariant using a similar level of formality to that we use in the preconditions and postconditions above.

c. Explain how we can ensure that the loop invariant holds at the beginning of merge.

d. Explain, in English or in Java code, what should happen in each iteration of the main loop.

e. Give a loop termination condition.

f. Do you need to do any work after the loop terminates? If so, describe that work. You may find it useful to explain it in terms of the loop invariants.

g. Explain why the postconditions are met. Your explanation should involve the loop invariants, the postconditions, and any additional work you describe.

I would prefer that you format your answer nicely before turning it in. If you use Markdown, the accompanying Makefile should do the work for you.

Problem 2: Deletion in Binary Search Trees, Mark 1

Topics: Linked structures, Trees

We've started to explore the design of binary search trees. For convenience, we decided to make binary search trees “add only” structures. Right now, you can't remove a value from a BST.

As you might expect, that decision has frustrated your colleagues Remy and Delbert. They think that it's reasonable to want to get rid of items.

Their sensible colleagues suggest that there's a simple technique for doing deletion - you can simply put a null in for the value, and, when the value is null, insert throws an exception. Unfortunately, Remy and Delbert are extremists, and say “Simulated deletion is not deletion. Your approach clogs the tree with pointless nodes. And if the tree is clogged with pointless nodes, we can't guarantee O(logn) insertion and deletion.” Of course, they ignore the fact that we've yet to figure out how to keep the trees balanced, and that we probably won't learn how to do so until we take an upper-division course in algorithms and data structures (or until we search on Wikipedia). But, hey, they're annoying enough that we're going to listen to them.

Fortunately, they suggest a reasonably straightforward approach for removing values.

To delete the node containing key
If the node has no left subtree,
  Replace the node by its right subtree.
Otherwise, if the node has no right subtree,
  Replace the node by its left subtree.
Otherwise, the node has two subtrees
  Shuffle the tree so that the node with the largest key in the
    left subtree replaces the node.

Why does this work? We know that the largest key in the left subtree is larger than every other key in the left subtree. And, since it's in the left subtree, we know that it's smaller than every key in the right subtree. Hence, it can be at the root after we delete the old root.

Asa, Sam, Cam, Ida, and Ina worry a bit about that “Shuffle” instruction. And they also want to design some interesting unit tests. They draw a bunch of trees to help themselves understand what should happen in each situation. In drawing these trees, they use lowercase letters to represent nodes, and uppercase letters to represent tree-shaped groups of nodes. And their pictures all illustrate deletion at the root.

Original	After delete/shuffle
00 - leaf
m / \
01 - no right subtree
m / \ / G	G
02 - no left subtree
m / \ \ N	N
03 - both subtrees, left subtree has no right subtree
m / \ / \ g N / \ / D	g / \ / \ D N
04 - both subtrees, left subtree has right subtree with no right subtree
m / \ / \ g N / \ / \ D j / \ / I	j / \ / \ g N / \ / \ D I
05 - both subtrees, left subtree has right subtree with right subtree (`l` is the rightmost key in the left subtree, and has no left subtree)
m / \ / \ g N / \ / \ D J / \ / \ I l / \	l / \ / \ g N / \ / \ D J / \ / I
06 - both subtrees, left subtree has right subtree with right subtree (`l` is the rightmost key in the left subtree, but has a left subtree)
m / \ / \ g N / \ / \ D J / \ / \ I l / \ / K	l / \ / \ g N / \ / \ D J / \ / \ I K

They note that they aren't sure that there's a real implementation difference between the last four cases, but they thought it was easiest to think about them separately. Similarly, they note that case 00 is covered by each of the next two cases.

When they show their examples to Remy and Delbert, Remy and Delbert note that they would find it useful to find the parent of the rightmost node. But they don't say why.

Your classmates, Una and Tessa, suggest that before we start to write the code, we should write the unit tests. (Yes, this is a large class, you have lots of classmates.) Fortunately, Randy has written some random tests, which you can find in the exam repository. But we should also have some more systematic tests, including white box testing.

a. Write systematic tests, including a few tests for each of the seven cases above. Note that the illustrations show the item being removed at the top of the tree. However, you should also delete items from the middle of the tree (using each of the orientations of the deleted item).

b. Randy's randomized testing is only partially useful. In particular, when testing fails, we'd like to be able to see what series of operations created the failure. Better yet, we'd like to generate code for an experiment that we can run to better understand what went wrong. For example, if the randomized test found that adding “aardvark” and “bear”, and then removing the entry with key “a” caused an error, it might generate the following code, which we could paste into the body of BSTTrace.java..

    pen.println("adding aardvark");
    dict.set("a", "aardvark");
    dict.dump(pen);
    pen.println("adding bear");
    dict.set("b", "bear");
    dict.dump(pen);
    pen.println("removing a");
    dict.remove("a");
    dict.dump(pen);

Rewrite RandomBSTTests.java so that when one of the tests fails, it prints out code similar to that above. Of course, the output code should match the failed test.

Problem 3: Removing from Binary Search Trees

Remy and Delbert are reasonably depressed that all we've done is design and testing. They say “Where's the working code?” They note that we still need to implement the tests. Your professor, Hugh DeWitt, not realizing that Remy and Delbert delegated the R&D on the previous problem to their classmates, suggests that you finish the work.

Implement the remove method for the BST class using the strategy given above.

Note that, in addition to your tests and the provided randomized tests, I am likely to run my own set of tests on your code and may also run your classmates' tests.

Hint: Just as we found it convenient to implement the insert recursively, you will find it equally convenient to implement the remove method recursively. You'll find that BST.java already includes a bit of skeleton code to support that approach.

Hint: You will probably find it necessary to change the left and right references for a variety of nodes. However, you should generally avoid changing the value stored in a node. You may find it useful to create new nodes, although you should be able to write this without creating any new nodes.

Problem 4: Iterating Hash Tables

Topics: Hash tables, Iterators, Dictionaries, Association lists, Generics

Your classmates, Ida, Ray, and Tor, note that good Java practice requires that any collection should also implement iterators. Val suggests that since the iterators for lists and arrays return the values in those lists and arrays, the basic iterator method should return the values in the dictionary, and not the keys. However, Kay pipes up to suggest that we should also be able to iterate the keys in the dictionary. Fortunately, when we designed our Dictionary interface, Dick, Tina, and Ari included such iterators. Guess what? Good old Hugh DeWitt thinks you should implement them for hash tables.

We've seen two mechanisms for implementing hash tables: We can use chained (a.k.a. bucketed) hash tables, in which each cell in the hash table contains an association list, or we can use probed (a.k.a. open) hash tables, in which we look elsewhere when the location in the table contains a key other than the expected key.

The code accompanying this examination provides both implementations of hash tables. Unfortunately, both implementations lack iterators. Implement a keys iterator and a values iterator for OpenHashTable.java.

You are fortunate that Prof. DeWitt has realized that Remy and Delbert skirted work on the previous problems, and has decided that you need not implement the remove method. Hence, you should only implement the next and hasNext methods.

You may iterate the elements in any order, so long as you iterate all of the elements. That is, the values iterator must visit all of the values and the keys iterator must iterate all of the keys.

Your iterator should make sensible use of memory. In particular, it should not use more than a constant amount of space.

Extra Credit

a. You may earn a small amount of extra credit if you show good Git behavior (other than pushing your work to a public repository). In particular, I'd like to see regular commits (presumably a few commits per problem) that have good commit messages.

b. Unit tests for problem 2 that identify errors in my code will earn extra credit. Since that situation is unlikely to happen, unit tests for problem 2 that identify errors in the code for a large number of your classmates will also earn extra credit.

c. I had originally planned to have you implement iterators for both open hash tables and bucketed hash tables, but am not only requiring iterators for open hash tables. You may earn a small amount of extra credit for implementing the iterators for ChainedHashTable.java.

Questions and Answers

Problem 1

How long did this problem take you?

About fifteen minutes.

You seem to have broken each array into five sections. Should we have five sections in our invariants, too?

Five was just a convenient number for setting up a template. You may find that you need fewer or more sections.

Problem 2

How long did this problem take you?

Writing the systematic tests for 2a took me under ten minutes because I planned for common methodname and because buildTree and check are really useful. I covered approximately 49 cases: 7 different patterns (from the exam), which I put (a) at the root of the tree, (b) at the root of the left subtree, (c) at the root of the right subtree, (d) at the root of the left subtree of the left subtree, (e) at the root of the right subtree of the left subtree, (f) at the root of the left subtree of the right subtree, and (g) at the root of the right subtree of the right subtree.

Writing the “generate code” for 2b took me under five minutes. (Amazingly, there's a lot of infrastructure in the tests already.) And, when a random test identified a bug, I could copy and paste the code into BSTTrace.java.

Can you explain how you did one of those systematic tests?

For example, for test 05/f, I used buildTree to build a tree from the string "azmgndjil", removed the m, sure that the tree no longer contained m, and used check to make sure that the tree still contained all of azgndjil.

Problem 3

How long did this problem take you?

I started to solve this problem before starting problem 2. (Whoops!) My untested code for removing from the tree took me ten minutes to write. I hadn't thought about the problem in five months, so I mostly used the diagram. Once I found a bug in my code with the randomized tests, I was able to paste the code into BSTTrace.java, run the program, and see which of the cases my code failed on. Running the test, copying-and-pasting code, and fixing the bug took under five minutes.

Problem 4

How long did this problem take you?

Implementing the iterator for open hash tables took me fifteen minutes, most of which was spent cleaning up code and identifying a subtle bug in the open hash table implementation. (There's a reason I try to do the exam before I distribute it to you.)

Implementing the iterator for chained hash tables took me thirty minutes. Because of that time, I updated the exam to make this iterator extra credit. (As I said, there's a reason I try to do the exam before I distribute it to you.)

Miscellaneous

I see a .git directory in the sample tarball table of contents. Do you really want us to include our .git directory in the tarball?

Yes. That way I can look at your Git habits and maybe even give you some extra credit for good Git habits.

Suppose I get all of the problems right as well as all three extra-credit questions. Suppose also that we accumulate five points of extra-credit for Sam's errors. What grade will I get?

You'll get an A+, which translates to 100/100. I don't go higher than that in computing averages.

What do you consider the hardest problem on the exam?

Past evidence suggests that implementing the remove method for binary search trees is the most challenging. But some of you have had difficulty writing sufficiently detailed loop invariants and sufficiently comprehensive unit tests, so those problems may also be challenging. Writing the iterators for ChainedHashTable.java took me about twice as long as any other problem, which is why I made those iterators optional.

Errata

Here you will find errors of spelling, grammar, and design that students have noted. Remember, each error found corresponds to a point of extra credit for everyone. I usually limit such extra credit to five points. However, if I make an astoundingly large number of errors, then I will provide more extra credit.

I will not accept corrections for credit until after I have taken the examination out of draft mode.

I will not credit corrections for text in the Q&A and errata sections, since those are often written quickly to get information out to students. I may not credit corrections to grammatical mistakes in code.

The first five.

Sam says like “Lupe reminds them that that”. Only one “that” is necessary. [SZ, 1 point]
I'm not sure what a “lop invariant” is. Probably code for “loop invariant”. [MH, 1 point]
“The realize that” should be “They realize that”. [SZ and KN, 1 point]
“They know that they typical merge” should be “They know that the typical merge”. [SZ, 1 point]
Missing end quotation marks in sample calls to dict.set. [SZ, 1 point]

The next few.

"may also run your classmates tests" should be "may also run your classmates' tests". [GB]
In ChainedHashTableTest the name of the factory should be chtFactory instead of ohtFactory. [VB]
The opening documentation in ChainedHashTable.java seems to be from OpenHashTable.java. [SZ]

Citations

The problems on deletion in binary search trees are closely based on a problem I wrote in a previous semester. Most of the text is identical, but for this exam I broke the problem into two parts - one for the unit testing and one for the implementation. The code for that problem also comes from that semester.

The documentation for merge comes from the code for merge sort in the Tao of Java laboratory on sorting, available at https://github.com/Grinnell-CSC207/sorting.

Some of the code for the hash tables problem comes from the code for hash tables in the Tao of Java laboratory on hash tables, available at https://github.com/Grinnell-CSC207/hashtables.

Some of the code for the tests in the hash tables problem comes from the BST code form a previous semester.

The toArray method comes from unit tests I wrote for exam 2 this semester, although it has been modified for this exam.

The file AssociationList.java is closely based on LinkedDictionary.java from the Tao of Java lab on association lists, available at https://github.com/Grinnell-CSC207/association-lists.

Exam 3: ADTs, Algorithms, and Data Structures

Preliminaries

Exam Format

Academic Honesty

Presenting Your Work

Getting Help

Preparation

Problems

Problem 1: Invariants for merge

Problem 2: Deletion in Binary Search Trees, Mark 1

Problem 3: Removing from Binary Search Trees

Problem 4: Iterating Hash Tables

Extra Credit

Questions and Answers

Problem 1

Problem 2

Problem 3

Problem 4

Miscellaneous

Errata

Citations

Problem 1: Invariants for `merge`