Final Examination

Assigned: Tuesday, 10 December 2013

Prologue due: 5:00 p.m., Monday, 16 December 2013. Electronic version due: noon, Thursday, 19 December 2013. Signed academic honesty statement due: noon, Thursday, 19 December 2013.

Preliminaries

Exam Format

This is a take-home examination. You may use any time or times you deem appropriate to complete the exam, provided you return it to me by the due date.

This examination has a required prologue that must be completed by Monday afternoon. The prologue is intended to help you think about the examination.

There are five problems on this examination. You must do your best to answer all of them. The problems are not necessarily of equal difficulty. Problems may include subproblems. If you complete five problems correctly or mostly correctly, you will earn an A. If you complete four problems correctly or mostly correctly, you will earn a B. If you complete three problems correctly or mostly correctly, you will earn a C. If you complete two problems correctly or mostly correctly, you will earn a D. If you complete fewer than two problems correctly or mostly correctly, you will earn an F. If you do not attempt the examination, you will earn a 0. Partially correct solutions may or may not earn you a partial grade, depending on the discretion of the grader.

Read the entire examination before you begin.

I expect that someone who has mastered the material and works at a moderate rate should have little trouble completing the exam in a reasonable amount of time. In particular, this exam is likely to take you about ten hours, depending on how well you've learned the topics and how fast you work.

Academic Honesty

This examination is open book, open notes, open mind, open computer, and open Web. However, it is closed person. That means you should not talk to other people about the exam. Other than as restricted by that limitation, you should feel free to use all reasonable resources available to you.

As always, you are expected to turn in your own work. If you find ideas in a book or on the Web, be sure to cite them appropriately. If you use code that you wrote for a previous lab or homework, cite that lab or homework and the other members of your group. If you use code that you found on the course Web site, be sure to cite that code. You need not cite the code provided in the body of the examination.

Although you may use the Web for this exam, you may not post your answers to this examination on the Web. (You certainly should not post them to GitHub.) And, in case it's not clear, you may not ask others (in person, via email, via IM, via IRC, by posting a “please help” message or question on StackOverflow or on any other site, or in any other way) to put answers on the Web.

Because different students may be taking the exam at different times, and because some students will choose to take a makeup examination that involves redoing the problems, you are not permitted to discuss the exam with anyone until after I have indicated that it is acceptable to do so. If you must say something about the exam, you are allowed to say “This is among the hardest exams I have ever taken. If you don't start it early, you will have no chance of finishing the exam.” You may also summarize these policies. You may not tell other students which problems you've finished. You may not tell other students how long you've spent on the exam.

You must turn in a sheet that includes both of the following statements.

I have neither received nor given inappropriate assistance on this examination.
I am not aware of any other students who have given or received inappropriate assistance on this examination.

Please sign and date each statement separately. Note that the statements must be true; if you are unable to sign either statement, please talk to me at your earliest convenience. You need not reveal the particulars of the dishonesty, simply that it happened. Note also that inappropriate assistance is assistance from (or to) anyone other than Professor Rebelsky (that's me).

Presenting Your Work

You need only present your exam to me electronically. You should create the electronic version by making a tarball of any relevant code and emailing me the tarball. If you don't know how to make a tarball, let me know. If you really don't want to make a tarball, you can make a zip file. In either case, please make sure that the file unpacks into a single folder with your username. That folder should four subfolders, one each for problems 1, 2, and 3, and a combined folder for problems 4 and 5.

Unless specified otherwise in the problem, the .java files for each problem should be in a src subfolder of the problem folder, and should not have a package declaration. You can also include .txt or .md files in each folder that have notes you would like me to read, answers to non-coding questions, and sample output from any experiments you conduct. You may include a .txt or .md file in the top-level directory with general notes.

In many problems, I ask you to write code. Unless I specify otherwise in a problem, you should write working code and include examples that show that you've tested the code. You should do your best to format that code to the Sun/Oracle Java coding standards. I am likely to take off one point (on a 100-point scale) for each formatting error that I note, although I am also likely to allow up to three formatting errors without penalty.

You should document classes, interfaces, fields, and methods using Javadoc-style comments.

Just as you should be careful and precise when you write code and documentation, so should you be careful and precise when you write prose. Please check your spelling and grammar. Since I should be equally careful, the whole class will receive one point of extra credit for each error in spelling or grammar you identify on this exam. I will limit that form of extra credit to five points.

I may give partial credit for partially correct answers. I am best able to give such partial credit if you include a clear set of work that shows how you derived your answer. You ensure the best possible grade for yourself by clearly indicating what part of your answer is work and what part is your final answer.

Getting Help

I may not be available at the time you take the exam. If you feel that a question is badly worded or impossible to answer, note the issue you have observed and attempt to reword the question in such a way that it is answerable. You should also feel free to send me electronic mail at any time of day.

I will also reserve time at the start of each class before the exam is due to discuss any general questions you have on the exam.

Preparation

Clone the repo using the following command, which will help ensure that you have the correct directory structure. Note that I have tried to set things up so that you can treat each problem as a separate Eclipse project.

$ git clone https://github.com/Grinnell-CSC207/final-2013F *username*

Problems

Problem 1: Invariants

Topics: Loops, Invariants, Arrays, Partitioning, Heaps

Varun and Ina understand why I like invariants, but have trouble writing their own, or at least maintaining them, so they've turned to you.

a. They've started working on an alternate version of partition in which elements equal to the pivot can either be on the left side or the right side. They don't like my standard approach, because they note that we sometimes swap a large element out of the large side and then back again. They also get bothered that I don't include the upper-bound in the subarray. Finally, they don't like that I indicate borders of cells, rather than the cells themselves.

They decide that their invariant should be:

+-+--------+---------+--------+
|p| <= piv | unknown | >= piv |
+-+--------+---------+--------+
 ^        ^           ^      ^
 |        |           |      |
 lb       small       large  ub

That is,

The pivot is in vals[lb].
For all i, lb+1 <= i <= small, vals[i] is less than or equal to the pivot.
For all i, large <= i <= ub; vals[i] is greater than or equal to the pivot.

They put all their work together as follows.

    /**
     * Partition the elements between lb and ub, inclusive.
     *
     * @pre lb <= ub.
     */
    static <T> int partition(T[] vals, int lb, int ub, Comparator<T> order) {
        // Put a pivot at the start of the subarray.
        swap(vals, lb, lb + (ub-lb)/2);

        // Set up the bounds
        int small = lb+1;
        int large = ub;

        while (small <= large) {
            // Skip over small elements
            while ((small <= large) 
                     && (order.compare(vals[small], vals[lb]) <= 0)) {
                 ++small;
            } // while
            // Observation: At this point, vals[small] is large

            // Skip over large elements
            while ((small <= large)
                     && (order.compare(vals[large], vals[lb]) >= 0)) {
                 --large;
            } // while
            // Observation: At this point, vals[large] is small

            // We've reached large/small elements, swap 'em
            swap(vals, small++, large--);
        } // while

        // The element at position small is the last small element,
        // so we can swap it with the pivot.
        swap(vals, lb, small);
        return small;
   } // partition(T[], int, int, Comparator<T>)

Unfortunately, there are flaws in their analysis and implementation. Lupe, their class mentor, suggests that you help them by answering the following questions.

i. Are small and large initialized correctly to ensure that the invariant holds at the start of the outer loop?

ii. Is the test in the outer loop consistent with this loop invariant?

iii. If the invariant holds at the start of the first inner loop, how, if at all, is it violated at the end of the first inner loop?

iv. Given that expectation, how, if at all, might the invariant be violated at the end of the second inner loop?

v. Given those expectations, does the final portion of the outer loop restore the loop invariant?

vi. Within the outer and inner loops, are the array elements accessed in a way consistent with the loop invariant?

vii. If we assume that the loop invariant holds at the end of the main loop, is vals[lb] being swapped with the correct element?

b. After you answer those questions, Lupe suggests that we “think more like SamR”. In particular, Lupe suggests that we treat small and large more like borders.

+-+--------+---------+--------+
|p| <= piv | unknown | >= piv |
+-+--------+---------+--------+
 ^          ^       ^        ^
 |          |       |        |
 lb         small   large    ub

The pivot is in vals[lb].
For all i, lb+1 <= i < small, vals[i] is less than or equal to the pivot.
For all i, large < i <= ub; vals[i] is greater than or equal to the pivot.

Answer the same questions i to vii for this new invariant.

c. Find as simple a correction as possible to fix the given code.

d. Ina decides we all need more practice and notes that we have not bothered to write an invariant for our iterative Heap.swapUp method. Write that invariant.

e. Varun thinks we need even more practice and notes that we have not bothered to write an invariant for our iterative swapDown method. Lupe reminds Varun that we've been using a recursive Heap.swapDown method. Write or find an iterative Heap.swapDown method.

f. Write an invariant for the Heap.swapDown method.

Problem 2: Adding Information to Data Structures

Topics: Dynamic Data Structures, Trees, Lists, Nodes, Recursive Processing

Your classmates, Tate and Anna, note that we often want to calculate information about data structures. They suggest that we extend the nodes in our favorite linked data structures, the list and the tree, with an additional field, info, that can hold an arbitrary piece of information. You can find their updated code in the exam repository.

a. Anna notes that it might be useful to identify the distance of each node from the end of the list. For example, in the list { a, b, c }, the node that contains a is two from the end of the list, the node that contains b is one from the end of the list, and the node that contains c is 0 from the end of the list, since it's at the end of the list.

To the linked list class, add a method, computeDistances(), that fills in the info field for each node with an Integer that represents the distance of that node from the end of the list.

You may only iterate the list once in doing this computation. In particular, you may only set the info field of each node once and you may only access the info field of each node once.

In addition, you may not declare any variables that hold arrays or dynamic data structures and you may not create any new nodes.

b. Tate, in response, suggests that we do something similar for trees. Instead of filling in the distance to the end, he wants you to fill in the height of the node, the distance to the farthest leaf.

To the BST class, add a method, computeHeight(), that fills in the info field for each node with an Integer that represents the distance of that node to the farthest leaf that falls below that node.

Once again, you may visit each node at most once in doing this computation. In addition, you may not declare any variables that hold arrays or dynamic data structures and you may not create any new nodes.

c. Anna, not to be outdone, suggests something a bit more interesting. Rather than adding a height, we should create a string that gives the keys in the subtree, in alphabetical order, separated by commas.

To the BST class add a method, computeKeySequences(), that fills in the info field with a String that represents all of the keys in the subtree, in order, separated by commas.

Once again, you should visit each node at most once. And, once again, you may not declare any variables that hold arrays or dynamic data structures and you may not create any new nodes.

Problem 3: Running Algorithms by Hand

Topics: Heaps, Hash Tables, Hash Codes

I was talking about this course to my colleagues, Theodore and Tessa, at a recent conference. While they approved of our regular use of unit tests, they also noted that they prefer to see more “white box” testing, in which we look inside the algorithm to make sure that things are okay along the way. “After all,” they suggest, “a student could write something that gives the same output as a hash table, but does not organize the data correctly”. They suggested that I therefore give you some practice running algorithms “by hand”.

a. Suppose we are working with strings, that our non-expandable hash table has a capacity of 20 elements, and we have chosen the primitive hash function of “add the value of the first and last letter of the string, using the following chart”.

a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
1  3  5  7  9  11 13 15 17 19 21 23 25 26 24 22 20 18 16 14 12 10 8  6  4  2

Show the state of the hash table after inserting the following ten strings in that order: "raynard", "russell", "charlie", "pamela", "george", "richard", "glenn", "howard", "samuel", and "john". Use each of the following table construction policies (so show four different tables). If a policy makes it impossible to insert a particular string, note that impossibility.

i. Bucketed (more than one value can be in the same location).

ii. No buckets, linear probing, offset of subsequent multiples of 7 for linear probe.

iii. No buckets, quadratic probing (offsets of 1, 4, 9, 16, 25, 36, 49, ... from the original location).

iv. No buckets, linear probing, using the value of the second character to determine the offset. (E.g., for "charlie", the offset would be multiples of 15 and for "pamela" the offset would be multiples of 1.)

b. You may recall that the Heapsort algorithm works by turning an array into a heap, and then repeatedly swapping the element at the top of the heap to the end of the heap and restoring the heap.

i. Show the state of the following array after we've run the heapify step, but before we've started swapping elements. Assume that smallest values have highest priority (which means we will probably be sorting the array from largest to smallest).

5, 1, 8, 2, 3, 7, 9, 4, 4, 6

ii. Suppose we start with the heap given in the following array.

1, 2, 3, 2, 4, 8, 4, 6, 9, 5

Show the state of the array after we've swapped the three smallest values to the end, and restored the heap after each swap.

Problem 4: Parsing JSON

Topics: JSON, ArrayLists

Ray and Ari have almost finished the JSON homework, but are stumped on arrays. Finish their implementation.

Problem 5: Printing JSON Trees

Topics: Trees, Output, Recursion

Prince Theresa has appreciated all of the lovely tree output we've generated with the approach of “print the node, then print all the children indented slightly more”. PT would like to see us do the same with JSON. PT suggests that each field of an object should be indented, and the value should be on a separate line, indented again.

Extend Ray and Ari's code to print an appropriate tree. Examples follow. You may find that you present the fields in a different order.

Input	Output
{"Name":"Sam"}	Object Name: Sam
{"Name":"Sam","Job":{"Employer":"Grinnell","Title": "Professor of Comp Sci","StartYear":1997}}	Object Name: Sam Job: Object Employer: Grinnell Title: Professor of Comp Sci StartYear: 1997
{"Values":[5,1,2,9,3]}	Object Values: [ 5 1 2 9 3 ]
{"Students":[{"First":"John","Last":"Doe"}, {"First":"Jane", "Last":"Doe","Grades":[]}]}	Object Students: [ Object First: John Last: Doe Object First: Jane Last: Doe Grades: [ ] ]

Citations

Many problems on this examination are based closely on problems Henry Walker wrote for past semesters of CSC 207. However, they are rephrased in my own style.

Questions and Answers

Problem 1

For 1C, are we trying to fix the code given in 1A to match the invariants and such in 1A or 1B?

You are trying to fix the code so that it correctly partitions the array. You should determine which invariant helps you guarantee that result.

For 1E, what are the parameters that the swapDown procedure should take?

You have a sample swapDown procedure, don't you? Traditionally, it takes the array, the size of the array, and the comparator. In the recursive formulation, it was useful to take the position to swap down from as a parameter. In the iterative version, you can use position 0, or you can include it as a parameter.

Could you explain again exactly what swapUp and swapDown do, so we have that information to reference in the actual exam?

swapUp restores a heap by taking an element that is potentially too low in the heap and moving it up in the heap.

swapDown restores a heap by taking an element that is potentially too high in the heap and moving it down in the heap.

For 1F, should we write the invariant for the iterative or recursive Heap.swapDown method?

Iterative. Remember, it's a loop invariant.

Suppose I've found one issue in the partition method. Do I really need to analyze the other issues?

Certainly. Fixing that one issue may not fix the whole problem, and a good analyst tries to identify as many problems as possible. Why? In part because understanding all of the problems will help you identify which problem is the most important, and will likely give you some ideas on how to fix that problem.

Please remimd me what you want for an invariant.

An invariant is a statement (well, a set of statements) about the state of the program such that if the invariant holds at the start of one iteration of the loop, it holds at the end of the iteration. Knowing that the iteration holds and that the loop has terminated should let you conclude that the postcondition of your procedure has been met.

Problem 2

For 2A, are we supposed to create a separate test/experiment class that actually calls the method we wrote, computeDistances?

That would probably be a good idea. You can also just extend the current experiment.

Problem 3

Can you remind me of the loop invariant for heap sort?

There are two primary invariants. First, the array is divided into a heap portion and a sorted portion.

  +------+--------+
  | Heap | Sorted |
  +------+--------+

Second, everything in the heap section is no higher priority than everything in the sorted section. If we're sorting smallest to largest, all the things in the heap are less than or equal to the things in the sorted section. If, as in the problem, we're sorting from largest to largest, all the things in the heap are greater than or equal to the things in the sorted section.

I understand that the algorithm sorts largest to smallest. For b.ii, Can you show the state of the heap after, say, 8 swaps and reheaps?

+---+---+---+---+---+---+---+---+---+---+
| 8 | 9 | 6 | 5 | 4 | 4 | 3 | 2 | 2 | 1 | 
+---+---+---+---+---+---+---+---+---+---+
  Heap  |        Sorted

Problem 4

I worry about the call to info.next() in the switch statement in parseArray.

I would agree with you that info.next() may not be the best idea. However, the code in parseArray is just a stub to get it to go through the array in a reasonable way during testing of other methods. I didn't expect you to keep any of that code.

I approached the homework much differently. Do you have any suggestions?

1. Throw away the stub. It's intended as a stub, and not as anything resembling a solution.

2. Take a look at the code for parsing an object, which should be similar to what you need to do for an array.

Problem 5

The object fields in my tree appear in a different order than they appear in the sample output. Is that okay?

Certainly. Since we're storing the fields in a hash table, we'll get them out of the hash table in whatever order the table uses. (We could also get them out in alphabetical order by getting all the keys, putting them in an array, and sorting that array, but that seems a bit beyond the scope of this problem.)

How should the indent of subtrees relate to the indent of their parent?

Each additional indentation should be an additional two spaces.

What about those weirdo dashes that you put in the experiment?

That was just a convenient way to set off the tree.

Errata

Here you will find errors of spelling, grammar, and design that students have noted. Remember, each error found corresponds to a point of extra credit for everyone. I usually limit such extra credit to five points. However, if I make an astoundingly large number of errors, then I will provide more extra credit.

In general, I don't give credit for errors in the Q&A section, the Errata section, or the experiments.

“Annotate” is misspelled in LLExpt.java [JB, 0 points] (fixed, but in an experiment).
Missing word in documentation in LinkedList.java. [JB, 1 point]
The exam is unclear about whether we can build additional structures for 2C. (Clarified.) [JB, 1 point]
Typo in comment in sanity check in parseObject in parseObject. [ML, 1 point]
Problem 3a has four different tables, not five. [EW, 1 point]
In 1ai.vii, a[lower] appears instead of vals[lower]. [AT, JB, 1 point]
In 1, the topics include “the Partitioning” rather than “Partitioning” [JB, 0 points]
In 1, the vals parameter to swap is missing. [JB, AK, 0 points]