Fund. CS II (CS152 2006S)

Exam 3: Advanced Data Structures and Algorithms

Distributed: Friday, April 28, 2006
Due: 11:00 a.m., Monday, May 8, 2006
Extensions in extreme circumstances only.

This page may be found online at http://www.cs.grinnell.edu/~rebelsky/Courses/CS152/2006S/Exams/exam.03.html.


Preliminaries

The instructions on this exam are slightly different from the instructions on the previous exams. Those who correctly summarize the differences on the cover page of their exam will earn two extra points on this exam.

There are four problems on the exam. Some problems have subproblems. Those who correctly or mostly-correctly answer four problems will earn an A. Those who correctly or mostly-correctly answer three problems will earn a B. Those who correctly or mostly-correctly answer two problems will earn a C. Those who correctly or mostly-correctly answer one problem will earn a D. Those who fail to answer any problems will earn an F.

Experience shows that different people find different problems complex. Hence, if you get stuck on an early problem, try moving on to another problem and let your subconscious work on the early problem. (You'll also get a better sense of accomplishment if you can find at least one problem that you can solve early.)

This examination is open book, open notes, open mind, open computer, open Web. However, it is closed person. That means you may not talk to other people about the exam. Other than as restricted by that limitation, you should feel free to use all reasonable resources available to you. As always, you are expected to turn in your own work. If you find ideas in a book or on the Web, be sure to cite them appropriately.

Although you may use the Web for this exam, you may not post your answers to this examination on the Web (at least not until after I return exams to you). And, in case it's not clear, you may not ask others (in person, via email, via IM, by posting a "please help" message, or in any other way) to put answers on the Web.

This is a take-home examination. You may use any time or times you deem appropriate to complete the exam, provided you return it to me by the due date. Experience from the first two exams suggests that you should begin this exam early, or at least look at the problems early.

I expect that someone who has mastered the material and works at a moderate rate should have little trouble completing the exam in a reasonable amount of time. In particular, this exam is likely to take you about four to six hours, depending on how well you've learned topics and how fast you work. You should not work more than eight hours on this exam. Please stop at eight hours. I would recommend that after you have spent about three hours on the examination, you pick the two problems you are most likely to be able to solve, and come to speak to me about them. I am fairly confident that, with my help, you will be able to solve at least those two problems in the remaining five hours.

I would also appreciate it if you would write down the amount of time each problem takes. Each person who does so will earn two points of extra credit. Since I worry about the amount of time my exams take, I will give two points of extra credit to the first two people who honestly report that they've spent at least five hours on the exam or completed the exam. (At that point, I may then change the exam.)

You must include both of the following statements on the cover sheet of the examination. Please sign and date each statement. Note that the statements must be true; if you are unable to sign either statement, please talk to me at your earliest convenience. You need not reveal the particulars of the dishonesty, simply that it happened. Note that inappropriate assistance is primarily assistance from anyone other than Professor Rebelsky (that's me). Inappropriate assistance also includes assistance given to another member of the class.

1. I have neither received nor given inappropriate assistance on this examination.
2. I am not aware of any other students who have given or received inappropriate assistance on this examination.

Because different students may be taking the exam at different times, you are not permitted to discuss the exam with anyone until after I have returned it. If you must say something about the exam, you are allowed to say "This is among the hardest exams I have ever taken. If you don't start it early, you will have no chance of finishing the exam." You may also summarize these policies (but not the changes since the previous exam). You may not tell other students which problems you've finished. You may not tell other students how long you've spent on the exam.

You must both answer all of your questions electronically and turn in a printed version of your exam. That is, you must write all of your answers on the computer, print them out, number the pages, put your name on every page, and hand me the printed copy. You must also email me a copy of your exam by copying the various parts of your exam and pasting them into an email message. Put your answers in the same order as the problems. Please write your name at the top of each sheet of the printed copy. Failing to do so will lead to a penalty of two points. Turning your exam in in reverse order will also lead to a penalty, albeit an unspecified one.

In many problems, I ask you to write code. Unless I specify otherwise in a problem, you should write working code and include examples that show that you've tested the code. Unless I specify otherwise, you should document your code (using javadoc-style comments for classes, fields, and methods and slash-slash comments for particular algorithm details and end braces).

Just as you should be careful and precise when you write code and documentation, so should you be careful and precise when you write prose. Please check your spelling and grammar.

I will give partial credit for partially correct answers. You ensure the best possible grade for yourself by emphasizing your answer and including a clear set of work that you used to derive the answer.

I may not be available at the time you take the exam. If you feel that a question is badly worded or impossible to answer, note the problem you have observed and attempt to reword the question in such a way that it is answerable. If it's a reasonable hour (before 10 p.m. and after 8 a.m.), feel free to try to call me in the office (269-4410) or at home (236-7445). I also respond well to email questions.

I will also reserve time at the start of classes next week to discuss any general questions you have on the exam.

Preparation

For this exam, you will use a project named Exam3, which contains a host of packages.

a. In a terminal window, type

/home/rebelsky/bin/exam3

You should see messages about files being copied.

b. Start Eclipse.

c. In Eclipse, build a project named Exam3 from /home/username/CSC152/Exam3.

d. You are now ready to begin the examination.

Problems

Problem 1: Removing Values from Binary Search Trees

Topics: Binary search trees; Recursion.

Ren Remove has reprimanded me for emphasizing the process of insertion into binary search trees, rather than discussing removal. However, in class, we devised a strategy for removing the node with a given key from a binary search tree.

Implement that strategy in BST.java.
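The strategy from class is not recorded on this page, so what follows is only a sketch of one common approach, assuming a hypothetical node layout (key, value, left, right) and a Comparator for keys; the fields and method names in BST.java may differ. A node with no children is removed outright, a node with one child is replaced by that child, and a node with two children takes the key and value of its in-order successor (the smallest key in its right subtree), which is then removed from the right subtree.

```java
import java.util.Comparator;

/** A minimal BST sketch with key-based removal (hypothetical layout, not BST.java). */
class BSTRemoveSketch<K, V> {
  /** A node in the tree. */
  static class Node<K, V> {
    K key;
    V value;
    Node<K, V> left, right;
    Node(K key, V value) { this.key = key; this.value = value; }
  }

  Node<K, V> root;
  final Comparator<K> order;

  BSTRemoveSketch(Comparator<K> order) { this.order = order; }

  /** Insert key/value, replacing the value if the key is already present. */
  void put(K key, V value) { root = put(root, key, value); }

  private Node<K, V> put(Node<K, V> node, K key, V value) {
    if (node == null) return new Node<>(key, value);
    int cmp = order.compare(key, node.key);
    if (cmp < 0) node.left = put(node.left, key, value);
    else if (cmp > 0) node.right = put(node.right, key, value);
    else node.value = value;
    return node;
  }

  /** Remove the node with the given key, if there is one. */
  void remove(K key) { root = remove(root, key); }

  // Returns the root of this subtree after removing key from it.
  private Node<K, V> remove(Node<K, V> node, K key) {
    if (node == null) return null;               // key not found
    int cmp = order.compare(key, node.key);
    if (cmp < 0) {
      node.left = remove(node.left, key);
    } else if (cmp > 0) {
      node.right = remove(node.right, key);
    } else if (node.left == null) {
      return node.right;                         // zero or one child: splice out
    } else if (node.right == null) {
      return node.left;
    } else {
      // Two children: copy the successor's contents into this node,
      // then remove the successor from the right subtree.
      Node<K, V> succ = node.right;
      while (succ.left != null) succ = succ.left;
      node.key = succ.key;
      node.value = succ.value;
      node.right = remove(node.right, succ.key);
    }
    return node;
  } // remove(Node, K)

  /** Look up the value for key, or null if it is absent. */
  V get(K key) {
    Node<K, V> node = root;
    while (node != null) {
      int cmp = order.compare(key, node.key);
      if (cmp == 0) return node.value;
      node = (cmp < 0) ? node.left : node.right;
    }
    return null;
  }

  /** List the keys in order, separated by spaces (for testing). */
  String inorder() {
    StringBuilder sb = new StringBuilder();
    inorder(root, sb);
    return sb.toString().trim();
  }

  private void inorder(Node<K, V> node, StringBuilder sb) {
    if (node == null) return;
    inorder(node.left, sb);
    sb.append(node.key).append(' ');
    inorder(node.right, sb);
  }
}
```

Returning the new subtree root from the recursive helper lets each parent relink its child pointer without tracking the parent separately.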

Problem 2: Text Analysis, Revisited

Topics: Dictionaries; Sorting.

Anna and Andy Analyst have argued about the homework I gave you regarding text analysis. They note that, although they appreciate the use of text analysis in that assignment, they are concerned that I asked you to use a binary search tree to store the word/counter pairs. They note that for larger documents, it probably makes sense to use a hash table, rather than a binary search tree, since the difference between expected constant time and logarithmic time per lookup becomes significant. They've written something that fills in the hash table, but they have not yet finished the part that gets the twenty most frequent words.

a. Finish writing the utility class Analyst, which reads from a BufferedReader and returns an array of WordFrequency pairs for the twenty most common words. You can test Analyst with AnalyzeFile.

I would recommend that you use a technique like insertion sort or selection sort to create the sorted array.
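As one illustration of the insertion-sort suggestion, here is a sketch that counts words in a java.util.HashMap and then keeps the k most frequent in a small array, sliding each new pair left past smaller counts as insertion sort would. The Pair class and the word-splitting rule (lowercase runs of the letters a to z) are assumptions for illustration, not the exam's WordFrequency or Analyst code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** A sketch: count words in a hash table, then keep the k most frequent. */
class TopWords {
  /** A hypothetical word/count pair; the exam's WordFrequency class may differ. */
  static class Pair {
    final String word;
    final int count;
    Pair(String word, int count) { this.word = word; this.count = count; }
    public String toString() { return word + "=" + count; }
  }

  /** Count each lowercase alphabetic word read from the reader. */
  static Map<String, Integer> count(BufferedReader in) throws IOException {
    Map<String, Integer> counts = new HashMap<>();
    String line;
    while ((line = in.readLine()) != null) {
      for (String w : line.toLowerCase().split("[^a-z]+")) {
        if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
      }
    }
    return counts;
  }

  /** Return up to k pairs, most frequent first, via insertion into a sorted array. */
  static Pair[] top(Map<String, Integer> counts, int k) {
    Pair[] best = new Pair[Math.min(k, counts.size())];
    int size = 0;
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      int c = e.getValue();
      // A full array only admits counts larger than its current minimum.
      if (size == best.length && (size == 0 || c <= best[size - 1].count)) continue;
      // Use the next free slot, or overwrite the smallest entry if full.
      int pos = (size < best.length) ? size++ : size - 1;
      // Shift smaller counts right, as in insertion sort.
      while (pos > 0 && best[pos - 1].count < c) {
        best[pos] = best[pos - 1];
        pos--;
      }
      best[pos] = new Pair(e.getKey(), c);
    }
    return best;
  }
}
```

Because only k slots are kept, each word costs at most k shifts, so the selection step takes O(nk) time for n distinct words; with k = 20 that is effectively linear. Ties between equal counts are broken by the hash map's iteration order.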

b. You can find twenty sample files at /home/rebelsky/Web/Courses/CS152/2006S/Examples/Exam3/Texts/##.txt (where ## is a number between 00 and 19). Determine the frequencies of the most common words in each and make some observations (texts likely to be by the same author, other interesting patterns you noted, etc.).

Problem 3: Finding the Median

Topics: Divide-and-conquer algorithms; Searching and sorting; Quicksort.

Minnie and Mickie Middle also recall our discussion of binary search trees. They particularly remember that we can build better binary search trees if we make the median value the root. In class, we noted that one way to find the median value is to sort the set of values we want to put in the tree. However, that strategy is not very efficient: sorting takes O(n log n) time, which is more than we should need to find a single value.

Can we do better? Yes, we can use a divide-and-conquer strategy. How do we decide how to divide? We use a key idea from Quicksort: When you want to divide and conquer, but don't know how to divide equally, pick some element (the pivot) and use it to divide the collection into smaller and larger elements. As in all divide-and-conquer algorithms, we will then recurse.

Of course, it's not quite that simple. Once we've guessed a pivot and partitioned the collection, how do we recurse? It turns out that the best way to answer that question is to solve a variant of the median problem: Instead of finding the median, find the ith smallest value.
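One way to write the recursion down, assuming distinct values and the zero-based convention used in the header below (the ith-smallest value has exactly i smaller values): pick a pivot p from the collection S, let L and G be the values smaller and larger than p, and let k = |L|.

```latex
\mathrm{ith}(S, i) =
\begin{cases}
  \mathrm{ith}(L, i)           & \text{if } i < k,\\
  p                            & \text{if } i = k,\\
  \mathrm{ith}(G,\, i - k - 1) & \text{if } i > k,
\end{cases}
\qquad
L = \{x \in S : x < p\},\quad
G = \{x \in S : x > p\},\quad
k = |L|.
```

Only one side is ever searched, which is what distinguishes this from Quicksort, where both sides are recursed on.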

Here is a header for such a method:

    /**
     * Find the ith-smallest value in a vector.  The ith-smallest
     * value is one for which there are i smaller values.
     *
     * @param vec
     *   The vector
     * @param i
     *   The "position" of the element to find.
     * @param c
     *   A comparator used to determine ordering.
     * @return ith
     *   The ith smallest value.
     * @pre
     *   The vector contains at least one value.
     *   0 <= i < vec.size()
     *   No two values in the vector are equal.
     * @post
     *   There are exactly i indices j for which
     *     c.compare(vec.get(j), ith) < 0
     */
    public static <T> T ithSmallest(Vector<T> vec, int i, Comparator<T> c)

How do we implement the method? We return to the Quicksort-style strategy described above: divide and conquer using a randomly selected pivot.

a. Use this strategy to implement ithSmallest. You can find the header for ithSmallest in Median.java.
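A minimal sketch of that strategy, copying the smaller and larger values into fresh Vectors rather than partitioning in place (a simplification; an in-place partition would avoid the extra allocation). The class name QuickSelect is hypothetical; the method follows the header given above, though Median.java may organize things differently.

```java
import java.util.Comparator;
import java.util.Random;
import java.util.Vector;

/** A sketch of randomized selection over a Vector (hypothetical class name). */
class QuickSelect {
  private static final Random RAND = new Random();

  /**
   * Find the value with exactly i smaller values, assuming
   * 0 <= i < vec.size() and distinct values. vec is not modified.
   */
  static <T> T ithSmallest(Vector<T> vec, int i, Comparator<T> c) {
    T pivot = vec.get(RAND.nextInt(vec.size()));
    Vector<T> smaller = new Vector<T>();
    Vector<T> larger = new Vector<T>();
    // Partition around the pivot; with distinct values only the pivot compares equal.
    for (T x : vec) {
      int cmp = c.compare(x, pivot);
      if (cmp < 0) smaller.add(x);
      else if (cmp > 0) larger.add(x);
    }
    int k = smaller.size();
    if (i < k) return ithSmallest(smaller, i, c);   // answer lies below the pivot
    if (i == k) return pivot;                       // pivot has exactly i smaller values
    return ithSmallest(larger, i - k - 1, c);       // skip the smaller values and the pivot
  }
}
```

Each call makes one pass of comparisons over its input and recurses on only one side, so the expected total work is linear when pivots are random, though an unlucky run of pivots can take quadratic time.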

b. Use your implementation of ithSmallest to implement a median method with the following signature

public static <T> T median(Vector<T> vec, Comparator<T> c)

You may assume that vec contains no duplicates. Suppose there are n elements in vec. When n is odd, the median is the value for which there are (n-1)/2 smaller elements and (n-1)/2 larger elements. When n is even, the median is the value for which there are n/2 smaller elements and n/2-1 larger elements.

You may find TestMedian.java helpful in testing your code.
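Under the definition above, the position to pass to ithSmallest is (n-1)/2 for odd n and n/2 for even n, and both equal n/2 under Java's integer division, so median could plausibly just return ithSmallest(vec, vec.size() / 2, c). The hypothetical helper below checks that arithmetic by brute force on small lists of distinct Integers; it is a sanity check, not a replacement for TestMedian.java.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** A sketch checking which ithSmallest position the median definition asks for. */
class MedianIndex {
  /** (n-1)/2 for odd n and n/2 for even n; both equal n/2 in integer division. */
  static int medianPosition(int n) {
    return n / 2;
  }

  /** Brute force: does the value at medianPosition satisfy the stated definition? */
  static boolean satisfiesDefinition(List<Integer> values) {
    List<Integer> sorted = new ArrayList<>(values);
    Collections.sort(sorted);
    int n = sorted.size();
    int candidate = sorted.get(medianPosition(n));
    int smaller = 0, larger = 0;
    for (int x : values) {
      if (x < candidate) smaller++;
      else if (x > candidate) larger++;
    }
    if (n % 2 == 1) return smaller == (n - 1) / 2 && larger == (n - 1) / 2;
    return smaller == n / 2 && larger == n / 2 - 1;
  }
}
```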

c. Carefully document the median method, including preconditions and postconditions.

d. A divide-and-conquer algorithm that does linear work on its data and then discards half of the data at each step should be O(n), since n + n/2 + n/4 + ... < 2n. However, since there's no guarantee that the pivot splits the data in half, this algorithm may not take O(n) time. Gather data on the number of comparisons this algorithm takes and see whether it supports the assertion that the algorithm is O(n) in most cases.
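One way to gather the comparison counts, sketched under the assumption that your ithSmallest does all of its ordering through the Comparator it is given: wrap the real comparator in a counter. The CountingComparator class is hypothetical, not part of the exam files.

```java
import java.util.Comparator;

/** A hypothetical wrapper that counts how many comparisons an algorithm makes. */
class CountingComparator<T> implements Comparator<T> {
  private final Comparator<T> base;
  private long count = 0;

  CountingComparator(Comparator<T> base) { this.base = base; }

  @Override
  public int compare(T a, T b) {
    count++;                      // record every call, then delegate
    return base.compare(a, b);
  }

  /** Number of comparisons since construction or the last reset. */
  long getCount() { return count; }

  void reset() { count = 0; }
}
```

For each of several sizes n, fill a Vector with n distinct values, call reset(), run ithSmallest or median with the wrapper, and record getCount() / n. If that ratio stays roughly flat as n grows (averaged over several runs, since the pivots are random), the data is consistent with linear behavior.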

Problem 4: Genetic Matching

Topics: Dynamic programming; String matching; Polymorphism.

Gene and Gena Geneticist note that they like the dynamic-programming string-matching algorithm, but that it has a few significant problems:

They propose that you rewrite the ec method to take a cost metric function as a parameter, rather than a simple insertion cost and deletion cost. They have even written the CostMetric interface and two implementations, SimpleMetric, a simple cost metric, and SampleMetric, a more interesting cost metric.

Rewrite Editor.ec to take a CostMetric as a parameter.
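A sketch of the parameterized dynamic program, assuming (as the problem statement suggests) that ec currently charges only for insertions and deletions and keeps matching characters for free. ExampleMetric, UnitMetric, and EditSketch are hypothetical stand-ins; the provided CostMetric interface may declare different methods, so adapt the calls accordingly.

```java
/** Hypothetical cost interface; the exam's CostMetric may differ. */
interface ExampleMetric {
  int insertCost(char ch);
  int deleteCost(char ch);
}

/** Unit cost for every insertion and deletion. */
class UnitMetric implements ExampleMetric {
  public int insertCost(char ch) { return 1; }
  public int deleteCost(char ch) { return 1; }
}

class EditSketch {
  /**
   * Minimum cost to turn source into target using insertions and
   * deletions, keeping matching characters for free.
   */
  static int ec(String source, String target, ExampleMetric m) {
    int s = source.length(), t = target.length();
    // cost[i][j]: cheapest way to turn source's first i chars into target's first j.
    int[][] cost = new int[s + 1][t + 1];
    for (int i = 1; i <= s; i++) cost[i][0] = cost[i - 1][0] + m.deleteCost(source.charAt(i - 1));
    for (int j = 1; j <= t; j++) cost[0][j] = cost[0][j - 1] + m.insertCost(target.charAt(j - 1));
    for (int i = 1; i <= s; i++) {
      for (int j = 1; j <= t; j++) {
        char a = source.charAt(i - 1), b = target.charAt(j - 1);
        int best = Math.min(cost[i - 1][j] + m.deleteCost(a),   // delete a
                            cost[i][j - 1] + m.insertCost(b));  // insert b
        if (a == b) best = Math.min(best, cost[i - 1][j - 1]);  // keep the match
        cost[i][j] = best;
      }
    }
    return cost[s][t];
  }
}
```

Because the metric is an interface, passing a different implementation changes the costs without touching the table-filling loop; that is the polymorphism the problem is after.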

The remainder of this problem is optional.

Gene and Gena also note that insertion or removal of triplets is generally much cheaper than insertion or removal of singletons. For five points of extra credit, update Editor.ec and the remaining files to accommodate this change.

Some Questions and Answers

These are some of the questions students have asked about the exam and my answers to those questions.

General Questions

Errors

Here you will find errors of spelling, grammar, and design that students have noted. These errors carry no credit, but remind all of us to be more careful.

 

History

Thursday, 20 April 2006 [Samuel A. Rebelsky]

Wednesday, 26 April 2006 [Samuel A. Rebelsky]

Thursday, 27 April 2006 [Samuel A. Rebelsky]

Friday, 28 April 2006 [Samuel A. Rebelsky]

Monday, 1 May 2006 [Samuel A. Rebelsky]

Tuesday, 2 May 2006 [Samuel A. Rebelsky]

Thursday, 4 May 2006 [Samuel A. Rebelsky]

 

Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Tue May 9 08:30:57 2006.
The source to the document was last modified on Thu May 4 09:54:06 2006.


Samuel A. Rebelsky, rebelsky@grinnell.edu