Fund. CS II (CS152 2006S)

Homework 14: Counting Words

Assigned: Monday, April 17, 2006
Due: 8:00 a.m., Wednesday, April 19, 2006

Summary: In this assignment, you will use binary search trees to implement a simple text analysis system.

Purposes: To give you more experience using binary search trees, in terms of implementation and use.

Contents:

Background: Text Analysis

Literary theorists employ a number of devices to study texts and to consider authorship. One of the simpler, but perhaps more interesting ones, is a simple word frequency analysis. A word fequency analysis simply counts the number of time each word appears in the document. Surprisingly, different authors have very different patterns of word usage, even for the number (or percent) of times they use common words.

Preparation

Code Files:

Create a new package for this laboratory entitled username.analysis.

Copy those code files into the package and update the package for each.

The Assignment

In this assignment, you will use a dictionary, implemented by a binary search tree, to analyze word frequency in a text. The words will serve as the keys in the dictionary and you will use Counter objects to count frequences. You will read the words from a file that contains one word per line.

a. Write a main class, Analyst, that prompts the user for the name of a file that contains one word per line and then populates the dictionary as follows:

create a general counter
for each word
  increment
  if (the word is not yet in the dictionary)
    put the word with a counter set to 1
  otherwise
    get the counter associated with the word and increment it by 1

You can find sample files in http://www.cs.grinnell.edu/~rebelsky/Courses/CS152/2006S/Examples/text/. Do your initial testing with the ones that include SHORT in the file name as those are only 1000 lines long.

b. Print out the frequency of five words of your choice.

c. Update BST with a report method that prints every key/value pair in order, from smallest key to largest key. As is the case with the other functions in BST, You will need a local helper function that takes a node as a parameter. That helper function will behave as follows:

if the node is null
  do nothing
otherwise
  recurse on the left subtree
  print the (key,value) pair of the current node
  recurse oon the right subtree

Print a report on one or two of the files.

Submitting Your Work

Email me your answer. Make sure that the answer is in the body of the email and is not an attachment.

 

History

Early April 2006 [Samuel A. Rebelsky]

Sunday, 16 April 2006 [Samuel A. Rebelsky]

 

Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Tue May 9 08:31:13 2006.
The source to the document was last modified on Mon Apr 17 19:56:42 2006.
This document may be found at http://www.cs.grinnell.edu/~rebelsky/Courses/CS152/2006S/Homework/hw.14.html.

You may wish to validate this document's HTML ; Valid CSS! ; Check with Bobby

Samuel A. Rebelsky, rebelsky@grinnell.edu