[Skip to Body]
Primary:
[Front Door]
[Current]
[Glance]
-
[Honesty]
[On Teaching and Learning]
Groupings:
[EBoards]
[Examples]
[Exams]
[Handouts]
[Homework]
[Labs]
[Outlines]
[Readings]
[Reference]
Misc:
[SamR]
[Java 1.5 API]
[Espresso]
[TAO of Java]
[CS152 2004F]
[CS152 2005S]
[CS152 2005F]
Assigned: Monday, April 17, 2006
Due: 8:00 a.m., Wednesday, April 19, 2006
Summary: In this assignment, you will use binary search trees to implement a simple text analysis system.
Purposes: To give you more experience using binary search trees, in terms of implementation and use.
Literary theorists employ a number of devices to study texts and to consider authorship. One of the simpler, but perhaps more interesting, devices is a word frequency analysis, which counts the number of times each word appears in a document. Surprisingly, different authors have very different patterns of word usage, even in the number (or percentage) of times they use common words.
Code Files:
BST.java (binary search trees)
BSTNode.java (the nodes for BST)
CompareStringsAlphabetically.java (a helpful comparator)
Counter.java (counters for tallying frequencies)
TestBST.java (a simple tester)
Create a new package for this laboratory entitled username.analysis.
Copy those code files into the package and update the package for each.
In this assignment, you will use a dictionary, implemented by a
binary search tree, to analyze word frequency in a text. The
words will serve as the keys in the dictionary and you will use
Counter objects to count frequencies. You will
read the words from a file that contains one word per line.
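As a rough illustration of the input step, here is a minimal sketch of reading a one-word-per-line file into a list. The class name WordFileSketch and the method readWords are hypothetical names chosen for this example; they are not part of the supplied code files. Trimming whitespace and skipping blank lines are assumptions that the assignment does not require.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not one of the supplied code files.
class WordFileSketch {
    // Read every line of the named file and return the words in file order.
    // Assumption: surrounding whitespace is trimmed and blank lines are skipped.
    static List<String> readWords(String fileName) throws IOException {
        List<String> words = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(new FileReader(fileName));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (word.length() > 0) {
                    words.add(word);
                }
            }
        } finally {
            // Close the file even if reading fails partway through.
            reader.close();
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Prompt for the file name on standard input.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        System.out.print("File of words: ");
        String fileName = in.readLine();
        List<String> words = readWords(fileName);
        System.out.println(words.size() + " words read");
    }
}
```

Returning a list keeps the file handling separate from the counting, so either piece can be tested on its own.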
a. Write a main class, Analyst, that prompts the user for
the name of a file that contains one word per line and then populates
the dictionary as follows:
  create a general counter
  for each word
    increment the general counter
    if (the word is not yet in the dictionary)
      put the word with a counter set to 1
    otherwise
      get the counter associated with the word and increment it by 1
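The pseudocode above might be sketched in Java roughly as follows. Because the interfaces of the supplied BST and Counter classes are not shown here, this sketch substitutes java.util.TreeMap for the dictionary and a hypothetical WordCounter class for Counter. In your own solution you would use the supplied classes and whatever methods they actually provide.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the supplied Counter class: a mutable tally.
class WordCounter {
    private int count;

    WordCounter(int initial) {
        this.count = initial;
    }

    void increment() {
        this.count++;
    }

    int get() {
        return this.count;
    }
}

// Hypothetical class name; the assignment asks for a main class named Analyst.
class FrequencySketch {
    // Populate the dictionary from the words and return the total number
    // of words seen (the "general counter" in the pseudocode).
    static int populate(Iterable<String> words, Map<String, WordCounter> dictionary) {
        int total = 0;
        for (String word : words) {
            total++;
            WordCounter counter = dictionary.get(word);
            if (counter == null) {
                // First occurrence: store a counter set to 1.
                dictionary.put(word, new WordCounter(1));
            } else {
                // Repeat occurrence: bump the existing counter.
                counter.increment();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "cat", "saw", "the", "dog");
        // TreeMap stands in for the BST-based dictionary (an assumption).
        Map<String, WordCounter> dictionary = new TreeMap<String, WordCounter>();
        int total = populate(words, dictionary);
        System.out.println("total words: " + total);                    // total words: 5
        System.out.println("the: " + dictionary.get("the").get());      // the: 2
    }
}
```

Storing a mutable counter as the value means a repeated word needs only a lookup and an increment, not a second insertion into the tree.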
You can find sample files in http://www.cs.grinnell.edu/~rebelsky/Courses/CS152/2006S/Examples/text/.
Do your initial testing with the ones that include SHORT in the file name, as those are only
1000 lines long.
b. Print out the frequency of five words of your choice.
c. Update BST with a report method
that prints every key/value pair in order, from smallest key
to largest key. As is the case with the other functions in
BST, you will need a local helper function that takes
a node as a parameter. That helper function will behave as follows:
  if the node is null
    do nothing
  otherwise
    recurse on the left subtree
    print the (key,value) pair of the current node
    recurse on the right subtree
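The helper described above is an in-order traversal. Here is a minimal, self-contained sketch of the idea. The SketchNode and ReportSketch classes and their field names are hypothetical, since the layout of the supplied BSTNode is not shown here. This version also collects the output lines in a list rather than printing inside the helper, which is a choice made only to keep the sketch easy to check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type; the supplied BSTNode may be organized differently.
class SketchNode {
    String key;
    int value;
    SketchNode left;
    SketchNode right;

    SketchNode(String key, int value) {
        this.key = key;
        this.value = value;
    }
}

class ReportSketch {
    // Entry point: start the traversal at the root of the tree.
    static List<String> report(SketchNode root) {
        List<String> lines = new ArrayList<String>();
        report(root, lines);
        return lines;
    }

    // Helper that takes a node as a parameter, following the pseudocode.
    private static void report(SketchNode node, List<String> lines) {
        if (node == null) {
            return; // do nothing
        }
        report(node.left, lines);                               // smaller keys first
        lines.add("(" + node.key + "," + node.value + ")");     // then this node
        report(node.right, lines);                              // then larger keys
    }

    public static void main(String[] args) {
        SketchNode root = new SketchNode("dog", 1);
        root.left = new SketchNode("cat", 1);
        root.right = new SketchNode("the", 2);
        // Prints (cat,1), (dog,1), (the,2), one per line.
        for (String line : report(root)) {
            System.out.println(line);
        }
    }
}
```

Visiting the left subtree before the current node and the right subtree after it is what yields the keys in sorted order, because every key in the left subtree is smaller than the node's key and every key in the right subtree is larger.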
Print a report on one or two of the files.
Email me your answer. Make sure that the answer is in the body of the email and is not an attachment.
Early April 2006 [Samuel A. Rebelsky]
Sunday, 16 April 2006 [Samuel A. Rebelsky]
Disclaimer:
I usually create these pages on the fly, which means that I rarely
proofread them and they may contain bad grammar and incorrect details.
It also means that I tend to update them regularly (see the history for
more details). Feel free to contact me with any suggestions for changes.
This document was generated by
Siteweaver on Tue May 9 08:31:13 2006.
The source to the document was last modified on Mon Apr 17 19:56:42 2006.
This document may be found at http://www.cs.grinnell.edu/~rebelsky/Courses/CS152/2006S/Homework/hw.14.html.