Mini-Project 4: Freestyle text visualization

Summary
In this project, you will employ a variety of tools you have learned to visualize some aspects of texts.
Collaboration
Each student should submit their own solution to this project. You may consult others while developing it, but you should cite or acknowledge anyone you consulted.

Over the past few weeks, you have learned and employed a variety of types, tools, and techniques. For example, you have written programs that analyze texts to determine their characteristics (often using regular expressions), and you have written programs that make images. You have also learned about recursion, testing, documentation, style, and more.

In this project, you will employ these techniques to develop something others might use to quickly understand some characteristics of a text.

The basics

Although the tools we have created to analyze texts produce results, those results take the form of numbers and words. However, many people understand information more easily when it is presented visually. (Some people find visual information harder to understand; for example, people with visual impairments may be better served by textual or numeric summaries. Ideally, we would support multiple kinds of users.)

Write a program that reads an arbitrary text file (e.g., one of the texts from Project Gutenberg) and produces a useful visualization of some characteristics of that text. You may choose what to visualize and how to visualize it.

Your goal is for the visualization to convey useful information about a text in a way that others can easily understand. And, as noted, it should work with any reasonable text file.

Once you’ve written the program, use it to comparatively “analyze” two texts of your choice.

Please spend between three and four hours on this project.

Required components

You must employ the following tools in your project in non-trivial ways.

  • Regular expressions. That is, you should write at least one non-trivial regular expression.
  • Text analysis. That is, you should read from a file and compute one or more attributes of the text (e.g., Dale-Chall score, average sentence length, or most common words).
  • Images. That is, you should generate images using the tools we learned at the start of the course.
  • Recursion. That is, you should find some natural way to recurse in this problem. It might be in building your visualization. It might be in processing the text. It might be somewhere else.
  • Testing. That is, you must test some of your procedures. In particular, you should provide tests that demonstrate that your regular expression behaves in the way you intend it to.
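As an illustration of the regular-expression and testing requirements above, here is a minimal sketch, assuming plain `#lang racket` with `rackunit` (the course library may offer its own testing tools). The procedure name `words` is hypothetical, and its notion of a "word" (a run of letters, optionally with internal apostrophes) is only one possible choice. The checks record exactly what the regular expression should and should not match.

```racket
#lang racket
(require rackunit)

;;; (words str) -> list-of-string?
;;;   str : string?
;;; Extract the words in str. Here a "word" is a run of ASCII letters,
;;; optionally containing internal apostrophes (e.g., "don't").
;;; (Hypothetical helper; your own definition of "word" may differ.)
(define (words str)
  ;; (?:...) is a non-capturing group, so regexp-match* returns
  ;; whole matches only.
  (regexp-match* #px"[A-Za-z]+(?:'[A-Za-z]+)*" str))

;; Tests documenting the intended behavior of the regular expression.
(check-equal? (words "Hello, world!") '("Hello" "world"))
(check-equal? (words "Don't stop.") '("Don't" "stop"))
;; Leading and trailing apostrophes are not part of a word.
(check-equal? (words "rock 'n' roll") '("rock" "n" "roll"))
;; Digits and punctuation are ignored.
(check-equal? (words "42 is 6*7") '("is"))
(check-equal? (words "") '())
```

To apply it to a whole file, you could pass the result of `file->string` (provided by `#lang racket`) to `words`.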

As always, you should document your procedures and follow our conventions for style.

What to submit

  • Your Racket code, in a file named text-visualization.rkt
  • Two or more text files that you have analyzed. You may give them whatever names you consider appropriate.
  • The results of a sample analysis, in a file named “analysis.png”. You can create that file with (save-image img "analysis.png"), where img is the image your program produces.
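The following is a minimal sketch of producing and saving an image, assuming `2htdp/image` (which provides `save-image`) and `rackunit`. The procedure `bar-chart` and its input values are hypothetical placeholders, not real analysis results; the sketch also shows one natural place for recursion, building the image one bar at a time.

```racket
#lang racket
(require 2htdp/image
         rackunit)

;;; (bar-chart heights) -> image?
;;;   heights : list-of-nonnegative-real?
;;; Recursively build a bar chart with one 20-pixel-wide bar per
;;; height, aligned along their bottoms. (Hypothetical helper.)
(define (bar-chart heights)
  (if (null? heights)
      empty-image
      (beside/align "bottom"
                    (rectangle 20 (car heights) "solid" "blue")
                    (bar-chart (cdr heights)))))

;; Width is the sum of the bar widths; height is the tallest bar.
(check-equal? (image-width (bar-chart (list 10 20 30))) 60)
(check-equal? (image-height (bar-chart (list 10 20 30))) 30)

;; Placeholder values only, e.g., a scaled statistic for two texts.
(define img (bar-chart (list 120 85)))

;; Write the image to disk only when this file is run directly.
(module+ main
  (save-image img "analysis.png"))
```

Keeping `save-image` in a `main` submodule means that requiring the file (for example, to run its tests) does not overwrite analysis.png.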

Additional notes

  • Sam will almost certainly display the results of your work (the images) in class. He may even ask you to talk about those results.