Disclaimer: In this assignment, you will read and generate some utterly horrible “poetry”. Please accept my apologies if any of the work included herein offends your expectations, cultural or otherwise.
Note: In doing this mini-project, you will likely want to refer back to the lab on random language generation.
Please name your file language.rkt.
Haiku are three-line poems that consist of a line with five syllables, a line with seven syllables, and a line with five syllables.
a. Create the following lists of words (strings), each of which contains at least five different words of the stated form.
one-syllable-words, a list of words with one syllabletwo-syllable-words, a list of words with two syllablesthree-syllable-words, a list of words with three syllablesfour-syllable-words, a list of words with four syllablesfive-syllable-words, a list of words with five syllablesCombine those into a list or vector in which the value at position
i is a list of words with i syllables. Your code will look
something like
(define words-by-syllable
(list '()
one-syllable-words
two-syllable-words
three-syllable-words
four-syllable-words
five-syllable-words))
b. Document, design, and implement a recursive procedure, (phrase
n words) that randomly generates a string containing a phrase of
n syllables given a list or vector (your choice) of the form
above. Make sure that you accommodate different “patterns” of n
syllables. For example, if n is 4, you might use
It is difficult to write tests for random procedures, so you will likely have to conduct experiments instead. Please include a record of your experiments.
c. Document and write a procedure, (haiku), that generates a Haiku
of the appropriate form, with each line terminated by the strange value
"\n".
> (haiku)
"exit dog dragon\nbaseball dog television\nelephant eat car\n"
> (display (haiku))
exceeding dog car
ant dog eat car baseball ball
exit ball eat car
d. As you explore your haiku procedure, you may discover that
there seems to be a bias toward short words. Write a new procedure
(perhaps with some additional helper procedures), (haiku2), that
generates Haiku that are more likely to have longer words.
In generating some kinds of text it can be useful to have a large corpus of words. And, in many cases, we achieve “interesting” results by using the words of others. Let’s consider how we might make a list of all the different words that appear in a book.
While you may have recently written a procedure that removes duplicates from a list, it’s possible that there were infelicities in that procedure. Here is a procedure that claims to remove duplicates from a sorted list.
;;; (remove-duplicates lst) -> list?
;;; lst : list?
;;; Remove duplicates from lst.
;;; Precondition: lst must be sorted, with duplicates next to each other.
(define remove-duplicates-x
(lambda (lst)
(cond
[(or (null? lst) (null? (cdr lst)))
lst]
[(equal? (car lst) (cadr lst))
(remove-duplicates-x (cdr lst))]
[else
(cons (car lst) (remove-duplicates-x (cdr lst)))])))
Verify that the procedure appears to work as advertised. (There’s nothing to turn in for this part.)
Once you’ve verified that this procedure works, you’re ready for the real work.
a. Write a tail-recursive version of remove-duplicates-x, which
you should call remove-duplicates.
b. Document, write tests for, design, and implement a procedure,
(unique-words string) that
string-split, or you can use something fancier]string-downcase],sort], andremove-duplicates].In generating some kinds of text, such as those in a previous problem, it is useful to have a large corpus of words in different categories. One set of categories are words with a certain number of syllables.
a. Document and write a procedure, (syllables word), that attempts
to determine how many syllables are in the string word. You can assume
that word consists of only lowercase letters.
How do you decide how many syllables are in a word? One technique that works in many cases is to identify how many sequences of vowels there are. In many instances, that strategy provides a good rough estimate. However, there are also many cases in which that estimate fails (potentially, it fails for “syllables”, although we could argue that the internal “y” serves as a vowel). So try to be creative in figuring out other special patterns. It is likely that you will need one or more conditionals in your procedure.
It is fine if your procedure does not work perfectly, or even all that well. We’d simply like to see some thought and creativity.
b. Make a copy of The Project Gutenberg version of Jane Eyre, available at http://www.gutenberg.org/files/1260/1260-0.txt. Please name it 1260.txt and place it in the same directory as your Racket program.
c. Using syllables, filter, and any other procedures you deem
appropriate, generate lists of the one-syllable, two-syllable,
three-syllable, four-syllable, and five-syllable words in Jane
Eyre.
d. Use those lists to generate some interesting pattern of text, such as a Haiku.
What makes a poem? While there is no requirement that poetry rhyme, many people associate rhyme with poetry. It is also certainly the case that many forms of poetry, such as a quatrain make use of rhyme.
As we think about generating or analyzing text, it may be useful to to be able to identify rhymes. Of course, we appear to be working in the wonderfully inconsistent language known as English, so precise definition of rhymes are difficult.
a. One possible metric for rhyming is the end of the word. Write
a procedure, (might-rhyme? word1 word2), that takes two strings
that represent words (e.g., all lowercase letters plus potential
apostrophes) and returns true if the two words share the last three
characters.
Note: Your procedure should work correctly if one or both of the words has fewer than three characters.
b. Identify a dozen or so pairs of words that do not rhyme, but
pass that test. You might, for example, pick some random words
and then use filter to look through a larger list of words
to see which seem to rhyme.
c. Identify a dozen or so pairs of words that do rhyme, but do not pass that test.
d. Using your additional analysis, write a better (rhymes? word1 word2)
procedure. You are free to make this as simple or as complicated as
you like, provided it is at least as successful as might-rhyme?.
(You should, of course, document rhymes?.)
e. Using rhymes?, write a procedure, (rhymes-with word words),
that takes a string and a list of strings as input and finds all
of the words in words that appear to rhyme with word. (You
should, of course, document rhymes-with.)
f. Write a procedure (abab words) that takes as input a corpus (list)
of words and generates a “random” quatrain of four lines of four
words. The last words of the first and third lines must rhyme, as
must the last words of the second and fourth lines.
As you’ve likely realized, generating actual language is hard, and writing programs that “interpret” language is often even harder. One of the legendary challenges of language generation has to do with the differences between two very similar statements.
Time flies like an arrow.
Fruit flies like an apple.
Can you tell why that pair is complex? If not, ask your faculty member or mentor.
In looking for ways to generate somewhat realistic text, one approach that has shown some promise relies on a relatively straightforward analysis of an existing text.
This approach sometimes works surprising well and sometimes works relatively poorly. We can often improve it by working with pairs or triplets of words. But for now, we’ll stick with single words.
We’re also going to try a variant of this technique, in which we work from the back of a sentence to the front, rather than the front to the back.
a. Document and write a procedure, (sentence-ends str), that finds
all of the words in str that end sentences. For example,
> (sentence-ends "The cat ate the hat. The rat sat.")
'("hat" "sat")
> (sentence-ends "Do you like blue mac and cheese? No I don't, it makes me sneeze!")
'("cheese" "sneeze")
b. Document and write a procedure, (left-neighbors word str), that finds
all of the words that immediately precede word in str. For example,
> (left-neighbors "hat" "The cat sat on the hat. 'Where is my hat?' asked the rat. It's now a flat hat. How 'bout that? Will the fat rat jump on that brat cat?")
'("the" "my" "flat")
With those two procedures, we should be able to generate things that appear to be similar sentences. Let’s see.
That’s probably enough. We’ve now generated the phrase “my hat asked the hat”. While it’s not Shakespeare, it is potentially promising.
c. Document and write a procedure, (random-sentence words) that
sentence-ends],random-elt],left-neighbors],random-elt] (if there are no left neighbors, uses a random word),left-neighbors],random-elt] (if there are no left neighbors, uses a random word),After selecting six words, you should then combine them together into
a single sentence, using string-append.
d. It may be worth comparing this “backwards” approach to a more forwards
approach. To get ready, document and write a procedure (right-neighbors
word str) that finds all the words that immediately follow word in
str. (We’re not going to have you do the rest of that experiment,
but you might find right-neighbors useful elsewhere in this assigment.)
You’ve explored a variety of issues in analyzing and generating text. It’s now time to explore creative ways to use what you have learned.
Poets.org provides details on a wide variety of poetic forms, such as limericks. (You can see the different poetic forms in hte list of poems.
Pick a non-trivial poetic form and write a program to generate (or approximate) poetry of that form. You may choose to work from fixed sets of words or words you draw from another source.
We will primarily evaluate your work on correctness (does your code compute what it’s supposed to and are your procedure descriptions accurate); clarity (is it easy to tell what your code does and how it acheives its results; is your writing clear and free of jargon); and concision (have you kept your work short and clean, rather than long and rambly). In a few cases, we will also consider the creativity of your result.