Topics: randomness, strings, language generation
Haiku are three-line poems that consist of a line with five syllables, a line with seven syllables, and a line with five syllables.
a. Create the following lists of words, each of which contains at least five different words of the stated form.
one-syllable-words, a list of words with one syllable
two-syllable-words, a list of words with two syllables
three-syllable-words, a list of words with three syllables
four-syllable-words, a list of words with four syllables
five-syllable-words, a list of words with five syllables
b. Document and write a procedure, (two-syllable-group), that
randomly generates a two-syllable group of words, using either two
one-syllable words or a single two-syllable word. For example, if
one-syllable-words contains '(ant ball car dog eat) and
two-syllable-words contains '(aardvark baseball cocoon dragon exit),
we might see the following behavior.
> (two-syllable-group)
"car ant"
> (two-syllable-group)
"dragon"
> (two-syllable-group)
"ball eat"
c. Document and write a procedure, (three-syllable-group), that
randomly generates a three-syllable group of words, using either
(i) a one-syllable word followed by a two-syllable group, (ii) a
two-syllable group followed by a one-syllable word, or (iii) a
three-syllable word.
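The groups in parts b and c follow the same pattern: pick one of the allowed forms at random, then build the string for that form. Below is a minimal sketch of that pattern, not a definitive solution. It assumes that the word lists hold strings (so that groups can be joined with string-append), that the three-syllable words shown are just sample data, and that random-elt is a small helper defined here rather than taken from any library.

```racket
#lang racket

;; Sample word lists (assumed for illustration; strings, so groups join directly).
(define one-syllable-words '("ant" "ball" "car" "dog" "eat"))
(define two-syllable-words '("aardvark" "baseball" "cocoon" "dragon" "exit"))
(define three-syllable-words '("elephant" "banana" "computer" "umbrella" "tomato"))

;; random-elt (helper defined for this sketch): choose one element of a
;; nonempty list, each element equally likely.
(define random-elt
  (lambda (lst)
    (list-ref lst (random (length lst)))))

;; two-syllable-group: two one-syllable words, or one two-syllable word.
(define two-syllable-group
  (lambda ()
    (if (zero? (random 2))
        (string-append (random-elt one-syllable-words)
                       " "
                       (random-elt one-syllable-words))
        (random-elt two-syllable-words))))

;; three-syllable-group: choose among the three forms listed in part c.
(define three-syllable-group
  (lambda ()
    (let ([choice (random 3)])
      (cond
        [(= choice 0)
         (string-append (random-elt one-syllable-words)
                        " "
                        (two-syllable-group))]
        [(= choice 1)
         (string-append (two-syllable-group)
                        " "
                        (random-elt one-syllable-words))]
        [else
         (random-elt three-syllable-words)]))))
```

Choosing each form with equal probability is only one possible design. Because two of the three forms in three-syllable-group are built from shorter pieces, this sketch tends to favor short words.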
d. Document and write a procedure, (four-syllable-group), that
randomly generates a four-syllable group of words, using either (i)
a one-syllable word followed by a three-syllable group, (ii) a
three-syllable group followed by a one-syllable word, (iii) two
two-syllable groups, or (iv) a four-syllable word.
e. Document and write a procedure, (five-syllable-group), that
randomly generates a five-syllable group of words, using either (i)
a one-syllable word followed by a four-syllable group, (ii) a
two-syllable group followed by a three-syllable group, (iii) a
three-syllable group followed by a two-syllable group, (iv) a
four-syllable group followed by a one-syllable word, or (v) a
five-syllable word.
f. Document and write a procedure, (seven-syllable-group), that
randomly generates a seven-syllable group of words using either (i)
a two-syllable group followed by a five-syllable group or (ii) a
five-syllable group followed by a two-syllable group.
g. Document and write a procedure, (haiku), that generates a Haiku
of the appropriate form.
> (haiku)
"exit dog dragon\nbaseball dog television\nelephant eat car\n"
> (display (haiku))
Output! exceeding dog car
Output! ant dog eat car baseball ball
Output! exit ball eat car
h. As you explore your haiku procedure, you may discover that
there seems to be a bias toward short words. Write a new procedure
(perhaps with some additional helper procedures), (haiku2), that
generates Haiku that are more likely to have longer words.
Topics: files, strings, regular expressions
In generating some kinds of text it can be useful to have a large corpus of words. And, in many cases, we achieve “interesting” results by using the words of others. Let’s consider how we might make a list of all the different words that appear in a book.
While you may have recently written a procedure that removes duplicates from a list, it’s possible that there were infelicities in that procedure. Here is a procedure that claims to remove duplicates from a sorted list. (This procedure is another in the category of “procedures for which you may understand the what but not the how”.)
;;; Procedure:
;;;   remove-duplicates
;;; Parameters:
;;;   lst, a sorted list of values
;;; Purpose:
;;;   Remove duplicates from lst.
;;; Produces:
;;;   unique, a sorted list of values
;;; Preconditions:
;;;   [No additional]
;;; Postconditions:
;;;   * Every element in unique appears in lst.
;;;   * Every element in lst is equal to some element in unique.
;;;   * unique is sorted in the same way as lst.
(define remove-duplicates
  (lambda (lst)
    (cond
      [(or (null? lst) (null? (cdr lst)))
       lst]
      [(equal? (car lst) (cadr lst))
       (remove-duplicates (cdr lst))]
      [else
       (cons (car lst) (remove-duplicates (cdr lst)))])))
Verify that the procedure appears to work as advertised. (There’s nothing to turn in for this part.)
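A few quick checks along the following lines may help with that verification. This is only a sketch: the sample lists are made up for illustration, and the definition is repeated so that the block runs on its own. The last check is a reminder that the procedure only promises sensible results for sorted input.

```racket
#lang racket

;; Repeated from the assignment so that this block is self-contained.
;; (In #lang racket, this definition shadows the built-in remove-duplicates.)
(define remove-duplicates
  (lambda (lst)
    (cond
      [(or (null? lst) (null? (cdr lst)))
       lst]
      [(equal? (car lst) (cadr lst))
       (remove-duplicates (cdr lst))]
      [else
       (cons (car lst) (remove-duplicates (cdr lst)))])))

;; Each of the following expressions should evaluate to #t.
(equal? (remove-duplicates '()) '())
(equal? (remove-duplicates '(1 1 2 3 3 3)) '(1 2 3))
(equal? (remove-duplicates '("a" "a" "b")) '("a" "b"))

;; Unsorted input: equal values that are not adjacent both survive.
(equal? (remove-duplicates '(1 2 1)) '(1 2 1))
```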
Once you’ve verified that this procedure works, you’re ready for the real work.
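Document and write a procedure, (unique-words file), that reads the contents of file as a string [using file->string], converts the text to lowercase [using string-downcase], sorts the words [using sort], and removes any duplicates [using remove-duplicates].
Topics: strings, randomness, text analysis, conditionals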
In generating some kinds of text, such as those in the previous problem, it is useful to have a large corpus of words in different categories. One useful way to categorize words is by the number of syllables they contain.
a. Document and write a procedure, (syllables word), that attempts
to determine how many syllables are in the string word. You can assume
that word consists of only lowercase letters.
How do you decide how many syllables are in a word? One technique that works in many cases is to identify how many sequences of vowels there are. In many instances, that provides a rough estimate. However, there are also many cases in which that estimate fails (potentially, it fails for “syllables”, although we could argue that the internal “y” serves as a vowel). So try to be creative in figuring out other special patterns. It is likely that you will need one or more conditionals in your procedure.
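The vowel-sequence idea can be sketched as follows. This is a rough starting point under stated assumptions, not a complete answer: it treats "y" as a vowel, it adds just one special pattern (a final "e" after a consonant is taken to be silent), and count-vowel-groups is a helper name invented for this sketch.

```racket
#lang racket

;; count-vowel-groups (helper invented for this sketch): count the maximal
;; runs of vowels in word. Assumption: "y" is treated as a vowel.
(define count-vowel-groups
  (lambda (word)
    (length (regexp-match* #rx"[aeiouy]+" word))))

;; syllables: estimate the number of syllables in word, a string of
;; lowercase letters.
(define syllables
  (lambda (word)
    (let ([groups (count-vowel-groups word)])
      (cond
        ;; Special pattern: a final "e" after a consonant (as in "cake")
        ;; is assumed to be silent, so it does not count.
        [(and (> groups 1) (regexp-match? #rx"[^aeiouy]e$" word))
         (- groups 1)]
        ;; Otherwise, use the number of vowel runs, but never less than 1.
        [else
         (max 1 groups)]))))
```

With these rules, (syllables "cake") is 1, (syllables "dragon") is 2, and (syllables "elephant") is 3. The estimate is still wrong in places: "table" comes out as 1 because of the silent-e rule, and "baseball" comes out as 3 because of the "e" in the middle. Cases like these suggest where further conditionals might go.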
b. As you may recall, the file /home/rebelsky/Desktop/pg1260.txt
contains the Project Gutenberg version of Jane Eyre. Using
syllables, filter, and any other procedures you deem appropriate,
generate lists of the one-syllable, two-syllable, three-syllable,
four-syllable, and five-syllable words in Jane Eyre.
c. Use those lists to generate some interesting pattern of text, such as a Haiku.
Topics: strings, text analysis, conditionals, randomness
What makes a poem? While there is no requirement that poetry rhyme, many people associate rhyme with poetry. It is also certainly the case that many forms of poetry, such as the quatrain, make use of rhyme.
As we think about generating or analyzing text, it may be useful to be able to identify rhymes. Of course, we appear to be working in the wonderfully inconsistent language known as English, so precise definitions of rhyme are difficult.
a. One possible metric for rhyming is the end of the word. Write
a procedure, (might-rhyme? word1 word2), that takes two strings
that represent words (e.g., all lowercase letters plus potential
apostrophes) and returns true if the two words share the last three
characters.
Note: Your procedure should work correctly if one or both of the words has fewer than three characters.
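One way to handle short words is sketched below. The assignment does not say what "correctly" means for words with fewer than three characters, so this sketch makes an assumption: a short word's whole text counts as its ending. The helper last-chars is a name invented for this sketch.

```racket
#lang racket

;; last-chars (helper invented for this sketch): the last n characters of
;; str, or all of str when it has fewer than n characters.
(define last-chars
  (lambda (str n)
    (substring str (max 0 (- (string-length str) n)))))

;; might-rhyme?: do word1 and word2 share their last three characters?
;; Assumption: a word shorter than three characters is compared as a whole.
(define might-rhyme?
  (lambda (word1 word2)
    (string=? (last-chars word1 3) (last-chars word2 3))))
```

Under that assumption, (might-rhyme? "nation" "station") and (might-rhyme? "at" "at") are true, while (might-rhyme? "at" "cat") is false because "at" and "cat" are different endings. A different reading of the note, such as comparing only as many characters as the shorter word has, would give a different answer for that last pair.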
b. Identify a dozen or so pairs of words that do not rhyme, but
pass that test. You might, for example, pick some random words
and then use filter to look through a larger list of words
to see which seem to rhyme.
c. Identify a dozen or so pairs of words that do rhyme, but do not pass that test.
d. Using your additional analysis, write a better (rhymes? word1 word2)
procedure. You are free to make this as simple or as complicated as
you like, provided it is at least as successful as might-rhyme?.
(You should, of course, document rhymes?.)
e. Using rhymes?, write a procedure, (rhymes-with word words),
that finds all of the words in words that appear to rhyme with
word. (You should, of course, document rhymes-with.)
f. Write a procedure, (abab words), that takes as input a corpus
of words and generates a “random” quatrain of four lines of four
words. The last words of the first and third lines must rhyme, as
must the last words of the second and fourth lines.
Topics: strings, text analysis, regular expressions, conditionals, randomness, local bindings
As you’ve likely realized, generating actual language is hard, and writing programs that “interpret” language is often even harder. One of the legendary challenges of language generation has to do with the differences between two very similar statements.
Time flies like an arrow.
Fruit flies like an apple.
Can you tell why that pair is complex? If not, ask your faculty member or mentor.
In looking for ways to generate somewhat realistic text, one approach that has shown some promise relies on a relatively straightforward analysis of an existing text.
This approach sometimes works surprisingly well and sometimes works relatively poorly. We can often improve it by working with pairs or triplets of words. But for now, we’ll stick with single words.
We’re also going to try a variant of this technique, in which we work from the back of a sentence to the front, rather than the front to the back.
a. Document and write a procedure, (sentence-ends str), that finds
all of the words in str that end sentences. For example,
> (sentence-ends "The cat ate the hat. The rat sat.")
'("hat" "sat")
> (sentence-ends "Do you like blue mac and cheese? No I don't, it makes me sneeze!")
'("cheese" "sneeze")
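A regular expression is one way to approach sentence-ends. The sketch below rests on two assumptions that the assignment does not state: a word is a run of letters and apostrophes, and a sentence ends with a period, question mark, or exclamation point that immediately follows a word.

```racket
#lang racket

;; sentence-ends: find the words in str that come directly before
;; sentence-ending punctuation.
;; Assumptions: words are letters and apostrophes; sentences end with
;; ".", "!", or "?" immediately after a word.
(define sentence-ends
  (lambda (str)
    ;; The parenthesized group captures the word itself;
    ;; #:match-select cadr keeps only that group from each match.
    (regexp-match* #rx"([A-Za-z']+)[.!?]" str #:match-select cadr)))
```

This sketch reproduces both sample interactions above. It would also treat an abbreviation such as "Dr." as the end of a sentence, so a more careful version might need additional rules.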
b. Document and write a procedure, (left-neighbors word str), that finds
all of the words that immediately precede word in str. For example,
> (left-neighbors "hat" "The cat sat on the hat. 'Where is my hat?' asked the rat. It's now a flat hat. How 'bout that? Will the fat rat jump on that brat cat?")
'("the" "my" "flat")
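One possible approach to left-neighbors is to split the string into words and then walk through adjacent pairs. The sketch below is a starting point under stated assumptions: words-in is a helper invented for this sketch, a word is a run of letters that may contain internal apostrophes (so leading and trailing quotation marks are dropped), and sentence boundaries are ignored, so the last word of one sentence counts as a left neighbor of the first word of the next.

```racket
#lang racket

;; words-in (helper invented for this sketch): the words of str, in order.
;; Assumption: a word is letters with optional internal apostrophes,
;; as in "It's".
(define words-in
  (lambda (str)
    (regexp-match* #rx"[A-Za-z]+('[A-Za-z]+)*" str)))

;; left-neighbors: the words that immediately precede word in str.
(define left-neighbors
  (lambda (word str)
    (let kernel ([remaining (words-in str)])
      (cond
        ;; Fewer than two words left: no more pairs to examine.
        [(or (null? remaining) (null? (cdr remaining)))
         '()]
        ;; The second word of the pair matches, so keep the first.
        [(string=? (cadr remaining) word)
         (cons (car remaining) (kernel (cdr remaining)))]
        [else
         (kernel (cdr remaining))]))))
```

On the sample input above, this sketch produces '("the" "my" "flat"). Comparison is case-sensitive, so "Hat" at the start of a sentence would not match "hat" unless the text is converted to lowercase first.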
With those two procedures, we should be able to generate things that appear to be similar sentences. Let’s see.
That’s probably enough. We’ve now generated the phrase “my hat asked the hat”. While it’s not Shakespeare, it is potentially promising.
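c. Document and write a procedure, (random-sentence words), that builds
a sentence from back to front: it finds the words that end sentences
[using sentence-ends], picks one of them at random [using random-elt],
finds the words that precede that word [using left-neighbors], picks
one of those at random [using random-elt], finds the words that precede
that one [using left-neighbors], picks one at random [using random-elt],
and so on. After selecting six words, you should then combine them together into
a single sentence, using string-append.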
d. It may be worth comparing this “backwards” approach to a more forwards
approach. To get ready, document and write a procedure, (right-neighbors word str),
that finds all the words that immediately follow word in str.
(We’re not going to have you do the rest of that experiment,
but you might find right-neighbors useful elsewhere in this assignment.)
Topics: text analysis, text generation, creativity
You’ve explored a variety of issues in analyzing and generating text. It’s now time to explore creative ways to use what you have learned.
Poets.org provides details on a wide variety of poetic forms, such as limericks.
Pick a non-trivial poetic form and write a program to generate (or approximate) poetry of that form.
For this assignment, you should document your procedures using the 6P documentation style. For procedures that randomly generate outputs, you should specify as much as possible about the output and then add something like “the output is difficult to predict”.
We will primarily evaluate your work on correctness (does your code compute what it’s supposed to and are your procedure descriptions accurate); clarity (is it easy to tell what your code does and how it achieves its results; is your writing clear and free of jargon); and concision (have you kept your work short and clean, rather than long and rambly). In a few cases, we will also consider the creativity of your result.