Disclaimer: In this assignment, you will read and generate some utterly horrible “poetry” and other text. Please accept my apologies if any of the work included herein offends your expectations, cultural or otherwise.
Note: In doing this mini-project, you will likely want to refer back to the lab on random language generation.
Please name your file language.rkt.
A syllabic lexicon, or syllax for short, is a collection of
words or short phrases arranged by syllables. For the purposes of
this class, a syllax is a vector of vectors, where the vector at
index i contains only the words with i syllables.
For example, here is a syllax for some words related to CSC-151.
(define csc151-syllax
(vector
; 0
(vector)
; 1
(vector "cons" "car" "list" "pair" "Scheme" "sort" "match" "string"
"lab" "map" "fold" "test")
; 2
(vector "vector" "cadr" "cdr" "Racket" "jelly" "sandwich" "syllax"
"image" "recurse" "eboard" "data" "compose" "lambda" "section"
"SoLA" "MP")
; 3
(vector "recursion" "computer" "digital" "confusing" "programming"
"CSC" "abstraction" "decompose" "document" "abstraction"
"boolean" "binary")
; 4
(vector "humanities" "exponential" "collaborate" "one-fifty-one"
"algorithm" "DrRacket" "dictionary" "generalize"
"tail recursion")
; 5
(vector "collaborative" "experiential" "decomposition" "generality")
; 6
(vector)
; 7
(vector "triskaidekaphobia")))
Here’s another syllax for some words related to Grinnell college.
(define grinnell-syllax
(vector
; 0
(vector)
; 1
(vector "Mears" "Noyce" "husk" "train" "corn" "black")
; 2
(vector "self-gov" "Stonewall" "The Bear" "first-year" "scarlet"
"remote" "Webex" "prairie" "need-blind" "soybeans" "Hopkins"
"Younker" "Dibble" "cluster" "scurry")
; 3
(vector "liberal" "JRC" "CLS" "advisor" "FYE" "laurel leaf" "Honor G"
"ARH" "North Campus" "Iowa" "semester" "Women's quad"
"Grinnellian")
; 4
(vector "curriculum" "Mary B. James" "Tutorial" "Green alien" "convocation" "education")
; 5
(vector)
; 6
(vector "Congregationalist")))
In doing this assignment, you may find the following procedures of use.
You need not cite them; since you’re doing this assignment, we’ll assume
that you’re taking ideas from the assignment. You may have to update
your csc151 library to ensure that you have vector-andmap, which
we just added recently.
;;; (random-vector-element vec) -> any?
;;; vec : vector? (nonempty)
;;; Randomly select an element of `vec`
(define random-vector-element
(lambda (vec)
(vector-ref vec (random (vector-length vec)))))
;;; (strvec? val) -> boolean?
;;; val : any?
;;; Determines if val is a vector of strings?
(define strvec?
(lambda (val)
(and (vector? val)
(vector-andmap string? val))))
Haiku are three-line poems that consist of a line with five syllables, a line with seven syllables, and a line with five syllables.
a. Our principles of decomposition suggest that we should begin by writing a procedure or procedures to generate lines for the Haiku. Our principles of generality suggest that those procedures should accommodate not just five or seven syllables, but an arbitrary number of syllables.
Document, design, and implement a recursive procedure, (phrase
n syllax) that randomly generates a string that contains a phrase of
n syllables given a syllax of the form above.
Make sure that you accommodate different “patterns” of n
syllables. For example, if n is 4, you should randomly choose
between
Note that there are some complications involved. For example, if
n is bigger than the maximum number of syllables in the syllax,
it cannot consist of one word.
Note: (random start finish) generates a random integer between
start (inclusive) and finish (exclusive).
Note: You may assume that every syllax has at least one one-syllable word. (Without this guarantee, it can be much harder to simplify the problem.)
Hint: You will want to generate a random number between 1 (there are
no zero-syllable words) and either n or the most number of syllables
in the syllax, whichever is smaller.
Hint: You will want to employ recursion.
b. It is difficult to write tests for random procedures, so you will likely have to conduct experiments instead. Please include a record of your experiments and your analyses of the results of those experiments. (E.g., you should check that the number of syllables is correct.)
> (phrase 7 csc151-syllax)
"CSC binary Scheme" ; 3 + 3 + 1 = 7
> (phrase 7 csc151-syllax)
"recursion tail recursion" ; 3 + 1 + 3 = 7 (are we only getting 3/1?)
> (phrase 7 csc151-syllax)
"match recurse one-fifty-one" ; 1 + 2 + 4 = 7
> (phrase 7 csc151-syllax)
"triskaidekaphobia" ; 7 (it's god to know we can get all seven)
> (phrase 7 csc151-syllax)
"pair collaborative Scheme" ; 1 + 5 + 1 = 7
> (phrase 7 csc151-syllax)
"confusing decompose pair" ; 3 + 3 + 1 = 7
c. Document and write a procedure, (haiku syllax), that takes as
input a syllax of the form above and returns a string of the
appropriate form, with each line terminated by the strange value
"\n", which represents a carriage return.
> (haiku csc151-syllax)
"boolean Scheme list\ntriskaidekaphobia\ncdr fold sandwich\n"
> (haiku csc151-syllax)
"boolean data\nbinary recursion lab\nDrRacket pair\n"
> (haiku csc151-syllax)
"experiential\ntriskaidekaphobia\nabstraction MP\n"
> (haiku csc151-syllax)
"list MP image\nsyllax test jelly SoLA\nvector match lambda\n"
> (display "boolean Scheme list\ntriskaidekaphobia\ncdr fold sandwich\n")
boolean Scheme list
triskaidekaphobia
cdr fold sandwich
> (display (haiku grinnell-syllax))
Mary B. James husk
education corn The Bear
remote Mears husk Noyce
> (display (string-append (haiku grinnell-syllax) "\n" (haiku grinnell-syllax)))
corn education
laurel leaf Webex scarlet
train advisor husk
JRC Noyce Noyce
Younker Tutorial train
corn Mary B. James
> (display (haiku csc151-syllax))
recurse binary
generalize binary
experiential
As you have observed, these are not particularly good examples of Haiku. But you might find that some generate something interesting enough that you’d want to adapt them. For example, I love the “cdr fold sandwich”; perhaps you do, too. And that last Haiku isn’t bad.
d. Create an additional syllax of your choice to use in addition to the syllaxen above. Then provide a few sample Haiku generated by your procedure and your syllax.
In generating some kinds of text it can be useful to have a large corpus of words. And, in many cases, we achieve “interesting” results by using the words of others. Let’s consider how we might make a list of all the different words that appear in a book.
a. Document, write tests for, and implement a procedure,
(extract-words str), that extracts all of the words from a
string. Make sure that your words can include both uppercase
and lowercase letters and that they can include hyphens and
apostrophes.
If you’ve already written or seen such a procedure, you can feel free to grab it from elsewhere, but make sure to include citations.
b. Document, write tests for, and implement a tail-recursive
procedure, (dedup lst), that takes a list and removes all the
duplicates from the list. You may not use any existing remove
procedures provided by DrRacket.
Your procedure should return the elements in the same order that
they appear in lst. Since you’re using tail recursion, you’ll
probably need to reverse so-far at the end of the recursion.
In generating some kinds of text, such as those in a previous problem, it is useful to have a large corpus of words in different categories. One set of categories are words with a certain number of syllables.
a. Document and write a procedure, (syllables word), that attempts
to determine how many syllables are in the string word. You can assume
that word consists of only lowercase letters.
How do you decide how many syllables are in a word? You’ll need to find an appropriate algorithm. (Many of you are creative enough to do so on your own, but if you’re struggling, the Internet is an awesome place.)
b. Include some non-trivial examples of when your procedure works well and some of when it fails to work correctly.
c. Make a copy of The Project Gutenberg version of Jane Eyre, available at http://www.gutenberg.org/files/1260/1260-0.txt. Please name it 1260-0.txt and place it in the same directory as your Racket program.
d. Using syllables, filter, and any other procedures you deem
appropriate, generate a syllax for Jane Eyre that consists of the
one-syllable, two-syllable, three-syllable, four-syllable,
five-syllable, six-syllable, and seven-syllable words in Jane
Eyre.
e. Use those lists to generate some Haiku. Include examples.
What makes a poem? While there is no requirement that poetry rhyme, many people associate rhyme with poetry. It is also certainly the case that many forms of poetry, such as a quatrain make use of rhyme.
As we think about generating or analyzing text, it may be useful to to be able to identify rhymes. Of course, we appear to be working in the wonderfully inconsistent language known as English, so precise definition of rhymes are difficult.
a. One possible metric for rhyming is the end of the word. Document
and implement a procedure, (might-rhyme? word1
word2), that takes two strings that represent words (e.g., all
lowercase letters plus potential apostrophes) and returns true if
the two words share the last three characters.
Note: Your procedure should return false if either or both of the words has fewer than three characters.
b. Identify at least six pairs of words that do not rhyme, but pass
that test. You might, for example, pick some random words (or the
words from _Jane Eyre__ and then use filter to look through a
larger list of words to see which seem to rhyme. Include them as
a comment in your submission.
c. Identify at least six pairs of words that do rhyme, but do not pass that test. Include them as a comment in your submission.
d. Describe something you could do to address each of these two cases. That is, how could you avoid the non-rhymes and how could you incorporate the rhymes? Include your analysis as a comment in your submission.
e. Using your additional analysis, write a better (rhymes? word1 word2)
procedure. You are free to make this as simple or as complicated as
you like, provided you incorporate your ideas from d.
f. Using rhymes?, write a procedure, (rhymes-with word words),
that takes a string and a list of strings as input and finds all
of the words in words that appear to rhyme with word. (You
should, of course, document rhymes-with.)
g. Write a procedure (abab words) that takes as input a corpus
(list) of words and returns a string that represents a “random”
quatrain of four lines of four words each. The last words of the first
and third lines must rhyme, as must the last words of the second
and fourth lines.
h. Include a few examples of abab in your submission.
As you’ve likely realized, generating actual language is hard, and writing programs that “interpret” language is often even harder. One of the legendary challenges of language generation has to do with the differences between two very similar statements.
Time flies like an arrow.
Fruit flies like an apple.
Can you tell why that pair is complex? If not, ask your faculty member or mentor.
In looking for ways to generate somewhat realistic text, one approach that has shown some promise relies on a relatively straightforward analysis of an existing text.
This approach sometimes works surprising well and sometimes works relatively poorly. We can often improve it by working with pairs or triplets of words. But for now, we’ll stick with single words.
a. Document, write tests for, and implement a procedure, (sentence-ends
str), that identifies all (or most of) the words that end the sentences
in str. The words that end sentences typically come before periods,
question marks, and exclamation points.
> (sentence-ends "The cat ate the hat. The rat sat.")
'("hat" "sat")
> (sentence-ends "Do you like blue mac and cheese? No I don't, it makes me sneeze!")
'("cheese" "sneeze")
> (sentence-ends ""The cat sat on the hat. \"Where is my hat?\" asked the rat. It's now a flat hat. How 'bout that? Will the fat rat jump on that brat cat who sat?")
'("hat" "rat" "that" "sat")
b. Document, write tests for, and implement a procedure, (sentence-starts
str), that identifies all (or most of) the words that start sentences
in str and returns them as a list (without duplicates). Typically,
the start of a sentence is given by (a) the start of the string and
(b) any word that comes after end punctuation (space, question mark,
exclmation point) and a sequence of whitespace (space, newline,
tab).
> (sentence-starts "The cat ate the hat. The rat sat.")
'("The")
> (sentence-starts "Do you like blue mac and cheese? No I don't, it makes me sneeze!")
'("Do" "No")
> (sentence-starts "The cat sat on the hat. \"Where is my hat?\" asked the rat. It's now a flat hat. How 'bout that? Will the fat rat jump on that brat cat who sat?")
'("The" "Where" "asked" "It's" "How" "Will") ; or maybe not "asked"
c. Document, write tests for, and implement a procedure, (right-neighbors
word words), that finds all of the words that immediately follow
word in words, which is a list of strings. For example,
> (define cat-thing (extract-words "The cat sat on the hat. \"Where is my hat?\" asked the rat. It's now a flat hat. How 'bout that? Will the fat rat jump on that brat cat who sat?"))
> (right-neighbors "hat" cat-thing)
'("Where" "asked" "how")
> (right-neighbors "The" cat-thing)
'("cat")
> (right-neighbors "the" cat-thing)
'("hat" "rat" "fat")
> (right-neighbors "will" cat-thing)
'("the")
> (right-neighbors "computer" cat-thing)
'()
d. With those three procedures, we should be able to generate things that appear to be similar to sentences. Let’s see.
Is that promising? I’m not sure. We definitely need a larger corpus. But we’re going to move ahead anyway.
Document and write a procedure, (random-sentence all-words start-words
end-words) that takes the lists or vectors we generate from the procedures
you’ve written already, and uses them to build a sentence according to
an algorithm something like the following.
1. Start with one of the start words.
2. Call that the current word.
3. Repeatedly apply the following process.
a. If the current word is one of the last words, stop with
some probability (say 50%).
b. Generate the list of following words.
c. If that list or vector is empty, stop.
d. Pick randomly from the list of following words.
e. Add that to the sentence and call it the current word.
There are two options for your freestyle.
a. You’ve explored a variety of issues in analyzing and generating text. It’s now time to explore creative ways to use what you have learned. Choose any combination of techniques you’ve learned (random sentence generation, random phrase generation with words by syllable, rhyming, Mad libs, etc.) and document and write a procedure that uses them to generate a kind of text of your choice.
Make sure to include some “successful” examples of your procedure.
b. Alternately, document and write a procedure that determines (or, more precisely, attempts to determine) how closely a piece of writing matches one of the central poetic forms. You’ll need to examine syllables per line, rhyming scheme, and such.
Make sure to include some examples.
Don’t worry if you can’t find the word “syllax” anywhere, except in strange games. We invented it for this assignment. But the plural of syllax is definitely “syllaxen”.
We will primarily evaluate your work on correctness (does your code compute what it’s supposed to and are your procedure descriptions accurate); clarity (is it easy to tell what your code does and how it acheives its results; is your writing clear and free of jargon); and concision (have you kept your work short and clean, rather than long and rambly). In a few cases, we will also consider the creativity of your result.
This rubric is based on an old version of the assignment. The new rubric should appear in the somewhat near future.
These are some of the issues we will be looking at. If you do not address the one-star items, you will not achieve R. If you do not address the two-star items, you will not achieve M. If you do not address the three-star items, you will not achieve E.
General
-------
[ ] File named correctly (`language.rkt`). *
[ ] File "runs" correctly in DrRacket. *
[ ] Reformatted code with Ctrl-I before submitting. *
[ ] If the program references other files, those files were submitted. *
[ ] If the program references other files, it does so with the base file
name, rather than a complex path. *
[ ] Introductory comment with name, date, assignment, course, citations. **
[ ] Correctly formatted/indented code. **
[ ] All procedures are documented and the documentation is (mostly)
correct. **
[ ] Procedures that are supposed to return strings, like `haiku`,
return strings rather than using `display`. **
[ ] Avoids expensive repeated work. ***
[ ] Clear variable names. ***
Part one
--------
[ ] Includes examples. **
[ ] Includes the "\n" at the end of the third line. **
[ ] Works correctly for different numbers of words in the subvectors in
`words-by-syllable`. **
[ ] Works correctly for different numbers of subvectors in
`words-by-syllable`, provided the subvector at index `i` contains
only words of `i` syllables. (E.g., if we add six-syllable words,
it will, on occasion, include those.) ***
Part two
--------
[ ] Includes tests for the new `remove-neighboring-duplicates` (at least
two normal cases and at least three edge cases) **
[ ] Includes tests for `unique-words` (at least two normal cases and at least
three edge cases) **
Part three
----------
[ ] The `syllables` procedure counts vowel sequences, not just vowels. **
[ ] The `syllables` procedure has at least two extensions to "count
vowel sequences". ***
Part four
---------
[ ] At least ten non-rhyming pairs whose last three letters match. *
[ ] At least ten rhyming pairs whose last three letters do not match. *
[ ] Accommodates words of fewer than three letters. **
[ ] `rhymes?` procedure is documented. **
[ ] Includes examples of `abab` **
[ ] New `rhymes?` procedures addresses some of the non-rhyming pairs. ***
[ ] New `rhymes?` procedures addresses some of the rhyming pairs. ***
[ ] New `rhymes?` procedure explains the things it is checking (e.g.,
with examples) ***
[ ] `abab` procedure ensures that words have rhymes. ***
[ ] `abab` procedure avoids significant repetition. ***
Part five
---------
[ ] Includes appropriate tests of `sentence-ends` (at least two normal
and at least three edge). **
[ ] Handles the situation in which a word lacks a predecessor. **
Part six
--------
[ ] Includes examples of procedure's results. **