Disclaimer: In this assignment, you will read and generate some utterly horrible “poetry” and other text. Please accept my apologies if any of the work included herein offends your expectations, cultural or otherwise.
Note: In doing this mini-project, you will likely want to refer back to the lab on random language generation.
Please name your file language.rkt.
A syllabic lexicon, or syllax for short, is a collection of
words or short phrases arranged by syllables. For the purposes of
this class, a syllax is a list of lists, where the listat
index i contains only the words with i syllables.
For example, here is a syllax for some words related to CSC-151.
(define csc151-syllax
(list
; 0
(list)
; 1
(list "cons" "car" "list" "pair" "Scheme" "sort" "match" "string"
"lab" "map" "fold" "test")
; 2
(list "cadr" "cdr" "Racket" "jelly" "sandwich" "syllax"
"image" "recurse" "eboard" "data" "compose" "lambda" "section"
"SoLA" "MP")
; 3
(list "recursion" "computer" "digital" "confusing" "programming"
"CSC" "abstraction" "decompose" "document" "abstraction"
"boolean" "binary")
; 4
(list "humanities" "exponential" "collaborate" "one-fifty-one"
"algorithm" "DrRacket" "dictionary" "generalize" "defenestrate")
; 5
(list "collaborative" "experiential" "decomposition" "generality"
"defenestration")
; 6
(list)
; 7
(list "triskaidekaphobia")))
Here’s another syllax for some words related to Grinnell college.
(define grinnell-syllax
(list
; 0
(list)
; 1
(list "Mears" "Noyce" "husk" "train" "corn" "black" "HSSC")
; 2
(list "self-gov" "Stonewall" "The Bear" "first-year" "scarlet"
"remote" "Webex" "prairie" "need-blind" "soybeans" "Hopkins"
"Younker" "Dibble" "cluster" "scurry" "Gremlin")
; 3
(list "liberal" "JRC" "CLS" "advisor" "FYE" "laurel leaf" "Honor G"
"ARH" "North Campus" "Iowa" "semester" "Women's quad"
"Grinnellian" "pie-on-ears")
; 4
(list "curriculum" "Mary B. James" "Tutorial" "Green alien"
"convocation" "education")
; 5
(list)
; 6
(list "Congregationalist")))
In doing this assignment, you may find the following procedures of use. You need not cite them; since you’re doing this assignment, we’ll assume that you’re taking ideas from the assignment.
;;; (random-list-element lst) -> any?
;;; lst : list? (nonempty)
;;; Randomly select an element of `lst`
(define random-list-element
(lambda (lst)
(list-ref lst (random (length lst)))))
;;; (list-of-strings? val) -> boolean?
;;; val : any?
;;; Determines if val is a list of strings.
(define list-of-strings?
(lambda (val)
(and (list? val)
(andmap string? val))))
Haiku are three-line poems that consist of a line with five syllables, a line with seven syllables, and a line with five syllables.
a. Our principles of decomposition suggest that we should begin by writing a procedure or procedures to generate lines for the Haiku. Our principles of generality suggest that those procedures should accommodate not just five or seven syllables, but an arbitrary number of syllables.
Document, design, and implement a recursive procedure, (phrase
n syllax) that randomly generates a string that contains a phrase of
n syllables given a syllax of the form above.
Make sure that you accommodate different “patterns” of n
syllables. For example, if n is 4, you should randomly choose
between
Note that there are some complications involved. For example, if
n is bigger than the maximum number of syllables in the syllax,
it cannot consist of one word.
Note: (random start finish) generates a random integer between
start (inclusive) and finish (exclusive).
Note: You may assume that every syllax has at least one one-syllable word. (Without this guarantee, it can be much harder to simplify the problem.)
Hint: You will want to generate a random number between 1 (there are
no zero-syllable words) and either n or the most number of syllables
in the syllax, whichever is smaller.
Hint: You will want to employ recursion.
b. It is difficult to write tests for random procedures, so you will likely have to conduct experiments instead. Please include a record of your experiments and your analyses of the results of those experiments. (E.g., you should check that the number of syllables is correct.)
> (phrase 7 csc151-syllax)
"CSC binary Scheme" ; 3 + 3 + 1 = 7
> (phrase 7 csc151-syllax)
"recursion tail recursion" ; 3 + 1 + 3 = 7 (are we only getting 3/1?)
> (phrase 7 csc151-syllax)
"match recurse one-fifty-one" ; 1 + 2 + 4 = 7
> (phrase 7 csc151-syllax)
"triskaidekaphobia" ; 7 (it's good to know we can get all seven)
> (phrase 7 csc151-syllax)
"pair collaborative Scheme" ; 1 + 5 + 1 = 7
> (phrase 7 csc151-syllax)
"confusing decompose pair" ; 3 + 3 + 1 = 7
c. Document and write a procedure, (haiku syllax), that takes as
input a syllax of the form above and returns a string of the
appropriate form, with each line terminated by the strange value
"\n", which represents a carriage return.
> (haiku csc151-syllax)
"boolean Scheme list\ntriskaidekaphobia\ncdr fold sandwich\n"
> (haiku csc151-syllax)
"boolean data\nbinary recursion lab\nDrRacket pair\n"
> (haiku csc151-syllax)
"experiential\ntriskaidekaphobia\nabstraction MP\n"
> (haiku csc151-syllax)
"list MP image\nsyllax test jelly SoLA\nvector match lambda\n"
> (display "boolean Scheme list\ntriskaidekaphobia\ncdr fold sandwich\n")
boolean Scheme list
triskaidekaphobia
cdr fold sandwich
> (display (haiku grinnell-syllax))
Mary B. James husk
education corn The Bear
remote Mears husk Noyce
> (display (string-append (haiku grinnell-syllax) "\n" (haiku grinnell-syllax)))
corn education
laurel leaf Webex scarlet
train advisor husk
JRC Noyce Noyce
Younker Tutorial train
corn Mary B. James
> (display (haiku csc151-syllax))
recurse binary
generalize binary
experiential
As you have observed, these are not particularly good examples of Haiku. But you might find that some generate something interesting enough that you’d want to adapt them. For example, I love the “cdr fold sandwich”; perhaps you do, too. And that last Haiku isn’t bad.
d. Create an additional syllax of your choice to use in addition to the syllaxen above. Then provide a few sample Haiku generated by your procedure and your syllax.
In generating some kinds of text it can be useful to have a large corpus of words. And, in many cases, we achieve “interesting” results by using the words of others. Let’s consider how we might make a list of all the different words that appear in a book.
a. Document, write tests for, and implement a procedure,
(extract-words str), that extracts all of the words from a
string. Make sure that your words can include both uppercase
and lowercase letters and that they can include hyphens and
apostrophes.
If you’ve already written or seen such a procedure, you can feel free to grab it from elsewhere, but make sure to include citations.
b. Document, write tests for, and implement a recursive procedure,
(list-contains? lst val), that determines whether a list contains
a particular value. You may not use filter or index or anything
similar.
c. Document, write tests for, and implement a recursive procedure,
(dedup lst), that takes a list and removes all the duplicates
from the list. You may not use any existing remove procedures
provided by DrRacket.
Your procedure should return the elements in the same order that
they appear in lst.
As you may recall, Mad Libs are a type of game in which we work with a piece of text with blanks in which each blank is marked with a particular part of speech. You play Mad Libs by gathering those parts of speech, filling in the blanks, and reading the text aloud. One way computers help us with this game is that they can fill in the blanks randomly, as we saw in a recent lab.
But we can also do a bit more. Instead of using predefined Mad Libs, we can start with “normal” text, identify parts of speech in that text, grab an equivalent part of speech from a list, and replace them in the text.
For example, suppose we have the following text:
Look Frankie! See Jessie play! See Jessie play with Spot. Spot is a wonderful dog. See Jessie throw a ball with Spot. Frankie wants to play, too. Will Jessie play with Frankie?
First, we might select a new set of characters from amongst a larger set, and substitute in those new characters.
Look Prof. Smith! See Scholar Smythe play! See Scholar Smythe play with President Smitty. President Smitty is a wonderful dog. See Scholar Smythe throw a ball with President Smitty. Prof. Smith wants to play, too. Will Scholar Smythe play with Prof. Smith?
Next, we might replace the nouns with new nouns.
Look Prof. Smith! See Scholar Smythe play! See Scholar Smythe play with President Smitty. President Smitty is a wonderful window. See Scholar Smythe throw a table with President Smitty. Prof. Smith wants to play, too. Will Scholar Smythe play with Prof. Smith?
We might then replace some of the first-person, singular, intransitive verbs with other verbs in the same form. Our corpus is currently relatively limited: play, dance, compute, sleep, cough, and program.
Look Prof. Smith! See Scholar Smythe sleep! See Scholar Smythe dance with President Smitty. President Smitty is a wonderful window. See Scholar Smythe throw a table with President Smitty. Prof. Smith wants to program, too. Will Scholar Smythe cough with Prof. Smith?
Wow! Things are going downhill fast. But the potential for something interesting is there. I think.
a. Document and write a procedure, (replace-characters original-characters
new-characters words), that takes three lists of strings as
parameters and, for each string in the first list (original-characters)
chooses one random element of the second list (new-characters) and
then replaces each original character in the last list with the new
character. As you might guess, this represents the first step in
the example above.
b. Document and write a procedure, (replace-words category words),
that takes two lists of strings as parameters and, each time one
of the words in the first list appears in the second list, replaces
the word with a randomly selected element from the first list. (The
replacement might be the same word.) As you might guess, this
represents the steps of replacing nouns and third-person, singular
present-tense, intransitive verbs.
c. Identify a piece of public domain text of about one page.
(Alternately, write our own.) Using that text, as well as a list
of characters in the text, a list of new character names, and at
least three lists of parts of speech you choose, write a procedure,
(maddish-libs) that generates a random variant of the original
text. Your code will look something like the following.
(define maddish-libs
(let ([text (extract-words (file->string "my-file.txt"))]
[original-characters '("Frankie" "Jessie" "Spot")]
[new-characters '("Sam" "Prof. J" "Scholar Smythe"
"Professor Smith" "President Smitty"
"Flornnng The Spotted Dinosaur")]
[singular-nouns '("dog" "window" "cat" "ball" "table" "computer")]
[third-person-singular-present-tense-intransitive-verbs
'("play" "fly" "sleep" "cough" "compute" "program" "design")]
[third-person-singular-present-tense-transitive-verbs
'("throw" "defenestrate" "eat")])
(lambda ()
(let* ([new-text-1 (replace-characters original-characters
new-characters
text)]
[new-text-2 (replace-words singular-nouns new-text-1)]
[new-text-3 (replace-words third-person-singular-present-tense-intransitive-verbs new-text-2)]
[new-text-4 (replace-words third-person-singular-present-tense-transitive-verbs new-text-3)])
(reduce (lambda (s1 s2) (string-append s1 " " s2))
new-text-4)))))
Make sure to include at least two sample runs in your submission,
commented out with #| and |#.
In generating some kinds of text, such as those in a previous problem, it is useful to have a large corpus of words in different categories. One set of categories are words with a certain number of syllables.
a. Document and write a procedure, (syllables word), that attempts
to determine how many syllables are in the string word. You can assume
that word consists of only lowercase letters.
How do you decide how many syllables are in a word? You’ll need to find an appropriate algorithm. (Many of you are creative enough to do so on your own, but if you’re struggling, the Internet is an awesome place.)
b. Include some non-trivial examples of when your procedure works well and some of when it fails to work correctly.
c. Make a copy of The Project Gutenberg version of Jane Eyre, available at http://www.gutenberg.org/files/1260/1260-0.txt. Please name it 1260-0.txt and place it in the same directory as your Racket program.
d. Using syllables, filter, and any other procedures you deem
appropriate, generate a syllax that consists of the one-syllable,
two-syllable, three-syllable, four-syllable, five-syllable,
six-syllable, and seven-syllable words in the first thousand or so
lines of Jane Eyre.
e. Use those lists to generate some Haiku. Include examples.
What makes a poem? While there is no requirement that poetry rhyme, many people associate rhyme with poetry. It is also certainly the case that many forms of poetry, such as a quatrain make use of rhyme.
As we think about generating or analyzing text, it may be useful to to be able to identify rhymes. Of course, we appear to be working in the wonderfully inconsistent language known as English, so precise definition of rhymes are difficult.
a. One possible metric for rhyming is the end of the word. Document
and implement a procedure, (might-rhyme? word1
word2), that takes two strings that represent words (e.g., all
lowercase letters plus potential apostrophes) and returns true if
the two words share the last three characters.
Note: Your procedure should return false if either or both of the words has fewer than three characters.
b. Identify at least six pairs of words that do not rhyme, but pass
that test. You might, for example, pick some random words (or the
words from Jane Eyre) and then use filter to look through a
larger list of words to see which seem to rhyme. Include them as
a comment in your submission.
c. Identify at least six pairs of words that do rhyme, but do not pass that test. Include them as a comment in your submission.
d. Describe something you could do to address each of these two cases. That is, how could you avoid the non-rhymes and how could you incorporate the rhymes? Include your analysis as a comment in your submission.
e. Using your additional analysis, write a better (rhymes? word1 word2)
procedure. You are free to make this as simple or as complicated as
you like, provided you incorporate your ideas from d.
f. Using rhymes?, write a procedure, (rhymes-with word words),
that takes a string and a list of strings as input and finds all
of the words in words that appear to rhyme with word. (You
should, of course, document rhymes-with.)
g. Write a procedure (abab words) that takes as input a corpus
(list) of words and returns a string that represents a “random”
quatrain of four lines of four words each. The last words of the first
and third lines must rhyme, as must the last words of the second
and fourth lines.
h. Include a few examples of abab in your submission.
Don’t worry if you can’t find the word “syllax” anywhere, except in strange games. We invented it for this assignment. But the plural of syllax is definitely “syllaxen”.
We will primarily evaluate your work on correctness (does your code compute what it’s supposed to and are your procedure descriptions accurate); clarity (is it easy to tell what your code does and how it acheives its results; is your writing clear and free of jargon); and concision (have you kept your work short and clean, rather than long and rambly). In a few cases, we will also consider the creativity of your result.
[ ] File named correctly (`language.rkt`).
[ ] File "runs" correctly in DrRacket.
[ ] Reformatted code with Ctrl-I before submitting.
[ ] If the program references other files, those files were submitted.
[ ] If the program references other files, it does so with the base file name, rather than a complex path.
[ ] Introductory header comment with name, date, assignment, course, citations.
[ ] 1a. Includes examples.
[ ] 1c. `haiku` procedure is documented.
[ ] 2a. `extract-words` procedure is documented.
[ ] 2b. `dedup` procedure is documented.
[ ] 3a. `replace-characters` procedure is documented.
[ ] 3a. `replace-characters` procedure works for some nontrivial input.
[ ] 3b. `replace-words` procedure is documented.
[ ] 3b. `replace-words` procedure works for some nontrivial input.
[ ] 3c. `maddish-libs` procedure is documented.
[ ] 3c. `maddish-libs` procedure works for some nontrivial input.
[ ] Correctly formatted/indented code.
[ ] Procedures that are supposed to return strings, like `haiku`, return strings rather than using `display`.
[ ] All procedures, including helpers, are documented.
[ ] 1a. `phrase` procedure passes all tests.
[ ] 1b. Includes a record of experiments, including counting of words.
[ ] 1c. `haiku` includes the "\n" at the end of the third line.
[ ] 1d. Includes another syllax.
[ ] 1d. Includes a few examples.
[ ] 2a. `extract-words` appears to meet the basic requirements.
[ ] 2a. Includes at least two tests for `extract-words`
[ ] 2b. `list-contains` appears to meet the specifications.
[ ] 2b. Includes at least two tests for `list-contains`
[ ] 2c. `dedup` meets the specs.
[ ] 2c. Includes at least two tests for `dedup`
[ ] 3a. `replace-characters` meets the specifications.
[ ] 3b. `replace-words` meets the specifications.
[ ] 3c. `maddish-libs` meets the specifications.
[ ] 3c. Includes at least two examples of `maddish-libs`.
[ ] 4a. The `syllables` procedure counts vowel sequences, not just vowels.
[ ] 4b. Includes examples of `syllables` with correct counts.
[ ] 4d. Generates a syllax for _Jane Eyre_ of the requested form.
[ ] 4e. Includes at least two Haiku examples from the _Jane Eyre_ syllax.
[ ] 5a. The `might-rhyme?` procedure accommodates words of fewer than three letters.
[ ] 5d. The narrative provides some suggestion of how to deal with the rhyme problems.
[ ] 5e. `rhymes?` does something different than `might-rhyme?`.
[ ] 5h. Includes examples of `abab`
[ ] Avoids expensive repeated work.
[ ] Clear variable names.
[ ] 2a. `extract-words` goes beyond the basic requirements in at least one way.
[ ] 2a. Includes at least three edge-case tests for `extract-words` and at least five total tests.
[ ] 2b. Includes at least three edge-case tests for `list-contains` and at least five total tests.
[ ] 2c. Includes at least three edge-case tests for `dedup` and at least five total tests.
[ ] 3c. At least five different parts of speech in `maddish-libs`.
[ ] 4a. The `syllables` procedure has at least two extensions to "count vowel sequences".
[ ] 4b. Includes at least one example of syllables with an incorrect count.
[ ] 5e. The new `rhymes?` procedures addresses some of the non-rhyming pairs.
[ ] 5e. The new `rhymes?` procedures addresses some of the rhyming pairs.
[ ] 5e. The new `rhymes?` procedure explains the things it is checking (e.g., with examples)
[ ] 5g. The `abab` procedure ensures that words have rhymes.
[ ] 5g. `abab` procedure avoids significant repetition.
[ ] One of `rhymes?` or `syllables` is particularly well designed.