Mini-Project 5: Language generation

Disclaimer: In this assignment, you will read and generate some utterly horrible “poetry” and other text. Please accept my apologies if any of the work included herein offends your expectations, cultural or otherwise.

Note: In doing this mini-project, you will likely want to refer back to the lab on random language generation.

Please name your file language-generation.rkt. (Note that the file name is a link to starter code for this mini-project.)

Background

A syllabic lexicon, or syllex for short, is a collection of words or short phrases arranged by syllables. For the purposes of this class, a syllex is a list of lists, where the list at index i contains only the words with i syllables.

For example, here is a syllex for some words related to CSC-151.

(define csc151-syllex
  (list
   ; 0
   (list)
   ; 1
   (list "cons" "car" "list" "pair" "Scheme" "sort" "match" "string"
         "lab" "map" "fold" "test")
   ; 2
   (list "cadr" "cdr" "Racket" "jelly" "sandwich" "syllex"
         "image" "recurse" "eboard" "data" "compose" "lambda" "section"
         "SoLA" "MP")
   ; 3
   (list "recursion" "computer" "digital" "confusing" "programming"
         "CSC" "abstraction" "decompose" "document" "abstraction"
         "boolean" "binary")
   ; 4
   (list "humanities" "exponential" "collaborate" "one-fifty-one"
         "algorithm" "DrRacket" "dictionary" "generalize" "defenestrate")
   ; 5
   (list "collaborative" "experiential" "decomposition" "generality"
         "defenestration")
   ; 6
   (list)
   ; 7
   (list "triskaidekaphobia")))

Here’s another syllex for some words related to Grinnell College.

(define grinnell-syllex
  (list
   ; 0
   (list)
   ; 1
   (list "Mears" "Noyce" "husk" "train" "corn" "black" "HSSC")
   ; 2
   (list "self-gov" "Stonewall" "The Bear" "first-year" "scarlet"
         "remote" "Webex" "prairie" "need-blind" "soybeans" "Hopkins"
         "Younker" "Dibble" "cluster" "scurry" "Gremlin")
   ; 3
   (list "liberal" "JRC" "CLS" "advisor" "FYE" "laurel leaf" "Honor G"
         "ARH" "North Campus" "Iowa" "semester" "Women's quad"
         "Grinnellian" "pie-on-ears")
   ; 4
   (list "curriculum" "Mary B. James" "Tutorial" "Green alien"
         "convocation" "education")
   ; 5
   (list "exploratory")
   ; 6
   (list "Congregationalist")))
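
To make the indexing concrete, here is a minimal sketch using a tiny syllex. The names tiny-syllex, two-syllable-words, and max-syllables are made up for illustration and are not part of the starter code.

```racket
#lang racket
;; A tiny syllex (hypothetical, for illustration only).
(define tiny-syllex
  (list (list)                      ; 0 syllables
        (list "cons" "car")         ; 1 syllable
        (list "lambda" "Racket")    ; 2 syllables
        (list "recursion")))        ; 3 syllables

;; The words with exactly two syllables live at index 2.
(define two-syllable-words (list-ref tiny-syllex 2))

;; The largest syllable count the syllex can hold is one less than its length.
(define max-syllables (- (length tiny-syllex) 1))
```

Because index 0 is always present (and empty), a syllex with four lists holds words of at most three syllables.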

Helpful procedures

In doing this assignment, you may find the following procedures of use. You need not cite them; since you’re doing this assignment, we’ll assume that you’re taking ideas from the assignment.

;;; (random-list-element lst) -> any?
;;;   lst : list? (nonempty)
;;; Randomly select an element of `lst`
(define random-list-element
  (lambda (lst)
    (list-ref lst (random (length lst)))))

;;; (list-of-strings? val) -> boolean?
;;;   val : any?
;;; Determines if val is a list of strings.
(define list-of-strings?
  (lambda (val)
    (and (list? val)
         (andmap string? val))))

Part the first: Generating Haiku

Haiku are three-line poems that consist of a line with five syllables, a line with seven syllables, and a line with five syllables.

a. Our principles of decomposition suggest that we should begin by writing a procedure or procedures to generate lines for the Haiku. Our principles of generality suggest that those procedures should accommodate not just five or seven syllables, but an arbitrary number of syllables.

Document, design, and implement a recursive procedure, (phrase n syllex) that randomly generates a string that contains a phrase of n syllables given a syllex of the form above.

Make sure that you accommodate different “patterns” of n syllables. For example, if n is 4, you should randomly choose between

  • a one-syllable word followed by a three-syllable phrase;
  • a two-syllable word followed by a two-syllable phrase;
  • a three-syllable word followed by a one-syllable phrase (that is, a one-syllable word); and
  • a four-syllable word.

Note that there are some complications involved. For example, if n is bigger than the maximum number of syllables in the syllex, the phrase cannot consist of a single word.

Note: (random start finish) generates a random integer between start (inclusive) and finish (exclusive).
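
Since the upper bound is exclusive, it is easy to be off by one. The sketch below assumes a hypothetical helper, random-inclusive, that is not part of the assignment; it adds one to the upper bound so that both ends can be chosen.

```racket
#lang racket
;;; (random-inclusive lo hi) -> integer?
;;;   lo : integer?
;;;   hi : integer? (at least lo)
;;; Hypothetical helper: randomly select an integer between lo and hi,
;;; including both lo and hi.
(define random-inclusive
  (lambda (lo hi)
    (random lo (+ hi 1))))

;; (random 1 4) can produce 1, 2, or 3, but never 4.
;; (random-inclusive 1 4) can produce 1, 2, 3, or 4.
(define sample-roll (random-inclusive 1 4))
```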

Note: You may assume that every syllex has at least one one-syllable word. (Without this guarantee, it can be much harder to simplify the problem.)

Note: The csc151 syllex has no six-syllable words. You must address this issue to get an E. For an M, you can assume that all the lists in the syllexen (except the zero-syllable entry) contain words.

Hint: You will want to generate a random number between 1 (there are no zero-syllable words) and either n or the largest number of syllables in the syllex, whichever is smaller.

Hint: You will want to employ recursion.

b. It is difficult to write tests for random procedures, so you will likely have to conduct experiments instead. Please include a record of your experiments and your analyses of the results of those experiments. (E.g., you should check that the number of syllables is correct.)

> (phrase 7 csc151-syllex)
"CSC binary Scheme"             ; 3 + 3 + 1 = 7
> (phrase 7 csc151-syllex)
"recursion tail recursion"      ; 3 + 1 + 3 = 7 (are we only getting 3/1?)
> (phrase 7 csc151-syllex)
"match recurse one-fifty-one"   ; 1 + 2 + 4 = 7
> (phrase 7 csc151-syllex)
"triskaidekaphobia"             ; 7 (it's good to know we can get all seven)
> (phrase 7 csc151-syllex)
"pair collaborative Scheme"     ; 1 + 5 + 1 = 7
> (phrase 7 csc151-syllex)
"confusing decompose pair"      ; 3 + 3 + 1 = 7

c. Document and write a procedure, (haiku syllex), that takes as input a syllex of the form above and returns a string of the appropriate form, with each line terminated by the strange value "\n", which represents a newline.

> (haiku csc151-syllex)
"boolean Scheme list\ntriskaidekaphobia\ncdr fold sandwich\n"
> (haiku csc151-syllex)
"boolean data\nbinary recursion lab\nDrRacket pair\n"
> (haiku csc151-syllex)
"experiential\ntriskaidekaphobia\nabstraction MP\n"
> (haiku csc151-syllex)
"list MP image\nsyllex test jelly SoLA\nvector match lambda\n"
> (haiku csc151-syllex)
"decomposition\ncollaborative string sort\nScheme test cadr match\n"

> (display "boolean Scheme list\ntriskaidekaphobia\ncdr fold sandwich\n")
boolean Scheme list
triskaidekaphobia
cdr fold sandwich

> (display (haiku grinnell-syllex))
Mary B. James husk
education corn The Bear
remote Mears husk Noyce

> (display (string-append (haiku grinnell-syllex) "\n" (haiku grinnell-syllex)))
corn education
laurel leaf Webex scarlet
train advisor husk

JRC Noyce Noyce
Younker Tutorial train
corn Mary B. James

> (display (haiku csc151-syllex))
recurse binary
generalize binary
experiential

As you have observed, these are not particularly good examples of Haiku. But you might find that some generate something interesting enough that you’d want to adapt them. For example, I love the “cdr fold sandwich”; perhaps you do, too. And that last Haiku isn’t bad.
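
When experimenting, it can help to look at the lines of a Haiku one at a time. The following sketch is only one possible approach; it uses Racket's string-split on one of the sample outputs above, and the names sample-haiku, haiku-lines, and line-count are invented for this example.

```racket
#lang racket
;; One of the sample results shown above.
(define sample-haiku
  "boolean Scheme list\ntriskaidekaphobia\ncdr fold sandwich\n")

;; Splitting at each newline gives the lines as a list of strings.
(define haiku-lines (string-split sample-haiku "\n"))

;; A Haiku should have exactly three lines.
(define line-count (length haiku-lines))
```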

d. Create an additional syllex of your choice to use in addition to the syllexen above. It should be named my-syllex. Then provide a few sample Haiku generated by your procedure and your syllex.

Part the second: Randomized replacement

At times, it may be valuable to be able to “randomly” change some values in a list. In this part of the mini-project, we will build some procedures that will help with that.

a. Document, write tests for, and write a procedure, (replace-all old new lst), that takes two values and a list as parameters and replaces all instances of old by new.

> (replace-all 'a 'b '(a b c a d))
'(b b c b d)

b. Document, write tests for, and write a procedure, (randomly-replace-consistently val options lst), that takes a value and two lists as parameters, randomly picks one value from the options list, and then replaces every instance of val in lst with that random choice.

> (randomly-replace-consistently 'a '(x y z) '(a b a b a))
'(x b x b x)
> (randomly-replace-consistently 'a '(x y z) '(a b a b a))
'(y b y b y)
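
Tests for random procedures cannot compare against one fixed answer, but they can check a property that every answer must have. Here is a rough sketch of that idea, applied to random-list-element from the helpful procedures. It assumes the rackunit library that ships with Racket; your course library may provide different testing forms.

```racket
#lang racket
(require rackunit)

;; random-list-element, as given in the "Helpful procedures" section.
(define random-list-element
  (lambda (lst)
    (list-ref lst (random (length lst)))))

;; We cannot predict the exact result, but we can check a property of it:
;; whatever we get must be one of the options.
(define options '(x y z))
(check-not-false (member (random-list-element options) options))

;; Repeating the check many times makes a lucky pass less likely.
(for ([i (in-range 100)])
  (check-not-false (member (random-list-element options) options)))
```

A similar property for randomly-replace-consistently might be that every position that originally held val now holds the same element of options.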

c. Document, write tests for, and write a procedure, (randomly-replace-variably val options lst), that takes a value and two lists as parameters, and randomly replaces each instance of val with a randomly chosen element of options. Unlike randomly-replace-consistently, which makes only one random choice, randomly-replace-variably makes a new choice each time the value appears.

> (randomly-replace-variably 'a '(x y z) '(a b a b a))
'(z b x b x)
> (randomly-replace-variably 'a '(x y z) '(a b a b a))
'(y b z b x)

Part the third: Slightly Angry Libs

As you may know, Mad Libs are a type of game in which we work with a piece of text with blanks in which each blank is marked with a particular part of speech. You play Mad Libs by gathering those parts of speech, filling in the blanks, and reading the text aloud. One way computers help us with this game is that they can fill in the blanks randomly, as we saw in a recent lab.

But we can also do a bit more. Instead of using predefined Mad Libs, we can start with “normal” text, identify words of a particular part of speech in that text, choose replacements of the same part of speech from a list, and substitute them into the text.

For example, suppose we have the following text:

Look Frankie! See Jessie play! See Jessie play with Spot. Spot is a wonderful dog. See Jessie throw a ball with Spot. Frankie wants to play, too. Will Jessie play with Frankie?

First, we might select a new set of characters from amongst a larger set, and substitute in those new characters. Our original characters are '("Frankie" "Jessie" "Spot"). Our larger set is '("Just Plain Smitty" "Scholar Smythe" "The Smith" "President Smitty" "Prof. Smith"). As you can tell, we’re not feeling particularly creative today. You should do better. After replacing, we get the following.

Look Prof. Smith! See Scholar Smythe play! See Scholar Smythe play with President Smitty. President Smitty is a wonderful dog. See Scholar Smythe throw a ball with President Smitty. Prof. Smith wants to play, too. Will Scholar Smythe play with Prof. Smith?

Next, we might replace the nouns with new nouns. Our current list of nouns is '("ball" "dog"). Our updated list of nouns is '("ball" "computer" "dog" "table" "window").

Look Prof. Smith! See Scholar Smythe play! See Scholar Smythe play with President Smitty. President Smitty is a wonderful window. See Scholar Smythe throw a table with President Smitty. Prof. Smith wants to play, too. Will Scholar Smythe play with Prof. Smith?

We might then replace some of the third-person, singular present-tense, intransitive verbs with other verbs in the same form. Our corpus is currently relatively limited: '("play" "dance" "compute" "sleep" "cough" "program").

Look Prof. Smith! See Scholar Smythe sleep! See Scholar Smythe dance with President Smitty. President Smitty is a wonderful window. See Scholar Smythe throw a table with President Smitty. Prof. Smith wants to program, too. Will Scholar Smythe cough with Prof. Smith?

Wow! Things are going downhill fast. But the potential for something interesting is there. Perhaps.

Now we’re ready to build our program. We’ll start with three more helper procedures.

a. Document and write a procedure, (replace-characters original-characters new-characters words), that takes three lists of strings as parameters. For each string in the first list (original-characters), it chooses one random element of the second list (new-characters) and then replaces every occurrence of that original character in the last list (words) with the chosen new character. As you might guess, this represents the first step in the example above. Note that “character” here means “person in the story”, not “letter in a string”. Isn’t English wonderfully confusing?

Make sure that each character is replaced consistently. For example, all instances of Frankie were replaced by “Prof. Smith” in the sample mad libs above.

For an E, you should make sure that each character gets a unique replacement (e.g., we would not want both Frankie and Jessie replaced by Scholar Smythe). For an M, you need not ensure unique replacement.

You may assume that original-characters and new-characters are distinct. That is, no string in original-characters also appears in new-characters.

You may also assume that the length of new-characters is at least as large as the length of original-characters.

b. Document and write a procedure, (replace-words category words), that takes two lists of strings as parameters and, each time one of the words in the first list appears in the second list, replaces the word with a randomly selected element from the first list. (The replacement might be the same word.) As you might guess, this represents the steps of replacing both nouns and third-person, singular present-tense, intransitive verbs.

Hint: You will likely want to recurse over words, just as you have done in many of the procedures in part 2.

c. Identify a piece of public domain text of about two paragraphs. (Alternately, write your own.) Using that text, as well as a list of characters in the text, a list of new character names, and at least three lists of parts of speech you choose, write a procedure, (maddish-libs) that generates a random variant of the original text. Your code will look something like the following.

(define maddish-libs
  (let ([text (split-string (file->string "my-file.txt"))]
        [original-characters '("Frankie" "Jessie" "Spot")]
        [new-characters '("Sam" "Prof. J"  "Scholar Smythe"
                          "Professor Smith" "President Smitty"
                          "Flornnng The Spotted Dinosaur")]
        [singular-nouns '("dog" "window" "cat" "ball" "table" "computer")]
        [third-person-singular-present-tense-intransitive-verbs
         '("play" "fly" "sleep" "cough" "compute" "program" "design")]
        [third-person-singular-present-tense-transitive-verbs
         '("throw" "defenestrate" "eat")])
    (lambda ()
      (let* ([new-text-1 (replace-characters original-characters
                                             new-characters
                                             text)]
             [new-text-2 (replace-words singular-nouns new-text-1)]
             [new-text-3 (replace-words third-person-singular-present-tense-intransitive-verbs new-text-2)]
             [new-text-4 (replace-words third-person-singular-present-tense-transitive-verbs new-text-3)])
        (reduce string-append
                new-text-4)))))

Note: Although you may certainly use this code as the starting point, please replace the sample lists and use a different piece of starting text. (The latter change should lead to the former.)

d. Make sure to include at least two sample runs in your submission, commented out with #| and |#.

You’ll note that the sample maddish-libs requires a procedure called split-string, which is not the same as string-split. Here it is.

;;; (split-string str) -> listof string?
;;;   str : string?
;;; Split a string into a list of alternating "words" and "separators".
(define split-string
  (let* ([word-chars "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-'"]
         [word (rex-repeat (rex-char-set word-chars))]
         [separator (rex-repeat (rex-char-antiset word-chars))])
    (lambda (str)
      (rex-find-matches (rex-any-of word separator) str))))

The procedure is useful because it keeps all the parts of the string, but separates words from non-words.
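
Because no characters are discarded, joining the pieces in order gives back the original text. The sketch below writes out by hand the list we would expect split-string to produce for a short made-up sentence, and uses apply with string-append from standard Racket as a stand-in for the reduce used in maddish-libs. The names pieces and rebuilt are invented for this example.

```racket
#lang racket
;; The pieces we would expect from (split-string "See Spot run!"),
;; written out by hand so that this example does not need the rex procedures.
(define pieces (list "See" " " "Spot" " " "run" "!"))

;; Joining the pieces in order restores the original text.
(define rebuilt (apply string-append pieces))
```

The longer example that follows shows the same alternation of words and separators.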

> (split-string "Look Frankie!  See Jessie play!  See Jessie play with Spot.
Spot is a wonderful dog.  See Jessie throw a ball with Spot.  Frankie
wants to play, too.  Will Jessie play with Frankie?")
'("Look"
  " "
  "Frankie"
  "!  "
  "See"
  " "
  "Jessie"
  " "
  "play"
  "!  "
  "See"
  " "
  "Jessie"
  " "
  "play"
  " "
  "with"
  " "
  "Spot"
  ".\n"
  ; Many lines elided
  "Frankie"
  "?")

Part the fourth: Identifying syllables

In generating some kinds of text, such as those in a previous problem, it is useful to have a large corpus of words in different categories. One way to categorize words is by the number of syllables they contain.

a. Document and write a procedure, (syllables word), that attempts to determine how many syllables are in the string word. You can assume that word consists of only lowercase letters.

How do you decide how many syllables are in a word? A starting point is to count sequences of vowels. But you’ll want to find a better algorithm. (Many of you are creative enough to do so on your own, but if you’re struggling, the Internet is an awesome place. Just make sure to cite your sources.)

b. Include some non-trivial examples of when your procedure works well and some of when it fails to work correctly.

c. Make a copy of The Project Gutenberg version of Jane Eyre, available at http://www.gutenberg.org/files/1260/1260-0.txt. Please name it 1260-0.txt and place it in the same directory as your Racket program.

d. Using syllables, filter, and any other procedures you deem appropriate, generate a syllex that consists of the one-syllable, two-syllable, three-syllable, four-syllable, five-syllable, six-syllable, and seven-syllable words in the first thousand or so lines of Jane Eyre. Name it jane-eyre-syllex.

e. Use those lists to generate some Haiku. Include examples, commented out with #| and |#.

Postscript

Don’t worry if you can’t find the word “syllex” anywhere, except in strange games. We invented it for this assignment. But the plural of syllex is definitely “syllexen”.

Evaluation

We will primarily evaluate your work on correctness (does your code compute what it’s supposed to and are your procedure descriptions accurate); clarity (is it easy to tell what your code does and how it achieves its results; is your writing clear and free of jargon); and concision (have you kept your work short and clean, rather than long and rambly). In a few cases, we will also consider the creativity of your result.

Rubric

This rubric is close to final, but may still change slightly.

General

[ ] Passes all the one-star tests (*)
[ ] File named correctly (`language-generation.rkt`) (*)
[ ] File "runs" correctly in DrRacket (*)
[ ] Reformatted code with Ctrl-I before submitting (*)
[ ] If the program references other files, those files were submitted (*)
[ ] If the program references other files, it does so with the base file name, rather than a complex path (*)
[ ] Introductory header comment with name, date, assignment, course, citations (*)
[ ] All assigned procedures are documented (*)

[ ] Passes all the two-star tests (**)
[ ] All procedures are documented, including helpers (**)
[ ] Code is generally formatted/indented correctly (**)

[ ] Passes all the three-star tests (***)
[ ] Almost no stylistic issues (***)
[ ] Avoids expensive repeated work (***)
[ ] Clear variable names (***)

Part One: Haiku

[ ] `phrase` works for phrases of up to six words, with full syllexen (*)
[ ] `phrase` returns a string (*)
[ ] `haiku` returns a string (*)

[ ] `phrase` works for any number of syllables with a full syllex (**)
[ ] `haiku` includes three newline characters, one after each line (**)
[ ] `haiku` has the correct 5/7/5 pattern (**)
[ ] Includes a record of experiments, including counting of words (**)
[ ] Includes another syllex (**)
[ ] Includes example output from the `haiku` procedure (**)

[ ] `phrase` works with syllexen with blank entries (other than at position 1), such as the `csc151` syllex. (***)

Part Two: Utilities

[ ] `replace-all` returns a list (*)
[ ] `randomly-replace-consistently` returns a list (*)
[ ] `randomly-replace-variably` returns a list (*)

[ ] `replace-all` appears to work on most inputs, including the empty list (**)
[ ] `randomly-replace-consistently` appears to work on most inputs, including the empty list (**)
[ ] `randomly-replace-variably` appears to work on most inputs, including the empty list (**)

[ ] At least four new tests for `replace-all` (**)
[ ] At least two new tests for `randomly-replace-consistently` (**)
[ ] At least two new tests for `randomly-replace-variably` (**)

Part Three: Slightly Angry Libs

[ ] `maddish-libs` returns a string (*)

[ ] `replace-characters` appears to behave correctly (**)
[ ] `replace-words` appears to behave correctly (**)
[ ] Includes at least two examples of `maddish-libs` (**)

[ ] Uses at least five different parts of speech in `maddish-libs` (***)
[ ] `replace-characters` ensures that character replacement is unique (***)

Part Four: Syllabification

[ ] The `syllables` procedure returns an integer (*)

[ ] The `syllables` procedure counts vowel sequences, not just vowels (**)
[ ] Includes examples of `syllables` with correct counts (**)

[ ] The `syllables` procedure has at least two extensions beyond "count vowel sequences" (***)
[ ] Generates a syllex for _Jane Eyre_ of the requested form (***)
[ ] Includes at least two Haiku examples from the _Jane Eyre_ syllex (***)