Skip to main content

The joy of code: Writing a one-of macro

Topics/tags: The joy of code, Racket, digital humanities

One of the many projects we’re considering for our code camp on Code and Language is an exercise in language generation. We’re going to use a fairly simple generative grammar. For example, to say that a simple sentence is a subject, a transitive verb, and an object, we might write something like the following

(define (simple-sentence)
  (combine (subject) (verb-transitive) (object)))

At least to start, we’ll just have them make a list of words for the transitive verb.

(define (verb-transitive)
  (one-of "watches" 
          "throws" 
          "admires" 
          "observes"))

In that form, one-of is relatively easy to write; we just randomly select between the arguments.

(define (one-of . vals)
   (list-ref vals (random (length vals)))))

However, some definitions might involve procedure calls, including potentially expensive procedure calls. Here’s a silly example that shows that we do unnecessary procedure calls.

(define (hack val)
  (display "Processing ... ")
  (display val)
  (newline)
  val)

> (one-of (hack "a") (hack "b") (hack "c"))
Processing ... a
Processing ... b
Processing ... c
"a"

As that example suggests, we did processing for all three possibilities, even though, conceptually, we only needed to process one. The issue gets more complicated in a realistic situation.

(define (noun-phrase)
  (one-of (noun)
          (combine (adjective) (noun-phrase))))

Yeah, that’s recursive. If we try to evaluate the parameters to one-of, it will recurse [1] infinitely. Hence, we need one-of to have a different evaluation strategy than is traditional in Scheme [2]. The normal way to get other evaluation strategies is through macros. I needed to write one-of as a macro. But I don’t write Racket macros nearly often enough. Let’s see what I figured out.

The last macro I wrote took the following form:

(define-syntax NAME
  (lambda (stx)
    (let ([datum (syntax->datum stx)])
      (cond
        [(symbol? datum)
         (datum->syntax stx '(quote <macro:NAME>))]
        [(null? (cdr datum))
         (error "NAME: requires at least one parameter")]
        [else
         (datum->syntax stx
                        CODE)]))))

What’s going on here? As you can tell, we’re using define-syntax instead of define. I realize that Racket has a wide variety of ways to write macros. That’s the one I learned. The parameter is the text of the program, which is of type syntax. To work with it, I’m converting it to a datum (basically, a Scheme value). But someone can use one-of in a variety of ways, not all of which are correct. Let’s see. They could just type the name.

> one-of

If one-of were a procedure, we’d see the output.

#<procedure:one-of>

What I’ve written will generate the following text.

<macro:one-of>

It’s not perfect, but I think it’s close enough for beginning programmers. Another alternative would be to just say bad syntax, or something similar.

What else could they do that is wrong? They could skip the parameters.

> (one-of)

But calling one-of with no parameters makes no sense. One of what? So we issue an error message.

> (one-of)
. . one-of: requires at least one parameter

Now we just have to write the real code. My first time through, I wasn’t thinking, and I just wrote something similar to the procedure version.

         (datum->syntax stx
                        (list-ref (cdr datum) 
                                  (random (length (cdr datum)))))]))))

My initial experiment suggested that it worked fine.

> (one-of (hack "a") (hack "b") (hack "c"))
Processing ... a
"a"
> (one-of (hack "a") (hack "b") (hack "c"))
Processing ... b
"b"
> (one-of (hack "a") (hack "b") (hack "c"))
Processing ... b
"b"
> (one-of (hack "a") (hack "b") (hack "c"))
Processing ... a
"a"
> (one-of (hack "a") (hack "b") (hack "c"))
Processing ... c
"c"

Things looked good. But then I used it in a more realistic situation.

> (define (transitive-verb) (one-of "attacked" "observed" "threw"))
> (transitive-verb)
"observed"
> (transitive-verb)
"observed"
> (transitive-verb)
"observed"
> (transitive-verb)
"observed"

Whoops! I need to delay the evaluation until later. For some reason, my initial inclination was to build a vector of program text [3], select randomly from that at runtime, and then call eval [4].

         (let ([options (list->vector (cdr datum))])
           (datum->syntax stx
                          `(eval (vector-ref ,options
                                             (random (vector-length ,options))))))]))))

I said for some reason, because this is clearly a strange hack. Still, I got to use backquote and comma [5], which made it fun. And it looked like it worked okay.

> (define (transitive-verb) (one-of "attacked" "observed" "threw"))
> (transitive-verb)
"observed"
> (transitive-verb)
"threw"

But eval is dangerous. In particular, there are scoping issues that I hadn’t considered. See, for example, the following not-very-realistic example, in which I’ve added a parameter to transitive-verb to allow an additional choice.

> (define (transitive-verb x)
    (one-of x "attacked" "observed" "threw"))
> (transitive-verb "coded")
"threw"
> (transitive-verb "coded")
"attacked"
> (transitive-verb "coded")
. . ../../../../Applications/Racket v6.5/collects/racket/private/kw.rkt:929:25: x: undefined;
 cannot reference an identifier before its definition

Whoops! Okay, what’s a better way to delay evaluation in situations like this? How does Scheme delay evaluations? Well, I know that evaluation is delayed for if, cond, or, and, case, and similar functions. So I could generate a giant case statement. Let’s see if I can write such code.

(define-syntax one-of
  (lambda (stx)
    (let ([datum (syntax->datum stx)])
      (cond
        [(symbol? datum)
         (datum->syntax stx '(quote <macro:one-of>))]
        [(null? (cdr datum))
         (error "one-of: requires at least one parameter")]
        [else
         (let ([len (length (cdr datum))])
           (let kernel ([options '((else "?????"))]
                        [remaining (reverse (cdr datum))]
                        [num (- len 1)])
             (if (null? remaining)
                 (let ([code (cons 'case
                                   (cons `(random ,len)
                                         options))])
                   ; (write code) (newline)
                   (datum->syntax stx code))
                 (kernel (cons (list (list num) (car remaining))
                               options)
                         (cdr remaining)
                         (- num 1)))))]))))

You’ll note that I have a write in the middle. That’s so that I can see what code I’m generating. I realize that there’s probably a better way to do so, but that’s a useful quick and dirty hack. Here’s an example.

> (define (abc) (one-of (hack "a") (hack "b") (hack "c")))
(case (random 3) ((0) (hack "a")) ((1) (hack "b")) ((2) (hack "c")) (else "?????"))

Yup, that’s the code I want! I’ve included an else clause that should never be called. Why? Because I make it a habit to always included such clauses. I’d rather get not-quite-right output when things go wrong than have my program crash [6]. Let’s see if it works for the strange transitive-verb example from before.

> (define (transitive-verb x)
    (one-of x "attacked" "observed" "threw"))
> (transitive-verb "coded")
"observed"
> (transitive-verb "coded")
"attacked"
> (transitive-verb "coded")
"coded"
> (transitive-verb "coded")
"threw"

I could stop there. However, before I wrote that code, I said to myself Wow, generating a giant case statement is inelegant. There must be a better way. And then I remembered the thing that I should have realized in the first place; thunks are probably the way to go [7].

(define-syntax one-of
  (lambda (stx)
    (let ([datum (syntax->datum stx)])
      (cond
        [(symbol? datum)
         (datum->syntax stx '(quote <macro:one-of>))]
        [(null? (cdr datum))
         (error "one-of: requires at least one parameter")]
        [else
         (let ([len (length (cdr datum))]
               [thunks (map (lambda (exp) `(lambda () ,exp))
                                          (cdr datum))])
           (let ([code `(let ([vec ,(cons 'vector thunks)])
                          ((vector-ref vec (random ,len))))])
             ; (write code) (newline)
             (datum->syntax stx code)))]))))

That was surprisingly hard to get correct. I’m pretty sure that writing generate a case version took half as much time. But the new version feels cleaner, somehow. What code does it generate? Let’s see … [8].

> (define (abc) (one-of (hack "a") (hack "b") (hack "c")))
(let ((vec (vector (lambda () (hack "a")) 
                   (lambda () (hack "b")) 
                   (lambda () (hack "c"))))) 
  ((vector-ref vec (random 3))))

Note that the extra set of parentheses around the call to vector-ref are what tells Scheme to evaluate the thunk. Does it work? Let’s see.

> (abc)
Processing ... a
"a"
> (abc)
Processing ... a
"a"
> (abc)
Processing ... c
"c"

> (define (transitive-verb x)
    (one-of x "attacked" "observed" "threw"))
> (transitive-verb "coded")
"observed"
> (transitive-verb "coded")
"coded"
> (transitive-verb "coded")
"observed"
> (transitive-verb "coded")
"attacked"

It appears to work. Now I can put together a complete example. But that will be a topic of another musing [9].

What have I learned from all of this? I probably shouldn’t code when I’m tired. I should pay more attention to Scheme scoping rules. I need to learn more about macros and pay attention to when things are evaluated. Yeah, that’s enough for now.


[1] One of my colleagues says that I should use recur rather than recurse. I still like using recurse as a verb.

[2] Traditional Scheme evaluation strategy is evaluate the arguments then pass the evaluated arguments to the procedure.

[3] You may recall that we were using lists. vector-ref is much more efficient than list-ref; it seemed worth the extra effort.

[4] eval is one of those wonderful and wonderfully dangerous tools available in Scheme.

[5] For the students reading this, backquote is a lot like quote, except that you can use a comma to force evaluation of some terms.

[6] I’m not sure why I trust that the list-ref will work correctly but I don’t trust that the case will work correctly. I’ll need to reflect on that issue.

[7] A thunk is a zero-parameter procedure.

[8] I’ve reformatted for clarity.

[9] For those who want to see a fun recursive example written in a language other than Racket, check out example 7 on the Tracery Tutorial.


Version 1.0 of 2018-07-03.