In general, the exams were awful. While most of you did at least one question nearly perfectly (which question it was varied widely), many of you also missed some questions completely. As a makeup, I will give two one-hour, closed-book, one-problem exams while I am away. Each makeup exam can replace one problem on your exam (and you may decide which problem).
On a more casual note, please remember that the word ``grammar'' has no ``e''s in it. If anyone misspells ``grammar'' again, I will penalize you severely.
In the abstract, we can think of an HTML file as containing a sequence of two ``types'' of tokens: tags and chunks of text. We might describe tags (at least a simplified version of tags) as:
A tag is a sequence of characters beginning with a less-than sign, <, and ending with a greater-than sign, >. The first greater-than sign that appears after the opening less-than sign closes the tag, unless that greater-than sign appears within a quoted string. Quoted strings begin with either a single quote, ', or a double quote, ", and must end with the same symbol. A single-quoted string cannot contain single quotes and a double-quoted string cannot contain double quotes. Single quotes can appear within double-quoted strings and double quotes can appear within single-quoted strings.
A chunk is any nonempty sequence of characters that's not a tag.
Write a lex, flex, or Jlex program that describes tags and chunks.
Some of you assumed that quoted strings had to appear within the tag. However, the definition does say ``The first greater-than sign that appears after the opening less-than sign closes the tag, unless [...]'', suggesting that some characters don't appear in quoted strings.
The ``any nonempty sequence of characters that's not a tag'' was somewhat ambiguous. For example, is ``<a'' a tag? It's really an error, rather than a tag or a chunk. In general, a chunk is anything not containing a less-than sign.
A few of you decided to do some error checking for problems like the one noted above. You received a small amount of extra credit for error checking.
Surprisingly, the request to write a program that ``describes'' tags and chunks was also ambiguous. Traditionally, a lex program identifies the nonterminals (in this case, tag and chunk) and returns a note that it's found such a symbol. The program should also ensure that the corresponding input string is stored in yytext. Many of you ignored this component and instead printed out some message. I did not penalize you for reasonable alternatives.
This is approximate lex syntax. I cared more about the ability to describe tokens than about the lex stuff.
\<((\"[^\"]*\")|(\'[^\']*\')|([^<>\'\"]))*\>   { return TOKEN_TAG; }
[^<]+                                          { return TOKEN_CHUNK; }
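For comparison (my own addition, not part of the original answer), the same token definitions can be sketched in Python with the re module. Lex's longest-match discipline is approximated by trying the tag pattern first at each position; since a chunk can never begin with a less-than sign, the two patterns never compete.

```python
import re

# The tag pattern mirrors the lex rule above: inside the angle brackets we
# allow quoted strings and any character other than <, >, ', ".
TAG = re.compile(r"<((\"[^\"]*\")|('[^']*')|([^<>'\"]))*>")
CHUNK = re.compile(r"[^<]+")

def tokenize(text):
    """Split text into ('TAG', ...) and ('CHUNK', ...) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TAG.match(text, pos) or CHUNK.match(text, pos)
        if m is None:                 # e.g. a stray '<' that never closes
            raise ValueError(f"lexical error at position {pos}")
        kind = 'TAG' if m.re is TAG else 'CHUNK'
        tokens.append((kind, m.group(0)))
        pos = m.end()
    return tokens

# The > inside the double-quoted string does not close the tag.
assert tokenize('<a href="x>y">text</a>') == \
    [('TAG', '<a href="x>y">'), ('CHUNK', 'text'), ('TAG', '</a>')]
```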
Some of you chose to take advantage of the ``contextual expressions'', starting and ending tags. However, this does make it more difficult to maintain the string for the token. In the sample code below, I've used startTag to begin the string for a tag, addToTag to add more text to it, and getTag to get that string.
%START TAG
%{
  Code for startTag, addToTag, getTag
  Code for TOKEN_TAG and TOKEN_CHUNK
%}
%%
\<              { startTag(); addToTag("<"); BEGIN TAG; }
<TAG>">"        { addToTag(yytext); yytext = getTag();
                  BEGIN INITIAL; return TOKEN_TAG; }
<TAG>\"[^\"]*\" { addToTag(yytext); }
<TAG>\'[^\']*\' { addToTag(yytext); }
<TAG>.          { addToTag(yytext); }
[^<]+           { return TOKEN_CHUNK; }
%%
Note that the final rule for TAG takes advantage of lex's ``the first matching rule wins'' strategy. However, it does mean that <"> is treated as a tag, which is incorrect. The correct line should be

<TAG>[^\'\"]    { addToTag(yytext); }
It is also possible to use contexts for ``in a string'' as in the following version.
\<                { startTag(); addToTag("<"); BEGIN TAG; }
<TAG>">"          { addToTag(yytext); yytext = getTag();
                    BEGIN INITIAL; return TOKEN_TAG; }
<TAG>\"           { addToTag("\""); BEGIN DOUBLEQUOTE; }
<TAG>\'           { addToTag("'"); BEGIN SINGLEQUOTE; }
<TAG>[^\'\"]      { addToTag(yytext); }
<SINGLEQUOTE>\'   { addToTag(yytext); BEGIN TAG; }
<SINGLEQUOTE>.    { addToTag(yytext); }
<DOUBLEQUOTE>\"   { addToTag(yytext); BEGIN TAG; }
<DOUBLEQUOTE>.    { addToTag(yytext); }
[^<]+             { return TOKEN_CHUNK; }
In this case, it is valid to use the dot, since the previous pattern is a single character rather than a full string.
Some of you treated single characters as chunks. This means that a single character, rather than a ``sequence of characters'', is returned.
A surprising number of you forgot about the ``negation set'', leading to overly complicated grammars. Others of you assumed that you could negate regular expressions, which is not legal.
In this problem, you will build a variety of representations for the language ``strings of a's and b's beginning with an a and ending with a b.''
Build a regular expression for the language of ``strings of a's and b's beginning with an a and ending with a b.''
a([ab]*)b
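A quick sanity check with Python's re module (my own addition, not part of the original answer) confirms that the expression, anchored at both ends, accepts exactly the intended strings:

```python
import re

# The answer's expression; the grouping parentheses are not needed in
# Python, so a[ab]*b is equivalent. fullmatch anchors at both ends.
pattern = re.compile(r"a[ab]*b")

def matches(s):
    return pattern.fullmatch(s) is not None

assert matches("ab") and matches("aab") and matches("abab")
assert not matches("") and not matches("a")
assert not matches("ba") and not matches("abba")   # must end with b
```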
Build an optimal deterministic finite automaton for the language of ``strings of a's and b's beginning with an a and ending with a b.'' You can use the formal translation from regular expressions to finite automata or you can build the DFA by hand.
I've done the optimization by hand and then represented the automaton textually, rather than graphically. Basically, the first a puts you in an intermediate state, and a b in that state puts you in a final state. An a in the final state brings you back to the intermediate state. (A b before any a leads to an implicit dead state, which I haven't shown.)
Alphabet: { a b }
Initial state: q0
Final states: { q2 }
State | Symbol | New-State |
q0 | a | q1 |
q1 | a | q1 |
q1 | b | q2 |
q2 | a | q1 |
q2 | b | q2 |
Here's an ASCII-graphic to represent that automaton.
           a            b
         +----+       +----+
     a   v    |   b   v    |
---> (0) --> (1) --+--> (2) --+
              ^               |
              +---------------+
                      a
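The transition table can be executed directly. Here is a small Python sketch (my own addition, not part of the original answer) that encodes it as a dictionary; missing transitions model the implicit dead state:

```python
# Transition function of the DFA, copied from the table above.
DELTA = {
    ('q0', 'a'): 'q1',
    ('q1', 'a'): 'q1',
    ('q1', 'b'): 'q2',
    ('q2', 'a'): 'q1',
    ('q2', 'b'): 'q2',
}
FINAL = {'q2'}

def accepts(s):
    state = 'q0'
    for ch in s:
        state = DELTA.get((state, ch))
        if state is None:          # dead state: no way to accept
            return False
    return state in FINAL

assert accepts("ab") and accepts("aabab")
assert not accepts("ba") and not accepts("a") and not accepts("")
```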
Write a context-free grammar for the language of ``strings of a's and b's beginning with an a and ending with a b.''
This should be somewhat straightforward. I've used E for ``any sequence of a's and b's''.
S ::= a E b
E ::= a E | b E | epsilon
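As a mechanical sanity check (my own addition, not part of the original answer), this grammar and the earlier regular expression describe the same strings. Since E derives any string of a's and b's, a string is derivable from S exactly when it starts with a, ends with b, and has length at least two:

```python
import itertools
import re

def in_grammar(s):
    """Membership in L(S) for S ::= a E b, E ::= a E | b E | epsilon."""
    return len(s) >= 2 and s[0] == 'a' and s[-1] == 'b'

regex = re.compile(r"a[ab]*b")

# Exhaustively compare against the regular expression on all short strings.
for n in range(6):
    for tup in itertools.product("ab", repeat=n):
        s = "".join(tup)
        assert in_grammar(s) == (regex.fullmatch(s) is not None)
```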
Build a predictive parsing table for the language of ``strings of a's and b's beginning with an a and ending with a b.'' It is likely that you will want to use one of the formal mechanisms for converting a context-free grammar to a predictive parser. You may find that you need to modify the context-free grammar from the previous step in order to build the parsing table.
To build a predictive parser, we must first build the first and follow tables. Note that I've used the end-of-string marker, $, as the follow symbol of the start symbol.

Symbol   First   Follow   Nullable
S        a       $
E        a,b     b        yes
Now, recall that a predictive parsing table has rows indexed by nonterminals and columns indexed by terminals. An entry in the table is a production that should be executed when one is trying to match the nonterminal and sees the particular symbol.
If N ::= alpha, then for each element s of first(alpha), we put N ::= alpha at (N,s). If N ::= alpha and nullable(alpha), then for each element s of follow(N), we put N ::= alpha at (N,s).
For convenience, I'll just write the right hand sides of the production in the table.
     a        b               $
S    a E b
E    a E      b E, epsilon
Whoops! There are two entries in one position in the table. This means that this particular grammar cannot be parsed by an LL(1) parser. What can we do? We can try an LL(2) parser or we can change the grammar.
Here's an LL(2) parser that works.

     aa       ab       ba     bb     a$     b$
S    a E b    a E b
E    a E      a E      b E    b E           epsilon
A small change, but it makes a big difference. We now know that when the remaining input is b followed by the end-of-string mark, we replace E with the empty string.
It is somewhat more difficult to produce an appropriate grammar. Here is one I came up with. The NeedsB nonterminal indicates that we still need a sequence of a's and b's ending in b. The Bs nonterminal is used for a sequence of 0 or more b's.
S      ::= a NeedsB
NeedsB ::= a NeedsB      // If you see an a, you still need a b
NeedsB ::= b Bs Extra    // A sequence of b's followed by ...
Extra  ::= epsilon       // (1) The empty string or
Extra  ::= a NeedsB      // (2) An a and something ending in b
Bs     ::= b Bs
Bs     ::= epsilon
Now, we can build the helper tables.
Symbol   First   Follow   Nullable
S        a       $
NeedsB   a,b     $
Extra    a       $        yes
Bs       b       a,$      yes
And finally the parse table.
         a           b            $
S        a NeedsB
NeedsB   a NeedsB    b Bs Extra
Extra    a NeedsB                 epsilon
Bs       epsilon     b Bs         epsilon
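The finished table can be driven by the standard predictive-parsing loop. Here is a Python sketch (my own illustration, not part of the original answer); the stack-based driver is the textbook algorithm, and the table entries are copied from the table above:

```python
# Each entry maps (nonterminal, lookahead) to the right-hand side to push.
TABLE = {
    ('S', 'a'):      ['a', 'NeedsB'],
    ('NeedsB', 'a'): ['a', 'NeedsB'],
    ('NeedsB', 'b'): ['b', 'Bs', 'Extra'],
    ('Extra', 'a'):  ['a', 'NeedsB'],
    ('Extra', '$'):  [],               # epsilon
    ('Bs', 'a'):     [],               # epsilon
    ('Bs', 'b'):     ['b', 'Bs'],
    ('Bs', '$'):     [],               # epsilon
}
NONTERMINALS = {'S', 'NeedsB', 'Extra', 'Bs'}

def ll_parse(s):
    tokens = list(s) + ['$']
    stack = ['$', 'S']                 # top of stack is the end of the list
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:            # empty table entry: reject
                return False
            stack.extend(reversed(rhs))
        elif top == look:              # terminal: match and advance
            pos += 1
        else:
            return False
    return pos == len(tokens)

assert ll_parse("ab") and ll_parse("abb") and ll_parse("abab")
assert not ll_parse("ba") and not ll_parse("a") and not ll_parse("")
```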
Some of you suggested that it might be better to build a shift-reduce parser. However, this is also difficult.
We start by building an LR(0) automaton, with hopes of doing LR(0) or SLR(1) matching.
s0: S' ::= * S $
    S  ::= * a E b
s1: S' ::= S * $
s2: S  ::= a * E b
    E  ::= * a E
    E  ::= * b E
    E  ::= *
Whoops! We can already tell that this won't work because we can't decide between shifting a b and reducing to epsilon.
Let's try an LR(1) automaton.
s0: S' ::= * S $ , ?
    S  ::= * a E b , $
s1: S' ::= S * $ , ?
s2: S  ::= a * E b , $
    E  ::= * a E , b
    E  ::= * b E , b
    E  ::= * , b
Whoops! We still have the same problem. We don't know whether to shift or to reduce upon seeing a b in state 2. Let's try rewriting the grammar slightly.
r0: S ::= a E b
r1: E ::= E a
r2: E ::= E b
r3: E ::= epsilon
Now, let's try building an LR(0) automaton.
s0: S' ::= * S $
    S  ::= * a E b
s1: S' ::= S * $
s2: S  ::= a * E b
    E  ::= * E a
    E  ::= * E b
    E  ::= *
So far, so good. We can reduce to epsilon when we see an a or a b. A little bit odd, but let's see what else it gives us.
s3: S ::= a E * b
    E ::= E * a
    E ::= E * b
s4: E ::= E a *
s5: S ::= a E b *
    E ::= E b *
Now, there seems to be a reduce-reduce conflict in state s5. However, you'll note that follow(S) includes only $ and follow(E) includes only a and b. Hence, we can successfully build an SLR(1) automaton.
      a     b     $     S     E
s0    s2                g1
s1                acc
s2    r3    r3                g3
s3    s4    s5
s4    r1    r1
s5    r2    r2    r0
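That SLR(1) table can likewise be executed with the standard shift-reduce driver. A Python sketch (my own illustration, not part of the original answer), with the actions and gotos transcribed from the table above:

```python
PRODUCTIONS = {        # production number -> (left-hand side, rhs length)
    0: ("S", 3),       # r0: S ::= a E b
    1: ("E", 2),       # r1: E ::= E a
    2: ("E", 2),       # r2: E ::= E b
    3: ("E", 0),       # r3: E ::= epsilon
}
ACTION = {
    (0, "a"): ("shift", 2),
    (1, "$"): ("accept", None),
    (2, "a"): ("reduce", 3), (2, "b"): ("reduce", 3),
    (3, "a"): ("shift", 4),  (3, "b"): ("shift", 5),
    (4, "a"): ("reduce", 1), (4, "b"): ("reduce", 1),
    (5, "a"): ("reduce", 2), (5, "b"): ("reduce", 2), (5, "$"): ("reduce", 0),
}
GOTO = {(0, "S"): 1, (2, "E"): 3}

def slr_parse(s):
    tokens = list(s) + ["$"]
    stack, pos = [0], 0            # stack of states
    while True:
        entry = ACTION.get((stack[-1], tokens[pos]))
        if entry is None:
            return False           # blank entry: reject
        kind, arg = entry
        if kind == "accept":
            return True
        if kind == "shift":
            stack.append(arg)
            pos += 1
        else:                      # reduce: pop the rhs, then take the goto
            lhs, rhs_len = PRODUCTIONS[arg]
            if rhs_len:
                del stack[-rhs_len:]
            stack.append(GOTO[(stack[-1], lhs)])

assert slr_parse("ab") and slr_parse("aabb") and slr_parse("abab")
assert not slr_parse("ba") and not slr_parse("a") and not slr_parse("")
```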
Many of you got this problem wrong. Why? Because you either produced an invalid table (one with two entries) or because you decided to drop one of the overlapping entries and produced a table that wouldn't parse. For example, a surprising number of you turned in something similar to the following table.
     a       b       $
S    a E b
E    a E     b E     epsilon
But consider what happens when we try to match abb.

To Match   Input
S          a b b $
a E b      a b b $
E b        b b $
b E b      b b $
E b        b $
b E b      b $
E b        $
b          $
Boom!
What does your work on this problem suggest to you?
For some languages, it is significantly simpler to build a deterministic finite automaton than to build a shift-reduce parser. In fact, this is why we have lexers in the first place: the simpler things can be done more quickly and efficiently with a lexer than with a parser.
I've also noted that left-recursive grammars are easier to parse with a shift-reduce parser than are right-recursive grammars.
This also makes it somewhat clear that the declarative definitions of languages (regular expressions and grammars) are significantly simpler than the operational definitions (automata and parsers).
A number of you noted that there was a clear progression between the four parts, with each step building on the last. Many also noted that it's nice that we can automate the translation between declarative definitions and operational definitions.
Write an unambiguous BNF grammar for the language of regular expressions over symbols (which are given by the SYMBOL token). That is, your grammar should accept only valid regular expressions using the base alphabet plus parenthesization, concatenation, alternation, Kleene star, question mark, plus, and set (bracket) notations. Make sure that your grammar is unambiguous and gives appropriate precedence to the various mechanisms for combining regular expressions.
Note that we give precedence by using different nonterminals for each of the types of operators, with higher precedence operators appearing lower in the grammar. In that sense, regular expressions are just like arithmetic expressions, except that we use ``nothing'' for multiplication and a vertical bar for alternation.
regexp      ::= regexp OR regterm                  // Alternation
regexp      ::= regterm
regterm     ::= regterm regfactor                  // Concatenation
regterm     ::= regterm DOT regfactor              // Alternate concat
regterm     ::= regfactor
regfactor   ::= regfactor STAR
regfactor   ::= regfactor QUESTION
regfactor   ::= regfactor PLUS
regfactor   ::= regstuff
regstuff    ::= SYMBOL
regstuff    ::= LPAREN regexp RPAREN
regstuff    ::= LBRACKET setcontents RBRACKET
regstuff    ::= LBRACKET negation setcontents RBRACKET
regstuff    ::= epsilon
setcontents ::= setcontents element
setcontents ::= element
element     ::= SYMBOL DASH SYMBOL
element     ::= SYMBOL
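One way to check that the precedence levels behave as intended is to turn the grammar into a recursive-descent parser, replacing the left-recursive rules with loops. The sketch below is my own addition, not part of the original answer; it uses simplified single-character tokens and keeps set contents as a flat token list (DASH ranges and negation are not interpreted further):

```python
import re

class RegexParser:
    """Each method corresponds to one precedence level of the grammar:
    alternation < concatenation < postfix operators < atoms."""

    def __init__(self, text):
        self.toks = re.findall(r"[()\[\]|*?+^-]|[A-Za-z0-9]", text) + ["$"]
        self.pos = 0

    def peek(self):
        return self.toks[self.pos]

    def eat(self, expected=None):
        tok = self.toks[self.pos]
        assert expected is None or tok == expected
        self.pos += 1
        return tok

    def regexp(self):                  # regexp ::= regexp OR regterm
        node = self.regterm()
        while self.peek() == "|":
            self.eat("|")
            node = ("or", node, self.regterm())
        return node

    def regterm(self):                 # regterm ::= regterm regfactor
        node = self.regfactor()
        while self.peek() not in ("|", ")", "$"):
            node = ("cat", node, self.regfactor())
        return node

    def regfactor(self):               # regfactor ::= regfactor STAR | ...
        node = self.regstuff()
        while self.peek() in ("*", "?", "+"):
            node = (self.eat(), node)
        return node

    def regstuff(self):                # symbols, groups, and sets
        if self.peek() == "(":
            self.eat("(")
            node = self.regexp()
            self.eat(")")
            return node
        if self.peek() == "[":
            self.eat("[")
            elems = []
            while self.peek() != "]":
                elems.append(self.eat())
            self.eat("]")
            return ("set", elems)
        return ("sym", self.eat())

# Star binds tighter than concatenation, which binds tighter than |.
assert RegexParser("a(b|c)*d").regexp() == \
    ("cat",
     ("cat", ("sym", "a"), ("*", ("or", ("sym", "b"), ("sym", "c")))),
     ("sym", "d"))
```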
The alternate forms of concatenation and the ``negative sets'' were not required.
A surprising number of you wrote incredibly ambiguous grammars. The problem did call for an unambiguous grammar.
A surprising number of you ignored precedence. The problem did call for you to give appropriate precedence.
A number of you ignored the restrictions on sets. Recall that sets can contain only symbols and ranges (not other regular expressions).
Answer one of the following two questions. You may receive a modicum of extra credit if you successfully answer both.
Here is a simple grammar that describes binary numbers.
binum1 ::= bit binum2
binum  ::= bit
bit    ::= ZERO
bit    ::= ONE
Peter and Paula Positive have decided to write an attribute grammar for nonnegative binary numbers. In that representation, the ith bit from the right is in the 2^i's place.
They've decided to begin their grammar as follows
binum1 ::= bit binum2
    binum1.value = bit.value + binum2.value
They're now a little bit stumped as to how they're going to get an appropriate value for the bits that make up a binary number. Add rules that will do so. You may need to add other attributes in order to successfully compute that value.
Note that you need to preserve their one rule, which means that the value of a bit depends on its position in the string.
In order to give the appropriate value to each bit, we need to determine its position in the string. The grammar has conveniently been written so that the position is given by the length of the substring to the right.
binum1 ::= bit binum2
    bit.pos = binum2.length
    binum1.value = bit.value + binum2.value
    binum1.length = 1 + binum2.length
binum ::= bit
    bit.pos = 0
    binum.value = bit.value
    binum.length = 1
bit ::= ZERO
    bit.value = 0
bit ::= ONE
    bit.value = 2^bit.pos
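The attribute flow above can be transcribed directly into a small evaluator (my own addition, not part of the original answer): length and value are synthesized attributes, while pos is inherited, equal to the length of the substring to the right of the bit.

```python
def binum_value(bits):
    """Value of a nonnegative binary numeral given as a '0'/'1' string."""
    total = 0
    for i, bit in enumerate(bits):
        pos = len(bits) - i - 1    # bit.pos = binum2.length
        if bit == "1":             # bit ::= ONE  =>  bit.value = 2^bit.pos
            total += 2 ** pos
    return total                   # sum of the bit.value attributes

assert binum_value("101") == 5
assert binum_value("0") == 0
assert binum_value("1111") == 15
```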
A few of you ignored the restriction that ``you need to preserve their one rule''. That restriction was there for a reason. Without that restriction, the problem should be trivial.
Some of you updated attributes. Recall that attributes only get set once, and cannot be changed.
A few of you used what I would call a ``hack'', but one that I accepted anyway. You assigned an attribute to bit.value and then used it immediately, as in the following.

binum1 ::= bit binum2
    bit.value = bit.val * 2^binum2.length
    binum1.value = bit.value + binum2.value
    binum1.length = 1 + binum2.length
binum ::= bit
    binum.value = bit.value
bit ::= ZERO
    bit.val = 0
bit ::= ONE
    bit.val = 1
Thelma and Charles Bitwise have decided to implement an attribute grammar for binary numbers in two's complement notation. They've decided that all such numbers must have at least two bits. So far, all they've been able to come up with is a core BNF grammar.
twoscomplement ::= ZERO bits twoscomplement ::= ONE bits bits1 ::= bits2 bit bits ::= bit bit ::= ZERO bit ::= ONE
Complete their attribute grammar. If you deem it necessary, you may also modify the underlying BNF grammar.
We know that the first bit in a two's complement number gives the sign of the number. An initial 0 indicates that it's a positive number; an initial 1 indicates that it's a negative number. Positive numbers are easy to translate. For negative numbers, we may recall that two's complement is quite similar to one's complement, and in one's complement, each 0 is worth -2^pos and each 1 is worth 0. We need to subtract one from that result.
twoscomplement ::= ZERO bits
    bits.sign = positive
    twoscomplement.value = bits.value
twoscomplement ::= ONE bits
    bits.sign = negative
    twoscomplement.value = bits.value - 1
bits1 ::= bits2 bit
    bit.sign = bits1.sign
    bits2.sign = bits1.sign
    bits1.value = 2 * bits2.value + bit.value
bits ::= bit
    bit.sign = bits.sign
    bits.value = bit.value
bit ::= ZERO
    bit.value = if bit.sign == positive then 0 else -1
bit ::= ONE
    bit.value = if bit.sign == positive then 1 else 0
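The sign-threading scheme above can be transcribed into a few lines of Python (my own addition, not part of the original answer) and checked against the usual definition of two's complement, under which an n-bit numeral with the top bit set equals its unsigned value minus 2^n:

```python
def twos_complement_value(bits):
    """Value of a two's complement numeral (a '0'/'1' string, length >= 2),
    following the attribute rules above: in a negative number each ZERO is
    worth -1 and each ONE is worth 0 at each doubling, minus one at the end."""
    assert len(bits) >= 2
    negative = bits[0] == '1'      # the sign bit sets bits.sign
    value = 0
    for bit in bits[1:]:
        if negative:
            value = 2 * value + (0 if bit == '1' else -1)
        else:
            value = 2 * value + (1 if bit == '1' else 0)
    return value - 1 if negative else value

# Exhaustive check against the unsigned-minus-2^n rule for small widths.
for n in range(2, 6):
    for x in range(2 ** n):
        numeral = format(x, f"0{n}b")
        expected = x - 2 ** n if numeral[0] == '1' else x
        assert twos_complement_value(numeral) == expected
```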
A few of you took advantage of a rule like ``if a number is in two's complement, is negative, and has b bits, then its magnitude is 2^b - (value of the bits in unsigned notation)''. A source for, or explanation of, that rule would have been nice.
These problems are intended to test your reading of Appel. They focus on various details. Answer either the Java section or the C section.
5.a.i. In your introductory programming courses, you should have learned to indent your code to enhance readability. In particular, you typically select an indenting style that makes it clear what values are parameters of what method.
Unfortunately, the dictates of the printed page can make it difficult to indent appropriately. In addition, in certain cases other requirements besides nesting come into play.
Consider the code for the abstract syntax tree for the let expression from page 104 of the red book. Rewrite the code so that it follows the traditional rules of indentation (in particular, that any argument to a method should be indented further than the start of the method name).
Why do you think Appel chose this alternate indentation style?
I've noted that Appel is inconsistent in his construction of abstract syntax trees. In some places, he writes the commands for creating the trees (e.g., with new) but leaves out the positions. In others, he includes the positions but skips the news. I'll use a hybrid, which has neither new nor position. I've also tried to fill in some of the details that Appel elided.
Here is the original Tiger code
let
  var a := 5
  function f() : int = g(a)
  function g(i: int) = f()
in
  f()
end
Note that this is a semantically invalid program, since f returns integers but g is a procedure and therefore returns nothing. In addition, the program (if legal) appears to run forever.
Nonetheless, here is the abstract syntax tree
LetExp(
  DecList(
    VarDec(symbol("a"), null, IntExp(5)),
    DecList(
      FunctionDec(
        symbol("f"),                          // Function name
        null,                                 // Parameter list
        symbol("int"),                        // Type
        CallExp(                              // Body
          symbol("g"),                        // Called function
          ExpList(                            // Actual parameters
            VarExp(SimpleVar(symbol("a"))),   // First parameter
            null)),                           // No other parameters
        FunctionDec(                          // Related function
          symbol("g"),                        // Function name
          FieldList(symbol("i"), symbol("int"), null),  // Parameter list
          null,                               // Type (this is a procedure)
          CallExp(symbol("f"), null),         // Body
          null))                              // No more related functions!
      )                                       // End sub declaration list
    ),                                        // End outermost declaration list
  CallExp(symbol("f"), null)                  // Body of the let
)                                             // End LetExp
And you thought Scheme and LISP had a lot of parentheses!
It is likely that Appel chose his indentation scheme so that the elements of a list appeared at the same indentation level. For example, you will note that all three declarations appear at the same indent, ``nicely'' lined up.
5.a.ii. In the code at the beginning of section 5.4 on page 123 of the red book, there are two consecutive calls to beginScope. Why?
One call begins a new scope for type variables, the other begins a new scope for function and normal variables.
5.a.iii. In the code for function declarations on page 125 of the red book, Appel does one put before beginning a scope and then multiple puts after beginning the scope. Why?
The first put adds the function name and return type to the current scope. That name and type may be needed by subsequent functions, so they must be added before beginning a new scope. After the new scope begins, the parameters are added. Since the parameters are not needed after the body of the function is type-checked, they must be placed in a separate scope.
5.a.iv. In the code for variable declarations on page 124 of the red book, there is an instruction to return null;. Why?
Appel has noted that the Exp class is a detail left for a subsequent chapter. Eventually, in addition to type checking, we'll want to build intermediate representation trees. The Exp is used in building such trees. Since we're presently doing only type checking, we need not return any intermediate representation. Does the translation still serve a purpose? Yes, it updates the symbol table. However, it is important to note that the function does have a return type, rather than being declared as void. This is because Appel has planned ahead for the subsequent sections.
5.b.i. In your introductory programming courses, you should have learned to indent your code to enhance readability. In particular, you typically select an indenting style that makes it clear what values are parameters of what method.
Unfortunately, the dictates of the printed page can make it difficult to indent appropriately. In addition, in certain cases other requirements besides nesting come into play.
Consider the code for the abstract syntax tree for the let expression from page 99 of the green book. Rewrite the code so that it follows the traditional rules of indentation (in particular, that any argument to a method should be indented further than the start of the method name).
Why do you think Appel chose this alternate indentation style?
5.b.ii. In the code at the beginning of section 5.4 on page 118 of the green book, there are two consecutive calls to S_beginScope. Why?
5.b.iii. In the code for function declarations on page 120 of the green book, Appel does one put before beginning a scope and then multiple puts after beginning the scope. Why?
5.b.iv. Why does transExp return a struct but transDec return nothing?
Disclaimer Often, these pages were created "on the fly" with little, if any, proofreading. Any or all of the information on the pages may be incorrect. Please contact me if you notice errors.
Source text last modified Wed Oct 28 13:05:47 1998.