Held Friday, February 16, 2001
Summary
Today we consider how to hand-code a parser based on a grammar. The
traditional technique for doing such coding is called
predictive parsing.
Assignments
Notes
- Reminder: It is discourteous to show up to lab late!
- Reminder: You should either email me your laboratory or drop it in
the drop box on blackboard by Tuesday.
- What languages did you use?
- Although I have not yet set up phase 2 of the project, you should
start thinking about the grammar for Pascal (fortunately, Wirth gives
it to you in the book).
- Reminder: Tomorrow in the Grinnell Entrepreneur's conference.
Overview
- Disambiguating the conditional grammar (left over from Monday)
- Basics of predictive parsing
- Some examples
- We were stuck with ambiguous conditionals.
- What should we do?
- Reminder:
- Identify incorrect parse trees.
- Create new conditionals to prevent illegal parts of the tree.
- Prove new grammar is equivalent
- We've seen how to use grammars to describe languages.
- Now we have a different problem: How do you build a parse
tree for a string?
- So that we can verify that it's in the language.
- So that we can use the parse tree for understanding
the intended structure of the program.
- There are many different dimensions you can consider when
writing a parser.
- Do you hand code it or do you rely on a parser generator?
- Do you build the parse tree by gradually expanding the start
symbol or by combining parts of the string?
- Do you process the input string in a particular order (e.g.,
left-to-right or right-to-left or ...?)
- Predictive parsing is the most common technique for hand-generated
parsers.
- Predictive parsers are top-down, left-to-right parsers.
- In predictive parsing, you repeatedly expand the leftmost nonterminal.
- Predictive parsers are also called recursive-descent parsers
because of the way they recursively descend through the tree.
- How do you decide how to expand the nonterminal (that is, which production
to apply?). In the simplest case,
- You look at the next token.
- You look at the right-hand-sides
- If you're lucky, only one right-hand-side begins with that token.
- You've already seen some examples of hand-written predictive parsers
(or at least you should have if you did yesterday's lab).
- Here's a simple language for ``palindromes of
a
's and
b
's with a c
in the middle''.
- Here's the corresponding set of procedures:
boolean parseS(Lexer l) {
Token t = l.nextToken();
if (t is an a) {
parseS(l);
Token u = l.nextToken();
return (u is an a);
}
else if (t is a b) {
parseS(l);
Token u = l.nextToken();
return (u is a b);
}
else
return t is a c;
}
- However, subtle changes to the grammar can make it much more
difficult to parse this way.
- What if we didn't have the middle c? E.g.,
- Then there are essentially two choices when we see an
a
- We might apply
S ::= a S a
- We might apply
S ::= epsilon
- Hence, this grammar is not immediately amenable to predictive
parsing.
- There are other potential problems. For example, what if
we rewrite the grammar as
S ::= A
| B
| c
A ::= a S a
B ::= b S b
- How do we decide which rule to apply when we see a symbol, given that
two of the right-hand sides begin with a nonterminal?
- As you can tell, we need to make a number of decisions.
- Given multiple right-hand sides that start with a nonterminal,
which one do you choose?
- If a nonterminal derives epsilon, how do you decide to apply
that derivation?
- What if a nonterminal derives a nonterminal that derives epsilon?
Consider what to do when you see a
b
when matching
an S
in
S :: = X b
X ::= x X
| epsilon
- All of these analyses are aided by three tables (and corresponding
functions) known as First (and first),
Follow, and
Nullable (and nullable).
- The tables and functions are recursively constructed. That is,
the functions may be defined in terms of the tables and the
tables may be defined in terms of the current versions of the functions.
- First maps nonterminals to sets of tokens that can
begin strings derived from those nonterminals.
- first maps sequences of nonterminals and
terminals to the symbols that can begin strings derived from those
sequences
- Follow maps nonterminals to sets of tokens that
can follow those nonterminals in sequences derivable in the grammar
- Nullable indicates which nonterminals can derive
the empty string
- nullable operates on sequences of symbols
(nonterminals and terminals), returning true only when the
sequence can derive epsilon
- We'll see how to construct these three tables in the next class.
Monday, 22 January 2001
- Created as a blank outline.
Friday, 16 February 2001