CSC362 2004S, Class 15: Predictive Parsing (1)
Admin:
* Talk Wednesday. Pizza afterwards (for those who replied early)!
* Applications for summer research way down. Bleh.
* Don't ask me about first and follow until after Wednesday's class.
Overview:
* Parsing basics
* Predictive parsing
* Problems of predictive parsing
* Producing parse trees
* Looking ahead: Useful tables and functions
For regular languages: Regular expressions describe languages; Lexers match (and tokenize)
Now: Grammars describe languages; Parsers match (and build parse trees)
There are many ways to go from grammars to parsers, each of which has its own advantages and disadvantages
The simplest is "predictive parsing"
* Can be automated
* Can be implemented by hand relatively easily
This is a top-down matching strategy
Given the current symbol you're matching and the next input token, determine
which right-hand side applies
R1: s ::= A s A
R2: s ::= B s B
R3: s ::= C
If you're matching an s and see an A, attempt to apply R1
If you're matching an s and see a B, attempt to apply R2
If you're matching an s and see a C, attempt to apply R3
In the by-hand version of this, we write a procedure, parse_s, that
looks something like the following (it relies on a helper, match)
/*
* Procedure:
* match
* Parameters:
* tok, a token
* ts, a token stream
* Purpose:
* Verify that the next token in the token stream is tok and
* consume that token.
* Produces:
* consumed: boolean
* Preconditions:
* (None)
*/
boolean parse_s(TokenStream ts)
{
switch (ts.peek()) {
case A: return match(A, ts) && parse_s(ts) && match(A, ts);
case B: return match(B, ts) && parse_s(ts) && match(B, ts);
case C: return match(C, ts);
default: return false;
} // switch
} // parse_s(TokenStream)
Note that the pictures we drew were very stack-like
The automated implementation uses an explicit stack
push the start symbol
while symbols remain in the input and symbols remain on the stack
  if the top symbol on the stack is a token, match it against the next
    input token (and return false if they don't match)
  if the top symbol on the stack is a nonterminal, find the "appropriate"
    rhs and push its symbols in right-to-left order
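The loop above can be sketched concretely for the A/B/C palindrome grammar. This is a hypothetical illustration: tokens are single characters, and the parse table (lookahead -> right-hand side) is written out by hand rather than computed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

public class StackParser {
    // Grammar: s ::= A s A | B s B | C.  Uppercase characters are
    // terminals; lowercase 's' is the nonterminal.  The table maps
    // the lookahead token to the right-hand side to apply for s.
    static final Map<Character, String> TABLE =
        Map.of('A', "AsA", 'B', "BsB", 'C', "C");

    static boolean parse(String input) {
        Deque<Character> stack = new ArrayDeque<>();
        stack.push('s');                        // push the start symbol
        int pos = 0;
        while (!stack.isEmpty()) {
            char top = stack.pop();
            if (top == 's') {                   // nonterminal: consult the table
                String rhs = (pos < input.length())
                    ? TABLE.get(input.charAt(pos)) : null;
                if (rhs == null) return false;  // no applicable rule
                for (int i = rhs.length() - 1; i >= 0; i--)
                    stack.push(rhs.charAt(i));  // push the rhs right-to-left
            } else {                            // terminal: match the input
                if (pos >= input.length() || input.charAt(pos) != top)
                    return false;
                pos++;
            }
        }
        return pos == input.length();           // all input consumed?
    }

    public static void main(String[] args) {
        System.out.println(parse("ABCBA")); // true
        System.out.println(parse("AC"));    // false
    }
}
```

Pushing the right-hand side right-to-left matters: it leaves the leftmost symbol of the rhs on top of the stack, so it is matched first.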
The problem can get a little or a lot harder if you change the grammar (or language)
s ::= x
s ::= y
s ::= C
x ::= A s A
y ::= B s B
Can we teach the computer which rule to apply, even though it requires some "reasoning"? Probably
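For this rewritten grammar, the "reasoning" amounts to precomputing first tokens: every string derivable from x begins with A, and every string derivable from y begins with B, so one token of lookahead still decides. A sketch (hypothetical: tokens are single characters, and the TokenStream here is a minimal stand-in):

```java
// Minimal stand-in for a token stream: tokens are chars in a String.
class TokenStream {
    private final String toks;
    private int pos = 0;
    TokenStream(String toks) { this.toks = toks; }
    char peek() { return pos < toks.length() ? toks.charAt(pos) : '$'; }
    boolean match(char tok) {           // consume tok if it is next
        if (peek() == tok) { pos++; return true; }
        return false;
    }
    boolean atEnd() { return pos == toks.length(); }
}

public class IndirectParser {
    // s ::= x | y | C     x ::= A s A     y ::= B s B
    // first(x) = {A}, first(y) = {B}: dispatch on the lookahead.
    static boolean parse_s(TokenStream ts) {
        switch (ts.peek()) {
            case 'A': return parse_x(ts);
            case 'B': return parse_y(ts);
            case 'C': return ts.match('C');
            default: return false;
        }
    }
    static boolean parse_x(TokenStream ts) {
        return ts.match('A') && parse_s(ts) && ts.match('A');
    }
    static boolean parse_y(TokenStream ts) {
        return ts.match('B') && parse_s(ts) && ts.match('B');
    }
    static boolean accepts(String s) {
        TokenStream ts = new TokenStream(s);
        return parse_s(ts) && ts.atEnd();
    }
    public static void main(String[] args) {
        System.out.println(accepts("ABCBA")); // true
        System.out.println(accepts("ABCAB")); // false
    }
}
```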
s ::= A s A
s ::= B s B
s ::= epsilon
This grammar is not parsable by predictive parsing: on seeing an A, one token of lookahead can't tell whether that A begins a new s (rule 1) or closes an enclosing s (in which case the epsilon rule applies)
----
When we parse, we can also build a parse tree
Node parse_s(TokenStream ts)
{
  switch (ts.peek()) {
    case A: {
      // Assumes a variant of match that returns a leaf Node for the
      // matched token (rather than the boolean version above).
      Node result = new Node(Nonterminal_s);
      result.addChild(match(A, ts));
      result.addChild(parse_s(ts));
      result.addChild(match(A, ts));
      return result;
    } // case A
    case B: {
      Node result = new Node(Nonterminal_s);
      result.addChild(match(B, ts));
      result.addChild(parse_s(ts));
      result.addChild(match(B, ts));
      return result;
    } // case B
    case C: {
      Node result = new Node(Nonterminal_s);
      result.addChild(match(C, ts));
      return result;
    } // case C
    default: throw new Exception("You seem unable to write palindromes.");
  } // switch
} // parse_s(TokenStream)
Detour: Context-free grammars vs. context-sensitive grammars
BNF grammars describe languages by derivation
A string is a sequence of symbols (nonterminals and terminals)
A string, alpha, is in the language of a grammar G iff
(1) alpha consists only of terminals, and
(2) there exists a finite sequence of strings alpha_0, alpha_1, ..., alpha_n,
with alpha_0 the start symbol and alpha_n = alpha,
such that each alpha_i is derivable from alpha_(i-1) by replacing some substring that matches the left-hand side of a rule with the corresponding rhs of that rule
In context-free grammars, the left-hand-sides of rules have length one
In context-sensitive grammars, the left-hand-sides of rules can be bigger (but must include a nonterminal)
As compiler writers, we work only in the world of context-free grammars, which we can (usually) build efficient parsers for.
* When we need extra stuff (e.g. declaration before use), we rely on semantic analysis
Returning to the issue of predictive parsing:
* Given a nonterminal and an input symbol, how do you decide which rule to apply?
* We pre-compute the answer for every possible pair
(If we can't pre-compute those answers, we say the grammar isn't parsable by this technique and rewrite it.)
* How do we compute that table?
* We begin by computing other tables,
first[NONTERMINAL] = all terminals that can begin strings derivable from NONTERMINAL
follow[NONTERMINAL] = all terminals that can follow NONTERMINAL in some derivable string
nullable[NONTERMINAL] = can this thing derive epsilon?
Read about first and follow in the book, come with questions
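As a preview of those tables, nullable and first can be computed by iterating to a fixed point: keep applying the rules until no set changes. This is a sketch under assumptions from the earlier examples (single-character tokens, hand-coded grammar), not the book's presentation:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FirstSets {
    // Rules: lhs -> its right-hand sides.  Lowercase characters are
    // nonterminals, uppercase are terminals, "" is epsilon.
    // Grammar: s ::= A s A | B s B | epsilon
    static final Map<Character, List<String>> RULES =
        Map.of('s', List.of("AsA", "BsB", ""));

    static final Set<Character> nullable = new HashSet<>();
    static final Map<Character, Set<Character>> first = new HashMap<>();

    static void compute() {
        for (char n : RULES.keySet()) first.put(n, new HashSet<>());
        boolean changed = true;
        while (changed) {                       // iterate to a fixed point
            changed = false;
            for (Map.Entry<Character, List<String>> e : RULES.entrySet()) {
                char lhs = e.getKey();
                for (String rhs : e.getValue()) {
                    // Walk the rhs left to right; a symbol contributes to
                    // first[lhs] only if everything before it is nullable.
                    boolean allNullable = true;
                    for (char sym : rhs.toCharArray()) {
                        if (RULES.containsKey(sym)) {          // nonterminal
                            if (first.get(lhs).addAll(new HashSet<>(first.get(sym))))
                                changed = true;
                            if (!nullable.contains(sym)) { allNullable = false; break; }
                        } else {                               // terminal
                            if (first.get(lhs).add(sym)) changed = true;
                            allNullable = false;
                            break;
                        }
                    }
                    if (allNullable && nullable.add(lhs)) changed = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        compute();
        System.out.println("nullable: " + nullable);
        System.out.println("first(s): " + first.get('s'));
    }
}
```

For this grammar the fixed point gives nullable = {s} and first(s) = {A, B}, which is exactly why one-token lookahead can't separate rule 1 from the epsilon rule.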