CSC362 2004S, Class 16: Predictive Parsing (2): Helpful Tables

Admin:
* Interesting panel tomorrow 11:00-12:45
* Cool talk today 4:30-5:30
* Questions on the project?
* No midterm!

Overview:
* Predictive parsing, reviewed
* A sample grammar
* Three helpful tables and two helpful functions
* The first function
* Building the First table
* The nullable function
* Building the Nullable table
* Building the Follow table

Question: Is a grammar non-parsable by a Recursive Descent parser ambiguous?
Answer: Not necessarily

s ::= A s A
s ::= B s B
s ::= epsilon

Is not ambiguous but is also not RD parsable

Question: Do there exist any ambiguous grammars that are parsable by RD parsers?
Answer: I don't think so

Question: What does it mean for a grammar to be left-recursive?
Answer: For some nonterminal, n

   n => ... => n asdfasdf

Observation: Left-recursive grammars are not parsable by RD parsing
Observation: Left-recursive grammars can become right recursive (on Monday)

Recursive-descent parsing's key ideas:
* Top-down parsing
* Expand nonterminal by ...
   look at next input symbol
   decide which RHS corresponds
* Observation: Given multiple nonterminals to expand, we expand the *leftmost*

Consider

s ::= b s X
s ::= A
b ::= epsilon
b ::= B

Can you parse this grammar via recursive-descent parsing?

Side note: Since we're in a computer-equipped classroom, can't we just write software-based clickers (that lets Sam keep a live update on how bored or confused the students are)?

This grammar (the one above the side note) is left-recursive

s => b s X => s X

Why is left recursion a problem in this grammar?

Suppose our first input is A (we don't know what follows)

Do we apply

s ::= b s X

or do we apply

s ::= A

In the first case, the input of "A" will prove us wrong.  In the second case, the input of "AX" will prove us wrong.  Since we can't know which it is, we're doomed.

The original theorem only holds for one-token lookahead

R1: s ::= b X
R2: s ::= A C
R3: b ::= epsilon
R4: b ::= B

If you're trying to match an s and you see 
* an A: Apply rule R2
* a B: Apply rule R1 (leading us to expect to apply R4 next)
* a C: CRASH AND BURN
* an X: Apply rule R1 (expecting to apply R3 next)

How do we automate this process?

Define three tables and two functions that give us additional information

Tables start with capital letters, functions start with lowercase

First[n] = the set of tokens that start strings derivable from n
first(s1 s2 s3 .. sk) where s1 ... sk are all tokens or nonterminals
  = the set of tokens that start strings derivable from the sequence
    s1 s2 ... sk

* Why is this different than First[s1]?
  * Can the sequence s1 ... sk every derive something that begins with something different than what something derivable from s1 can begin with

Step back, check:

The key idea in parsing is repeatedly "deriving" a string from another string
(the => operation) by replacing a substring that matches a LHS with the corresponding rhs

s1 => asfas h; asdfda 4

Question now understood.  Jonathan's intuitive answer is "Yes"
Sam asks "When"?  Jonathan replies "For instance, if s1 derives epislon"

b X => X
  so X is in first(b X)
First[b] = { B }

Observation: To figure this out in more depth, we need to know whether or not a nonterminal derives epsilon

* Why do we need a function rather than a table?