CSC362 2004S, Class 17: Predictive Parsing (4): Wrapup

Admin:
* The late list returns: Sam, Piper (Grrr!!!), Ananta, Jonathan (???)
* Questions on your parsers?
* Does Sam have an attendace policy?  
  * Yes!  Attend or else.  (Please email me when you can't make it to class.)
* Wednesday's "Talk" at 4:30: A CS video: E.C. for attending and sending me a reflective comment

Overview:
* Why we need a Follow table
* Building the Follow table
* Building the parse table
* Typical grammar problems:
  * Similar right-hand sides
  * Left recursion
* Correcting those problems

/Sample Grammars/
Grammar One:
R1: s ::= b s X
R2: s ::= A
R3: b ::= epsilon
R4: b ::= B

Grammar Two:
R1: s ::= A s D
R2: s ::= b c 
R3: b ::= epsilon
R4: b ::= B
R5: c ::= epsilon
R6: c ::= C

Grammar TwoPrime: (Add the following rule)
R7: c ::= D

Grammar Three:
R01: s ::= b X
R02: s ::= A C
R03: s ::= c d
R04: b ::= epsilon
R05: b ::= B
R06: c ::= d e
R07: c ::= epsilon
R08: d ::= D
R09: d ::= epsilon
R10: e ::= A e e
R10: e ::= E 

Reminder: Predictive parsing is top-down, left-to-right
  Expansion of leftmost nonterminal is done completely through table lookup

How do we build the table?
* Easy thing: 
  * For each production, p_i, of the form (n ::= rhs)
      For each terminal, T, in first(rhs)
        TAB[n,T] = p_i
* Background question: What if exists a terminal, S, s.t. S is in two rhs's.
  * The grammar is not parsable using this technique
  * You can "left factor" the grammar
* Hard thing: What if rhs (in p_i: n ::= rhs) is nullable?
  * Background question: What if multiple RHS's are nullable?
    * Does it really matter which one you choose?  Yes, if you assign meaning to the parse tree
    * Such grammars are ambiguous
    * We'll require that grammars not have two nullable rhs's for the same lhs.
  * One naive strategy: For any cells not filled in, fill in the "nullable" rhs.
  * A "better" strategy: For any terminals, T, that can follow n, TAB[n,T] = p_i
    * First: Catches errors earlier
    * Sometimes necessary to report errors in the grammar

Here's grammar two

R1: s ::= A s D
R2: s ::= b c 
R3: b ::= epsilon
R4: b ::= B
R5: c ::= epsilon
R6: c ::= C

First[s] = { A, B, C }	Nullable[s] = true
First[b] = { B }	Nullable[b] = true
First[c] = { C }	Nullable[c] = true

First[s] includes first(bc)
  first(bc) = first(b) union (if Nullable[b] then first(c))
When do we apply rule R2?  When we're trying to match an s and
   (1) we see B or C.  (By the "first(bc)" guidelines)
or (2) we're at the end of the input.
or (3) when we see a D

Let's watch this in action

Suppose input is A D ...

s =R1> A s D =matchA> s D =R2> b c D =R3> c D =R5> D =MatchD> END

Suppose input is A E ... (NOT IN OUR LANGUAGE)

s =>R1> A s D =matchA> s D 
  Crash!  E is not in Follow[s], so there's a problem

A slight variant

R1: s ::= A s D
R2: s ::= b c 
R3: b ::= epsilon
R4: b ::= B
R5: c ::= epsilon
R6: c ::= C
R7: c ::= D

First[s] = { A,B,C,D }	Nullable[s] = true
First[b] = { B }	Nullable[b] = true
First[c] = { C,D }	Nullable[c] = true

TAB[s,A] = R1	TAB[s,B] = R2	TAB[s,C] = R2	TAB[s,D] = R2
  TAB[s,EOI] = R2 (for nullification)
  TAB[s,D] = R2 (for nullification)

TAB[c,A] = CRASH  TAB[c,B] = CRASH  TAB[c,C] = R6 	TAB[c,D] = R7
  TAB[c,EOI] = R5
  TAB[c,D] = R5
    Can EOI every follow c in a valid string?  Yes, by R2

Why would you ever need to apply R5 upon seeing a D?
	Input: AD$
Why would you ever need to apply R7 upon seeing a D?
	Input: ADD$

How do you compute the Follow table?

Initialize:
* Follow[n] = empty_set for all nonterminals n
* Follow[start_symbol] = end-of-input ($)

REPEAT
For each rule n ::= rhs
  For each "splitting" of the rhs into alpha m beta (m is a nonterminal)
    Follow[m] := Follow[m] union first(beta)
    if nullable[beta]
      Follow[m] := Follow[m] union Follow[n]
UNTIL YOU STOP MAKING CHANGES

Some grammars are bad for predictive parsing:
* Left-overlap (Solution: Left factor)
	s ::= A s
	s ::= A b
  Rewrite as
	s ::= A extras
	extras ::= s
	extras ::= b
* Left-recursive
	s ::= A x
	s ::= s B
        x ::= ...
  Make right recursive
    s => s s s s s s s s s s s s B
    s => A x  A x  A x  A x  A x  A x  A x  A x  A x  A x  A x  A x  B
    s is supposed to represent (A x)+ B
	s ::= axstar A x B
 	axstar ::= A x axstar
	axstar ::= epsilon