CSC362 2004S, Class 17: Predictive Parsing (4): Wrapup Admin: * The late list returns: Sam, Piper (Grrr!!!), Ananta, Jonathan (???) * Questions on your parsers? * Does Sam have an attendace policy? * Yes! Attend or else. (Please email me when you can't make it to class.) * Wednesday's "Talk" at 4:30: A CS video: E.C. for attending and sending me a reflective comment Overview: * Why we need a Follow table * Building the Follow table * Building the parse table * Typical grammar problems: * Similar right-hand sides * Left recursion * Correcting those problems /Sample Grammars/ Grammar One: R1: s ::= b s X R2: s ::= A R3: b ::= epsilon R4: b ::= B Grammar Two: R1: s ::= A s D R2: s ::= b c R3: b ::= epsilon R4: b ::= B R5: c ::= epsilon R6: c ::= C Grammar TwoPrime: (Add the following rule) R7: c ::= D Grammar Three: R01: s ::= b X R02: s ::= A C R03: s ::= c d R04: b ::= epsilon R05: b ::= B R06: c ::= d e R07: c ::= epsilon R08: d ::= D R09: d ::= epsilon R10: e ::= A e e R10: e ::= E Reminder: Predictive parsing is top-down, left-to-right Expansion of leftmost nonterminal is done completely through table lookup How do we build the table? * Easy thing: * For each production, p_i, of the form (n ::= rhs) For each terminal, T, in first(rhs) TAB[n,T] = p_i * Background question: What if exists a terminal, S, s.t. S is in two rhs's. * The grammar is not parsable using this technique * You can "left factor" the grammar * Hard thing: What if rhs (in p_i: n ::= rhs) is nullable? * Background question: What if multiple RHS's are nullable? * Does it really matter which one you choose? Yes, if you assign meaning to the parse tree * Such grammars are ambiguous * We'll require that grammars not have two nullable rhs's for the same lhs. * One naive strategy: For any cells not filled in, fill in the "nullable" rhs. * A "better" strategy: For any terminals, T, that can follow n, TAB[n,T] = p_i * First: Catches errors earlier * Sometimes necessary to report errors in the grammar Here's grammar two R1: s ::= A s D R2: s ::= b c R3: b ::= epsilon R4: b ::= B R5: c ::= epsilon R6: c ::= C First[s] = { A, B, C } Nullable[s] = true First[b] = { B } Nullable[b] = true First[c] = { C } Nullable[c] = true First[s] includes first(bc) first(bc) = first(b) union (if Nullable[b] then first(c)) When do we apply rule R2? When we're trying to match an s and (1) we see B or C. (By the "first(bc)" guidelines) or (2) we're at the end of the input. or (3) when we see a D Let's watch this in action Suppose input is A D ... s =R1> A s D =matchA> s D =R2> b c D =R3> c D =R5> D =MatchD> END Suppose input is A E ... (NOT IN OUR LANGUAGE) s =>R1> A s D =matchA> s D Crash! E is not in Follow[s], so there's a problem A slight variant R1: s ::= A s D R2: s ::= b c R3: b ::= epsilon R4: b ::= B R5: c ::= epsilon R6: c ::= C R7: c ::= D First[s] = { A,B,C,D } Nullable[s] = true First[b] = { B } Nullable[b] = true First[c] = { C,D } Nullable[c] = true TAB[s,A] = R1 TAB[s,B] = R2 TAB[s,C] = R2 TAB[s,D] = R2 TAB[s,EOI] = R2 (for nullification) TAB[s,D] = R2 (for nullification) TAB[c,A] = CRASH TAB[c,B] = CRASH TAB[c,C] = R6 TAB[c,D] = R7 TAB[c,EOI] = R5 TAB[c,D] = R5 Can EOI every follow c in a valid string? Yes, by R2 Why would you ever need to apply R5 upon seeing a D? Input: AD$ Why would you ever need to apply R7 upon seeing a D? Input: ADD$ How do you compute the Follow table? Initialize: * Follow[n] = empty_set for all nonterminals n * Follow[start_symbol] = end-of-input ($) REPEAT For each rule n ::= rhs For each "splitting" of the rhs into alpha m beta (m is a nonterminal) Follow[m] := Follow[m] union first(beta) if nullable[beta] Follow[m] := Follow[m] union Follow[n] UNTIL YOU STOP MAKING CHANGES Some grammars are bad for predictive parsing: * Left-overlap (Solution: Left factor) s ::= A s s ::= A b Rewrite as s ::= A extras extras ::= s extras ::= b * Left-recursive s ::= A x s ::= s B x ::= ... Make right recursive s => s s s s s s s s s s s s B s => A x A x A x A x A x A x A x A x A x A x A x A x B s is supposed to represent (A x)+ B s ::= axstar A x B axstar ::= A x axstar axstar ::= epsilon