CSC362 2004S, Class 20: Shift-Reduce Parsing (1): Introduction

Admin:
* Thoughts on yesterday's lab?
* Questions on phase 2?
* Come to tonight's show with "Really Bland Lighting"
* Today: Move ahead to shift-reduce parsing.  Please read the textbook.

Overview:
* The basic concept
* An example: a<sup>n</sup>(b<sup>n</sup>|c<sup>n</sup>)
* An example: Expressions
* Building shift-reduce parsers

Some languages are not predictively parsable, such as

A^n(B^n|C^n) n>=0

a ::= b
a ::= c
b ::= A b B
b ::= epsilon
c ::= A c C
c ::= epsilon

Left factor!

a ::= A more
a ::= epsilon
more ::= b B
more ::= c C
b ::= A b B
b ::= epsilon
c ::= A c C
c ::= epsilon

Left factor!

a ::= A more
a ::= epsilon
more ::= A moremore
more ::= B
more ::= C
moremore ::= b B B
moremore ::= c C C
b ::= A b B
b ::= epsilon
c ::= A c C
c ::= epsilon

Broad picture: Distinguising betweek A^k B^k and  A^k C^k requires k+1 lookahead
* No matter what lookahead you choose for your predictive parser, there exist pairs of strings you cannot predictively parse

Ideas: 
* Build trees bottom-up instead of top-down
  * Convert right-hand-sides to left-hand-sides (rather than attempting to expand a left-hand side)
* Use a stack to keep track of the things not yet (partially converted)
* Magically build a table that, given a next input symbol and knowledge of what's on the stack, picks "something" to do next
  * "shift" the next input token onto the stack
  * "reduce" the top few elements of the stack (a rhs) to the corresponding lhs
* If we exhaust the input and end up with the start symbol on the stack, we win

a ::= b
a ::= c
b ::= A b B
b ::= epsilon
c ::= A c C
c ::= epsilon

(input: A, top: anything): shift
(input: B, top: A): reduce b ::= epsilon (push 'b')
(input: B, top: A b B): reduce b ::= A b B (pop A b B, push b)
(input: EOI, top: A b B): reduce b ::= A b B (pop A b B, push b)
(input, B, top: b): shift B
(input: C, top: A): reduce c ::= epsilon (push 'c')
(input: C, top: A c C): reduce c ::= A c C (pop A c C, push c)
(input: EOI, top: A c C): reduce c ::= A c C (pop A c C, push c)
(input: C, top: c): shift C
(input: EOI, top: b): reduce a ::= b
(input: EOI, top: c): reduce a ::= c

(input: NUM, stack: <em>anything</em>) -> shift
(input: ADDOP, stack: NUM) -> reduce NUM to factor
(input: ADDOP, stack: term MULOP factor) -> reduce term MULOP factor to term
(input: ADDOP, stack: factor) -> reduce factor to term
(input: ADDOP, stack: expression ADDOP term) -> reduce expression ADDOP term to expression
(input: ADDOP, stack: term) -> reduce term to expression
(input: ADDOP, stack: expression) -> shift
(input: MULOP, stack: NUM) -> reduce NUM to factor
(input: MULOP, stack: term MULOP factor) -> reduce term mulop factor to term
(input: MULOP, stack: factor) -> reduce factor to term
(input: MULOP, stack: term) -> shift
There's some related stuff for end-of-input not shown.

Goal of shift-reduce parsing: Write an algorithm that converts a
grammar to a shift-reduce table.

Key idea similar to NFA to DFA conversion:
* States of the DFA are power sets of states of the NFA
* Here, we'll build an automaton in which states of the automaton are sets of "annotated productions" - "You could have seen this much of the rhs at this point"
* Edges are both terminals and nonterminal

Hackzored