Held Friday, October 4, 2002
Summary
Today we consider an alternate form of parsing, shift-reduce parsing.
Shift-reduce parsing can be applied to many more grammars than can
predicitve parsing.
Notes
- I've finally figured out how to stop vim (vi-munged) from
highlighting my search text everywhere it appears. Yay!
- I've rearranged the topics in the syllabus to accomodate the extra
time we spent on predictive parsing.
- I'm not assigning the parser until next week so that you have time
to student other parsing techniques.
- You should, however, use the weekend to read the Red Dragon book
to gain further understanding.
- The Putnam Mathematics Exam (a full day of challenging mathematical
problems) is on Saturday, 7 December, from 9am to 5pm.
- Email gumben@grinnell.edu
to sign up.
- You're guaranteed to do at least as well as I did when I took it
twenty years ago.
Due
Overview
- Introduction to shift-reduce parsing
- Sample shift-reduce parsers
- A non-predictive language
- A simple expression
- LR(0) parsers: Basic shift-reduce parsers
- Note also that recursive descent parsers are, in effect, top-down (you start
with the start symbol and attempt to derive the string).
- We can gain some power by starting at the bottom and working our way up.
- The most common bottom-up parsers are the so-called
shift-reduce parsers. These parsers examine the input
tokens and either shift them onto a stack or
reduce the top of the stack, replacing a right-hand side
by a left-hand side.
- Think about the sample grammar for
anbn|ancn, which
we write as
S ::= B
| C
B ::= a B b
B ::= epsilon
C ::= a B c
C ::= epsilon
- Here's a procedure for matching the language
- If you see an a with nothing on the stack, shift a onto the stack.
- If you see an a with a on the stack, shift a onto the stack.
- If you see a b with an a on top of the stack, reduce the epsilon on
top of the stack to B (which goes back on the stack)
- If you see a b with [a B] on top of the stack, shift the
b onto the stack.
- If you have [a B b] on top of the stack, reduce it to B.
- If you see a c with an a on top of the stack, reduce the epsilon on
top of the stack to C (which goes back on the stack)
- If you see a c with [a C] on top of the stack, shift the
c onto the stack.
- If you have [a C c] on top of the stack, reduce it to C.
- If you hit the end of the string with B on top of the stack,
reduce it to S.
- If you hit the end of the string with C on top of the stack,
reduce it to S.
- If you hit the end of the string with S the only value on the
stack, accept.
- In any other case, crash and burn.
- Here's another example using the expression 2 + 3 * 4 + 5.
- We shift 2 onto the stack:
[2]
- We reduce the 2 on the stack to a Factor:
[Factor]
- We reduce the Factor on the stack to a Term:
[Term]
- We reduce the Term on the stack to an Expression:
[Expression]
- We shift the plus onto the stack
[Expression PLUS]
- We shift the 3 onto the stack:
[Expersion PLUS 3]
- We reduce the 3 to a Factor:
[Expression PLUS Factor]
- We reduce the Factor to a Term:
[Expression PLUS Term]
- We shift the multiplication symbol onto the stack:
[Expression PLUS Term TIMES]
- We shift the 4 onto the stack:
[Expression PLUS Term TIMES 4]
- We reduce the 4 to a Factor:
[Expression PLUS Term TIMES Factor]
- We reduce the Term TIMES Factor to a Term:
[Expression PLUS Term]
- We reduce the Expression PLUS Term to an Expression:
[Expression]
- We shift the plus onto the stack:
[Expression PLUS]
- We shift the number onto the stack
[Expression PLUS 5]
- We reduce the 5 to a Factor:
[Expression PLUS Factor]
- We reduce the Factor to a Term:
[Expression PLUS Term]
- We reduce the Expression PLUS Term to an Expression:
[Expression]
- We reduce the expression to the start symbol.
- Note that it seems that we made different decisions in the same
context. But the context is not the same. Decisions are
based on what is on the stack and what the next input tokens are.
- In some ways, these parsers can be viewed as finite automata that
use a stack (also known as push-down automata).
- Shift reduce parsing is traditionally done with LR(k) parsers.
The first L stands for
left-to-right traversal of the input
, the
next R stands for rightmost derivation
and the k stands for
number of characters of lookahead
.
- You should be able to understand the traversal and lookahead.
- The rightmost refers to the types of derivations that the
grammar represents. In a rightmost top-down derivation from the
start symbol, you always replace the rightmost nonterminal.
- While LR parsers are bottom up, they simulate rightmost top-down
derivations
- The simplest LR parsers are based on LR automata with no lookahead.
-
Wait!
You may say, How can we build a parser with no
lookahead?
- It turns out the the 0 refers to the lookahead used in
constructing the automata, not (in this case) to the
lookahead in running them.
- The states of all LR parsers are sets of extended productions (also called
items), representing the states of all possible parses of the input
(more or less)
- Each production is extended with a position (where in the parse we may
be)
- Each production is may be extended with lookahead symbols (none in
LR(0) automata).
- We begin building an LR(0) parser by augmenting the grammar with a
simple rule in which we add an
end of input
($) to the start symbol.
- The initial state of an LR(0) parser begins with
- This means
when we begin parsing, we are ready to match an S
and the end-of-input symbol
- We then augment this with the other things that we might match on our
way to building an S and end of input. What are those things? Anything
that might build us an S. What have we seen of those things? Nothing,
yet.
- For our expression grammar, we might write
S' ::= . E $
E ::= . E + T
E ::= . E - T
E ::= . T
- Of course, in this case, if we're waiting to see a T, then we also need
to add the T rules (and then the F rules).
S' ::= . E $
E ::= . E + T
E ::= . E - T
E ::= . T
T ::= . T mulop F
T ::= . F
F ::= . id
F ::= . num
F ::= . ( E )
- We make new states in the automaton by choosing a symbol (terminal
or nonterminal) and advancing the
here mark
(period) over that
symbol in all rules, and then filling in the rest.
- For example, if we see an E in state 0, we could be
S' ::= E . $
E ::= E . + T
E ::= E . - T
- If we see a plus sign in that state, we could only be making
progress on one rule, giving us
- But now, we're ready to see a
T
, so we need to fill in
all the items that say ready to see a T
E ::= E + . T
T ::= . T mulop F
T ::= . F
F ::= . id
F ::= . num
F ::= . ( E )
- If we do indeed see a T, we advance the
here mark
and get to
E ::= E + T .
T ::= T . mulop F
- The first part suggests that we may be at the end of an
E
. The
second suggests that we may be in the midst of a T
.
How do we decide which it is? By context (and a little lookahead
in some cases).
Thursday, 29 August 2002
- First version, based somewhat on outlines from
CS362 2001S.