# Class 14: Parsing Expressions

Back to Ambiguous Grammars. On to Predictive Parsing (1).

Held: Friday, 20 February 2004

Summary: Today we consider the traditional grammar for expressions and mechanisms for making that grammar unambiguous.

Related Pages:

Notes:

• Are there questions on Phase 2 of the project?
• Any thoughts on yesterday's lab?
• As I warned you yesterday, I may be a few minutes late for today's class.

Overview:

• A basic expression grammar.
• Ambiguity in that grammar.

## An Expression Grammar

• Let us consider a grammar one might use to define simple arithmetic expressions over variables and numbers.
• Each number is an expression. We get numbers from the lexer.
```exp ::= NUMBER
```
• Each identifier is an expression (we won't worry about types right now).
```exp ::= ID
```
• Each application of a unary operator to an expression is an expression.
```exp ::= unop exp
```
• We can get the legal unary operators from the lexer or we can handle them here:
```unop ::= PLUS
unop ::= MINUS
```
• The application of a binary operator to two expressions is an expression.
```exp := exp binop exp
```
• And there are a number of binary operators. For shorthand, we'll write a vertical bar (representing "or") to show alternates .
```binop ::= PLUS | MINUS | TIMES | DIVIDE
```
• A parenthesized expression is an expression
```exp ::= OPEN exp CLOSE
```
• How do we show that `a+b*2` is an expression? First, we observe that the lexer converts this to
`ID PLUS ID TIMES NUMBER`
• The derivation follows. A right arrow (=>) is used to indicate one derivation step.
```exp =>                              // exp ::= exp binop exp
exp binop exp =>                  // exp ::= ID
ID binop exp =>                   // binop ::= PLUS
ID PLUS exp =>                    // exp ::= exp binop exp
ID PLUS exp binop exp =>          // exp ::= NUMBER
ID PLUS exp binop NUMBER =>       // exp ::= ID
ID PLUS ID binop NUMBER =>        // Binop ::= TIMES
ID PLUS ID TIMES NUMBER
```
• Note that we can have multiple choices at each step. For example, we might have expanded the second `exp` rather than the first in `exp binop exp`.
• To describe these simultaneous choices, we often write visual descriptions of context-free derivations using trees. The interior nodes of a tree are the nonterminals. A derivation is shown by connecting a node (representing the lhs) to children representing the rhs of a derivation.
```                    exp
/  |  \
/   |   \
exp  binop  exp------
/      |    | |      \
/       |    |  \      \
ID      PLUS  exp  binop  exp
/      |     |
/       |     |
ID      TIMES NUMBER
```

## Ambiguity in the Expression Grammar

• The standard expression grammar is ambiguous. Why? Because there are multiple ways to parse expressions with two operations.
• Consider `NUMBER PLUS NUMBER TIMES NUMBER`
• It might be parsed (in shorthand) as
```        exp
/ | \
/  |  \
exp  +  exp
|     / | \
|    /  |  \
NUM exp  *  exp
|       |
|       |
NUM     NUM
```
• It might also be parsed as
```        exp
/ | \
/  |  \
exp  *  exp
/ | \     |
/  |  \    |
exp  +  exp NUM
|       |
|       |
NUM     NUM
```
• Similarly, `NUM MINUS NUM MINUS NUM` might be parsed with two different trees.
• Is this a problem? Yes, in both cases.
• Which tree is correct?
• In the first case, it's the first tree, which shows us doing addition after multiplication. This is because multiplication has precedence over addition.
• In the second case, it's the second tree, which shows us doing the second subtraction second. This is because subtraction is left associative.
• How do we resolve the problems? It turns out that it's easiest to resolve the two problems separately.
• We're going to ignore the unary operators for this discussion.

## Disambiguating the Expression Grammar

### Adding Precedence to the Expression Grammar

• How do we handle precedence?
• By considering what's wrong with the misparsed tree.
• By dividing expressions up into categories to eliminate such problems.
• What's wrong with that tree?
• There's a subtree of a "multiplication tree" that contains unparenthesized addition.
• What does this suggest about categories? That we need a category for "does not include unparenthesized addition". We'll call this category `term`.
• What is a `term`? Something that may include multiplication but does not include no unparenthesized addition.
• What other categories do we need? We also need a category for stuff that can include addition.
• It could be "stuff that includes unparenthesized addition". We might call this `Addexp`
• It could be "stuff that may (but need not) include unparenthesized addition".
• Which do you prefer? We'll use the second, since it's close to our definition of `term`.
• We observe that there are two possibilities for `term`.
• Something that includes multiplication as the "top level" operation.
• Something that doesn't include multiplication as the "top level" operation. We'll call this `factor`.
• What expressions include multiplication as the top level operation? We need multiplication. We also need safe "arguments". But we've defined such safe arguments as `term`s.
```term ::= term mulop term
```
• What are the other `term`s? Those that don't include multiplication. We've already called those `factor`s.
```term ::= factor
```
• What are the factors? The expressions that include parentheses and the base expressions.
```factor ::= OPEN exp CLOSE
factor ::= NUM
factor ::= ID
```
• Now, let's return to general expressions. These may or may not include addition at the root.
• As with `term`, we can separate them into those that do and those that don't.
```exp ::= exp addop exp
exp ::= term
```
• Have we eliminated the problem with precedence? If we try to misparse our original expression, we'll find that we can't even parse it.
• Do we have the same language? Informally, yes. It's clear that we haven't added any strings. Have we removed any? No, but I wouldn't want to try to argue that.

### Adding Associativity to the Expression Grammar

• Have we solved the associativity problem? No. You'll observe that we can still misparse the "double subtraction" expression (although with a different tree).
• How do we handle precedence?
• By considering which tree is correct.
• By considering why the incorrect tree is incorrect.
• By trying to change the grammar to eliminate the incorrect tree.
• Why is the incorrect tree incorrect? Because there is unparenthesized subtraction to "the right of" subtraction.
• What should we do? Come up with a category for "no unparenthesized subtraction".
• Do we have such a category? Yes, we called it `term`.
```exp ::= exp addop term
```
• Are there other similar areas we must worry about?. Yes, we'll need to worry about repeated division (or multiplication). We can use the same strategy.
```term ::= term mulop factor
```
• Have we changed the language? No, but the proof is more difficult.

### The Final Grammar

• Here is our unambiguous grammar, so far.
```exp ::= exp addop term
|  term
term ::= term mulop factor
|  factor
factor ::= OPEN exp CLOSE
|  NUM
|  ID
```
• We could further extend this grammar in many ways.
• Include the unary operators. Note that each at most one unary operator is typically used (e.g., `-+num` is meaningless, as is `---num`).
• Include exponentiation. Is exponentiation left or right associative?
• Include other operations.
• Function calls.
• ...
• These extensions are left as an exercise to the reader.

Back to Ambiguous Grammars. On to Predictive Parsing (1).

Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Wed May 5 11:47:02 2004.
The source to the document was last modified on Tue Jan 20 23:06:45 2004.
This document may be found at `http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2004S/Outlines/outline.14.html`.

You may wish to validate this document's HTML ; ; Check with Bobby

Samuel A. Rebelsky, rebelsky@grinnell.edu