# Class 11: Parsing Expressions

Back to Ambiguous Grammars. On to Predictive Parsing.

Held Wednesday, February 14, 2001

Summary

Today we consider the traditional grammar for expressions and mechanisms for making that grammar unambiguous.

Notes:

• Are there questions on phase 1 of the project?
• We'll observe a key issue in lexer design today.
• I think I promised you a quiz for today.

Overview

• A basic expression grammar
• Ambiguity in that grammar

## An Expression Grammar

• Let us consider a grammar one might use to define simple arithmetic expressions over variables and numbers.
• Each number is an expression. We get numbers from the lexer.
```Exp ::= NUMBER
```
• Each identifier is an expression (we won't worry about types right now).
```Exp ::= ID
```
• Each application of a unary operator to an expression is an expression.
```Exp ::= UnOp Exp
```
• We can get the legal unary operators from the lexer or we can handle them here:
```UnOp ::= PLUS
UnOp ::= MINUS
```
• The application of a binary operator to two expressions is an expression.
```Exp := Exp BinOp Exp
```
• And there are a number of binary operators. For shorthand, we'll write a vertical bar (representing "or") to show alternates .
```BinOp ::= PLUS | MINUS | TIMES | DIVIDE
```
• A parenthesized expression is an expression
```Exp ::= OPEN Exp OPEN
```
• How do we show that `a+b*2` is an expression? First, we observe that the lexer converts this to
`identifier + identifier * number`
• The derivation follows. A right arrow (=>) is used to indicate one derivation step.
```Exp =>                              // Exp ::= Exp BinOp Exp
Exp BinOp Exp =>                  // Exp ::= ID
ID BinOp Exp =>                   // BinOp ::= PLUS
ID PLUS Exp =>                    // Exp ::= Exp BinOp Exp
ID PLUS Exp BinOp Exp =>          // Exp ::= NUMBER
ID PLUS Exp BinOp NUMBER =>       // Exp ::= ID
ID PLUS ID BinOp NUMBER =>        // Binop ::= TIMES
ID PLUS ID TIMES NUMBER
```
• Note that we can have multiple choices at each step. For example, we might have expanded the second `Exp` rather than the first in `Exp BinOp Exp`.
• To describe these simultaneous choices, we often write visual descriptions of context-free derivations using trees. The interior nodes of a tree are the nonterminals. A derivation is shown by connecting a node (representing the lhs) to children representing the rhs of a derivation.
```                    Exp
/  |  \
/   |   \
Exp  BinOp  Exp------
/      |    | |      \
/       |    |  \      \
ID      PLUS  Exp  BinOp  Exp
/      |     |
/       |     |
ID      TIMES NUMBER
```

## Ambiguity in the Expression Grammar

• The standard expression grammar is ambiguous. Why? Because there are multiple ways to parse expressions with two operations.
• Consider `NUMBER PLUS NUMBER TIMES NUMBER`
• It might be parsed (in shorthand) as
```        Exp
/ | \
/  |  \
Exp  +  Exp
|     / | \
|    /  |  \
NUM Exp  *  Exp
|       |
|       |
NUM     NUM
```
• It might also be parsed as
```        Exp
/ | \
/  |  \
Exp  *  Exp
/ | \     |
/  |  \    |
Exp  +  Exp NUM
|       |
|       |
NUM     NUM
```
• Similarly, `NUM MINUS NUM MINUS NUM` might be parsed with two different trees.
• Is this a problem? Yes, in both cases.
• Which tree is correct?
• In the first case, it's the first tree, which shows us doing addition after multiplication. This is because multiplication has precedence over addition.
• In the second case, it's the second tree, which shows us doing the second subtraction second. This is because subtraction is left associative.
• How do we resolve the problems? It turns out that it's easiest to resolve the two problems separately.
• We're going to ignore the unary operators for this discussion.

## Disambiguating the Expression Grammar

### Adding Precedence to the Expression Grammar

• How do we handle precedence?
• By considering what's wrong with the misparsed tree.
• By dividing expressions up into categories to eliminate such problems.
• What's wrong with that tree?
• There's a subtree of a "multiplication tree" that contains unparenthesized addition.
• What does this suggest about categories? That we need a category for "does not include unparenthesized addition". We'll call this category `Term`.
• What is a `Term`? Something that may include multiplication but does not include no unparenthesized addition.
• What other categories do we need? We also need a category for stuff that can include addition.
• It could be "stuff that includes unparenthesized addition". We might call this `AddExp`
• It could be "stuff that may (but need not) include unparenthesized addition".
• Which do you prefer? We'll use the second, since it's close to our definition of `Term`.
• We observe that there are two possibilities for `Term`.
• Something that includes multiplication as the "top level" operation.
• Something that doesn't include multiplication as the "top level" operation. We'll call this `Factor`.
• What expressions include multiplication as the top level operation? We need multiplication. We also need safe "arguments". But we've defined such safe arguments as `Term`s.
```Term ::= Term MulOp Term
```
• What are the other `Term`s? Those that don't include multiplication. We've already called those `Factor`s.
```Term ::= Factor
```
• What are the factors? The expressions that include parentheses and the base expressions.
```Factor ::= OPEN Exp CLOSE
Factor ::= NUM
Factor ::= ID
```
• Now, let's return to general expressions. These may or may not include addition at the root.
• As with `Term`, we can separate them into those that do and those that don't.
```Exp ::= Exp AddOp Exp
Exp ::= Term
```
• Have we eliminated the problem with precedence? If we try to misparse our original expression, we'll find that we can't even parse it.
• Do we have the same language? Informally, yes. It's clear that we haven't added any strings. Have we removed any? No, but I wouldn't want to try to argue that.

### Adding Associativity to the Expression Grammar

• Have we solved the associativity problem? No. You'll observe that we can still misparse the "double subtraction" expression (although with a different tree).
• How do we handle precedence?
• By considering which tree is correct.
• By considering why the incorrect tree is incorrect.
• By trying to change the grammar to eliminate the incorrect tree.
• Why is the incorrect tree incorrect? Because there is unparenthesized subtraction to "the right of" subtraction.
• What should we do? Come up with a category for "no unparenthesized subtraction".
• Do we have such a category? Yes, we called it `Term`.
```Exp ::= Exp AddOp Term
```
• Are there other similar areas we must worry about?. Yes, we'll need to worry about repeated division (or multiplication). We can use the same strategy.
```Term ::= Term MulOp Factor
```
• Have we changed the language? No, but the proof is more difficult.

### The Final Grammar

• Here is our unambiguous grammar, so far.
```Exp ::= Exp AddOp Term
|  Term
Term ::= Term MulOp Factor
|  Factor
Factor ::= OPEN Exp OPEN
|  NUM
|  ID
```
• We could further extend this grammar in many ways.
• Include the unary operators. Note that each at most one unary operator is typically used (e.g., `-+num` is meaningless, as is `---num`).
• Include exponentiation. Is exponentiation left or right associative?
• Include other operations.
• Function calls.
• ...
• These extensions are left as an exercise to the reader.

## History

Monday, 22 January 2001

• Created as a blank outline.

Back to Ambiguous Grammars. On to Predictive Parsing.

Disclaimer: I usually create these pages on the fly. This means that they are rarely proofread and may contain bad grammar and incorrect details. It also means that I may update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This page was generated by Siteweaver on Mon Apr 30 10:51:51 2001.
This page may be found at `http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2001S/outline.11.html`.