# Class 04: Exploring Compilation

Back to Parameter-Passing Strategies. On to Lexical Analysis.

This outline is also available in PDF.

Held: Friday, 2 September 2011

Summary: We explore compilation with an example, some discussion, and some laboratory exercises.

Related Pages:

Notes:

Overview:

## A Compilation Example

Let's explore the stages of compilation with a simple example (although one a little bit more uniform than the ones in the book).

### Source Code

You should find this algorithm familiar.

```var
x,z: real;
i: integer;
begin
write("x:");
readln(x);
write("i:");
readln(i);
z :=  1;
while (i > 0) do
begin
if (odd(i)) then
z := z*x;
i := i div 2;
x := sqr(x);
end;
println(z);
```

### Tokenizing (Lexical Analysis)

```VAR     ID(x)   COMMA   ID(z)   COLON   REAL    SEMI    ID(i)
COLON   INTEGER SEMI    BEGIN   WRITE	LPAREN	STR(x:) RPAREN
SEMI	READLN	LPAREN	ID(x)	RPAREN	SEMI	WRITE	LPAREN
STR(i:)	RPAREN	SEMI	READLN	LPAREN	ID(i)	RPAREN	SEMI
ID(z)   ASSIGN  INT(1)	SEMI    WHILE   LPAREN  ID(i)   GT
INT(0)  RPAREN  DO	BEGIN   IF      LPAREN  ID(odd) LPAREN  ....
```

Note: We might use `ID(integer)` and `ID(real)` instead of `INTEGER` and `REAL`. We might also used `ID(write)` and `ID(readln)` rather than `WRITE` and `READLN`.

### Parsing (Syntactic Analysis)

```...
\
StatementList
\
StatementList
/       \
Statement       StatementList
|             /       \
Assignment    Statement    StatementList
/     \          |          /        \
ID(z)  INT(1)  WhileLoop   Statment    NIL
/         \         |
GT        Compound    Output
/  \          |
ID(i) INT(0) StatementList
/       \
Statement
|
Conditional
/     \
```

### An AST

```Assignment -> Output -> Input -> Output -> Input -> WhileLoop -> Output -> NIL
/   \         |         |        |         |       /  \          |
ID(z) INT(1)  STR(x:)   ID(x)    STR(i)    ID(i)   GT    \       ID(z)
/  \    \
ID(i) INT(0) Conditional
-> Assignment
-> Assignment
-> NIL
```

### Type Checking (Semantic Analysis)

```Assignment -> WhileLoop -> Output -> NIL
real           -          real
/   \         /  \          |
ID(z) INT(1)  GT    \       ID(z)
real  int   int     \      real
/  \     \
ID(i) INT(0) Conditional -> Assignment -> Assignment -> NIL
int   int
```

### Intermediate Code Generation

```.DATA	S0	"x:"
.DATA	S1	"i:"
SWRITE	S0
FREAD	X			# readln(X)
SWRITE	S1
IREAD	I
I2F     V1      \$0              # v1 = real(0)
FASSIGN Z       V1              # z = v1
LABEL   LOOP1
ICMP    V2      I       \$0      # v2 = i > 0
JMPZERO V2      END1            # if (! v2) goto END1
LABEL   BODY1
IPUSH   I
ICALL   V3      ODD             # v3 = ODD(i)
JMPZERO V3      ALT2            # if (! v3) goto ALT2
LABEL   CONS2
FMULT   V4      Z       X       # v4 = z * x
FASSIGN Z       V4              # z = v4
LABEL   ALT2
IDIV    V5      I       2       # v5 = i div 2
IASSIGN I       V5              # i = v5
...
GOTO    LOOP1
LABEL   END1
FPUSH   Z
CALL    P_WRITE
```

### Code Optimization Improvement

• Assignment propagation
• From
```I2R     V1      \$0              # v1 = real(0)
FASSIGN Z       V1              # z = v1
```
• To
```I2R     Z       \$0              # Z = real(0)
```
• Convert constant types
• From
```I2R     Z       \$0              # Z = real(0)
```
• To
```FASSIGN Z       \$0.0            # Z = real(0)
```
• Call inlining
• Loop unrolling
• Etc.

### Convert to Assembly

• From
```IPUSH   I
ICALL   V3      ODD             # v3 = ODD(i)
```
• To
```movl 8(%ebp), 0(%esp)           # Assume that i is 8 from the base
subl \$4, %esp
call odd
movl -4(%esp), 20(%ebp)         # Assume that v3 is 20 from the base
```

### Code Optimization Improvement

• Liveness analysis and register allocation
```movl %eax, 0(%esp)              # i goes in register a
subl \$4, %esp
call odd
movl -4(%esp), %ebx             # v3 goes in register b
```
• Loop unrolling
• Etc.

## Describing Languages

• So, how do we describe languages to do all this?
• And how do we use those descriptions to implement languages?
• We use one language to describe the tokens.
• Lexical analysis is often straightforward.
• We use BNF to describe the syntax.
• We often rewrite our BNF grammars to deal with things that are hard to deal with computationslly, such as ambiguity and left recursion.
• There are standard ways to turn certain BNF grammars into parsers.
• We add semantic actions to add meaning
• Type checking
• Code generation
• There are standard ways to turn annotated BNF grammars into translators.
• Note: As this suggests, some of the steps above are theoretical rather than actual; for example, we don't really build the parse tree or AST.

## Chapter 2

• What did you see as the other highlights of chapter 2?
• What questions do you have on chapter 2?

Back to Parameter-Passing Strategies. On to Lexical Analysis.

Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Wed Dec 7 10:26:31 2011.
The source to the document was last modified on Fri Aug 26 13:03:12 2011.
This document may be found at `http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2011F/Outlines/outline.04.html`.

Samuel A. Rebelsky, rebelsky@grinnell.edu