# Class 08: Finite Automata

Back to Regular Expressions. On to From Specification to Optimal DFA (1).

This outline is also available in PDF.

Held: Monday, 12 September 2011

Summary: Today we consider finite automata, which serve as another, more computational, representation of data for lexical analysis.

Related Pages:

• EBoard.
• Reading: Aho et al., 3.6.

Notes:

• Don't forget that you are supposed to drop me a short note reflecting on my notes on your first assignment.
• Do you have to do the second assignment? No. But if you don't do the assignment, you will get a zero on the assignment, which is between two and five points off on your final grade. In addition, it is likely that you will find the next assignment harder.
• Can you use extended regular expression syntax on the second assignment? See Piazza.
• EC for President's speech on Thursday at 11.
• EC for Walker Group's speech on Thursday at 4:30.

Overview:

• Finite Automata.
• Deterministic Finite Automata.
• Examples
• Nondeterministic Finite Automata.

## Finite Automata

• Regular expressions provide a simple, easy to understand, and elegant mechanism for describing the tokens of a language.
• Unfortunately, it is difficult to compute with regular expressions.
• Do you know how to execute a regular expression?
• How about a set of regular expressions?
• Regular expressions (which we like to write) are declarative, not imperative. That is, they say what we want to match, but now how.
• Hence, we will also consider a more computational approach to lexical analysis: finite automata.
• What are finite automata?
• They are very simple computing machines (simple enough that there are things that they cannot compute that more complex computing machines can compute).
• What do they compute?
• Typically, they compute membership in simple languages.
• That is, given a string, they determine whether or not that string is in the language.
• As with regular expressions, they can be modified to find tokens in larger sequences of symbols.
• Because they are simple, finite automata can easily be translated to machine code.
• Essentially, a finite automaton is a series of states and transitions.
• Each state represents knowledge about the part of the string seen so far.
• Each transition indicates how state changes depending on the next input character.
• We will often draw finite automata as simple graphs (states are nodes, transitions are labeled edges).
• Each state is represented by a circle.
• Final states have a double circle.
• The start state has an incoming arrow.
• Sigma is not represented explicitly.
• Formally, a deterministic finite automaton is a five tuple <Q,Sigma,delta,q0,F> where
• Q is a set of states;
• Sigma is the base alphabet, that is a finite set of symbols;
• delta is a transition function, a function from state/symbol pairs to states;
• q0 in Q is the start state;
• F subset Q is the set of final states.
• How do we use an automaton to determine whether a string is in the language given by that automaton? Using a matching function.
```/**
* Determine whether a string is in the language given by the
* current automaton.
*/
public boolean inLanguage(String candidate)
begin
State current_state = q0;
for each symbol sym in candidate
current_state = delta(current_state, sym)
if current_state is undefined return false
end for
return (current_state is a final state);
end inLanguage
```
• Alternately, one could end with
• `return the token associated with the current state`
• `execute the code associated with the current state`
• As this example suggests, it is also helpful to have an explicit or implicit failure state.
• You may find it easier to represent some tokens with finite automata and others with regular expressions.
• They have the same computational power.
• Neither can represent the language strings of `a`'s and `b`'s with equal numbers of `a`'s and `b`'s.
• Neither can represent palindromes.

## Some Examples

• We'll try doing each of these as a regular expression and then as a DFA.
• "Strings that start with a, have an arbitrary number of b's and end with a"
• "Strings that start and end with a".
• "Comments in C" (to simplify things notationally: strings that start with ab, end with ba, and have no intervening ba's).

## Nondeterministic Finite Automata

• It turns out that it is interesting and useful to consider a variant of finite automata in which
• some transitions are free, in that they do not require any symbols in order to make the transition. Typically such edges are labeled with epsilon.
• there may be multiple edges with the same label exiting the same node
• Such finite automata are often called nondeterministic finite automata or NFAs.
• When is a string in the language given by such an automaton? When there is some path through the automaton that uses the appropriate symbols and free transitions. [We'd like the computer to guess the correct path.]
• Do we gain any power from this nondeterminism? Surprisingly, no. As we'll see next class, nondeterministic finite automata can easily be converted to deterministic finite automata.

• Why is all this important?
• Regular expressions are declarative - they describe what, not how.
• Finite automata, on the other hand, include a nice procedure for matching.
• Fortunately, with a little bit of effort, you can turn any regular expression (or set of regular expressions) into a DFA.
• Convert each regular expression to an NFA.
• Annotate the final state for each regular expression with the token type or other useful information.
• Join all of the NFA's together. That gives a new NFA.
• Convert the NFA to a DFA.
• We'll do all that over the next few classes. If you are feeling particularly energetic, you can write a program to do it for you.

Back to Regular Expressions. On to From Specification to Optimal DFA (1).

Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Wed Dec 7 10:26:32 2011.
The source to the document was last modified on Fri Aug 26 13:03:12 2011.
This document may be found at `http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2011F/Outlines/outline.08.html`.

Samuel A. Rebelsky, rebelsky@grinnell.edu