Primary: [Skip To Body] [Front Door] [Current] [Glance] [Honesty] [Instructions] [Links] [Search] [Syllabus]
Sets: [EBoards] [Examples] [Exams] [Handouts] [Homework] [Labs] [Outlines] [Project] [Readings] [Reference]
ECA: [About] [Grades] [Quizzes] [Submit Work] [Change Password] [Reset Password]
Miscellaneous: [2001S] [98F] [SamR] [Glimmer Labs]

Class 05: Finite Automata

Back to Regular Expressions, Continued. On to From Specification to Optimal DFA.

Held Monday, September 9, 2002

Summary

Today we consider finite automata, which serve as another, more computational, representation of data for lexical analysis.

Notes

In lab tomorrow, we'll explore regular expressions in the Unix environment.
In preparation for the first phase of your compiler, you should look for a list of tokens in Pascal.

Due

Regular expression for strings that are not 'if' (or strings that are not 'while') (or strings that are neither 'if' nor 'while').

Overview

Finite automata
Deterministic Finite Automata
Examples
Nondeterministic Finite Automata
Looking ahead

Finite Automata

Regular expressions provide a simple, easy to understand, and elegant mechanism for describing the tokens of a language.
- Unfortunately, it is difficult to compute with regular expressions.
- Hence, we will consider a more computational approach to lexical analysis: finite automata.
What are finite automata?
- They are very simple computing machines (simple enough that there are things that they cannot compute that more complex computing machines can compute).
What do they compute?
- Typically, they compute membership in simple languages.
- That is, given a string, they determine whether or not that string is in the language.
- As with regular expressions, they can be modified to find tokens in larger sequences of symbols.
Because they are simple, finite automata can easily be translated to machine code.
Essentially, a finite automaton is a series of states and transitions.
- Each state represents knowledge about the part of the string seen so far.
- Each transition indicates how state changes depending on the next input character.
Formally, a deterministic finite automaton is a five tuple <Q,Sigma,delta,q0,F> where
- Q is a set of states;
- Sigma is the base alphabet, that is a finite set of symbols;
- delta is a transition function, it is a function from state/symbol pairs to states;
- q in Q is the start state;
- F subset Q is the set of final states.
Typically, we represent finite automata pictorially.
- Each state is represented by a circle.
- The transition function, delta, is represented by directed and labeled edges between states. For example, if (q1,b,q2) is in delta, then there is an edge from q1 to q2 labeled b.
- Final states have a double circle.
- The start state has an incoming arrow.
- Sigma is not represented explicitly.

How do we use an automaton to determine whether a string is in the language given by that automaton? Using a matching function.

/**
 * Determine whether a string is in the language given by the
 * current automaton.
 */
public boolean inLanguage(String candidate)
begin
  State current_state = q0;
  for each symbol sym in candidate
    current_state = delta(current_state ,sym)
    if current_state is undefined return false
  end for
  return (current_state is a final state);
end inLanguage

As this suggests, it is also helpful to have an explicit or implicit failure state.
You may find it easier to represent some tokens with finite automata and others with regular expressions.
- They have the same computational power.
- Neither can represent the language strings of a's and b's with equal numbers of a's and b's.

Some Examples

We'll try doing each of these as a regular expression and then as a DFA.
"Strings that start with a, have an arbitrary number of b's and end with a"
"Strings that start and end with a".
"Comments in C" (to simplify things notationally: strings that start with ab, end with ba, and have no intervening ba's).

Nondeterministic Finite Automata

It turns out that it's interesting and useful to consider a variant of finite automata in which
- some transitions are free, in that they do not require any symbols in order to make the transition. Typically such edges are labeled with epsilon.
- there may be multiple edges with the same label exiting the same node
Such finite automata are often called nondeterministic finite automata or NFAs.
When is a string in the language given by such an automaton? When there is some path through the automaton that uses the appropriate symbols and free transitions. [We'd like the computer to guess the correct path.]
Do we gain any power from this nondeterminism? Surprisingly, no. As we'll see next class, nondeterministic finite automata can easily be converted to deterministic finite automata.

Looking Ahead

Why is all this important?
Because regular expressions (which we like to write), are declarative, not imperative. That is, they say what we want to match, but now how.
Finite automata, on the other hand, include a nice procedure for matching.
Fortunately, with a little bit of effort, you can turn any regular expression (or set of regular expressions) into a DFA.
First, you convert each regular expression to an NFA.
Then you join the NFA's together, giving a new NFA.
Then you convert the NFA to a DFA.
We'll do all that in the next few classes.

History

Thursday, 29 August 2002

First version, based somewhat on outlines from CS362 2001S.

Monday, 9 September 2002

Removed section on using regular expressions for lexical analysis (covered in a slightly different format in Friday's class).
Updated wording somewhat.

Back to Regular Expressions, Continued. On to From Specification to Optimal DFA.

Disclaimer: I usually create these pages on the fly, which means that I rarely proofread them and they may contain bad grammar and incorrect details. It also means that I tend to update them regularly (see the history for more details). Feel free to contact me with any suggestions for changes.

This document was generated by Siteweaver on Fri Dec 6 10:37:55 2002.
The source to the document was last modified on Mon Sep 9 10:13:52 2002.
This document may be found at http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2002F/Outlines/outline.05.html.

You may wish to validate this document's HTML ; ; Check with Bobby

Glimmer Labs: The Grinnell Laboratory for Interactive Multimedia Experimentation & Research
glimmer@grinnell.edu