**Primary:**
[Skip To Body]
[Front Door]
[Current]
[Glance]
[Honesty]
[Instructions]
[Links]
[Search]
[Syllabus]

**Sets:**
[EBoards]
[Examples]
[Exams]
[Handouts]
[Homework]
[Labs]
[Outlines]
[Project]
[Readings]
[Reference]

**ECA:**
[About]
[Grades]
[Quizzes]
[Submit Work]
[Change Password]
[Reset Password]

**Miscellaneous:**
[2001S]
[98F]
[SamR]
[Glimmer Labs]

Back to Regular Expressions, Continued. On to From Specification to Optimal DFA.

**Held** Monday, September 9, 2002

**Summary**

Today we consider *finite automata*, which serve as another, more
computational, representation of data for lexical analysis.

**Notes**

- In lab tomorrow, we'll explore regular expressions in the Unix environment.
- In preparation for the first phase of your compiler, you should look for a list of tokens in Pascal.

**Due**

- Regular expression for
strings that are not 'if'

(orstrings that are not 'while'

) (orstrings that are neither 'if' nor 'while'

).

**Overview**

- Finite automata
- Deterministic Finite Automata
- Examples
- Nondeterministic Finite Automata
- Looking ahead

- Regular expressions provide a simple, easy to understand, and elegant
mechanism for describing the tokens of a language.
- Unfortunately, it is difficult to compute with regular expressions.
- Hence, we will consider a more computational approach to lexical
analysis:
*finite automata*.

- What are finite automata?
- They are very simple computing machines (simple enough that there are things that they cannot compute that more complex computing machines can compute).

- What do they compute?
- Typically, they compute membership in simple languages.
- That is, given a string, they determine whether or not that string is in the language.
- As with regular expressions, they can be modified to find tokens in larger sequences of symbols.

- Because they are simple, finite automata can easily be translated to machine code.
- Essentially, a finite automaton is a series of states and transitions.
- Each state represents knowledge about the part of the string seen so far.
- Each transition indicates how state changes depending on the next input character.

- Formally, a
*deterministic finite automaton*is a five tuple <Q,Sigma,delta,q0,F> where- Q is a set of
*states*; - Sigma is the base
*alphabet*, that is a finite set of symbols; - delta is a
*transition function*, it is a function from state/symbol pairs to states; - q in Q is the
*start state*; - F subset Q is the set of
*final states*.

- Q is a set of
- Typically, we represent finite automata pictorially.
- Each state is represented by a circle.
- The transition function, delta, is represented by directed and labeled edges between states. For example, if (q1,b,q2) is in delta, then there is an edge from q1 to q2 labeled b.
- Final states have a double circle.
- The start state has an incoming arrow.
- Sigma is not represented explicitly.

- How do we use an automaton to determine whether a string is in the language
given by that automaton? Using a matching function.
/** * Determine whether a string is in the language given by the * current automaton. */ public boolean inLanguage(String candidate) begin State current_state = q0; for each symbol sym in candidate current_state = delta(current_state ,sym) if current_state is undefined return false end for return (current_state is a final state); end inLanguage

- As this suggests, it is also helpful to have an explicit or implicit
*failure state*. - You may find it easier to represent some tokens with finite
automata and others with regular expressions.
- They have the same computational power.
- Neither can represent the language
strings of

.`a`

's and`b`

's with equal numbers of`a`

's and`b`

's

- We'll try doing each of these as a regular expression and then as a DFA.
- "Strings that start with a, have an arbitrary number of b's and end with a"
- "Strings that start and end with a".
- "Comments in C" (to simplify things notationally: strings that start with ab, end with ba, and have no intervening ba's).

- It turns out that it's interesting and useful to consider a variant of
finite automata in which
- some transitions are
free

, in that they do not require any symbols in order to make the transition. Typically such edges are labeled with epsilon. - there may be multiple edges with the same label exiting the same node

- some transitions are
- Such finite automata are often called
*nondeterministic finite automata*or NFAs. - When is a string in the language given by such an automaton? When
there is
*some path*through the automaton that uses the appropriate symbols and free transitions. [We'd like the computer to guess the correct path.] - Do we gain any power from this nondeterminism? Surprisingly, no. As we'll see next class, nondeterministic finite automata can easily be converted to deterministic finite automata.

- Why is all this important?
- Because regular expressions (which we like to write), are declarative, not imperative. That is, they say what we want to match, but now how.
- Finite automata, on the other hand, include a nice procedure for matching.
- Fortunately, with a little bit of effort, you can turn any regular expression (or set of regular expressions) into a DFA.
- First, you convert each regular expression to an NFA.
- Then you join the NFA's together, giving a new NFA.
- Then you convert the NFA to a DFA.
- We'll do all that in the next few classes.

Thursday, 29 August 2002

- First version, based somewhat on outlines from CS362 2001S.

Monday, 9 September 2002

- Removed section on using regular expressions for lexical analysis (covered in a slightly different format in Friday's class).
- Updated wording somewhat.

Back to Regular Expressions, Continued. On to From Specification to Optimal DFA.

**Primary:**
[Skip To Body]
[Front Door]
[Current]
[Glance]
[Honesty]
[Instructions]
[Links]
[Search]
[Syllabus]

**Sets:**
[EBoards]
[Examples]
[Exams]
[Handouts]
[Homework]
[Labs]
[Outlines]
[Project]
[Readings]
[Reference]

**ECA:**
[About]
[Grades]
[Quizzes]
[Submit Work]
[Change Password]
[Reset Password]

**Miscellaneous:**
[2001S]
[98F]
[SamR]
[Glimmer Labs]

**Disclaimer**:
I usually create these pages on the fly

, which means that I rarely
proofread them and they may contain bad grammar and incorrect details.
It also means that I tend to update them regularly (see the history for
more details). Feel free to contact me with any suggestions for changes.

This document was generated by
Siteweaver on Fri Dec 6 10:37:55 2002.

The source to the document was last modified on Mon Sep 9 10:13:52 2002.

This document may be found at `http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2002F/Outlines/outline.05.html`

.

You may wish to validate this document's HTML ; ; Check with Bobby

Glimmer Labs: The Grinnell Laboratory for Interactive Multimedia Experimentation & Researchglimmer@grinnell.edu