Back to Finite Automata. On to From NFA to Optimal DFA.

**Held** Monday, February 5, 2001

**Summary**

Today we consider how to move from the readable but declarative regular exprssion notation to the executable but obtuse finite automaton notation.

**Notes**

- If you have not done so already, please start reading Chapter 4 of the Red Dragon.
- Please don't forget lab this week.
- I'm still working on writing up last Wednesday's class.

**Overview**

- Regular expressions vs. DFAs.
- Regular expressions as lexical descriptions.
- From lexical specification to DFA
- From one regular expression to NFA
- From many regular expressions to NFA

- We've seen how to match with finite automata, but how do we tokenize?
- We only tokenize with deterministic finite automata.
- We begin by attaching a token (or, sometimes, a set of instructions)
to each final state.
- When we reach an appropriate final state (possibly not the first final state), we stop and return the corresponding token.

- How do we decide whether a final state is appropriate? We always
keep track of the last final state we encountered, but peek ahead
until we find another final state or hit a dead end.
/** * Find the first token in the candidate string, starting * at a particular position. */ public token findToken(String candidate, starting_pos) begin State current_state = q0; for i = starting_pos to the length of candidate current_state = edge(current_state,candidate.symbolAt(i)) if current_state is a final state final_found = current_state final_pos = i end if if current_state is undefined then exit the for loop end if end for if (final_found is defined) then return the token given by final_found at position final_pos else no token can be found end if end

- We've now seen two mechanisms for describing tokens (and, in effect, for tokenizing): regular expressions and finite automata.
- How do we go from one to another? We convert regular expressions to nondeterministic finite automata using a surprisingly naive translation. We convert NFAs to DFAs. We then "optimize" the DFAs.
- We'll do this with a trio of expressions: one for an odd variant of identifiers ((a|bb)[a-d]*) and two for a simple keywords (aa and dd).

- Our regular expression to NFA converter builds a finite automaton with one final state (and one start state). It uses the recursive definitions of regular expressions to build a finite automaton for a compound expression from the finite automata for the base expressions.
- The finite automaton for the empty string has two states with an epsilon transition between the states. The first state is the start state. The second state is the final state.
- The finite automaton for a single symbol is similar, except that the transition is labeled with the symbol.
- The finite automaton for alternation has a new start state with epsilon edges to the start states of the automata for the subexpressions and a new final state with epsilon edges from the final states of the automata for the subexpressions.
- The finite automaton for concatenation uses the start state of the automaton for the first expression and the final state of the automaton for the second expression. There is an epislon transition from the final state of the first automaton to the start state of the second automaton.
- The finite automaton for repetition has a new start state and a
new final state. There are also epsilon transitions
- from the start state to the final state,
- from the start state to the start state of the subautomaton,
- from the final state of the subautomaton to the final state, and
- from the final state of the subautomaton to the start state of the subautomaton.

- If it seems to you that some of these extra states and epsilon transitions are unnecessary, you're probably right. However, you can't get rid of all of them. For example, consider what happens if you don't create any extra states for concatenation or repetition.

- How do we convert a series of token definitions (using regular expressions) to a tokenizing NFA?
- We build an NFA for each of the token definitions.
- We augment the final states with token type and priority.
- We put 'em together in one big automaton by adding a new start state with epsilon transitions to the start states of all the other automata.

Monday, 22 January 2001

- Created as a blank outline.

Monday, 5 February 2001

- Filled in the details, based primarily on outline 7 of CSC152 98F.
- Updated formatting.
- Added introductory material.

Back to Finite Automata. On to From NFA to Optimal DFA.

[Current]
[Discussions]
[Glance]
[Honesty]
[Instructions]
[Links]
[Search]
[Syllabus]
**Primary**

[Examples]
[Exams]
[Handouts]
[Homework]
[Labs]
[Outlines]
[Project]
[Quizzes]
[Readings]
[Reference]
**Sets**

[Blackboard]
[98F]
**Links**

**Disclaimer**:
I usually create these pages on the fly. This means that they
are rarely proofread and may contain bad grammar and incorrect details.
It also means that I may update them regularly (see the history for
more details). Feel free to contact me with any suggestions for changes.

This page was generated by Siteweaver on Mon Apr 30 10:51:48 2001.

This page may be found at `http://www.cs.grinnell.edu/~rebelsky/Courses/CS362/2001S/outline.07.html`

.

You may validate
this page's HTML.

The source was last modified Mon Feb 5 10:15:52 2001.