CSC362 2011F, Class 08: Finite Automata
Overview:
* Finite Automata.
* Deterministic Finite Automata.
* Examples
* Nondeterministic Finite Automata.
* Looking Ahead.
Admin:
* Don't forget that you are supposed to drop me a short (one to two
paragraph) note reflecting on my notes on your first assignment.
* Some of you have asked about my grading style for this class.
* As you may have noted, I respond to your work in a variety of ways.
* I note potential and real errors.
* I note potential stylistic problems.
* I give suggestions for improving the code.
* I may give example inputs for which I am not sure your program
will work correctly.
* I may write short notes like "ugh" or "bleh", which indicate
that you've done something unpleasant at that point in your code.
(I expect that you are sophisticated enough to figure it out,
and can ask me if you're unsure.)
* In the past, I have found that some students do not take the time
to look at the comments, and that those who do look at the comments
don't always think about them carefully.
* Hence, this semester, I am not going to summarize my comments and
will instead ask you to come up with a summary.
* You may certainly come talk to me if you have questions about the
details of particular comments or about the big picture of
your assignment.
* Do you have to do the second assignment?
No.
* But if you don't do the assignment, you will get a zero on the
assignment, which is between two and five points off on your final
grade.
* And you won't learn.
* In addition, it is likely that you will find the next assignment
harder, since assignments are likely to build.
* Can you use "extended regular expression syntax" on the second assignment?
Yes, see Piazza for details.
R* - (a|b)*- strings of length 0 or more consisting of a's and b's
E.g., epsilon | (a|b) | (a|b)(a|b) | (a|b)(a|b)(a|b)
R+ == R(R*) == one or more repetitions
R? == epsilon | R - Optional
(+|-)?[0-9]*
[abc] = (a|b|c)
[a-z] = (a|b|c|d|...|f)
* How can we test our regular expressions?
* grep
* EC for President's speech on Thursday at 11.
* EC for Walker Group's speech on Thursday at 4:30.
Finite Automata
* Recall: Doing lexical analysis
From sequences of characters to sequences of (annotated) tokens
Annotations
* A string will have what string it is
* An integer will have its value
* How do we specify tokens?
* Regular expressions - What, not how.
* We like *how* stuff: Code (or models) we can execute
* How do we computationally specify tokens?
* Magically convert regular expressions with a LAG
* Ad hoc coding
* Finite automata: Computational, but easy to analyze mathematically
Deterministic Finite Automata
* At any stage of a computation, the computer is in some "state"
* An input specifies the next state
* We can describe computation by specifying abstract states and
the transitions between those states
* Easy to encode into program
Two important values:
* state
* next input symbol
state = 0
while (input remains)
switch (state,getchar())
case (0,'a'): state = 1; break;
case (0,'b'): state = 3; break;
case (1,'b'): state = 2; break;
case (3,'c'): state = 4; break;
default: crash-and-burn
If state is a 'specially designated final state', we've matched
To mathematicians, a finite automaton is an n-tuple
Sigma: A finite alphabet
S: a finite set of states
delta: a function from state/alphabets pairs to states
s: a designated start state
E: a set of designated matching states
How does the finite automaton work? A lot like the code above.
Finite automaton as language acceptor
state = s; // start state
while (input characters remain)
c = next input character
state = delta(state,c)
return state in E
More computationally interesting
* Associate a token type with each end state
* Associate code with each end state
Assume you can write a FA for palindromes.
* That FA has n states for some n.
Inputs
empty string
(ab)(ba)
(ab)^2(ba)^2
(ab)^3(ba)^3
...
(ab)^n(ba)^n
Consider the states reached after (ab)^i (i = 0 ... n)
* Two of the strings of the form (ab)^i REACH THE SAME STATE
Exists i and j, 0 <= i < j <= n
(ab)^i puts you in the same state as (ab)^j
(ab)^i(ba)^i is accepted by the FA
(ab)^j(ba)^i is accepted by the FA and is not a palindrome
* Whoops
Why like FAs?
* Simple
* Fast
* Provably equivalent to regular expressions
* Automatically generatable from regular expressions
Examples (alphabet = { a, b, c }
* "Strings that start with a, have an arbitrary number of b's and end with a"
* "Strings that start and end with a".
* "Comments in C" (to simplify things notationally: strings that start with
ab, end with ba, and have no intervening ba's).
Nondeterministic Finite Automata
Looking Ahead
----
Lab questions