Today in CS362: Finite Automata Warning:The keyboard has a nasty space bar,so my writing may be even worse than normal. Admin: RegExps due (email) Look for Pascal token definitions Standard ref: Jensen & Wirth "Pascal User Manual and Report" Regular expression for "C Comment" Simplified "ab" (not ba)* "ba" Overview: What are finite automata? DFAs Examples NFAs Looking Ahead Finite Automata: A computational model for lexical analysis Simple model: The program can be in different "states" which represent the knowledge of what's been seen so far. The program has "transitions" from state to state based on the next input character Generic FA routine state := initial_state while (!instream.eof()) { state = follow-transition(state,instream.nextchar()); } return (state.isAccepting()); Some examples: ab*a abb*ba A Deterministic Finite Automaton (DFA) is a five-tuple (Sigma: The alphabet Q: a finite set of states q_0 in Q: a designated "start state" delta: state x symbol -> state: transition function F subset Q: The "final states") Emily's abb*ba automataon Sigma: { a, b } Q: { s0, s1, s2, s3, s4 } q_0 is s0 F = { s4 } delta(s0,a) = s1 delta(s1,b) = s2 delta(s2,b) = s3 delta(s3,a) = s4 delta(s3,b) = s3 Observations: (1) Implicit failure state: No transition given, fail (2) q_0, F, and delta are all that is really necc (since everything else can be determined) Challenge: Comments in C Alternatively: "Strings over a-e that start with ab, end with ba, and cannot have ba in the middle". ab([^b]*|b[^a])*ba [WRONG] ab([^b]*|b[^a])*b*ba [WHO KNOWS?] Finite Automataon is "much" easier to build Making life more fun: Nondeterminism (1) There can be more than one transition from a state that uses the same character. (2) There can be "free" transitions that consume no input characters A string is accepted by a NFA if there is *some* legal finite path to a final state (using that string as input). Theorem: Any language that can be expressed by an NFA can be expressed by a DFA. Proof: Instructions for converting any NFA to a DFA. (Coming soon) Theorem: Any language that can be expressed by a regexp can be expressed by a NFA. Proof: Instructions (coming soon) Theorem: Any NFA or DFA can be expressed by a regular expression Proof: Instructions (take 341) Looking ahead: Wednesday NFA->DFA (+ extra) Tuesday: RegExp lab Starting on RegExp to NFA Five basic kinds of regular expressions Rules for converting each kind to an NFA Key design idea: Every created NFA will have exactly one start state and exactly one final state. Epsilon: