Today in CS362: Fun with Grammars
ADMIN
[bcdf]
Question d: How do I do "strings of digits with no
repetitions?"
Alternate: How do I approach the problem?
* Start with a simpler alphabet (e.g., 1-3 rather than 0-9)
* Look at all the cases
+ Length
+ Starting digit
+ Repeated digit
+ ...
* Name regular expressions
* Think about characteristics
+ Maximum length
Question f: Even # of 0's and odd # of 1's.
* Use ideas from today's class.
* Name regular expressions and make them recursive
evenodd: (eveneven 1) | (oddodd 0) | ...
eveneven: (evenodd 1) | ...
Quiz: When is tokenizing ambiguous in Pascal?
QUIZ FOR CREDIT ON WEDNESDAY
Question on 3.17: How many accepting states?
Answer from SamR: (Just one)
ON TO THE TOPIC OF TODAY'S CLASS: GOING BEYOND REGEXPS
Problem: Some fairly simple languages cannot be implemented
with regular expressions or finite automata
Example: "Strings of a's and b's with equal numbers of
a's and b's".
Intuition: Need a way to count. How do you count? Using
states. But you'll run out of states for enough a's or b's.
Theorem: There does not exist a DFA for the language "strings
of a's and b's with equal numbers of a's and b's"
Proof (by contradiction):
* Assume there is a DFA of n states for that language.
* Trace the sequence of states it visits for the string
a^(n+1).
* That sequence must contain a loop. [Pigeonhole principle]
* Formally, delta*(q0,a^i) = delta*(q0,a^j) for some i
and j.
* Observation: delta*(q0,a^ib^i) is an accepting state.
* Hence: delta*(q0,a^jb^i) is an accepting state.
* QED (or, for the computer scientists, CAB)
Larger conclusion: Finite automata fail to work for many
useful languages:
* Matched parentheses
Observation:
* Equal numbers of a's and b's is hard.
* Will always tax memory.
What do we do (about the regexp/fa problem)?
* Need something with a theoretically infinite memory
* But a finite description
Standard model: Grammar
* A whole lot like named regular expressions
* Except that the named expression can be recursive
S : epsilon | (a S b)
Exp: NUMBER | ID | (OP Exp) | (Exp OP Exp) | '(' Exp ')'
Statement : IfStatement | WhileStatement | Assignment | ...
IfStatement : IF Exp THEN Statement ELSE Statement
StatementSeq : Statement | Statement SEMI StatementSeq
FORMAL DEFINITION:
* A grammar is a four-tuple:
+ T, a set of tokens/terminals/symbols (individual
building blocks)
+ N, a set of nonterminals (instructions for building
stuff in the language)
+ S in N, a designated start-symbol
+ P, a function from sequences of terminals and
nonterminals to sequences of terminals and nonterminals
A string, str, in T*, is in the language of a grammar,
G, (the language
is L(G))
if
Start with S, repeatedly apply rules from P and end up
with str.
Useful for membership testing and for generation.
Questions for the theoretician:
* Are there times I want more than one thing on the left?
* Do I gain anything from putting more than one thing on the
left?
Questions for the hacker:
* Can I build efficient programs that check membership from
these nondeterministic-seeming descriptions?