Today in CS362: Fun with Grammars ADMIN [bcdf] Question d: How do I do "strings of digits with no repetitions?" Alternate: How do I approach the problem? * Start with a simpler alphabet (e.g., 1-3 rather than 0-9) * Look at all the cases + Length + Starting digit + Repeated digit + ... * Name regular expressions * Think about characteristics + Maximum length Question f: Even # of 0's and odd # of 1's. * Use ideas from today's class. * Name regular expressions and make them recursive evenodd: (eveneven 1) | (oddodd 0) | ... eveneven: (evenodd 1) | ... Quiz: When is tokenizing ambiguous in Pascal? QUIZ FOR CREDIT ON WEDNESDAY Question on 3.17: How many accepting states? Answer from SamR: (Just one) ON TO THE TOPIC OF TODAY'S CLASS: GOING BEYOND REGEXPS Problem: Some fairly simple languages cannot be implemented with regular expressions or finite automata Example: "Strings of a's and b's with equal numbers of a's and b's". Intuition: Need a way to count. How do you count? Using states. But you'll run out of states for enough a's or b's. Theorem: There does not exist a DFA for the language "strings of a's and b's with equal numbers of a's and b's" Proof (by contradiction): * Assume there is a DFA of n states for that language. * Trace the sequence of states it visits for the string a^(n+1). * That sequence must contain a loop. [Pigeonhole principle] * Formally, delta*(q0,a^i) = delta*(q0,a^j) for some i and j. * Observation: delta*(q0,a^ib^i) is an accepting state. * Hence: delta*(q0,a^jb^i) is an accepting state. * QED (or, for the computer scientists, CAB) Larger conclusion: Finite automata fail to work for many useful languages: * Matched parentheses Observation: * Equal numbers of a's and b's is hard. * Will always tax memory. What do we do (about the regexp/fa problem)? * Need something with a theoretically infinite memory * But a finite description Standard model: Grammar * A whole lot like named regular expressions * Except that the named expression can be recursive S : epsilon | (a S b) Exp: NUMBER | ID | (OP Exp) | (Exp OP Exp) | '(' Exp ')' Statement : IfStatement | WhileStatement | Assignment | ... IfStatement : IF Exp THEN Statement ELSE Statement StatementSeq : Statement | Statement SEMI StatementSeq FORMAL DEFINITION: * A grammar is a four-tuple: + T, a set of tokens/terminals/symbols (individual building blocks) + N, a set of nonterminals (instructions for building stuff in the language) + S in N, a designated start-symbol + P, a function from sequences of terminals and nonterminals to sequences of terminals and nonterminals A string, str, in T*, is in the language of a grammar, G, (the language is L(G)) if Start with S, repeatedly apply rules from P and end up with str. Useful for membership testing and for generation. Questions for the theoretician: * Are there times I want more than one thing on the left? * Do I gain anything from putting more than one thing on the left? Questions for the hacker: * Can I build efficient programs that check membership from these nondeterministic-seeming descriptions?