Unit 4 - Notes
Unit 4: CONTEXT- FREE LANGUAGES
1. Introduction to Context-Free Grammars (CFG)
A Context-Free Grammar (CFG) is a formal system that describes a language by recursive rewriting rules. It is a 4-tuple , where:
- V (Variables): Finite set of non-terminal symbols (e.g., ).
- T (Terminals): Finite set of terminal symbols (e.g., ).
- P (Productions): Finite set of rules of the form , where and .
- S (Start Symbol): A distinguished element of .
Language of a CFG
The language generated by a grammar , denoted as , is the set of all strings composed of terminal symbols that can be derived from the start symbol .
2. Derivations and Sentential Forms
Derivations Generated by a Grammar
A derivation is a sequence of production rule applications that transforms the start symbol into a string of terminals.
- Step (): Replacing a variable with the right-hand side of a production rule.
- Multi-step (): Zero or more derivation steps.
Sentential Forms
If , then is called a Sentential Form.
- may contain a mix of terminals and non-terminals.
- If contains only terminals, it is called a Sentence of the language.
Leftmost and Rightmost Derivations
For any string in a Context-Free Language (CFL), there may be multiple ways to derive it.
1. Leftmost Derivation (LMD)
In every step of the derivation, the leftmost non-terminal in the sentential form is replaced.
- Example:
- Derivation of "ab":
2. Rightmost Derivation (RMD)
In every step of the derivation, the rightmost non-terminal in the sentential form is replaced.
- Example:
- Derivation of "ab":
3. Ambiguity in CFG
Definition
A Context-Free Grammar is said to be ambiguous if there exists at least one string that has:
- More than one Leftmost Derivation, OR
- More than one Rightmost Derivation, OR
- More than one Parse Tree (Derivation Tree).
Example of Ambiguity
Consider the grammar for simple arithmetic expressions:
String:
Parse Tree 1 (Precedence to +):
E
/|\
E + E
| /|\
id E * E
| |
id id
*Parse Tree 2 (Precedence to ):**
E
/|\
E * E
/|\ |
E + E id
| |
id id
Since two distinct parse trees exist for the same string, the grammar is ambiguous.
Inherently Ambiguous Languages
A Context-Free Language is inherently ambiguous if every grammar that generates it is ambiguous. (i.e., it is impossible to construct an unambiguous grammar for that specific language).
4. Simplification of Context-Free Grammars
Simplification involves transforming a grammar into an equivalent one (generating the same language) but with reduced complexity. This process is usually done in three specific steps:
- Construction of Reduced Grammars (Removal of Useless Symbols).
- Elimination of Null Productions (-productions).
- Elimination of Unit Productions.
A. Construction of Reduced Grammars
A symbol is useless if it is either:
- Non-generating: It cannot derive a terminal string.
- Algorithm: Identify all variables that can eventually generate terminals. Remove those that cannot.
- Non-reachable: It cannot be reached from the Start symbol .
- Algorithm: Draw a dependency graph starting from . Remove any symbol not visited.
B. Elimination of Null Productions
A Null Production is a rule of the form .
A variable is nullable if .
Algorithm:
- Identify all nullable variables.
- For every production , add new productions representing all possible variations where nullable variables in are removed (substituted by ).
- Remove the original rules (unless and does not appear on RHS).
Example:
- Nullable: .
- New Grammar:
- (Since A is nullable)
- (Since inner A is nullable)
C. Elimination of Unit Productions
A Unit Production is a rule of the form (where both are single variables).
Algorithm:
- Find all "unit pairs" such that using only unit productions.
- For every pair , if is a non-unit production, add to the grammar.
- Remove the original unit productions.
5. Normal Forms for CFG
Normal forms restrict the structure of production rules to facilitate parsing algorithms and proofs.
A. Chomsky Normal Form (CNF)
A CFG is in CNF if every production is in one of the following two forms:
- (Non-terminal Two Non-terminals)
- (Non-terminal Single Terminal)
(Note: is allowed if is not on the RHS of any rule).
Conversion to CNF:
- Simplify grammar (Remove useless, , and unit productions).
- Isolate Terminals: Replace terminals in mixed bodies. E.g., if , create new variable and change rule to .
- Break long strings: If a body has 3+ variables (e.g., ), introduce cascading variables. , .
B. Greibach Normal Form (GNF)
A CFG is in GNF if every production is of the form:
Where:
- is a single terminal.
- is a string of zero or more variables ().
Characteristics:
- Every step consumes exactly one terminal symbol.
- Crucial for the construction of Pushdown Automata.
- Requires the elimination of Left Recursion () before conversion.
6. Pumping Lemma for CFG
The Pumping Lemma is a tool used to prove that a specific language is NOT Context-Free.
Statement
If is a Context-Free Language, there exists a pumping length such that any string with can be decomposed into five parts satisfying:
- (v and x are not both empty).
- (The "middle" part is not too long).
- For all , the string .
Application Example
To prove is not CFL:
- Assume is CFL. Let be the pumping length.
- Choose .
- According to conditions, cannot contain both 's and 's (because ).
- If we pump and (e.g., ), the number of one or two symbols increases, but the third remains constant.
- The resulting string will not have equal numbers of .
- Contradiction is not CFL.
7. Applications of Context-Free Grammars
- Compiler Construction:
- CFGs are used to define the syntax of programming languages.
- Parsers (like YACC or Bison) use CFGs to verify if code is syntactically correct and generate Parse Trees.
- Document Type Definition (DTD):
- Used in XML and SGML to describe the structure of documents (nesting of tags), which is inherently a context-free structure.
- Arithmetic Expressions:
- Defining precedence and associativity rules in calculators and interpreters.
- Natural Language Processing (NLP):
- Used in early linguistic models to describe sentence structures (Subject-Verb-Object).