Unit 3 - Notes
Unit 3: FORMAL LANGUAGES
1. Definition of a Grammar
A Grammar is a mathematical model used to define the structure of a language. It is a set of rules (productions) used to generate valid strings in that language.
Formally, a grammar is defined as a 4-tuple:
Where:
- (Variables / Non-terminals): A finite set of symbols that denote syntactic categories. These are typically represented by uppercase letters (e.g., ).
- (Terminals): A finite set of symbols that form the strings of the language. These are typically represented by lowercase letters, numbers, or special symbols (e.g., ). Note that .
- (Production Rules): A finite set of relations of the form , where represents a string comprising variables and terminals with at least one variable, and represents a string of variables and terminals.
- (Start Symbol): A special variable () from which the generation of strings begins.
2. Derivations and the Language Generated by a Grammar
Derivations
A derivation is the process of generating a string from the start symbol by applying production rules.
- Direct Derivation (): If is a production rule, and we have a string , we can replace with to get . We write this as:
- Derivation in zero or more steps (): If a string can be derived from in zero or more steps, we write .
Types of Derivations:
- Leftmost Derivation (LMD): At every step, the leftmost variable (non-terminal) in the string is replaced.
- Rightmost Derivation (RMD): At every step, the rightmost variable (non-terminal) in the string is replaced.
Language Generated by a Grammar
The language generated by a grammar , denoted by , is the set of all strings consisting only of terminals that can be derived from the start symbol .
Note:
- A string is in if and only if it consists solely of terminals and can be derived from .
- Strings containing variables are sentential forms, not sentences of the language.
3. Chomsky Classification of Languages (Chomsky Hierarchy)
Noam Chomsky classified grammars into four types based on the restrictions applied to their production rules ().
| Type | Grammar Name | Language Accepted | Automaton | Production Restrictions () |
|---|---|---|---|---|
| Type 0 | Unrestricted Grammar | Recursively Enumerable | Turing Machine | No restrictions. containing at least one variable. |
| Type 1 | Context-Sensitive Grammar (CSG) | Context-Sensitive | Linear Bounded Automaton | (Length of LHS Length of RHS). Exception: is allowed if doesn't appear on RHS. |
| Type 2 | Context-Free Grammar (CFG) | Context-Free | Pushdown Automaton | , where and . (LHS must be a single variable). |
| Type 3 | Regular Grammar (RG) | Regular Language | Finite Automaton | or (Right Linear) OR or (Left Linear). |
Languages and their Relation
The languages form a strict hierarchy (subset relationship):
- Every Regular language is Context-Free.
- Every Context-Free language is Context-Sensitive.
- Every Context-Sensitive language is Recursively Enumerable.
4. Recursive and Recursively Enumerable Sets
These concepts relate to Type 0 languages and Turing Machines.
Recursively Enumerable (R.E.) Sets
A language is Recursively Enumerable if there exists a Turing Machine that recognizes .
- If input , halts and accepts.
- If input , may halt and reject OR loop forever.
- Key Concept: We can enumerate (list) the elements of the set, but we might not know if a specific element is not in the set (due to looping).
Recursive Sets (Decidable)
A language is Recursive if there exists a Turing Machine that decides .
- If input , halts and accepts.
- If input , halts and rejects.
- Key Concept: The machine is guaranteed to halt (Total Turing Machine).
Relationship:
- All Recursive sets are R.E., but not all R.E. sets are Recursive (this is related to the Halting Problem).
5. Languages and Automata
There is a direct one-to-one correspondence between classes of languages (generated by grammars) and classes of automata (machines that recognize them).
- Regular Languages: Recognized by Finite Automata (DFA/NFA). Memory is limited to states; no external stack.
- Context-Free Languages: Recognized by Pushdown Automata (PDA). Finite Automaton + one infinite Stack (Last-In-First-Out memory).
- Context-Sensitive Languages: Recognized by Linear Bounded Automata (LBA). A Turing Machine with a tape bounded by the length of the input.
- Recursively Enumerable Languages: Recognized by Turing Machines (TM). Infinite tape, random access memory.
6. Regular Sets and Regular Grammars
Regular Sets
A Regular Set is a language that can be described by a Regular Expression or accepted by a Finite Automaton. Examples include:
- Strings ending in '00'.
- Strings with an even number of 'a's.
- Finite sets of strings.
Regular Grammars
A Regular Grammar is a Type 3 grammar. It must be either Left Linear or Right Linear.
Right Linear Regular Grammar (RLRG)
All productions are of the form:
Where and .
- The variable (if present) is always the rightmost symbol on the RHS.
Left Linear Regular Grammar (LLRG)
All productions are of the form:
Where and .
- The variable (if present) is always the leftmost symbol on the RHS.
Important: A grammar cannot mix Left Linear and Right Linear rules and remain a Regular Grammar.
7. Converting Regular Expressions to Regular Grammars
To convert a Regular Expression (RE) to a Regular Grammar, it is often easiest to construct a Finite Automaton (DFA/NFA) first, and then convert that FA to a grammar.
Method (via DFA):
- Construct a DFA/NFA for the given Regular Expression.
- Let states of the DFA be the variables () of the grammar.
- Let the start state be the start symbol .
- For every transition (transition from state A to B on input 'a'):
- Add production:
- If is a final state:
- Add production: (in addition to )
- Alternatively, standard practice often uses if is a final state.
Example: RE =
- DFA States: (loops on , goes to on ), (final state).
- Transitions:
- Final State Handling ( is final):
- Resulting Grammar (Right Linear):
8. Converting Regular Grammars to Regular Expressions
To convert a Regular Grammar (specifically Right Linear) to a Regular Expression, we can solve the system of linear language equations, often using Arden's Theorem.
Arden's Theorem
If and are two regular expressions over , and does not contain , then the equation:
Has a unique solution:
Conversion Steps:
- Write the equations for the grammar. For every variable , write an equation where equals the sum (union) of its derivation options.
- If , term is .
- If , term is .
- Express the equations in the form .
- Substitute variables into one another to eliminate them until you have an equation for the Start Symbol strictly in terms of terminals.
- Apply Arden's Theorem () or () to resolve self-loops (recursion).
Example:
Grammar:
Solution:
From (2): .
Applying Arden's theorem ():
Substitute into (1):
Apply Arden's theorem ():
Correction on Arden Application:
Equation: matches (where ). Solution is .
Regular Expression: