
Unit 3 - Subjective Questions

CSE322 • Practice Questions with Detailed Answers

Define a Formal Grammar. Explain the four tuples used to represent it mathematically.

A formal grammar is a set of rules for generating the strings of a formal language. It describes how to form strings over the language's alphabet that are valid according to the language's syntax.

Mathematically, a grammar $G$ is defined as a 4-tuple: $G = (V, T, P, S)$ , where:

$V$ (Variables or Non-terminals): A finite, non-empty set of symbols that are replaced by other symbols during the derivation of strings. Usually represented by uppercase letters (e.g., $A, B, S$ ).
$T$ (Terminals): A finite set of symbols that form the actual strings of the language. They cannot be replaced once generated. Usually represented by lowercase letters, digits, or symbols (e.g., $a, b, 0, 1$ ). Note that $V \cap T = \emptyset$ .
$P$ (Productions): A finite set of production rules that specify how variables can be replaced by other variables and terminals. A rule is of the form $\alpha \rightarrow \beta$ , where $\alpha$ contains at least one non-terminal and $\beta$ is a string of variables and/or terminals.
$S$ (Start Symbol): A special non-terminal symbol from $V$ from which the derivation of all strings begins. $S \in V$ .
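The 4-tuple can be written down directly as a small data structure. The following is a minimal sketch, not a standard API: the class name `Grammar`, its field names, and the choice to store productions as pairs of symbol tuples are all assumptions made for illustration. The `validate` method checks only the side conditions stated in the definition above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    variables: frozenset   # V
    terminals: frozenset   # T
    productions: tuple     # P, stored as (lhs, rhs) pairs of symbol tuples
    start: str             # S

    def validate(self):
        """Check the side conditions from the definition of G = (V, T, P, S)."""
        if not self.variables:
            raise ValueError("V must be non-empty")
        if self.variables & self.terminals:
            raise ValueError("V and T must be disjoint")
        if self.start not in self.variables:
            raise ValueError("S must belong to V")
        symbols = self.variables | self.terminals
        for lhs, rhs in self.productions:
            if not any(s in self.variables for s in lhs):
                raise ValueError(f"left side {lhs} contains no non-terminal")
            if not all(s in symbols for s in lhs + rhs):
                raise ValueError(f"unknown symbol in {lhs} -> {rhs}")
        return True


# The grammar S -> aSb | ab, encoded with the assumed representation.
g = Grammar(
    variables=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    productions=((("S",), ("a", "S", "b")), (("S",), ("a", "b"))),
    start="S",
)
print(g.validate())
```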

Explain the concept of 'Derivation' and the 'Language Generated by a Grammar'.

Derivation:
Derivation is the process of generating a string of terminal symbols from the start symbol of a grammar by successively applying the production rules.
If a grammar has a rule $\alpha \rightarrow \beta$ , and we have a string $\gamma \alpha \delta$ , we can replace $\alpha$ with $\beta$ to get $\gamma \beta \delta$ . This single step is denoted as $\gamma \alpha \delta \Rightarrow \gamma \beta \delta$ .
If a string $w$ can be derived from the start symbol $S$ in zero or more steps, it is denoted as $S \Rightarrow^* w$ .

Language Generated by a Grammar:
The language generated by a grammar $G = (V, T, P, S)$ , denoted as $L(G)$ , is the set of all strings consisting exclusively of terminal symbols that can be derived from the start symbol $S$ .
Mathematically:
$L(G) = \{ w \in T^* | S \Rightarrow^* w \}$

Here, $T^*$ represents the set of all possible strings over the terminal alphabet $T$ .
A string is in $L(G)$ if and only if it consists solely of terminals and can be derived from $S$ using the rules in $P$ .
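The definition of $L(G)$ can be explored by enumerating derivations up to a length bound. The sketch below assumes a simple string encoding (upper-case letters are variables, everything else is a terminal) and assumes the grammar has no $\epsilon$ -rules, so that discarding sentential forms longer than the bound is safe. The function name `language_up_to` is hypothetical.

```python
from collections import deque


def language_up_to(productions, start, max_len):
    """Return every terminal string w with S =>* w and len(w) <= max_len.

    productions: list of (lhs, rhs) strings. Assumes no rule shortens a
    sentential form, so forms longer than max_len can be discarded.
    """
    seen = {start}
    words = set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if not any(c.isupper() for c in form):
            words.add(form)          # only terminals left: a member of L(G)
            continue
        for lhs, rhs in productions:
            i = form.find(lhs)
            while i != -1:           # apply the rule at every position (one => step)
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return sorted(words, key=lambda w: (len(w), w))


P = [("S", "aSb"), ("S", "ab")]
print(language_up_to(P, "S", 6))  # ['ab', 'aabb', 'aaabbb']
```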

Describe the Chomsky Classification of Languages in detail.

Noam Chomsky classified formal grammars into four hierarchical levels based on the restrictions applied to their production rules ( $\alpha \rightarrow \beta$ ):

1. Type 0: Unrestricted Grammars

Restriction: None, except that $\alpha$ must contain at least one non-terminal.
Language: Recursively Enumerable Languages.
Recognizing Automaton: Turing Machine.

2. Type 1: Context-Sensitive Grammars (CSG)

Restriction: $|\alpha| \le |\beta|$ (the right-hand side must be at least as long as the left-hand side). An equivalent formulation writes every rule as $\alpha_1 A \alpha_2 \rightarrow \alpha_1 \gamma \alpha_2$ with $A \in V$ and $\gamma$ non-empty, meaning $A$ is replaced by $\gamma$ only in the context of $\alpha_1$ and $\alpha_2$ .
Language: Context-Sensitive Languages.
Recognizing Automaton: Linear Bounded Automaton (LBA).

3. Type 2: Context-Free Grammars (CFG)

Restriction: $\alpha$ must be a single non-terminal. Rules are of the form $A \rightarrow \beta$ , where $A \in V$ and $\beta \in (V \cup T)^*$ .
Language: Context-Free Languages.
Recognizing Automaton: Pushdown Automaton (PDA).

4. Type 3: Regular Grammars

Restriction: $\alpha$ is a single non-terminal, and $\beta$ can be a single terminal, or a single terminal followed by a single non-terminal (Right Linear), or a single non-terminal followed by a terminal (Left Linear). A grammar must use one of the two linear forms throughout.
Language: Regular Languages.
Recognizing Automaton: Finite Automata (DFA/NFA).
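The four restrictions can be checked mechanically, one rule at a time. The sketch below is illustrative only: the function name `chomsky_type` and the string encoding of rules are assumptions, the Type 3 test uses the slightly more general form (a terminal string followed or preceded by at most one variable), and the Type 1 test ignores the special case $S \rightarrow \epsilon$ .

```python
def chomsky_type(productions, variables):
    """Return the highest type number (3, 2, 1 or 0) whose restriction all rules meet.

    productions: list of (lhs, rhs) strings, with '' standing for epsilon.
    Assumes every lhs already contains at least one variable (the Type 0 condition).
    """
    def single_var(lhs):
        return len(lhs) == 1 and lhs in variables

    def right_linear(lhs, rhs):
        # a variable may appear only as the last symbol
        return single_var(lhs) and not any(c in variables for c in rhs[:-1])

    def left_linear(lhs, rhs):
        # a variable may appear only as the first symbol
        return single_var(lhs) and not any(c in variables for c in rhs[1:])

    if all(right_linear(l, r) for l, r in productions):
        return 3
    if all(left_linear(l, r) for l, r in productions):
        return 3
    if all(single_var(l) for l, _ in productions):
        return 2
    if all(len(l) <= len(r) for l, r in productions):
        return 1
    return 0


print(chomsky_type([("S", "aS"), ("S", "b")], {"S"}))        # 3
print(chomsky_type([("S", "aSb"), ("S", "ab")], {"S"}))      # 2
print(chomsky_type([("S", "AB"), ("AB", "a")], {"S", "A", "B"}))  # 0
```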

What is the relationship between the languages in the Chomsky Hierarchy? Explain with a diagrammatic concept.

The languages defined by the Chomsky Hierarchy exhibit a strict subset relationship. Every language of Type 3 is also of Type 2, every Type 2 is also Type 1, and every Type 1 is also Type 0. However, the reverse is not true.

Relationship:
$L_{\text{Regular}} \subset L_{\text{Context-Free}} \subset L_{\text{Context-Sensitive}} \subset L_{\text{Recursively Enumerable}}$
Or equivalently:
$\text{Type 3} \subset \text{Type 2} \subset \text{Type 1} \subset \text{Type 0}$

Explanation:

Regular Languages (Type 3) are the most restricted and form the innermost circle.
Context-Free Languages (Type 2) encompass all Regular Languages plus languages that require counting/stack memory (e.g., $a^n b^n$ ).
Context-Sensitive Languages (Type 1) encompass all CFLs plus languages requiring bounded memory proportional to input size (e.g., $a^n b^n c^n$ ).
Recursively Enumerable Languages (Type 0) form the outermost circle, containing all languages generated by formal grammars and recognized by Turing Machines.
Conceptually, this is represented by concentric circles (a Venn diagram) with Type 0 as the outermost circle. Languages that are not recursively enumerable also exist; they lie outside every circle.

Distinguish between Recursive Sets and Recursively Enumerable (RE) Sets.

Recursively Enumerable (RE) Sets:

A language (or set) is RE if there exists a Turing Machine that accepts exactly the strings in the language (every string in the language, and no others).
If a string is in the language, the Turing Machine will eventually halt and accept.
If a string is NOT in the language, the Turing Machine may either halt and reject, OR loop infinitely.
RE languages are generated by Type 0 (Unrestricted) grammars.

Recursive Sets (Decidable Sets):

A language is Recursive if there exists a Turing Machine that accepts every string in the language AND rejects every string not in the language.
The Turing Machine is guaranteed to halt on all inputs (it never goes into an infinite loop).
If a set is Recursive, both the set and its complement are Recursively Enumerable.

Key Difference:
The membership problem is decidable for Recursive sets (we always get a Yes/No answer). For RE sets, it is semi-decidable (we get a Yes if true, but might wait forever if false). Therefore, Every Recursive set is an RE set, but not every RE set is a Recursive set.
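The difference in behaviour can be pictured with two ordinary functions. This is only an analogy in Python, not a Turing Machine construction: the function names are hypothetical, and the example language $a^n b^n$ is in fact recursive. The point is the shape of the two procedures: the decider always returns, while the enumeration-based recognizer returns only when it finds the string. The `max_steps` parameter is an assumption added so the sketch can be run safely.

```python
from itertools import count


def decide_anbn(w):
    """A decider: halts on every input with a definite True or False."""
    n = len(w) // 2
    return w == "a" * n + "b" * n


def enumerate_anbn():
    """List the strings of the language one by one, forever."""
    for n in count():
        yield "a" * n + "b" * n


def recognize_by_enumeration(w, enumerator, max_steps=None):
    """Semi-decision: accept as soon as w appears in the enumeration.

    With max_steps=None this loops forever when w is not in the language.
    With a bound it returns None, meaning "no answer yet", not "rejected".
    """
    for step, candidate in enumerate(enumerator()):
        if candidate == w:
            return True
        if max_steps is not None and step + 1 >= max_steps:
            return None


print(decide_anbn("aba"))                                  # False
print(recognize_by_enumeration("aabb", enumerate_anbn))    # True
print(recognize_by_enumeration("ba", enumerate_anbn, 50))  # None
```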

Map the different classes of Formal Languages to their corresponding Automata.

In formal language theory, there is a direct equivalence between the classes of grammars (generators) and automata (recognizers). The mapping according to the Chomsky Hierarchy is as follows:

Regular Languages (Type 3):
- Automaton: Finite Automata (FA).
- This includes Deterministic Finite Automata (DFA) and Non-Deterministic Finite Automata (NFA). They possess no auxiliary memory beyond their current state.
Context-Free Languages (Type 2):
- Automaton: Pushdown Automata (PDA).
- A PDA is essentially a Finite Automaton augmented with a Stack data structure for memory (LIFO access).
Context-Sensitive Languages (Type 1):
- Automaton: Linear Bounded Automata (LBA).
- An LBA is a restricted Turing Machine where the read/write head cannot move beyond the portion of the tape containing the initial input string.
Recursively Enumerable Languages (Type 0):
- Automaton: Turing Machine (TM).
- A TM has an infinite tape and a read/write head that can move in both directions, making it the most powerful model in this hierarchy.

Define Left Linear and Right Linear Regular Grammars. Provide examples of each.

A Regular Grammar (Type 3) can be classified into two forms based on the position of the non-terminal on the right side of the production:

1. Right Linear Grammar:
In a right linear grammar, all productions are of the form:

$A \rightarrow xB$
$A \rightarrow x$
where $A, B \in V$ (Non-terminals) and $x \in T^*$ (String of terminals).
If a non-terminal appears on the right side, it must be at the extreme right.
Example:
$S \rightarrow aS | bA | \epsilon$
$A \rightarrow aA | b$

2. Left Linear Grammar:
In a left linear grammar, all productions are of the form:

$A \rightarrow Bx$
$A \rightarrow x$
where $A, B \in V$ and $x \in T^*$ .
If a non-terminal appears on the right side, it must be at the extreme left.
Example:
$S \rightarrow Sa | Ab | \epsilon$
$A \rightarrow Aa | b$

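Note: Both forms generate exactly the same class of languages (Regular Languages). However, a single grammar cannot mix left and right linear rules and still count as regular; the result is a (linear) Context-Free Grammar, whose language might not be regular. For example, $S \rightarrow aA | \epsilon$ , $A \rightarrow Sb$ mixes the two forms and generates $\{ a^n b^n | n \ge 0 \}$ .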

Outline the algorithm to convert a given Regular Expression to a Regular Grammar.

Converting a Regular Expression (RE) to a Regular Grammar involves a two-step process: converting the RE to a Finite Automaton (NFA), and then converting the NFA to a Regular Grammar (Right Linear).

Step 1: Convert RE to NFA
Use Thompson's Construction to convert the RE into an NFA with $\epsilon$ -transitions, or construct a direct DFA. Ensure the automaton has a defined start state and one or more final states.

Step 2: Convert Automaton to Right Linear Grammar
Let the automaton be $M = (Q, \Sigma, \delta, q_0, F)$ . We construct a grammar $G = (V, T, P, S)$ as follows:

Variables ( $V$ ): Create a non-terminal for each state in $Q$ . Let $V = Q$ .
Terminals ( $T$ ): The alphabet $\Sigma$ becomes the terminal set $T$ .
Start Symbol ( $S$ ): The start state $q_0$ becomes the start symbol.
Productions ( $P$ ):
- For every transition $\delta(q_i, a) = q_j$ in the automaton, add the production rule: $q_i \rightarrow a q_j$ to $P$ .
- For every transition $\delta(q_i, \epsilon) = q_j$ , add the rule: $q_i \rightarrow q_j$ .
- If a state $q_f$ is a final state ( $q_f \in F$ ), add the $\epsilon$ -production: $q_f \rightarrow \epsilon$ .

The resulting grammar is a Right Linear Regular Grammar that generates the same language as the initial Regular Expression.
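Step 2 is mechanical enough to sketch in a few lines. The representation below is an assumption made for illustration: transitions are a dictionary from (state, symbol) to a set of states, the empty string stands for an $\epsilon$ -move, and the function name `nfa_to_right_linear` is hypothetical. The sample automaton is a two-state NFA for $(a+b)^* a$ .

```python
def nfa_to_right_linear(transitions, start, finals):
    """Build right-linear productions from an NFA.

    transitions: dict mapping (state, symbol) -> set of target states;
                 symbol '' denotes an epsilon move.
    Returns (start_symbol, productions); each production is (lhs, rhs)
    with rhs a tuple of symbols, and () standing for epsilon.
    """
    productions = []
    for (q, a) in sorted(transitions):
        for p in sorted(transitions[(q, a)]):
            # delta(q, a) contains p  ==>  q -> a p   (or q -> p on an epsilon move)
            rhs = (a, p) if a else (p,)
            productions.append((q, rhs))
    for q in sorted(finals):
        productions.append((q, ()))   # final state  ==>  q -> epsilon
    return start, productions


nfa = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"}}
start, P = nfa_to_right_linear(nfa, "q0", {"q1"})
for lhs, rhs in P:
    print(lhs, "->", " ".join(rhs) or "ε")
```

Sorting the states only makes the output order reproducible; it has no effect on the generated language.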

Explain the method of converting a Regular Grammar into a Regular Expression using Arden's Theorem.

To convert a Regular Grammar (assumed to be Right Linear) to a Regular Expression, we formulate equations for each non-terminal and solve them using Arden's Theorem.

Arden's Theorem states:
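If $P$ and $Q$ are two regular expressions over an alphabet $\Sigma$ , and if $P$ does not contain the null string ( $\epsilon$ ), then the equation $R = Q + RP$ has a unique solution given by $R = QP^*$ . Symmetrically, the equation $R = Q + PR$ has the unique solution $R = P^*Q$ ; this is the form that arises from right-linear grammars, where the variable appears to the right of its coefficient.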
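
A quick check that $R = QP^*$ really satisfies $R = Q + RP$ , using only the identity $P^* = \epsilon + P^*P$ :
$Q + (QP^*)P = Q(\epsilon + P^*P) = QP^*$
The condition $\epsilon \notin P$ is not needed for this check; it is what guarantees that no other solution exists.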

Conversion Procedure:

Formulate Equations: For each non-terminal $A_i$ in the grammar, create an equation. If the productions for $A_i$ are $A_i \rightarrow a_1 A_1 | a_2 A_2 | \dots | a_k A_k | b_1 | b_2$ , the equation becomes:
$A_i = a_1 A_1 + a_2 A_2 + \dots + a_k A_k + b_1 + b_2$
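Handle $\epsilon$ -productions: If $A_i$ has a production $A_i \rightarrow \epsilon$ (it plays the role of an accepting state), add $\epsilon$ to its equation.
Solve Equations: Use substitution to bring each equation into the form $R = Q + PR$ , where neither $Q$ nor $P$ contains $R$ . In a right-linear grammar the variable is always the last symbol of a term, so the coefficient $P$ stands to its left.
Apply Arden's Theorem: Replace an equation of the form $R = Q + PR$ with $R = P^*Q$ (the mirror image of $R = Q + RP \implies R = QP^*$ ). Concatenation is not commutative, so the order of the factors must be preserved.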
Find Start Symbol's Expression: Continue substituting back until the start variable (e.g., $S$ ) is expressed entirely in terms of terminal symbols. The final expression for $S$ is the required Regular Expression.

Convert the Right Linear Grammar $S \rightarrow 0A | 1B$ , $A \rightarrow 0S | 1A | \epsilon$ , $B \rightarrow 1S | 0B$ into a Regular Expression.

We use the equation method and Arden's Theorem.
Given Equations:
1) $S = 0A + 1B$
2) $A = 0S + 1A + \epsilon$
3) $B = 1S + 0B$

Step 1: Solve for B
From (3), $B = 0B + 1S$ . In a right-linear grammar the variable sits to the right of its coefficient, so the applicable form of Arden's Theorem is $R = Q + PR \implies R = P^*Q$ . Let $R=B, P=0, Q=1S$ .
$B = 0^*(1S) = 0^*1S$

Step 2: Solve for A
From (2), $A = 1A + (0S + \epsilon)$ . Let $R=A, P=1, Q=0S+\epsilon$ .
$A = 1^*(0S + \epsilon) = 1^*0S + 1^*$

Step 3: Substitute A and B into S
Substitute into (1):
$S = 0(1^*0S + 1^*) + 1(0^*1S)$
$S = 01^*0S + 01^* + 10^*1S$
Group $S$ terms:
$S = (01^*0 + 10^*1)S + 01^*$
This has the form $R = PR + Q$ with $P = 01^*0 + 10^*1$ and $Q = 01^*$ , so $R = P^*Q$ :
$S = (01^*0 + 10^*1)^* 01^*$
Concatenation is not commutative, so the factors must keep their order throughout; rewriting $0^*1S$ as $10^*S$ , for example, changes the language.

Final Regular Expression:
$(01^*0 + 10^*1)^* 01^*$
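
A candidate expression for this grammar can be sanity-checked by brute force: test every binary string up to some length against both the grammar and the expression. The sketch below is an illustration under stated assumptions: the rule table, the function names `generated` and `agrees`, and the length bound are all choices made here, and Python's `re` syntax writes union as `|` rather than `+`. It checks $(01^*0 + 10^*1)^* 01^*$ , the expression obtained when every coefficient stays to the left of its variable, and also $(001^* + 110^*)^* 01^*$ , which is what results if the factors are reordered during substitution.

```python
import re
from itertools import product

# Right-linear rules as (terminal, next variable); None marks A -> epsilon.
RULES = {
    "S": [("0", "A"), ("1", "B")],
    "A": [("0", "S"), ("1", "A"), None],
    "B": [("1", "S"), ("0", "B")],
}


def generated(w, start="S"):
    """True if the grammar derives w: track which variables can remain after each symbol."""
    current = {start}
    for ch in w:
        following = set()
        for v in current:
            for rule in RULES[v]:
                if rule is not None and rule[0] == ch:
                    following.add(rule[1])
        current = following
    # The derivation can stop only at a variable with an epsilon rule.
    return any(None in RULES[v] for v in current)


def agrees(pattern, max_len):
    """Compare the regular expression with the grammar on all strings up to max_len."""
    regex = re.compile(pattern)
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            w = "".join(chars)
            if generated(w) != bool(regex.fullmatch(w)):
                return False
    return True


print(agrees(r"(01*0|10*1)*01*", 10))   # True
print(agrees(r"(001*|110*)*01*", 10))   # False: e.g. 0100 is generated but not matched
```

Agreement up to a length bound is evidence, not a proof, but a single disagreement is enough to rule an expression out.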

What are Regular Sets? Discuss their relation to Regular Grammars.

Regular Sets:
A Regular Set is a language over an alphabet $\Sigma$ that can be generated by a Regular Expression. Regular sets are constructed from basic sets (the empty set $\emptyset$ , the set containing the empty string $\{ \epsilon \}$ , and sets containing a single symbol $\{ a \}$ ) using three operations a finite number of times:

Union ( $A \cup B$ )
Concatenation ( $A \cdot B$ )
Kleene Closure / Star ( $A^*$ )

Relation to Regular Grammars:
Regular Sets and Regular Grammars are completely equivalent in terms of generative power.

Generative equivalence: The class of languages generated by Regular Grammars (Type 3 Grammars in the Chomsky hierarchy) is exactly the class of Regular Sets.
Mutual Conversion: Every Regular Set can be represented by a Regular Expression, which can be systematically converted into a Regular Grammar (either left-linear or right-linear). Conversely, any Regular Grammar can be converted into a Regular Expression representing a Regular Set.
Recognizers: Both are recognized by Finite Automata (DFA or NFA).

Differentiate between Leftmost Derivation and Rightmost Derivation. Provide an example.

Leftmost Derivation (LMD):
In a leftmost derivation, at each step, the leftmost non-terminal in the sentential form is chosen to be replaced by applying a production rule until the string consists entirely of terminal symbols.

Rightmost Derivation (RMD):
In a rightmost derivation, at each step, the rightmost non-terminal in the sentential form is chosen to be replaced by applying a production rule until only terminal symbols remain.

Example:
Consider the grammar for arithmetic expressions: $E \rightarrow E + E | E * E | id$
String to derive: $id + id * id$

Leftmost Derivation:
$E \Rightarrow E + E$ (Replaced leftmost E)
$\Rightarrow id + E$ (Replaced leftmost E)
$\Rightarrow id + E * E$ (Replaced leftmost E)
$\Rightarrow id + id * E$ (Replaced leftmost E)
$\Rightarrow id + id * id$ (Replaced leftmost E)

Rightmost Derivation:
$E \Rightarrow E + E$ (Replaced rightmost E)
$\Rightarrow E + E * E$ (Replaced rightmost E)
$\Rightarrow E + E * id$ (Replaced rightmost E)
$\Rightarrow E + id * id$ (Replaced rightmost E)
$\Rightarrow id + id * id$ (Replaced rightmost E)
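
Both derivations can be replayed by a short routine that always rewrites the leftmost (or rightmost) variable. This is a minimal sketch: the function name `derive`, the token-list encoding of sentential forms, and the `choices` argument (the right-hand sides to apply, in order) are assumptions made for illustration.

```python
def derive(start, choices, leftmost=True, variables=("E",)):
    """Apply each right-hand side in `choices` to the leftmost or rightmost variable.

    Returns the list of sentential forms, starting with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for rhs in choices:
        positions = [i for i, sym in enumerate(form) if sym in variables]
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = rhs            # replace that one variable by the rule body
        steps.append(" ".join(form))
    return steps


PLUS, TIMES, ID = ["E", "+", "E"], ["E", "*", "E"], ["id"]

lmd = derive("E", [PLUS, ID, TIMES, ID, ID], leftmost=True)
rmd = derive("E", [PLUS, TIMES, ID, ID, ID], leftmost=False)
print(" => ".join(lmd))
print(" => ".join(rmd))
```

The two runs pass through different sentential forms but end at the same string, $id + id * id$ .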

Are Left Linear Grammars and Right Linear Grammars equivalent in power? Justify your answer.

Yes, Left Linear Grammars (LLG) and Right Linear Grammars (RLG) are exactly equivalent in their generative power.

Justification:

Both LLG and RLG generate the exact same class of languages, known as Regular Languages.
Every RLG has a corresponding NFA/DFA that accepts the same language. Similarly, every LLG has a corresponding NFA/DFA.
Conversion: Reversing the right-hand side of every production of an RLG (e.g., $A \rightarrow aB$ becomes $A \rightarrow Ba$ ) gives an LLG that generates the reversed language $L^R$ . Regular languages are closed under reversal, so $L^R$ is regular and has an RLG of its own; reversing the productions of that grammar gives an LLG for $(L^R)^R = L$ . The same argument works in the other direction.
Therefore, any language that can be defined by a Left Linear Grammar can also be defined by a Right Linear Grammar, and vice versa.
Note: Mixing left linear and right linear rules in the same grammar makes it Context-Free, and it may no longer generate a Regular Language.
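
The production-reversal step is a one-line transformation. The sketch below assumes rules are stored as (left side, right side) strings with the empty string for $\epsilon$ ; the function name `reverse_productions` is hypothetical. As input it uses the right-linear example grammar given earlier in these notes, $S \rightarrow aS | bA | \epsilon$ , $A \rightarrow aA | b$ .

```python
def reverse_productions(productions):
    """Reverse every right-hand side, e.g. A -> aB becomes A -> Ba.

    Applied to a right-linear grammar for L, this yields a left-linear
    grammar for the reversed language L^R (not for L itself).
    """
    return [(lhs, rhs[::-1]) for lhs, rhs in productions]


rlg = [("S", "aS"), ("S", "bA"), ("S", ""), ("A", "aA"), ("A", "b")]
llg = reverse_productions(rlg)
for lhs, rhs in llg:
    print(lhs, "->", rhs or "ε")
```

The result is $S \rightarrow Sa | Ab | \epsilon$ , $A \rightarrow Aa | b$ , which is the left-linear example grammar from the same earlier answer; the two example grammars therefore generate languages that are reversals of each other.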

What is an Unrestricted Grammar? Discuss its rule constraints and expressive power.

Unrestricted Grammar (Type 0 Grammar):
An unrestricted grammar is the most general class of grammars in the Chomsky hierarchy. As the name suggests, it places virtually no restrictions on the form of its production rules.

Rule Constraints:
Productions are of the form $\alpha \rightarrow \beta$ , where:

$\alpha \in (V \cup T)^+$ ( $\alpha$ is a non-empty string of variables and terminals).
$\beta \in (V \cup T)^*$ ( $\beta$ is any string of variables and terminals, including the empty string $\epsilon$ ).
The only strict constraint is that the left side ( $\alpha$ ) must contain at least one non-terminal variable (it cannot consist entirely of terminals).

Expressive Power:

Unrestricted Grammars have the highest expressive power among all formal grammars.
They generate Recursively Enumerable (RE) Languages.
This class of languages exactly matches the computational capability of a Turing Machine. Any language that can be computed or recognized by a Turing Machine can be generated by an unrestricted grammar.

Describe Context-Sensitive Languages (Type 1). Why are they called 'context-sensitive'?

Context-Sensitive Languages (Type 1):
These are languages generated by Context-Sensitive Grammars (CSG) and are accepted by Linear Bounded Automata (LBA).

Production Rules:
Productions are of the form $\alpha \rightarrow \beta$ , subject to the length restriction:
$|\alpha| \le |\beta|$
This means the length of the string on the right-hand side must be at least as long as the string on the left-hand side. The only exception allows for $S \rightarrow \epsilon$ if the start symbol $S$ does not appear on the right side of any production.

Why 'Context-Sensitive'?
The standard form for CSG rules is $\alpha_1 A \alpha_2 \rightarrow \alpha_1 \gamma \alpha_2$ , where $A$ is a non-terminal, and $\gamma$ is a non-empty string.
This notation implies that the variable $A$ can be replaced by the string $\gamma$ only if it is surrounded by the specific context $\alpha_1$ (left context) and $\alpha_2$ (right context). Because the replacement rule relies on the surrounding symbols (the context), these grammars and their resulting languages are termed "context-sensitive."
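
Whether a given rule literally has the form $\alpha_1 A \alpha_2 \rightarrow \alpha_1 \gamma \alpha_2$ can be tested by trying each variable on the left side as the candidate $A$ . The sketch below is illustrative: the function name `find_context` and the string encoding are assumptions, and the sample rules ( $aB \rightarrow ab$ , $CB \rightarrow BC$ ) are hypothetical inputs of the kind found in textbook grammars for $a^n b^n c^n$ .

```python
def find_context(lhs, rhs, variables):
    """Return (alpha1, A, alpha2, gamma) if lhs -> rhs has the form
    alpha1 A alpha2 -> alpha1 gamma alpha2 with gamma non-empty, else None."""
    if len(rhs) < len(lhs):
        return None                      # gamma would have to be empty
    for i, sym in enumerate(lhs):
        if sym not in variables:
            continue
        alpha1, alpha2 = lhs[:i], lhs[i + 1:]
        if rhs.startswith(alpha1) and rhs.endswith(alpha2):
            gamma = rhs[len(alpha1):len(rhs) - len(alpha2)]
            return (alpha1, sym, alpha2, gamma)
    return None


V = {"S", "B", "C"}
print(find_context("aB", "ab", V))   # ('a', 'B', '', 'b'): B becomes b after an a
print(find_context("CB", "BC", V))   # None
```

A rule such as $CB \rightarrow BC$ satisfies the length restriction but is not written in context form; it is a known result that any grammar obeying the length restriction can be rewritten into an equivalent one whose rules all have the context form.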

Discuss the closure properties of Recursive and Recursively Enumerable Sets.

Closure Properties of Recursive Sets:
Recursive sets (decidable languages) are closed under:

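Union, Intersection, Concatenation, and Kleene Star: Deciders for the operands can be combined into a decider for the result (run them one after the other, or, for concatenation and star, try every way of splitting the input); since each component halts, so does the combination.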
Complementation: This is a crucial property. If $L$ is recursive, an algorithm exists that halts and accepts if $w \in L$ and halts and rejects if $w \notin L$ . By simply swapping the accept and reject states of this Turing Machine, we get a TM that decides the complement language. Thus, the complement is also recursive.

Closure Properties of Recursively Enumerable (RE) Sets:
RE sets (Turing-recognizable languages) are closed under:

Union, Intersection, Concatenation, and Kleene Star.
NOT closed under Complementation: If $L$ is RE, its complement $\overline{L}$ is not necessarily RE. If both $L$ and $\overline{L}$ are RE, then $L$ is actually Recursive. This highlights the semi-decidable nature of RE sets; a TM might loop forever on strings not in the language, so swapping accept/reject states does not produce a valid recognizer for the complement.
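
For recursive sets, the closure arguments amount to combining total functions. The sketch below models a decider as a Python function that always returns `True` or `False`; this is an analogy rather than a Turing Machine construction, and all names in it are hypothetical.

```python
def complement(decider):
    """Swap accept and reject. Valid only because the decider halts on every input."""
    return lambda w: not decider(w)


def union(d1, d2):
    """Accept if either decider accepts; both are guaranteed to return."""
    return lambda w: d1(w) or d2(w)


def intersection(d1, d2):
    """Accept only if both deciders accept."""
    return lambda w: d1(w) and d2(w)


def ends_with_a(w):
    return w.endswith("a")


def even_length(w):
    return len(w) % 2 == 0


not_ending_in_a = complement(ends_with_a)
print(not_ending_in_a("ab"), not_ending_in_a("ba"))   # True False
print(union(ends_with_a, even_length)("bb"))          # True
print(intersection(ends_with_a, even_length)("ba"))   # True
```

The same `complement` trick fails for a recognizer that may never return: if the underlying call loops, negating its result never gets a chance to run.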

Convert the Regular Expression $(a+b)^* a$ into a Regular Grammar.

Step 1: Construct NFA for $(a+b)^* a$
Let the states be $q_0$ (start state) and $q_1$ (final state).

The $(a+b)^*$ part means from $q_0$ , on reading 'a' or 'b', stay in $q_0$ .
The terminating 'a' means from $q_0$ , on reading 'a', transition to $q_1$ .
Transitions:
$\delta(q_0, a) = \{q_0, q_1\}$
$\delta(q_0, b) = \{q_0\}$
Final state: $F = \{q_1\}$

Step 2: Convert NFA to Regular Grammar
Create variables for each state: Let $q_0$ be $S$ and $q_1$ be $A$ .
Productions are mapped as follows:

$q_0 \in \delta(q_0, a) \implies S \rightarrow aS$
$q_1 \in \delta(q_0, a) \implies S \rightarrow aA$
$q_0 \in \delta(q_0, b) \implies S \rightarrow bS$
Since $q_1$ ( $A$ ) is a final state, add an epsilon production:
$A \rightarrow \epsilon$

Alternatively, mapping directly to a terminal if ending in a final state:

$S \rightarrow aS | bS | a$

Final Regular Grammar:
$G = (\{S, A\}, \{a, b\}, P, S)$
where $P$ consists of:
$S \rightarrow aS | bS | aA$
$A \rightarrow \epsilon$
(Or simplified: $S \rightarrow aS | bS | a$ )

Explain the equivalence between Finite Automata (FA) and Regular Grammars.

Finite Automata (FA) and Regular Grammars are two different ways to represent the exact same class of languages (Regular Languages/Type 3).

Equivalence means:

For every Regular Grammar, there exists an FA that accepts the language generated by it.
For every FA, there exists a Regular Grammar that generates the language accepted by it.

1. Grammar to FA:
Given a Right Linear Grammar, we can construct an NFA.

Non-terminals become states. The Start symbol becomes the initial state.
A rule $A \rightarrow aB$ translates to a transition from state $A$ to state $B$ on input symbol 'a'.
A rule $A \rightarrow a$ translates to a transition from state $A$ to a newly created final state on input 'a' (or $A$ can be a final state if $A \rightarrow \epsilon$ ).

2. FA to Grammar:
Given a DFA or NFA, we construct a Right Linear Grammar.

Every state $q_i$ becomes a non-terminal variable.
A transition $\delta(q_i, a) = q_j$ becomes the production rule $q_i \rightarrow a q_j$ .
If $q_k$ is a final state, we add the production $q_k \rightarrow \epsilon$ .
Because this translation is systematic and reversible, FA and Regular Grammars are computationally equivalent.
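
The grammar-to-FA direction can be sketched as follows. The encoding is an assumption made for illustration: each rule body is one of `'aB'`, `'a'`, or `''` (for $\epsilon$ ), and `FINAL` is a hypothetical name for the newly created accepting state. The sample grammar is $S \rightarrow aS | bS | a$ , which generates $(a+b)^* a$ .

```python
FINAL = "<final>"   # hypothetical name for the extra accepting state


def grammar_to_nfa(productions, variables):
    """Build an NFA from right-linear rules of the forms A -> aB, A -> a, A -> epsilon.

    Returns (delta, finals); delta maps (state, symbol) -> set of states.
    """
    delta, finals = {}, {FINAL}
    for lhs, rhs in productions:
        if rhs == "":
            finals.add(lhs)                                      # A -> epsilon
        elif len(rhs) == 2 and rhs[1] in variables:
            delta.setdefault((lhs, rhs[0]), set()).add(rhs[1])   # A -> aB
        elif len(rhs) == 1 and rhs not in variables:
            delta.setdefault((lhs, rhs), set()).add(FINAL)       # A -> a
        else:
            raise ValueError(f"unsupported rule {lhs} -> {rhs}")
    return delta, finals


def accepts(delta, finals, start, w):
    """Simulate the NFA by tracking the set of reachable states."""
    current = {start}
    for ch in w:
        current = set().union(*(delta.get((q, ch), set()) for q in current))
    return bool(current & finals)


rules = [("S", "aS"), ("S", "bS"), ("S", "a")]
delta, finals = grammar_to_nfa(rules, {"S"})
print(accepts(delta, finals, "S", "bba"))   # True
print(accepts(delta, finals, "S", "ab"))    # False
```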

What are the rules for defining a Context-Free Grammar (CFG)? Provide an example of a language generated by a CFG that is not Regular.

Rules for Defining Context-Free Grammar (Type 2):
A CFG is a 4-tuple $G = (V, T, P, S)$ , where the primary restriction lies in its production rules $P$ .

Every production rule must be of the exact form: $A \rightarrow \alpha$
Left side ( $A$ ): Must be exactly one non-terminal variable ( $A \in V$ ).
Right side ( $\alpha$ ): Can be any string of variables and terminals ( $\alpha \in (V \cup T)^*$ ), including the empty string $\epsilon$ .
The substitution of $A$ by $\alpha$ depends only on the presence of $A$ , regardless of its surrounding context.

Example of a Non-Regular CFG Language:
The language $L = \{ a^n b^n | n \ge 1 \}$ is Context-Free but not Regular. Recognizing it requires counting the 'a's to ensure that an equal number of 'b's follow. A finite automaton has only finitely many states and cannot count without bound, whereas a PDA (the automaton equivalent to CFGs) can do so using its stack.
Grammar for $L$ :
$S \rightarrow aSb | ab$
Here, the single non-terminal $S$ on the left side satisfies the CFG constraint, and the nested structure allows it to generate symmetrical strings.
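
The stack-based counting that a PDA performs for this language can be imitated with an explicit list. The sketch below is not a formal PDA; the function name `accepts_anbn` is hypothetical, and the flag `seen_b` stands in for the control state that records whether the 'b' phase has started.

```python
def accepts_anbn(w):
    """Accept exactly the strings a^n b^n with n >= 1, PDA-style.

    Push one marker per 'a'; pop one per 'b'; accept when the input
    ends with an empty stack after at least one 'b'.
    """
    stack = []
    seen_b = False
    for ch in w:
        if ch == "a":
            if seen_b:
                return False        # an 'a' after a 'b' breaks the pattern
            stack.append("A")
        elif ch == "b":
            if not stack:
                return False        # more b's than a's
            stack.pop()
            seen_b = True
        else:
            return False            # symbol outside the alphabet
    return seen_b and not stack


for w in ["ab", "aabb", "aab", "abab", ""]:
    print(repr(w), accepts_anbn(w))
```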

Construct a Regular Grammar to generate the language $L$ consisting of strings over $\{a, b\}$ ending with 'abb'.

Step 1: Understand the Language and construct a DFA/NFA
The regular expression for strings ending in 'abb' is: $(a+b)^* abb$
Let's construct an NFA for this RE:

State $S$ (Start): Loop on 'a' and 'b'. Transition to $A$ on 'a' to start the "abb" sequence.
State $A$ : Transition to $B$ on 'b'.
State $B$ : Transition to $C$ (Final state) on 'b'.
Transitions:
$\delta(S, a) = \{S, A\}$
$\delta(S, b) = \{S\}$
$\delta(A, b) = \{B\}$
$\delta(B, b) = \{C\}$

Step 2: Convert NFA to Regular Grammar
Map states to Non-terminals: $S, A, B, C$ . Make $C$ the final state.
Productions based on transitions:
$S \rightarrow aS | bS | aA$
$A \rightarrow bB$
$B \rightarrow bC$
$C \rightarrow \epsilon$
(Alternatively, instead of $B \rightarrow bC$ and $C \rightarrow \epsilon$ , write $B \rightarrow b$ )

Final Right Linear Grammar:
$S \rightarrow aS | bS | aA$
$A \rightarrow bB$
$B \rightarrow b$
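
One way to gain confidence in this grammar is to compare it with the plain-language specification on every short string. The sketch below assumes the rules are stored as a dictionary of rule bodies, each a terminal optionally followed by one variable; the names `derives` and `check` and the length bound are choices made here.

```python
from itertools import product

RULES = {"S": ["aS", "bS", "aA"], "A": ["bB"], "B": ["b"]}


def derives(var, w):
    """True if the variable `var` derives the terminal string w."""
    for rhs in RULES[var]:
        if not w or rhs[0] != w[0]:
            continue                      # the rule's terminal must match the next symbol
        if len(rhs) == 1:
            if len(w) == 1:
                return True               # rule X -> t consumes the final symbol
        elif derives(rhs[1], w[1:]):
            return True                   # rule X -> tY: continue from Y
    return False


def check(max_len):
    """Compare the grammar with 'ends with abb' on all strings up to max_len."""
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            w = "".join(chars)
            if derives("S", w) != w.endswith("abb"):
                return False
    return True


print(check(8))   # True
```

Agreement on all strings up to length 8 does not prove the grammar correct, but it would expose a missing or extra production quickly.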
