Unit 2 - Notes

CSE322

Unit 2: REGULAR EXPRESSIONS AND REGULAR SETS

1. Regular Expressions (RE)

A Regular Expression is an algebraic notation used to describe a Regular Language. It serves as a declarative way to define the set of strings accepted by a Finite Automaton (FA).

1.1 Recursive Definition

For a finite alphabet $\Sigma$ :

$\emptyset$ is a regular expression denoting the empty set.
$\epsilon$ (epsilon) is a regular expression denoting the set $\{\epsilon\}$ (language containing only the empty string).
$a$ is a regular expression denoting the set $\{a\}$ , for all $a \in \Sigma$ .
If $R$ and $S$ are regular expressions denoting languages $L(R)$ and $L(S)$ , then:
- $R + S$ (Union) is an RE denoting $L(R) \cup L(S)$ .
- $R \cdot S$ (Concatenation) is an RE denoting $L(R) \cdot L(S)$ (often written as $RS$ ).
- $R^*$ (Kleene Closure) is an RE denoting $(L(R))^*$ .

1.2 Operator Precedence

To avoid excessive parentheses, the precedence order (highest to lowest) is:

Kleene Star ( $*$ )
Concatenation ( $\cdot$ )
Union ( $+$ or $|$ )
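
As a quick worked illustration (the expression is chosen here only as an example), the precedence rules group an unparenthesized expression as follows:

$a + bc^* = a + (b \cdot (c^*))$

This differs from $(a + b)c^*$ and from $(a + bc)^*$ . For instance, the string $bcc$ belongs to $L(a + bc^*)$ , while $ac$ belongs to $L((a + b)c^*)$ but not to $L(a + bc^*)$ .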

1.3 Identities for Regular Expressions

Two regular expressions $P$ and $Q$ are equivalent ( $P = Q$ ) if they represent the same language.

Basic Identities:

$\emptyset + R = R$
$\emptyset R = R \emptyset = \emptyset$
$\epsilon R = R \epsilon = R$
$\epsilon^* = \epsilon$
$\emptyset^* = \epsilon$
$R + R = R$ (Idempotent Law)
$R^* R^* = R^*$
$R R^* = R^* R = R^+$
$(R^*)^* = R^*$
$R^* = \epsilon + R + R^2 + \dots$

Distributive Laws:

$P(Q + R) = PQ + PR$
$(P + Q)R = PR + QR$

Important Identity (Shifting Rule):

$(PQ)^* P = P(QP)^*$

Identities involving Union and Closure:

$(P + Q)^* = (P^* Q^*)^* = (P^* + Q^*)^*$
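
An identity can be sanity-checked (not proved) by comparing both sides on every string up to some bounded length. The following is a minimal sketch using Python's `re` module, where `|` plays the role of $+$ ; the helper name `agree_up_to` and the bound `max_len` are assumptions made for this illustration.

```python
import re
from itertools import product

def agree_up_to(pattern_1, pattern_2, alphabet="ab", max_len=6):
    """Return True if both patterns accept exactly the same strings of
    length <= max_len over `alphabet` (a bounded check, not a proof)."""
    r1, r2 = re.compile(pattern_1), re.compile(pattern_2)
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if bool(r1.fullmatch(w)) != bool(r2.fullmatch(w)):
                return False
    return True

# (P + Q)* = (P* Q*)* = (P* + Q*)*  with P = a, Q = b
print(agree_up_to(r"(a|b)*", r"(a*b*)*"))
print(agree_up_to(r"(a|b)*", r"(a*|b*)*"))
# Shifting rule (PQ)* P = P (QP)*  with P = a, Q = b
print(agree_up_to(r"(ab)*a", r"a(ba)*"))
# A non-identity for contrast: (a + b)* is not the same as a* + b*
print(agree_up_to(r"(a|b)*", r"a*|b*"))
```

The first three calls should report agreement, and the last should not, because the string $ab$ is in $L((a + b)^*)$ but not in $L(a^* + b^*)$ . Agreement up to a bound only suggests an identity; a proof still needs the algebraic laws or an automaton-based equivalence test.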

2. Finite Automata and Regular Expressions

Regular expressions and Finite Automata are equivalent in power.

For every Regular Expression, there exists an FA (NFA- $\epsilon$ ) that accepts the language it denotes.
For every FA, there exists a Regular Expression that describes the language accepted by it.

2.1 Transition Systems Containing Null Moves (NFA- $\epsilon$ )

An NFA with $\epsilon$ -moves allows the automaton to transition from one state to another without consuming any input symbol.

Formal Definition:
An NFA- $\epsilon$ is a 5-tuple $M = (Q, \Sigma, \delta, q_0, F)$ where:

$Q$ : Finite set of states.
$\Sigma$ : Finite set of input symbols.
$q_0$ : Initial state.
$F$ : Set of final states.
$\delta$ : Transition function defined as $\delta : Q \times (\Sigma \cup \{\epsilon\}) \rightarrow 2^Q$ .

2.2 $\epsilon$ -Closure

The $\epsilon$ -Closure of a state $q$ , denoted as $\epsilon\text{-closure}(q)$ , is the set of all states reachable from $q$ using only $\epsilon$ -transitions (including $q$ itself).

Algorithm to find $\epsilon$ -closure(q):

1. Start with set $S = \{q\}$ .
2. For every state $p$ in $S$ and every state $r \in \delta(p, \epsilon)$ , add $r$ to $S$ .
3. Repeat step 2 until no new states can be added.
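
The algorithm above can be sketched in Python as a simple worklist search. The encoding is an assumption of this sketch: `delta` is a dictionary from `(state, symbol)` pairs to sets of states, and the empty string `""` stands for $\epsilon$ .

```python
def epsilon_closure(state, delta):
    """States reachable from `state` using only epsilon-moves (including itself).

    `delta` maps (state, symbol) -> set of states; "" encodes epsilon
    (an encoding chosen for this sketch).
    """
    closure = {state}          # step 1: S = {q}
    stack = [state]            # states whose epsilon-moves are not yet explored
    while stack:               # step 3: repeat until nothing new is added
        p = stack.pop()
        for r in delta.get((p, ""), set()):   # step 2: each r in delta(p, epsilon)
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

# Hypothetical example: q0 --eps--> q1 --eps--> q2, and q2 loops on 'a'
delta = {
    ("q0", ""): {"q1"},
    ("q1", ""): {"q2"},
    ("q2", "a"): {"q2"},
}
print(sorted(epsilon_closure("q0", delta)))  # ['q0', 'q1', 'q2']
```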

2.3 Conversion of NFA- $\epsilon$ to DFA

To convert an NFA with null moves to a Deterministic Finite Automaton (DFA), we use the Subset Construction Algorithm modified for $\epsilon$ -closures.

Steps:

Start State: The start state of the DFA is $A = \epsilon\text{-closure}(q_0)$ .
Transitions: For a set of states $A$ (which is a single state in the resulting DFA) and input symbol $a \in \Sigma$ , calculate the next state:
$\delta_{DFA}(A, a) = \epsilon\text{-closure}\left( \bigcup_{q \in A} \delta_{NFA}(q, a) \right)$
In words: from every state in $A$ , find where input $a$ leads, then take the $\epsilon$ -closure of the combined result.
Final States: Any set in the DFA containing at least one final state from the NFA is a final state in the DFA.
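
The steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the dictionary encoding of $\delta$ , and the use of `""` for $\epsilon$ are all assumptions of the sketch. DFA states are represented as frozensets of NFA states, and the empty frozenset acts as the dead state.

```python
from collections import deque

EPS = ""  # encoding of epsilon chosen for this sketch

def eclose(states, delta):
    """Epsilon-closure of a set of NFA states."""
    closure, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for r in delta.get((p, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def nfa_eps_to_dfa(delta, start, finals, alphabet):
    """Return (dfa_delta, dfa_start, dfa_finals) for the reachable subsets."""
    dfa_start = eclose({start}, delta)
    dfa_delta, seen, queue = {}, {dfa_start}, deque([dfa_start])
    while queue:
        A = queue.popleft()
        for a in alphabet:
            moved = set()
            for q in A:                      # union of delta_NFA(q, a) over q in A
                moved |= delta.get((q, a), set())
            B = eclose(moved, delta)         # then take the epsilon-closure
            dfa_delta[(A, a)] = B
            if B not in seen:
                seen.add(B)
                queue.append(B)
    dfa_finals = {S for S in seen if S & set(finals)}
    return dfa_delta, dfa_start, dfa_finals

def dfa_accepts(dfa, word):
    dfa_delta, state, dfa_finals = dfa
    for a in word:
        state = dfa_delta[(state, a)]
    return state in dfa_finals

# Hypothetical NFA-epsilon for a*b*: q0 loops on a, epsilon-move to q1, q1 loops on b
nfa_delta = {("q0", "a"): {"q0"}, ("q0", EPS): {"q1"}, ("q1", "b"): {"q1"}}
dfa = nfa_eps_to_dfa(nfa_delta, "q0", {"q1"}, "ab")
print(dfa_accepts(dfa, "aabb"), dfa_accepts(dfa, "aba"))  # True False
```

Only subsets reachable from the start state are generated, which in practice is often far fewer than all $2^{|Q|}$ subsets.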

3. Algebraic Methods using Arden's Theorem

Arden's Theorem is a tool for finding the Regular Expression corresponding to a Finite Automaton by solving a system of linear equations over regular expressions.

3.1 Arden's Theorem Statement

Let $P$ and $Q$ be two regular expressions over $\Sigma$ .
If $L(P)$ does not contain $\epsilon$ (the null string), then the equation:
$R = Q + RP$
has a unique solution given by:
$R = QP^*$

3.2 Application of Arden's Theorem (FA to RE)

Method:

Write an equation for every state $q_i$ in the FA (the FA must have no $\epsilon$ -moves).
$q_i = \sum (\text{incoming state} \cdot \text{transition symbol})$
Note: If $q_i$ is the start state, add $\epsilon$ to the equation.
Solve the system of equations for the Final State(s) using substitution and Arden's Theorem.
If there are multiple final states, the resulting RE is the sum (Union) of the expressions derived for each final state.
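
A short worked example may help. The automaton here is hypothetical, chosen only to illustrate the method: start state $q_1$ , final state $q_2$ , with transitions $q_1 \xrightarrow{a} q_1$ , $q_1 \xrightarrow{b} q_2$ , and $q_2 \xrightarrow{b} q_2$ .

State equations (each state is the sum of its incoming transitions, plus $\epsilon$ for the start state):

$q_1 = \epsilon + q_1 a$
$q_2 = q_1 b + q_2 b$

Apply Arden's Theorem to the first equation with $Q = \epsilon$ and $P = a$ :

$q_1 = \epsilon a^* = a^*$

Substitute into the second equation and apply the theorem again with $Q = a^* b$ and $P = b$ :

$q_2 = a^* b + q_2 b = a^* b b^* = a^* b^+$

Since $q_2$ is the only final state, the regular expression for this automaton is $a^* b b^*$ .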

4. Construction of Finite Automata Equivalent to a Regular Expression

We typically use Thompson’s Construction Method to convert a Regular Expression into an NFA- $\epsilon$ . This method is structural and builds the FA inductively.

Basic Building Blocks

For RE = $a$ :
[q0] --a--> ((qf))
For RE = $R + S$ (Union):
Create a new start state and new final state. Add $\epsilon$ -transitions from the new start to the start of $R$ and $S$ . Add $\epsilon$ -transitions from the final states of $R$ and $S$ to the new final state.
For RE = $R \cdot S$ (Concatenation):
Merge the final state of $R$ with the start state of $S$ (or add an $\epsilon$ -transition between them).
For RE = $R^*$ (Iteration):
- Add new start state ( $S_{new}$ ) and new final state ( $F_{new}$ ).
- Transition $S_{new} \to \text{Start}(R)$ via $\epsilon$ .
- Transition $\text{Final}(R) \to F_{new}$ via $\epsilon$ .
- Loop back: $\text{Final}(R) \to \text{Start}(R)$ via $\epsilon$ .
- Skip: $S_{new} \to F_{new}$ via $\epsilon$ .
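
The building blocks above can be sketched as small Python functions that each return a fragment `(start, final, delta)`. Everything here is an assumption of the sketch: the function names, the integer state ids, and the use of `""` for $\epsilon$ . Concatenation uses the $\epsilon$ -transition variant rather than merging states, and the expression is built by calling the functions directly instead of parsing a string.

```python
from itertools import count

EPS = ""            # encoding of epsilon chosen for this sketch
_ids = count()      # fresh integer state ids

def _new():
    return next(_ids)

def _merge(*deltas):
    """Copy several transition tables into one new table."""
    out = {}
    for d in deltas:
        for key, targets in d.items():
            out.setdefault(key, set()).update(targets)
    return out

def _eps(delta, src, dst):
    delta.setdefault((src, EPS), set()).add(dst)

def symbol(a):
    s, f = _new(), _new()
    return s, f, {(s, a): {f}}

def union(r, s):
    (rs, rf, rd), (ss, sf, sd) = r, s
    start, final = _new(), _new()
    delta = _merge(rd, sd)
    _eps(delta, start, rs); _eps(delta, start, ss)   # new start -> both starts
    _eps(delta, rf, final); _eps(delta, sf, final)   # both finals -> new final
    return start, final, delta

def concat(r, s):
    (rs, rf, rd), (ss, sf, sd) = r, s
    delta = _merge(rd, sd)
    _eps(delta, rf, ss)                              # Final(R) -> Start(S)
    return rs, sf, delta

def star(r):
    rs, rf, rd = r
    start, final = _new(), _new()
    delta = _merge(rd)
    _eps(delta, start, rs)       # S_new -> Start(R)
    _eps(delta, rf, final)       # Final(R) -> F_new
    _eps(delta, rf, rs)          # loop back
    _eps(delta, start, final)    # skip
    return start, final, delta

def accepts(nfa, word):
    """Simulate the NFA-epsilon on `word` using epsilon-closures."""
    start, final, delta = nfa
    def close(states):
        closure, stack = set(states), list(states)
        while stack:
            p = stack.pop()
            for r in delta.get((p, EPS), ()):
                if r not in closure:
                    closure.add(r)
                    stack.append(r)
        return closure
    current = close({start})
    for a in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, a), set())
        current = close(moved)
    return final in current

# (a + b)* a
nfa = concat(star(union(symbol("a"), symbol("b"))), symbol("a"))
print(accepts(nfa, "bba"), accepts(nfa, "ab"), accepts(nfa, ""))  # True False False
```

Each fragment should be used only once in a larger expression, since reusing one would share its state ids between two copies.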

5. Equivalence of Two Finite Automata and Two Regular Expressions

5.1 Equivalence of Two Regular Expressions

Two REs, $R_1$ and $R_2$ , are equivalent iff $L(R_1) = L(R_2)$ .
Testing Method:

Convert $R_1$ to DFA $M_1$ .
Convert $R_2$ to DFA $M_2$ .
Minimize both $M_1$ and $M_2$ .
Check if the minimized DFAs are Isomorphic (identical structure except for state names).

5.2 Equivalence of Two Finite Automata

Two FAs, $M_1$ and $M_2$ , are equivalent if they accept the same language.
Method 1: Product Construction (Difference Method)
Construct a new machine accepting $(L(M_1) \cap \overline{L(M_2)}) \cup (\overline{L(M_1)} \cap L(M_2))$ . If this language is empty, the FAs are equivalent.
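
One way to carry out this emptiness test without building the whole product machine is to explore only the reachable pairs of states. The sketch below assumes both machines are complete DFAs over the same alphabet, each given as `(delta, start, finals)` with `delta` mapping `(state, symbol)` to a single state; the function name and the encoding are assumptions of this illustration.

```python
from collections import deque

def dfa_equivalent(d1, d2, alphabet):
    """Return True if the two complete DFAs accept the same language.

    A reachable pair (p, q) in which exactly one component is final
    corresponds to a string in the symmetric difference of the languages.
    """
    (t1, s1, f1), (t2, s2, f2) = d1, d2
    seen, queue = {(s1, s2)}, deque([(s1, s2)])
    while queue:
        p, q = queue.popleft()
        if (p in f1) != (q in f2):
            return False
        for a in alphabet:
            nxt = (t1[(p, a)], t2[(q, a)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

# Hypothetical examples: "even number of a's" with 2 states and with 4 states
even = ({("E", "a"): "O", ("O", "a"): "E", ("E", "b"): "E", ("O", "b"): "O"},
        "E", {"E"})
mod4 = ({**{(i, "a"): (i + 1) % 4 for i in range(4)},
         **{(i, "b"): i for i in range(4)}},
        0, {0, 2})
odd = (even[0], "E", {"O"})

print(dfa_equivalent(even, mod4, "ab"))  # True
print(dfa_equivalent(even, odd, "ab"))   # False
```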

Method 2: Equivalence Algorithm

Treat $M_1$ and $M_2$ (both DFAs over the same alphabet) as a single disconnected graph.
Apply the DFA minimization algorithm.
If the start state of $M_1$ and the start state of $M_2$ end up in the same equivalence class, the machines are equivalent.

6. Closure Properties of Regular Sets

If a class of languages is closed under an operation, applying that operation to languages in the class results in a language that is also in the class. Regular languages are closed under:

Union: If $L_1$ and $L_2$ are regular, $L_1 \cup L_2$ is regular. ( $R_1 + R_2$ )
Intersection: If $L_1$ and $L_2$ are regular, $L_1 \cap L_2$ is regular. (Proof via De Morgan's laws or Product Automata).
Complement: If $L$ is regular, $\Sigma^* - L$ is regular. (Flip final and non-final states in a complete DFA, i.e. one with a transition defined for every state and symbol.)
Concatenation: If $L_1$ and $L_2$ are regular, $L_1 \cdot L_2$ is regular.
Kleene Closure: If $L$ is regular, $L^*$ is regular.
Difference: $L_1 - L_2 = L_1 \cap \overline{L_2}$ . Since regular sets are closed under intersection and complement, they are closed under difference.
Reversal: If $L$ is regular, $L^R$ is regular.
Homomorphism and Inverse Homomorphism: If $L$ is regular and $h$ is a homomorphism, then $h(L)$ and $h^{-1}(L)$ are regular.
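
Three of these closure constructions (complement, intersection, and difference) can be sketched directly on complete DFAs. The tuple layout `(states, delta, start, finals)` and the function names are assumptions of this sketch, and the two example machines are hypothetical.

```python
def complement(dfa):
    """Flip final and non-final states of a complete DFA."""
    states, delta, start, finals = dfa
    return states, delta, start, states - finals

def intersection(d1, d2, alphabet):
    """Product automaton: a pair is final when both components are final."""
    (Q1, t1, s1, F1), (Q2, t2, s2, F2) = d1, d2
    states = {(p, q) for p in Q1 for q in Q2}
    delta = {((p, q), a): (t1[(p, a)], t2[(q, a)])
             for (p, q) in states for a in alphabet}
    finals = {(p, q) for (p, q) in states if p in F1 and q in F2}
    return states, delta, (s1, s2), finals

def difference(d1, d2, alphabet):
    """L1 - L2 = L1 intersected with the complement of L2."""
    return intersection(d1, complement(d2), alphabet)

def accepts(dfa, word):
    _, delta, state, finals = dfa
    for a in word:
        state = delta[(state, a)]
    return state in finals

# L1: even number of a's.  L2: strings ending in b.
even = ({"E", "O"},
        {("E", "a"): "O", ("O", "a"): "E", ("E", "b"): "E", ("O", "b"): "O"},
        "E", {"E"})
ends_b = ({"N", "B"},
          {("N", "a"): "N", ("N", "b"): "B", ("B", "a"): "N", ("B", "b"): "B"},
          "N", {"B"})

both = intersection(even, ends_b, "ab")
print(accepts(both, "aab"), accepts(both, "ab"))                  # True False
print(accepts(complement(even), "a"))                             # True
print(accepts(difference(even, ends_b, "ab"), "aa"))              # True
```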

7. Pumping Lemma for Regular Sets

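The Pumping Lemma is a tool used primarily to prove that a language is NOT regular. It describes a property that all regular languages must possess. The property is necessary but not sufficient: some non-regular languages also satisfy it, so the lemma cannot be used to prove that a language is regular.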

7.1 Formal Statement

Let $L$ be a regular language. There exists a constant $p$ (the pumping length) such that for any string $w \in L$ with length $|w| \ge p$ , we can split $w$ into three parts, $w = xyz$ , satisfying the following conditions:

$|xy| \le p$
$|y| > 0$ (y is not empty)
For all $i \ge 0$ , the string $xy^iz \in L$ . (We can "pump" $y$ any number of times).

7.2 Steps to Prove Non-Regularity

To prove $L$ is not regular (Proof by Contradiction):

Assume $L$ is regular. Let $p$ be the pumping length.
Choose a string $w \in L$ such that $|w| \ge p$ . (Choose $w$ strategically to create a contradiction).
According to the lemma, $w$ can be split into $xyz$ .
Analyze cases for the split $xyz$ based on $|xy| \le p$ and $|y| > 0$ .
Find an $i$ such that $xy^iz \notin L$ .
This contradicts the Pumping Lemma. Therefore, the assumption is false, and $L$ is not regular.

Common Example: $L = \{ a^n b^n \mid n \ge 0 \}$ is not regular.
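
The steps of 7.2 applied to this example give the following proof sketch:

Assume $L = \{ a^n b^n \mid n \ge 0 \}$ is regular, with pumping length $p$ .
Choose $w = a^p b^p$ , so $w \in L$ and $|w| = 2p \ge p$ .
For any split $w = xyz$ with $|xy| \le p$ and $|y| > 0$ , both $x$ and $y$ lie within the first $p$ symbols, so $x = a^s$ , $y = a^t$ with $t \ge 1$ , and $z = a^{p-s-t} b^p$ .
Take $i = 2$ :

$xy^2z = a^{s} a^{2t} a^{p-s-t} b^p = a^{p+t} b^p$

Since $t \ge 1$ , we have $p + t \ne p$ , so $xy^2z \notin L$ . This contradicts condition 3 of the lemma, so $L$ is not regular.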

8. Myhill-Nerode Theorem

The Myhill-Nerode theorem provides a necessary and sufficient condition for a language to be regular. It characterizes regular languages based on the number of equivalence classes of a specific relation.

8.1 Indistinguishability

Two strings $x$ and $y$ are distinguishable with respect to language $L$ if there exists a string $z$ such that exactly one of $xz$ or $yz$ is in $L$ . If no such $z$ exists, $x$ and $y$ are indistinguishable (equivalent), denoted $x \equiv_L y$ .

8.2 Theorem Statement

The following statements are equivalent:

$L$ is a regular language.
$L$ is the union of some of the equivalence classes of a right-invariant equivalence relation of finite index.
The relation $\equiv_L$ has a finite number of equivalence classes (finite index).

8.3 Applications

Minimization of DFA: The number of states in the minimal DFA for $L$ is exactly the number of equivalence classes of $\equiv_L$ .
Proving Non-Regularity: If the relation $\equiv_L$ has infinitely many equivalence classes, then $L$ is not regular.
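
For example, for $L = \{ a^n b^n \mid n \ge 0 \}$ the strings $a^i$ and $a^j$ with $i \ne j$ are distinguished by $z = b^i$ , since $a^i b^i \in L$ but $a^j b^i \notin L$ . So $\equiv_L$ has infinitely many classes and $L$ is not regular. The sketch below searches for such a distinguishing suffix by brute force; the function names and the search bound `max_len` are assumptions of this illustration, and a result of `None` only means that no suffix was found within the bound.

```python
from itertools import product

def distinguishing_suffix(x, y, in_language, alphabet, max_len=6):
    """Return a shortest z (up to max_len) such that exactly one of
    x+z, y+z is in the language, or None if no such z is found."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            z = "".join(letters)
            if in_language(x + z) != in_language(y + z):
                return z
    return None

def is_anbn(w):
    """Membership test for { a^n b^n | n >= 0 }."""
    n = len(w) // 2
    return w == "a" * n + "b" * n

def has_even_length(w):
    return len(w) % 2 == 0

print(distinguishing_suffix("a", "aa", is_anbn, "ab"))          # b
print(distinguishing_suffix("a", "aaa", has_even_length, "a"))  # None
```

In the second call, $a$ and $aaa$ both have odd length, so they fall into the same equivalence class of the (regular) language of even-length strings, which has exactly two classes.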

9. Summary of Conversions (Equivalence Chain)

All of the following models describe exactly the same class of languages, the Regular Languages:

$\text{DFA} \iff \text{NFA} \iff \text{NFA-}\epsilon \iff \text{Regular Expression}$

RE $\to$ NFA- $\epsilon$ : Thompson's Construction.
NFA- $\epsilon$ $\to$ NFA/DFA: Subset Construction with $\epsilon$ -closure.
DFA $\to$ RE: Arden's Theorem or State Elimination Method.
Minimization: Myhill-Nerode or Table Filling Algorithm.
