Unit 5 - Notes

CSE322

Unit 5: PUSHDOWN AUTOMATA AND PARSING

1. Description and Model of Pushdown Automata (PDA)

A Pushdown Automaton (PDA) is essentially a Finite Automaton (FA) equipped with an auxiliary memory storage called a Stack. The stack operates on a Last-In-First-Out (LIFO) basis.

The Need for PDA

Finite Automata cannot handle languages that require unbounded memory (counting), such as L = { a^n b^n | n ≥ 1 }. A PDA uses the stack to "remember" the number of a's by pushing them onto the stack and matching them with b's by popping.

The Model

A PDA consists of three main components:

  1. Input Tape: Holds the input string. The input head reads one symbol at a time and moves right.
  2. Finite Control: The "brain" of the automaton, similar to a DFA/NFA, which is in a specific state at any given time.
  3. Stack: An infinite tape that can hold symbols. The machine can:
    • Push: Add a symbol to the top.
    • Pop: Remove a symbol from the top.
    • Read: Only the top symbol of the stack is accessible.

2. Representation of PDA

Formally, a PDA is defined as a 7-tuple:

P = (Q, Σ, Γ, δ, q0, Z0, F)

Where:

  • Q: A finite set of states.
  • Σ: A finite set of input symbols (Input Alphabet).
  • Γ: A finite set of stack symbols (Stack Alphabet). Note that Σ and Γ can overlap.
  • q0: The initial state (q0 ∈ Q).
  • Z0: The initial stack symbol (Z0 ∈ Γ), which is on the stack at the start.
  • F: The set of final/accepting states (F ⊆ Q).
  • δ: The transition function, defined as:

    δ : Q × (Σ ∪ {ε}) × Γ → finite subsets of Q × Γ*

    Interpretation: δ(q, a, Z) = (p, α) means:

    • Current state is q.
    • Input symbol read is a (or ε for an empty move).
    • Top of stack is Z.
    • The machine moves to state p and replaces Z with the string α on the stack.
      • If α = ε, it is a Pop operation.
      • If α = Z, the stack remains unchanged.
      • If α = YZ, it is a Push operation (pushing Y).

Instantaneous Description (ID)

An ID describes the configuration of the PDA at a specific moment. It is represented as a triplet (q, w, γ), where:

  • q: Current state.
  • w: Remaining input string.
  • γ: Current stack content (leftmost symbol is top of stack).
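As a concrete illustration, here is a minimal Python simulator for L = { a^n b^n | n ≥ 1 }, accepting by final state. The state names (q0, q1, qf), stack symbols (A, Z) and transition encoding are illustrative choices, not part of any standard; the optional trace prints each ID as a (state, remaining input, stack) triple.

```python
# delta[(state, input_symbol, stack_top)] = (next_state, push_string)
# push_string replaces the popped stack top; "" means a pure pop,
# and input symbol "" encodes an epsilon-move.
DELTA = {
    ("q0", "a", "Z"): ("q0", "AZ"),   # first a: push A above Z0
    ("q0", "a", "A"): ("q0", "AA"),   # further a's: push another A
    ("q0", "b", "A"): ("q1", ""),     # switch to matching phase, pop one A
    ("q1", "b", "A"): ("q1", ""),     # each b pops one A
    ("q1", "", "Z"): ("qf", "Z"),     # bare Z0 left: epsilon-move to final state
}

def run(word, trace=False):
    state, stack, i = "q0", "Z", 0    # stack stored as a string, index 0 = top
    while True:
        if trace:
            print((state, word[i:], stack))  # the current ID (q, w, gamma)
        if i < len(word) and (state, word[i], stack[0]) in DELTA:
            state, push = DELTA[(state, word[i], stack[0])]
            stack = push + stack[1:]
            i += 1
        elif (state, "", stack[0]) in DELTA:  # try an epsilon-move
            state, push = DELTA[(state, "", stack[0])]
            stack = push + stack[1:]
        else:
            break
    return i == len(word) and state == "qf"   # acceptance by final state
```

Running `run("aabb", trace=True)` prints the sequence of IDs from ("q0", "aabb", "Z") down to the accepting configuration.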

3. Acceptance by PDA

A PDA can define a language in two distinct ways. Both methods are equivalent in power.

A. Acceptance by Final State

The PDA accepts a string w if, after reading the entire input, the machine enters a state belonging to F. The stack content does not matter.

B. Acceptance by Empty Stack

The PDA accepts a string w if, after reading the entire input, the stack becomes empty. The final state of the control does not matter.

Note: If a language is accepted by empty stack, there exists a PDA that accepts it by final state, and vice versa.

4. Pushdown Automata: NDPDA and DPDA

Non-Deterministic PDA (NDPDA)

In an NDPDA, for a given state, input symbol, and stack symbol, there may be multiple possible moves (transitions).

  • Characteristic: The transition function maps each (state, input symbol, stack symbol) triple to a finite subset of Q × Γ* (a set of possible moves).
  • Power: NDPDAs accept the full class of Context-Free Languages (CFLs).

Deterministic PDA (DPDA)

A DPDA has at most one legal move for any configuration.

  • Conditions for Determinism:
    1. |δ(q, a, Z)| ≤ 1 for all q ∈ Q, a ∈ Σ ∪ {ε}, Z ∈ Γ.
    2. If δ(q, ε, Z) is defined (non-empty), then δ(q, a, Z) must be empty for all a ∈ Σ. (No conflict between reading an input and making an ε-move.)
  • Power: DPDAs accept Deterministic Context-Free Languages (DCFLs).
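The two conditions can be checked mechanically. A small sketch, assuming the transition relation is encoded as a Python dict mapping (state, symbol, stack-top) triples to sets of moves, with "" standing for ε; this encoding is an illustrative assumption:

```python
def is_deterministic(delta, states, sigma, gamma):
    """Check the two DPDA conditions on a nondeterministic transition map."""
    for q in states:
        for Z in gamma:
            eps_moves = delta.get((q, "", Z), set())
            for a in [""] + sorted(sigma):
                # Condition 1: at most one move for any (q, a, Z)
                if len(delta.get((q, a, Z), set())) > 1:
                    return False
                # Condition 2: an epsilon-move on (q, Z) rules out
                # any input move on (q, a, Z)
                if a and eps_moves and delta.get((q, a, Z)):
                    return False
    return True
```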

Comparison: Deterministic vs. Non-Deterministic

Feature            | Deterministic PDA (DPDA)                  | Non-Deterministic PDA (NDPDA)
Moves              | At most one move for any configuration.   | Multiple choices permitted.
Language Class     | Accepts DCFLs.                            | Accepts all CFLs.
Ambiguity          | Handles unambiguous grammars only.        | Can handle ambiguous grammars.
Power              | Less powerful than NDPDA.                 | More powerful than DPDA.
Equivalence to FA  | DPDA ⊊ NDPDA.                             | NFA ≡ DFA, but this does not hold for PDAs.

Crucial Note: Unlike Finite Automata where DFA and NFA are equivalent, NDPDAs are strictly more powerful than DPDAs.

5. Context-Free Languages and PDA

The Fundamental Theorem

There is a strong equivalence between Context-Free Grammars (CFG) and Pushdown Automata:

  1. For every Context-Free Language L (generated by a CFG), there exists an NDPDA M that accepts L.
  2. For every PDA M, the language L(M) accepted by M is a Context-Free Language.

Equivalence Construction

1. CFG to PDA

Given a CFG G = (V, T, P, S), we construct a PDA that simulates the leftmost derivations of G.

  • Start: Push the start symbol S onto the stack.
  • Transitions:
    • If the stack top is a Variable A: Pop A and push the right-hand side α of some production A → α (chosen non-deterministically).
    • If the stack top is a Terminal a: Read the next input symbol. If they match, pop a. If not, crash (reject).
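This construction can be sketched directly: the stack holds the part of the sentential form not yet matched, a variable on top is expanded non-deterministically, and a terminal on top must match the input. The grammar encoding (dict of variable → list of bodies, "" for ε, one-character symbols) is an illustrative assumption, and note that this naive search loops forever on left-recursive grammars:

```python
GRAMMAR = {"S": ["aSb", ""]}   # S -> aSb | epsilon, generating a^n b^n

def accepts(word, grammar=GRAMMAR, start="S"):
    """Simulate the CFG->PDA construction by searching over configurations
    (input position, stack), where the stack is a string with index 0 = top."""
    def search(i, stack):
        if not stack:
            return i == len(word)          # empty stack and input consumed
        top, rest = stack[0], stack[1:]
        if top in grammar:                 # variable on top: expand a production
            return any(search(i, body + rest) for body in grammar[top])
        # terminal on top: must match the current input symbol
        return i < len(word) and word[i] == top and search(i + 1, rest)
    return search(0, start)
```

The `any(...)` over production bodies is exactly the PDA's non-determinism, realized here by backtracking.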

2. PDA to CFG

Given a PDA, we can construct a grammar whose variables represent moving between states while popping specific symbols from the stack. The variables are usually of the form [p, Z, q], representing a computation starting in state p with Z on top of the stack and ending in state q with Z popped.

6. Closure Properties of CFLs

Context-Free Languages are closed under:

  1. Union: If L1 and L2 are CFLs, L1 ∪ L2 is a CFL.
  2. Concatenation: If L1 and L2 are CFLs, L1L2 is a CFL.
  3. Kleene Star: If L is a CFL, L* is a CFL.
  4. Reversal: If L is a CFL, L^R is a CFL.
  5. Homomorphism and Inverse Homomorphism.
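The positive closure constructions are simple grammar surgeries: introduce a fresh start symbol and one or two new productions. A sketch, assuming grammars are encoded as dicts of one-character variables to lists of body strings ("" for ε), that the two grammars use disjoint variable names, and that "U" is a hypothetical fresh variable:

```python
def union(g1, s1, g2, s2, new="U"):
    """L1 union L2:  U -> S1 | S2."""
    g = {**g1, **g2}
    g[new] = [s1, s2]
    return g, new

def concat(g1, s1, g2, s2, new="U"):
    """L1 L2:  U -> S1 S2."""
    g = {**g1, **g2}
    g[new] = [s1 + s2]
    return g, new

def star(g1, s1, new="U"):
    """L1*:  U -> S1 U | epsilon."""
    g = dict(g1)
    g[new] = [s1 + new, ""]
    return g, new
```

The analogous trick fails for intersection: there is no way to force two grammars to derive the same string, which is why CFLs are not closed under it.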

Context-Free Languages are NOT closed under:

  1. Intersection: If L1 and L2 are CFLs, L1 ∩ L2 is not necessarily a CFL.
    • Example: L1 = { a^n b^n c^m | n, m ≥ 0 } and L2 = { a^m b^n c^n | n, m ≥ 0 } are CFLs, but L1 ∩ L2 = { a^n b^n c^n | n ≥ 0 } is not.
  2. Complementation: If L is a CFL, its complement is not necessarily a CFL. (This follows from De Morgan's laws and non-closure under intersection.)
  3. Difference: L1 − L2 is not necessarily a CFL.

Note: DCFLs (Deterministic CFLs) are closed under Complementation but not under Union or Intersection.


PARSING

7. Introduction to Parsing

Parsing (or Syntax Analysis) is the process of checking whether a given string of symbols can be generated by a specific grammar. It involves constructing a Parse Tree (or Derivation Tree) for the input string.

Goals of Parsing

  • Check syntax validity.
  • Produce a parse tree to be used for semantic analysis and code generation.

8. Classification of Parsing Techniques

Parsing is broadly classified into two categories based on how the parse tree is constructed:

A. Top-Down Parsing

  • Direction: Starts from the Start Symbol (S) and attempts to derive the input string w.
  • Tree Construction: Builds the tree from the Root down to the Leaves.
  • Method: Uses Leftmost Derivation.
  • Challenges:
    • Backtracking: If a chosen production rule turns out to be wrong, the parser must undo steps.
    • Left Recursion: If A → Aα, the parser may enter an infinite loop. This must be eliminated.
    • Left Factoring: Required when multiple productions start with the same symbol (e.g., A → αβ1 | αβ2).
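Eliminating immediate left recursion A → Aα | β follows a fixed recipe: A → βA' and A' → αA' | ε. A sketch, assuming a grammar encoded as a dict of variable → list of body strings with one-character symbols, and a caller-supplied fresh name for A':

```python
def eliminate_immediate_left_recursion(grammar, A, A_prime):
    """Rewrite A -> A a1 | ... | b1 | ... as
    A -> b1 A' | ...  and  A' -> a1 A' | ... | epsilon."""
    rec = [body[1:] for body in grammar[A] if body.startswith(A)]      # the alphas
    non_rec = [body for body in grammar[A] if not body.startswith(A)]  # the betas
    if not rec:
        return grammar                       # nothing to do
    g = dict(grammar)
    g[A] = [beta + A_prime for beta in non_rec]
    g[A_prime] = [alpha + A_prime for alpha in rec] + [""]
    return g
```

On the classic E → E+T | T (with "R" as the hypothetical fresh variable) this yields E → TR and R → +TR | ε.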

B. Bottom-Up Parsing

  • Direction: Starts from the input string w and attempts to reduce it to the Start Symbol (S).
  • Tree Construction: Builds the tree from the Leaves up to the Root.
  • Method: Uses Reverse of Rightmost Derivation.
  • Mechanism: Uses "Shift" (move input to stack) and "Reduce" (replace substring on stack with a non-terminal) operations. This is often called Shift-Reduce Parsing.
  • Handle: The substring currently being reduced is called the "handle".
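The shift/reduce mechanism can be sketched naively: reduce whenever the top of the stack matches some production body, otherwise shift. A real LR parser uses automaton states and lookahead to locate the handle; this greedy version is purely illustrative and only works for simple grammars such as the one assumed here:

```python
PRODUCTIONS = [("S", "aSb"), ("S", "ab")]   # S -> aSb | ab

def shift_reduce(word):
    stack, buf = "", word
    while True:
        reduced = False
        for head, body in PRODUCTIONS:      # try to reduce a handle on top
            if stack.endswith(body):
                stack = stack[: len(stack) - len(body)] + head
                reduced = True
                break
        if reduced:
            continue
        if buf:                             # otherwise shift one input symbol
            stack, buf = stack + buf[0], buf[1:]
        else:
            break
    return stack == "S"                     # reduced all the way to the start symbol
```

The choice "reduce before shift" hard-codes one resolution of the shift-reduce dilemma; grammars where that choice is sometimes wrong are exactly where this greedy scheme breaks down.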

9. LL(k) Grammars and Properties

LL(k) parsers are a type of Deterministic Top-Down parser.

  • First 'L': Scans input from Left to right.
  • Second 'L': Produces a Leftmost derivation.
  • 'k': Uses k symbols of lookahead to make decisions.

LL(1) Grammar

The most common form is LL(1). A grammar is LL(1) if the parsing table has no multiple entries (no conflicts).

Properties/Conditions for LL(1):
For any two distinct productions and :

  1. and must be disjoint sets.
  2. If , then and must be disjoint.
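Both conditions can be tested by computing FIRST sets with the usual fixed-point iteration. A sketch, assuming grammars are dicts of variable → list of body strings ("" for ε, one-character symbols) and that FOLLOW sets are supplied by the caller:

```python
def first_of(grammar, symbols, first):
    """FIRST of a string of symbols, given FIRST sets of the variables."""
    out = set()
    for s in symbols:
        f = first[s] if s in grammar else {s}   # terminal: FIRST is itself
        out |= f - {""}
        if "" not in f:                         # s not nullable: stop here
            return out
    out.add("")                                 # every symbol was nullable
    return out

def compute_first(grammar):
    first = {v: set() for v in grammar}
    changed = True
    while changed:                              # iterate to a fixed point
        changed = False
        for v, bodies in grammar.items():
            for body in bodies:
                f = first_of(grammar, body, first)
                if not f <= first[v]:
                    first[v] |= f
                    changed = True
    return first

def is_ll1(grammar, follow):
    first = compute_first(grammar)
    for A, bodies in grammar.items():
        for i in range(len(bodies)):
            for j in range(i + 1, len(bodies)):
                fa = first_of(grammar, bodies[i], first)
                fb = first_of(grammar, bodies[j], first)
                if fa & fb:
                    return False        # condition 1 violated
                if "" in fb and fa & follow[A]:
                    return False        # condition 2 violated
                if "" in fa and fb & follow[A]:
                    return False
    return True
```

For example, S → aSb | ε passes, while the left-recursive E → E+T | T fails because both alternatives have FIRST = {n} once T → n.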

Limitations:

  • Cannot handle Left Recursive grammars.
  • Cannot handle Ambiguous grammars.

10. LR(k) Grammars and Properties

LR(k) parsers are a type of Deterministic Bottom-Up parser.

  • 'L': Scans input from Left to right.
  • 'R': Constructs a Rightmost derivation in reverse.
  • 'k': Uses k symbols of lookahead.

Types of LR Parsers

  1. LR(0): No lookahead. Simplest but least powerful.
  2. SLR(1) (Simple LR): Uses FOLLOW sets for reduction decisions. More powerful than LR(0).
  3. LALR(1) (Look-Ahead LR): Merges states of LR(1) to reduce table size. Standard in tools like YACC/Bison.
  4. CLR(1) (Canonical LR): Full lookahead. Most powerful but has very large parsing tables.

Properties of LR Grammars

  • Power: LR(k) grammars cover strictly more languages than LL(k). All LL(k) grammars are LR(k), but not vice versa.
  • Detecting Errors: LR parsers detect syntax errors as soon as possible on a left-to-right scan.
  • Determinism: They handle virtually all deterministic context-free languages.
  • Conflicts: If a grammar is not LR, the parsing table will have conflicts:
    • Shift-Reduce Conflict: Parser doesn't know whether to shift input or reduce stack.
    • Reduce-Reduce Conflict: Parser has two valid rules to reduce the stack content.

Summary Comparison: Top-Down vs Bottom-Up

Feature            | Top-Down (LL)                                    | Bottom-Up (LR)
Start Point        | Root (S)                                         | Leaves (Input String)
Derivation         | Leftmost                                         | Reverse Rightmost
Recursion Support  | Cannot handle Left Recursion                     | Handles Left Recursion
Lookahead          | Usually 1 (LL(1))                                | Usually 1 (SLR, LALR)
Complexity         | Easier to implement manually (Recursive Descent) | Harder to implement manually; uses tools
Power              | Less powerful                                    | More powerful