Unit 4 - Notes

CSE205

Unit 4: Trees and Recursion

1. Binary Trees

Introduction

A Binary Tree is a non-linear hierarchical data structure. Unlike arrays or linked lists (which are linear), trees store data in a parent-child relationship.

Definition: A binary tree is a finite set of elements that is either:

  1. Empty (null).
  2. Partitioned into three disjoint subsets:
    • A single node called the Root.
    • Two disjoint sets which are themselves binary trees, called the Left Subtree and the Right Subtree.

Key Constraint: Each node in a binary tree can have at most two children (degree 2).

Important Terminologies

  • Root: The topmost node of the tree.
  • Leaf (Terminal Node): A node with no children.
  • Internal Node: A node with at least one child.
  • Depth/Level: The distance from the root (Root is at level 0).
  • Height: The length of the longest path from a node to a leaf.

Special Types of Binary Trees

A. Complete Binary Tree

A binary tree is Complete if:

  1. Every level, except possibly the last, is completely filled.
  2. All nodes in the last level are as far left as possible.
    • Properties: Ideal for array-based implementation (heaps).
    • Total Nodes: If height is , max nodes .

B. Extended Binary Tree (2-Tree)

A binary tree is converted into an extended binary tree by adding special nodes (square nodes) to every null link of the original tree.

  • Internal Nodes (Circles): The original data-holding nodes.
  • External Nodes (Squares): The dummy nodes replacing NULL pointers.
  • Property: If there are internal nodes, there are external nodes. This concept is often used to calculate Weighted Path Length (WPL) for optimization.

Memory Representation

There are two primary ways to store binary trees in memory:

1. Sequential Representation (Array-based)

Nodes are stored in a linear array using relative positioning.

  • The Root is stored at index $0$.

  • If a node is at index :

    • Left Child is at index .
    • Right Child is at index .
    • Parent is at index .
  • Advantages: Fast access; ideal for Complete Binary Trees.

  • Disadvantages: Wastes memory for skewed (sparse) trees; fixed size.

2. Linked Representation (Pointer-based)

Nodes are dynamically allocated in memory. Each node consists of three parts: a data field and two pointer fields.

Structure Definition (C-style):

C
struct Node {
    int data;
    struct Node* left;
    struct Node* right;
};

  • Advantages: Dynamic size; efficient memory usage for all tree shapes; easier insertion/deletion.
  • Disadvantages: Overhead of storing pointers.

2. Binary Search Tree (BST)

Introduction

A Binary Search Tree (BST) is a specific type of binary tree organized to allow efficient data retrieval.

Properties:
For any given node :

  1. The value of all nodes in the Left Subtree is less than the value of .
  2. The value of all nodes in the Right Subtree is greater than the value of .
  3. Duplicate values are generally not allowed.

Operations

A. Searching

  1. Start at the Root.
  2. If the tree is empty, return NULL (not found).
  3. If target == root->data, return Root.
  4. If target < root->data, recurse to the Left Subtree.
  5. If target > root->data, recurse to the Right Subtree.
    • Complexity: Best/Avg: , Worst (skewed): .

B. Insertion

  1. Perform a search logic to find the correct location.
  2. Traverse down the tree until a NULL pointer is hit where the node should be.
  3. Create the new node and link it to the parent.

C. Deletion

Deletion is complex because the tree structure must be maintained. There are three cases:

  1. Node is a Leaf: Simply remove the node (set parent's pointer to NULL).
  2. Node has One Child: Bypass the node. Link the parent of the node directly to the child of the node.
  3. Node has Two Children:
    • Find the In-order Successor (smallest node in the right subtree) OR In-order Predecessor (largest node in the left subtree).
    • Replace the value of the node to be deleted with the Successor's value.
    • Delete the Successor node (which will now be a Case 1 or Case 2 scenario).

3. Tree Traversals

Traversal is the process of visiting every node in a tree exactly once.

A. Pre-order Traversal (Root - Left - Right)

Used to create a copy of the tree or get prefix expression.
Algorithm:

  1. Visit Root.
  2. Traverse Left Subtree.
  3. Traverse Right Subtree.

C
void preOrder(struct Node* root) {
    if (root == NULL) return;
    printf("%d ", root->data); // Visit
    preOrder(root->left);      // Left
    preOrder(root->right);     // Right
}

B. In-order Traversal (Left - Root - Right)

Crucial Property: In a BST, in-order traversal yields values in sorted (ascending) order.
Algorithm:

  1. Traverse Left Subtree.
  2. Visit Root.
  3. Traverse Right Subtree.

C
void inOrder(struct Node* root) {
    if (root == NULL) return;
    inOrder(root->left);       // Left
    printf("%d ", root->data); // Visit
    inOrder(root->right);      // Right
}

C. Post-order Traversal (Left - Right - Root)

Used to delete the tree (delete children before parent) or get postfix expression.
Algorithm:

  1. Traverse Left Subtree.
  2. Traverse Right Subtree.
  3. Visit Root.

C
void postOrder(struct Node* root) {
    if (root == NULL) return;
    postOrder(root->left);     // Left
    postOrder(root->right);    // Right
    printf("%d ", root->data); // Visit
}


4. Recursion

Introduction

Recursion is a programming technique where a function calls itself to solve a problem. A complex problem is broken down into smaller, self-similar sub-problems.

Components of a Recursive Function:

  1. Base Case (Termination Condition): The condition under which the function stops calling itself. Without this, a stack overflow occurs (infinite recursion).
  2. Recursive Case: The part where the function calls itself with modified arguments, moving closer to the base case.

Memory in Recursion:
Recursion uses the System Stack. Every time a function calls itself, an Activation Record (containing local variables, parameters, and return address) is pushed onto the stack. When the base case is reached, the stack unwinds.


5. Recursive Implementation of Algorithms

A. Towers of Hanoi

A mathematical puzzle involving three rods (Source, Auxiliary, Destination) and disks of different sizes.
Rules:

  1. Only one disk can be moved at a time.
  2. A larger disk cannot be placed on top of a smaller disk.

Algorithm:
To move disks from Source to Destination:

  1. Move disks from Source to Auxiliary (using Destination as buffer).
  2. Move the largest disk () from Source to Destination.
  3. Move disks from Auxiliary to Destination (using Source as buffer).

Complexity: .

C
void towerOfHanoi(int n, char source, char dest, char aux) {
    if (n == 1) {
        printf("Move disk 1 from %c to %c\n", source, dest);
        return;
    }
    towerOfHanoi(n - 1, source, aux, dest);
    printf("Move disk %d from %c to %c\n", n, source, dest);
    towerOfHanoi(n - 1, aux, dest, source);
}

B. Merge Sort

A Divide and Conquer algorithm. It divides the input array into two halves, calls itself for the two halves, and then merges the two sorted halves.

Steps:

  1. Divide: Find the middle point to divide the array into two halves.
  2. Conquer: Call Merge Sort for the first half and the second half recursively.
  3. Combine: Merge the two sorted halves into a single sorted array.

Complexity: in all cases.
Space: (requires auxiliary array).

C
void mergeSort(int arr[], int l, int r) {
    if (l < r) {
        int m = l + (r - l) / 2;
        mergeSort(arr, l, m);     // Sort first half
        mergeSort(arr, m + 1, r); // Sort second half
        merge(arr, l, m, r);      // Merge sorted halves
    }
}

C. Quick Sort

Another Divide and Conquer algorithm, also known as Partition-Exchange sort.

Steps:

  1. Pick a Pivot: Choose an element (first, last, random, or median).
  2. Partition: Rearrange the array so that:
    • Elements smaller than the pivot are on the left.
    • Elements greater than the pivot are on the right.
    • The pivot is in its final sorted position.
  3. Recursion: Recursively apply the above steps to the sub-array on the left of the pivot and the sub-array on the right.

Complexity:

  • Best/Average: .
  • Worst: (occurs when the array is already sorted and the first/last element is picked as pivot).
    Space: (stack space).

C
void quickSort(int arr[], int low, int high) {
    if (low < high) {
        int pi = partition(arr, low, high); // Partition index
        quickSort(arr, low, pi - 1);  // Before pivot
        quickSort(arr, pi + 1, high); // After pivot
    }
}