CSE408

Unit 3: Divide and Conquer and Order Statistics

1. The Divide and Conquer Paradigm

Divide and Conquer is a fundamental algorithm design strategy that works by recursively breaking down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly. The solutions to the sub-problems are then combined to give a solution to the original problem.

Divide: Break the given problem into sub-problems of the same type.
Conquer: Recursively solve these sub-problems.
Combine: Appropriately combine the answers.

Merge Sort

Algorithm: Divides the unsorted list into $n$ sublists, each containing one element (a list of one element is considered sorted). Repeatedly merges sublists to produce new sorted sublists until there is only one sorted list remaining.
Recurrence Relation: $T(n) = 2T(n/2) + O(n)$
Time Complexity: Best, Average, and Worst Case: $O(n \log n)$
Space Complexity: $O(n)$ (Requires auxiliary space for merging).
Stability: Stable.
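
Example (Python): a minimal top-down sketch of Merge Sort. The notes describe the merging bottom-up; the recursive form below is an equivalent, commonly used formulation, and the function name `merge_sort` is only illustrative.

```python
def merge_sort(items):
    """Return a new sorted list; the input list is left unchanged."""
    if len(items) <= 1:              # 0 or 1 elements: already sorted
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])   # conquer the left half
    right = merge_sort(items[mid:])  # conquer the right half

    # Combine: merge the two sorted halves in O(n) time.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # "<=" keeps equal keys in input order (stable)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])          # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged
```

The `merged` list is the auxiliary $O(n)$ space mentioned above.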

Quick Sort

Algorithm: Picks an element as a pivot and partitions the given array around the picked pivot. The elements smaller than the pivot are placed before it, and elements greater are placed after. The subarrays are recursively sorted.
Recurrence Relation: $T(n) = T(k) + T(n-k-1) + O(n)$ (where $k$ is the number of elements smaller than the pivot, i.e. the number of elements that end up before it).
Time Complexity:
- Best/Average Case: $O(n \log n)$ (Pivot divides array roughly into halves)
- Worst Case: $O(n^2)$ (The pivot is always the smallest or largest element, e.g. an already sorted array with the first or last element chosen as pivot)
Space Complexity: $O(\log n)$ average, $O(n)$ worst (call stack).
Stability: Not stable in its standard form.
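
Example (Python): a minimal in-place sketch of Quick Sort. The notes do not fix a pivot rule; this sketch assumes the last element is the pivot (the Lomuto partition scheme), which is one common choice among several.

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] around the pivot a[hi]; return the pivot's final index."""
    pivot = a[hi]
    i = lo                           # a[lo..i-1] holds elements smaller than the pivot
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]        # place the pivot between the two parts
    return i


def quick_sort(a, lo=0, hi=None):
    """Sort the list a in place."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quick_sort(a, lo, p - 1)     # elements before the pivot
        quick_sort(a, p + 1, hi)     # elements after the pivot
```

With this pivot rule an already sorted input triggers the $O(n^2)$ worst case, since every partition leaves one side empty.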

Binary Search

Algorithm: Searches a sorted array by repeatedly dividing the search interval in half. If the search key equals the item in the middle of the interval, the search succeeds. If the key is less than the middle item, narrow the interval to the lower half; otherwise, narrow it to the upper half. The search fails when the interval becomes empty.
Recurrence Relation: $T(n) = T(n/2) + O(1)$
Time Complexity: $O(\log n)$ worst and average; $O(1)$ best.
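
Example (Python): a minimal iterative sketch of Binary Search. Returning `-1` for a missing key is a convention assumed here, not something the notes specify.

```python
def binary_search(sorted_items, key):
    """Return an index of key in sorted_items, or -1 if it is absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:                  # the interval [lo, hi] is still non-empty
        mid = (lo + hi) // 2
        if sorted_items[mid] == key:
            return mid               # best case: found on the first probe
        if key < sorted_items[mid]:
            hi = mid - 1             # continue in the lower half
        else:
            lo = mid + 1             # continue in the upper half
    return -1
```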

Multiplication of Large Integers (Karatsuba Algorithm)

Standard Multiplication: Requires $O(n^2)$ single-digit operations for two $n$-digit numbers.
Karatsuba Algorithm: Reduces the multiplication of two $n$-digit numbers to at most $3$ multiplications of roughly $n/2$-digit numbers (instead of 4).
- Let $X = x_1 \cdot 10^{n/2} + x_0$ and $Y = y_1 \cdot 10^{n/2} + y_0$ .
- Compute $P_1 = x_1 \cdot y_1$ , $P_2 = x_0 \cdot y_0$ , and $P_3 = (x_1 + x_0) \cdot (y_1 + y_0)$ .
- $X \cdot Y = P_1 \cdot 10^n + (P_3 - P_1 - P_2) \cdot 10^{n/2} + P_2$ .
Recurrence Relation: $T(n) = 3T(n/2) + O(n)$
Time Complexity: $O(n^{\log_2 3}) \approx O(n^{1.585})$
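
Example (Python): a minimal sketch of Karatsuba's method for non-negative integers, following the $P_1$, $P_2$, $P_3$ decomposition above. It splits at $\lfloor n/2 \rfloor$ decimal digits, so the power of ten used is `base` rather than exactly $10^{n/2}$ when $n$ is odd. Python integers already multiply large values natively, so this is purely illustrative.

```python
def karatsuba(x, y):
    """Multiply two non-negative integers using three recursive products."""
    if x < 10 or y < 10:             # single-digit operand: multiply directly
        return x * y
    n = max(len(str(x)), len(str(y)))
    base = 10 ** (n // 2)
    x1, x0 = divmod(x, base)         # x = x1 * base + x0
    y1, y0 = divmod(y, base)         # y = y1 * base + y0
    p1 = karatsuba(x1, y1)
    p2 = karatsuba(x0, y0)
    p3 = karatsuba(x1 + x0, y1 + y0)
    # (p3 - p1 - p2) equals x1*y0 + x0*y1, obtained without a fourth product.
    return p1 * base * base + (p3 - p1 - p2) * base + p2
```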

Strassen's Matrix Multiplication

Standard Method: Multiplying two $n \times n$ matrices takes $O(n^3)$ operations.
Divide and Conquer (Naive): $T(n) = 8T(n/2) + O(n^2) \Rightarrow O(n^3)$ .
Strassen’s Method: Reduces the 8 recursive matrix multiplications to 7 using clever algebraic substitutions.
Recurrence Relation: $T(n) = 7T(n/2) + O(n^2)$
Time Complexity: $O(n^{\log_2 7}) \approx O(n^{2.81})$
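
Example (Python): a minimal sketch of Strassen's seven products, assuming square matrices stored as lists of lists whose size $n$ is a power of two. The helper names (`mat_add`, `mat_sub`, `split_quadrants`) are illustrative; practical implementations usually switch to the standard method below some cutoff size.

```python
def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split_quadrants(M):
    """Return the four n/2 x n/2 quadrants (top-left, top-right, bottom-left, bottom-right)."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B):
    """Multiply two n x n matrices, n a power of two, with 7 recursive products."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = split_quadrants(A)
    b11, b12, b21, b22 = split_quadrants(B)
    m1 = strassen(mat_add(a11, a22), mat_add(b11, b22))
    m2 = strassen(mat_add(a21, a22), b11)
    m3 = strassen(a11, mat_sub(b12, b22))
    m4 = strassen(a22, mat_sub(b21, b11))
    m5 = strassen(mat_add(a11, a12), b22)
    m6 = strassen(mat_sub(a21, a11), mat_add(b11, b12))
    m7 = strassen(mat_sub(a12, a22), mat_add(b21, b22))
    c11 = mat_add(mat_sub(mat_add(m1, m4), m5), m7)
    c12 = mat_add(m3, m5)
    c21 = mat_add(m2, m4)
    c22 = mat_add(mat_add(mat_sub(m1, m2), m3), m6)
    # Stitch the four quadrants back into one matrix.
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom
```

The additions and subtractions of quadrants account for the $O(n^2)$ term in the recurrence.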

Closest-Pair Problem (Divide and Conquer)

Problem: Find the two closest points in a set of $n$ points in a 2D plane.
Algorithm:
1. Sort points by x-coordinate.
2. Divide the set into two halves by a vertical line $x = c$ .
3. Recursively find the minimum distance in both halves, let $d = \min(d_L, d_R)$ .
4. Create a strip of width $2d$ around the dividing line and find any pair within this strip closer than $d$ (requires checking at most 7 neighbors per point if sorted by y-coordinate).
Time Complexity: $O(n \log n)$
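
Example (Python): a simplified sketch of the four steps above, assuming points are `(x, y)` tuples and at least two are given. For brevity it re-sorts the strip by y at every level, which costs $O(n \log^2 n)$ overall; reaching $O(n \log n)$ requires maintaining a y-sorted list through the recursion instead. `math.dist` needs Python 3.8 or later.

```python
import math


def closest_pair_distance(points):
    """Return the smallest distance between any two of the given points."""
    return _closest(sorted(points))          # step 1: sort by x-coordinate


def _brute_force(pts):
    best = math.inf
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            best = min(best, math.dist(pts[i], pts[j]))
    return best


def _closest(pts):
    if len(pts) <= 3:
        return _brute_force(pts)
    mid = len(pts) // 2
    c = pts[mid][0]                          # step 2: dividing line x = c
    d = min(_closest(pts[:mid]), _closest(pts[mid:]))   # step 3
    # Step 4: only points within d of the dividing line can beat d.
    strip = sorted((p for p in pts if abs(p[0] - c) < d), key=lambda p: p[1])
    for i, p in enumerate(strip):
        for q in strip[i + 1:i + 8]:         # at most the next 7 points by y
            if q[1] - p[1] >= d:
                break
            d = min(d, math.dist(p, q))
    return d
```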

Convex-Hull Problem (Divide and Conquer / QuickHull)

Problem: Find the smallest convex polygon containing all points in a given set.
Algorithm (Divide and Conquer): Sort points by x-coordinate, recursively find the convex hull of the left and right halves, and merge the two hulls by finding upper and lower tangents in linear time.
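Time Complexity: $O(n \log n)$ . (QuickHull, the other divide-and-conquer approach named in the heading, runs in $O(n \log n)$ on average but $O(n^2)$ in the worst case.)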
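
Example (Python): a minimal sketch of QuickHull, the variant named in the heading, not the tangent-merging algorithm described above (whose merge step is longer to write out). It assumes points are `(x, y)` tuples, returns the hull vertices in clockwise order starting from the leftmost point, and drops points that lie on a hull edge. The helper names are illustrative.

```python
def cross(o, a, b):
    """Positive when b lies to the left of the directed line o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _hull_side(pts, a, b):
    """Hull vertices strictly left of a -> b, ordered from a to b (endpoints excluded)."""
    left = [p for p in pts if cross(a, b, p) > 0]
    if not left:
        return []
    far = max(left, key=lambda p: cross(a, b, p))   # farthest point from line ab
    # Points inside triangle (a, far, b) are discarded by the two recursive calls.
    return _hull_side(left, a, far) + [far] + _hull_side(left, far, b)


def quickhull(points):
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    a, b = pts[0], pts[-1]                          # leftmost and rightmost points
    return [a] + _hull_side(pts, a, b) + [b] + _hull_side(pts, b, a)
```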

2. Solving Recurrences

Recurrences express the running time of a recursive algorithm.

Substitution Method

Guess the form of the solution (e.g., $T(n) = O(n \log n)$ ).
Verify by mathematical induction.
Solve for constants.
- Use case: When you have a strong intuition about the bound.
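
Worked example (the recurrence $T(n) = 2T(n/2) + n$ is chosen here for illustration; it matches Merge Sort with the $O(n)$ term taken as exactly $n$):
- Guess: $T(n) \le c \, n \log_2 n$ for some constant $c > 0$ .
- Inductive step: assume the bound holds for $n/2$ . Then $T(n) \le 2 \cdot c \, (n/2) \log_2 (n/2) + n = c \, n \log_2 n - c \, n + n$ .
- Solve for constants: $c \, n \log_2 n - c \, n + n \le c \, n \log_2 n$ whenever $c \ge 1$ , so $T(n) = O(n \log n)$ .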

Recursion-Tree Method

Draw a tree where nodes represent the cost of a single subproblem.
Sum the costs within each level of the tree.
Sum the per-level costs to determine the total cost.
- Use case: Great for visualizing and generating a good guess for the substitution method.
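
Worked example (again for the illustrative recurrence $T(n) = 2T(n/2) + cn$ , with $n$ assumed to be a power of 2):
- Level $i$ of the tree (root at level 0) has $2^i$ nodes, each costing $c \, n / 2^i$ , so every level costs $cn$ in total.
- Subproblem size reaches 1 at depth $\log_2 n$ , giving $\log_2 n + 1$ levels.
- Total: $cn (\log_2 n + 1) = \Theta(n \log n)$ .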

Master Method

Provides a cookbook method for solving recurrences of the form: $T(n) = aT(n/b) + f(n)$ , where $a \ge 1, b > 1$ .
Compare $f(n)$ with $n^{\log_b a}$ :

Case 1: If $f(n) = O(n^{\log_b a - \epsilon})$ for some $\epsilon > 0$ , then $T(n) = \Theta(n^{\log_b a})$ .
Case 2: If $f(n) = \Theta(n^{\log_b a} \log^k n)$ for $k \ge 0$ , then $T(n) = \Theta(n^{\log_b a} \log^{k+1} n)$ .
Case 3: If $f(n) = \Omega(n^{\log_b a + \epsilon})$ for some $\epsilon > 0$ and $af(n/b) \le cf(n)$ for some constant $c < 1$ and all sufficiently large $n$ (the regularity condition), then $T(n) = \Theta(f(n))$ .
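
Worked examples (the first four recurrences come from Section 1; the last one is an illustrative recurrence added here only to show Case 3):
- Merge Sort, $T(n) = 2T(n/2) + \Theta(n)$ : $n^{\log_2 2} = n$ and $f(n) = \Theta(n \log^0 n)$ , so Case 2 with $k = 0$ gives $\Theta(n \log n)$ .
- Binary Search, $T(n) = T(n/2) + \Theta(1)$ : $n^{\log_2 1} = 1$ and $f(n) = \Theta(1)$ , so Case 2 with $k = 0$ gives $\Theta(\log n)$ .
- Karatsuba, $T(n) = 3T(n/2) + O(n)$ : $n^{\log_2 3} \approx n^{1.585}$ and $f(n) = O(n^{\log_2 3 - \epsilon})$ for $\epsilon = 0.5$ , so Case 1 gives $\Theta(n^{\log_2 3})$ .
- Strassen, $T(n) = 7T(n/2) + O(n^2)$ : $n^{\log_2 7} \approx n^{2.81}$ and $f(n) = O(n^{\log_2 7 - \epsilon})$ for $\epsilon = 0.5$ , so Case 1 gives $\Theta(n^{\log_2 7})$ .
- $T(n) = 2T(n/2) + n^2$ : $n^{\log_2 2} = n$ and $f(n) = \Omega(n^{1 + \epsilon})$ for $\epsilon = 1$ ; the regularity condition holds because $2 (n/2)^2 = n^2 / 2 \le c \, n^2$ with $c = 1/2$ , so Case 3 gives $\Theta(n^2)$ .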

3. Decrease and Conquer

A paradigm where you reduce a problem to a smaller instance of the same problem, solve the smaller instance, and extend the solution to the original problem.

Insertion Sort

Decrease by one: Assume $A[1 \dots i-1]$ is sorted. Insert $A[i]$ into its correct position to make $A[1 \dots i]$ sorted.
Time Complexity: $O(n^2)$ worst/average, $O(n)$ best.
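
Example (Python): a minimal in-place sketch of Insertion Sort. Python lists are 0-indexed, so the sorted prefix is `a[0..i-1]` rather than $A[1 \dots i-1]$ .

```python
def insertion_sort(a):
    """Sort the list a in place and return it."""
    for i in range(1, len(a)):
        key = a[i]                   # a[0..i-1] is already sorted
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]          # shift larger elements one slot right
            j -= 1
        a[j + 1] = key               # now a[0..i] is sorted
    return a
```

On already sorted input the inner loop never runs, which gives the $O(n)$ best case.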

Depth-First Search (DFS) and Breadth-First Search (BFS)

DFS: Explores as far as possible along each branch before backtracking. Uses a Stack (or recursion).
BFS: Explores the neighbor nodes first, before moving to the next level neighbors. Uses a Queue.
Time Complexity: $O(V + E)$ for both (using adjacency lists).
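
Example (Python): a minimal sketch of both traversals, assuming the graph is a dictionary that maps each vertex to its adjacency list (a representation chosen here for illustration). DFS uses an explicit stack and BFS uses a queue.

```python
from collections import deque


def dfs(graph, start):
    """Return vertices in the order DFS first visits them."""
    visited, order = set(), []
    stack = [start]
    while stack:
        v = stack.pop()
        if v in visited:
            continue
        visited.add(v)
        order.append(v)
        # Push in reverse so neighbours are explored in their listed order.
        for w in reversed(graph[v]):
            if w not in visited:
                stack.append(w)
    return order


def bfs(graph, start):
    """Return vertices in the order BFS visits them (level by level)."""
    visited, order = {start}, []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in visited:
                visited.add(w)       # mark when enqueued to avoid duplicates
                queue.append(w)
    return order
```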

Connected Components

Using DFS or BFS, we can traverse an undirected graph. Every time we start a new traversal from an unvisited vertex, we discover a new connected component.
Time Complexity: $O(V + E)$ .
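
Example (Python): a minimal sketch that counts components by restarting a traversal from every unvisited vertex. It assumes an undirected graph stored as a dictionary of adjacency lists in which every vertex appears as a key.

```python
def connected_components(graph):
    """Return a list of components, each a list of vertices."""
    visited = set()
    components = []
    for start in graph:
        if start in visited:
            continue
        # A new unvisited start vertex means a new component.
        component = []
        stack = [start]
        visited.add(start)
        while stack:
            v = stack.pop()
            component.append(v)
            for w in graph[v]:
                if w not in visited:
                    visited.add(w)
                    stack.append(w)
        components.append(component)
    return components
```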

Topological Sort

Problem: Find a linear ordering of the vertices of a Directed Acyclic Graph (DAG) such that for every directed edge $(u, v)$ , vertex $u$ comes before $v$ .
Algorithm: Perform DFS. When a node finishes exploring all its neighbors (post-order), push it to a stack. Pop the stack to get the topological order. Alternatively, use Kahn's algorithm (in-degree counting).
Time Complexity: $O(V + E)$ .
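
Example (Python): a minimal sketch of the DFS-based approach, assuming the input is a DAG stored as a dictionary of adjacency lists. It performs no cycle detection, and Kahn's algorithm is not shown.

```python
def topological_sort(graph):
    """Return the vertices of a DAG in a topological order."""
    visited = set()
    finished = []                    # vertices in order of DFS finishing time

    def visit(u):
        visited.add(u)
        for v in graph.get(u, []):
            if v not in visited:
                visit(v)
        finished.append(u)           # post-order: all descendants are done

    for u in graph:
        if u not in visited:
            visit(u)
    return finished[::-1]            # reverse finishing order = popping the stack
```

The recursion depth grows with the longest path, so very deep graphs may need an iterative version.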

4. Transform and Conquer

Solve a problem by transforming it into a simpler or more convenient instance, or representing it in a different data structure.

Presorting

Sorting data before solving a problem can often yield faster algorithms.
Examples: Finding element uniqueness ( $O(n \log n)$ via presorting vs $O(n^2)$ naive), computing the mode, or finding closest pairs in 1D.
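
Example (Python): a minimal sketch of two of the presorting examples. After sorting, equal elements are adjacent, so one linear scan is enough. The function names are illustrative, and `mode` returns the smallest value when several are tied.

```python
def all_unique(items):
    """Element uniqueness: O(n log n) sort followed by an O(n) scan."""
    ordered = sorted(items)
    return all(ordered[i] != ordered[i + 1] for i in range(len(ordered) - 1))


def mode(items):
    """Most frequent value, found by measuring runs of equal elements."""
    ordered = sorted(items)
    best_value, best_count = None, 0
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1                   # extend the current run of equal values
        if j - i > best_count:
            best_value, best_count = ordered[i], j - i
        i = j
    return best_value
```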

Balanced Search Trees

Transforming a list into a balanced BST (like AVL Tree or Red-Black Tree) ensures that Search, Insert, and Delete operations take $O(\log n)$ worst-case time, avoiding the $O(n)$ worst-case of unbalanced trees.

Heaps and Heapsort

Heap: A nearly complete binary tree satisfying the heap property (Max-Heap or Min-Heap).
Heapsort Algorithm:
1. Build a Max-Heap from the input array ( $O(n)$ ).
2. Repeatedly swap the root (maximum element) with the last element, reduce heap size, and Heapify the root ( $O(\log n)$ per swap).
Time Complexity: $O(n \log n)$ (in-place sorting).
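
Example (Python): a minimal in-place sketch of the two Heapsort steps, assuming the usual array layout in which the children of index `i` are at `2*i + 1` and `2*i + 2`. The helper name `sift_down` plays the role of Heapify.

```python
def sift_down(a, root, size):
    """Restore the max-heap property for the subtree rooted at `root`."""
    while True:
        largest = root
        left, right = 2 * root + 1, 2 * root + 2
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == root:
            return
        a[root], a[largest] = a[largest], a[root]
        root = largest


def heapsort(a):
    """Sort the list a in place and return it."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # step 1: build a max-heap bottom-up
        sift_down(a, i, n)
    for end in range(n - 1, 0, -1):       # step 2: move the max to the end
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)              # heap now occupies a[0..end-1]
    return a
```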

Hashing

Transforms keys into array indices using a hash function.
Allows average $O(1)$ time complexity for search, insert, and delete operations.
Must handle collisions (e.g., via Chaining or Open Addressing).
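
Example (Python): a minimal sketch of a hash table with chaining. The class name `ChainedHashTable` and the fixed bucket count are assumptions made for illustration; the sketch relies on Python's built-in `hash` and never resizes, so the $O(1)$ average only holds while the chains stay short.

```python
class ChainedHashTable:
    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # The hash function maps a key to an array index.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # key already present: overwrite
                return
        bucket.append((key, value))        # collision or empty slot: extend chain

    def search(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False
```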

5. Order Statistics and Sorting Algorithms

Minimum and Maximum (Simultaneous)

Naive: Find min ( $n-1$ comparisons) and max ( $n-1$ comparisons) independently. Total $= 2n - 2$ .
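Optimized: Process the elements in pairs. Compare the two elements of each pair with each other, then compare the larger with the current max and the smaller with the current min (3 comparisons per pair instead of 4). Total comparisons $= \lceil 3n/2 \rceil - 2$ , which is $3n/2 - 2$ for even $n$ .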
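
Example (Python): a minimal sketch of the pairwise method. It also returns the number of element comparisons so the count can be checked; that counter is added here for illustration only.

```python
def min_and_max(items):
    """Return (minimum, maximum, comparisons) using the pairwise method."""
    n = len(items)
    if n == 0:
        raise ValueError("min_and_max() needs at least one element")
    comparisons = 0
    if n % 2:                        # odd n: the first element seeds both
        lo = hi = items[0]
        start = 1
    else:                            # even n: the first pair seeds both
        comparisons += 1
        if items[0] < items[1]:
            lo, hi = items[0], items[1]
        else:
            lo, hi = items[1], items[0]
        start = 2
    for i in range(start, n, 2):     # remaining elements, two at a time
        a, b = items[i], items[i + 1]
        comparisons += 3
        if a > b:                    # comparison 1: order the pair
            a, b = b, a
        if a < lo:                   # comparison 2: smaller vs current min
            lo = a
        if b > hi:                   # comparison 3: larger vs current max
            hi = b
    return lo, hi, comparisons
```

For 8 elements this uses 10 comparisons, against 14 for the naive method.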

Selection Sort and Bubble Sort

Selection Sort: Repeatedly finds the minimum element from the unsorted part and puts it at the beginning. $O(n^2)$ time.
Bubble Sort: Repeatedly swaps adjacent elements if they are in the wrong order. $O(n^2)$ time.
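
Example (Python): minimal in-place sketches of both sorts. The early exit in `bubble_sort` when a pass makes no swaps is a common optimization that the notes do not mention; it does not change the $O(n^2)$ worst case.

```python
def selection_sort(a):
    """Sort the list a in place and return it."""
    n = len(a)
    for i in range(n - 1):
        smallest = i                 # index of the minimum of a[i..n-1]
        for j in range(i + 1, n):
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return a


def bubble_sort(a):
    """Sort the list a in place and return it."""
    for end in range(len(a) - 1, 0, -1):
        swapped = False
        for j in range(end):         # largest remaining element bubbles to a[end]
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:              # no swaps: the list is already sorted
            break
    return a
```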

Linear Time Sorting Algorithms (Non-Comparison Based)

Comparison-based sorting has a lower bound of $\Omega(n \log n)$ . The following sort in linear time under specific conditions:

Counting Sort

Concept: Counts the occurrences of each unique element in the input. Assumes elements are integers in a range $[0, k]$ .
Time Complexity: $O(n + k)$ , which is linear when $k = O(n)$ . Stable sort.
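
Example (Python): a minimal sketch of Counting Sort for integers in $[0, k]$ , with `k` passed in by the caller. It uses prefix sums and a right-to-left pass, which is what keeps the sort stable.

```python
def counting_sort(items, k):
    """Return a sorted copy of items; every element must be an int in [0, k]."""
    counts = [0] * (k + 1)
    for x in items:
        counts[x] += 1
    for v in range(1, k + 1):        # prefix sums: counts[v] = number of elements <= v
        counts[v] += counts[v - 1]
    output = [None] * len(items)
    for x in reversed(items):        # right-to-left keeps equal keys in input order
        counts[x] -= 1
        output[counts[x]] = x
    return output
```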

Radix Sort

Concept: Sorts elements digit by digit, from the least significant digit (LSD) to the most significant digit (MSD), using a stable sub-sort (like Counting Sort) for each digit.
Time Complexity: $O(d(n + k))$ where $d$ is the number of digits and $k$ is the base (e.g., 10).
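
Example (Python): a minimal sketch of LSD Radix Sort for non-negative integers. For brevity the stable per-digit pass distributes elements into `base` lists in input order rather than running a full Counting Sort; both are stable and cost $O(n + k)$ per digit.

```python
def radix_sort(items, base=10):
    """Return a sorted copy of a list of non-negative integers."""
    items = list(items)
    if not items:
        return items
    largest = max(items)
    exp = 1                          # base**(current digit position)
    while largest // exp > 0:        # one pass per digit of the largest value
        buckets = [[] for _ in range(base)]
        for x in items:
            buckets[(x // exp) % base].append(x)   # stable: keeps input order
        items = [x for bucket in buckets for x in bucket]
        exp *= base
    return items
```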

Bucket Sort

Concept: Distributes the input into several "buckets," each covering an equal-width subrange of the key range. Each bucket is sorted individually (e.g., using Insertion Sort), and the buckets are then concatenated in order.
Time Complexity: Average $O(n)$ if the input is uniformly distributed over the key range. Worst-case $O(n^2)$ if all elements fall into the same bucket.
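
Example (Python): a minimal sketch of Bucket Sort, assuming the inputs are floats in $[0, 1)$ and using one bucket per element (both are assumptions made for illustration). Each bucket is sorted with Insertion Sort.

```python
def bucket_sort(values):
    """Return a sorted copy of a list of floats in the range [0, 1)."""
    n = len(values)
    if n == 0:
        return []
    buckets = [[] for _ in range(n)]
    for v in values:
        # Bucket i covers [i/n, (i+1)/n); min() guards against rounding up to n.
        buckets[min(int(v * n), n - 1)].append(v)
    result = []
    for bucket in buckets:
        for i in range(1, len(bucket)):   # insertion sort inside the bucket
            key = bucket[i]
            j = i - 1
            while j >= 0 and bucket[j] > key:
                bucket[j + 1] = bucket[j]
                j -= 1
            bucket[j + 1] = key
        result.extend(bucket)             # buckets are already in key order
    return result
```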
