Unit 1 - Notes

CSE408 7 min read

Unit 1: Foundations of Algorithm

1. Algorithms

An Algorithm is a sequence of unambiguous, step-by-step instructions for solving a problem within a finite amount of time. It takes a set of input values and produces a set of output values.

Key Characteristics of an Algorithm:

Finiteness: Must always terminate after a finite number of steps.
Definiteness: Each step must be precisely defined; actions to be carried out must be rigorously and unambiguously specified for each case.
Input: Has zero or more valid inputs provided before the algorithm begins.
Output: Has one or more outputs that have a specified relation to the inputs.
Effectiveness: Operations must be basic enough to be done exactly and in a finite length of time by someone using pencil and paper.

2. Fundamentals of Algorithmic Problem Solving

Designing an algorithm involves a sequence of well-defined steps:

Understanding the Problem: Read the problem description carefully, identify the inputs and desired outputs, and recognize any constraints.
Ascertaining the Capabilities of the Computational Device: Decide if the problem requires sequential execution (Von Neumann architecture) or parallel computing.
Choosing between Exact and Approximate Problem Solving: Determine if an exact solution is required or if an approximation is acceptable (especially for NP-hard problems).
Deciding on Appropriate Data Structures: Select the most efficient way to store and organize the data (e.g., arrays, graphs, trees).
Algorithm Design Techniques: Apply a general approach to solving the problem (e.g., divide-and-conquer, dynamic programming).
Methods of Specifying an Algorithm: Write the algorithm using pseudocode or a flowchart.
Proving an Algorithm's Correctness: Use mathematical induction or loop invariants to prove the algorithm yields the correct output for all legitimate inputs.
Analyzing an Algorithm: Evaluate time and space complexity.
Coding an Algorithm: Translate the algorithm into a specific programming language.

3. Basic Algorithm Design Techniques

Algorithm design techniques (or strategies) are general approaches to solving problems algorithmically.

Brute Force: A straightforward approach based directly on the problem statement.
Divide and Conquer: Divides the problem into smaller subproblems, solves them recursively, and combines their solutions.
Decrease and Conquer: Reduces a problem to a smaller instance of the same problem, solves it, and extends the solution.
Transform and Conquer: Modifies the problem into a more easily solvable one (e.g., sorting data before searching).
Greedy Approach: Makes a locally optimal choice at each step with the hope of finding a global optimum.
Dynamic Programming: Breaks a problem into overlapping subproblems and stores the results to avoid redundant calculations.
Backtracking & Branch and Bound: Systematically searches for a solution by exploring all possibilities and pruning unpromising paths.

4. Analyzing Algorithm

Once an algorithm is designed, it must be analyzed based on several criteria:

Correctness: Does it do what it is supposed to do?
Time Efficiency: How fast does the algorithm run?
Space Efficiency: How much extra memory does the algorithm require?
Simplicity: Is the algorithm easy to understand and implement?
Generality: Can the algorithm handle a wide range of inputs or be adapted for similar problems?

5. Fundamental Data Structures

Linear Data Structures

Data elements are arranged in a sequential manner.

Arrays: A collection of elements identified by index or key, stored in contiguous memory locations. Allows $O(1)$ access time but fixed size.
Linked Lists: A sequence of nodes where each node contains data and a reference (link) to the next node. Allows efficient insertions/deletions but requires sequential access.
Stacks: A Last-In-First-Out (LIFO) structure. Operations include push (insert) and pop (remove).
Queues: A First-In-First-Out (FIFO) structure. Operations include enqueue (insert) and dequeue (remove).

Graphs and Trees

Non-linear data structures used to represent complex relationships.

Graphs: Consist of a set of vertices (nodes) and a set of edges connecting them. Can be directed or undirected, weighted or unweighted. Represented using Adjacency Matrices or Adjacency Lists.
Trees: A connected, acyclic graph.
- Rooted Tree: One vertex is designated as the root.
- Binary Tree: Each node has at most two children (left and right).
- Binary Search Tree (BST): A binary tree where the left child's value is less than the parent's, and the right child's is greater.

6. Fundamentals of the Analysis of Algorithm Efficiency

Measuring of Input Size

The efficiency of an algorithm is analyzed as a function of the input size, denoted as $n$ .

Examples: For sorting, $n$ is the number of elements in the array. For matrix multiplication, $n$ is the dimension of the matrices. For graphs, size is often measured by the number of vertices $|V|$ and edges $|E|$ .

Units for Measuring Running Time

Instead of measuring physical time (which depends on hardware, compiler, etc.), we count the number of times the basic operation is executed.

Basic Operation: The operation that contributes most heavily to the running time, usually the most time-consuming operation in the innermost loop.
Let $c_{op}$ be the execution time of the basic operation, and $C(n)$ be the number of times it is executed. The estimated running time is $T(n) \approx c_{op} \times C(n)$ .

Order of Growth

We are primarily interested in how the running time grows as the input size $n$ increases toward infinity. Small constant factors and lower-order terms are ignored.

Common growth rates: $O(1) < O(\log n) < O(n) < O(n \log n) < O(n^2) < O(n^3) < O(2^n) < O(n!)$

Worst-Case, Best-Case, and Average-Case Efficiencies

Worst-Case Efficiency: The maximum number of basic operations executed for any input of size $n$ . Guarantees the algorithm will not run longer than this.
Best-Case Efficiency: The minimum number of basic operations executed for any input of size $n$ . Usually occurs under ideal conditions (e.g., sorting an already sorted array).
Average-Case Efficiency: The expected number of basic operations executed for a typical, randomly generated input of size $n$ . Requires probabilistic assumptions about the input distribution.

7. Asymptotic Notations and Basic Efficiency Classes

Asymptotic notations are formal mathematical tools used to describe the order of growth of algorithms.

O (Big-Oh) Notation: Upper Bound

$t(n) \in O(g(n))$ if there exist positive constants $c$ and $n_0$ such that:
$t(n) \leq c \cdot g(n) \text{ for all } n \geq n_0$
Meaning: $t(n)$ grows no faster than $g(n)$ . It provides a worst-case scenario.

$\Omega$ (Big-Omega) Notation: Lower Bound

$t(n) \in \Omega(g(n))$ if there exist positive constants $c$ and $n_0$ such that:
$t(n) \geq c \cdot g(n) \text{ for all } n \geq n_0$
Meaning: $t(n)$ grows at least as fast as $g(n)$ . It provides a best-case scenario.

$\Theta$ (Big-Theta) Notation: Tight Bound

$t(n) \in \Theta(g(n))$ if there exist positive constants $c_1, c_2,$ and $n_0$ such that:
$c_1 \cdot g(n) \leq t(n) \leq c_2 \cdot g(n) \text{ for all } n \geq n_0$
Meaning: $t(n)$ grows at the same rate as $g(n)$ .

Useful Properties Involving the Asymptotic Notations

Theorem: If $t_1(n) \in O(g_1(n))$ and $t_2(n) \in O(g_2(n))$ , then:
$t_1(n) + t_2(n) \in O(\max\{g_1(n), g_2(n)\})$
(This property also applies to $\Omega$ and $\Theta$ ).
Multiplication: If $t_1(n) \in O(g_1(n))$ and $t_2(n) \in O(g_2(n))$ , then $t_1(n) \cdot t_2(n) \in O(g_1(n) \cdot g_2(n))$ .
Transitivity: If $f(n) \in O(g(n))$ and $g(n) \in O(h(n))$ , then $f(n) \in O(h(n))$ .

Using Limits for Comparing Orders of Growth

Limits provide a rigorous mathematical way to compare the asymptotic growth of two functions $t(n)$ and $g(n)$ .
We evaluate:
$L = \lim_{n \to \infty} \frac{t(n)}{g(n)}$

Based on the value of $L$ , we can conclude:

If $L = 0$ : $t(n)$ has a strictly smaller order of growth than $g(n)$ . ( $t(n) \in O(g(n))$ but $t(n) \notin \Theta(g(n))$ ).
If $L = c > 0$ : $t(n)$ has the same order of growth as $g(n)$ . ( $t(n) \in \Theta(g(n))$ ).
If $L = \infty$ : $t(n)$ has a strictly greater order of growth than $g(n)$ . ( $t(n) \in \Omega(g(n))$ but $t(n) \notin \Theta(g(n))$ ).

L'Hôpital's Rule and Stirling's Formula are frequently used to compute these limits for complex functions involving exponentials and factorials.

Unit 2