Unit 1 - Notes
Unit 1: Foundations of Algorithm
1. Algorithms
An Algorithm is a sequence of unambiguous, step-by-step instructions for solving a problem within a finite amount of time. It takes a set of input values and produces a set of output values.
Key Characteristics of an Algorithm:
- Finiteness: Must always terminate after a finite number of steps.
- Definiteness: Each step must be precisely defined; actions to be carried out must be rigorously and unambiguously specified for each case.
- Input: Has zero or more valid inputs provided before the algorithm begins.
- Output: Has one or more outputs that have a specified relation to the inputs.
- Effectiveness: Operations must be basic enough to be done exactly and in a finite length of time by someone using pencil and paper.
2. Fundamentals of Algorithmic Problem Solving
Designing an algorithm involves a sequence of well-defined steps:
- Understanding the Problem: Read the problem description carefully, identify the inputs and desired outputs, and recognize any constraints.
- Ascertaining the Capabilities of the Computational Device: Decide if the problem requires sequential execution (Von Neumann architecture) or parallel computing.
- Choosing between Exact and Approximate Problem Solving: Determine if an exact solution is required or if an approximation is acceptable (especially for NP-hard problems).
- Deciding on Appropriate Data Structures: Select the most efficient way to store and organize the data (e.g., arrays, graphs, trees).
- Algorithm Design Techniques: Apply a general approach to solving the problem (e.g., divide-and-conquer, dynamic programming).
- Methods of Specifying an Algorithm: Write the algorithm using pseudocode or a flowchart.
- Proving an Algorithm's Correctness: Use mathematical induction or loop invariants to prove the algorithm yields the correct output for all legitimate inputs.
- Analyzing an Algorithm: Evaluate time and space complexity.
- Coding an Algorithm: Translate the algorithm into a specific programming language.
3. Basic Algorithm Design Techniques
Algorithm design techniques (or strategies) are general approaches to solving problems algorithmically.
- Brute Force: A straightforward approach based directly on the problem statement.
- Divide and Conquer: Divides the problem into smaller subproblems, solves them recursively, and combines their solutions.
- Decrease and Conquer: Reduces a problem to a smaller instance of the same problem, solves it, and extends the solution.
- Transform and Conquer: Modifies the problem into a more easily solvable one (e.g., sorting data before searching).
- Greedy Approach: Makes a locally optimal choice at each step with the hope of finding a global optimum.
- Dynamic Programming: Breaks a problem into overlapping subproblems and stores the results to avoid redundant calculations.
- Backtracking & Branch and Bound: Systematically searches for a solution by exploring all possibilities and pruning unpromising paths.
4. Analyzing Algorithm
Once an algorithm is designed, it must be analyzed based on several criteria:
- Correctness: Does it do what it is supposed to do?
- Time Efficiency: How fast does the algorithm run?
- Space Efficiency: How much extra memory does the algorithm require?
- Simplicity: Is the algorithm easy to understand and implement?
- Generality: Can the algorithm handle a wide range of inputs or be adapted for similar problems?
5. Fundamental Data Structures
Linear Data Structures
Data elements are arranged in a sequential manner.
- Arrays: A collection of elements identified by index or key, stored in contiguous memory locations. Allows access time but fixed size.
- Linked Lists: A sequence of nodes where each node contains data and a reference (link) to the next node. Allows efficient insertions/deletions but requires sequential access.
- Stacks: A Last-In-First-Out (LIFO) structure. Operations include
push(insert) andpop(remove). - Queues: A First-In-First-Out (FIFO) structure. Operations include
enqueue(insert) anddequeue(remove).
Graphs and Trees
Non-linear data structures used to represent complex relationships.
- Graphs: Consist of a set of vertices (nodes) and a set of edges connecting them. Can be directed or undirected, weighted or unweighted. Represented using Adjacency Matrices or Adjacency Lists.
- Trees: A connected, acyclic graph.
- Rooted Tree: One vertex is designated as the root.
- Binary Tree: Each node has at most two children (left and right).
- Binary Search Tree (BST): A binary tree where the left child's value is less than the parent's, and the right child's is greater.
6. Fundamentals of the Analysis of Algorithm Efficiency
Measuring of Input Size
The efficiency of an algorithm is analyzed as a function of the input size, denoted as .
- Examples: For sorting, is the number of elements in the array. For matrix multiplication, is the dimension of the matrices. For graphs, size is often measured by the number of vertices and edges .
Units for Measuring Running Time
Instead of measuring physical time (which depends on hardware, compiler, etc.), we count the number of times the basic operation is executed.
- Basic Operation: The operation that contributes most heavily to the running time, usually the most time-consuming operation in the innermost loop.
- Let be the execution time of the basic operation, and be the number of times it is executed. The estimated running time is .
Order of Growth
We are primarily interested in how the running time grows as the input size increases toward infinity. Small constant factors and lower-order terms are ignored.
- Common growth rates:
Worst-Case, Best-Case, and Average-Case Efficiencies
- Worst-Case Efficiency: The maximum number of basic operations executed for any input of size . Guarantees the algorithm will not run longer than this.
- Best-Case Efficiency: The minimum number of basic operations executed for any input of size . Usually occurs under ideal conditions (e.g., sorting an already sorted array).
- Average-Case Efficiency: The expected number of basic operations executed for a typical, randomly generated input of size . Requires probabilistic assumptions about the input distribution.
7. Asymptotic Notations and Basic Efficiency Classes
Asymptotic notations are formal mathematical tools used to describe the order of growth of algorithms.
O (Big-Oh) Notation: Upper Bound
if there exist positive constants and such that:
Meaning: grows no faster than . It provides a worst-case scenario.
(Big-Omega) Notation: Lower Bound
if there exist positive constants and such that:
Meaning: grows at least as fast as . It provides a best-case scenario.
(Big-Theta) Notation: Tight Bound
if there exist positive constants and such that:
Meaning: grows at the same rate as .
Useful Properties Involving the Asymptotic Notations
- Theorem: If and , then:
(This property also applies to and ). - Multiplication: If and , then .
- Transitivity: If and , then .
Using Limits for Comparing Orders of Growth
Limits provide a rigorous mathematical way to compare the asymptotic growth of two functions and .
We evaluate:
Based on the value of , we can conclude:
- If : has a strictly smaller order of growth than . ( but ).
- If : has the same order of growth as . ().
- If : has a strictly greater order of growth than . ( but ).
L'Hôpital's Rule and Stirling's Formula are frequently used to compute these limits for complex functions involving exponentials and factorials.