Unit 1 - Notes
CSE205
Unit 1: Introduction, Arrays, Sorting and Searching
1. Basic Concepts and Notations
1.1 Data Structure Defined
A data structure is a specialized format for organizing, processing, retrieving, and storing data. It is a mathematical or logical model of a particular organization of data.
- Choice of Data Structure: The choice depends on the underlying operation. For example, looking up a specific item (searching) versus organizing items (sorting).
1.2 Algorithm Defined
An algorithm is a finite set of instructions that, if followed, performs a particular task.
Characteristics:
- Input: Zero or more inputs.
- Output: At least one output.
- Definiteness: Instructions must be clear and unambiguous.
- Finiteness: The algorithm must terminate after a finite number of steps.
- Effectiveness: Operations must be feasible.
2. Complexity Analysis: Time, Space, and Trade-off
To evaluate an algorithm's efficiency, we analyze its consumption of resources, primarily Time and Space.
2.1 Time Complexity
Time complexity is the amount of computer time it takes to run an algorithm. It is estimated by counting the number of elementary operations performed by the algorithm, assuming that each operation takes a fixed amount of time.
- It is expressed as a function of the input size n, denoted as T(n).
2.2 Space Complexity
Space complexity is the amount of memory space required by the algorithm in its life cycle.
- Fixed part (c): instruction space, simple variables, constants (independent of input size).
- Variable part (Sp(I)): dependent on instance characteristics, e.g., recursion stack, dynamic memory.
2.3 Time-Space Trade-off
The Space-Time trade-off states that an algorithm can run faster (less time) if it uses more memory (more space), and vice-versa.
- Example: Storing a lookup table (Hash Map) consumes high memory but provides O(1) access time, whereas recalculating values on the fly saves memory but consumes CPU time.
3. Asymptotic Notations
Asymptotic notations are mathematical tools to represent the time complexity of algorithms for asymptotic analysis (how the algorithm performs as input size grows towards infinity).
3.1 Big O Notation (O) - The Upper Bound
Represents the Worst Case scenario. It defines the upper bound of an algorithm's running time.
- Definition: f(n) = O(g(n)) if there exist positive constants c and n₀ such that f(n) ≤ c·g(n) for all n ≥ n₀.
- Meaning: The algorithm will take at most c·g(n) time.
3.2 Omega Notation (Ω) - The Lower Bound
Represents the Best Case scenario. It defines the lower bound.
- Definition: f(n) = Ω(g(n)) if there exist positive constants c and n₀ such that f(n) ≥ c·g(n) for all n ≥ n₀.
- Meaning: The algorithm will take at least c·g(n) time.
3.3 Theta Notation (Θ) - The Tight Bound
Represents the Average Case (or exact bound). It sandwiches the function between upper and lower bounds.
- Definition: f(n) = Θ(g(n)) if there exist positive constants c₁, c₂, and n₀ such that c₁·g(n) ≤ f(n) ≤ c₂·g(n) for all n ≥ n₀.
- Meaning: The running time grows at exactly the same rate as g(n).
4. Basic Data Structures
Data structures are generally categorized into two types:
4.1 Primitive Data Structures
Basic structures directly operated upon by machine instructions (e.g., Integers, Floating-point numbers, Characters, Pointers).
4.2 Non-Primitive Data Structures
Derived from primitive data structures.
- Linear Data Structures: Elements are arranged in a sequence.
- Arrays: Fixed-size sequential collection.
- Linked Lists: Dynamic sequential collection using pointers.
- Stacks: LIFO (Last In First Out).
- Queues: FIFO (First In First Out).
- Non-Linear Data Structures: Elements are arranged hierarchically or as an interconnected network.
- Trees: Hierarchical relationship (Parent-Child).
- Graphs: Network relationship (Nodes and Edges).
5. Linear Arrays
5.1 Definition
An array is a collection of homogeneous (same type) data elements stored in contiguous memory locations.
- Index: Elements are accessed via an index (subscript).
5.2 Memory Representation of Linear Arrays
In memory, a linear array is stored in consecutive memory cells. The computer needs the address of the first element (Base Address) to calculate the address of any element.
Formula for 1D Array Address Calculation:
Address(A[K]) = B + W × (K − LB)
Where:
- Address(A[K]): Address of the element at index K.
- B: Base address (address of the first element).
- W: Word size (size of each element in bytes).
- K: Index of the element to find.
- LB: Lower Bound (index of the first element, usually 0 or 1).
Example:
If Base Address = 2000, W = 4 bytes, LB = 0, then, for instance, the address of A[5] is 2000 + 4 × (5 − 0) = 2020.
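The address formula above can be sketched as a one-line C function (the name `address_1d` is illustrative, not from the notes):

```c
/* Address of A[K] in a 1-D array: B + W * (K - LB).
   B = base address, W = element size in bytes, LB = lower bound. */
long address_1d(long B, long W, long K, long LB) {
    return B + W * (K - LB);
}
```

For the worked example, `address_1d(2000, 4, 5, 0)` yields 2020.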
6. Array Operations and Complexity Analysis
6.1 Traversal
Visiting every element in the array exactly once (e.g., to print or sum).
- Algorithm: Iterate from LB (Lower Bound) to UB (Upper Bound).
- Time Complexity: O(n)
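A minimal C sketch of traversal, here summing the elements (the function name `sum_array` is illustrative):

```c
/* Traversal: visit every element exactly once, e.g., to compute a sum. O(n). */
int sum_array(const int a[], int n) {
    int total = 0;
    for (int i = 0; i < n; i++)   /* iterate from index 0 (LB) to n - 1 (UB) */
        total += a[i];
    return total;
}
```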
6.2 Insertion
Adding a new element at a specific position.
- Logic: To insert at index K, all elements from K to N−1 must be shifted one position to the right to create space.
- Time Complexity: O(n)
- Best Case: Insert at end (O(1)).
- Worst Case: Insert at beginning (O(n)).
- Average: O(n).
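The shifting logic can be sketched in C as follows (the name `array_insert` is illustrative; the array is assumed to have spare capacity for one more element):

```c
/* Insert value at index k (0-based) into a[] holding n elements.
   Shifts a[k..n-1] one slot right, then writes the value. Returns new length. */
int array_insert(int a[], int n, int k, int value) {
    for (int i = n; i > k; i--)
        a[i] = a[i - 1];   /* make room by shifting right */
    a[k] = value;
    return n + 1;
}
```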
6.3 Deletion
Removing an element from a specific position.
- Logic: To delete at index K, all elements from K+1 to N−1 must be shifted one position to the left to fill the gap.
- Time Complexity: O(n) (due to shifting).
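A matching C sketch for deletion (the name `array_delete` is illustrative):

```c
/* Delete the element at index k (0-based) from a[] holding n elements.
   Shifts a[k+1..n-1] one slot left to close the gap. Returns new length. */
int array_delete(int a[], int n, int k) {
    for (int i = k; i < n - 1; i++)
        a[i] = a[i + 1];
    return n - 1;
}
```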
6.4 Merging
Combining two sorted arrays into a single sorted array.
- Logic: Compare elements from both arrays (A and B). Place the smaller element into the new array (C) and increment the pointer for that specific array. Repeat until one array is empty, then copy remaining elements.
- Time Complexity: O(m + n), where m and n are the sizes of the two arrays.
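The two-pointer merge described above can be sketched in C (the name `merge_sorted` is illustrative; `c[]` is assumed to have room for m + n elements):

```c
/* Merge sorted arrays a (size m) and b (size n) into c (size m + n). O(m + n). */
void merge_sorted(const int a[], int m, const int b[], int n, int c[]) {
    int i = 0, j = 0, k = 0;
    while (i < m && j < n)                 /* pick the smaller front element */
        c[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < m) c[k++] = a[i++];         /* copy leftovers from a */
    while (j < n) c[k++] = b[j++];         /* copy leftovers from b */
}
```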
7. Searching Algorithms
7.1 Linear Search (Sequential Search)
Checks every element in the list sequentially until the desired element is found or the list ends.
- Algorithm Logic:
- Start from index 0.
- Compare `target` with `array[i]`.
- If a match is found, return `i`.
- If the end is reached without a match, return -1.
- Requirement: Array does not need to be sorted.
- Complexity:
- Best Case: O(1) (Element at first position).
- Worst Case: O(n) (Element at last position or not present).
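The steps above translate directly to C (the name `linear_search` is illustrative):

```c
/* Sequentially compare target with each element; return its index or -1. */
int linear_search(const int a[], int n, int target) {
    for (int i = 0; i < n; i++)
        if (a[i] == target)
            return i;
    return -1;   /* end reached without a match */
}
```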
7.2 Binary Search
A divide-and-conquer algorithm that repeatedly divides the search interval in half.
- Requirement: The array MUST be sorted.
- Algorithm Logic:
- Set `low = 0`, `high = n - 1`.
- While `low <= high`:
- Calculate `mid = (low + high) / 2`.
- If `array[mid] == target`, return `mid`.
- If `array[mid] < target`, ignore the left half (`low = mid + 1`).
- If `array[mid] > target`, ignore the right half (`high = mid - 1`).
- Return -1 (Not found).
- Complexity:
- Best Case: O(1) (Element at mid).
- Worst/Average Case: O(log n).
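The loop above can be sketched in C (the name `binary_search` is illustrative; the mid calculation is written to avoid integer overflow of `low + high`):

```c
/* Binary search over a sorted array; returns the index of target or -1. */
int binary_search(const int a[], int n, int target) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;  /* same as (low + high) / 2, overflow-safe */
        if (a[mid] == target) return mid;
        if (a[mid] < target) low = mid + 1;   /* discard left half */
        else high = mid - 1;                  /* discard right half */
    }
    return -1;
}
```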
8. Sorting Algorithms
Sorting is the process of arranging data in a specific order (ascending or descending).
8.1 Bubble Sort
Repeatedly swaps adjacent elements if they are in the wrong order. The largest element "bubbles up" to the correct position in each pass.
- Steps:
- Compare A[0] and A[1]. If A[0] > A[1], swap.
- Compare A[1] and A[2], swap if needed.
- Repeat until the end of the array. This places the largest number at the end.
- Repeat the pass for the remaining unsorted part (n−1 elements, n−2 elements, ...).
- Complexity:
- Worst/Average: O(n²) (Nested loops).
- Best: O(n) (If optimized with a 'swapped' flag and the array is already sorted).
- Space: O(1) (In-place).
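The passes above, including the 'swapped'-flag optimization, can be sketched in C (the name `bubble_sort` is illustrative):

```c
/* Bubble sort: each pass bubbles the largest remaining element to the end.
   The swapped flag gives O(n) best case on an already-sorted array. */
void bubble_sort(int a[], int n) {
    for (int pass = 0; pass < n - 1; pass++) {
        int swapped = 0;
        for (int i = 0; i < n - 1 - pass; i++) {
            if (a[i] > a[i + 1]) {
                int tmp = a[i]; a[i] = a[i + 1]; a[i + 1] = tmp;
                swapped = 1;
            }
        }
        if (!swapped) break;   /* no swaps in a full pass: already sorted */
    }
}
```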
8.2 Selection Sort
Divides the array into a sorted part (left) and unsorted part (right). It repeatedly finds the minimum element from the unsorted part and puts it at the beginning.
- Steps:
- Assume the element at index i is the minimum.
- Scan from i+1 to n−1. If a smaller value is found, update the minimum index.
- Swap the found minimum with the element at index i.
- Increment i and repeat.
- Complexity:
- All Cases: O(n²) (It always scans the remaining list).
- Space: O(1).
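A minimal C sketch of the selection pass (the name `selection_sort` is illustrative):

```c
/* Selection sort: repeatedly select the minimum of the unsorted part
   and swap it to the front of that part. O(n^2) in all cases. */
void selection_sort(int a[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int min = i;                      /* assume a[i] is the minimum */
        for (int j = i + 1; j < n; j++)
            if (a[j] < a[min]) min = j;   /* smaller value found */
        int tmp = a[i]; a[i] = a[min]; a[min] = tmp;
    }
}
```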
8.3 Insertion Sort
Builds the final sorted array one item at a time. It picks an element and places it into its correct position within the already sorted sub-list.
- Steps:
- Start from the second element (index 1).
- Compare it with elements before it.
- Shift elements greater than the key to the right.
- Insert the key into the correct empty position.
- Complexity:
- Worst/Average: O(n²).
- Best: O(n) (When the array is already sorted).
- Usage: Very efficient for small datasets or nearly sorted arrays.
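The shift-and-insert steps can be sketched in C (the name `insertion_sort` is illustrative):

```c
/* Insertion sort: grow a sorted prefix by inserting each element
   into its correct position within it. O(n) if already sorted. */
void insertion_sort(int a[], int n) {
    for (int i = 1; i < n; i++) {     /* start from the second element */
        int key = a[i], j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];          /* shift larger elements right */
            j--;
        }
        a[j + 1] = key;               /* drop the key into the gap */
    }
}
```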
8.4 Complexity Comparison Summary
| Algorithm | Best Case Time | Average Case Time | Worst Case Time | Space Complexity |
|---|---|---|---|---|
| Linear Search | O(1) | O(n) | O(n) | O(1) |
| Binary Search | O(1) | O(log n) | O(log n) | O(1) |
| Bubble Sort | O(n) | O(n²) | O(n²) | O(1) |
| Insertion Sort | O(n) | O(n²) | O(n²) | O(1) |
| Selection Sort | O(n²) | O(n²) | O(n²) | O(1) |