Unit 5 - Notes
Unit 5: Data Structures
1. Introduction to Data Structures
Definition
A data structure is a specialized format for organizing, processing, retrieving, and storing data. It defines the relationship between the data, the operations that can be performed on the data, and the algorithms used to access it.
Classification
Data structures are generally classified into two main categories:
- Linear Data Structures: Elements are arranged in a sequential manner, where each element is attached to its previous and next adjacent elements.
- Examples: Arrays, Linked Lists, Stacks, Queues.
- Non-Linear Data Structures: Elements are arranged hierarchically or interconnectedly without a specific sequence. One element can be connected to other elements.
- Examples: Trees, Graphs.
Abstract Data Types (ADT)
An ADT describes the behavior of a data structure from the perspective of a user of the data, specifically:
- Domain: The set of values the data can hold.
- Operations: The set of operations available (interface).
- Implementation Independent: It does not specify how the data is stored or how operations are implemented mathematically.
2. Linked Lists
A Linked List is a linear data structure where elements (nodes) are not stored at contiguous memory locations. The elements are linked using pointers.
Structure of a Node
Each node typically consists of two parts:
- Data: The value or information.
- Pointer/Reference: The address of the next node.
Types of Linked Lists
1. Singly Linked List
- Structure: Navigation is forward only. The last node points to
NULL. - Node Representation:
[Data | Next ->] - Use case: Simple implementations of stacks/queues.
2. Doubly Linked List (DLL)
- Structure: Each node contains a pointer to the previous node and the next node.
- Node Representation:
[<- Prev | Data | Next ->] - Advantage: Can be traversed in both directions.
- Disadvantage: Higher memory overhead (extra pointer).
3. Circular Linked List
- Structure: The last node points back to the first node instead of
NULL. Can be singly or doubly linked. - Use case: Round-robin scheduling algorithms.
Key Operations & Complexity
| Operation | Singly Linked List | Doubly Linked List |
|---|---|---|
| Access/Search | ||
| Insertion (at head) | ||
| Insertion (at end) | (or with tail pointer) | (with tail pointer) |
| Deletion (given node) | (to find prev node) |
3. Stacks and Queues
A. Stacks
A linear data structure that follows the LIFO (Last In First Out) principle.
Core Operations
- Push: Add an element to the top. ()
- Pop: Remove the top element. ()
- Peek/Top: View the top element without removing it. ()
- isEmpty: Check if the stack is empty.
Applications
- Function Call Management: Handling recursion via the system call stack.
- Expression Parsing: Converting Infix to Postfix/Prefix notation.
- Syntax Parsing: Checking for balanced parentheses
(( ))in compilers. - Undo Mechanisms: In text editors.
B. Queues
A linear data structure that follows the FIFO (First In First Out) principle.
Core Operations
- Enqueue: Add an element to the rear. ()
- Dequeue: Remove an element from the front. ()
- Front/Rear: View the first/last items. ()
Applications
- CPU Scheduling: Processes waiting for execution.
- IO Buffers: Keyboard keystrokes or printer spooling.
- Breadth-First Search (BFS): Graph traversal algorithm.
Types of Queues
- Simple Queue: Standard FIFO.
- Circular Queue: The last position connects back to the first to utilize empty space created by dequeuing.
- Priority Queue: Elements are dequeued based on priority, not arrival time.
- Deque (Double-ended Queue): Insertions and deletions allowed at both ends.
4. Trees
A non-linear hierarchical data structure consisting of a set of nodes connected by edges.
Terminology
- Root: The top node.
- Child: A node directly connected to another node when moving away from the root.
- Leaf: A node with no children.
- Height: The length of the longest path from a node to a leaf.
- Depth: The length of the path from the root to the node.
A. Binary Trees
A tree in which each node has at most two children (Left child and Right child).
Combinatorial Properties:
- Max nodes at level : .
- Max nodes in a binary tree of height : .
- Min height for nodes: .
Traversals:
- Pre-order (Root, Left, Right): Used to copy trees.
- In-order (Left, Root, Right): Gives sorted output for BSTs.
- Post-order (Left, Right, Root): Used to delete trees or evaluate postfix expressions.
B. Binary Search Tree (BST)
A binary tree with the property:
- Value of Left Subtree Nodes < Value of Root
- Value of Right Subtree Nodes > Value of Root
- Search/Insert/Delete Complexity:
- Average:
- Worst Case (Skewed Tree):
C. Balanced Trees
To prevent BSTs from becoming skewed (resembling linked lists), self-balancing trees are used to keep height .
AVL Trees (Adelson-Velsky and Landis)
- Strictly balanced.
- Balance Factor: Height(Left Subtree) - Height(Right Subtree). Must be .
- Rebalancing: Uses Rotations (Left, Right, Left-Right, Right-Left) when insertions violate the balance factor.
5. Graphs
A Graph consists of a set of vertices and a set of edges .
Types of Graphs
- Directed (Digraph): Edges have direction ().
- Undirected: Edges have no direction ().
- Weighted: Edges carry a cost/weight.
- Cyclic/Acyclic: Contains a cycle / No cycles.
Graph Representations
- Adjacency Matrix:
- A 2D array of size .
- (or weight) if edge exists, $0$ otherwise.
- Space: (Good for dense graphs).
- Adjacency List:
- An array of Linked Lists/Vectors. Index contains a list of vertices connected to vertex .
- Space: (Good for sparse graphs).
Graph Algorithms
1. Traversal
- Breadth-First Search (BFS):
- Uses a Queue.
- Explores neighbor nodes first before moving to next level neighbors.
- Good for finding shortest path in unweighted graphs.
- Depth-First Search (DFS):
- Uses a Stack (or Recursion).
- Explores as far as possible along each branch before backtracking.
- Good for cycle detection and topological sorting.
2. Minimum Spanning Tree (MST)
Used in weighted undirected graphs to connect all vertices with minimum total edge weight, without cycles.
- Prim's Algorithm: Grows a tree from a starting vertex (Greedy).
- Kruskal's Algorithm: Adds edges from lowest weight to highest, skipping those that form cycles (Greedy + Union-Find).
3. Shortest Path
- Dijkstra’s Algorithm: Finds shortest path from source to all nodes (Non-negative weights only).
- Bellman-Ford: Can handle negative weights.
6. Hashing
Hashing is a technique to convert a range of key values into a range of indexes of an array (Hash Table).
Hash Functions
A function that maps a key to an integer index.
- Properties of a good hash function:
- Fast to compute.
- Minimizes collisions.
- Uniformly distributes keys across the table.
- Common Methods:
- Division Method: (where is table size, usually a prime number).
- Multiplication Method: Multiply by a constant , take fractional part, multiply by .
Collision Resolution
A collision occurs when .
- Chaining (Open Hashing):
- Each bucket in the hash table points to a linked list. All keys mapping to the same index are stored in that list.
- Open Addressing (Closed Hashing): All elements are stored within the hash table array.
- Linear Probing: If slot is full, check next slot ().
- Quadratic Probing: Check
- Double Hashing: Use a second hash function to determine step size.
Applications
- Symbol Tables: In compilers to store variable names and scopes.
- Database Indexing: Fast data retrieval.
- Caches: Browser caches or server caches.
- Sets/Maps: Implementing Set or Dictionary data structures in standard libraries (e.g., Python
dict, JavaHashMap).
7. Heaps
A Heap is a special tree-based data structure that satisfies the Heap Property. It is almost always implemented as a Complete Binary Tree using an array.
Types of Heaps
- Max-Heap: The key at the root is the keys of its children. The largest element is at the root.
- Min-Heap: The key at the root is the keys of its children. The smallest element is at the root.
Array Representation
For a node at index (0-based):
- Parent:
- Left Child:
- Right Child:
Core Operations
- Insertion ():
- Add element to the end (bottom-left most available spot).
- Heapify Up: Swap with parent until heap property is restored.
- Extraction ():
- Remove the root (Max or Min).
- Move the last element of the heap to the root.
- Heapify Down: Swap with the larger (Max-Heap) or smaller (Min-Heap) child until property is restored.
Applications
- Priority Queues: Efficiently retrieving the highest/lowest priority item.
- Heap Sort: Sorting algorithm with time complexity.
- Graph Algorithms: Used in Prim's (MST) and Dijkstra's (Shortest Path) to efficiently select the next edge/vertex.
- Order Statistics: Finding the -th largest or smallest element in an array.