Unit 5 - Notes

CSE357 8 min read

Unit 5: Data Structures

1. Introduction to Data Structures

Definition

A data structure is a specialized format for organizing, processing, retrieving, and storing data. It defines the relationship between the data, the operations that can be performed on the data, and the algorithms used to access it.

Classification

Data structures are generally classified into two main categories:

Linear Data Structures: Elements are arranged in a sequential manner, where each element is attached to its previous and next adjacent elements.
- Examples: Arrays, Linked Lists, Stacks, Queues.
Non-Linear Data Structures: Elements are arranged hierarchically or interconnectedly without a specific sequence. One element can be connected to other elements.
- Examples: Trees, Graphs.

Abstract Data Types (ADT)

An ADT describes the behavior of a data structure from the perspective of a user of the data, specifically:

Domain: The set of values the data can hold.
Operations: The set of operations available (interface).
Implementation Independent: It does not specify how the data is stored or how operations are implemented mathematically.

2. Linked Lists

A Linked List is a linear data structure where elements (nodes) are not stored at contiguous memory locations. The elements are linked using pointers.

Structure of a Node

Each node typically consists of two parts:

Data: The value or information.
Pointer/Reference: The address of the next node.

Types of Linked Lists

1. Singly Linked List

Structure: Navigation is forward only. The last node points to NULL.
Node Representation: [Data | Next ->]
Use case: Simple implementations of stacks/queues.

2. Doubly Linked List (DLL)

Structure: Each node contains a pointer to the previous node and the next node.
Node Representation: [<- Prev | Data | Next ->]
Advantage: Can be traversed in both directions.
Disadvantage: Higher memory overhead (extra pointer).

3. Circular Linked List

Structure: The last node points back to the first node instead of NULL. Can be singly or doubly linked.
Use case: Round-robin scheduling algorithms.

Key Operations & Complexity

Operation	Singly Linked List	Doubly Linked List
Access/Search	$O(n)$	$O(n)$
Insertion (at head)	$O(1)$	$O(1)$
Insertion (at end)	$O(n)$ (or $O(1)$ with tail pointer)	$O(1)$ (with tail pointer)
Deletion (given node)	$O(n)$ (to find prev node)	$O(1)$

3. Stacks and Queues

A. Stacks

A linear data structure that follows the LIFO (Last In First Out) principle.

Core Operations

Push: Add an element to the top. ( $O(1)$ )
Pop: Remove the top element. ( $O(1)$ )
Peek/Top: View the top element without removing it. ( $O(1)$ )
isEmpty: Check if the stack is empty.

Applications

Function Call Management: Handling recursion via the system call stack.
Expression Parsing: Converting Infix to Postfix/Prefix notation.
Syntax Parsing: Checking for balanced parentheses (( )) in compilers.
Undo Mechanisms: In text editors.

B. Queues

A linear data structure that follows the FIFO (First In First Out) principle.

Core Operations

Enqueue: Add an element to the rear. ( $O(1)$ )
Dequeue: Remove an element from the front. ( $O(1)$ )
Front/Rear: View the first/last items. ( $O(1)$ )

Applications

CPU Scheduling: Processes waiting for execution.
IO Buffers: Keyboard keystrokes or printer spooling.
Breadth-First Search (BFS): Graph traversal algorithm.

Types of Queues

Simple Queue: Standard FIFO.
Circular Queue: The last position connects back to the first to utilize empty space created by dequeuing.
Priority Queue: Elements are dequeued based on priority, not arrival time.
Deque (Double-ended Queue): Insertions and deletions allowed at both ends.

4. Trees

A non-linear hierarchical data structure consisting of a set of nodes connected by edges.

Terminology

Root: The top node.
Child: A node directly connected to another node when moving away from the root.
Leaf: A node with no children.
Height: The length of the longest path from a node to a leaf.
Depth: The length of the path from the root to the node.

A. Binary Trees

A tree in which each node has at most two children (Left child and Right child).

Combinatorial Properties:

Max nodes at level $l$ : $2^l$ .
Max nodes in a binary tree of height $h$ : $2^{h+1} - 1$ .
Min height for $n$ nodes: $\lfloor \log_2 n \rfloor$ .

Traversals:

Pre-order (Root, Left, Right): Used to copy trees.
In-order (Left, Root, Right): Gives sorted output for BSTs.
Post-order (Left, Right, Root): Used to delete trees or evaluate postfix expressions.

B. Binary Search Tree (BST)

A binary tree with the property:

Value of Left Subtree Nodes < Value of Root
Value of Right Subtree Nodes > Value of Root
Search/Insert/Delete Complexity:
- Average: $O(\log n)$
- Worst Case (Skewed Tree): $O(n)$

C. Balanced Trees

To prevent BSTs from becoming skewed (resembling linked lists), self-balancing trees are used to keep height $\approx \log n$ .

AVL Trees (Adelson-Velsky and Landis)

Strictly balanced.
Balance Factor: Height(Left Subtree) - Height(Right Subtree). Must be $-1, 0, \text{or } 1$ .
Rebalancing: Uses Rotations (Left, Right, Left-Right, Right-Left) when insertions violate the balance factor.

5. Graphs

A Graph $G = (V, E)$ consists of a set of vertices $V$ and a set of edges $E$ .

Types of Graphs

Directed (Digraph): Edges have direction ( $A \to B$ ).
Undirected: Edges have no direction ( $A - B$ ).
Weighted: Edges carry a cost/weight.
Cyclic/Acyclic: Contains a cycle / No cycles.

Graph Representations

Adjacency Matrix:
- A 2D array of size $V \times V$ .
- $A[i][j] = 1$ (or weight) if edge exists, $0$ otherwise.
- Space: $O(V^2)$ (Good for dense graphs).
Adjacency List:
- An array of Linked Lists/Vectors. Index $i$ contains a list of vertices connected to vertex $i$ .
- Space: $O(V + E)$ (Good for sparse graphs).

Graph Algorithms

1. Traversal

Breadth-First Search (BFS):
- Uses a Queue.
- Explores neighbor nodes first before moving to next level neighbors.
- Good for finding shortest path in unweighted graphs.
Depth-First Search (DFS):
- Uses a Stack (or Recursion).
- Explores as far as possible along each branch before backtracking.
- Good for cycle detection and topological sorting.

2. Minimum Spanning Tree (MST)

Used in weighted undirected graphs to connect all vertices with minimum total edge weight, without cycles.

Prim's Algorithm: Grows a tree from a starting vertex (Greedy).
Kruskal's Algorithm: Adds edges from lowest weight to highest, skipping those that form cycles (Greedy + Union-Find).

3. Shortest Path

Dijkstra’s Algorithm: Finds shortest path from source to all nodes (Non-negative weights only).
Bellman-Ford: Can handle negative weights.

6. Hashing

Hashing is a technique to convert a range of key values into a range of indexes of an array (Hash Table).

Hash Functions

A function $h(k)$ that maps a key $k$ to an integer index.

Properties of a good hash function:
- Fast to compute.
- Minimizes collisions.
- Uniformly distributes keys across the table.
Common Methods:
- Division Method: $h(k) = k \pmod m$ (where $m$ is table size, usually a prime number).
- Multiplication Method: Multiply $k$ by a constant $A (0 < A < 1)$ , take fractional part, multiply by $m$ .

Collision Resolution

A collision occurs when $h(k_1) = h(k_2)$ .

Chaining (Open Hashing):
- Each bucket in the hash table points to a linked list. All keys mapping to the same index are stored in that list.
Open Addressing (Closed Hashing): All elements are stored within the hash table array.
- Linear Probing: If slot is full, check next slot ( $index + 1$ ).
- Quadratic Probing: Check $index + 1^2, index + 2^2 \dots$
- Double Hashing: Use a second hash function to determine step size.

Applications

Symbol Tables: In compilers to store variable names and scopes.
Database Indexing: Fast data retrieval.
Caches: Browser caches or server caches.
Sets/Maps: Implementing Set or Dictionary data structures in standard libraries (e.g., Python dict, Java HashMap).

7. Heaps

A Heap is a special tree-based data structure that satisfies the Heap Property. It is almost always implemented as a Complete Binary Tree using an array.

Types of Heaps

Max-Heap: The key at the root is $\ge$ the keys of its children. The largest element is at the root.
Min-Heap: The key at the root is $\le$ the keys of its children. The smallest element is at the root.

Array Representation

For a node at index $i$ (0-based):

Parent: $\lfloor (i-1)/2 \rfloor$
Left Child: $2i + 1$
Right Child: $2i + 2$

Core Operations

Insertion ( $O(\log n)$ ):
- Add element to the end (bottom-left most available spot).
- Heapify Up: Swap with parent until heap property is restored.
Extraction ( $O(\log n)$ ):
- Remove the root (Max or Min).
- Move the last element of the heap to the root.
- Heapify Down: Swap with the larger (Max-Heap) or smaller (Min-Heap) child until property is restored.

Applications

Priority Queues: Efficiently retrieving the highest/lowest priority item.
Heap Sort: Sorting algorithm with $O(n \log n)$ time complexity.
Graph Algorithms: Used in Prim's (MST) and Dijkstra's (Shortest Path) to efficiently select the next edge/vertex.
Order Statistics: Finding the $k$ -th largest or smallest element in an array.

Unit 4

Unit 6

Unit 5 - Notes

Table of Contents

Unit 5: Data Structures

1. Introduction to Data Structures

Definition

Classification

Abstract Data Types (ADT)

2. Linked Lists

Structure of a Node

Types of Linked Lists

1. Singly Linked List

2. Doubly Linked List (DLL)

3. Circular Linked List

Key Operations & Complexity

3. Stacks and Queues

A. Stacks

Core Operations

Applications

B. Queues

Core Operations

Applications

Types of Queues

4. Trees

Terminology

A. Binary Trees

B. Binary Search Tree (BST)

C. Balanced Trees

AVL Trees (Adelson-Velsky and Landis)

5. Graphs

Types of Graphs

Graph Representations

Graph Algorithms

1. Traversal

2. Minimum Spanning Tree (MST)

3. Shortest Path

6. Hashing

Hash Functions

Collision Resolution

Applications

7. Heaps

Types of Heaps

Array Representation

Core Operations

Applications