Unit 5 - Subjective Questions
CSE211 • Practice Questions with Detailed Answers
Explain the concept of Memory Hierarchy. Illustrate the hierarchy with a diagram and discuss the relationship between size, speed, and cost.
Memory Hierarchy is a structural organization of memory in a computer system that arranges memory types based on their speed, cost, and capacity. The goal is to provide a memory system with the speed of the fastest memory and the capacity of the cheapest memory.
Levels of Memory Hierarchy:
- Registers: Located inside the CPU. Fastest, smallest, and most expensive.
- Cache Memory: SRAM used to bridge the speed gap between CPU and RAM. Levels include L1, L2, and L3.
- Main Memory (RAM): DRAM that holds currently executing programs and data.
- Auxiliary Memory (Secondary Storage): Magnetic disks (HDD), SSDs. High capacity, non-volatile, slower, and cheaper.
- Backup Storage: Magnetic tapes, Optical disks.
Key Relationships:
- Speed: Increases from bottom to top (Registers are fastest).
- Cost per bit: Increases from bottom to top.
- Capacity: Increases from top to bottom (Auxiliary memory is largest).
- Access Frequency: The CPU accesses the top levels more frequently due to the Locality of Reference.
What is Locality of Reference? Differentiate between Temporal Locality and Spatial Locality.
Locality of Reference is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. This principle forms the basis for the effectiveness of Cache memory.
Types of Locality:
- Temporal Locality:
- Concept: If a particular memory location is referenced, it is likely to be referenced again in the near future.
- Example: Loops in code, counters, subroutines.
- Cache Implication: Keep recently accessed data in the cache.
- Spatial Locality:
- Concept: If a particular memory location is referenced, it is likely that nearby memory locations will be referenced soon.
- Example: Sequential code execution, traversing arrays or matrices.
- Cache Implication: Fetch data in blocks (cache lines) rather than single words.
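The loop below is a small illustrative sketch (not part of the original notes): the running total exhibits temporal locality (the same location is reused every iteration), while the in-order array walk exhibits spatial locality (neighbouring elements share the cache lines fetched as blocks).

```python
# Minimal illustration of the two kinds of locality.

def sum_array(data):
    total = 0                 # temporal locality: reused on every iteration
    for value in data:        # spatial locality: sequential element access
        total += value
    return total

print(sum_array(list(range(16))))  # 120
```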
Describe the Direct Mapping technique used in Cache memory. What are its advantages and disadvantages?
Direct Mapping is the simplest cache mapping technique where each block of main memory maps to only one specific line in the cache.
Mechanism:
- The physical address is divided into three fields: Tag, Line (Index), and Word (Offset).
- The mapping formula is: i = j mod m, where i is the cache line number, j is the main memory block number, and m is the total number of lines in the cache.
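A minimal sketch of the field split and the i = j mod m mapping; the cache geometry (m = 128 lines, 4-word blocks) is an assumed example, not a value from the notes.

```python
# Direct mapping: split a word address into Tag, Line (Index), Word (Offset).

M_LINES = 128        # m: total cache lines (assumed)
BLOCK_WORDS = 4      # words per block (assumed)

def direct_map(word_address):
    offset = word_address % BLOCK_WORDS      # Word (offset) field
    block = word_address // BLOCK_WORDS      # j: main-memory block number
    line = block % M_LINES                   # i = j mod m: cache line
    tag = block // M_LINES                   # Tag field
    return tag, line, offset

# Blocks 0 and 128 both map to line 0 -> a conflict miss if used alternately.
print(direct_map(0))        # (0, 0, 0)
print(direct_map(128 * 4))  # (1, 0, 0)
```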
Advantages:
- Simplicity: Easiest to implement in hardware.
- Cost: Less expensive because it requires simpler logic for tag matching.
- Access Time: Fast address translation.
Disadvantages:
- Conflict Misses: Since a memory block maps to a fixed cache line, if the program repeatedly accesses two different blocks that map to the same line, the data will be constantly swapped (thrashing), reducing performance.
- Fixed Location: Poor flexibility compared to associative mapping.
Explain Set-Associative Mapping. How does it improve upon Direct and Associative mapping?
Set-Associative Mapping is a hybrid approach that combines the simplicity of direct mapping with the flexibility of associative mapping.
Mechanism:
- The cache is divided into v sets, and each set contains k lines (blocks). This is called a k-way set-associative cache.
- Formula: set number = j mod v, where j is the main memory block number and v is the number of sets (see the sketch after this list).
- A block from main memory can be placed in any of the lines within the specific set determined by the index.
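A small sketch of the set-selection step, assuming an example geometry of v = 64 sets and k = 4 ways:

```python
# Set-associative placement: set = j mod v; the block may occupy any of the
# k lines within that set.

V_SETS = 64   # v: number of sets (assumed)
K_WAYS = 4    # k: lines per set (assumed)

def set_index(block_number):
    """Return (set, tag) for main-memory block j."""
    s = block_number % V_SETS      # set chosen by the index bits
    tag = block_number // V_SETS   # tag distinguishes blocks sharing that set
    return s, tag

# Blocks 3, 67 and 131 all map to set 3, but can coexist (up to k = 4 of them).
for j in (3, 67, 131):
    print(j, set_index(j))
```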
Address Format:
- Tag: Used to identify the block within the set.
- Set Index: Used to select the specific set.
- Word Offset: Used to identify the word within the block.
Comparison:
- Vs. Direct: It reduces the conflict miss ratio because a block can occupy any line within a set, not just a single specific line.
- Vs. Associative: It is cheaper and faster to implement than fully associative mapping because the tag search is limited to the lines within a set, rather than the entire cache.
Discuss the methods of Writing into Cache. Differentiate between Write-through and Write-back policies.
When the CPU executes a store instruction, data must be written to memory. This creates a consistency issue between Cache and Main Memory.
1. Write-through Policy:
- Mechanism: Data is written to both the Cache and the Main Memory simultaneously.
- Advantages: Ensures Main Memory always contains valid data; simplifies recovery on power failure.
- Disadvantages: Generates high memory traffic, causing a bottleneck as the CPU must wait for the slower main memory write to complete.
2. Write-back (Copy-back) Policy:
- Mechanism: Data is written only to the Cache. The location in the cache is marked with a Dirty Bit (or Update Bit).
- Update: The data is written to Main Memory only when the cache line is evicted (replaced).
- Advantages: Reduces memory traffic significantly; faster write operations.
- Disadvantages: Main Memory may contain stale data; complex to implement (requires dirty bit logic); potential data loss if cache fails before write-back.
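A minimal sketch contrasting the two policies with a single cached word and a dirty bit; the addresses and values are made up for illustration.

```python
# Write-through updates memory immediately; write-back defers the update
# until the dirty line is evicted.

memory = {0x10: 0}            # main memory (address -> value)
cache = {}                    # address -> {"value": v, "dirty": bool}

def write(addr, value, policy):
    cache[addr] = {"value": value, "dirty": policy == "write-back"}
    if policy == "write-through":
        memory[addr] = value              # memory updated immediately

def evict(addr):
    line = cache.pop(addr)
    if line["dirty"]:                     # write-back: update memory only now
        memory[addr] = line["value"]

write(0x10, 42, "write-back")
print(memory[0x10])   # 0  -> memory is stale until eviction
evict(0x10)
print(memory[0x10])   # 42 -> dirty line written back on replacement
```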
Compare Main Memory and Auxiliary Memory based on volatility, access method, and role in the computer system.
Comparison between Main Memory and Auxiliary Memory:
| Feature | Main Memory (Primary) | Auxiliary Memory (Secondary) |
|---|---|---|
| Volatility | Volatile (RAM loses data when power is off). | Non-volatile (Data persists without power). |
| Access Method | Random Access (Direct access to any address). | Sequential or Direct/Sector Access (slower). |
| Role | Holds instructions and data currently being executed by the CPU. | Used for long-term storage of programs, files, and large data sets. |
| Speed | Very Fast (Nanoseconds). | Slow (Milliseconds). |
| Cost | Expensive per bit. | Cheap per bit. |
| Examples | DRAM, SRAM. | Magnetic Disks, SSD, Tapes, Optical Disks. |
| CPU Interaction | CPU communicates directly. | CPU accesses via I/O processors; data must move to RAM first. |
What is Virtual Memory? Explain the concept of Paging and how Logical Addresses are mapped to Physical Addresses.
Virtual Memory is a memory management capability of an OS that creates the illusion of a memory space larger than the actual physical RAM. It allows programs to run even if they are larger than the main memory by using the hard disk as an extension.
Paging:
- The logical address space (program) is divided into fixed-size blocks called Pages.
- The physical memory is divided into fixed-size blocks called Frames of the same size.
- Pages are loaded into frames non-contiguously.
Address Mapping:
- Logical Address: Generated by the CPU; divided into Page Number (p) and Offset (d).
- Page Table: A data structure stored in memory that maps Page Number to Frame Number.
- Translation:
- The system looks up the page number p in the Page Table.
- It retrieves the corresponding Frame Number (f).
- The Physical Address is formed by concatenating f and d (equivalently, f × page size + d).
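A short sketch of this translation, assuming a 1 KB page size and a made-up page table:

```python
# Paging translation: logical address -> (p, d) -> frame f -> physical address.

PAGE_SIZE = 1024                      # bytes per page/frame (assumed)
page_table = {0: 5, 1: 2, 2: 7}       # page number p -> frame number f (assumed)

def translate(logical_address):
    p = logical_address // PAGE_SIZE      # page number
    d = logical_address % PAGE_SIZE       # offset within the page
    f = page_table[p]                     # page-table lookup
    return f * PAGE_SIZE + d              # physical address = frame base + offset

print(hex(translate(1030)))   # page 1, offset 6 -> frame 2 -> 0x806
```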
What is a TLB (Translation Lookaside Buffer) and why is it essential in Virtual Memory systems?
A Translation Lookaside Buffer (TLB) is a small, specialized, high-speed hardware cache located inside the CPU that stores recent translations of virtual memory to physical memory.
The Problem:
In a standard paging system, every data access requires two memory accesses:
- Access the Page Table (in RAM) to get the frame number.
- Access the actual data (in RAM).
This doubles the memory access time, reducing performance.
The Solution (TLB):
- The TLB stores the most recently used Page-to-Frame mappings (associative memory).
- When the CPU generates a virtual address, the TLB is checked first.
- TLB Hit: If the page number is found, the frame number is retrieved immediately (no RAM access for table).
- TLB Miss: If not found, the Page Table in RAM is accessed, and the new mapping is loaded into the TLB for future use.
- Result: Drastically reduces the effective memory access time.
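A quick calculation of the effective access time under one common model (TLB lookup plus one memory access on a hit, plus an extra page-table access on a miss); the latencies below are illustrative, not values from the notes.

```python
# Effective access time: high TLB hit ratios keep the cost close to a single
# memory access, while low hit ratios approach two memory accesses per reference.

def effective_access_time(hit_ratio, t_tlb=1, t_mem=100):
    hit_cost = t_tlb + t_mem
    miss_cost = t_tlb + 2 * t_mem
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost

print(effective_access_time(0.98))  # ~103 ns: close to one memory access
print(effective_access_time(0.50))  # ~151 ns: far worse without TLB locality
```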
Explain the concept of Page Fault. Briefly describe the FIFO and LRU page replacement algorithms.
Page Fault: A page fault occurs when a program attempts to access a page that is mapped in the virtual address space but is not currently loaded in the physical main memory (RAM). The OS must interrupt execution, retrieve the page from the disk (swap space), and load it into a free frame.
Page Replacement Algorithms:
When a page fault occurs and memory is full, a page must be removed to make space.
- FIFO (First-In, First-Out):
- Logic: Replaces the oldest page that has been in memory the longest.
- Pros: Simple to implement using a queue.
- Cons: Does not consider usage, so it may evict a heavily used page that was loaded early in the program; FIFO can also exhibit Belady's Anomaly, where adding more frames increases the number of page faults.
- LRU (Least Recently Used):
- Logic: Replaces the page that has not been used for the longest period of time.
- Pros: Better performance approximation of optimal replacement; handles locality of reference well.
- Cons: Harder to implement; requires tracking timestamps or stack history for every access.
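A minimal simulation of both policies on an assumed reference string with three frames, counting page faults:

```python
# FIFO evicts the oldest arrival; LRU evicts the page unused for the longest time.

from collections import OrderedDict, deque

def fifo_faults(refs, frames):
    queue, resident, faults = deque(), set(), 0
    for page in refs:
        if page not in resident:
            faults += 1
            if len(resident) == frames:
                resident.discard(queue.popleft())   # evict the oldest arrival
            queue.append(page)
            resident.add(page)
    return faults

def lru_faults(refs, frames):
    recent = OrderedDict()                          # least recently used first
    faults = 0
    for page in refs:
        if page in recent:
            recent.move_to_end(page)                # mark as most recently used
        else:
            faults += 1
            if len(recent) == frames:
                recent.popitem(last=False)          # evict least recently used
            recent[page] = True
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2]      # assumed reference string
print(fifo_faults(refs, 3), lru_faults(refs, 3))    # 10 9
```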
Define Parallel Processing. Explain Flynn's Taxonomy for classifying computer architectures.
Parallel Processing is a method of computation in which many instructions or data are processed simultaneously to reduce the total execution time. It involves utilizing multiple processors or functional units concurrently.
Flynn's Taxonomy:
Classifies architectures based on the number of Instruction Streams and Data Streams.
- SISD (Single Instruction, Single Data):
- Standard Von Neumann architecture (Uniprocessor).
- Sequential execution.
- SIMD (Single Instruction, Multiple Data):
- One control unit issues one instruction to multiple processing elements.
- Each processor executes the same instruction on different data items.
- Application: Vector processors, GPU, Array processors.
- MISD (Multiple Instruction, Single Data):
- Multiple instructions operate on a single data stream.
- Application: Rare; sometimes used in fault tolerance (redundancy) or specialized pipelining.
- MIMD (Multiple Instruction, Multiple Data):
- Multiple processors execute different instructions on different data sets.
- Application: Modern multi-core processors, Supercomputers, Distributed systems.
Explain the concept of Pipelining. Describe a 4-segment instruction pipeline.
Pipelining is an implementation technique where multiple instructions are overlapped in execution. It is analogous to an assembly line. The computer pipeline is divided into stages, where each stage completes a part of an instruction.
4-Segment Instruction Pipeline:
A typical instruction execution is decomposed into four stages:
- IF (Instruction Fetch):
- Fetch the instruction from memory.
- Increment the Program Counter (PC).
- ID (Instruction Decode):
- Decode the instruction to determine the operation.
- Fetch operands from registers.
- EX (Execute):
- Perform the ALU operation (arithmetic/logic) or calculate the effective address for load/store.
- WB (Write Back):
- Write the result back into the register file or memory.
Benefit: Ideally, one instruction completes every clock cycle (CPI = 1), increasing throughput significantly compared to non-pipelined execution.
Derive the Speedup formula for a k-segment pipeline executing n tasks compared to a non-pipelined system. What is the maximum theoretical speedup?
Let:
- k = number of segments (stages) in the pipeline.
- tp = time duration of one clock cycle (assumed equal for all stages).
- n = number of tasks (instructions) to be executed.
1. Non-Pipelined Execution Time (T1):
Each task takes k × tp time (since all stages must finish for one task before the next starts), so T1 = n × k × tp.
2. Pipelined Execution Time (Tk):
- The first task takes k cycles to fill the pipeline.
- The remaining (n - 1) tasks complete, one per cycle.
- Total cycles = k + (n - 1), so Tk = (k + n - 1) × tp.
3. Speedup Ratio (S):
S = T1 / Tk = (n × k × tp) / ((k + n - 1) × tp) = n × k / (k + n - 1)
Maximum Theoretical Speedup:
When n is very large (n → ∞), the term (k - 1) becomes negligible compared to n, so S → (n × k) / n = k.
Therefore, the maximum theoretical speedup is equal to the number of pipeline stages (k).
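A quick numerical check of the formula with an assumed k = 4: the speedup approaches 4 as n grows.

```python
# Speedup S = n*k / (k + n - 1) for a k-stage pipeline running n tasks.

def speedup(k, n):
    return (n * k) / (k + n - 1)

for n in (4, 10, 100, 10_000):
    print(n, round(speedup(4, n), 2))   # approaches k = 4 as n grows
```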
What are Pipeline Hazards? Explain Data Hazards, Structural Hazards, and Control Hazards.
Pipeline Hazards are situations that prevent the next instruction in the instruction stream from executing during its designated clock cycle.
- Structural Hazards:
- Cause: Hardware resource conflicts. Occurs when two instructions try to use the same hardware resource at the same time (e.g., fetching an instruction and accessing data from memory simultaneously if only one bus exists).
- Solution: Separate Instruction and Data caches (Harvard Architecture).
- Data Hazards:
- Cause: Data dependencies. An instruction depends on the result of a previous instruction that has not yet completed (e.g., `ADD R1, R2, R3` followed immediately by `SUB R4, R1, R5`).
- Solution: Operand Forwarding (Bypassing) or Pipeline Stalls (Bubbles).
- Control (Branch) Hazards:
- Cause: Changes in program flow (Branches/Jumps). The pipeline fetches instructions sequentially, but a branch might change the PC, rendering fetched instructions useless.
- Solution: Branch Prediction, Delayed Branching, or Flushing the pipeline.
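A tiny sketch (using an assumed tuple encoding of instructions) of detecting the read-after-write dependency in the ADD/SUB example above:

```python
# RAW hazard: the second instruction reads a register the first has not yet
# written back.

def raw_hazard(producer, consumer):
    """Each instruction is (opcode, dest, src1, src2)."""
    return producer[1] in (consumer[2], consumer[3])

add = ("ADD", "R1", "R2", "R3")   # writes R1
sub = ("SUB", "R4", "R1", "R5")   # reads R1 before ADD's write-back completes
print(raw_hazard(add, sub))       # True -> stall or forward the operand
```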
List and explain the key characteristics of Multiprocessors.
A Multiprocessor system is an interconnection of two or more CPUs that share memory and I/O equipment. Key characteristics include:
- MIMD Architecture: They generally follow the Multiple Instruction, Multiple Data model where processors execute different programs independently.
- Shared Memory vs. Distributed Memory:
- Processors may share a common main memory (Tightly Coupled).
- Or each processor may have its own local memory (Loosely Coupled/Distributed).
- Interconnection Structure: The mechanism used to connect processors and memory (e.g., Common Bus, Crossbar Switch, Hypercube).
- Parallelism: capable of executing multiple processes (Process Level Parallelism) or threads simultaneously.
- Reliability & Fault Tolerance: Failure of one processor does not halt the system; the workload is redistributed (graceful degradation).
- OS Support: Requires sophisticated Operating Systems to handle scheduling, resource allocation, and synchronization between processors.
Distinguish between Tightly Coupled and Loosely Coupled multiprocessor systems.
Tightly Coupled Multiprocessors (Shared Memory):
- Memory: All processors share a global main memory.
- Communication: Processors communicate via shared variables in the memory.
- Speed: High-speed data exchange.
- Latency: Low latency but suffers from memory contention/conflicts.
- Interconnection: usually via Bus or Crossbar switch.
Loosely Coupled Multiprocessors (Distributed Memory):
- Memory: Each processor has its own local memory.
- Communication: Processors communicate via Message Passing (sending data packets over a network).
- Speed: Slower communication overhead due to message protocols.
- Scalability: Highly scalable (e.g., Clusters, Supercomputers).
- Interconnection: LAN, WAN, or specialized links.
Explain the Time-Shared Common Bus interconnection structure for multiprocessors. What are its limitations?
Time-Shared Common Bus is the simplest interconnection structure where a number of processors, memory modules, and I/O units are connected to a single common path (bus).
Mechanism:
- Only one processor can communicate with the memory or another processor at any given time.
- A Bus Arbitration logic is required to resolve conflicts when multiple processors request the bus simultaneously.
Advantages:
- Simple hardware design.
- Low cost.
- Easy to add new devices (extensibility).
Limitations:
- Bottleneck: The single bus becomes a bottleneck as the number of processors increases.
- Bandwidth: Total system bandwidth is limited by the bus speed.
- Reliability: If the bus fails, the entire system fails.
- Suitable only for a small number of processors.
Describe the Crossbar Switch and Multiport Memory interconnection structures.
1. Multiport Memory:
- Structure: The memory module itself has multiple ports (access points). Each port is connected to a specific processor.
- Operation: Internal logic resolves conflicts if two processors access the same memory module simultaneously.
- Pros: High transfer rate.
- Cons: Expensive memory control logic; large number of cables.
2. Crossbar Switch:
- Structure: A matrix of switches connecting processors to memory modules. There is a switch point at each intersection.
- Operation: Allows simultaneous connections between different processor-memory pairs. (e.g., P1 can talk to M2 while P2 talks to M4).
- Non-Blocking: It is a non-blocking network (unless the destination is the same).
- Pros: Highest potential bandwidth; simultaneous transfers.
- Cons: Hardware complexity grows quadratically (N × M switch points for N processors and M memory modules, i.e., O(N²)); high cost for large systems.
What is a Multistage Switching Network? Explain the Omega Network structure.
Multistage Switching Network:
Ideally, a crossbar offers the best connectivity, but it is too expensive for a large number of processors (N). A multistage network connects N inputs to N outputs using fewer switches arranged in multiple stages. It balances cost and performance.
Omega Network:
- Components: Uses 2 × 2 interchange switches. Each switch has two inputs and two outputs and can be set to 'Straight' or 'Swap' (Cross).
- Topology: For N processors (inputs), there are log2(N) stages, each containing N/2 switches.
- Connection: Stages are connected using a 'Perfect Shuffle' pattern.
- Blocking: It is a Blocking Network, meaning that establishing a path from Input A to Output B might prevent a simultaneous connection from Input C to Output D due to a shared path/switch conflict.
- Cost: Lower complexity (O(N log N) switches) compared to the Crossbar (O(N²)).
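A small sketch comparing switch counts and computing the perfect-shuffle connection (a circular left shift of the address bits), assuming N = 8:

```python
# Omega network uses N/2 switches per stage over log2(N) stages; a crossbar
# needs N*N switch points.

import math

def omega_switches(n):
    return (n // 2) * int(math.log2(n))

def crossbar_switchpoints(n):
    return n * n

def perfect_shuffle(addr, bits):
    msb = (addr >> (bits - 1)) & 1
    return ((addr << 1) & ((1 << bits) - 1)) | msb   # circular left shift

N = 8
print(omega_switches(N), crossbar_switchpoints(N))   # 12 vs 64
print([perfect_shuffle(i, 3) for i in range(N)])     # [0, 2, 4, 6, 1, 3, 5, 7]
```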
Explain the Cache Coherence problem in multiprocessor systems and name the solutions.
Cache Coherence Problem:
In a multiprocessor system with shared memory, each processor has its own private cache. If Processor A reads variable X into its cache and modifies it, but Processor B also has a copy of X in its cache, Processor B's copy becomes invalid (stale). If B reads X later, it might read the old value, leading to data inconsistency.
Solutions:
- Software Solutions: The OS or Compiler marks shared data as non-cacheable or explicitly flushes caches at critical points.
- Hardware Solutions (Protocols):
- Snooping Protocol (Bus-based): All caches monitor (snoop) the bus. If a 'write' signal is detected for a block they hold, they either invalidate their copy or update it.
- Directory-based Protocol: A central directory maintains the state of every memory block (who has it, is it modified?). Used in systems without a common bus (e.g., distributed).
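A minimal write-invalidate snooping sketch, using dictionaries as stand-ins for the caches, memory, and bus; this is illustrative only, not a full MESI-style protocol.

```python
# On every write, all other caches "snoop" the bus and drop their copy of the
# written block, so a later read re-fetches the up-to-date value.

caches = {"P1": {}, "P2": {}}    # per-processor cache: address -> value
memory = {0x20: 1}

def read(cpu, addr):
    caches[cpu][addr] = caches[cpu].get(addr, memory[addr])
    return caches[cpu][addr]

def write(cpu, addr, value):
    for other, c in caches.items():          # bus snoop: invalidate other copies
        if other != cpu:
            c.pop(addr, None)
    caches[cpu][addr] = value
    memory[addr] = value                     # write-through kept for simplicity

print(read("P1", 0x20), read("P2", 0x20))    # both cache the value 1
write("P1", 0x20, 99)
print(0x20 in caches["P2"])                  # False -> P2's stale copy invalidated
print(read("P2", 0x20))                      # 99, re-fetched after invalidation
```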
Describe the Hypercube Interconnection structure for parallel processing.
Hypercube (Binary n-cube) Interconnection:
It is a loosely coupled network topology used to connect processors.
Structure:
- Processors are placed at the vertices of an n-dimensional cube (2^n nodes).
- Nodes: Each node contains a CPU and local memory.
- Addressing: Each node is assigned a binary address of n bits.
- Links: Two nodes are connected if their binary addresses differ by exactly one bit position.
Properties:
- Diameter: The maximum distance between any two nodes is n hops.
- Degree: Each node is connected to n other nodes.
- Scalability: Good, but wiring complexity increases with dimension.
- Example: A 3-cube has 8 nodes (2^3 = 8), and each node connects to 3 neighbors.
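A short sketch of hypercube addressing for n = 3: each node's n neighbours are obtained by flipping one of its n address bits at a time.

```python
# Two hypercube nodes are linked exactly when their binary addresses differ
# in a single bit, so neighbours are computed with one-bit XOR flips.

N_DIM = 3

def neighbours(node):
    return [node ^ (1 << bit) for bit in range(N_DIM)]   # flip one bit at a time

for node in range(2 ** N_DIM):
    print(format(node, "03b"), [format(x, "03b") for x in neighbours(node)])
```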