Unit 2 - Practice Quiz

INT312 60 Questions

1 Which of the following components provides the distributed storage in the Hadoop Architecture?

Hadoop Architecture Easy
A. YARN
B. HDFS
C. Hive
D. MapReduce

2 What are the two core components of the original Apache Hadoop framework?

Hadoop Architecture Easy
A. HDFS and MapReduce
B. YARN and Zookeeper
C. Spark and Kafka
D. HBase and Pig

3 Hadoop is primarily optimized for which type of data processing?

Hadoop Architecture Easy
A. Real-time processing
B. Interactive querying
C. Stream processing
D. Batch processing

4 What does HDFS stand for?

Hadoop Storage: HDFS Easy
A. Hadoop Distributed File System
B. Hyper Distributed File Storage
C. High Data File System
D. Hadoop Data Format System

5 What is the default block size in HDFS for Hadoop 2.x and later?

Hadoop Storage: HDFS Easy
A. 64 MB
B. 128 MB
C. 256 MB
D. 512 MB

6 How does HDFS primarily achieve fault tolerance?

Hadoop Storage: HDFS Easy
A. By replicating data blocks across multiple nodes
B. By continuously backing up to the cloud
C. By encrypting all files
D. By using a relational database

7 Which data access model does HDFS follow?

Hadoop Storage: HDFS Easy
A. Write-once, read-many
B. Write-many, read-many
C. Write-once, read-once
D. Write-many, read-once

8 In the MapReduce paradigm, what is the role of the Reduce function?

Hadoop MapReduce paradigm Easy
A. To split the data into smaller chunks
B. To aggregate and summarize intermediate results
C. To store the final data in a relational database
D. To filter and map data to key-value pairs

9 What is the primary data structure passed between the Map and Reduce phases?

Hadoop MapReduce paradigm Easy
A. Arrays
B. XML nodes
C. Key-Value pairs
D. JSON objects

10 Which phase occurs directly between the Map phase and the Reduce phase to group data by keys?

Hadoop MapReduce paradigm Easy
A. File Writing
B. Data Splitting
C. Data Ingestion
D. Shuffle and Sort

11 In MapReduce terminology, what is an 'InputSplit'?

MapReduce Terminology Easy
A. A command to divide the cluster into smaller networks
B. A physical file on the disk
C. A logical representation of data processed by a single Map task
D. An error that splits a job into two

12 What does a 'RecordReader' do in a MapReduce job?

MapReduce Terminology Easy
A. It combines the outputs of multiple Reducers
B. It reads the final output from HDFS
C. It translates an InputSplit into key-value pairs for the Mapper
D. It monitors the health of the DataNodes

13 What is the term for the output produced by the Mapper before it reaches the Reducer?

MapReduce Terminology Easy
A. Aggregated Data
B. Raw Data
C. Intermediate Data
D. Final Output

14 In HDFS, which node is responsible for storing the metadata about the file system?

Hadoop - Namenode, DataNode Easy
A. NameNode
B. DataNode
C. JobTracker
D. TaskTracker

15 What is the primary function of a DataNode in HDFS?

Hadoop - Namenode, DataNode Easy
A. To schedule MapReduce jobs
B. To run the JobTracker
C. To manage user permissions
D. To store the actual data blocks

16 What happens if the NameNode fails in a traditional Hadoop 1.x cluster (without High Availability)?

Hadoop - Namenode, DataNode Easy
A. MapReduce jobs switch to local mode automatically
B. The entire HDFS becomes inaccessible
C. The cluster continues to operate normally
D. A DataNode automatically becomes the new NameNode

17 In the MapReduce version 1 (MRv1) architecture, which component manages the resources and schedules jobs across the cluster?

Job Tracker and TaskTracker Easy
A. DataNode
B. NameNode
C. JobTracker
D. TaskTracker

18 Where does a TaskTracker typically run in a Hadoop MRv1 cluster?

Job Tracker and TaskTracker Easy
A. On a dedicated master node
B. On the same node as a DataNode
C. Outside the Hadoop cluster
D. On the NameNode

19 When running a typical Word Count program in Hadoop, what is the expected output format?

word count on command line Easy
A. A list of unique words alongside their frequency of occurrence
B. A single integer representing the total number of words
C. A compressed zip file of all words
D. A graphical chart of word frequencies

20 Which command is commonly used on the command line to execute a compiled MapReduce JAR file?

word count on command line Easy
A. hdfs execute
B. hadoop run
C. hadoop jar
D. mapreduce start

21 A user wants to store a file of size 300 MB in HDFS with a configured block size of 128 MB. Assuming the replication factor is set to 3, how many physical block replicas will be stored across the cluster in total?

Hadoop Storage: HDFS Medium
A. 6 blocks
B. 3 blocks
C. 12 blocks
D. 9 blocks
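
The arithmetic behind question 21 can be checked with a short sketch. The helper name `total_block_replicas` is made up for illustration; it assumes only that a file is cut into fixed-size blocks (the last one possibly partial) and that every block is copied `replication` times.

```python
import math

def total_block_replicas(file_size_mb, block_size_mb, replication):
    """Return (number of blocks, number of stored replicas) for one file."""
    # The last block may be partial, so round the block count up.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block, including the partial one, is replicated.
    return blocks, blocks * replication

blocks, replicas = total_block_replicas(300, 128, 3)
print(blocks, replicas)  # prints: 3 9
```

The 300 MB file occupies two full 128 MB blocks plus one 44 MB block; the partial block uses only the space it needs but is still replicated like the others.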

22 In a Hadoop cluster configured with Rack Awareness and a replication factor of 3, how does the cluster typically place the replicas to ensure fault tolerance while optimizing write bandwidth?

Hadoop Architecture Medium
A. One replica is placed on the local rack, and the other two are placed on two different nodes in a different rack.
B. The first two replicas are placed on the local node, and the third is placed on a remote rack.
C. Each replica is placed on a completely different rack in the data center.
D. All three replicas are placed on different nodes within the same rack.

23 What is the primary architectural advantage of the 'Data Locality' principle in Hadoop?

Hadoop Architecture Medium
A. It guarantees that all localized databases are synchronized with the NameNode.
B. It moves data across the network to specialized compute nodes to increase processing speed.
C. It schedules computational tasks on the node where the data physically resides, minimizing network congestion.
D. It ensures that data is stored locally on the client machine before being uploaded to HDFS.

24 An administrator notices that the NameNode is running out of RAM, even though the cluster's total storage capacity is mostly empty. What is the most likely cause of this issue?

Hadoop Storage: HDFS Medium
A. The Secondary NameNode has failed to back up the data properly.
B. The DataNodes are sending heartbeats too frequently.
C. The cluster is storing an excessive number of very small files.
D. The replication factor is set too high, consuming extra RAM.

25 What is the actual role of the Secondary NameNode in a standard Hadoop cluster?

Hadoop Storage: HDFS Medium
A. It periodically downloads the fsimage and edits files, merges them, and uploads the updated fsimage to the primary NameNode.
B. It serves as an instant failover node if the primary NameNode crashes.
C. It manages metadata for secondary storage devices attached to DataNodes.
D. It acts as a load balancer for client read/write requests to the NameNode.

26 During an HDFS write operation, a client wants to write a block with a replication factor of 3. How is the data actually transferred to the DataNodes?

Hadoop Storage: HDFS Medium
A. The client writes it locally, and HDFS automatically replicates it in the background after the file is closed.
B. The NameNode coordinates the transfer by receiving the data from the client and pushing it to the DataNodes.
C. The client sends the data to the first DataNode, which pipes it to the second, which in turn pipes it to the third.
D. The client sends the block simultaneously to all three DataNodes.

27 If a MapReduce job is configured with zero Reducers (setNumReduceTasks(0)), what is the final output of the job?

Hadoop MapReduce paradigm Medium
A. The job fails because at least one reducer is required to aggregate data.
B. The job simply verifies data integrity but writes no output.
C. The output consists of the unsorted key-value pairs exactly as emitted by the Map tasks, stored in HDFS.
D. The output consists of the sorted key-value pairs directly from the Map phase, stored in HDFS.

28 How does a Combiner optimize a MapReduce job that calculates the total sales per region?

Hadoop MapReduce paradigm Medium
A. It automatically adjusts the number of map tasks based on cluster availability.
B. It runs on the Reducer node to filter out invalid records before the final reduction.
C. It merges small files in HDFS into larger ones before the Map phase begins.
D. It performs a local aggregation of map output data on the Map node, reducing the amount of data sent across the network during the shuffle phase.

29 During the Shuffle and Sort phase of MapReduce, what specific guarantee is provided to the Reducer regarding its input?

Hadoop MapReduce paradigm Medium
A. The Reducer will receive data split into blocks matching the HDFS block size.
B. All keys assigned to a single reducer will arrive in randomized order to prevent data skew.
C. Each reducer will receive an exactly equal amount of data, regardless of key distribution.
D. Values associated with the same key are grouped together, and the keys are presented to the Reducer in sorted order.
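
The map, shuffle-and-sort, and reduce flow that the preceding questions refer to can be imitated in a few lines of plain Python. This is a single-process sketch for word count, not Hadoop code; the function names `map_phase`, `shuffle_and_sort`, and `reduce_phase` are illustrative only.

```python
from collections import defaultdict

def map_phase(lines):
    """Emit one (word, 1) pair per word, like a word-count Mapper."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_and_sort(pairs):
    """Group values by key and hand the keys over in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(grouped):
    """Sum the grouped values for each key, like a word-count Reducer."""
    return [(key, sum(values)) for key, values in grouped]

lines = ["the quick fox", "the lazy dog", "The fox"]
result = reduce_phase(shuffle_and_sort(map_phase(lines)))
print(result)
# prints: [('dog', 1), ('fox', 2), ('lazy', 1), ('quick', 1), ('the', 3)]
```

Each key reaches the reduce step exactly once with all of its values attached, and the keys arrive in sorted order.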

30 In the context of MapReduce Terminology, what is the primary difference between an HDFS Block and an InputSplit?

MapReduce Terminology Medium
A. They are identical concepts; Hadoop uses the terms interchangeably depending on the version.
B. An HDFS Block is a physical division of data on disk, whereas an InputSplit is a logical division of data that defines the input for a single Map task.
C. An InputSplit is a physical chunk of data handled by the JobTracker, while a Block is an abstract data structure used by the Reducer.
D. An InputSplit determines the number of Reducers, while an HDFS Block determines the number of Mappers.

31 Why must keys emitted by the Mapper implement the WritableComparable interface in Hadoop?

MapReduce Terminology Medium
A. To ensure that values can be logically split across multiple reducers.
B. Because Hadoop requires all data types to inherit from standard Java Collections.
C. So they can be serialized over the network and sorted during the shuffle phase.
D. So they can be compressed securely before writing to HDFS.

32 What component is directly responsible for converting raw input data (e.g., lines of a text file) into the initial <key, value> pairs processed by the Mapper?

MapReduce Terminology Medium
A. The OutputCommitter
B. The Partitioner
C. The InputSplitter
D. The RecordReader

33 In the MapReduce v1 (MRv1) architecture, what is the primary role of the JobTracker?

Hadoop - Namenode, DataNode, Job Tracker and TaskTracker Medium
A. To store the metadata of HDFS files and direct clients to the correct DataNodes.
B. To execute the individual map and reduce tasks assigned by the NameNode.
C. To allocate resources, schedule jobs, monitor TaskTrackers, and re-execute failed tasks.
D. To merge the edit logs into the fsimage to keep the NameNode from crashing.

34 A TaskTracker node unexpectedly loses power while executing a Map task. How does the cluster recognize and handle this failure?

Hadoop - Namenode, DataNode, Job Tracker and TaskTracker Medium
A. The TaskTracker reboots and resumes the task from the last saved checkpoint in HDFS.
B. The NameNode detects missing block heartbeats and reschedules the Map task on another rack.
C. The JobTracker stops receiving heartbeats from the TaskTracker, marks it as dead, and schedules the incomplete task on another available TaskTracker.
D. The JobTracker immediately fails the entire MapReduce job to prevent data corruption.

35 When a JobTracker determines that a specific Map task is running unusually slow compared to others in the same job, what mechanism can it use to mitigate this?

Hadoop - Namenode, DataNode, Job Tracker and TaskTracker Medium
A. Data Rebalancing
B. Speculative Execution
C. Dynamic Partitioning
D. Garbage Collection

36 Which of the following best describes the relationship between DataNodes and TaskTrackers in a traditional Hadoop 1.x cluster?

Hadoop - Namenode, DataNode, Job Tracker and TaskTracker Medium
A. TaskTrackers manage the metadata while DataNodes handle the actual computations.
B. DataNodes act as a backup for TaskTrackers in case the JobTracker fails.
C. They are entirely separate entities running on isolated hardware to prevent CPU and I/O contention.
D. They typically run on the same physical machines to enable data locality for MapReduce tasks.

37 If a DataNode successfully writes its block but its disk subsequently fails, how does the NameNode eventually find out about the missing data?

Hadoop - Namenode, DataNode, Job Tracker and TaskTracker Medium
A. The DataNode sends regular block reports along with its heartbeats to the NameNode.
B. The Secondary NameNode scans the disks and updates the primary NameNode.
C. The JobTracker notifies the NameNode when a map task fails to read the data.
D. The NameNode continually pings all block locations asynchronously to verify integrity.

38 A user attempts to run a pre-compiled word count MapReduce job using the command: hadoop jar wc.jar WordCount /user/data/input /user/data/output. However, the job immediately fails before running any map tasks. What is the most likely cause?

word count on command line Medium
A. The input directory is empty, which throws a fatal execution exception.
B. The jar file lacks a combiner class, which is mandatory for WordCount.
C. The /user/data/output directory already exists in HDFS.
D. The user forgot to specify the number of reducers in the command arguments.

39 When executing a WordCount program via the command line (hadoop jar wordcount.jar org.example.WordCount /input /output), what happens to the output data produced by the Reducers?

word count on command line Medium
A. It is appended directly to the input files to keep data localized.
B. It is written as multiple part files (e.g., part-r-00000) inside the /output directory in HDFS.
C. It is printed directly to the terminal stdout.
D. It is stored in the local file system of the node where the command was executed.

40 In a Hadoop High Availability (HA) cluster, what prevents the 'split-brain' scenario where two NameNodes both think they are active and attempt to alter the filesystem simultaneously?

Hadoop Architecture Medium
A. The Secondary NameNode acts as an arbiter to vote on the true active node.
B. The JobTracker coordinates a distributed lock that limits metadata edits.
C. Fencing mechanisms are configured to isolate or power off the previously active NameNode.
D. DataNodes will only send heartbeats to the IP address with the lowest latency.

41 During an HDFS write operation, if the second DataNode in the replication pipeline fails while receiving a block, what is the immediate sequence of actions taken by the HDFS client and the remaining DataNodes?

Hadoop Storage: HDFS Hard
A. The entire block is discarded, the client requests a completely new pipeline from the NameNode, and the write operation restarts from the beginning.
B. The client reports the failure to the NameNode, which immediately allocates a new DataNode to maintain the replication factor before continuing the write.
C. The pipeline is closed, the failed DataNode is removed, the remaining DataNodes are given a new generation stamp, and the write resumes with the remaining DataNodes.
D. The first DataNode caches the data in memory, waits for the NameNode to restart the second DataNode, and then resumes the data transfer.

42 In the MRv1 architecture, what happens if a TaskTracker stops sending heartbeats to the JobTracker due to a temporary network partition that exceeds the timeout period?

Hadoop - Namenode, DataNode, Job Tracker and TaskTracker Hard
A. The JobTracker marks the TaskTracker as dead, fails its running tasks, and reschedules them on other nodes, while the TaskTracker pauses its tasks.
B. The JobTracker marks the TaskTracker as dead and reschedules its tasks; when the partition resolves, the TaskTracker attempts to reconnect and is instructed to kill its old tasks.
C. The TaskTracker promotes itself to an independent JobTracker for its local tasks and merges results back when the network is restored.
D. The JobTracker delegates the tracking to a Standby JobTracker, which polls the TaskTracker directly until the network recovers.

43 How does the MapReduce framework handle speculative execution when a task is straggling due to systemic data skew (e.g., one Reducer receives 90% of the data) rather than hardware degradation?

Hadoop MapReduce paradigm Hard
A. It splits the skewed Reducer task into multiple sub-reducers, effectively parallelizing the heavy partition.
B. It successfully mitigates the delay by launching a speculative task that dynamically re-partitions the skewed data.
C. It detects data skew via the partitioner metrics and automatically disables speculative execution for that specific task.
D. It launches a speculative task, but both the original and speculative tasks will take equally long since the skew is inherent to the data, potentially wasting cluster resources.

44 Suppose an HDFS file is 130 MB and the block size is 64 MB. The file contains textual records where a single logical record spans across the boundary of the first and second block. How does TextInputFormat handle the InputSplit boundary to ensure data integrity?

MapReduce Terminology Hard
A. The framework copies the overflowing record entirely into the second block before assigning the InputSplits to the Mappers.
B. The JobTracker detects the split boundary violation and merges the two blocks into a single 128 MB InputSplit processed by one Mapper.
C. The first Map task processes exactly 64 MB. The second Map task reads the truncated record from the beginning of the second block, resulting in a data parsing error.
D. The first Map task processes the first block and reads past the 64 MB boundary into the second block until the end of the current record. The second Map task ignores the first partial record in its block.
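
The boundary handling in question 44 can be modelled with a small line-oriented reader. This is a simplified sketch, not Hadoop's actual reader: `read_split` is a hypothetical helper, records are newline-terminated, and a split is a byte range `[start, end)`. The rule modelled is that a reader not starting at byte 0 discards its first (possibly partial) line, and every reader keeps going until it has finished the line that begins at or before its end offset.

```python
def read_split(data: bytes, start: int, end: int):
    """Return the complete records owned by the split [start, end)."""
    pos = start
    if start != 0:
        # Skip the first line: it belongs to the previous split's reader.
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1
    records = []
    # Read every line that begins at or before `end`, even if it runs past it.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        records.append(data[pos:stop].decode())
        pos = stop + 1
    return records

data = b"alpha\nbravo\ncharlie\ndelta\n"  # 26 bytes; "charlie" spans byte 13
first = read_split(data, 0, 13)
second = read_split(data, 13, len(data))
print(first)   # prints: ['alpha', 'bravo', 'charlie']
print(second)  # prints: ['delta']
```

Every record is read exactly once and none is truncated, even though the split boundary falls in the middle of `charlie`.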

45 Which of the following best describes the structural transition of metadata when the NameNode is restarted, specifically regarding the FsImage and EditLog?

Hadoop - Namenode, DataNode, Job Tracker and TaskTracker Hard
A. The Secondary NameNode takes over client requests while the primary NameNode merges the FsImage and EditLog into a new FsImage.
B. The NameNode loads the FsImage into memory, leaves the EditLog untouched, and asynchronously merges them in the background while serving clients.
C. The NameNode discards the old FsImage, regenerates it entirely from the block reports of the DataNodes, and then replays the EditLog.
D. The NameNode applies the EditLog to the FsImage in memory, creates a new FsImage on disk, and truncates the old EditLog before accepting new client requests.

46 When executing a Word Count job via the Hadoop command line using hadoop jar, what is the effect of setting -D mapreduce.job.reduces=0?

word count on command line Hard
A. The framework automatically uses a Combiner to act as the Reducer, yielding partially aggregated counts per Map task.
B. The job executes normally, but the final output is merged into a single file by the JobTracker instead of Reducers.
C. The Map output is written directly to HDFS without a shuffle, sort, or reduce phase, resulting in output files containing unaggregated key-value pairs.
D. The job fails immediately because a MapReduce job strictly requires at least one Reducer.

47 In a Hadoop cluster configured with Rack Awareness, if a client running on a DataNode requests to write a file with a replication factor of 3, how does the block placement policy distribute the replicas?

Hadoop Architecture Hard
A. All three replicas are placed on different racks to maximize fault tolerance.
B. Replica 1 on the local node, Replica 2 on a random node in a different rack, Replica 3 on another node in that same different rack.
C. Replica 1 on a random node in a different rack, Replica 2 and 3 on different nodes in the local rack.
D. Replica 1 on the local node, Replica 2 on a random node in the same rack, Replica 3 on a random node in a different rack.

48 An HDFS client opens a file for appending (append()). Simultaneously, a network partition isolates the client from the NameNode but not from the DataNodes. How does HDFS handle lease management for this file?

Hadoop Storage: HDFS Hard
A. The NameNode's lease for the client expires after the hard limit (usually 1 hour). The NameNode initiates lease recovery, closing the file and potentially discarding uncommitted blocks.
B. The DataNodes detect the lack of NameNode heartbeats and automatically revoke the client's write access, saving partial blocks.
C. The client immediately receives an IOException from the DataNodes because DataNodes require continuous token validation from the NameNode during appends.
D. The client continues to write to the DataNodes indefinitely; the NameNode cannot intervene until the network is restored.

49 Consider a MapReduce job where the map output keys are custom objects representing composite keys: [String category, Long timestamp]. You want the Reducer to process data grouped by category, but sorted internally by timestamp. Which components must be explicitly configured to achieve this Secondary Sorting?

Hadoop MapReduce paradigm Hard
A. A custom Combiner to pre-sort by timestamp and a Partitioner on category.
B. A custom Partitioner on category, a custom GroupingComparator on category, and a custom SortComparator on [category, timestamp].
C. A custom Partitioner on [category, timestamp], and a custom GroupingComparator on timestamp.
D. Only a custom SortComparator on [category, timestamp] is required; Hadoop inherently handles the grouping.
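
The interplay of partitioning, sorting, and grouping in secondary sort can be sketched outside Hadoop. The code below is a plain-Python simulation under stated assumptions: `partition` and `secondary_sort` are hypothetical names, and the byte-sum hash is used only because it is deterministic across runs.

```python
from itertools import groupby

def partition(category, num_reducers):
    # Partition on category only, so a category never spans two reducers.
    return sum(category.encode("utf-8")) % num_reducers

def secondary_sort(records, num_reducers=2):
    """Return one (category, payloads ordered by timestamp) entry per reduce call."""
    buckets = [[] for _ in range(num_reducers)]
    for category, timestamp, payload in records:
        composite_key = (category, timestamp)
        buckets[partition(category, num_reducers)].append((composite_key, payload))
    reducer_calls = []
    for bucket in buckets:
        # Sort step: order by the full composite key (category, timestamp).
        bucket.sort(key=lambda kv: kv[0])
        # Grouping step: one reduce call per category, ignoring the timestamp.
        for category, group in groupby(bucket, key=lambda kv: kv[0][0]):
            reducer_calls.append((category, [payload for _, payload in group]))
    return reducer_calls

records = [("books", 30, "c"), ("toys", 5, "x"), ("books", 10, "a"),
           ("books", 20, "b"), ("toys", 1, "y")]
calls = dict(secondary_sort(records))
print(calls["books"], calls["toys"])  # prints: ['a', 'b', 'c'] ['y', 'x']
```

Changing any one of the three steps breaks the result: partitioning on the full key scatters a category across reducers, and grouping on the full key yields one reduce call per record.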

50 What is the primary constraint placed on the Combiner function in the MapReduce paradigm to ensure the correctness of the final output?

MapReduce Terminology Hard
A. Its input key-value types must match the output key-value types, and the operation it performs must be both commutative and associative.
B. It must guarantee execution exactly once per Map output split before the data is shuffled.
C. It must implement the WritableComparable interface to ensure intermediate data is sortable.
D. It must be an exact programmatic clone of the Mapper class.
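
Why the Combiner's operation needs to be commutative and associative can be seen by comparing a sum with a mean. The sketch below is plain Python with made-up numbers; `map_outputs` stands in for the values two Map tasks emit for the same key.

```python
def reduce_sum(values):
    return sum(values)

def reduce_mean(values):
    return sum(values) / len(values)

# Values emitted for one key by two separate Map tasks (illustrative data).
map_outputs = [[10, 20], [30]]
all_values = [v for part in map_outputs for v in part]

# Sum: combining per Map task first gives the same final answer.
sum_without_combiner = reduce_sum(all_values)
sum_with_combiner = reduce_sum([reduce_sum(part) for part in map_outputs])

# Mean: reusing the reducer as a combiner changes the final answer.
mean_without_combiner = reduce_mean(all_values)
mean_with_combiner = reduce_mean([reduce_mean(part) for part in map_outputs])

print(sum_without_combiner, sum_with_combiner)    # prints: 60 60
print(mean_without_combiner, mean_with_combiner)  # prints: 20.0 22.5
```

Because the framework may apply a Combiner zero, one, or several times, only operations that give the same result under any such regrouping are safe.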

51 How does HDFS ensure data integrity during a read operation if a client detects a checksum mismatch for a block?

Hadoop Storage: HDFS Hard
A. The DataNode dynamically reconstructs the block from parity bits stored on the local disk before sending it to the client.
B. The client throws a ChecksumException, terminating the application immediately without retry.
C. The client reports the bad block and the DataNode to the NameNode, then proceeds to read from another replica of the block.
D. The NameNode detects the mismatch via a heartbeat, marks the DataNode as dead, and routes the client to a secondary NameNode.

52 In a heavily utilized MRv1 cluster, the JobTracker must schedule tasks based on data locality. If a node has a free slot, but no pending Map tasks have data local to that node, what does the delay scheduling strategy, often used by the Fair and Capacity schedulers, do?

Hadoop - Namenode, DataNode, Job Tracker and TaskTracker Hard
A. The JobTracker immediately assigns a non-local task to utilize the free slot, prioritizing cluster utilization over locality.
B. The JobTracker waits for a short, configurable period of time before assigning a non-local task, hoping a task with local data becomes available.
C. The JobTracker preempts a running task on another node to migrate it to the node with the free slot.
D. The JobTracker assigns a Reduce task instead, since Reduce tasks do not depend on data locality.

53 During the Shuffle and Sort phase of MapReduce, what dictates the transition of map output data from memory to disk on the Mapper side?

Hadoop MapReduce paradigm Hard
A. The Mapper stores all key-value pairs in JVM heap memory until the map task finishes, at which point the entire dataset is flushed to disk simultaneously.
B. The OutputCommitter evaluates the block size limit; once 64 MB of data is accumulated, the framework initiates a blocking write to HDFS.
C. The Mapper writes directly to the disk cache of the operating system; Hadoop relies on the OS to flush data to disk asynchronously.
D. Data is buffered in a circular in-memory buffer; when the buffer reaches a certain threshold (e.g., 80%), a background thread begins spilling the contents to disk while the Mapper continues writing to the remaining space.

54 A user executes a Word Count job from the command line using a compressed input file (input.txt.gz). What determines whether Hadoop can split this compressed file into multiple InputSplits?

word count on command line Hard
A. The file size; if it exceeds the HDFS block size, Hadoop forces a split regardless of the compression algorithm.
B. The InputFormat class used; TextInputFormat automatically decompresses and splits all formats, while SequenceFileInputFormat does not.
C. The compression codec used; algorithms like Gzip do not support splitting, so the entire file must be processed by a single Mapper, whereas bzip2 is splittable.
D. The command-line argument -D mapreduce.input.fileinputformat.split.maxsize; it overrides any compression limitations.

55 In the context of the WritableComparable interface, which is strictly required for MapReduce keys, what is the specific purpose of the compareTo() and readFields() methods, respectively?

MapReduce Terminology Hard
A. compareTo() handles sorting of keys during the shuffle phase; readFields() deserializes the object state from an incoming DataInput stream.
B. compareTo() ensures uniqueness for the GroupingComparator; readFields() serializes the object state into a byte array.
C. compareTo() evaluates the equality of values; readFields() reads configuration properties from the JobContext.
D. compareTo() dictates which Reducer a key is assigned to; readFields() writes the object to HDFS.

56 Which of the following describes the most critical limitation of the MRv1 architecture (JobTracker/TaskTracker) that ultimately necessitated the shift to YARN (Yet Another Resource Negotiator)?

Hadoop Architecture Hard
A. TaskTrackers were incapable of running Java Virtual Machines (JVMs), requiring all map tasks to execute as native C++ threads.
B. The JobTracker could only process unstructured data, making it incompatible with SQL-like query engines such as Hive or Pig.
C. The JobTracker was responsible for both cluster resource management and job lifecycle scheduling, creating a scalability bottleneck at around 4,000 nodes.
D. MRv1 required NameNodes to participate in MapReduce shuffle operations, overloading HDFS metadata operations.

57 If the JobTracker JVM fails and undergoes a restart in a classic MRv1 setup, what is the fate of the currently executing jobs and the TaskTrackers?

Hadoop - Namenode, DataNode, Job Tracker and TaskTracker Hard
A. The JobTracker recovers the exact state of all tasks from the FsImage and seamlessly reconnects to the TaskTrackers.
B. TaskTrackers independently continue running tasks and hold the results in a distributed cache until the JobTracker reconnects.
C. All running jobs fail entirely because the job metadata and task state held in the JobTracker's memory are lost; TaskTrackers reconnect to the new JobTracker as empty nodes.
D. The Secondary JobTracker instantaneously promotes itself, ensuring zero downtime and continuous task execution.

58 In MapReduce, DistributedCache is used to broadcast side data. If an application utilizes DistributedCache.addCacheArchive(), how does the TaskTracker process this payload before task execution?

Hadoop MapReduce paradigm Hard
A. It queries the NameNode for the archive contents dynamically via RPC calls every time a task requests a file.
B. It un-archives the file automatically on the local disk of the worker node, and provides the path to the task via symlinks in the task's working directory.
C. It loads the archive strictly into the JVM heap space of each Mapper, making it accessible via standard memory references.
D. It copies the archive to the HDFS block pool on the node, strictly enforcing replication logic before task initialization.

59 Regarding data localization, what distinguishes a Rack-local task from a Node-local task in Hadoop MapReduce?

MapReduce Terminology Hard
A. Node-local tasks execute within the JVM of the JobTracker; Rack-local tasks execute on the remote TaskTracker nodes.
B. Node-local tasks fetch data via HTTP; Rack-local tasks fetch data via RPC over the top-of-rack switch.
C. Node-local tasks process data residing on the same DataNode as the TaskTracker; Rack-local tasks process data residing on a different DataNode in the same rack (behind the same network switch).
D. Node-local tasks are Map tasks; Rack-local tasks are strictly Reduce tasks.

60 Hadoop employs an abstraction called SequenceFile for storing binary key-value pairs. Within the architecture, what is the structural advantage of using SequenceFile.CompressionType.BLOCK over RECORD compression?

Hadoop Architecture Hard
A. BLOCK compression forces the file to align exactly with HDFS block boundaries (e.g., 128 MB), preventing InputSplits from spanning across nodes.
B. BLOCK compression disables sync markers, relying entirely on the NameNode metadata to locate record boundaries.
C. BLOCK compression stores the key uncompressed and the value compressed, allowing for faster key sorting during the shuffle phase.
D. BLOCK compression compresses multiple records together as a single block, achieving much higher compression ratios than compressing individual records, while maintaining splittability.