Unit 5 - Practice Quiz

INT312 60 Questions

1 What is Apache HBase primarily classified as?

HBase Basics Easy
A. A graph database
B. A NoSQL, column-oriented database
C. A relational database management system (RDBMS)
D. An in-memory caching system

2 Which underlying file system does Apache HBase typically use to store its data?

HBase Architecture Easy
A. EXT4
B. Amazon S3 exclusively
C. Hadoop Distributed File System (HDFS)
D. NTFS

3 Which centralized service is used by HBase to maintain configuration information and distributed synchronization?

ZooKeeper Integration Easy
A. Apache Hive
B. Apache Kafka
C. Apache ZooKeeper
D. Apache Pig

4 In the HBase architecture, which node is responsible for monitoring RegionServers and assigning regions to them?

HMaster Easy
A. RegionServer
B. DataNode
C. HMaster
D. NameNode

5 Which component in HBase actually handles the read and write requests from the clients?

RegionServer Easy
A. RegionServer
B. NameNode
C. ZooKeeper
D. HMaster

6 What uniquely identifies a specific row in an HBase table?

HBase Data Model Easy
A. Column Qualifier
B. Primary Key
C. Timestamp
D. Row Key

7 In HBase, columns are grouped into logical and physical sets called what?

Column Families Easy
A. Column Families
B. Row Keys
C. Tables
D. Namespaces

8 Which of the following MUST be predefined when creating an HBase table?

HBase Schema Easy
A. The exact number of rows
B. The data type of every cell
C. Column Families
D. Every single column name

9 Apache HBase is written in which programming language?

HBase Basics Easy
A. Scala
B. C++
C. Python
D. Java

10 Does Apache HBase natively support SQL queries out of the box?

HBase vs RDBMS Easy
A. Yes, it uses MySQL as its query engine.
B. Yes, but only for UPDATE statements.
C. No, it does not support SQL natively without extra tools like Apache Phoenix.
D. Yes, it is fully SQL compliant.

11 What is an HBase 'Region'?

HBase Architecture Easy
A. The main configuration file
B. A geographical location of a server
C. A backup of the entire database
D. A continuous range of sorted rows stored together

12 According to the CAP theorem, which two properties does HBase primarily guarantee?

HBase Properties Easy
A. Availability and Partition Tolerance (AP)
B. Consistency and Availability (CA)
C. None of the above
D. Consistency and Partition Tolerance (CP)

13 How does HBase handle multiple versions of data stored in the same cell?

HBase Data Model Easy
A. It overwrites the old data immediately.
B. It cannot store multiple versions of data.
C. It stores them using different Row Keys.
D. It differentiates them using a Timestamp.

14 Which API operation is used to insert or update data in an HBase table?

HBase Operations Easy
A. UPDATE
B. PUT
C. INSERT
D. POST

15 Which API operation is used to fetch a single, specific row of data from HBase?

HBase Operations Easy
A. FETCH
B. PULL
C. GET
D. SELECT

16 In HBase, what is a Column Qualifier?

HBase Data Model Easy
A. The part of the column name that identifies a specific column within a Column Family
B. The data type of the column
C. The name of the database
D. A constraint that prevents null values

17 If an HMaster node fails in a highly available HBase cluster, what happens?

HBase Architecture Easy
A. All data is deleted.
B. A backup HMaster is elected to take over.
C. The NameNode takes over HMaster duties.
D. The entire cluster immediately shuts down.

18 Why would a system choose HBase over plain HDFS?

HBase vs HDFS Easy
A. HDFS cannot store large files.
B. HBase is much cheaper to install than HDFS.
C. HBase provides fast, random read/write access to data, whereas HDFS is designed for sequential batch processing.
D. HBase does not require any servers.

19 Which operation is used to iterate over a range of rows in an HBase table?

HBase Operations Easy
A. GET
B. SCAN
C. ITERATE
D. LOOP

20 Which famous Google technology paper inspired the creation of Apache HBase?

HBase Basics Easy
A. Spanner
B. Bigtable
C. Google File System (GFS)
D. MapReduce

21 In an HBase cluster, how is the state and configuration of the distributed environment maintained to ensure high availability?

HBase Architecture Medium
A. Through Apache ZooKeeper, which manages cluster coordination and tracks server failures
B. Through a dedicated relational database like MySQL
C. By storing the metadata directly in HDFS blocks replicated across the cluster
D. By electing a backup HMaster that continuously polls RegionServers

22 If a user needs to retrieve a specific version of a value in an HBase table, which combination of coordinates is required?

HBase Data Model Medium
A. Row Key, Column Family, Column Qualifier, Timestamp
B. Row Key, Column Family, Column Qualifier
C. Table Name, Partition Key, Clustering Key
D. Row Key, Region ID, Column Family

23 What happens automatically when an HBase table's region grows beyond its configured maximum file size?

Region and RegionServers Medium
A. The region dynamically splits into two roughly equal child regions.
B. The RegionServer rejects further write requests until data is deleted.
C. The data is compressed and moved to an archive table.
D. The region is flushed to a secondary storage cluster.

24 Which of the following scenarios is a valid reason to choose Apache HBase over directly storing files in HDFS?

HBase vs HDFS Medium
A. The application requires streaming massive datasets for sequential batch processing.
B. The application needs to store large, immutable video files.
C. The application relies on complex SQL joins and multi-row transactions.
D. The application requires low-latency, random read and write access to billions of rows.

25 Why is it generally recommended to keep the number of column families in an HBase table small (typically 1 to 3)?

Column Families Medium
A. Because HBase has a hard limit of 3 column families per table.
B. Because ZooKeeper cannot track metadata for more than 3 column families.
C. Because flushing occurs per region; flushing one column family forces the others to flush, creating many small HFiles.
D. Because read performance degrades exponentially as column families cannot be compressed.

26 When a client sends a write request to a RegionServer, what is the correct operational sequence before the server acknowledges the write as successful?

HBase Write Path Medium
A. Write to BlockCache -> Write to Write-Ahead Log (WAL) -> Acknowledge client
B. Write to MemStore -> Write to HFile -> Acknowledge client
C. Write to HFile directly -> Acknowledge client
D. Write to Write-Ahead Log (WAL) -> Write to MemStore -> Acknowledge client

27 How are rows physically sorted within an HBase table?

HBase Data Model Medium
A. Chronologically by their insertion timestamp
B. Numerically by a system-generated auto-incrementing ID
C. Randomly based on the hash of the Column Family
D. Lexicographically by their Row Key

28 In the event of a RegionServer crash, which HBase component is responsible for reassigning its regions to other healthy RegionServers?

HMaster and ZooKeeper Medium
A. The Client Driver
B. Apache ZooKeeper
C. The HMaster
D. The NameNode

29 What is the key difference between Minor Compaction and Major Compaction in HBase?

Compaction Medium
A. Minor compaction runs only on the HMaster, while Major compaction runs on RegionServers.
B. Minor compaction merges smaller adjacent HFiles, while Major compaction merges all HFiles into one and removes deleted/expired cells.
C. Minor compaction merges all HFiles in a region, while Major compaction only merges MemStores.
D. Minor compaction deletes expired cells, while Major compaction only re-indexes the HFiles.

30 A developer is inserting sequential time-series data into HBase using the timestamp as the row key. What performance issue is likely to occur?

Row Key Design Medium
A. Data corruption due to overlapping timestamps
B. Automatic rejection of sequential keys by the HMaster
C. Region hotspotting, where all new writes hit a single RegionServer
D. ZooKeeper synchronization timeout

31 During a read request, if a RegionServer does not find the requested data in the BlockCache or MemStore, what does it do to optimize the search on disk?

HBase Read Path Medium
A. It scans all HFiles sequentially until the data is found.
B. It uses Bloom filters to skip HFiles that definitely do not contain the requested row key.
C. It queries the HMaster for the data location.
D. It requests the client to search the MapReduce output.

32 Which of the following guarantees does Apache HBase provide in the context of the CAP Theorem?

HBase vs RDBMS Medium
A. Consistency and Partition Tolerance (CP)
B. Consistency and Availability (CA)
C. Eventual Consistency and High Availability
D. Availability and Partition Tolerance (AP)

33 What is the primary purpose of the Write-Ahead Log (WAL) located on a RegionServer?

Region and RegionServers Medium
A. To track the history of schema changes applied by the HMaster.
B. To audit user access and log analytical metrics for MapReduce.
C. To recover data that has not yet been flushed from the MemStore in the event of a server crash.
D. To cache frequently read rows to improve read performance.

34 How does an HBase client initially discover the location of the hbase:meta table to perform read/write operations?

HBase Architecture Medium
A. By asking the NameNode of the underlying HDFS cluster
B. By broadcasting a UDP request to all RegionServers
C. By querying Apache ZooKeeper
D. By connecting directly to the local HMaster process

35 How does Apache HBase internally store data types such as Strings, Integers, or custom objects?

Data Types Medium
A. As uninterpreted arrays of bytes
B. As JSON documents
C. As serialized XML strings
D. As heavily typed native Java objects

36 Which of the following is NOT a responsibility of the HMaster in HBase?

HMaster and ZooKeeper Medium
A. Handling DDL operations like create, alter, and drop tables
B. Serving client read and write requests for user tables
C. Monitoring all RegionServer instances in the cluster
D. Assigning regions to RegionServers at startup

37 To solve the region hotspotting problem caused by sequentially increasing row keys, which technique involves adding a deterministic prefix based on the original key?

Row Key Design Medium
A. Row Compression
B. Hashing
C. Salting
D. Key Reversal

38 In HBase, when a cell is updated or deleted, what physically happens to the old data immediately?

HBase Data Model Medium
A. The old data is overwritten in place on the disk.
B. The old data is moved to a temporary trash bin in HDFS.
C. The old data remains, and a new version is written; deletes are marked with a tombstone marker.
D. The old data is immediately stripped from the HFile through a synchronous compaction.

39 What is the primary function of the BlockCache in a RegionServer?

HBase Read Path Medium
A. To store region metadata temporarily while communicating with ZooKeeper.
B. To cache frequently read data blocks in memory to speed up subsequent reads.
C. To buffer incoming write requests before appending to the WAL.
D. To group small HFiles together before executing a major compaction.

40 If an HBase table defines a Column Family named metrics, which of the following represents a valid column qualifier creation process?

Column Families Medium
A. Qualifiers are auto-generated by the HMaster sequentially.
B. Qualifiers are dynamically created by the client at the time of data insertion.
C. Qualifiers are extracted automatically from the Row Key hash.
D. Qualifiers must be strictly defined in the table schema before data insertion.

41 A telecommunications company uses Apache HBase to store call detail records (CDRs). They initially designed the RowKey as [Timestamp]-[CallerID]. They are experiencing severe write bottlenecks on a single RegionServer during peak hours. Which of the following RowKey redesign strategies effectively eliminates this 'hot-spotting' while maintaining optimal performance for time-range queries?

HBase Data Model & RowKey Design Hard
A. Reversing the timestamp before appending the CallerID.
B. Hashing the CallerID and prepending a modulo-based bucket ID to the original RowKey.
C. Moving to a purely random UUID RowKey for perfectly uniform distribution.
D. Salting the RowKey by prepending a randomly generated byte array.

42 During a high-throughput write operation in HBase, a RegionServer crashes immediately after writing a batch of mutations to the MemStore and appending them to the Write-Ahead Log (WAL), but before a flush to an HFile occurs. How does HBase ensure data durability and consistency in this scenario?

HBase Architecture & Write Path Hard
A. The HMaster instructs the client to replay the failed mutations directly to the newly assigned RegionServer.
B. ZooKeeper detects the failure and immediately promotes the secondary MemStore replica on a different RegionServer to active.
C. HDFS automatically replicates the un-flushed MemStore blocks to another running RegionServer.
D. The HMaster splits the abandoned WAL into separate files per region and assigns them to new RegionServers for replay during region initialization.

43 An HBase cluster is experiencing high disk I/O and CPU utilization due to frequent major compactions. A data engineer proposes disabling major compactions entirely and relying solely on minor compactions. What is the primary negative consequence of this approach?

Compaction & Performance Hard
A. The Write-Ahead Log (WAL) will never be rolled, eventually filling up the entire HDFS capacity.
B. Data locality across HDFS DataNodes will permanently drop to zero percent, requiring cross-rack reads for all queries.
C. Deleted cells (tombstones) and expired versions will never be purged, leading to infinitely growing storage and degraded read performance.
D. The MemStore will fill up faster, leading to more frequent flushes and OutOfMemory (OOM) errors.

44 When an HBase client performs a Get request for a newly introduced RowKey for the very first time after the cluster has been restarted, what is the exact sequence of network interactions it performs to locate the correct RegionServer?

HBase Read Path & Client Routing Hard
A. Queries HMaster -> Queries hbase:meta RegionServer -> Queries target RegionServer.
B. Queries ZooKeeper -> Queries target RegionServer directly using cached metadata.
C. Queries ZooKeeper -> Queries hbase:meta RegionServer -> Queries target RegionServer.
D. Queries ZooKeeper -> Queries HMaster -> Queries hbase:meta RegionServer -> Queries target RegionServer.

45 HBase provides strong consistency for row-level operations. Which internal mechanism guarantees that concurrent reads do not see partial updates from a parallel write operation to the same row?

HBase Consistency & MVCC Hard
A. Synchronous replication to all HDFS DataNodes before returning an acknowledgment to the client.
B. Multi-Version Concurrency Control (MVCC) utilizing a sequence ID (Write Number) advanced upon completion of the MemStore update.
C. Exclusive row-level write locks acquired in the BlockCache.
D. Distributed locks managed by ZooKeeper on the target RowKey.

46 In a highly available HBase cluster, the Active HMaster node suddenly crashes due to a hardware failure. Assuming the ZooKeeper ensemble and all RegionServers remain healthy, what is the immediate impact on client applications executing heavy read and write (DML) workloads?

HBase Architecture & High Availability Hard
A. The cluster will become strictly read-only to prevent split-brain scenarios until a new HMaster is elected.
B. Reads will succeed using cached region locations, but writes will fail because the WAL cannot be rotated.
C. Reads and writes will continue normally for existing regions, but schema changes (DDL) and handling of region splits/failures will halt until a Backup HMaster takes over.
D. All reads and writes will fail immediately with a MasterNotRunningException.

47 An HBase administrator configures hbase.regionserver.global.memstore.size to 0.5 (50% of heap) and hfile.block.cache.size to 0.4 (40% of heap) on a RegionServer with a 32GB heap. During a sustained, mixed read/write workload, the RegionServer repeatedly crashes with OutOfMemoryError. What is the fundamental architectural constraint violated here?

MemStore & BlockCache Hard
A. HBase strictly prohibits MemStore configurations exceeding 40% because of WAL serialization overhead.
B. The combined allocation of MemStore and BlockCache must not exceed 0.8 (80%) of the total heap, leaving adequate room for internal RegionServer processing and RPC queues.
C. The BlockCache must always be strictly larger than the MemStore size to handle flush operations.
D. The combined memory allocation limits garbage collection, requiring G1GC which is incompatible with a 32GB heap.
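
Worked arithmetic for the configuration in the question stem (not an answer key): a small sketch that converts the two heap fractions into sizes on the 32GB heap. The variable names are illustrative; only the two property values and the heap size come from the question.

```python
HEAP_GB = 32.0
MEMSTORE_FRACTION = 0.5    # hbase.regionserver.global.memstore.size
BLOCKCACHE_FRACTION = 0.4  # hfile.block.cache.size

# Upper bounds each component may claim from the heap.
memstore_gb = HEAP_GB * MEMSTORE_FRACTION
blockcache_gb = HEAP_GB * BLOCKCACHE_FRACTION

# Whatever is left must cover everything else the RegionServer does on-heap.
combined_fraction = MEMSTORE_FRACTION + BLOCKCACHE_FRACTION
remaining_gb = HEAP_GB - memstore_gb - blockcache_gb

print(f"MemStore upper bound:   {memstore_gb:.1f} GB")
print(f"BlockCache upper bound: {blockcache_gb:.1f} GB")
print(f"Combined fraction:      {combined_fraction:.2f}")
print(f"Heap left for the rest: {remaining_gb:.1f} GB")
```

With these settings the two components can claim 16.0 GB and 12.8 GB, which leaves roughly 3.2 GB of the heap for all other RegionServer activity.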

48 HBase achieves data locality by ensuring HFiles are written to HDFS DataNodes residing on the same physical machine as the RegionServer. Under which of the following circumstances is data locality temporarily lost, and how is it subsequently restored?

HBase Architecture & HDFS Hard
A. Lost when client write volume exceeds WAL capacity; restored when WALs are archived.
B. Lost when a RegionServer fails and its regions are moved to another server; restored during the next Major Compaction.
C. Lost during MemStore flushes; restored when ZooKeeper triggers a locality sync.
D. Lost during HDFS NameNode failover; restored automatically by the HDFS Balancer script.

49 A developer configures a ROWCOL Bloom Filter on an HBase column family to optimize high-latency read queries. For which of the following access patterns will this specific Bloom Filter provide the most significant performance improvement?

HBase Read Path & Bloom Filters Hard
A. Get operations querying a specific RowKey and a highly specific subset of Column Qualifiers that rarely exist.
B. Scan operations filtering on specific cell values using a ValueFilter.
C. Get operations querying the entire row (all column qualifiers) for a specific RowKey.
D. Scan operations retrieving all columns for a contiguous range of RowKeys.

50 Consider an HBase table where a client inserts a cell at RowKey R1, Column Family CF1, Qualifier Q1, with a specific timestamp. Later, the client issues a Delete operation for R1:CF1:Q1 without specifying a timestamp. Assuming default behavior, what exactly does HBase write to the storage engine?

HBase Data Model Hard
A. A tombstone marker for R1:CF1:Q1 with the server's current timestamp, which shadows all older versions during reads.
B. A DeleteFamily marker at the row level that masks the entire R1:CF1 combination regardless of qualifier.
C. It directly modifies the MemStore and HFile to physically erase the bytes associated with that timestamp.
D. A tombstone marker for R1:CF1:Q1 explicitly matched to that timestamp, leaving any versions with other timestamps visible.

51 In the context of the CAP theorem, Apache HBase is classified as a CP (Consistent and Partition Tolerant) system. In the event of a network partition separating the HMaster and several RegionServers from ZooKeeper, how does HBase sacrifice Availability to maintain Consistency?

HBase and CAP Theorem Hard
A. RegionServers switch to read-only mode, serving stale data from BlockCache until ZooKeeper reconnects.
B. HBase transparently routes all requests to a backup HDFS cluster, ensuring availability but sacrificing read latency.
C. The HMaster forcefully formats the WALs of the partitioned RegionServers, causing temporary write unavailability but ensuring zero data divergence.
D. The partitioned RegionServers voluntarily shut down (suicide) because they lose their ephemeral nodes in ZooKeeper, making their hosted regions temporarily unavailable.

52 A data science team needs to calculate the real-time sum of a specific numerical column across millions of rows in an HBase table. Retrieving all rows to the client application is too slow due to network overhead. Which HBase feature provides the most efficient, distributed execution for this aggregation directly on the RegionServers?

HBase Advanced Features Hard
A. Endpoint Coprocessors.
B. Client-side caching with Scan.setBatch().
C. Observer Coprocessors.
D. HBase MapReduce Integration.

53 HFiles use a block-based format (default 64KB block size) to store data on HDFS. Within the HFile structure, how does the Read Path rapidly locate a specific RowKey without scanning the entire file?

HBase Architecture & HFile Hard
A. By loading the HFile's Data Block Index (located via the Trailer) into memory, which maps the start keys of data blocks to their physical offsets.
B. By utilizing an embedded B+ Tree structure stored at the beginning of the HFile.
C. By querying the hbase:meta table, which holds the exact byte offsets for every RowKey in the cluster.
D. By sequentially scanning the WAL to find the most recent memory offset for the HFile.

54 A developer heavily uses reverse scans (Scan.setReversed(true)) to fetch the most recent records from a time-series HBase table where RowKeys are monotonically increasing timestamps. They observe significant performance degradation compared to forward scans. What is the fundamental architectural reason for this degradation?

HBase Performance Tuning Hard
A. Reverse scans bypass the BlockCache entirely to avoid cache poisoning.
B. HFiles and MemStores use a forward-linked skip-list and internal block encodings (like prefix encoding) optimized strictly for forward sequential access.
C. ZooKeeper must continually re-calculate region boundaries during a reverse scan, causing high CPU overhead.
D. Reverse scans require the RegionServer to perform an on-the-fly Major Compaction before returning data.

55 To improve write throughput on a massive HBase cluster, the administration enables the MultiWAL feature. What is the specific bottleneck this feature aims to resolve?

HBase Write Path Hard
A. The inability of a single WAL to replicate to multiple HDFS data centers simultaneously.
B. The ZooKeeper locking overhead when multiple RegionServers attempt to write to the same region simultaneously.
C. The low throughput of a single HDFS write pipeline when writing WAL edits sequentially to a single file.
D. The single-threaded nature of the MemStore flush operation.

56 A column family CF1 is configured with VERSIONS => 3 and MIN_VERSIONS => 1, and a TTL (Time-To-Live) of 86400 seconds (1 day). If 5 updates are made to a specific cell within the last hour, and 1 update was made 2 days ago, what will a major compaction retain?

HBase Data Model Hard
A. Only 1 update (the 2-day-old one) because MIN_VERSIONS overrides TTL, and the recent ones exceed the version limit.
B. The most recent update only, as TTL aggressively purges any multi-versioned data to save space.
C. All 5 updates from the last hour (ignoring max versions to satisfy MIN_VERSIONS); the 2-day-old update is dropped.
D. The 3 most recent updates from the last hour; the older 2 updates and the 2-day-old update are dropped.

57 HBase relies on a Log-Structured Merge-Tree (LSM-Tree) architecture rather than a traditional B-Tree. Which of the following best describes the primary advantage of this choice for Big Data workloads?

Log-Structured Merge-Tree Hard
A. LSM-Trees convert random, concurrent write operations into sequential disk I/O, maximizing write throughput.
B. LSM-Trees inherently support cross-row atomic transactions, which B-Trees cannot provide.
C. LSM-Trees provide strictly bounded, predictable latencies for highly randomized point reads compared to B-Trees.
D. LSM-Trees eliminate the need for an in-memory buffer, allowing all writes to go directly to disk without CPU overhead.

58 ZooKeeper tracks the status of RegionServers using ephemeral nodes in a specific directory (e.g., /hbase/rs). What edge case occurs if a RegionServer experiences a severe Java "Stop-The-World" Garbage Collection pause lasting longer than the ZooKeeper session timeout?

HBase Architecture & ZooKeeper Hard
A. The RegionServer automatically transitions to a read-only state to serve stale data until GC finishes.
B. The HMaster pauses all DML operations across the entire cluster until the GC pause completes and the node responds.
C. ZooKeeper assumes the RegionServer is dead, expires its session, and the HMaster begins reassigning its regions; when the GC finishes, the RegionServer shuts itself down.
D. ZooKeeper dynamically extends the session timeout via a heartbeat retry mechanism, keeping the cluster stable.

59 You are pre-splitting an HBase table to prevent initial hot-spotting. The RowKeys are MD5 hashes represented as 32-character hexadecimal strings (0-9, a-f). You want to pre-split the table into exactly 16 regions. Which of the following represents the optimal set of split keys?

HBase Performance Tuning Hard
A. Allowing HBase to dynamically auto-split the table based on the ConstantSizeRegionSplitPolicy.
B. A set of 16 keys, where the first character iterates from '0' to 'f' (i.e., ['0', '1', '2', ..., 'e', 'f']).
C. A set of 15 keys based on the integer representation of the hashes divided by 16.
D. A set of 15 keys, where the first character iterates from '1' to 'f' (i.e., ['1', '2', '3', ..., 'e', 'f']).
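
Illustration of the key format in the question stem (not an answer key): a minimal sketch that generates MD5 hex row keys and tallies their leading character. The `user-<n>` inputs are hypothetical stand-ins for whatever values are actually hashed.

```python
import hashlib
from collections import Counter

HEX_DIGITS = "0123456789abcdef"

# Hypothetical source values; each row key is the 32-character MD5 hex digest.
row_keys = [hashlib.md5(f"user-{i}".encode()).hexdigest() for i in range(16000)]

# Tally how many keys begin with each hexadecimal character.
first_chars = Counter(key[0] for key in row_keys)

for ch in HEX_DIGITS:
    print(ch, first_chars[ch])
```

In this sketch each of the 16 leading characters receives a similar share of the keys, which is the property that makes the first character a convenient basis for evenly sized key ranges.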

60 Which of the following scenarios natively requires distributed transaction management libraries (like Apache Phoenix or Tephra) because it violates HBase's out-of-the-box ACID guarantees?

HBase Consistency Hard
A. Updating a cell while a concurrent thread is reading the same exact cell, requiring read-committed isolation.
B. Ensuring that a write to the MemStore is immediately durable even if the node crashes before flushing.
C. Atomically updating the name, age, and address columns of a single user within the same RowKey.
D. Atomically transferring a balance between two different RowKeys representing different user accounts.