Cassandra uses a peer-to-peer (masterless) architecture where all nodes are treated equally, ensuring there is no single point of failure.
3. Which protocol does Cassandra use for nodes to communicate and share state information about themselves and other nodes?
Gossip protocol
Easy
A.FTP
B.SMTP
C.HTTP/2
D.Gossip protocol
Correct Answer: Gossip protocol
Explanation:
The Gossip protocol is an internal communication protocol used by Cassandra nodes to discover each other and propagate state information across the cluster.
4. In Cassandra, what does the 'Replication Factor' determine?
Replication and consistency levels
Easy
A.The number of disks per node
B.The number of users allowed to read data
C.The total number of copies of data across the cluster
D.The speed at which data is written
Correct Answer: The total number of copies of data across the cluster
Explanation:
The Replication Factor dictates how many copies (replicas) of a given row of data are stored across the nodes in a Cassandra cluster.
5. Which consistency level requires only a single replica node to acknowledge a read or write operation to be considered successful?
Replication and consistency levels
Easy
A.QUORUM
B.ANY
C.ONE
D.ALL
Correct Answer: ONE
Explanation:
Consistency level ONE ensures that an operation succeeds as soon as one replica acknowledges it, providing high availability and low latency.
6. During a write operation, where does Cassandra store the data in memory before flushing it to disk?
Read and write paths
Easy
A.Cache
B.SSTable
C.Memtable
D.Commit Log
Correct Answer: Memtable
Explanation:
Data is first written to an in-memory structure called a Memtable. Once the Memtable is full, it is flushed to disk as an SSTable.
7. What is the append-only file on disk used by Cassandra to ensure data durability and aid in crash recovery?
Read and write paths
Easy
A.Commit log
B.Bloom filter
C.Memtable
D.SSTable
Correct Answer: Commit log
Explanation:
The Commit log is an append-only file used to record every write operation. It ensures data is not lost if the node crashes before the Memtable is flushed to disk.
8. Which Cassandra object acts as a container for tables and is analogous to a database in a Relational Database Management System (RDBMS)?
Cassandra Data Model – keyspace, table, primary key, partition key, clustering columns, wide rows
Easy
A.Column Family
B.Cluster
C.Keyspace
D.Partition
Correct Answer: Keyspace
Explanation:
A Keyspace in Cassandra is the outermost container for data, similar to a database or schema in relational databases.
9. Which component of the primary key is responsible for determining the specific node where a row's data is stored?
Cassandra Data Model – keyspace, table, primary key, partition key, clustering columns, wide rows
Easy
A.Super column
B.Partition key
C.Clustering column
D.Secondary index
Correct Answer: Partition key
Explanation:
The Partition key determines the distribution of data across the cluster by dictating which node receives the data partition.
10. What is the primary function of clustering columns in Cassandra's data model?
Cassandra Data Model – keyspace, table, primary key, partition key, clustering columns, wide rows
Easy
A.To determine the node location
B.To create a secondary index
C.To encrypt the data
D.To sort data within a specific partition
Correct Answer: To sort data within a specific partition
Explanation:
Clustering columns dictate the on-disk sorting order of rows within a single partition.
11. What term is used to describe a partition in Cassandra that contains a very large number of rows/columns?
Cassandra Data Model – keyspace, table, primary key, partition key, clustering columns, wide rows
Easy
A.Fat partitions
B.Wide rows
C.Skinny rows
D.Tall tables
Correct Answer: Wide rows
Explanation:
A 'wide row' refers to a partition in Cassandra that has many clustering rows (or columns) inside it.
12. Which query language is used by developers to interact with Apache Cassandra?
CQL (Cassandra Query Language) – data types, creating keyspaces and tables, insert, update, delete, select, filtering, indexing
Easy
A.SQL (Structured Query Language)
B.Pig Latin
C.HQL (Hive Query Language)
D.CQL (Cassandra Query Language)
Correct Answer: CQL (Cassandra Query Language)
Explanation:
Cassandra Query Language (CQL) is the primary language used to communicate with Apache Cassandra, resembling standard SQL.
13. Which CQL command is used to add new records into a Cassandra table?
CQL (Cassandra Query Language) – data types, creating keyspaces and tables, insert, update, delete, select, filtering, indexing
Easy
A.INSERT
B.UPLOAD
C.ADD
D.PUT
Correct Answer: INSERT
Explanation:
The INSERT INTO command is used in CQL to add new rows of data into a table.
14. In CQL, what keyword must often be appended to a SELECT statement to query on a column that is not part of the primary key (assuming no secondary index)?
CQL (Cassandra Query Language) – data types, creating keyspaces and tables, insert, update, delete, select, filtering, indexing
Easy
A.ENABLE SCAN
B.FORCE FILTER
C.ALLOW FILTERING
D.BYPASS INDEX
Correct Answer: ALLOW FILTERING
Explanation:
Cassandra requires the ALLOW FILTERING clause to run queries that might scan all partitions, which can be highly inefficient and unpredictable.
15. Which CQL command is used to remove an entire keyspace and all of its contents?
CQL (Cassandra Query Language) – data types, creating keyspaces and tables, insert, update, delete, select, filtering, indexing
Easy
A.TRUNCATE KEYSPACE
B.REMOVE KEYSPACE
C.DROP KEYSPACE
D.DELETE KEYSPACE
Correct Answer: DROP KEYSPACE
Explanation:
The DROP KEYSPACE command removes a keyspace, along with all the tables and data contained within it.
16. What is the purpose of creating an index in Cassandra?
CQL (Cassandra Query Language) – data types, creating keyspaces and tables, insert, update, delete, select, filtering, indexing
Easy
A.To increase the replication factor
B.To trigger the compaction process
C.To allow efficient querying on a non-primary key column
D.To sort the partition key
Correct Answer: To allow efficient querying on a non-primary key column
Explanation:
Secondary indexes allow you to query a table using a column that is not part of the primary key.
17. Which Cassandra administration process merges multiple SSTables into a single new SSTable to reclaim disk space?
Cassandra Administration - compaction
Easy
A.Compaction
B.Replication
C.Flushing
D.Gossip
Correct Answer: Compaction
Explanation:
Compaction is the background process that merges SSTables, removing deleted data (tombstones) and old overwrites to save disk space and improve read speeds.
18. Because Cassandra uses a masterless architecture, what can be said about the capabilities of every node in the cluster?
A.Read operations are faster than write operations on all nodes.
B.Only one node can handle writes.
C.Every node can accept read and write requests equally.
D.Nodes must request permission from a master node to serve data.
Correct Answer: Every node can accept read and write requests equally.
Explanation:
In a masterless (peer-to-peer) architecture, all nodes are identical and any node can act as a coordinator to handle read and write requests from clients.
19. Which of the following is a valid native data type in CQL used for storing whole numbers?
CQL (Cassandra Query Language) – data types, creating keyspaces and tables, insert, update, delete, select, filtering, indexing
Easy
A.STRING
B.INT
C.VARCHAR
D.DECIMAL
Correct Answer: INT
Explanation:
In CQL, INT is a valid 32-bit signed integer data type. STRING is not a CQL type; CQL uses TEXT or VARCHAR for strings.
20. Which CQL command modifies an existing table structure, such as adding a new column?
CQL (Cassandra Query Language) – data types, creating keyspaces and tables, insert, update, delete, select, filtering, indexing
Easy
A.ALTER TABLE
B.CHANGE TABLE
C.UPDATE TABLE
D.MODIFY TABLE
Correct Answer: ALTER TABLE
Explanation:
The ALTER TABLE statement is used to alter an existing table's schema, like adding or dropping columns.
21. In a 5-node Cassandra cluster, node C experiences a sudden hardware failure. How does the Cassandra architecture ensure continued availability of data originally managed by node C?
Cassandra Architecture – peer-to-peer architecture
Medium
A.A master node detects the failure and reassigns node C's tokens to another node.
B.A leader election is triggered among the remaining nodes to elect a replacement for node C.
C.The cluster immediately locks all read and write operations until node C is manually restored.
D.Other nodes containing replicas of node C's data continue to serve read and write requests based on the replication factor.
Correct Answer: Other nodes containing replicas of node C's data continue to serve read and write requests based on the replication factor.
Explanation:
Cassandra uses a masterless, peer-to-peer architecture. If a node fails, there is no master that has to fail over. Instead, requests are automatically routed to other nodes that hold replicas of the data, ensuring high availability.
22. How does a Cassandra node dynamically discover the current state, load, and availability of all other nodes in a large cluster?
Gossip protocol
Medium
A.By performing continuous DNS lookups on a centralized registry.
B.By querying the central Apache ZooKeeper service for metadata updates.
C.By broadcasting UDP multicast messages to every node in the datacenter simultaneously.
D.By exchanging state information continuously with up to three other nodes every second using the Gossip protocol.
Correct Answer: By exchanging state information continuously with up to three other nodes every second using the Gossip protocol.
Explanation:
The Gossip protocol is a decentralized, peer-to-peer communication mechanism where nodes exchange state information with a few random peers every second, quickly disseminating state changes across the entire cluster.
23. A keyspace is configured with a Replication Factor (RF) of 3. If a client executes a write operation with a Consistency Level of QUORUM, how many replica nodes must acknowledge the write for it to be successful?
Replication and consistency levels
Medium
A.2
B.1
C.4
D.3
Correct Answer: 2
Explanation:
The formula for QUORUM is $\lfloor RF / 2 \rfloor + 1$. For an RF of 3, the calculation is $\lfloor 3 / 2 \rfloor + 1 = 1 + 1 = 2$. Thus, 2 nodes must acknowledge the write.
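As a quick check of the arithmetic, the quorum size, floor(RF / 2) + 1, can be computed with integer division. This is a minimal sketch; the function name `quorum` is illustrative and is not part of any Cassandra driver API.

```python
def quorum(replication_factor: int) -> int:
    """Return the number of replicas that form a quorum: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    # Integer division (//) performs the floor for positive integers.
    return replication_factor // 2 + 1


# Print the quorum size for a few common replication factors.
for rf in (1, 2, 3, 5):
    print(f"RF={rf} -> QUORUM={quorum(rf)}")
```

Note that an even RF gains no extra failure tolerance at QUORUM: RF=2 needs both replicas, just as RF=3 needs two.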
24. In a cluster with a Replication Factor of 3, which combination of Read and Write Consistency Levels provides 'Strong Consistency' (ensuring a read always reflects the latest write)?
Replication and consistency levels
Medium
A.Write ONE, Read ONE
B.Write ANY, Read QUORUM
C.Write QUORUM, Read QUORUM
D.Write LOCAL_ONE, Read ONE
Correct Answer: Write QUORUM, Read QUORUM
Explanation:
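Strong consistency is achieved when $R + W > RF$, where $R$ and $W$ are the numbers of replicas that must acknowledge a read and a write. With QUORUM (2) + QUORUM (2) = 4, which is greater than the RF of 3.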
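The overlap rule (replicas read + replicas written > RF) can be checked for each combination in the question. The sketch below is illustrative only: the helper names are hypothetical, and consistency levels are reduced to plain replica counts.

```python
def quorum(rf: int) -> int:
    """Quorum size for a replication factor: floor(RF / 2) + 1."""
    return rf // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int, rf: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf


rf = 3
levels = {"ONE": 1, "QUORUM": quorum(rf), "ALL": rf}

# Enumerate write/read combinations and report which ones guarantee overlap.
for write_name, w in levels.items():
    for read_name, r in levels.items():
        verdict = "strong" if is_strongly_consistent(r, w, rf) else "eventual"
        print(f"Write {write_name:<6} Read {read_name:<6} -> {verdict}")
```

Write ALL with Read ONE also satisfies the rule, but it gives up write availability as soon as a single replica is down, which is why QUORUM on both sides is the usual choice.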
25. When a client sends a write request to a Cassandra node, what is the exact sequence of internal components involved before acknowledging success to the client?
Read and write paths
Medium
A.It writes to the Memtable, then flushes directly to the SSTable.
B.It writes to the SSTable on disk, then updates the Commit Log.
C.It writes to the Commit Log on disk, then writes to the Memtable in memory.
D.It writes to the Memtable in memory, then asynchronously appends to the Commit Log.
Correct Answer: It writes to the Commit Log on disk, then writes to the Memtable in memory.
Explanation:
To ensure durability and speed, Cassandra first appends the write to the Commit Log (on disk) to survive crashes, and then updates the Memtable (in memory). Once written to both, it acknowledges the write.
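The ordering described above can be pictured with a toy model. This is only a sketch under simplifying assumptions: the class `WritePathSketch` is hypothetical, a Python list stands in for the on-disk commit log, and a flush is triggered manually rather than by a size threshold.

```python
class WritePathSketch:
    """Toy model of a single node's write path (not Cassandra's real code)."""

    def __init__(self):
        self.commit_log = []  # stands in for the append-only file on disk
        self.memtable = {}    # in-memory structure, keyed by primary key
        self.sstables = []    # immutable "files" produced by flushes

    def write(self, key, value, timestamp):
        # Step 1: append to the commit log so the write survives a crash.
        self.commit_log.append((key, value, timestamp))
        # Step 2: apply the write to the memtable.
        self.memtable[key] = (value, timestamp)
        # Only after both steps is the write acknowledged.
        return "ack"

    def flush(self):
        # Write the memtable out as a sorted, immutable SSTable, then discard
        # the memtable and the commit log entries it made redundant.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable.clear()
        self.commit_log.clear()


node = WritePathSketch()
print(node.write("user:2", "Grace", timestamp=1))
print(node.write("user:1", "Ada", timestamp=2))
node.flush()
print(node.sstables)
```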
26. During a read operation, if a Cassandra node discovers that data for the same row exists in the Memtable and multiple SSTables with different values, how does it resolve the conflict?
Read and write paths
Medium
A.It merges the data and uses the cell with the most recent timestamp (Last-Write-Wins).
B.It assumes the Memtable always contains the correct data because it is in memory.
C.It raises a version conflict exception to the client.
D.It prompts the Gossip protocol to vote on the correct value.
Correct Answer: It merges the data and uses the cell with the most recent timestamp (Last-Write-Wins).
Explanation:
Cassandra uses a Last-Write-Wins (LWW) conflict resolution mechanism. It compares the timestamps of the fragments found in Memtables and SSTables and returns the data with the highest timestamp.
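A minimal sketch of the merge step follows. It assumes each source (Memtable or SSTable) is represented as a plain dictionary mapping a column name to a (value, timestamp) pair; the function name and sample data are hypothetical, and tie-breaking on equal timestamps is left out.

```python
def merge_last_write_wins(*sources):
    """Merge row fragments, keeping the newest cell for every column.

    Each source maps column name -> (value, write_timestamp).
    """
    merged = {}
    for source in sources:
        for column, (value, ts) in source.items():
            # Keep a cell only if it is newer than what we have seen so far.
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return {column: value for column, (value, _ts) in merged.items()}


memtable = {"email": ("new@example.com", 300)}
sstable_1 = {"email": ("old@example.com", 100), "name": ("Ada", 100)}
sstable_2 = {"name": ("Ada L.", 200)}

result = merge_last_write_wins(memtable, sstable_1, sstable_2)
print(result)
```

The comparison is per cell, not per row, so the result can combine an `email` from the Memtable with a `name` from an SSTable.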
27. You are designing a globally distributed application that requires Cassandra to replicate data across two different datacenters (e.g., US-East and EU-West). Which replication strategy should you use when creating the keyspace?
Cassandra Data Model – keyspace, table, primary key, partition key, clustering columns, wide rows
Medium
A.NetworkTopologyStrategy
B.GlobalReplicationStrategy
C.SimpleStrategy
D.DatacenterStrategy
Correct Answer: NetworkTopologyStrategy
Explanation:
NetworkTopologyStrategy is highly recommended for most deployments as it allows you to specify distinct replication factors for multiple datacenters, providing data locality and fault tolerance.
28. What specifically defines a 'wide row' in Cassandra's data model architecture?
Cassandra Data Model – keyspace, table, primary key, partition key, clustering columns, wide rows
Medium
A.A row that spans across multiple nodes in a cluster to balance the load.
B.A physical row on disk that exceeds 2 GB of storage space.
C.A table that has more than 100 column definitions in its CQL schema.
D.A single database partition that contains multiple logical rows organized by clustering columns.
Correct Answer: A single database partition that contains multiple logical rows organized by clustering columns.
Explanation:
In Cassandra, a 'wide row' refers to a partition (identified by a partition key) that contains multiple rows of data, sorted and uniquely identified by clustering columns.
29. Given a table created with PRIMARY KEY ((sensor_id, date), time), how is the data distributed across nodes and sorted on disk?
Cassandra Data Model – keyspace, table, primary key, partition key, clustering columns, wide rows
Medium
A.Distributed by time, sorted by sensor_id and date.
B.Distributed by a composite of sensor_id and date, sorted by time.
C.Distributed by sensor_id, sorted by date and time.
D.Distributed sequentially by all three fields combined.
Correct Answer: Distributed by a composite of sensor_id and date, sorted by time.
Explanation:
The inner parentheses define a composite partition key (sensor_id, date). The data is distributed based on the hash of this composite key. The remaining column, time, acts as the clustering column used to sort the data within that partition.
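To make the two roles concrete, here is a rough sketch in which the composite partition key picks the node and the clustering column orders rows inside the partition. Several things are assumptions for illustration: MD5 stands in for Cassandra's default Murmur3 partitioner, a simple modulo replaces real token ranges, and the node names are invented.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical cluster


def token_for(partition_key: tuple) -> int:
    """Hash the whole composite partition key into a single token."""
    raw = ":".join(str(part) for part in partition_key).encode("utf-8")
    return int.from_bytes(hashlib.md5(raw).digest()[:8], "big")


def owner_of(partition_key: tuple) -> str:
    """Pick a node from the token (modulo is a stand-in for token ranges)."""
    return NODES[token_for(partition_key) % len(NODES)]


partitions = {}  # (sensor_id, date) -> {time: value}


def insert(sensor_id, date, time, value):
    partitions.setdefault((sensor_id, date), {})[time] = value


def read_partition(sensor_id, date):
    """Return the partition's rows ordered by the clustering column (time)."""
    return sorted(partitions.get((sensor_id, date), {}).items())


insert("s1", "2023-10-01", "09:00", 21.5)
insert("s1", "2023-10-01", "08:00", 20.0)
insert("s1", "2023-10-02", "07:00", 19.0)

print(owner_of(("s1", "2023-10-01")), read_partition("s1", "2023-10-01"))
```

Both components of the partition key feed the hash, so `("s1", "2023-10-01")` and `("s1", "2023-10-02")` are separate partitions that may live on different nodes.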
30. You execute an INSERT statement in CQL, but a row with the exact same primary key already exists in the table. What is the outcome?
CQL (Cassandra Query Language) – insert, update, delete, select
Medium
A.Cassandra creates a secondary partition for the duplicate row.
B.Cassandra silently ignores the query and retains the old data.
C.Cassandra overwrites the existing row with the new values without checking if it existed.
D.Cassandra throws a DuplicateKeyException.
Correct Answer: Cassandra overwrites the existing row with the new values without checking if it existed.
Explanation:
In Cassandra, an INSERT behaves as an 'upsert'. It does not perform a read-before-write to check for existence; it simply writes the new data with a newer timestamp, effectively overwriting previous data.
31. If you need to store an unordered collection of unique email addresses for a user within a single column, which CQL data type is the most appropriate?
CQL (Cassandra Query Language) – data types
Medium
A.tuple<text>
B.map<text, text>
C.set<text>
D.list<text>
Correct Answer: set<text>
Explanation:
The set collection data type stores a collection of unique elements. Unlike list, a set guarantees uniqueness and does not maintain the order of insertion, making it perfect for unique tags or email addresses.
32. How does Cassandra internally execute a CQL UPDATE statement?
CQL (Cassandra Query Language) – update
Medium
A.It writes the updated column values with a new timestamp as a new entry, without reading the existing data.
B.It locks the partition across all replica nodes to ensure serializability, then updates the data.
C.It reads the current row, modifies the specified columns, and writes the complete row back to disk.
D.It searches the SSTables for the existing row and modifies it in-place.
Correct Answer: It writes the updated column values with a new timestamp as a new entry, without reading the existing data.
Explanation:
Cassandra avoids reads during write operations to maintain high performance. An UPDATE simply writes a new version of the specified columns with a newer timestamp. Compaction later merges these with the old data.
33. When a user executes a DELETE query in CQL, how does Cassandra process this request to ensure replicas accurately reflect the deletion?
CQL (Cassandra Query Language) – delete
Medium
A.It flags the node's JVM to perform garbage collection on the specific row memory.
B.It inserts a marker called a 'Tombstone' with a timestamp indicating the data is deleted.
C.It immediately erases the data from the Memtable and SSTables.
D.It replaces the deleted column values with null pointers.
Correct Answer: It inserts a marker called a 'Tombstone' with a timestamp indicating the data is deleted.
Explanation:
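Cassandra deletes data by writing a 'Tombstone', which is a marker indicating the deletion of a row or column. The tombstone is replicated like any other write; a replica that missed it receives it later through hinted handoff, read repair, or anti-entropy repair, so the data is logically deleted across the cluster. Compaction eventually purges the tombstone and the data it shadows once gc_grace_seconds has elapsed.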
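The sketch below illustrates why a timestamped marker is needed instead of an immediate erase: a stale write that arrives late at a replica must not resurrect deleted data. The store layout and function names are hypothetical simplifications.

```python
TOMBSTONE = object()  # sentinel standing in for a deletion marker


def apply_write(store, key, value, timestamp):
    """Record a new version of the cell; nothing is modified in place."""
    store.setdefault(key, []).append((value, timestamp))


def apply_delete(store, key, timestamp):
    """A delete is just another write whose value is a tombstone."""
    store.setdefault(key, []).append((TOMBSTONE, timestamp))


def read(store, key):
    """Return the newest version, or None if it is a tombstone or missing."""
    versions = store.get(key, [])
    if not versions:
        return None
    value, _ts = max(versions, key=lambda version: version[1])
    return None if value is TOMBSTONE else value


replica = {}
apply_write(replica, "k", "v1", timestamp=1)
apply_delete(replica, "k", timestamp=2)
# A delayed copy of the old write arrives after the delete.
apply_write(replica, "k", "v1", timestamp=1)
print(read(replica, "k"))  # the tombstone still wins because it is newer
```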
34. A developer runs a SELECT query with a WHERE clause on a column that is neither a partition key nor indexed. They append ALLOW FILTERING to make it execute. What is the performance impact?
CQL (Cassandra Query Language) – filtering
Medium
A.Cassandra temporarily builds an in-memory index for the duration of the query.
B.The query executes efficiently by utilizing the partition key cache.
C.Cassandra may scan all partitions on all nodes to find the requested data, causing severe performance degradation.
D.The query runs locally on a single node without contacting replicas, sacrificing consistency for speed.
Correct Answer: Cassandra may scan all partitions on all nodes to find the requested data, causing severe performance degradation.
Explanation:
ALLOW FILTERING forces Cassandra to process the query by potentially scanning the entire dataset (full table scan) across all nodes if the partition key is not provided, which is highly inefficient and resource-intensive.
35. In Cassandra, creating a Secondary Index is generally considered an anti-pattern and highly inefficient in which of the following scenarios?
CQL (Cassandra Query Language) – indexing
Medium
A.When querying an indexed column alongside a specific partition key.
B.When created on a column storing standard static categories like 'department'.
C.When created on a column with extremely high cardinality, such as a user's unique email or UUID.
D.When created on a clustering column to filter within a partition.
Correct Answer: When created on a column with extremely high cardinality, such as a user's unique email or UUID.
Explanation:
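Secondary indexes in Cassandra are distributed: each node indexes only the data it stores locally. A query on an indexed column without a partition key must therefore contact most or all nodes in the cluster. With a high cardinality column (many unique values), each lookup matches at most a handful of rows, so the network overhead and latency are severe relative to the size of the result.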
36. During the installation and configuration of an Apache Cassandra node, which file is primarily modified to define the cluster_name, listen_address, and seeds list?
Installation of Apache Cassandra
Medium
A.cassandra-env.sh
B.jvm.options
C.cassandra.yaml
D.logback.xml
Correct Answer: cassandra.yaml
Explanation:
The cassandra.yaml file is the main configuration file for Cassandra. It contains essential settings required to form a cluster, such as the cluster name, seed nodes, and network addresses.
37. What is the primary function of the compaction process in Cassandra administration?
Cassandra Administration - compaction
Medium
A.To merge multiple SSTables into a single new SSTable, purging tombstones and keeping only the most recent data.
B.To force Memtables to flush their in-memory contents to disk immediately.
C.To automatically distribute data partitions evenly across newly added nodes.
D.To compress the network payload during node-to-node Gossip communication.
Correct Answer: To merge multiple SSTables into a single new SSTable, purging tombstones and keeping only the most recent data.
Explanation:
Compaction is a background process that merges multiple fragmented SSTables into new ones, evicting tombstones (expired deleted data) and obsolete versions of overwritten data to reclaim disk space and optimize read performance.
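The merge-and-purge behaviour can be sketched as follows. This is a simplified illustration, not any particular compaction strategy: SSTables are modelled as dictionaries of key -> (value, timestamp), timestamps are plain seconds, and the names `compact` and `TOMBSTONE` are hypothetical.

```python
TOMBSTONE = object()  # sentinel standing in for a deletion marker


def compact(sstables, now, gc_grace_seconds):
    """Merge SSTables into one, keeping only the newest version of each key.

    Tombstones older than gc_grace_seconds are dropped together with the
    data they shadow; younger tombstones are kept so that replicas which
    missed the delete can still learn about it.
    """
    newest = {}
    for table in sstables:
        for key, (value, ts) in table.items():
            if key not in newest or ts > newest[key][1]:
                newest[key] = (value, ts)
    return {
        key: (value, ts)
        for key, (value, ts) in newest.items()
        if not (value is TOMBSTONE and now - ts >= gc_grace_seconds)
    }


older = {"a": ("1", 10), "b": ("x", 10)}
newer = {"a": ("2", 20), "b": (TOMBSTONE, 15), "c": (TOMBSTONE, 95)}

compacted = compact([older, newer], now=100, gc_grace_seconds=10)
print(sorted(compacted))  # keys that survive compaction
```

Here `a` keeps only its newest value, `b` disappears entirely because its tombstone is past the grace period, and the recent tombstone for `c` is retained.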
38. Which compaction strategy is optimized for time-series data workloads where older data is rarely queried or updated, allowing entire SSTables to be dropped when their TTL expires?
TimeWindowCompactionStrategy (TWCS) groups data into time windows. It is designed specifically for time-series data because it prevents the compaction of old data with new data, allowing older SSTables to be dropped entirely when their Time-To-Live (TTL) expires.
39. Assume a table employees is defined with PRIMARY KEY (department, employee_id). Which of the following SELECT queries will execute efficiently without requiring ALLOW FILTERING?
CQL (Cassandra Query Language) – select
Medium
A.SELECT * FROM employees WHERE department = 'Sales' AND employee_id = 1005;
B.SELECT * FROM employees WHERE employee_id = 1005;
C.SELECT * FROM employees WHERE department > 'IT';
D.SELECT * FROM employees WHERE department = 'Sales' ORDER BY department DESC;
Correct Answer: SELECT * FROM employees WHERE department = 'Sales' AND employee_id = 1005;
Explanation:
In Cassandra, you must provide the partition key (department) to execute an efficient read query. Providing both the partition key and the clustering column (employee_id) is valid and highly efficient.
40. You want to design a table to store temperature readings from sensors. The primary key is ((sensor_id), reading_time). By default, Cassandra sorts clustering columns in ascending order. How can you ensure the newest readings are retrieved first without specifying ORDER BY in your queries?
Cassandra Data Model – keyspace, table, primary key, partition key, clustering columns, wide rows
Medium
A.Declare reading_time DESC inside the PRIMARY KEY definition.
B.Create a secondary index on reading_time with a descending flag.
C.Specify WITH CLUSTERING ORDER BY (reading_time DESC) at the end of the CREATE TABLE statement.
D.It is impossible; sorting must always be handled by the client application.
Correct Answer: Specify WITH CLUSTERING ORDER BY (reading_time DESC) at the end of the CREATE TABLE statement.
Explanation:
Cassandra allows you to define the on-disk sorting order of clustering columns during table creation using the WITH CLUSTERING ORDER BY clause. This optimizes retrieval for queries that always require descending order.
41. In a 10-node Cassandra cluster using virtual nodes (vnodes) with num_tokens: 256, a single node catastrophically fails and its data must be rebuilt. Which of the following best describes the network traffic pattern during the streaming recovery process?
Cassandra Architecture – peer-to-peer architecture
Hard
A.Streaming is strictly constrained to the nodes residing in the same rack as the replacement node to avoid cross-rack traffic.
B.The cluster elects a temporary master node to coordinate the streaming of exactly 256 token ranges to the new node.
C.A single replica node containing the primary token range will stream all necessary data to the replacement node.
D.The replacement node will stream data simultaneously from all other 9 nodes in the cluster, distributing the load.
Correct Answer: The replacement node will stream data simultaneously from all other 9 nodes in the cluster, distributing the load.
Explanation:
Because virtual nodes (vnodes) randomly distribute 256 small token ranges for a given node across the entire ring, the replicas for those ranges are spread across all other nodes. Thus, the replacement node pulls small chunks of data from many different nodes simultaneously, distributing the rebuild load.
42. Cassandra utilizes the Phi Accrual Failure Detector as part of its Gossip protocol. How does this algorithm differ from a traditional heartbeat mechanism in determining node failures?
Gossip protocol
Hard
A.It outputs a continuous probability value representing the likelihood that a node has failed, adapting dynamically to network latency and jitter.
B.It mandates that a node is marked offline only if at least a QUORUM of nodes agree on the failure via Paxos.
C.It uses a fixed timeout threshold, but leverages UDP instead of TCP for faster heartbeat detection.
D.It increments a discrete 'generation number' every time a node misses a heartbeat, marking it dead when the number reaches a configured limit.
Correct Answer: It outputs a continuous probability value representing the likelihood that a node has failed, adapting dynamically to network latency and jitter.
Explanation:
The Phi Accrual Failure Detector calculates a continuous suspicion value (Phi) that reflects the likelihood of a node failure, based on the history of heartbeat arrival intervals. This makes it adaptive to varying network conditions (jitter) rather than relying on a rigid, binary timeout.
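One common way to compute such a value is sketched below, assuming heartbeat intervals follow an exponential distribution, so that Phi is the negative base-10 logarithm of the probability that a heartbeat would still arrive this late. The function name and sample intervals are hypothetical, and real detectors keep a sliding window of intervals rather than a fixed list.

```python
import math


def phi(time_since_last_heartbeat: float, past_intervals: list) -> float:
    """Suspicion level under an exponential model of heartbeat intervals.

    P(next heartbeat arrives later than t) = exp(-t / mean), and
    phi = -log10(P) = t / (mean * ln 10).
    """
    mean = sum(past_intervals) / len(past_intervals)
    return time_since_last_heartbeat / (mean * math.log(10))


steady = [1.0, 1.0, 1.0, 1.0]  # heartbeats roughly every second
slow = [2.0, 2.0, 2.0, 2.0]    # a slower, higher-latency link

# The same silence is more suspicious on the link that is normally fast.
for silence in (1.0, 5.0, 20.0):
    print(silence, round(phi(silence, steady), 2), round(phi(silence, slow), 2))
```

A node is convicted once Phi crosses a configured threshold (Cassandra's `phi_convict_threshold` defaults to 8), so a link with longer typical intervals tolerates longer silences before a peer is marked down.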
43. A cluster uses NetworkTopologyStrategy with two datacenters: DC1 with a Replication Factor (RF) of 3, and DC2 with an RF of 2. An application issues a write with Consistency Level EACH_QUORUM. If one node in DC1 goes offline, what happens to this write request?
Replication and consistency levels
Hard
A.It fails, because EACH_QUORUM requires all nodes in all datacenters to acknowledge the write.
B.It succeeds, because the LOCAL_QUORUM in DC1 is 2, and the 2 remaining nodes can acknowledge, while DC2's LOCAL_QUORUM is 2.
C.It fails, because DC1 cannot reach its LOCAL_QUORUM of 3 nodes.
D.It succeeds only if Hinted Handoff is enabled to temporarily store the missed write for the offline node.
Correct Answer: It succeeds, because the LOCAL_QUORUM in DC1 is 2, and the 2 remaining nodes can acknowledge, while DC2's LOCAL_QUORUM is 2.
Explanation:
EACH_QUORUM requires a quorum of replicas in every datacenter to acknowledge the write. Quorum is calculated as (RF / 2) + 1 using integer division. For DC1 (RF=3), the quorum is 2. For DC2 (RF=2), the quorum is 2. DC1 still has 2 replicas online and DC2 has both of its replicas online, so every datacenter meets its required quorum.
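The per-datacenter check can be sketched in a few lines. The function names and the dictionaries describing the cluster are hypothetical; only the quorum arithmetic comes from the explanation above.

```python
def quorum(rf: int) -> int:
    """Quorum size for a replication factor: (RF / 2) + 1 with integer division."""
    return rf // 2 + 1


def each_quorum_succeeds(replication: dict, live_replicas: dict) -> bool:
    """True when every datacenter has at least a quorum of live replicas."""
    return all(
        live_replicas.get(dc, 0) >= quorum(rf)
        for dc, rf in replication.items()
    )


replication = {"DC1": 3, "DC2": 2}

# One replica down in DC1: 2 of 3 is still a quorum, DC2 is untouched.
print(each_quorum_succeeds(replication, {"DC1": 2, "DC2": 2}))
# One replica down in DC2: 1 of 2 is not a quorum, so the write fails.
print(each_quorum_succeeds(replication, {"DC1": 3, "DC2": 1}))
```

The second case shows the cost of an RF of 2: losing a single replica in DC2 is enough to fail every EACH_QUORUM write.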
44. During a write operation, power is lost to a Cassandra node immediately after an insert is written to the CommitLog but before the Memtable is flushed to an SSTable. What happens to this data when the node is restarted?
Read and write paths
Hard
A.During startup, Cassandra replays the CommitLog to reconstruct the Memtable, ensuring no data loss.
B.The data is recovered by reading the Bloom filter and generating a synthetic Memtable entry.
C.Cassandra automatically requests the missing data from the coordinator node using Hinted Handoff.
D.The data is permanently lost and must be recovered via an anti-entropy repair from replica nodes.
Correct Answer: During startup, Cassandra replays the CommitLog to reconstruct the Memtable, ensuring no data loss.
Explanation:
The primary purpose of the CommitLog is crash recovery. When a node restarts, it automatically replays all mutations in the CommitLog that were not yet flushed to SSTables, thereby rebuilding the Memtable exactly as it was before the crash.
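Replay amounts to re-applying the logged mutations in order. The following is a minimal sketch that assumes the commit log is a list of (key, value, timestamp) tuples; the function name `replay` and the sample data are hypothetical.

```python
def replay(commit_log):
    """Rebuild a memtable by re-applying every logged mutation in order."""
    memtable = {}
    for key, value, timestamp in commit_log:
        current = memtable.get(key)
        # Keep the newest version of each key, as the original writes did.
        if current is None or timestamp >= current[1]:
            memtable[key] = (value, timestamp)
    return memtable


# Mutations that reached the commit log before the power loss.
surviving_log = [
    ("sensor:1", 20.5, 1),
    ("sensor:2", 18.0, 2),
    ("sensor:1", 21.0, 3),
]

recovered = replay(surviving_log)
print(recovered)
```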
45. Consider the read path in Cassandra. If an application queries a specific partition key, and the Bloom filter for a given SSTable returns a 'positive' result, but the data is actually absent from that SSTable (a false positive), what sequence of events occurs next?
Read and write paths
Hard
A.Cassandra throws a ReadTimeoutException because the false positive causes an infinite loop in the Compression Offset map.
B.Cassandra checks the Key Cache (if enabled), reads the Partition Summary, scans the Partition Index, seeks to the data block on disk, discovers the data is missing, and returns no data from this SSTable.
C.Cassandra bypasses the Key Cache, reads the Partition Summary, locates the data in the Partition Index, reads the data block, and throws a TombstoneOverwhelmingException.
D.Cassandra updates the Bloom filter to correct the false positive and immediately initiates a read repair.
Correct Answer: Cassandra checks the Key Cache (if enabled), reads the Partition Summary, scans the Partition Index, seeks to the data block on disk, discovers the data is missing, and returns no data from this SSTable.
Explanation:
A Bloom filter can yield false positives. When this happens, Cassandra continues the read path (Key Cache -> Partition Summary -> Partition Index -> disk seek) only to find no matching row on disk. It handles this gracefully by simply returning no data from that specific SSTable, albeit incurring an unnecessary disk I/O penalty.
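The false-positive behaviour is easy to reproduce with a toy Bloom filter. In this sketch the class and function names are hypothetical, SHA-256 is used purely for convenience, and a dictionary lookup stands in for the index and disk steps of the read path.

```python
import hashlib


class BloomFilterSketch:
    """Tiny Bloom filter: never a false negative, sometimes a false positive."""

    def __init__(self, size_bits=64, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored in a single integer

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def read_from_sstable(bloom, data, key):
    """Return (value, lookup_performed) for one SSTable."""
    if not bloom.might_contain(key):
        return None, False         # definite miss: the SSTable is skipped
    return data.get(key), True     # lookup happens; it may still find nothing


# A one-bit filter makes every lookup "positive" once any key is added.
tiny = BloomFilterSketch(size_bits=1, num_hashes=1)
tiny.add("present")
sstable = {"present": "row data"}

print(read_from_sstable(tiny, sstable, "present"))
print(read_from_sstable(tiny, sstable, "absent"))  # false positive, wasted lookup
```

The second call returns no value but still pays for the lookup, which is the wasted work a false positive costs.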
46. Given the table schema: CREATE TABLE sensor_data (sensor_id int, year int, month int, ts timestamp, value double, PRIMARY KEY ((sensor_id, year, month), ts)). Which of the following queries will FAIL to execute without ALLOW FILTERING?
Cassandra Data Model – primary key, partition key, clustering columns
Hard
A.SELECT * FROM sensor_data WHERE sensor_id = 123 AND year = 2023 AND month = 10 AND ts > '2023-10-01';
B.SELECT * FROM sensor_data WHERE sensor_id = 123 AND year = 2023 AND month = 10;
C.SELECT * FROM sensor_data WHERE sensor_id = 123 AND year = 2023 AND month = 10 ORDER BY ts DESC;
D.SELECT * FROM sensor_data WHERE sensor_id = 123 AND year = 2023;
Correct Answer: SELECT * FROM sensor_data WHERE sensor_id = 123 AND year = 2023;
Explanation:
The partition key is a composite key consisting of (sensor_id, year, month). Cassandra requires the exact and complete partition key for a targeted read. Omitting month means Cassandra does not know which nodes hold the data, requiring a full cluster scan (ALLOW FILTERING).
47. A developer heavily deletes individual clustering rows within a very large partition (a 'wide row'). Sometime later, read queries on this partition begin failing with TombstoneOverwhelmingException. What is the architectural reason for this failure?
Cassandra Data Model – wide rows
Hard
A.Tombstones act as secondary indexes, and exceeding the threshold causes index corruption in wide rows.
B.Cassandra keeps all tombstones in the Memtable indefinitely, causing an OutOfMemoryError.
C.The Bloom filter size exceeds the JVM heap limit because it must index every deleted cell.
D.To satisfy the read, Cassandra must scan and keep in memory all tombstones up to tombstone_failure_threshold to filter out deleted data, which protects the node from memory exhaustion.
Correct Answer: To satisfy the read, Cassandra must scan and keep in memory all tombstones up to tombstone_failure_threshold to filter out deleted data, which protects the node from memory exhaustion.
Explanation:
When rows are deleted, Cassandra writes a tombstone. During a read, Cassandra must scan these tombstones to ensure deleted data is not returned. If a query scans more tombstones than the tombstone_failure_threshold (default 100,000), it aborts the query to prevent node memory exhaustion and GC pauses.
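As a rough illustration of the threshold behaviour, the sketch below counts tombstones while scanning and aborts once the limit is exceeded. It is a toy model: `read_partition`, the `(value, is_tombstone)` cell representation, and the `TombstoneOverwhelmingError` class are hypothetical stand-ins, not Cassandra internals.

```python
class TombstoneOverwhelmingError(Exception):
    """Hypothetical stand-in for Cassandra's TombstoneOverwhelmingException."""


def read_partition(cells, failure_threshold=100_000):
    """Return live values, aborting once too many tombstones are scanned.

    `cells` is an iterable of (value, is_tombstone) pairs.
    """
    live = []
    tombstones = 0
    for value, is_tombstone in cells:
        if is_tombstone:
            tombstones += 1
            # Abort the whole read rather than keep accumulating tombstones.
            if tombstones > failure_threshold:
                raise TombstoneOverwhelmingError(
                    f"scanned more than {failure_threshold} tombstones"
                )
        else:
            live.append(value)
    return live
```

With a small threshold such as `failure_threshold=1`, a partition holding two tombstones already triggers the error, which mirrors what happens at scale with the default.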
48A table contains a column defined as frozen<map<text, text>>. How does the frozen keyword modify the behavior of this collection compared to a standard (non-frozen) collection?
CQL (Cassandra Query Language) – data types
Hard
A.It allows the map to be updated concurrently by multiple clients using lightweight transactions (LWT).
B.It prevents the collection from generating tombstones when the row is deleted, bypassing garbage collection pauses.
C.It forces the collection to be stored in the Partition Cache rather than on disk, improving read performance for small maps.
D.It serializes the entire collection into a single, immutable binary value, meaning elements cannot be added or updated individually; the whole map must be overwritten.
Correct Answer: It serializes the entire collection into a single, immutable binary value, meaning elements cannot be added or updated individually; the whole map must be overwritten.
Explanation:
The frozen keyword treats a collection (or UDT) as a single, immutable serialized value. You cannot update, append, or delete individual elements within a frozen collection; you must overwrite the entire collection value.
49When creating a keyspace, why is NetworkTopologyStrategy generally preferred over SimpleStrategy even for a single-datacenter deployment at the beginning of a project?
CQL (Cassandra Query Language) – creating keyspaces and tables
Hard
A.NetworkTopologyStrategy automatically provisions vnodes, whereas SimpleStrategy restricts the cluster to single-token architecture.
B.SimpleStrategy distributes replicas on the same physical rack, risking simultaneous failure, while NetworkTopologyStrategy inherently forces rack-awareness.
C.SimpleStrategy places replicas contiguously on the token ring ignoring topology. If a second datacenter is ever added, migrating from SimpleStrategy to NetworkTopologyStrategy requires extensive downtime and manual intervention.
D.SimpleStrategy uses a fixed Gossip interval which causes network flooding, whereas NetworkTopologyStrategy dynamically adjusts Gossip frequency.
Correct Answer: SimpleStrategy places replicas contiguously on the token ring ignoring topology. If a second datacenter is ever added, migrating from SimpleStrategy to NetworkTopologyStrategy requires extensive downtime and manual intervention.
Explanation:
SimpleStrategy is purely based on token ring position and has no concept of datacenters or racks. Altering a keyspace from SimpleStrategy to NetworkTopologyStrategy later to support multi-DC setups can be complex and risky. Starting with NetworkTopologyStrategy allows seamless expansion to multiple datacenters.
50A developer executes the following CQL statement: UPDATE users SET email = 'new@test.com' WHERE user_id = 999 IF email = 'old@test.com'; Which of the following describes the internal protocol used to execute this statement?
CQL (Cassandra Query Language) – insert, update, delete
Hard
A.Two-Phase Commit (2PC), locking the row across all replicas before applying the update.
B.Anti-entropy repair, triggering an immediate Merkle tree comparison to ensure the previous email value is consistent.
C.Hinted Handoff, utilizing a background queue to ensure the update is applied eventually.
D.Paxos consensus protocol, requiring four round-trips (Prepare/Promise, Read/Results, Propose/Accept, Commit/Ack) to ensure linearizability.
Correct Answer: Paxos consensus protocol, requiring four round-trips (Prepare/Promise, Read/Results, Propose/Accept, Commit/Ack) to ensure linearizability.
Explanation:
The IF condition in CQL triggers a Lightweight Transaction (LWT). Cassandra implements LWTs using the Paxos consensus protocol, which provides linearizable consistency at the cost of multiple network round-trips (typically 4 phases), making it significantly slower than a standard write.
51Which of the following scenarios is the most inappropriate use case for a native Cassandra Secondary Index, leading to severe performance degradation (a 'scatter-gather' problem)?
CQL (Cassandra Query Language) – select, filtering, indexing
Hard
A.Indexing a clustering column to filter within a specific partition key.
B.Indexing a highly unique column (e.g., user_email) in a cluster with hundreds of nodes without providing the partition key in the query.
C.Indexing a column with very low cardinality (e.g., a boolean is_active flag) across a small cluster.
D.Indexing a frozen map to query exact matches of the entire map payload.
Correct Answer: Indexing a highly unique column (e.g., user_email) in a cluster with hundreds of nodes without providing the partition key in the query.
Explanation:
Secondary indexes in Cassandra are distributed locally on each node. If you query a high-cardinality index without the partition key, the coordinator must send a request to every node in the cluster (scatter-gather) and wait for all responses. This scales extremely poorly in large clusters.
52A team configures TimeWindowCompactionStrategy (TWCS) for a time-series table holding application logs. They occasionally receive logs that are timestamped 30 days in the past (late-arriving data). How will this late-arriving data impact the compaction strategy?
Cassandra Administration - compaction
Hard
A.TWCS will automatically update the timestamps of the late-arriving logs to the current time to maintain strict chronological SSTables.
B.TWCS will dynamically expand the current time window by 30 days to encompass the late-arriving data, triggering a massive major compaction.
C.The data will be rejected by Cassandra because TWCS enforces a strict monotonic write pattern.
D.It will force the creation of new SSTables in older time windows, which may never be compacted with the original SSTables for that period, reducing read performance and breaking TTL whole-file drop efficiency.
Correct Answer: It will force the creation of new SSTables in older time windows, which may never be compacted with the original SSTables for that period, reducing read performance and breaking TTL whole-file drop efficiency.
Explanation:
TWCS expects chronologically ordered writes. Late-arriving data writes to an old time window. Because the old window has likely already been compacted, Cassandra creates a new, small SSTable for that old window. This fragments data for that period and, if it contains varying TTLs, prevents Cassandra from efficiently dropping the entire SSTable when data expires.
53When tuning the JVM for an Apache Cassandra 4.x node with 128GB of RAM, what is the primary architectural rationale for using G1GC (Garbage-First Garbage Collector) rather than CMS (Concurrent Mark Sweep)?
Installation of Apache Cassandra
Hard
A.G1GC is designed to handle larger heap sizes (e.g., 31GB) by compacting memory in regions, virtually eliminating the long Stop-The-World (STW) fragmentation pauses that plague CMS on large heaps.
B.G1GC dynamically resizes the Memtable to match heap usage, whereas CMS uses static off-heap allocation.
C.CMS inherently limits the heap size to 8GB, whereas G1GC supports heaps up to 1TB.
D.G1GC intercepts Gossip protocol messages directly in kernel space, reducing CPU overhead during garbage collection.
Correct Answer: G1GC is designed to handle larger heap sizes (e.g., 31GB) by compacting memory in regions, virtually eliminating the long Stop-The-World (STW) fragmentation pauses that plague CMS on large heaps.
Explanation:
CMS suffers from memory fragmentation over time, which eventually requires a full, Stop-The-World (STW) garbage collection cycle that can cause a node to pause for several seconds (often triggering false failure detections). G1GC avoids this by operating on smaller memory regions and compacting them continuously, allowing larger heaps (like ~31GB) with predictable pause times.
54A cluster uses a Replication Factor of 3. Node A goes down for 4 hours. The max_hint_window_in_ms is set to 3 hours. When Node A comes back online, what is the state of the writes that occurred during its downtime, and what action must be taken?
Read and write paths
Hard
A.Node A receives the first 3 hours of hints via Hinted Handoff, but misses the 4th hour. An anti-entropy repair (e.g., nodetool repair) must be run to synchronize the missing data.
B.Node A has missed 4 hours of writes. The coordinator automatically replays all hints from the 4-hour window.
C.Node A uses the Gossip protocol to pull the missing 4 hours of CommitLogs directly from Node B and Node C.
D.Node A receives no hints because Hinted Handoff is disabled the moment a node exceeds the max window. The data is permanently lost.
Correct Answer: Node A receives the first 3 hours of hints via Hinted Handoff, but misses the 4th hour. An anti-entropy repair (e.g., nodetool repair) must be run to synchronize the missing data.
Explanation:
Coordinators store hints for a downed node only up to max_hint_window_in_ms (default 3 hours). Writes occurring after 3 hours are not saved as hints to prevent the coordinator's disk from filling up. Thus, when Node A returns, it gets the 3 hours of hints, but a manual or scheduled anti-entropy repair is required to fetch the data missed during the 4th hour.
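The arithmetic behind this answer can be written out as a short sketch. The helper name `hint_coverage` is an assumption for illustration; it only splits a downtime into the part covered by hints and the part that needs repair.

```python
HOUR_MS = 60 * 60 * 1000


def hint_coverage(downtime_ms, max_hint_window_ms=3 * HOUR_MS):
    """Split a node's downtime into (hinted_ms, unhinted_ms).

    Writes during the first `max_hint_window_ms` are stored as hints;
    anything after that must be recovered by anti-entropy repair.
    """
    hinted = min(downtime_ms, max_hint_window_ms)
    unhinted = downtime_ms - hinted
    return hinted, unhinted


hinted, unhinted = hint_coverage(4 * HOUR_MS)
print(hinted // HOUR_MS, unhinted // HOUR_MS)  # 3 1
```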
55A team models IoT data with PRIMARY KEY (device_id, timestamp). Over several years, certain devices produce millions of readings, resulting in massive unbounded partitions. Which data modeling technique best resolves this 'wide row' anti-pattern while maintaining efficient reads?
Cassandra Data Model – keyspace, table, primary key, partition key, clustering columns, wide rows
Hard
A.Changing the primary key to PRIMARY KEY (timestamp, device_id) to distribute data evenly across all days.
B.Using ALLOW FILTERING on all read queries so the coordinator can manage the memory payload dynamically.
C.Implementing 'bucketing' by introducing a time-based artificial column (e.g., month_year) into the partition key, making it PRIMARY KEY ((device_id, month_year), timestamp).
D.Adding a high-cardinality secondary index on the timestamp column.
Correct Answer: Implementing 'bucketing' by introducing a time-based artificial column (e.g., month_year) into the partition key, making it PRIMARY KEY ((device_id, month_year), timestamp).
Explanation:
Bucketing splits a continuously growing partition into smaller, bounded partitions (e.g., one partition per device per month). This prevents single partitions from exceeding size limits (wide rows) and avoids compaction and memory issues, while keeping queries localized to specific partitions when the time bucket is known.
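A minimal sketch of how an application might derive the bucketed partition key before writing. The function name `partition_key`, the `"YYYY-MM"` bucket format, and the device identifier are assumptions for illustration; any stable time-based bucket would serve.

```python
from datetime import datetime, timezone


def partition_key(device_id, ts):
    """Return (device_id, month_year) for a reading taken at `ts`."""
    return (device_id, ts.strftime("%Y-%m"))


# Readings from the same month share a partition; the next month starts a new one.
a = partition_key("dev-1", datetime(2023, 10, 1, tzinfo=timezone.utc))
b = partition_key("dev-1", datetime(2023, 10, 31, tzinfo=timezone.utc))
c = partition_key("dev-1", datetime(2023, 11, 1, tzinfo=timezone.utc))
print(a == b, a == c)  # True False
```

The trade-off is that a read spanning several months has to query one partition per bucket.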
56For a workload characterized by overwhelming write volume (e.g., heavy inserts, rare updates/deletes) and relatively few reads, which compaction strategy minimizes write amplification and CPU overhead?
Cassandra Administration - compaction
Hard
A.TimeWindowCompactionStrategy (TWCS) with an infinite window size
Correct Answer: SizeTieredCompactionStrategy (STCS)
Explanation:
SizeTieredCompactionStrategy (STCS) triggers compaction when several SSTables of similar size accumulate and merges them into one larger SSTable. It suits write-heavy workloads because it has lower write amplification and CPU overhead than LeveledCompactionStrategy (LCS), which repeatedly rewrites data to keep the SSTables within each level non-overlapping.
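The "similar size" grouping can be sketched roughly as follows. This is a simplified model rather than Cassandra's implementation: the parameter names `bucket_low`, `bucket_high`, and `min_threshold` mirror STCS option names, but the defaults and the grouping loop here should be treated as assumptions.

```python
def size_tiered_buckets(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of similar size."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            # A table joins a bucket if it is close to that bucket's average size.
            if avg * bucket_low <= size <= avg * bucket_high:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4):
    """Return buckets holding enough similar-sized SSTables to compact."""
    return [b for b in size_tiered_buckets(sizes) if len(b) >= min_threshold]


print(compaction_candidates([10, 11, 9, 10, 100, 400]))  # [[9, 10, 10, 11]]
```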
57Which of the following describes a schema restriction that applies to COUNTER columns but not to standard integer columns?
CQL (Cassandra Query Language) – data types
Hard
A.COUNTER columns bypass the Memtable and write directly to the SSTable to prevent concurrent modification exceptions.
B.COUNTER columns cannot be used in tables alongside non-counter columns (other than primary keys).
C.COUNTER values are synchronized across datacenters using Paxos, making them highly consistent.
D.COUNTER updates require a read-before-write to fetch the current value, meaning they incur a higher latency penalty than standard idempotent writes.
Correct Answer: COUNTER columns cannot be used in tables alongside non-counter columns (other than primary keys).
Explanation:
In Cassandra, a table that contains COUNTER columns can only contain primary key columns and other COUNTER columns. Standard columns and counters cannot be mixed in the same table due to the distinct, non-idempotent internal mechanisms (Counter Mutations) required to manage counter values.
58In a Cassandra cluster utilizing a token ring topology, what specific mechanism ensures that a hot spot does not form if one physical node has significantly more disk space and CPU capacity than the others?
Cassandra Architecture – peer-to-peer architecture
Hard
A.Assigning a higher number of vnodes (tokens) to the more powerful node in the cassandra.yaml configuration.
B.Using LeveledCompactionStrategy specifically on the stronger node to dynamically allocate more data.
C.Applying the WeightingSnitch to route more reads to the node with lower CPU utilization.
D.Configuring a dedicated load-balancer proxy in front of the Gossip protocol.
Correct Answer: Assigning a higher number of vnodes (tokens) to the more powerful node in the cassandra.yaml configuration.
Explanation:
By adjusting the num_tokens setting in cassandra.yaml, a more powerful node can be assigned more virtual nodes (vnodes). This gives the node a proportionally larger share of the token ring, causing it to hold more data and serve more requests, effectively balancing the load across heterogeneous hardware.
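The proportional effect of `num_tokens` can be approximated with a short calculation. This sketch assumes tokens are spread evenly, so a node's expected share of the ring is simply its token count divided by the total; the node names and token counts are hypothetical.

```python
def ring_share(num_tokens_by_node):
    """Return each node's expected fraction of the token ring."""
    total = sum(num_tokens_by_node.values())
    return {node: tokens / total for node, tokens in num_tokens_by_node.items()}


# node-c has twice the tokens of its peers, so it owns about half the ring.
shares = ring_share({"node-a": 16, "node-b": 16, "node-c": 32})
print(shares["node-c"])  # 0.5
```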
59If a developer executes an UPDATE statement in CQL on a row where the primary key does not currently exist in the database, what is the result?
CQL (Cassandra Query Language) – insert, update, delete
Hard
A.Cassandra throws an InvalidQueryException indicating the row was not found.
B.Cassandra generates a tombstone for the non-existent row and increments the mutation clock.
C.Cassandra applies the update, effectively performing an 'upsert', resulting in a new row being created.
D.Cassandra waits for the row to be created via Hinted Handoff, blocking the query.
Correct Answer: Cassandra applies the update, effectively performing an 'upsert', resulting in a new row being created.
Explanation:
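In Cassandra, INSERT and UPDATE are both 'upserts' and follow the same write path. Executing an UPDATE on a non-existent primary key simply creates the row with the provided column values. One subtle difference: INSERT also writes a row marker, so a row created only by UPDATE disappears if all of its non-key columns are later deleted or set to null.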
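The upsert behaviour can be pictured with a toy in-memory table. This is only an analogy under simplifying assumptions (a plain dictionary keyed by primary key, no timestamps or tombstones); the `update` helper is hypothetical.

```python
table = {}  # primary key -> {column: value}


def update(table, primary_key, **columns):
    """Apply column values, creating the row if it does not exist yet."""
    row = table.setdefault(primary_key, {})
    row.update(columns)
    return row


# No row with key 999 exists yet, but the update still succeeds.
update(table, 999, email="new@test.com")
print(table)  # {999: {'email': 'new@test.com'}}
```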
60An application requires strong consistency (every read observes the most recent acknowledged write) for critical read and write operations across a multi-DC cluster. Which combination of Consistency Levels achieves this cluster-wide without relying on Paxos (Lightweight Transactions)?
Replication and consistency levels
Hard
A.Write at QUORUM, Read at LOCAL_QUORUM
B.Write at QUORUM, Read at QUORUM
C.Write at LOCAL_QUORUM, Read at LOCAL_QUORUM
D.Write at EACH_QUORUM, Read at LOCAL_QUORUM
Correct Answer: Write at QUORUM, Read at QUORUM
Explanation:
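Cassandra provides strong consistency when the number of replicas that acknowledge a write (W) plus the number of replicas consulted on a read (R) exceeds the Replication Factor (RF): W + R > RF. QUORUM is a majority of all replicas across every datacenter, so using it for both reads and writes guarantees that the read and write sets overlap in at least one replica, and a read will always see the most recent acknowledged write. Note that this overlap guarantee is weaker than the linearizability that Lightweight Transactions provide.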
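The overlap condition is easy to check numerically. In this sketch the helper names `quorum` and `reads_see_latest_write` are assumptions, and the example of two datacenters with RF 3 each (six replicas in total) is hypothetical.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(write_acks, read_replicas, replication_factor):
    """True when the read and write replica sets must overlap (W + R > RF)."""
    return write_acks + read_replicas > replication_factor


rf = 6  # e.g. two datacenters with RF 3 each
q = quorum(rf)
print(q, reads_see_latest_write(q, q, rf))  # 4 True
print(reads_see_latest_write(1, 1, 3))  # False
```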