1MongoDB is a popular example of which type of NoSQL database?
Introduction to MongoDB
Easy
A.Document-oriented
B.Column-family
C.Key-value store
D.Graph database
Correct Answer: Document-oriented
Explanation:
MongoDB stores data in flexible, JSON-like documents with dynamic schemas, which is the defining characteristic of a document-oriented database.
Incorrect! Try again.
2Which of the following best describes a key difference between SQL and NoSQL databases regarding data structure?
SQL vs NoSQL
Easy
A.SQL databases use a dynamic schema, while NoSQL databases use a rigid schema.
B.Both SQL and NoSQL databases always require a rigid, predefined schema.
C.SQL databases typically require a predefined schema, while NoSQL databases are often schema-less or have a dynamic schema.
D.SQL is for unstructured data, while NoSQL is for structured data.
Correct Answer: SQL databases typically require a predefined schema, while NoSQL databases are often schema-less or have a dynamic schema.
Explanation:
Relational (SQL) databases enforce a strict schema at the table level, whereas NoSQL databases offer flexibility, allowing documents or items in the same collection to have different structures.
Incorrect! Try again.
3In MongoDB, what is the term for a group of documents, which is analogous to a table in a relational database?
Structure of MongoDB
Easy
A.Database
B.Recordset
C.Table
D.Collection
Correct Answer: Collection
Explanation:
A collection in MongoDB is a grouping of MongoDB documents. It is the equivalent of a table in a relational database system.
Incorrect! Try again.
4What does JSON stand for?
JSON databases
Easy
A.Java Standard Object Notation
B.JavaScript Ordered Network
C.Java-Styled Object Naming
D.JavaScript Object Notation
Correct Answer: JavaScript Object Notation
Explanation:
JSON is an acronym for JavaScript Object Notation. It's a lightweight, text-based data-interchange format.
Incorrect! Try again.
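JSON's role as a lightweight, text-based interchange format is easy to see with Python's standard json module; a minimal sketch (the field names are illustrative):

```python
import json

# Serialize a small structure to JSON text, then parse it back
record = {"format": "JSON", "lightweight": True, "values": [1, 2, 3]}
text = json.dumps(record)

# The encoded form is plain text; round-tripping preserves the structure
assert json.loads(text) == record
print(text)
```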
5Amazon DynamoDB is a fully managed NoSQL database service offered by which cloud provider?
DynamoDB
Easy
A.Google Cloud
B.Amazon Web Services (AWS)
C.Microsoft Azure
D.IBM Cloud
Correct Answer: Amazon Web Services (AWS)
Explanation:
DynamoDB is a key-value and document database service that is part of the Amazon Web Services (AWS) portfolio.
Incorrect! Try again.
6Which command is used to query or find documents in a MongoDB collection?
Working with MongoDB
Easy
A.SELECT * FROM collection;
B.show documents in collection;
C.db.collection.find()
D.db.collection.get()
Correct Answer: db.collection.find()
Explanation:
The find() method is the primary way to query a collection in MongoDB. Calling it with no arguments, like db.mycollection.find(), retrieves all documents.
Incorrect! Try again.
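As a rough illustration of find()'s equality-match semantics, here is a toy Python model over a list of dicts (the function and sample data are hypothetical; real MongoDB adds query operators, indexes, and cursors):

```python
# Toy model of db.collection.find({...}): with no filter, everything is
# returned; with a query document, only documents matching every field are
def find(docs, query=None):
    query = query or {}
    return [d for d in docs if all(d.get(k) == v for k, v in query.items())]

users = [{"name": "Ada", "role": "admin"}, {"name": "Bob", "role": "user"}]

print(find(users))                     # no filter: returns every document
print(find(users, {"role": "admin"}))  # returns only Ada's document
```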
7What is a major advantage of using a serverless database?
Serverless cloud database
Easy
A.It only supports the SQL query language.
B.It offers a fixed capacity that never changes.
C.It has a pay-for-what-you-use pricing model and handles scaling automatically.
D.It requires developers to manually provision and scale servers.
Correct Answer: It has a pay-for-what-you-use pricing model and handles scaling automatically.
Explanation:
The core benefit of 'serverless' is abstracting away server management. Users pay for the actual usage, and the cloud provider manages all the infrastructure provisioning and scaling.
Incorrect! Try again.
8What is the primary purpose of creating an index on a field in a database?
Index creation & performance comparison using EXPLAIN
Easy
A.To increase the physical storage size of the database.
B.To make write operations slower and more secure.
C.To encrypt the data within the collection.
D.To speed up the performance of read queries.
Correct Answer: To speed up the performance of read queries.
Explanation:
Indexes are special lookup tables that the database search engine can use to dramatically speed up the time it takes to find records in a query.
Incorrect! Try again.
9Vector databases are specifically designed to store and query what kind of data?
Vector Databases
Easy
Correct Answer: Vector embeddings (numerical representations of data).
Explanation:
Vector databases are purpose-built to handle vector embeddings, which are numerical representations of data used in AI/ML applications for tasks like similarity search.
Incorrect! Try again.
10Which of the following is a valid JSON representation for a product with a name "Keyboard" and a price of 75?
JSON databases
Easy
Correct Answer: { "name": "Keyboard", "price": 75 }
Explanation:
Valid JSON syntax requires keys to be enclosed in double quotes ("), followed by a colon, and then the value. The entire object is enclosed in curly braces ({}).
Incorrect! Try again.
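The syntax rule can be checked with Python's standard json parser; a small sketch:

```python
import json

# Double-quoted keys and values: valid JSON
valid = '{ "name": "Keyboard", "price": 75 }'
parsed = json.loads(valid)
print(parsed)  # {'name': 'Keyboard', 'price': 75}

# Single-quoted keys are NOT valid JSON and fail to parse
invalid = "{ 'name': 'Keyboard', 'price': 75 }"
try:
    json.loads(invalid)
except json.JSONDecodeError:
    print("rejected: single quotes are not valid JSON")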
11The acronym "NoSQL" is most commonly interpreted as:
SQL vs NoSQL
Easy
A."Not Only SQL"
B."Non-Standard Query Language"
C."New SQL"
D."No SQL Allowed"
Correct Answer: "Not Only SQL"
Explanation:
"Not Only SQL" reflects that these databases may support SQL-like query languages but are not limited to the relational model and often provide more flexibility.
Incorrect! Try again.
12In MongoDB, what is the basic unit of data that is analogous to a row in a relational database?
Structure of MongoDB
Easy
A.An Index
B.A Field
C.A Document
D.A Schema
Correct Answer: A Document
Explanation:
A document is a single record in a MongoDB collection. It is a data structure composed of field and value pairs, similar in structure to JSON objects.
Incorrect! Try again.
13MongoDB stores data in a binary-encoded JSON format. What is this format called?
Introduction to MongoDB
Easy
A.MsgPack
B.BSON
C.XML
D.YAML
Correct Answer: BSON
Explanation:
BSON, which stands for Binary JSON, is the binary-encoded serialization format MongoDB uses to store documents. It extends JSON with additional data types.
Incorrect! Try again.
14What is the primary data model for Amazon DynamoDB?
DynamoDB
Easy
A.Time-series
B.Key-value and Document
C.Graph-based
D.Relational
Correct Answer: Key-value and Document
Explanation:
DynamoDB is a NoSQL database that supports both key-value and document data structures, giving it flexibility for various use cases.
Incorrect! Try again.
15Which of the following is a good example of a serverless database?
Serverless cloud database
Easy
A.A self-hosted MySQL server on a physical machine.
B.A local SQLite file.
C.Amazon DynamoDB.
D.Microsoft Excel.
Correct Answer: Amazon DynamoDB.
Explanation:
Amazon DynamoDB is a prime example of a serverless database where the cloud provider manages all the infrastructure, and users interact with it via APIs.
Incorrect! Try again.
16To add a single new employee document to an employees collection in MongoDB, which method is typically used?
Working with MongoDB
Easy
A.db.employees.insertOne()
B.db.employees.saveNew()
C.db.employees.addOne()
D.db.employees.create()
Correct Answer: db.employees.insertOne()
Explanation:
The insertOne() command is the standard method for inserting a single document into a MongoDB collection.
Incorrect! Try again.
17In JSON, what structure is used to represent an ordered list of values?
JSON databases
Easy
A.A String ("")
B.A Tuple (())
C.An Object ({})
D.An Array ([])
Correct Answer: An Array ([])
Explanation:
An array in JSON is an ordered collection of values, enclosed in square brackets [], with values separated by commas.
Incorrect! Try again.
18What is the function of the EXPLAIN command in a database like MongoDB?
Index creation & performance comparison using EXPLAIN
Easy
A.It returns information on the query plan, showing how the query will be executed.
B.It explains the data types used in the collection.
C.It executes the query and returns the results.
D.It provides a plain-English explanation of the database's purpose.
Correct Answer: It returns information on the query plan, showing how the query will be executed.
Explanation:
The EXPLAIN command is a powerful tool for performance analysis. It provides a report on the query execution plan, including which indexes were used (if any).
Incorrect! Try again.
19A common use case for vector databases in AI applications is:
Vector Databases
Easy
A.Storing user session data.
B.Standard transactional processing.
C.Logging server errors.
D.Performing similarity searches for recommendation engines.
Correct Answer: Performing similarity searches for recommendation engines.
Explanation:
Vector databases excel at finding items that are 'similar' based on their vector representations. This is fundamental for recommendation systems, image search, and semantic text search.
Incorrect! Try again.
20Which type of database scaling involves increasing the resources (CPU, RAM) of a single server?
SQL vs NoSQL
Easy
A.Horizontal Scaling (scaling out)
B.Parallel Scaling
C.Vertical Scaling (scaling up)
D.Diagonal Scaling
Correct Answer: Vertical Scaling (scaling up)
Explanation:
Vertical scaling, or scaling up, means adding more resources like CPU or RAM to an existing server. Traditional SQL databases often scale vertically, while NoSQL databases are typically designed to scale horizontally.
Incorrect! Try again.
21A social media application needs to store user-generated content which has a highly variable structure (e.g., text posts, video posts with different metadata, polls). Which database model is more advantageous and why?
SQL vs NoSQL
Medium
A.NoSQL, because it only supports key-value storage which is simple for all post types.
B.NoSQL, because its flexible/dynamic schema allows for storing documents with different structures in the same collection.
C.SQL, because its rigid schema ensures data integrity and consistency for all post types.
D.SQL, because JOIN operations are highly optimized for retrieving varied data types.
Correct Answer: NoSQL, because its flexible/dynamic schema allows for storing documents with different structures in the same collection.
Explanation:
NoSQL document databases like MongoDB excel in scenarios with evolving or varied data structures. A flexible schema means you don't have to define all possible columns beforehand, making it easy to store diverse content types without needing complex table structures or frequent schema alterations.
Incorrect! Try again.
22MongoDB stores data in BSON format, which is a binary representation of JSON. What is a key advantage of using BSON over plain JSON for a database's internal storage format?
Structure of MongoDB
Medium
A.BSON is human-readable, making it easier to debug data directly on disk.
B.BSON enforces a strict schema for all documents in a collection.
C.BSON supports additional data types beyond JSON's string, number, boolean, array, and object (e.g., Date, ObjectId, binary data).
D.BSON is a text-based format, which reduces parsing overhead compared to binary formats.
Correct Answer: BSON supports additional data types beyond JSON's string, number, boolean, array, and object (e.g., Date, ObjectId, binary data).
Explanation:
While BSON is also designed for efficient traversal and speed, its primary advantage over plain JSON is the extended set of data types it supports. This allows for richer data representation within the database, such as native handling of dates and unique identifiers (ObjectId), which is crucial for database operations.
Incorrect! Try again.
23Given a students collection with documents like { "name": "John Doe", "major": "Computer Science", "grades": [85, 92, 78] }, which query finds all students majoring in 'Computer Science' who have at least one grade greater than 90?
Working with MongoDB
Medium
A.db.students.find({ "major": "Computer Science", "grades": { $gt: 90 } })
Correct Answer: db.students.find({ "major": "Computer Science", "grades": { $gt: 90 } })
Explanation:
In MongoDB, when a query operator like $gt is applied to an array field, it matches if at least one element in the array satisfies the condition. Listing both conditions in the same query document implicitly combines them with AND logic, making this the most direct and correct way to express the requirement.
Incorrect! Try again.
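The any-element matching semantics can be sketched in plain Python (the matches helper and the sample data are illustrative, not MongoDB code):

```python
# An operator applied to an array field matches if ANY element satisfies it,
# and multiple fields in one query document are implicitly ANDed together
def matches(doc):
    return doc["major"] == "Computer Science" and any(g > 90 for g in doc["grades"])

students = [
    {"name": "John Doe", "major": "Computer Science", "grades": [85, 92, 78]},
    {"name": "Jane Roe", "major": "Computer Science", "grades": [70, 88, 90]},
]

# Only John has a grade strictly greater than 90
print([s["name"] for s in students if matches(s)])  # ['John Doe']
```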
24You execute db.collection.explain('executionStats').find({ 'status': 'A' }) on a large collection. The output's winningPlan shows a stage of COLLSCAN and totalDocsExamined is equal to the total number of documents in the collection. What is the most logical first step to optimize this query?
Index creation & performance comparison using EXPLAIN
Medium
A.Rewrite the query to use the aggregation framework.
B.Shard the collection across multiple servers.
C.Create a single-field index on the status field.
D.Increase the RAM on the database server.
Correct Answer: Create a single-field index on the status field.
Explanation:
A COLLSCAN stage indicates that MongoDB had to perform a collection scan, reading every document to find the ones matching the filter. This is highly inefficient. Creating an index on the status field will allow the database to quickly locate the relevant documents without scanning the entire collection, drastically reducing totalDocsExamined and improving performance.
Incorrect! Try again.
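The difference between a collection scan and an index lookup can be mimicked in a few lines of Python, using a dict as a stand-in for a single-field index (a toy model, not MongoDB internals):

```python
from collections import defaultdict

docs = [{"_id": i, "status": "A" if i % 10 == 0 else "B"} for i in range(1000)]

# COLLSCAN: every one of the 1000 documents is examined
scanned = [d for d in docs if d["status"] == "A"]

# Single-field "index" on status: the lookup touches only matching documents
index = defaultdict(list)
for d in docs:
    index[d["status"]].append(d)
via_index = index["A"]

print(len(docs), len(via_index))  # 1000 100
assert scanned == via_index
```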
25An application requires read operations that always return the most recently completed write value. When querying a DynamoDB table, which type of read operation must be specified to guarantee this behavior, and what is the trade-off?
DynamoDB
Medium
A.A Transactional Read, which is only used for ACID-compliant operations.
B.A Strongly Consistent Read, which may have higher latency and cost more read capacity units.
C.A Global Secondary Index read, which is always strongly consistent.
D.An Eventually Consistent Read, which has lower latency but might return stale data.
Correct Answer: A Strongly Consistent Read, which may have higher latency and cost more read capacity units.
Explanation:
DynamoDB replicates data across multiple facilities. An Eventually Consistent Read (the default) might return a value before the write has propagated everywhere, resulting in stale data. A Strongly Consistent Read ensures the response reflects all successful prior writes, but at the cost of consuming twice as many read capacity units (RCUs) as an eventually consistent read and potentially having higher latency.
Incorrect! Try again.
26What core characteristic of a serverless database like Amazon DynamoDB or Google Firestore makes it particularly suitable for a startup building a new application with unpredictable user adoption rates?
Serverless cloud database
Medium
A.Automatic and seamless scaling of throughput and storage without manual intervention.
B.The requirement to run on a specific operating system.
C.Fixed monthly pricing, which simplifies budget planning.
D.Support for standard SQL query language.
Correct Answer: Automatic and seamless scaling of throughput and storage without manual intervention.
Explanation:
Serverless databases abstract away the underlying infrastructure. This means developers don't need to provision servers or predict capacity. The database automatically scales to handle the load, whether it's a few requests per hour or thousands per second. This is ideal for new applications where traffic patterns are unknown.
Incorrect! Try again.
27You are modeling a dataset of 'Orders' and 'LineItems' in a JSON document store. Each order can have multiple line items. If the primary use case is to always retrieve an order and all its line items together, which JSON structure is most efficient for read operations?
JSON representation of part of the dataset
Medium
A.A single 'Order' document with a nested array field called lineItems containing the item objects.
B.A single collection where some documents are orders and others are line items.
C.Separate 'Order' and 'LineItem' collections, with lineItems documents containing an order_id reference.
D.A single 'Order' document with a string field containing comma-separated line item IDs.
Correct Answer: A single 'Order' document with a nested array field called lineItems containing the item objects.
Explanation:
This approach is known as embedding or denormalization. By embedding the lineItems array directly within the Order document, all the required information can be fetched in a single database read operation. This avoids the need for application-level joins or subsequent queries, making it highly performant for read-heavy workloads where the related data is always needed together.
Incorrect! Try again.
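A minimal sketch of the embedded pattern in plain Python (field names like sku are illustrative): a single document read yields the order and every line item, with no join or second query.

```python
# One embedded document holds the order AND its line items
order = {
    "_id": "ord-1",
    "customer": "Acme",
    "lineItems": [
        {"sku": "KB-01", "qty": 2, "price": 75},
        {"sku": "MS-02", "qty": 1, "price": 25},
    ],
}

# Everything needed is already present after a single read
total = sum(item["qty"] * item["price"] for item in order["lineItems"])
print(total)  # 175
```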
28A company wants to build a recommendation engine. They use a machine learning model to convert users and items into 300-dimension numerical vectors. What is the primary function of a vector database in this scenario?
Vector Databases
Medium
A.To execute complex business logic using stored procedures on the vector data.
B.To store the raw user and item data in a relational format.
C.To provide ACID transactions for updating user and item vectors.
D.To efficiently store these vectors and perform fast similarity searches (e.g., finding the 'k' nearest item vectors to a given user vector).
Correct Answer: To efficiently store these vectors and perform fast similarity searches (e.g., finding the 'k' nearest item vectors to a given user vector).
Explanation:
Vector databases are purpose-built to handle high-dimensional vector data. Their core strength lies in indexing these vectors using algorithms like HNSW or IVF to perform Approximate Nearest Neighbor (ANN) searches extremely quickly. This allows the recommendation engine to find items that are 'semantically similar' to a user's preferences in real-time.
Incorrect! Try again.
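A brute-force sketch of similarity search in plain Python (the vectors and item names are made up, and real vector databases replace the exhaustive loop with an ANN index such as HNSW):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Tiny 3-dimensional stand-ins for the 300-dimensional embeddings
user = [1.0, 0.0, 1.0]
items = {"item_a": [0.9, 0.1, 0.8], "item_b": [0.0, 1.0, 0.0]}

# k-nearest with k=1: pick the item whose vector is closest to the user's
best = max(items, key=lambda name: cosine_similarity(user, items[name]))
print(best)  # item_a
```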
29MongoDB's architecture is designed for horizontal scaling. What is the name of the native process it uses to distribute data across multiple servers or clusters?
Introduction to MongoDB
Medium
A.Clustering
B.Replication
C.Federation
D.Sharding
Correct Answer: Sharding
Explanation:
Sharding is the method of distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations. It partitions data within a collection and distributes it across different servers (shards), allowing the database to scale horizontally by adding more servers.
Incorrect! Try again.
30The term 'impedance mismatch' often comes up when comparing relational databases to application development. How does using a JSON database help mitigate this problem?
JSON databases
Medium
A.It uses a schema that perfectly mirrors the application's object model, preventing any changes.
B.It eliminates the need for an Object-Relational Mapping (ORM) layer because the database's data model (JSON documents) closely matches the object model used in many programming languages.
C.It provides a SQL interface that all developers are familiar with.
D.It enforces strict data types that match object-oriented programming languages.
Correct Answer: It eliminates the need for an Object-Relational Mapping (ORM) layer because the database's data model (JSON documents) closely matches the object model used in many programming languages.
Explanation:
Object-Relational Impedance Mismatch refers to the difficulty of mapping the relational, table-based model of SQL databases to the object-oriented models used in application code. JSON databases reduce this friction because their document structure (nested objects and arrays) is native to languages like JavaScript, Python, etc., often allowing for direct mapping without a complex ORM layer.
Incorrect! Try again.
31When considering the CAP theorem, a NoSQL database that prioritizes Availability and Partition Tolerance (AP) will often sacrifice strong consistency. What model of consistency does such a system typically adopt?
SQL vs NoSQL
Medium
A.Immediate Consistency
B.Serializability
C.Eventual Consistency
D.ACID Consistency
Correct Answer: Eventual Consistency
Explanation:
In a distributed system, to remain available during a network partition, a system might allow writes on one partition that haven't yet replicated to another. Eventual consistency is a model that guarantees that, if no new updates are made, all replicas will eventually converge to the same value. This prioritizes availability over the guarantee that every read will return the most recent write.
Incorrect! Try again.
32In the MongoDB Aggregation Framework, you need to filter the documents before they are grouped. Which two stages should you use, and in what order?
Working with MongoDB
Medium
A.A $group stage, then a $match stage.
B.A $match stage, then a $group stage.
C.A $project stage, then a $match stage.
D.A $group stage, then a $project stage.
Correct Answer: A $match stage, then a $group stage.
Explanation:
The aggregation pipeline processes documents in sequence, so to improve performance it's crucial to filter out unnecessary documents as early as possible. Placing the $match stage before the $group stage reduces the amount of data that the more intensive grouping operation has to process.
Incorrect! Try again.
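The filter-early principle can be mimicked in plain Python (the sample orders are illustrative; this is not the aggregation framework itself):

```python
from collections import defaultdict

orders = [
    {"status": "A", "amount": 10},
    {"status": "B", "amount": 99},
    {"status": "A", "amount": 30},
]

# Stage 1 ($match): filter early so the later stage sees fewer documents
matched = [o for o in orders if o["status"] == "A"]

# Stage 2 ($group): sum the amounts per status
totals = defaultdict(int)
for o in matched:
    totals[o["status"]] += o["amount"]

print(dict(totals))  # {'A': 40}
```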
33A query on your logs collection filters by timestamp and then sorts by level. Which compound index would provide the most benefit for this query: db.logs.find({ timestamp: { $gte: ISODate(...) } }).sort({ level: 1 })?
Index creation & performance comparison using EXPLAIN
Medium
A.A text index on all fields
B.{ "timestamp": 1, "level": 1 }
C.{ "level": 1, "timestamp": 1 }
D.A single index on timestamp and another on level
Correct Answer: { "level": 1, "timestamp": 1 }
Explanation:
The optimal compound index follows the 'Equality, Sort, Range' (ESR) rule: equality fields first, then sort fields, then range fields. This query sorts on level and applies a range filter ($gte) on timestamp, so the sort field should come before the range field: { "level": 1, "timestamp": 1 }. MongoDB can then walk the index in level order, applying the timestamp bound within each level value, and return documents pre-sorted with no blocking in-memory SORT. Listing the range field first ({ "timestamp": 1, "level": 1 }) would yield documents in timestamp order and force an in-memory sort on level.
Incorrect! Try again.
34In a DynamoDB table, you have a partition key userId and a sort key orderId. How does this combination affect data storage and querying capabilities?
DynamoDB
Medium
A.It ensures that every orderId is unique across all users in the table.
B.It allows you to efficiently retrieve all orders for a specific user, sorted by orderId.
C.It prevents you from querying by userId alone.
D.It stores all items with the same userId on different physical partitions.
Correct Answer: It allows you to efficiently retrieve all orders for a specific user, sorted by orderId.
Explanation:
The partition key (userId) determines the physical partition where the data is stored. Within that partition, items are stored physically sorted by the sort key (orderId). This structure makes it extremely efficient to query for all items with the same partition key and perform range-based operations (e.g., get all orders for user X from last month) on the sort key.
Incorrect! Try again.
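A toy Python model of this layout: a dict keyed by partition key, where each group is a list kept sorted by the sort key (the identifiers are made up):

```python
from bisect import insort

# Items grouped by partition key (userId); each group is kept sorted by the
# sort key (orderId), mirroring how a partition stores its items
table = {}

def put_item(user_id, order_id):
    insort(table.setdefault(user_id, []), order_id)

for oid in ["2024-03", "2024-01", "2024-02"]:
    put_item("user-1", oid)
put_item("user-2", "2024-05")

# A query on the partition key alone returns that user's orders, pre-sorted
print(table["user-1"])  # ['2024-01', '2024-02', '2024-03']
```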
35What is the conceptual relationship between a MongoDB 'collection' and a 'document' compared to a traditional SQL database?
Structure of MongoDB
Medium
A.A collection is analogous to a column, and a document is analogous to a schema.
B.A collection is analogous to a row, and a document is analogous to a column.
C.A collection is analogous to a database, and a document is analogous to a table.
D.A collection is analogous to a table, and a document is analogous to a row.
Correct Answer: A collection is analogous to a table, and a document is analogous to a row.
Explanation:
This is the most common analogy. A collection holds a group of related documents, just as a table holds a group of related rows. A single document represents a single record or entity, much like a row does in a SQL table. However, unlike rows in a table, documents in a single collection can have different structures.
Incorrect! Try again.
36When using a vector database for semantic search, the system is not matching keywords directly. What is it actually comparing to determine the similarity between a search query and the documents?
Vector Databases
Medium
A.The geometric distance (e.g., cosine similarity or Euclidean distance) between the query's vector embedding and the documents' vector embeddings.
B.The primary keys of the documents.
C.A full-text index of the documents' content.
D.The number of overlapping keywords between the query and the documents.
Correct Answer: The geometric distance (e.g., cosine similarity or Euclidean distance) between the query's vector embedding and the documents' vector embeddings.
Explanation:
Semantic search works by first converting both the query and the documents into high-dimensional numerical vectors (embeddings) using a language model. The vector database then finds documents whose vectors are 'closest' to the query's vector in that multi-dimensional space. This 'closeness' is measured by a distance metric like cosine similarity, allowing it to find conceptually related results even if they don't share keywords.
Incorrect! Try again.
37A team is using a serverless database with a pay-per-request pricing model. They notice their monthly bill is unexpectedly high. What is a likely cause of this issue, related to application design?
Serverless cloud database
Medium
A.The cloud provider increased the per-request price without notice.
B.The application is performing many small, inefficient read/write operations in a loop instead of batching them.
C.The database is automatically scaling to a larger instance size than needed.
D.They forgot to shut down the database server during off-peak hours.
Correct Answer: The application is performing many small, inefficient read/write operations in a loop instead of batching them.
Explanation:
In a pay-per-request model, every single operation contributes to the bill. An application that is 'chatty'—making numerous individual requests inside a loop to fetch or update data—will incur significantly higher costs than a well-designed application that uses batch operations or retrieves all necessary data in a single, more complex query. Optimizing access patterns is critical for cost management.
Incorrect! Try again.
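A back-of-the-envelope comparison in Python (the batch size of 25 mirrors DynamoDB's BatchWriteItem item limit; actual billing also depends on item size and capacity mode):

```python
import math

n_items = 1000  # items the application needs to write

# Chatty design: one billable request per item inside a loop
chatty_requests = n_items

# Batched design: up to 25 items per BatchWriteItem-style request
batched_requests = math.ceil(n_items / 25)

print(chatty_requests, batched_requests)  # 1000 40
```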
38When is it more appropriate to use 'referencing' (storing an ID of a related document) instead of 'embedding' in a JSON document model?
JSON representation of part of the dataset
Medium
A.When representing a many-to-many relationship or when the related data is large and not always needed.
B.When the total size of the document is very small.
C.When you want to guarantee the fastest possible read performance for all related data.
D.When representing a one-to-one relationship where the data is always accessed together.
Correct Answer: When representing a many-to-many relationship or when the related data is large and not always needed.
Explanation:
Embedding is great for one-to-many relationships with data that's accessed together. However, for many-to-many relationships (e.g., students and courses), embedding would lead to massive data duplication. Referencing (normalization) is better here. It's also preferred if the sub-documents are very large or frequently updated independently, to avoid hitting document size limits and to reduce the amount of data transferred for the main document.
Incorrect! Try again.
39Besides its flexible schema, what is a key feature of MongoDB's data model that distinguishes it from a simple key-value store?
Introduction to MongoDB
Medium
A.Its lack of any indexing capabilities.
B.Its ability to store only string values.
C.Its support for rich data structures within documents, such as nested objects and arrays.
D.Its strict enforcement of storing only a single key-value pair per entry.
Correct Answer: Its support for rich data structures within documents, such as nested objects and arrays.
Explanation:
A simple key-value store typically holds a single opaque value for each key. MongoDB, as a document store, allows the 'value' (the document) to be a complex, structured object with nested fields and arrays. This enables a much richer and more intuitive way to model real-world data compared to a flat key-value structure.
Incorrect! Try again.
40Which of the following statements accurately describes the schema concept in the context of a typical JSON database like MongoDB?
JSON databases
Medium
A.The schema is dynamic and enforced at the application level, not by the database itself, though schema validation rules can be optionally applied.
B.The database has no concept of a schema whatsoever, and data structure is completely random.
C.A rigid schema must be defined for each collection before any documents can be inserted.
D.The schema is defined using SQL's CREATE TABLE statement.
Correct Answer: The schema is dynamic and enforced at the application level, not by the database itself, though schema validation rules can be optionally applied.
Explanation:
JSON databases are often called 'schemaless', but a more accurate term is 'dynamic schema' or 'flexible schema'. This means the database doesn't require all documents in a collection to have the same structure. While the application code inherently works with a schema, the database itself is flexible. For more control, most JSON databases like MongoDB offer optional schema validation features to enforce certain rules if desired.
Incorrect! Try again.
41In MongoDB, you have a collection events with a compound index { "type": 1, "timestamp": -1 }. You execute the query db.events.find({ "type": "login" }).sort({ "timestamp": -1 }). The explain() output shows a winningPlan with an IXSCAN stage. If you change the query to db.events.find({ "type": "login" }).sort({ "timestamp": 1 }), what is the most likely outcome for the explain() plan and why?
Index creation & performance comparison using EXPLAIN
Hard
A.MongoDB will use the index for the IXSCAN and efficiently read the keys in reverse order, requiring no additional SORT stage.
B.The query will fail because the sort direction does not match the indexed direction.
C.MongoDB will still use the index for the IXSCAN but will add an in-memory SORT stage because the sort order opposes the index key order.
D.MongoDB will perform a COLLSCAN followed by a SORT, as the index cannot be used for sorting in the opposite direction.
Correct Answer: MongoDB will use the index for the IXSCAN and efficiently read the keys in reverse order, requiring no additional SORT stage.
Explanation:
MongoDB can traverse an index in either direction, which inverts the entire key pattern: reading { "type": 1, "timestamp": -1 } backwards yields the ordering { "type": -1, "timestamp": 1 }. Because the query places an equality condition on the prefix key (type: "login"), all matching entries occupy one contiguous block of the index, and a reverse traversal of that block returns them already sorted by timestamp: 1. The explain() output will therefore still show an IXSCAN (with direction: 'backward') and no blocking in-memory SORT stage. A SORT stage appears only when neither the forward nor the reverse traversal can produce the requested order, for example sort({ "type": 1, "timestamp": 1 }) across multiple type values.
Incorrect! Try again.
42You are designing a DynamoDB table for a social media application to store user posts. The primary key is (UserID, PostID). You need to efficiently query for the 10 most recent posts by a specific user and query for the 10 most recent posts globally across all users. What is the most cost-effective and performant indexing strategy?
DynamoDB
Hard
A.Use a Global Secondary Index (GSI) with a partition key of UserID and a sort key of PostTimestamp.
B.Create two GSIs: one with partition key UserID and sort key PostTimestamp, and another with a static partition key like 'all_posts' and sort key PostTimestamp.
C.Use a Local Secondary Index (LSI) with PostTimestamp as the sort key for user-specific queries, and a GSI with a static partition key (e.g., a constant value like 'all_posts') and PostTimestamp as the sort key for global queries.
D.No indexes are needed; perform a Scan operation with a FilterExpression for global posts and a Query for user-specific posts.
Correct Answer: Use a Local Secondary Index (LSI) with PostTimestamp as the sort key for user-specific queries, and a GSI with a static partition key (e.g., a constant value like 'all_posts') and PostTimestamp as the sort key for global queries.
Explanation:
This is a complex design pattern. The base table's primary key (UserID, PostID) supports per-user queries only if PostID happens to sort by time, so an LSI on (UserID, PostTimestamp) is the right tool for the 10 most recent posts by a user: it shares the base table's partition key, avoids duplicating data into a GSI, and supports strongly consistent reads. The global query cannot be answered by the base table or an LSI, so a GSI is required. The 'static partition key' pattern, where a constant value like 'all_posts' is the GSI partition key and PostTimestamp is the sort key, is a common (if advanced) technique: it concentrates all posts into a single, time-sorted partition that can be queried directly. That single partition becomes 'hot' under heavy write load, so the pattern is often combined with write sharding (e.g., 'all_posts_shard_1', 'all_posts_shard_2'), but the fundamental design is a GSI with a non-unique partition key. Option B is less optimal because its first GSI duplicates what the LSI provides more cheaply and with strong consistency; option A covers only the per-user query; and a Scan with a FilterExpression (option D) reads the entire table, which is both slow and expensive.
Incorrect! Try again.
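To make the pattern concrete, here is a minimal pure-Python sketch (all table names and data invented for illustration) of the two access paths: a per-user partition sorted by timestamp, and a GSI-like structure under the single static partition key 'all_posts'.

```python
# Hypothetical in-memory model of the "static partition key" GSI pattern.
from collections import defaultdict

base_table = defaultdict(list)   # partition key: UserID (LSI-style sort on timestamp)
gsi_all_posts = []               # GSI partition key: the constant 'all_posts'

def put_post(user_id, ts, body):
    item = {"UserID": user_id, "PostTimestamp": ts, "body": body}
    base_table[user_id].append(item)      # base-table write
    gsi_all_posts.append((ts, item))      # simulated async GSI propagation

put_post("u1", 100, "hello")
put_post("u2", 50, "first!")
put_post("u1", 200, "again")

# Per-user query: one partition, sorted by PostTimestamp
user_posts = sorted(base_table["u1"], key=lambda i: i["PostTimestamp"])

# Global query: the static-partition GSI yields all posts in time order
global_posts = [i for _, i in sorted(gsi_all_posts, key=lambda t: t[0])]
```

In real DynamoDB, every writer hitting the same 'all_posts' partition is exactly why the pattern is usually write-sharded across several static keys.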
43In the context of Approximate Nearest Neighbor (ANN) search in vector databases, which statement correctly analyzes the trade-off between the HNSW (Hierarchical Navigable Small World) and IVFPQ (Inverted File with Product Quantization) indexing methods?
Vector Databases
Hard
A.HNSW offers higher query latency but provides better recall and is more memory-efficient than IVFPQ due to its graph structure.
B.IVFPQ provides perfect recall (100% accuracy) at the cost of higher latency, while HNSW is a purely approximate method.
C.IVFPQ has a faster index build time and lower memory footprint, but its performance degrades significantly in high-dimensional spaces compared to HNSW, and it is less flexible for adding new data points.
D.HNSW is optimized for static datasets, making it difficult to add new vectors without a full re-index, whereas IVFPQ dynamically handles new data points with minimal overhead.
Correct Answer: IVFPQ has a faster index build time and lower memory footprint, but its performance degrades significantly in high-dimensional spaces compared to HNSW, and it is less flexible for adding new data points.
Explanation:
This question requires a deep understanding of two major ANN algorithms. IVFPQ works by partitioning the vector space into cells (Voronoi cells) and then using Product Quantization to compress vectors within those cells. This makes it memory-efficient and fast to build. However, its performance relies on the 'inverted file' lookup, which can be a bottleneck, and adding new data requires re-assigning points to centroids, which is less dynamic than HNSW. HNSW builds a multi-layered graph that is highly effective for searching, especially in high dimensions, and allows for dynamic insertion of new points gracefully. HNSW's index build time is generally longer, and it can consume more memory than a heavily compressed IVFPQ index, but it often provides a better recall-latency trade-off.
Incorrect! Try again.
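The 'inverted file' half of IVFPQ can be sketched in a few lines of pure Python (centroids and vectors invented for illustration; real IVFPQ adds product quantization to compress the vectors inside each cell). The query probes only the nearest cell(s) instead of the whole dataset, which is the source of both the speedup and the recall loss.

```python
# Illustrative IVF (coarse quantization) sketch, not a full IVFPQ implementation.
import math

centroids = [(0.0, 0.0), (10.0, 10.0)]   # assumed pre-trained cluster centers
cells = {0: [], 1: []}                   # the "inverted file": cell -> vectors

def assign(vec):
    # Each vector is stored in the cell of its nearest centroid
    cell = min(range(len(centroids)), key=lambda c: math.dist(vec, centroids[c]))
    cells[cell].append(vec)

for v in [(1, 1), (0, 2), (9, 9), (11, 10)]:
    assign(v)

def search(query, nprobe=1):
    # Probe only the nprobe nearest cells; vectors in unprobed cells are invisible
    order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    candidates = [v for c in order[:nprobe] for v in cells[c]]
    return min(candidates, key=lambda v: math.dist(query, v))

print(search((8, 8)))  # → (9, 9)
```

Adding a new vector means re-running `assign` against fixed centroids; if the data distribution drifts, the centroids (and the whole index) must be retrained, which is the inflexibility the explanation refers to.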
44A financial services company is building a global transaction processing system. The system must never lose a transaction (high durability) and must reflect the same account balance to users in New York and Tokyo simultaneously (strong consistency). The system must remain available for new transactions even if the network link between New York and Tokyo is temporarily severed. According to the CAP theorem, which of the following statements is the most accurate analysis of this system's requirements?
SQL vs NoSql
Hard
A.The system can be built by prioritizing Availability and Partition Tolerance (AP), and then implementing a reconciliation layer to achieve eventual consistency for account balances.
B.The requirements are fundamentally contradictory. A system cannot simultaneously guarantee Strong Consistency (C) and Availability (A) during a network Partition (P).
C.The requirements are fully achievable with a modern NoSQL database that offers tunable consistency, sacrificing only partition tolerance.
D.The requirements are fully achievable with a traditional SQL database using synchronous replication across continents.
Correct Answer: The requirements are fundamentally contradictory. A system cannot simultaneously guarantee Strong Consistency (C) and Availability (A) during a network Partition (P).
Explanation:
This question applies the CAP theorem to a real-world, complex scenario. The requirements are for Consistency (C - same balance everywhere), Availability (A - available for new transactions), and Partition Tolerance (P - network link severed). The CAP theorem states that a distributed data store can only provide two of these three guarantees simultaneously. The scenario explicitly demands all three, which is impossible. If the network partitions, the system must choose: either become unavailable (sacrificing A) to ensure the two sites don't diverge, or remain available but risk the sites becoming inconsistent (sacrificing C). Therefore, the requirements as stated are a paradox that cannot be solved by any database, SQL or NoSQL. The other options suggest a solution is possible without acknowledging this fundamental trade-off.
Incorrect! Try again.
45You are debugging a slow MongoDB aggregation pipeline. The collection contains 10 million documents. The pipeline is structured as follows:
1. $lookup: Joins with another collection.
2. $unwind: Deconstructs an array field created by the lookup.
3. $match: Filters documents based on a highly selective field.
4. $group: Aggregates the results.
What is the most effective change to optimize this pipeline's performance?
Working with MongoDB
Hard
A.Add a $sort stage to pre-sort the data for the $group operation.
B.Replace the $lookup with client-side joins to reduce database server load.
C.Create a compound index on the fields used in the $group stage.
D.Move the $match stage to be the first stage in the pipeline.
Correct Answer: Move the $match stage to be the first stage in the pipeline.
Explanation:
The order of stages in a MongoDB aggregation pipeline is critical for performance. The guiding principle is to reduce the number of documents being processed as early as possible. In the given pipeline, the resource-intensive $lookup and $unwind operations are performed on all 10 million documents. By moving the highly selective $match stage to the front, the subsequent $lookup, $unwind, and $group stages operate on a much smaller dataset, leading to a dramatic performance improvement. While indexing (Option C) is important, the structural inefficiency of the pipeline is the primary bottleneck here; an index would only help the $match stage, and its benefit is maximized when that stage runs first.
Incorrect! Try again.
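A toy simulation (documents and selectivity invented for illustration) makes the effect measurable: count how many documents the expensive stage has to touch when the filter runs last versus first.

```python
# Stand-in for the pipeline: expensive_stage plays the role of $lookup + $unwind.
docs = [{"status": "error" if i % 1000 == 0 else "ok", "n": i} for i in range(10_000)]

def expensive_stage(batch):
    expensive_stage.touched += len(batch)
    return batch
expensive_stage.touched = 0

# Filter last: the expensive stage processes all 10,000 documents
late = [d for d in expensive_stage(docs) if d["status"] == "error"]
touched_late = expensive_stage.touched

# Filter first: the expensive stage processes only the 10 matches
expensive_stage.touched = 0
early = expensive_stage([d for d in docs if d["status"] == "error"])
touched_early = expensive_stage.touched
```

Same result set, three orders of magnitude less work for the heavy stage, which is exactly what `explain()` shows when $match is hoisted to the front of a real pipeline.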
46A company uses a serverless database (like Amazon DynamoDB On-Demand or Azure Cosmos DB Serverless) for an e-commerce application. During a flash sale, the read traffic spikes from 100 RCU/s to 50,000 RCU/s in under a minute. Despite the 'serverless' nature which promises to scale automatically, the application experiences a high rate of ProvisionedThroughputExceededException or throttling errors for the first few minutes of the sale. What is the most likely technical reason for this behavior?
Serverless cloud database
Hard
A.Serverless databases have a hard, fixed upper limit on throughput that was exceeded by the flash sale traffic.
B.The client-side SDK is misconfigured and is not using exponential backoff, causing it to overwhelm the database with retries.
C.A serverless billing model requires manual intervention through an API call to authorize a sudden increase in spending, which was not performed.
D.The database's underlying partitions were not 'pre-warmed' and the adaptive capacity mechanism couldn't scale the partitions' physical resources fast enough to accommodate the instantaneous, massive spike in traffic.
Correct Answer: The database's underlying partitions were not 'pre-warmed' and the adaptive capacity mechanism couldn't scale the partitions' physical resources fast enough to accommodate the instantaneous, massive spike in traffic.
Explanation:
This question targets a common misconception about serverless databases. While they scale 'infinitely' in theory, the physical scaling is not instantaneous. DynamoDB, for instance, allocates capacity to underlying partitions. When a workload is consistently low, this capacity is minimal. A sudden, massive spike (e.g., 500x) can outpace the 'adaptive capacity' feature, which rebalances and splits partitions to handle more load. This scaling process takes time. For a short period, the initial partitions cannot handle the new load, resulting in throttling, even in an 'on-demand' mode. The system is designed to handle gradual or predictable growth better than instantaneous vertical spikes. Client-side retries are important, but they treat the symptom, not the root cause of the initial throttling.
Incorrect! Try again.
47In MongoDB, consider a schema design for a blogging platform where posts can have multiple tags. Two common approaches are: an Array of Strings (e.g., tags: ["mongodb", "indexing"]) and an Array of Sub-documents (e.g., tags: [{ tag: "mongodb" }, { tag: "indexing" }]).
If the primary query requirement is to find all posts that have both the 'mongodb' tag and the 'indexing' tag, and to do so using a single index for optimal performance, what is the key advantage of the Array of Strings approach combined with a multikey index on the tags field?
Structure of MongoDB
Hard
A.The Array of Strings approach supports atomic updates to individual tags, whereas the sub-document approach does not.
B.The Array of Strings approach allows the use of the $all operator, which can be efficiently serviced by a single multikey index on the tags field.
C.The sub-document approach cannot use a multikey index at all.
D.The sub-document approach is more storage-efficient, which indirectly leads to better query performance.
Correct Answer: The Array of Strings approach allows the use of the $all operator, which can be efficiently serviced by a single multikey index on the tags field.
Explanation:
This question analyzes the subtle implications of schema design on indexing and query capabilities. While both schemas can be queried, the Array of Strings approach is superior for this specific requirement. When you create a multikey index on tags, MongoDB creates an index entry for each element in the array. A query like db.posts.find({ tags: { $all: ["mongodb", "indexing"] } }) can then be serviced efficiently by that single multikey index. The sub-document approach requires querying on the tags.tag path (with a multikey index on tags.tag), but the $all operator on a simple array is a more direct and often more efficient pattern for 'AND' conditions on array elements.
Incorrect! Try again.
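The mechanics of a multikey index servicing $all can be sketched in pure Python (documents invented for illustration): each array element gets its own index entry mapping tag to document ids, and an 'AND' over tags is a set intersection.

```python
# Toy multikey index: one index entry per array element.
from collections import defaultdict

posts = {
    1: ["mongodb", "indexing"],
    2: ["mongodb"],
    3: ["indexing", "performance"],
}

multikey = defaultdict(set)
for _id, tags in posts.items():
    for t in tags:                 # index entry for each element of the array
        multikey[t].add(_id)

def find_all(required):
    # Analogue of find({tags: {$all: required}}): intersect the posting sets
    ids = set.intersection(*(multikey[t] for t in required))
    return sorted(ids)

print(find_all(["mongodb", "indexing"]))  # → [1]
```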
48Consider a JSON document storing sensor readings: { "deviceId": "A-1", "ts": 1672531200, "metrics": { "temp": 25.5, "humidity": 45.1 } }. If you are designing a system to store billions of such documents and the most critical query is to find the average temperature for a specific device within a time range, which statement represents the most significant challenge of a pure JSON-based storage format versus a columnar format like Parquet or ORC?
JSON databases
Hard
A.Storing timestamps as numbers in JSON is inefficient and leads to large storage overhead compared to native date types in columnar formats.
B.A row-oriented format like JSON requires reading and parsing the entire document (including deviceId, ts, and humidity) for every record in the range, even though only the temp field is needed for the aggregation.
C.JSON parsers are inherently single-threaded and create a performance bottleneck that columnar formats do not have.
D.JSON's lack of a strict schema makes it impossible to query the temp field reliably.
Correct Answer: A row-oriented format like JSON requires reading and parsing the entire document (including deviceId, ts, and humidity) for every record in the range, even though only the temp field is needed for the aggregation.
Explanation:
This question contrasts the fundamental storage layout of JSON (a row-oriented document format) with columnar formats used in analytical databases. The key performance difference for analytical queries (like AVG, SUM, COUNT on a specific column) is I/O. In a columnar store, the system would only need to read the 'temp' column data for the records matching the time range. In a JSON or any row-oriented database, the system must read the entire document for each row that matches the filter, even the fields that are not part of the aggregation (humidity). This leads to significantly higher I/O and CPU cost for parsing unneeded data, which is the primary reason columnar formats dominate the data warehousing and analytics space.
Incorrect! Try again.
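The I/O difference can be made concrete with a toy comparison (readings invented for illustration): a row layout must fetch and parse whole documents to compute AVG(temp), while a columnar layout reads only the temp values.

```python
# Contrast bytes "read" for AVG(temp) under a row layout vs a column layout.
import json

rows = [
    {"deviceId": "A-1", "ts": 1672531200 + i,
     "metrics": {"temp": 20.0 + i, "humidity": 45.1}}
    for i in range(3)
]

# Row-oriented (JSON documents): every full document is read and parsed
row_bytes = sum(len(json.dumps(r)) for r in rows)
avg_row = sum(r["metrics"]["temp"] for r in rows) / len(rows)

# Columnar: only the temp column is stored contiguously and read
temp_column = [r["metrics"]["temp"] for r in rows]
col_bytes = len(json.dumps(temp_column))
avg_col = sum(temp_column) / len(temp_column)
```

Same average either way, but the columnar read touches a fraction of the bytes; at billions of documents that gap dominates query cost.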
49You have a MongoDB collection users with an index { "country": 1, "age": 1 }. You run a query db.users.find({ age: { $gt: 30 } }).sort({ country: 1 }). The explain() command reveals that the query plan involves a COLLSCAN followed by a SORT. Why did the MongoDB query planner reject the { "country": 1, "age": 1 } index for this operation?
Index creation & performance comparison using EXPLAIN
Hard
A.The sort operation sort({ country: 1 }) is redundant as the data is already sorted by country in the index.
B.The query planner can only use an index if the sort field is the same as the query predicate field.
C.The index cannot be used because the query predicate is a range operator ($gt) on the second field of the index (age) without an equality match on the first field (country).
D.MongoDB indexes do not support range operators like $gt, they only work for equality matches.
Correct Answer: The index cannot be used because the query predicate is a range operator ($gt) on the second field of the index (age) without an equality match on the first field (country).
Explanation:
This question tests a critical rule of compound indexing known as the ESR (Equality, Sort, Range) rule. For a compound index to be used effectively, any equality predicates must come first, followed by sort predicates, and finally range predicates. The fields in the query must be a prefix of the index keys. In this case, the query filters on age (the second index field) without providing an equality filter for country (the first index field). Therefore, the index cannot be used to efficiently find the documents. The planner cannot 'skip' the country part of the index to filter by age. It is forced to scan the entire collection (COLLSCAN) and then sort the results.
Incorrect! Try again.
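A sorted-list model of the compound index (entries invented for illustration) shows why the prefix matters: with an equality on country, matching entries form one contiguous run the index can seek to; without it, age > 30 entries are scattered and every entry must be inspected.

```python
# The compound index {country: 1, age: 1} modeled as a sorted list of tuples.
import bisect

index = sorted([("US", 25), ("BR", 40), ("US", 50), ("BR", 20), ("FR", 35)])

# Equality on country + range on age: one binary-search seek to a contiguous run
lo = bisect.bisect_left(index, ("US", 31))
hi = bisect.bisect_right(index, ("US", float("inf")))
us_over_30 = index[lo:hi]

# Range on age alone: matches are interleaved across the whole index,
# so every entry must be checked (the analogue of a COLLSCAN)
over_30 = [e for e in index if e[1] > 30]
```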
50In DynamoDB, you are implementing a system that requires a transaction to update an Order item and decrement the stock_count in a corresponding Product item. The operation must be atomic. You use a TransactWriteItems operation. If the transaction fails specifically due to an OptimisticLockException on the Product item, what does this imply?
DynamoDB
Hard
A.The Product item did not exist in the table when the transaction was initiated.
B.Another process modified the Product item between the time your transaction read it (as part of a condition check) and the time it attempted to write the update.
C.The IAM role executing the transaction does not have permission to write to the Product table.
D.The entire transaction violated a unique constraint defined on one of the tables.
Correct Answer: Another process modified the Product item between the time your transaction read it (as part of a condition check) and the time it attempted to write the update.
Explanation:
An OptimisticLockException (or ConditionalCheckFailedException within a transaction) is central to implementing optimistic concurrency control in DynamoDB. This pattern usually involves reading an item's version number, performing logic, and then writing the update with a ConditionExpression that checks if the version number is still the same. If another write operation updated the item in the meantime, the version number will have changed, the condition check will fail, and the transaction will be rolled back with this specific exception. It signals a contention scenario, indicating that the state of the data changed concurrently, and the transaction must be retried. The other options describe different failure modes (item not found, permission denied, etc.).
Incorrect! Try again.
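The version-check pattern can be sketched in a few lines of pure Python (store, item names, and exception class invented for illustration; DynamoDB itself surfaces this as a ConditionalCheckFailedException inside the transaction).

```python
# Optimistic concurrency control: writes carry the version they read.
class OptimisticLockException(Exception):
    pass

store = {"product-1": {"stock_count": 10, "version": 1}}

def conditional_update(key, new_stock, expected_version):
    # Analogue of a ConditionExpression checking the version attribute
    item = store[key]
    if item["version"] != expected_version:
        raise OptimisticLockException("version changed concurrently, retry")
    store[key] = {"stock_count": new_stock, "version": expected_version + 1}

read_version = store["product-1"]["version"]      # our transaction reads v1

# Another process modifies the item between our read and our write:
conditional_update("product-1", 9, read_version)  # succeeds, v1 -> v2

try:
    conditional_update("product-1", 8, read_version)  # stale version, fails
    failed = False
except OptimisticLockException:
    failed = True
```

The failed write signals contention, not an error in the data; the correct response is to re-read the item and retry the transaction.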
51When using a vector database for semantic search, you embed a query q and search for the nearest vectors in your indexed dataset. The search commonly uses a distance metric like Cosine Similarity or Euclidean Distance (L2). Under which condition would these two metrics rank the nearest neighbors most differently?
Vector Databases
Hard
A.When the vectors in the dataset have widely varying magnitudes (lengths).
B.When the vectors have very high dimensionality (e.g., > 1536).
C.When all vectors in the dataset have been normalized to a unit length of 1.
D.When the query vector q is the zero vector.
Correct Answer: When the vectors in the dataset have widely varying magnitudes (lengths).
Explanation:
This question probes the mathematical difference between these two common metrics. Cosine Similarity measures the cosine of the angle between two vectors, making it sensitive only to orientation, not magnitude. L2 (Euclidean) distance is the straight-line distance between two vector endpoints, making it sensitive to both orientation and magnitude. If all vectors are normalized to unit length, they all lie on the surface of a hypersphere, and in this specific case, ranking by Cosine Similarity is equivalent to ranking by Euclidean Distance. However, if the vectors have very different magnitudes, two vectors can be very close in angle (high cosine similarity) but far apart in Euclidean space because one is much longer than the other. This is when the rankings will diverge the most. High dimensionality affects both but doesn't cause the fundamental ranking difference.
Incorrect! Try again.
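A worked example (vectors invented for illustration): b points in nearly the same direction as the query but is much longer, while c points in a different direction but sits close by. Cosine ranks b first; Euclidean ranks c first.

```python
# Cosine vs L2 ranking divergence when magnitudes differ widely.
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    return num / (math.hypot(*a) * math.hypot(*b))

def euclid(a, b):
    return math.dist(a, b)

q = (1.0, 0.0)
b = (100.0, 1.0)   # nearly the same direction as q, large magnitude
c = (0.0, 1.5)     # orthogonal direction, but physically close to q

assert cosine(q, b) > cosine(q, c)   # cosine ranks b as the nearer neighbor
assert euclid(q, c) < euclid(q, b)   # Euclidean ranks c as the nearer neighbor
```

Normalize b and c to unit length first and the two rankings coincide, which is why many embedding pipelines normalize vectors before indexing.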
52You are representing a graph structure (e.g., a social network) in a JSON document database like MongoDB. You need to model a 'follows' relationship between users. Consider these two JSON representations: Child Referencing, where each user document embeds an array of the user IDs it follows (e.g., { "_id": "userA", "following": ["userB", "userC"] }), and Parent Referencing, where each relationship is a separate document in a dedicated collection (e.g., { "follower": "userA", "following": "userB" }).
If the most critical, high-frequency operation is to display a user's complete profile, including the count of their followers and the count of users they are following, which modeling approach presents a significant performance challenge and why?
JSON representation of part of the dataset
Hard
A.Parent Referencing, because getting the 'following' count requires a separate query to a different collection, increasing application complexity.
B.Child Referencing, because determining the 'follower' count requires a query across the entire users collection to find all documents where the user's ID appears in the following array.
C.Child Referencing, because the following array can grow unboundedly, potentially exceeding the BSON document size limit (16MB).
D.Parent Referencing, because storing relationships in a separate collection eliminates the ability to use indexes.
Correct Answer: Child Referencing, because determining the 'follower' count requires a query across the entire users collection to find all documents where the user's ID appears in the following array.
Explanation:
This is a classic data modeling trade-off problem. In the Child Referencing model, getting the 'following' count is easy (size of the 'following' array). However, to get the 'follower' count for 'userA', you must execute a query like db.users.count({ following: "userA" }). This query needs to scan the following array in every single user document in the collection. Even with a multikey index on following, this can be a very expensive operation in a large collection. The Parent Referencing model (or a dedicated 'edges' collection) makes finding followers trivial (e.g., db.follows.count({ following: "userA" })) and finding who a user follows is also a simple query (db.follows.count({ follower: "userA" })), but it requires more queries to assemble a full profile. The unbounded array issue (Option C) is also a valid concern with Child Referencing, but the 'reverse lookup' performance problem is a more immediate and universal challenge for this specific query requirement.
Incorrect! Try again.
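A toy contrast (users invented for illustration) of the two models: child referencing makes the 'following' count a local array size but turns the 'follower' count into a scan of every user document, while an edge collection makes both a direct lookup.

```python
# Child referencing vs an edge-collection ("parent referencing") reverse lookup.
users = {
    "userA": {"following": ["userB", "userC"]},
    "userB": {"following": ["userA"]},
    "userC": {"following": ["userA", "userB"]},
}

# Child referencing: follower count = scan every document's array
followers_of_A = sum(1 for u in users.values() if "userA" in u["following"])

# Edge collection: one document per relationship, indexed both ways
edges = [(f, t) for f, u in users.items() for t in u["following"]]
followers_index = {}
for f, t in edges:
    followers_index.setdefault(t, []).append(f)   # reverse lookup is precomputed
followers_of_A_fast = len(followers_index["userA"])
```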
53In MongoDB, what is the primary difference in write behavior and performance implications between a standard updateOne operation and an updateOne operation with the upsert: true option when the document to be updated does not exist?
Working with MongoDB
Hard
A.With upsert: true, MongoDB must first perform a query to check for the document's existence, potentially acquiring more locks and adding latency, before deciding whether to perform an insert or an update.
B.A standard updateOne is faster as it does not write to the journal until the operation is complete, whereas an upsert must journal immediately.
C.There is no difference in performance; the database engine handles both cases with the same internal mechanism.
D.With upsert: true, the operation is not atomic and can lead to race conditions.
Correct Answer: With upsert: true, MongoDB must first perform a query to check for the document's existence, potentially acquiring more locks and adding latency, before deciding whether to perform an insert or an update.
Explanation:
The upsert option provides 'update or insert' functionality, which is convenient but has performance subtleties. A standard updateOne on a non-existent document simply finds nothing and returns. An upsert operation on a non-existent document requires the database to first perform a search based on the query filter. If it finds nothing, it must then perform an insert operation. This two-step logic (find-then-insert if needed) is inherently more complex than a simple find. It can involve acquiring different types of locks (e.g., an intent lock on the collection for the potential insert) and adds a small amount of overhead compared to a plain update that finds and modifies a document, or an update that finds nothing. While the difference may be negligible for a single operation, it can be measurable in high-throughput scenarios.
Incorrect! Try again.
54When migrating a relational database schema to a document database like MongoDB, a one-to-many relationship (e.g., an Author has many Books) is often denormalized by embedding the 'many' side into the 'one' side (i.e., embedding an array of book documents within the author document). Which scenario represents the strongest argument against this denormalization strategy?
SQL vs NoSql
Hard
A.The application requires ACID transactions spanning updates to both author and book information.
B.The total number of books per author is always small and bounded (e.g., less than 10).
C.Queries often need to retrieve an author and all of their books in a single operation.
D.The book information is frequently updated independently of the author information.
Correct Answer: The book information is frequently updated independently of the author information.
Explanation:
Denormalization by embedding is a powerful pattern in document databases, primarily optimized for read performance (getting an author and all their books is a single document read). However, it creates a significant drawback for writes. If book information (e.g., sales rank, reviews) is updated very frequently, embedding means you are constantly rewriting the entire, potentially large, author document just to change a small piece of data within the embedded book array. This leads to write amplification, increased I/O, and potential contention on the author document. In such a high-update scenario, keeping books in a separate collection and referencing them from the author document (a more normalized approach) is often the better design, as it allows for small, targeted updates to individual book documents.
Incorrect! Try again.
55MongoDB's storage engine, WiredTiger, uses both an in-memory cache and on-disk data files. When a write operation occurs with the default WriteConcern (w:1), which sequence of events accurately describes how the data is persisted to ensure durability?
Introduction of MongoDB
Hard
A.The write is first written to an on-disk write-ahead log (journal), then applied to the in-memory cache, and an acknowledgement is then sent to the client. The data files are updated later during a checkpoint.
B.The write is applied directly to the on-disk data files, and then an acknowledgement is sent to the client.
C.The write is applied to the in-memory cache, written to the on-disk journal, flushed to the on-disk data files, and only then is an acknowledgement sent to the client.
D.The write is applied to the in-memory cache, an acknowledgement is sent to the client, and the data is later flushed to disk during a periodic checkpoint.
Correct Answer: The write is first written to an on-disk write-ahead log (journal), then applied to the in-memory cache, and an acknowledgement is then sent to the client. The data files are updated later during a checkpoint.
Explanation:
This question tests detailed knowledge of MongoDB's internal durability mechanisms. With w:1, MongoDB ensures durability in case of a server crash by using a write-ahead log (WAL), also known as the journal. The correct sequence is: 1) The change is written to the journal file on disk. This is a fast, sequential append. 2) The change is applied to the in-memory representation of the data (the WiredTiger cache). 3) Once both are complete, the server sends an acknowledgement to the client. The actual B-tree data files on disk are updated later by a background process during a checkpoint (e.g., every 60 seconds). The journal ensures that if the server crashes before a checkpoint, it can replay the journal upon restart to recover the changes that were in memory but not yet in the main data files.
Incorrect! Try again.
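The write path and crash recovery can be sketched in pure Python (names and data invented for illustration): append to the journal first, apply to the cache, acknowledge, and fold changes into the data files only at a checkpoint. After a simulated crash, replaying the journal recovers the write the checkpoint never flushed.

```python
# Toy model of the journal / cache / checkpoint sequence.
journal, cache, data_files = [], {}, {}

def write(key, value):
    journal.append((key, value))   # 1) write-ahead log: fast sequential append
    cache[key] = value             # 2) apply to the in-memory cache
    return "ack"                   # 3) acknowledge the client (w:1)

def checkpoint():
    data_files.update(cache)       # periodic background flush to data files

write("a", 1)
checkpoint()                       # "a" reaches the data files
write("b", 2)                      # "b" is journaled + cached only

cache.clear()                      # simulated crash: memory is lost

recovered = dict(data_files)       # restart from the last checkpoint...
for key, value in journal:         # ...then replay the journal
    recovered[key] = value
```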
56You have a MongoDB collection logs with a sparse index on a field errorCode: db.logs.createIndex({ errorCode: 1 }, { sparse: true }). Which of the following queries would be unable to use this index, and why?
Index creation & performance comparison using EXPLAIN
Hard
A.A query for documents where the errorCode field is explicitly null, e.g., db.logs.find({ errorCode: null }).
B.A query for documents that have a specific error code, e.g., db.logs.find({ errorCode: 500 }).
C.A query that sorts by errorCode, e.g., db.logs.find().sort({ errorCode: 1 }).
D.A query for documents where the errorCode field exists, e.g., db.logs.find({ errorCode: { $exists: true } }).
Correct Answer: A query for documents where the errorCode field is explicitly null, e.g., db.logs.find({ errorCode: null }).
Explanation:
A sparse index only contains entries for documents that have the indexed field. If a document does not have the errorCode field, it is omitted from the index entirely. The query db.logs.find({ errorCode: null }) looks for documents that either do not have the errorCode field OR have the errorCode field set to null. Since the sparse index does not contain entries for documents missing the field, it cannot satisfy the 'field does not exist' part of this query's logic. Therefore, MongoDB cannot use the sparse index to resolve this query and must perform a COLLSCAN. A query for { $exists: true } can use the index because it asks only for documents that are in the index. A sort can use the sparse index only when the result set is guaranteed to be complete (for example, when the query also requires the field to exist, or the index is forced with hint()), because documents missing the field would otherwise be silently omitted from the sorted results.
Incorrect! Try again.
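A small simulation (documents invented for illustration) shows the gap: the sparse index has entries only for documents that carry the field, so an index-only answer to {errorCode: null} would silently miss the document with no field at all.

```python
# Why {errorCode: null} forces a COLLSCAN when the index is sparse.
logs = [
    {"_id": 1, "errorCode": 500},
    {"_id": 2, "errorCode": None},   # field present, value null
    {"_id": 3},                      # field missing entirely
]

# Sparse index: only documents that HAVE the field get an entry
sparse_index = {d["_id"]: d["errorCode"] for d in logs if "errorCode" in d}

# find({errorCode: null}) matches field-is-null OR field-is-missing
collscan_result = [d["_id"] for d in logs if d.get("errorCode") is None]

# Answering from the sparse index alone would miss _id 3
index_result = [i for i, code in sparse_index.items() if code is None]
```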
57You are designing a DynamoDB table where you need to perform complex filtering on multiple non-key attributes. For example, finding all products where price > 100 AND category = 'Electronics' AND in_stock = true. The table's primary key is ProductID. Using a traditional DynamoDB Scan with a FilterExpression is too slow and costly. What is a common and advanced design pattern to enable efficient, multi-faceted searching on DynamoDB data without performing a full table scan?
DynamoDB
Hard
A.Implement client-side filtering by downloading the entire table daily and indexing it in a local application cache.
B.Use DynamoDB Accelerator (DAX) to cache the entire table in memory, allowing for faster scans.
C.Create a Global Secondary Index for every possible combination of filterable attributes.
D.Stream the DynamoDB table data via DynamoDB Streams to an external search service like Amazon OpenSearch (or Elasticsearch) and direct search queries to it.
Correct Answer: Stream the DynamoDB table data via DynamoDB Streams to an external search service like Amazon OpenSearch (or Elasticsearch) and direct search queries to it.
Explanation:
This question addresses a key limitation of DynamoDB: its query capabilities are tied directly to its primary and secondary keys, which are not designed for ad-hoc, multi-attribute filtering. The standard architectural solution is the 'Search Index' pattern. DynamoDB Streams capture item-level changes in real-time. A Lambda function can be triggered by these streams to replicate the data into a dedicated search service like OpenSearch or Algolia. This service is purpose-built for complex text search and faceted filtering. Queries that require this flexibility are then sent to OpenSearch, which returns a list of ProductIDs, which can then be used to fetch the full items from DynamoDB via BatchGetItem if needed. Creating GSIs for every combination is impractical and expensive. DAX accelerates key-value lookups but does not fundamentally change the Scan operation's inefficiency. Client-side indexing is not scalable or real-time.
Incorrect! Try again.
58The BSON specification, which MongoDB uses for data storage, includes an ObjectId type. It is a 12-byte value composed of: a 4-byte timestamp, a 5-byte random value, and a 3-byte incrementing counter. What is the primary purpose of structuring the ObjectId this way, particularly the inclusion of the timestamp as the most significant bytes?
Structure of MongoDB
Hard
A.To compress the document size by encoding three separate pieces of information into a single 12-byte field.
B.To guarantee universal uniqueness across every database cluster in the world.
C.To allow for the automatic deletion of documents after the timestamp expires, similar to a TTL index.
D.To ensure that default sorting on the _id field roughly corresponds to the insertion order of the documents, which is highly beneficial for many queries.
Correct Answer: To ensure that default sorting on the _id field roughly corresponds to the insertion order of the documents, which is highly beneficial for many queries.
Explanation:
While ObjectIds are designed to be highly unique, their structure provides an additional, crucial benefit. Because the 4-byte timestamp is the most significant part of the 12-byte value, the binary representation of ObjectIds will sort in approximately chronological order. This means that a find().sort({_id: 1}) is effectively a sort by creation time. This is extremely useful for retrieving the most recently inserted documents without needing a separate createdAt timestamp field and an index on it. The default _id index can be used for both unique lookups and for efficiently fetching recent items.
Incorrect! Try again.
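The layout can be reproduced in a few lines (a simplified sketch; real ObjectIds derive the middle bytes from a per-process random value and the counter starts at a random offset). Because the timestamp occupies the most significant bytes, byte-wise ordering of the 12-byte values tracks insertion time.

```python
# ObjectId-like 12-byte values: 4-byte timestamp | 5-byte random | 3-byte counter.
import os
import struct

counter = 0

def make_objectid(ts):
    global counter
    counter += 1
    return (struct.pack(">I", ts)          # 4-byte big-endian timestamp
            + os.urandom(5)                # 5-byte random value
            + counter.to_bytes(3, "big"))  # 3-byte incrementing counter

ids = [make_objectid(ts) for ts in (1672531200, 1672531260, 1672531320)]
```

Sorting the raw bytes (or their hex strings) yields the generation order, which is why sort({_id: 1}) approximates sort-by-creation-time.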
59A key characteristic of serverless databases like FaunaDB or the 'Data API' for Amazon Aurora Serverless is the way they handle connections. Unlike traditional databases that maintain persistent connection pools, these services are often accessed via stateless HTTP APIs. What is the most significant architectural implication of this connectionless model for an application with a very high volume of small, frequent queries?
Serverless cloud database
Hard
A.It simplifies the application code by removing the need for complex connection pooling and retry logic.
B.It eliminates the 'C10k problem' by allowing a virtually unlimited number of concurrent clients, as there are no persistent server-side connections to manage.
C.It introduces significant latency overhead for each query due to the need for a new TCP and TLS handshake for every API call.
D.It shifts the burden of ensuring data consistency from the database to the client-side application.
Correct Answer: It eliminates the 'C10k problem' by allowing a virtually unlimited number of concurrent clients, as there are no persistent server-side connections to manage.
Explanation:
The 'C10k problem' refers to the challenge of a server handling ten thousand concurrent connections. Traditional databases often struggle here because each connection consumes memory and CPU resources on the server. The stateless, HTTP-based model of many serverless databases completely sidesteps this problem. Since there is no persistent connection, the server doesn't need to maintain state for thousands of clients simultaneously. This allows for massive client-side concurrency (e.g., from thousands of ephemeral serverless functions) without exhausting the database's connection limits. While there is some latency overhead per call (Option C), modern HTTP/2 and connection reuse can mitigate this. The main architectural benefit is the massive scalability of concurrent operations.
Incorrect! Try again.
60You are building a real-time recommendation engine using a vector database. New user interaction vectors are generated constantly and must be searchable within seconds. You choose an ANN index like HNSW. During a period of extremely high write traffic, you notice that query latency increases and recall (accuracy) decreases. What is the most likely cause of this degradation within the HNSW index structure?
Vector Databases
Hard
A.The product quantization (PQ) asymmetric distance computation becomes a CPU bottleneck under high load.
B.The database is performing a full re-index of all vectors in the background, which consumes all available I/O.
C.The high rate of concurrent insertions leads to lock contention on the graph's entry point and upper layers, slowing down both writes and reads that need to traverse from the top.
D.The new vectors are being written to a write-ahead-log (WAL) but are not yet added to the searchable graph index until a scheduled batch process runs.
Correct Answer: The high rate of concurrent insertions leads to lock contention on the graph's entry point and upper layers, slowing down both writes and reads that need to traverse from the top.
Explanation:
This question tests the understanding of the dynamics of a mutable HNSW graph. In HNSW, both inserting a new vector and performing a search start at a common entry point in the top layer of the graph and traverse down. To maintain the integrity of the graph structure during concurrent modifications, locks are required on the nodes being visited and modified. Under very high insert rates, these locks can cause contention, creating queues for both new writes trying to enter the graph and new reads trying to start their search. This contention is a well-known challenge in implementing highly concurrent HNSW, directly leading to increased latency and potentially affecting the quality of the search path, which can decrease recall.