Unit 6 - Practice Quiz

INT306 60 Questions

1 MongoDB is a popular example of which type of NoSQL database?

Introduction of MongoDB Easy
A. Document-oriented
B. Column-family
C. Key-value store
D. Graph database

2 Which of the following best describes a key difference between SQL and NoSQL databases regarding data structure?

SQL vs NoSQL Easy
A. SQL databases use a dynamic schema, while NoSQL databases use a rigid schema.
B. Both SQL and NoSQL databases always require a rigid, predefined schema.
C. SQL databases typically require a predefined schema, while NoSQL databases are often schema-less or have a dynamic schema.
D. SQL is for unstructured data, while NoSQL is for structured data.

3 In MongoDB, what is the term for a group of documents, which is analogous to a table in a relational database?

Structure of MongoDB Easy
A. Database
B. Recordset
C. Table
D. Collection

4 What does JSON stand for?

JSON databases Easy
A. Java Standard Object Notation
B. JavaScript Ordered Network
C. Java-Styled Object Naming
D. JavaScript Object Notation

5 Amazon DynamoDB is a fully managed NoSQL database service offered by which cloud provider?

DynamoDB Easy
A. Google Cloud
B. Amazon Web Services (AWS)
C. Microsoft Azure
D. IBM Cloud

6 Which command is used to query or find documents in a MongoDB collection?

Working with MongoDB Easy
A. SELECT * FROM collection;
B. show documents in collection;
C. db.collection.find()
D. db.collection.get()

7 What is a major advantage of using a serverless database?

Serverless cloud database Easy
A. It only supports the SQL query language.
B. It offers a fixed capacity that never changes.
C. It has a pay-for-what-you-use pricing model and handles scaling automatically.
D. It requires developers to manually provision and scale servers.

8 What is the primary purpose of creating an index on a field in a database?

Index creation & performance comparison using EXPLAIN Easy
A. To increase the physical storage size of the database.
B. To make write operations slower and more secure.
C. To encrypt the data within the collection.
D. To speed up the performance of read queries.

9 Vector databases are specifically designed to store and query what kind of data?

Vector Databases Easy
A. Simple key-value pairs.
B. High-dimensional vector embeddings.
C. Relational data in tables.
D. Graph-based relationship data.

10 Which of the following is a valid JSON representation for a product with a name "Keyboard" and a price of 75?

JSON representation of part of the dataset Easy
A. {"name": "Keyboard", "price": 75}
B. { 'name': 'Keyboard', 'price': 75 }
C. <product><name>Keyboard</name><price>75</price></product>
D. name=Keyboard, price=75
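
A quick check with Python's built-in json module shows why only the double-quoted form is valid JSON; the single-quoted variant is rejected by a strict parser.

```python
import json

# Valid JSON: double-quoted keys and strings, bare numeric literal.
product = json.loads('{"name": "Keyboard", "price": 75}')
assert product["name"] == "Keyboard" and product["price"] == 75

# Single quotes are not legal JSON, so parsing fails.
try:
    json.loads("{ 'name': 'Keyboard', 'price': 75 }")
except json.JSONDecodeError:
    print("single-quoted version rejected")
```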

11 The acronym "NoSQL" is most commonly interpreted as:

SQL vs NoSQL Easy
A. "Not Only SQL"
B. "Non-Standard Query Language"
C. "New SQL"
D. "No SQL Allowed"

12 In MongoDB, what is the basic unit of data that is analogous to a row in a relational database?

Structure of MongoDB Easy
A. An Index
B. A Field
C. A Document
D. A Schema

13 MongoDB stores data in a binary-encoded JSON format. What is this format called?

Introduction of MongoDB Easy
A. MsgPack
B. BSON
C. XML
D. YAML

14 What is the primary data model for Amazon DynamoDB?

DynamoDB Easy
A. Time-series
B. Key-value and Document
C. Graph-based
D. Relational

15 Which of the following is a good example of a serverless database?

Serverless cloud database Easy
A. A self-hosted MySQL server on a physical machine.
B. A local SQLite file.
C. Amazon DynamoDB.
D. Microsoft Excel.

16 To add a single new employee document to an employees collection in MongoDB, which method is typically used?

Working with MongoDB Easy
A. db.employees.insertOne()
B. db.employees.saveNew()
C. db.employees.addOne()
D. db.employees.create()

17 In JSON, what structure is used to represent an ordered list of values?

JSON databases Easy
A. A String ("")
B. A Tuple (())
C. An Object ({})
D. An Array ([])

18 What is the function of the EXPLAIN command in a database like MongoDB?

Index creation & performance comparison using EXPLAIN Easy
A. It returns information on the query plan, showing how the query will be executed.
B. It explains the data types used in the collection.
C. It executes the query and returns the results.
D. It provides a plain-English explanation of the database's purpose.

19 A common use case for vector databases in AI applications is:

Vector Databases Easy
A. Storing user session data.
B. Standard transactional processing.
C. Logging server errors.
D. Performing similarity searches for recommendation engines.

20 Which type of database scaling involves increasing the resources (CPU, RAM) of a single server?

SQL vs NoSQL Easy
A. Horizontal Scaling (scaling out)
B. Parallel Scaling
C. Vertical Scaling (scaling up)
D. Diagonal Scaling

21 A social media application needs to store user-generated content which has a highly variable structure (e.g., text posts, video posts with different metadata, polls). Which database model is more advantageous and why?

SQL vs NoSQL Medium
A. NoSQL, because it only supports key-value storage which is simple for all post types.
B. NoSQL, because its flexible/dynamic schema allows for storing documents with different structures in the same collection.
C. SQL, because its rigid schema ensures data integrity and consistency for all post types.
D. SQL, because JOIN operations are highly optimized for retrieving varied data types.

22 MongoDB stores data in BSON format, which is a binary representation of JSON. What is a key advantage of using BSON over plain JSON for a database's internal storage format?

Structure of MongoDB Medium
A. BSON is human-readable, making it easier to debug data directly on disk.
B. BSON enforces a strict schema for all documents in a collection.
C. BSON supports additional data types beyond JSON's string, number, boolean, array, and object (e.g., Date, ObjectId, binary data).
D. BSON is a text-based format, which reduces parsing overhead compared to binary formats.

23 Given a students collection with documents like { "name": "John Doe", "major": "Computer Science", "grades": [85, 92, 78] }, which query finds all students majoring in 'Computer Science' who have at least one grade greater than 90?

Working with MongoDB Medium
A. db.students.find({ "major": "Computer Science" AND "grades" > 90 })
B. db.students.find({ "major": "Computer Science", "grades": { "$all": [90] } })
C. db.students.find({ "$or": [ { "major": "Computer Science" }, { "grades": { "$gt": 90 } } ] })
D. db.students.find({ "major": "Computer Science", "grades": { "$gt": 90 } })
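
Note: mongosh queries need a live server, but the array-matching rule the correct option relies on can be sanity-checked in plain Python (the data below is illustrative): in MongoDB, a comparison on an array field matches a document if any element of the array satisfies it.

```python
# Hypothetical in-memory stand-in for the students collection.
students = [
    {"name": "John Doe", "major": "Computer Science", "grades": [85, 92, 78]},
    {"name": "Ann Lee",  "major": "Computer Science", "grades": [70, 88, 90]},
    {"name": "Raj Rao",  "major": "Mathematics",      "grades": [95, 99, 91]},
]

def matches(doc):
    # Mirrors { "major": "Computer Science", "grades": { "$gt": 90 } }:
    # an array field satisfies $gt if ANY element does.
    return (doc["major"] == "Computer Science"
            and any(g > 90 for g in doc["grades"]))

result = [d["name"] for d in students if matches(d)]
print(result)  # only John Doe (92 > 90); Ann's best grade is exactly 90
```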

24 You execute db.collection.explain('executionStats').find({ 'status': 'A' }) on a large collection. The output's winningPlan shows a stage of COLLSCAN and totalDocsExamined is equal to the total number of documents in the collection. What is the most logical first step to optimize this query?

Index creation & performance comparison using EXPLAIN Medium
A. Rewrite the query to use the aggregation framework.
B. Shard the collection across multiple servers.
C. Create a single-field index on the status field.
D. Increase the RAM on the database server.

25 An application requires read operations that always return the most recently completed write value. When querying a DynamoDB table, which type of read operation must be specified to guarantee this behavior, and what is the trade-off?

DynamoDB Medium
A. A Transactional Read, which is only used for ACID-compliant operations.
B. A Strongly Consistent Read, which may have higher latency and cost more read capacity units.
C. A Global Secondary Index read, which is always strongly consistent.
D. An Eventually Consistent Read, which has lower latency but might return stale data.

26 What core characteristic of a serverless database like Amazon DynamoDB or Google Firestore makes it particularly suitable for a startup building a new application with unpredictable user adoption rates?

Serverless cloud database Medium
A. Automatic and seamless scaling of throughput and storage without manual intervention.
B. The requirement to run on a specific operating system.
C. Fixed monthly pricing, which simplifies budget planning.
D. Support for standard SQL query language.

27 You are modeling a dataset of 'Orders' and 'LineItems' in a JSON document store. Each order can have multiple line items. If the primary use case is to always retrieve an order and all its line items together, which JSON structure is most efficient for read operations?

JSON representation of part of the dataset Medium
A. A single 'Order' document with a nested array field called lineItems containing the item objects.
B. A single collection where some documents are orders and others are line items.
C. Separate 'Order' and 'LineItem' collections, with lineItems documents containing an order_id reference.
D. A single 'Order' document with a string field containing comma-separated line item IDs.
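
A minimal sketch of the embedded layout (field names and data are illustrative): one order document carries its line items in a nested array, so a single read returns the whole order.

```python
# Embedded "Order with lineItems" document, as a Python dict.
order = {
    "_id": "order-1001",
    "customer": "C-42",
    "lineItems": [                      # embedded array of item objects
        {"sku": "KB-75", "qty": 1, "price": 75},
        {"sku": "MS-20", "qty": 2, "price": 20},
    ],
}

# Everything needed is in the one document -- no second lookup required.
total = sum(item["qty"] * item["price"] for item in order["lineItems"])
print(total)  # 75 + 2*20 = 115
```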

28 A company wants to build a recommendation engine. They use a machine learning model to convert users and items into 300-dimension numerical vectors. What is the primary function of a vector database in this scenario?

Vector Databases Medium
A. To execute complex business logic using stored procedures on the vector data.
B. To store the raw user and item data in a relational format.
C. To provide ACID transactions for updating user and item vectors.
D. To efficiently store these vectors and perform fast similarity searches (e.g., finding the 'k' nearest item vectors to a given user vector).

29 MongoDB's architecture is designed for horizontal scaling. What is the name of the native process it uses to distribute data across multiple servers or clusters?

Introduction of MongoDB Medium
A. Clustering
B. Replication
C. Federation
D. Sharding

30 The term 'impedance mismatch' often comes up when comparing relational databases to application development. How does using a JSON database help mitigate this problem?

JSON databases Medium
A. It uses a schema that perfectly mirrors the application's object model, preventing any changes.
B. It eliminates the need for an Object-Relational Mapping (ORM) layer because the database's data model (JSON documents) closely matches the object model used in many programming languages.
C. It provides a SQL interface that all developers are familiar with.
D. It enforces strict data types that match object-oriented programming languages.

31 When considering the CAP theorem, a NoSQL database that prioritizes Availability and Partition Tolerance (AP) will often sacrifice strong consistency. What model of consistency does such a system typically adopt?

SQL vs NoSQL Medium
A. Immediate Consistency
B. Serializability
C. Eventual Consistency
D. ACID Consistency

32 In the MongoDB Aggregation Framework, you need to filter the documents before they are grouped. Which two stages should you use, and in what order?

Working with MongoDB Medium
A. A $group stage followed by a $match stage.
B. A $match stage followed by a $group stage.
C. A $project stage followed by a $match stage.
D. A $group stage followed by a $project stage.

33 A query on your logs collection filters by timestamp and then sorts by level. Which compound index would provide the most benefit for this query: db.logs.find({ timestamp: { $gte: ISODate(...) } }).sort({ level: 1 })?

Index creation & performance comparison using EXPLAIN Medium
A. A text index on all fields
B. { "timestamp": 1, "level": 1 }
C. { "level": 1, "timestamp": 1 }
D. A single index on timestamp and another on level

34 In a DynamoDB table, you have a partition key userId and a sort key orderId. How does this combination affect data storage and querying capabilities?

DynamoDB Medium
A. It ensures that every orderId is unique across all users in the table.
B. It allows you to efficiently retrieve all orders for a specific user, sorted by orderId.
C. It prevents you from querying by userId alone.
D. It stores all items with the same userId on different physical partitions.
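
A toy in-memory model of the (partition key, sort key) idea, with made-up userId/orderId values: items that share a partition key live together, kept ordered by the sort key, so "all orders for one user, in order" is a cheap lookup rather than a scan.

```python
from collections import defaultdict
from bisect import insort

# userId -> list of orderIds kept in sorted order, mimicking how a
# partition key groups items and a sort key orders them within the group.
table = defaultdict(list)

for user, order in [("u1", "2024-003"), ("u2", "2024-001"),
                    ("u1", "2024-001"), ("u1", "2024-002")]:
    insort(table[user], order)

print(table["u1"])  # ['2024-001', '2024-002', '2024-003'] -- sorted by orderId
```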

35 What is the conceptual relationship between a MongoDB 'collection' and a 'document' compared to a traditional SQL database?

Structure of MongoDB Medium
A. A collection is analogous to a column, and a document is analogous to a schema.
B. A collection is analogous to a row, and a document is analogous to a column.
C. A collection is analogous to a database, and a document is analogous to a table.
D. A collection is analogous to a table, and a document is analogous to a row.

36 When using a vector database for semantic search, the system is not matching keywords directly. What is it actually comparing to determine the similarity between a search query and the documents?

Vector Databases Medium
A. The geometric distance (e.g., cosine similarity or Euclidean distance) between the query's vector embedding and the documents' vector embeddings.
B. The primary keys of the documents.
C. A full-text index of the documents' content.
D. The number of overlapping keywords between the query and the documents.
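
Cosine similarity, the metric named in the correct option, can be computed in a few lines of plain Python (the 3-dimensional vectors below are toy stand-ins for real embeddings):

```python
import math

def cosine_similarity(a, b):
    # Angle-based similarity between two embeddings: 1.0 means same
    # direction, 0.0 means orthogonal (unrelated in embedding space).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [1.0, 0.0, 1.0]
doc_close = [0.9, 0.1, 1.1]   # nearly the same direction as the query
doc_far = [0.0, 1.0, 0.0]     # orthogonal to the query

assert cosine_similarity(query, doc_close) > cosine_similarity(query, doc_far)
```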

37 A team is using a serverless database with a pay-per-request pricing model. They notice their monthly bill is unexpectedly high. What is a likely cause of this issue, related to application design?

Serverless cloud database Medium
A. The cloud provider increased the per-request price without notice.
B. The application is performing many small, inefficient read/write operations in a loop instead of batching them.
C. The database is automatically scaling to a larger instance size than needed.
D. They forgot to shut down the database server during off-peak hours.
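
A back-of-the-envelope cost model for the correct option: under pay-per-request pricing, every API call bills at least one request, so looping item-by-item costs far more than batching. (The batch size of 25 mirrors DynamoDB's BatchWriteItem cap; the item count is made up.)

```python
items = list(range(1000))

# One write call per item: 1000 billable requests.
naive_requests = len(items)

# Grouping writes into batches of 25: ceil(1000 / 25) = 40 calls.
BATCH = 25
batched_requests = -(-len(items) // BATCH)  # ceiling division

print(naive_requests, batched_requests)  # 1000 vs 40
```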

38 When is it more appropriate to use 'referencing' (storing an ID of a related document) instead of 'embedding' in a JSON document model?

JSON representation of part of the dataset Medium
A. When representing a many-to-many relationship or when the related data is large and not always needed.
B. When the total size of the document is very small.
C. When you want to guarantee the fastest possible read performance for all related data.
D. When representing a one-to-one relationship where the data is always accessed together.

39 Besides its flexible schema, what is a key feature of MongoDB's data model that distinguishes it from a simple key-value store?

Introduction of MongoDB Medium
A. Its lack of any indexing capabilities.
B. Its ability to store only string values.
C. Its support for rich data structures within documents, such as nested objects and arrays.
D. Its strict enforcement of storing only a single key-value pair per entry.

40 Which of the following statements accurately describes the schema concept in the context of a typical JSON database like MongoDB?

JSON databases Medium
A. The schema is dynamic and enforced at the application level, not by the database itself, though schema validation rules can be optionally applied.
B. The database has no concept of a schema whatsoever, and data structure is completely random.
C. A rigid schema must be defined for each collection before any documents can be inserted.
D. The schema is defined using SQL's CREATE TABLE statement.

41 In MongoDB, you have a collection events with a compound index { "type": 1, "timestamp": -1 }. You execute the query db.events.find({ "type": "login" }).sort({ "timestamp": -1 }). The explain() output shows a winningPlan with an IXSCAN stage. If you change the query to db.events.find({ "type": "login" }).sort({ "timestamp": 1 }), what is the most likely outcome for the explain() plan and why?

Index creation & performance comparison using EXPLAIN Hard
A. MongoDB will use the index for the IXSCAN and efficiently read the keys in reverse order, requiring no additional SORT stage.
B. The query will fail because the sort direction does not match the indexed direction.
C. MongoDB will still use the index for the IXSCAN but will add an in-memory SORT stage because the sort order opposes the index key order.
D. MongoDB will perform a COLLSCAN followed by a SORT, as the index cannot be used for sorting in the opposite direction.
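
The reverse-scan behavior behind option A can be simulated with a sorted list standing in for the index B-tree (values are illustrative): entries ordered by (type asc, timestamp desc) read backward come out as timestamp ascending within a single type value.

```python
# Index entries for { "type": 1, "timestamp": -1 }: ascending by type,
# then descending by timestamp within each type.
entries = sorted(
    [("login", 5), ("login", 1), ("logout", 3), ("login", 9)],
    key=lambda k: (k[0], -k[1]),
)

# A forward scan over the type == "login" range serves sort({timestamp: -1}).
forward = [ts for t, ts in entries if t == "login"]

# Scanning the same range backward serves sort({timestamp: 1}) with no
# extra in-memory SORT stage.
backward = list(reversed(forward))

print(forward, backward)  # [9, 5, 1] [1, 5, 9]
```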

42 You are designing a DynamoDB table for a social media application to store user posts. The primary key is (UserID, PostID). You need to efficiently query for the 10 most recent posts by a specific user and query for the 10 most recent posts globally across all users. What is the most cost-effective and performant indexing strategy?

DynamoDB Hard
A. Use a Global Secondary Index (GSI) with a partition key of UserID and a sort key of PostTimestamp.
B. Create two GSIs: one with partition key UserID and sort key PostTimestamp, and another with a static partition key like 'all_posts' and sort key PostTimestamp.
C. Use a Local Secondary Index (LSI) with PostTimestamp as the sort key for user-specific queries, and a GSI with a static partition key (e.g., a constant value like 'all_posts') and PostTimestamp as the sort key for global queries.
D. No indexes are needed; perform a Scan operation with a FilterExpression for global posts and a Query for user-specific posts.

43 In the context of Approximate Nearest Neighbor (ANN) search in vector databases, which statement correctly analyzes the trade-off between the HNSW (Hierarchical Navigable Small World) and IVFPQ (Inverted File with Product Quantization) indexing methods?

Vector Databases Hard
A. HNSW offers higher query latency but provides better recall and is more memory-efficient than IVFPQ due to its graph structure.
B. IVFPQ provides perfect recall (100% accuracy) at the cost of higher latency, while HNSW is a purely approximate method.
C. IVFPQ has a faster index build time and lower memory footprint, but its performance degrades significantly in high-dimensional spaces compared to HNSW, and it is less flexible for adding new data points.
D. HNSW is optimized for static datasets, making it difficult to add new vectors without a full re-index, whereas IVFPQ dynamically handles new data points with minimal overhead.

44 A financial services company is building a global transaction processing system. The system must never lose a transaction (high durability) and must reflect the same account balance to users in New York and Tokyo simultaneously (strong consistency). The system must remain available for new transactions even if the network link between New York and Tokyo is temporarily severed. According to the CAP theorem, which of the following statements is the most accurate analysis of this system's requirements?

SQL vs NoSQL Hard
A. The system can be built by prioritizing Availability and Partition Tolerance (AP), and then implementing a reconciliation layer to achieve eventual consistency for account balances.
B. The requirements are fundamentally contradictory. A system cannot simultaneously guarantee Strong Consistency (C) and Availability (A) during a network Partition (P).
C. The requirements are fully achievable with a modern NoSQL database that offers tunable consistency, sacrificing only partition tolerance.
D. The requirements are fully achievable with a traditional SQL database using synchronous replication across continents.

45 You are debugging a slow MongoDB aggregation pipeline. The collection contains 10 million documents. The pipeline is structured as follows:

1. $lookup: Joins with another collection.
2. $unwind: Deconstructs an array field created by the lookup.
3. $match: Filters documents based on a highly selective field.
4. $group: Aggregates the results.

What is the most effective change to optimize this pipeline's performance?

Working with MongoDB Hard
A. Add a $sort stage to pre-sort the data for the grouping operation.
B. Replace the $lookup with client-side joins to reduce database server load.
C. Create a compound index on the fields used in the $group stage.
D. Move the $match stage to be the first stage in the pipeline.

46 A company uses a serverless database (like Amazon DynamoDB On-Demand or Azure Cosmos DB Serverless) for an e-commerce application. During a flash sale, the read traffic spikes from 100 RCU/s to 50,000 RCU/s in under a minute. Despite the 'serverless' nature which promises to scale automatically, the application experiences a high rate of ProvisionedThroughputExceededException or throttling errors for the first few minutes of the sale. What is the most likely technical reason for this behavior?

Serverless cloud database Hard
A. Serverless databases have a hard, fixed upper limit on throughput that was exceeded by the flash sale traffic.
B. The client-side SDK is misconfigured and is not using exponential backoff, causing it to overwhelm the database with retries.
C. A serverless billing model requires manual intervention through an API call to authorize a sudden increase in spending, which was not performed.
D. The database's underlying partitions were not 'pre-warmed' and the adaptive capacity mechanism couldn't scale the partitions' physical resources fast enough to accommodate the instantaneous, massive spike in traffic.

47 In MongoDB, consider a schema design for a blogging platform where posts can have multiple tags. Two common approaches are:

1. Array of Strings: { "_id": 1, "title": "...", "tags": ["nosql", "mongodb", "indexing"] }
2. Array of Sub-documents: { "_id": 1, "title": "...", "tags": [{ "tag": "nosql" }, { "tag": "mongodb" }, { "tag": "indexing" }] }

If the primary query requirement is to find all posts that have both the 'mongodb' tag and the 'indexing' tag, and to do so using a single index for optimal performance, what is the key advantage of the Array of Strings approach combined with a multikey index on the tags field?

Structure of MongoDB Hard
A. The Array of Strings approach supports atomic updates to individual tags, whereas the sub-document approach does not.
B. The Array of Strings approach allows the use of the $all operator, which can be efficiently serviced by a single multikey index on the tags field.
C. The sub-document approach cannot use a multikey index at all.
D. The sub-document approach is more storage-efficient, which indirectly leads to better query performance.

48 Consider a JSON document storing sensor readings: { "deviceId": "A-1", "ts": 1672531200, "metrics": { "temp": 25.5, "humidity": 45.1 } }. If you are designing a system to store billions of such documents and the most critical query is to find the average temperature for a specific device within a time range, which statement represents the most significant challenge of a pure JSON-based storage format versus a columnar format like Parquet or ORC?

JSON databases Hard
A. Storing timestamps as numbers in JSON is inefficient and leads to large storage overhead compared to native date types in columnar formats.
B. A row-oriented format like JSON requires reading and parsing the entire document (including deviceId, ts, and humidity) for every record in the range, even though only the temp field is needed for the aggregation.
C. JSON parsers are inherently single-threaded and create a performance bottleneck that columnar formats do not have.
D. JSON's lack of a strict schema makes it impossible to query the temp field reliably.
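
The row-versus-column access cost behind the correct option can be sketched in plain Python (the two sensor records are made up): the row-oriented path must fully parse every document just to reach one field, while a columnar layout already has the temp values sitting together.

```python
import json

# Row-oriented: each whole record is parsed to pull out a single field.
rows = [
    '{"deviceId": "A-1", "ts": 1672531200, "metrics": {"temp": 25.5, "humidity": 45.1}}',
    '{"deviceId": "A-1", "ts": 1672531260, "metrics": {"temp": 26.5, "humidity": 44.0}}',
]
avg_from_rows = sum(json.loads(r)["metrics"]["temp"] for r in rows) / len(rows)

# Column-oriented (sketch): the temp column is read directly; deviceId,
# ts, and humidity are never touched by the aggregation.
temp_column = [25.5, 26.5]
avg_from_column = sum(temp_column) / len(temp_column)

assert avg_from_rows == avg_from_column == 26.0
```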

49 You have a MongoDB collection users with an index { "country": 1, "age": 1 }. You run a query db.users.find({ age: { $gt: 30 } }).sort({ country: 1 }). The explain() command reveals that the query plan involves a COLLSCAN followed by a SORT. Why did the MongoDB query planner reject the { "country": 1, "age": 1 } index for this operation?

Index creation & performance comparison using EXPLAIN Hard
A. The sort operation sort({ country: 1 }) is redundant as the data is already sorted by country in the index.
B. The query planner can only use an index if the sort field is the same as the query predicate field.
C. The index cannot be used because the query predicate is a range operator ($gt) on the second field of the index (age) without an equality match on the first field (country).
D. MongoDB indexes do not support range operators like $gt, they only work for equality matches.

50 In DynamoDB, you are implementing a system that requires a transaction to update an Order item and decrement the stock_count in a corresponding Product item. The operation must be atomic. You use a TransactWriteItems operation. If the transaction fails specifically due to an OptimisticLockException on the Product item, what does this imply?

DynamoDB Hard
A. The Product item did not exist in the table when the transaction was initiated.
B. Another process modified the Product item between the time your transaction read it (as part of a condition check) and the time it attempted to write the update.
C. The IAM role executing the transaction does not have permission to write to the Product table.
D. The entire transaction violated a unique constraint defined on one of the tables.

51 When using a vector database for semantic search, you embed a query q and search for the nearest vectors in your indexed dataset. The search commonly uses a distance metric like Cosine Similarity or Euclidean Distance (L2). Under which condition would these two metrics rank the nearest neighbors most differently?

Vector Databases Hard
A. When the vectors in the dataset have widely varying magnitudes (lengths).
B. When the vectors have very high dimensionality (e.g., > 1536).
C. When all vectors in the dataset have been normalized to a unit length of 1.
D. When the query vector q is the zero vector.
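
A two-dimensional toy example (vectors chosen for illustration) shows the divergence described in the correct option: when magnitudes vary, cosine favors a long vector pointing the same way, while Euclidean favors the vector that is literally closest in space.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

q = [1.0, 1.0]
v_same_dir = [10.0, 10.0]   # same direction as q, much larger magnitude
v_nearby = [1.2, 0.8]       # close in space, slightly different direction

# Cosine ranks the long same-direction vector first...
assert cosine(q, v_same_dir) > cosine(q, v_nearby)
# ...while Euclidean ranks the spatially nearby vector first.
assert euclidean(q, v_nearby) < euclidean(q, v_same_dir)
```

If every vector were normalized to unit length, the two metrics would produce the same ranking, which is why varying magnitudes are the condition that separates them.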

52 You are representing a graph structure (e.g., a social network) in a JSON document database like MongoDB. You need to model a 'follows' relationship between users. Consider these two JSON representations for a user document:

1. Child Referencing: { "_id": "userA", "name": "Alice", "following": ["userB", "userC"] }
2. Parent Referencing: { "_id": "userA", "name": "Alice" } (and elsewhere, documents like { "_id": "follow1", "follower": "userD", "following": "userA" })

If the most critical, high-frequency operation is to display a user's complete profile, including the count of their followers and the count of users they are following, which modeling approach presents a significant performance challenge and why?

JSON representation of part of the dataset Hard
A. Parent Referencing, because getting the 'following' count requires a separate query to a different collection, increasing application complexity.
B. Child Referencing, because determining the 'follower' count requires a query across the entire users collection to find all documents where the user's ID appears in the following array.
C. Child Referencing, because the following array can grow unboundedly, potentially exceeding the BSON document size limit (16MB).
D. Parent Referencing, because storing relationships in a separate collection eliminates the ability to use indexes.

53 In MongoDB, what is the primary difference in write behavior and performance implications between a standard updateOne operation and an updateOne operation with the upsert: true option when the document to be updated does not exist?

Working with MongoDB Hard
A. With upsert: true, MongoDB must first perform a query to check for the document's existence, potentially acquiring more locks and adding latency, before deciding whether to perform an insert or an update.
B. A standard updateOne is faster as it does not write to the journal until the operation is complete, whereas an upsert must journal immediately.
C. There is no difference in performance; the database engine handles both cases with the same internal mechanism.
D. With upsert: true, the operation is not atomic and can lead to race conditions.

54 When migrating a relational database schema to a document database like MongoDB, a one-to-many relationship (e.g., an Author has many Books) is often denormalized by embedding the 'many' side into the 'one' side (i.e., embedding an array of book documents within the author document). Which scenario represents the strongest argument against this denormalization strategy?

SQL vs NoSQL Hard
A. The application requires ACID transactions spanning updates to both author and book information.
B. The total number of books per author is always small and bounded (e.g., less than 10).
C. Queries often need to retrieve an author and all of their books in a single operation.
D. The book information is frequently updated independently of the author information.

55 MongoDB's storage engine, WiredTiger, uses both an in-memory cache and on-disk data files. When a write operation occurs with the default WriteConcern (w:1), which sequence of events accurately describes how the data is persisted to ensure durability?

Introduction of MongoDB Hard
A. The write is first written to an on-disk write-ahead log (journal), then applied to the in-memory cache, and an acknowledgement is then sent to the client. The data files are updated later during a checkpoint.
B. The write is applied directly to the on-disk data files, and then an acknowledgement is sent to the client.
C. The write is applied to the in-memory cache, written to the on-disk journal, flushed to the on-disk data files, and only then is an acknowledgement sent to the client.
D. The write is applied to the in-memory cache, an acknowledgement is sent to the client, and the data is later flushed to disk during a periodic checkpoint.

56 You have a MongoDB collection logs with a sparse index on a field errorCode: db.logs.createIndex({ errorCode: 1 }, { sparse: true }). Which of the following queries would be unable to use this index, and why?

Index creation & performance comparison using EXPLAIN Hard
A. A query for documents where the errorCode field is explicitly null, e.g., db.logs.find({ errorCode: null }).
B. A query for documents that have a specific error code, e.g., db.logs.find({ errorCode: 500 }).
C. A query that sorts by errorCode, e.g., db.logs.find().sort({ errorCode: 1 }).
D. A query for documents where the errorCode field exists, e.g., db.logs.find({ errorCode: { $exists: true } }).
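
The subtlety behind this question can be modeled in a few lines of Python (documents are illustrative): in MongoDB, find({ errorCode: null }) also matches documents where the field is missing entirely, but a sparse index holds no entry for those documents, so it cannot answer the null query completely.

```python
logs = [
    {"msg": "ok"},                       # no errorCode field at all
    {"msg": "warn", "errorCode": None},  # field present, explicitly null
    {"msg": "boom", "errorCode": 500},
]

# A sparse index only indexes documents where the field exists.
sparse_index = [d for d in logs if "errorCode" in d]

# But the null query must also match documents MISSING the field.
null_query_matches = [d for d in logs if d.get("errorCode") is None]

print([d["msg"] for d in sparse_index])       # ['warn', 'boom']
print([d["msg"] for d in null_query_matches]) # ['ok', 'warn'] -- 'ok' is not in the index
```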

57 You are designing a DynamoDB table where you need to perform complex filtering on multiple non-key attributes. For example, finding all products where price > 100 AND category = 'Electronics' AND in_stock = true. The table's primary key is ProductID. Using a traditional DynamoDB Scan with a FilterExpression is too slow and costly. What is a common and advanced design pattern to enable efficient, multi-faceted searching on DynamoDB data without performing a full table scan?

DynamoDB Hard
A. Implement client-side filtering by downloading the entire table daily and indexing it in a local application cache.
B. Use DynamoDB Accelerator (DAX) to cache the entire table in memory, allowing for faster scans.
C. Create a Global Secondary Index for every possible combination of filterable attributes.
D. Stream the DynamoDB table data via DynamoDB Streams to an external search service like Amazon OpenSearch (or Elasticsearch) and direct search queries to it.

58 The BSON specification, which MongoDB uses for data storage, includes an ObjectId type. It is a 12-byte value composed of: a 4-byte timestamp, a 5-byte random value, and a 3-byte incrementing counter. What is the primary purpose of structuring the ObjectId this way, particularly the inclusion of the timestamp as the most significant bytes?

Structure of MongoDB Hard
A. To compress the document size by encoding three separate pieces of information into a single 12-byte field.
B. To guarantee universal uniqueness across every database cluster in the world.
C. To allow for the automatic deletion of documents after the timestamp expires, similar to a TTL index.
D. To ensure that default sorting on the _id field roughly corresponds to the insertion order of the documents, which is highly beneficial for many queries.
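
A rough sketch of the layout described in the stem (this mimics the 12-byte structure for illustration, it is not the real ObjectId codec): because the timestamp occupies the most significant bytes, byte-wise sorting of _id values reproduces insertion order.

```python
import os
import struct

def make_objectid_like(ts, counter):
    # 4-byte big-endian timestamp + 5 random bytes + 3-byte counter.
    return struct.pack(">I", ts) + os.urandom(5) + counter.to_bytes(3, "big")

# Three ids generated at increasing timestamps.
ids = [make_objectid_like(ts, i) for i, ts in enumerate([1000, 2000, 3000])]

# The timestamp prefix dominates the byte-wise comparison, so sorting the
# ids recovers insertion order despite the random middle bytes.
assert sorted(ids) == ids
```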

59 A key characteristic of serverless databases like FaunaDB or the 'Data API' for Amazon Aurora Serverless is the way they handle connections. Unlike traditional databases that maintain persistent connection pools, these services are often accessed via stateless HTTP APIs. What is the most significant architectural implication of this connectionless model for an application with a very high volume of small, frequent queries?

Serverless cloud database Hard
A. It simplifies the application code by removing the need for complex connection pooling and retry logic.
B. It eliminates the 'C10k problem' by allowing a virtually unlimited number of concurrent clients, as there are no persistent server-side connections to manage.
C. It introduces significant latency overhead for each query due to the need for a new TCP and TLS handshake for every API call.
D. It shifts the burden of ensuring data consistency from the database to the client-side application.

60 You are building a real-time recommendation engine using a vector database. New user interaction vectors are generated constantly and must be searchable within seconds. You choose an ANN index like HNSW. During a period of extremely high write traffic, you notice that query latency increases and recall (accuracy) decreases. What is the most likely cause of this degradation within the HNSW index structure?

Vector Databases Hard
A. The product quantization (PQ) asymmetric distance computation becomes a CPU bottleneck under high load.
B. The database is performing a full re-index of all vectors in the background, which consumes all available I/O.
C. The high rate of concurrent insertions leads to lock contention on the graph's entry point and upper layers, slowing down both writes and reads that need to traverse from the top.
D. The new vectors are being written to a write-ahead-log (WAL) but are not yet added to the searchable graph index until a scheduled batch process runs.