Unit 4 - Practice Quiz
1 Which of the following is a mandatory prerequisite software framework for running Apache Hive?
2 What is the default embedded database used by Hive for its Metastore in local mode?
3 Which environment variable is typically set to point to the base directory of the Hive installation?
4 Which of the following Hive data types is used to store text or character sequences?
5 Which complex data type in Hive is used to store an unordered collection of key-value pairs?
6 Which complex data type represents an ordered collection of elements of the exact same data type?
7 Which primitive data type in Hive is used to represent true or false values?
8
Which clause is used in a HiveQL CREATE TABLE statement to implement bucketing?
9 How does Hive determine which bucket a particular record should be stored in?
10 What is one of the primary performance benefits of using bucketing in Hive?
11 How is partitioned data physically organized in the Hadoop Distributed File System (HDFS)?
12 Which keyword is used to create partitioned tables in Hive?
13 What is the main advantage of using partitioning in Hive?
14 Which HiveQL command is used to display a list of all existing databases?
15 Which command is used to permanently remove a table and its metadata from Hive?
16 Which command allows a user to import data from a local file system into a Hive table?
17 How do you define a new table structure in HiveQL?
18 Which relational operator in Hive is used to check if two values are equal?
19
Which logical operator is used to ensure both conditions in a WHERE clause evaluate to true?
20 Which Hive operator is specifically used to check if a column value is missing or undefined?
21 Which of the following is a mandatory prerequisite running service for Apache Hive to function properly, as it handles the underlying distributed storage?
22 When configuring a remote metastore for Apache Hive, which configuration file is primarily modified to set the JDBC connection parameters for the external RDBMS?
hive-site.xml
mapred-site.xml
core-site.xml
hive-env.sh
23 A developer wants to store a user's address containing fields like street (string), city (string), and zip code (integer). Which complex Hive data type is most appropriate to group these different types together?
UNIONTYPE
MAP
STRUCT
ARRAY
24
How does Hive handle a cast operation when a STRING containing non-numeric alphabetic characters is explicitly cast to an INT?
0.
ClassCastException.
NULL.
25 Which of the following Hive data types allows a column to store a value that can be exactly one of several explicitly specified data types?
ANY
UNIONTYPE
STRUCT
MAP
26
What is the default precision of the TIMESTAMP data type in Apache Hive?
27
Which Hive configuration property must be enabled (set to true) to allow the automatic creation of partitions based on the values of the loaded data?
hive.partition.dynamic.enable
hive.exec.partition.dynamic
hive.exec.dynamic.partition
hive.dynamic.partition.mode
28
If a table is partitioned by country and then by state, how will the nested directory structure be physically represented in HDFS for a record from California, USA?
/table_path/state=California/country=USA/
/table_path/country_USA_state_California/
/table_path/USA/California/
/table_path/country=USA/state=California/
29
What happens if a user executes an INSERT statement into a table with dynamic partitioning enabled, but the hive.exec.dynamic.partition.mode is set to strict?
30 Which of the following accurately describes a major operational drawback of excessive partitioning (over-partitioning) on a high-cardinality column in Hive?
STRUCT or MAP.
31
How does Hive determine the correct bucket for a specific row when a table is defined with the CLUSTERED BY (col_name) INTO N BUCKETS clause?
hash_function(col_name) % N
32
Why is bucketing preferred over partitioning when distributing data based on a column with highly unique values, such as a user_id?
33
Historically, which configuration property had to be set to true to ensure standard INSERT operations populated the appropriate number of buckets correctly in Hive?
hive.exec.bucketing.enable
hive.optimize.bucketmapjoin
hive.enforce.bucketing
hive.mapred.bucketing.strict
34 If two tables are physically bucketed and sorted by the same key, and share a proportional number of buckets, which advanced Hive query execution strategy becomes possible?
35
What is the physical outcome of executing DROP TABLE my_data; if my_data was defined as an EXTERNAL table?
36 Which specific HiveQL clause allows a user to write queries that read only a random or bucket-based subset of a table for exploratory data analysis?
TABLESAMPLE
LIMIT
FETCH FIRST
SAMPLE BY
37
How does an INSERT OVERWRITE operation differ mechanically from a standard INSERT INTO operation in HiveQL?
INSERT OVERWRITE clears the existing data in the target table or partition before writing the new data.
INSERT OVERWRITE automatically ignores rows that contain identical primary keys.
INSERT OVERWRITE requires an explicit transaction lock and updates row values based on a WHERE clause.
INSERT OVERWRITE drops the table completely and recreates a new schema based on the input stream.
38
In HiveQL, what is the core purpose of using a LATERAL VIEW in conjunction with a UDTF (User-Defined Table-Generating Function) like explode()?
ARRAY field for storage optimization.
39
Which of the following operators is known as the null-safe equality operator in HiveQL, capable of returning true if both operands happen to be NULL?
=!=
IS NOT DISTINCT FROM
<=>
==
40
When evaluating a WHERE clause in HiveQL, what exact functionality does the RLIKE operator provide?
41 When configuring a Hive installation for a highly concurrent production environment, which metastore configuration approach eliminates the single point of failure and allows multiple HiveServer2 instances to scale efficiently?
42
Which set of properties must be correctly configured in hive-site.xml to successfully enable HiveServer2 High Availability (HA) using Apache ZooKeeper?
hive.server2.enable.doAs and hive.server2.zookeeper.namespace
hive.server2.support.dynamic.service.discovery and hive.zookeeper.quorum
hive.execution.engine and hive.zookeeper.client.port
hive.metastore.uris and hive.server2.thrift.port
43
Consider a Hive table containing a column events of type ARRAY<STRUCT<id:INT, type:STRING>>. Which of the following syntaxes correctly retrieves the type of the second element in the array?
events.type[1]
events[1].type
events[2].type
events{1}.type
44
In Hive, what is the behavior when a calculation involves a DECIMAL(p1, s1) and a DECIMAL(p2, s2) using multiplication, and the resulting precision exceeds the maximum allowed precision of 38?
ArithmeticException and fails the query.
DOUBLE to preserve the magnitude.
45
Which of the following implicitly casts properly without losing precision or failing in a standard Hive aggregate function like SUM()?
FLOAT to INT
STRING to BIGINT
INT to DOUBLE
TIMESTAMP to DECIMAL
46
A user executes an INSERT OVERWRITE statement with dynamic partitioning enabled. The configuration hive.exec.dynamic.partition.mode is set to strict. What condition MUST be met for the query to execute successfully?
hive.exec.max.dynamic.partitions) must be overridden.
47
When working with highly scaled tables stored on an object store (like AWS S3), why might a data engineer prefer using ALTER TABLE ADD PARTITION instead of MSCK REPAIR TABLE to register new data directories?
MSCK REPAIR TABLE does not support external tables.
MSCK REPAIR TABLE locks the entire database, whereas ALTER TABLE only locks the specific partition.
MSCK REPAIR TABLE recursively scans the entire table's file system tree, which is an extremely slow metadata operation on object stores.
ALTER TABLE ADD PARTITION automatically formats the underlying data into ORC format.
48
Assume a Hive table is partitioned by country (STRING) and year (INT). If data is inserted for country='US' and year=2023, what is the exact default directory structure created in HDFS?
.../table_name/year=2023/country=US/
.../table_name/country='US'/year='2023'/
.../table_name/country=US/year=2023/
.../table_name/US/2023/
49
How does Hive determine the exact bucket number for a given row based on its bucketing column col and the total number of buckets B?
length(col) % B
murmur_hash(col) / B
(hash(col) & 0x7FFFFFFF) % B
hash(col) % B
50 To perform a Sort-Merge-Bucket (SMB) Join optimally without shuffling data, what must be true about the two participating tables?
hive.enforce.bucketing set to false.
51
Consider a table sales bucketed into 32 buckets by customer_id. A user executes: SELECT * FROM sales TABLESAMPLE(BUCKET 3 OUT OF 8 ON customer_id);. How many buckets are actually scanned by this query?
52
When using LATERAL VIEW explode(array_col) tbl AS val, what happens to rows in the base table where array_col is empty or NULL?
val.
val.
53 Which of the following table property combinations is strictly REQUIRED to create an ACID transactional table in Hive 3.x?
partitioned by and tblproperties('transactional'='strict')
stored as orc and tblproperties('transactional'='true')
clustered by and stored as textfile
stored as parquet and tblproperties('transactional'='true')
54
What is the primary operational difference between ORDER BY and SORT BY in HiveQL?
SORT BY performs sorting in memory before shuffling, whereas ORDER BY sorts on disk.
SORT BY guarantees total global ordering across all reducers, while ORDER BY only orders within map tasks.
ORDER BY requires a partition key, while SORT BY does not.
ORDER BY guarantees total global ordering using a single reducer, while SORT BY guarantees partial ordering within each reducer.
55
In a HiveQL query, what is the effect of using CLUSTER BY col_name?
DISTRIBUTE BY col_name SORT BY col_name DESC.
DISTRIBUTE BY col_name SORT BY col_name ASC.
col_name's hash, but does not sort the data within the reducers.
col_name as the primary index.
56 Which Hive operator should be used to safely compare two columns for equality where both columns might contain NULL values, ensuring it returns TRUE if both are NULL?
=!=
==
IS NOT DISTINCT FROM
<=>
57
Evaluate the following Hive logical operation: SELECT NULL AND FALSE;. What is the result returned by Hive?
TRUE
NULL
FALSE
58
If a user applies the bitwise NOT operator ~ on a TINYINT column containing the value 1, what is the resulting integer value?
-2
254
0
-1
59
Which of the following correctly describes the behavior of the RLIKE operator in Hive?
% and _.
60
In a complex HiveQL query utilizing window functions, how does the frame specification ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING handle the first row of a partition?