Unit 4 - Practice Quiz

INT312 60 Questions
0 Correct 0 Wrong 60 Left
0/60

1 Which of the following is a mandatory prerequisite software framework for running Apache Hive?

Hive installation Easy
A. Apache Kafka
B. Apache Flink
C. Apache Hadoop
D. Apache Spark

2 What is the default embedded database used by Hive for its Metastore in local mode?

Hive installation Easy
A. PostgreSQL
B. Apache Derby
C. MySQL
D. Oracle

3 Which environment variable is typically set to point to the base directory of the Hive installation?

Hive installation Easy
A. HIVE_HOME
B. HADOOP_HOME
C. HIVE_PATH
D. HIVE_DIR

4 Which of the following Hive data types is used to store text or character sequences?

Hive data types Easy
A. BOOLEAN
B. INT
C. STRING
D. FLOAT

5 Which complex data type in Hive is used to store an unordered collection of key-value pairs?

Hive data types Easy
A. UNION
B. ARRAY
C. MAP
D. STRUCT

6 Which complex data type represents an ordered collection of elements of the exact same data type?

Hive data types Easy
A. STRUCT
B. MAP
C. STRING
D. ARRAY

7 Which primitive data type in Hive is used to represent true or false values?

Hive data types Easy
A. DOUBLE
B. TINYINT
C. INT
D. BOOLEAN

8 Which clause is used in a HiveQL CREATE TABLE statement to implement bucketing?

Hive bucketing Easy
A. ORDER BY
B. GROUP BY
C. PARTITIONED BY
D. CLUSTERED BY

9 How does Hive determine which bucket a particular record should be stored in?

Hive bucketing Easy
A. Using a hash function on the bucketing column
B. Based on the file size of the input data
C. Using a random number generator
D. Based on the alphabetical order of the entire row

10 What is one of the primary performance benefits of using bucketing in Hive?

Hive bucketing Easy
A. It improves map-side joins and sampling efficiency
B. It encrypts the data automatically
C. It compresses data automatically into zip files
D. It converts all data into JSON format

11 How is partitioned data physically organized in the Hadoop Distributed File System (HDFS)?

Hive partitioning Easy
A. As encrypted binary blocks in a single folder
B. As one massive uncompressed text file
C. As separate sub-directories under the main table directory
D. As multiple tables in a database

12 Which keyword is used to create partitioned tables in Hive?

Hive partitioning Easy
A. SPLIT BY
B. PARTITIONED BY
C. CLUSTERED BY
D. DIVIDED BY

13 What is the main advantage of using partitioning in Hive?

Hive partitioning Easy
A. It allows faster query execution by scanning only relevant partitions (Data Pruning)
B. It automatically backs up deleted data
C. It provides a graphical user interface for queries
D. It removes duplicate rows without using DISTINCT

14 Which HiveQL command is used to display a list of all existing databases?

Hiveql operations Easy
A. DISPLAY DATABASES
B. GET DATABASES
C. LIST DATABASES
D. SHOW DATABASES

15 Which command is used to permanently remove a table and its metadata from Hive?

Hiveql operations Easy
A. TRUNCATE TABLE
B. REMOVE TABLE
C. DELETE TABLE
D. DROP TABLE

16 Which command allows a user to import data from a local file system into a Hive table?

Hiveql operations Easy
A. FETCH DATA
B. INSERT DATA
C. IMPORT DATA
D. LOAD DATA LOCAL INPATH

17 How do you define a new table structure in HiveQL?

Hiveql operations Easy
A. GENERATE TABLE
B. CREATE TABLE
C. MAKE TABLE
D. BUILD TABLE

18 Which relational operator in Hive is used to check if two values are equal?

Hive operators Easy
A. EQUALS
B. ==
C. :=
D. =

19 Which logical operator is used to ensure both conditions in a WHERE clause evaluate to true?

Hive operators Easy
A. OR
B. AND
C. XOR
D. NOT

20 Which Hive operator is specifically used to check if a column value is missing or undefined?

Hive operators Easy
A. IS BLANK
B. IS NULL
C. IS EMPTY
D. == NULL

21 Which of the following is a mandatory prerequisite running service for Apache Hive to function properly, as it handles the underlying distributed storage?

Hive installation Medium
A. Hadoop Distributed File System (HDFS)
B. Apache Spark
C. Apache HBase
D. Apache Kafka

22 When configuring a remote metastore for Apache Hive, which configuration file is primarily modified to set the JDBC connection parameters for the external RDBMS?

Hive installation Medium
A. hive-site.xml
B. mapred-site.xml
C. core-site.xml
D. hive-env.sh

23 A developer wants to store a user's address containing fields like street (string), city (string), and zip code (integer). Which complex Hive data type is most appropriate to group these different types together?

Hive data types Medium
A. UNIONTYPE
B. MAP
C. STRUCT
D. ARRAY

24 How does Hive handle a cast operation when a STRING containing non-numeric alphabetic characters is explicitly cast to an INT?

Hive data types Medium
A. It returns 0.
B. It returns the ASCII value of the first character.
C. It throws a ClassCastException.
D. It returns NULL.

25 Which of the following Hive data types allows a column to store a value that can be exactly one of several explicitly specified data types?

Hive data types Medium
A. ANY
B. UNIONTYPE
C. STRUCT
D. MAP

26 What is the default precision of the TIMESTAMP data type in Apache Hive?

Hive data types Medium
A. Millisecond precision
B. Second precision
C. Nanosecond precision
D. Microsecond precision

27 Which Hive configuration property must be enabled (set to true) to allow the automatic creation of partitions based on the values of the loaded data?

Hive partitioning Medium
A. hive.partition.dynamic.enable
B. hive.exec.partition.dynamic
C. hive.exec.dynamic.partition
D. hive.dynamic.partition.mode

28 If a table is partitioned by country and then by state, how will the nested directory structure be physically represented in HDFS for a record from California, USA?

Hive partitioning Medium
A. /table_path/state=California/country=USA/
B. /table_path/country_USA_state_California/
C. /table_path/USA/California/
D. /table_path/country=USA/state=California/

29 What happens if a user executes an INSERT statement into a table with dynamic partitioning enabled, but the hive.exec.dynamic.partition.mode is set to strict?

Hive partitioning Medium
A. The query requires the user to specify at least one static partition column.
B. Hive will dynamically determine all partitions but limit the creation to 100 folders.
C. Hive overrides the strict mode silently and proceeds in nonstrict mode.
D. The query will fail completely because strict mode disables all dynamic partitions.

30 Which of the following accurately describes a major operational drawback of excessive partitioning (over-partitioning) on a high-cardinality column in Hive?

Hive partitioning Medium
A. It prevents the usage of complex data types like STRUCT or MAP.
B. It completely disables bucketing optimizations on the table.
C. It forces Hive to bypass the execution engine and run natively on the NameNode.
D. It creates a large number of small files, leading to high memory overhead on the Hadoop NameNode.

31 How does Hive determine the correct bucket for a specific row when a table is defined with the CLUSTERED BY (col_name) INTO N BUCKETS clause?

Hive bucketing Medium
A. By using the mathematical formula: hash_function(col_name) % N
B. By utilizing a MapReduce partitioner based on the file size of the incoming data
C. By sorting the dataset and splitting it into perfectly equal parts
D. By distributing rows in a simple round-robin sequence across buckets

32 Why is bucketing preferred over partitioning when distributing data based on a column with highly unique values, such as a user_id?

Hive bucketing Medium
A. Bucketing automatically sorts the data globally across the entire cluster.
B. Bucketing utilizes higher data compression ratios natively than partitioning.
C. Bucketing strictly ensures a fixed number of files, mitigating the small-file problem.
D. Bucketing enables dynamic schema evolution for nested queries.

33 Historically, which configuration property had to be set to true to ensure standard INSERT operations populated the appropriate number of buckets correctly in Hive?

Hive bucketing Medium
A. hive.exec.bucketing.enable
B. hive.optimize.bucketmapjoin
C. hive.enforce.bucketing
D. hive.mapred.bucketing.strict

34 If two tables are physically bucketed and sorted by the same key, and share a proportional number of buckets, which advanced Hive query execution strategy becomes possible?

Hive bucketing Medium
A. Skew Join
B. Sort-Merge Bucket (SMB) Join
C. Vectorized Query Execution
D. Map-Side Aggregation

35 What is the physical outcome of executing DROP TABLE my_data; if my_data was defined as an EXTERNAL table?

Hiveql operations Medium
A. Both the metadata in the metastore and the data files in HDFS are deleted.
B. The command throws an error because external tables can only be unlinked, not dropped.
C. The metadata in the metastore is removed, but the underlying data remains intact in HDFS.
D. The underlying data in HDFS is deleted, but the table schema remains in the metastore.

36 Which specific HiveQL clause allows a user to write queries that read only a random or bucket-based subset of a table for exploratory data analysis?

Hiveql operations Medium
A. TABLESAMPLE
B. LIMIT
C. FETCH FIRST
D. SAMPLE BY

37 How does an INSERT OVERWRITE operation differ mechanically from a standard INSERT INTO operation in HiveQL?

Hiveql operations Medium
A. INSERT OVERWRITE clears the existing data in the target table or partition before writing the new data.
B. INSERT OVERWRITE automatically ignores rows that contain identical primary keys.
C. INSERT OVERWRITE requires an explicit transaction lock and updates row values based on a WHERE clause.
D. INSERT OVERWRITE drops the table completely and recreates a new schema based on the input stream.

38 In HiveQL, what is the core purpose of using a LATERAL VIEW in conjunction with a UDTF (User-Defined Table-Generating Function) like explode()?

Hiveql operations Medium
A. It creates an immutable temporary view that behaves identically to an external table.
B. It restricts query execution to the map phase by skipping the reduce phase completely.
C. It projects multiple row results into a compressed single ARRAY field for storage optimization.
D. It allows expanding a complex data structure (like an array) into multiple rows and joining them back to the original source row.

39 Which of the following operators is known as the null-safe equality operator in HiveQL, capable of returning true if both operands happen to be NULL?

Hive operators Medium
A. =!=
B. IS NOT DISTINCT FROM
C. <=>
D. ==

40 When evaluating a WHERE clause in HiveQL, what exact functionality does the RLIKE operator provide?

Hive operators Medium
A. It checks if a specified string correctly matches a Java regular expression pattern.
B. It dynamically checks the phonetic similarity of two strings using Soundex.
C. It evaluates case-insensitive exact equality between a string and a column.
D. It securely joins strings by ignoring leading and trailing white spaces.

41 When configuring a Hive installation for a highly concurrent production environment, which metastore configuration approach eliminates the single point of failure and allows multiple HiveServer2 instances to scale efficiently?

Hive installation Hard
A. Remote Metastore Server with an external RDBMS
B. Embedded Metastore with Derby
C. Local Metastore with MySQL
D. In-Memory Metastore using Redis

42 Which set of properties must be correctly configured in hive-site.xml to successfully enable HiveServer2 High Availability (HA) using Apache ZooKeeper?

Hive installation Hard
A. hive.server2.enable.doAs and hive.server2.zookeeper.namespace
B. hive.server2.support.dynamic.service.discovery and hive.zookeeper.quorum
C. hive.execution.engine and hive.zookeeper.client.port
D. hive.metastore.uris and hive.server2.thrift.port

43 Consider a Hive table containing a column events of type ARRAY<STRUCT<id:INT, type:STRING>>. Which of the following syntaxes correctly retrieves the type of the second element in the array?

Hive data types Hard
A. events.type[1]
B. events[1].type
C. events[2].type
D. events{1}.type

44 In Hive, what is the behavior when a calculation involves a DECIMAL(p1, s1) and a DECIMAL(p2, s2) using multiplication, and the resulting precision exceeds the maximum allowed precision of 38?

Hive data types Hard
A. Hive truncates the most significant digits, preserving the exact scale requested.
B. Hive throws an ArithmeticException and fails the query.
C. Hive automatically casts the result to a DOUBLE to preserve the magnitude.
D. Hive truncates the scale to accommodate the integral part, potentially resulting in loss of fractional precision.

45 Which of the following implicitly casts properly without losing precision or failing in a standard Hive aggregate function like SUM()?

Hive data types Hard
A. FLOAT to INT
B. STRING to BIGINT
C. INT to DOUBLE
D. TIMESTAMP to DECIMAL

46 A user executes an INSERT OVERWRITE statement with dynamic partitioning enabled. The configuration hive.exec.dynamic.partition.mode is set to strict. What condition MUST be met for the query to execute successfully?

Hive partitioning Hard
A. The table must be bucketed alongside being partitioned.
B. The maximum number of dynamic partitions (hive.exec.max.dynamic.partitions) must be overridden.
C. All partition columns must be dynamically evaluated.
D. At least one partition column must be statically specified.

47 When working with highly scaled tables stored on an object store (like AWS S3), why might a data engineer prefer using ALTER TABLE ADD PARTITION instead of MSCK REPAIR TABLE to register new data directories?

Hive partitioning Hard
A. MSCK REPAIR TABLE does not support external tables.
B. MSCK REPAIR TABLE locks the entire database, whereas ALTER TABLE only locks the specific partition.
C. MSCK REPAIR TABLE recursively scans the entire table's file system tree, which is an extremely slow metadata operation on object stores.
D. ALTER TABLE ADD PARTITION automatically formats the underlying data into ORC format.

48 Assume a Hive table is partitioned by country (STRING) and year (INT). If data is inserted for country='US' and year=2023, what is the exact default directory structure created in HDFS?

Hive partitioning Hard
A. .../table_name/year=2023/country=US/
B. .../table_name/country='US'/year='2023'/
C. .../table_name/country=US/year=2023/
D. .../table_name/US/2023/

49 How does Hive determine the exact bucket number for a given row based on its bucketing column col and the total number of buckets B?

Hive bucketing Hard
A. length(col) % B
B. murmur_hash(col) / B
C. (hash(col) & 0x7FFFFFFF) % B
D. hash(col) % B

50 To perform a Sort-Merge-Bucket (SMB) Join optimally without shuffling data, what must be true about the two participating tables?

Hive bucketing Hard
A. One table must be small enough to fit entirely in memory to act as a hash table.
B. They must be bucketed by the join key, sorted by the join key, and have the exact same number of buckets (or multiples of each other).
C. They must be partitioned by the same column and use the same execution engine.
D. They must be stored in TEXTFILE format and have hive.enforce.bucketing set to false.

51 Consider a table sales bucketed into 32 buckets by customer_id. A user executes: SELECT * FROM sales TABLESAMPLE(BUCKET 3 OUT OF 8 ON customer_id);. How many buckets are actually scanned by this query?

Hive bucketing Hard
A. 8 buckets
B. 1 bucket
C. 4 buckets
D. 3 buckets

52 When using LATERAL VIEW explode(array_col) tbl AS val, what happens to rows in the base table where array_col is empty or NULL?

Hiveql operations Hard
A. The query throws a runtime error.
B. They are completely dropped from the result set.
C. They produce a single output row with a NULL value for val.
D. They produce an output row with an empty string for val.

53 Which of the following table property combinations is strictly REQUIRED to create an ACID transactional table in Hive 3.x?

Hiveql operations Hard
A. partitioned by and tblproperties('transactional'='strict')
B. stored as orc and tblproperties('transactional'='true')
C. clustered by and stored as textfile
D. stored as parquet and tblproperties('transactional'='true')

54 What is the primary operational difference between ORDER BY and SORT BY in HiveQL?

Hiveql operations Hard
A. SORT BY performs sorting in memory before shuffling, whereas ORDER BY sorts on disk.
B. SORT BY guarantees total global ordering across all reducers, while ORDER BY only orders within map tasks.
C. ORDER BY requires a partition key, while SORT BY does not.
D. ORDER BY guarantees total global ordering using a single reducer, while SORT BY guarantees partial ordering within each reducer.

55 In a HiveQL query, what is the effect of using CLUSTER BY col_name?

Hiveql operations Hard
A. It is equivalent to DISTRIBUTE BY col_name SORT BY col_name DESC.
B. It is a shortcut that is functionally equivalent to DISTRIBUTE BY col_name SORT BY col_name ASC.
C. It distributes the data across reducers based on col_name's hash, but does not sort the data within the reducers.
D. It optimizes the storage format of the table to ORC using col_name as the primary index.

56 Which Hive operator should be used to safely compare two columns for equality where both columns might contain NULL values, ensuring it returns TRUE if both are NULL?

Hive operators Hard
A. =!=
B. ==
C. IS NOT DISTINCT FROM
D. <=>

57 Evaluate the following Hive logical operation: SELECT NULL AND FALSE;. What is the result returned by Hive?

Hive operators Hard
A. A syntax error is thrown.
B. TRUE
C. NULL
D. FALSE

58 If a user applies the bitwise NOT operator ~ on a TINYINT column containing the value 1, what is the resulting integer value?

Hive operators Hard
A. -2
B. 254
C. 0
D. -1

59 Which of the following correctly describes the behavior of the RLIKE operator in Hive?

Hive operators Hard
A. It evaluates regular expressions but only returns TRUE if the entire string perfectly matches the pattern from start to finish.
B. It evaluates regular expressions and returns TRUE if any substring of the string matches the pattern.
C. It performs standard SQL wildcard matching using % and _.
D. It performs a substring search optimized for large text fields without regex overhead.

60 In a complex HiveQL query utilizing window functions, how does the frame specification ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING handle the first row of a partition?

Hiveql operations Hard
A. It skips the first row entirely because the frame cannot be completely satisfied.
B. It fills the 1 PRECEDING value with a NULL and processes it as part of the aggregation.
C. It ignores the preceding bound, aggregating only the CURRENT ROW and the 1 FOLLOWING row.
D. It throws an out-of-bounds error and stops query execution.