1 $Which of the following is a mandatory prerequisite software framework for running Apache Hive?$

Hive installation Easy

A.

Apache Kafka

B.

Apache Hadoop

C.

Apache Flink

D.

Apache Spark

2 $What is the default embedded database used by Hive for its Metastore in local mode?$

Hive installation Easy

A.

Oracle

B.

MySQL

C.

PostgreSQL

D.

Apache Derby

3 $Which environment variable is typically set to point to the base directory of the Hive installation?$

Hive installation Easy

A.

HIVE_HOME

B.

HADOOP_HOME

C.

HIVE_PATH

D.

HIVE_DIR

4 $Which of the following Hive data types is used to store text or character sequences?$

Hive data types Easy

A.

STRING

B.

INT

C.

BOOLEAN

D.

FLOAT

5 $Which complex data type in Hive is used to store an unordered collection of key-value pairs?$

Hive data types Easy

A.

ARRAY

B.

UNION

C.

MAP

D.

STRUCT

6 $Which complex data type represents an ordered collection of elements of the exact same data type?$

Hive data types Easy

A.

STRING

B.

MAP

C.

ARRAY

D.

STRUCT

7 $Which primitive data type in Hive is used to represent true or false values?$

Hive data types Easy

A.

TINYINT

B.

BOOLEAN

C.

INT

D.

DOUBLE

8 $Which clause is used in a HiveQL CREATE TABLE statement to implement bucketing?$

Hive bucketing Easy

A.

CLUSTERED BY

B.

PARTITIONED BY

C.

ORDER BY

D.

GROUP BY

9 $How does Hive determine which bucket a particular record should be stored in?$

Hive bucketing Easy

A.

Using a hash function on the bucketing column

B.

Based on the alphabetical order of the entire row

C.

Based on the file size of the input data

D.

Using a random number generator

10 $What is one of the primary performance benefits of using bucketing in Hive?$

Hive bucketing Easy

A.

It improves map-side joins and sampling efficiency

B.

It encrypts the data automatically

C.

It converts all data into JSON format

D.

It compresses data automatically into zip files

11 $How is partitioned data physically organized in the Hadoop Distributed File System (HDFS)?$

Hive partitioning Easy

A.

As multiple tables in a database

B.

As one massive uncompressed text file

C.

As separate sub-directories under the main table directory

D.

As encrypted binary blocks in a single folder

12 $Which keyword is used to create partitioned tables in Hive?$

Hive partitioning Easy

A.

SPLIT BY

B.

CLUSTERED BY

C.

DIVIDED BY

D.

PARTITIONED BY

13 $What is the main advantage of using partitioning in Hive?$

Hive partitioning Easy

A.

It automatically backs up deleted data

B.

It provides a graphical user interface for queries

C.

It allows faster query execution by scanning only the relevant partitions (partition pruning)

D.

It removes duplicate rows without using DISTINCT

14 $Which HiveQL command is used to display a list of all existing databases?$

HiveQL operations Easy

A.

LIST DATABASES

B.

SHOW DATABASES

C.

GET DATABASES

D.

DISPLAY DATABASES

15 $Which command is used to permanently remove a table and its metadata from Hive?$

HiveQL operations Easy

A.

DROP TABLE

B.

DELETE TABLE

C.

REMOVE TABLE

D.

TRUNCATE TABLE

16 $Which command allows a user to import data from a local file system into a Hive table?$

HiveQL operations Easy

A.

FETCH DATA

B.

LOAD DATA LOCAL INPATH

C.

IMPORT DATA

D.

INSERT DATA

17 $How do you define a new table structure in HiveQL?$

HiveQL operations Easy

A.

BUILD TABLE

B.

CREATE TABLE

C.

GENERATE TABLE

D.

MAKE TABLE

18 $Which relational operator in Hive is used to check if two values are equal?$

Hive operators Easy

A.

EQUALS

B.

=

C.

==

D.

:=

19 $Which logical operator is used to ensure both conditions in a WHERE clause evaluate to true?$

Hive operators Easy

A.

OR

B.

AND

C.

NOT

D.

XOR

20 $Which Hive operator is specifically used to check if a column value is missing or undefined?$

Hive operators Easy

A.

IS NULL

B.

IS BLANK

C.

IS EMPTY

D.

== NULL

21 $Which of the following is a mandatory prerequisite running service for Apache Hive to function properly, as it handles the underlying distributed storage?$

Hive installation Medium

A.

Apache HBase

B.

Apache Spark

C.

Apache Kafka

D.

Hadoop Distributed File System (HDFS)

22 $When configuring a remote metastore for Apache Hive, which configuration file is primarily modified to set the JDBC connection parameters for the external RDBMS?$

Hive installation Medium

A.

hive-env.sh

B.

core-site.xml

C.

mapred-site.xml

D.

hive-site.xml

23 $A developer wants to store a user's address containing fields like street (string), city (string), and zip code (integer). Which complex Hive data type is most appropriate to group these different types together?$

Hive data types Medium

A.

MAP

B.

STRUCT

C.

UNIONTYPE

D.

ARRAY

24 $How does Hive handle a cast operation when a STRING containing non-numeric alphabetic characters is explicitly cast to an INT?$

Hive data types Medium

A.

It returns 0.

B.

It returns NULL.

C.

It throws a ClassCastException.

D.

It returns the ASCII value of the first character.

25 $Which of the following Hive data types allows a column to store a value that can be exactly one of several explicitly specified data types?$

Hive data types Medium

A.

STRUCT

B.

MAP

C.

ANY

D.

UNIONTYPE

26 $What is the default precision of the TIMESTAMP data type in Apache Hive?$

Hive data types Medium

A.

Microsecond precision

B.

Nanosecond precision

C.

Second precision

D.

Millisecond precision

27 $Which Hive configuration property must be enabled (set to true) to allow the automatic creation of partitions based on the values of the loaded data?$

Hive partitioning Medium

A.

hive.dynamic.partition.mode

B.

hive.exec.dynamic.partition

C.

hive.partition.dynamic.enable

D.

hive.exec.partition.dynamic

28 $If a table is partitioned by country and then by state, how will the nested directory structure be physically represented in HDFS for a record from California, USA?$

Hive partitioning Medium

A.

/table_path/state=California/country=USA/

B.

/table_path/country_USA_state_California/

C.

/table_path/country=USA/state=California/

D.

/table_path/USA/California/
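
As an illustration of the layout this question is about, the following is a minimal Python sketch that builds a nested partition directory from ordered column/value pairs. The helper name `partition_path` is hypothetical and is not part of Hive; it only mimics the `column=value` naming convention Hive uses for partition directories.

```python
def partition_path(table_path, partition_values):
    """Build a Hive-style partition directory path.

    partition_values is an ordered list of (column, value) pairs, in the
    same order as the PARTITIONED BY clause (hypothetical helper).
    """
    parts = [f"{col}={val}" for col, val in partition_values]
    return "/".join([table_path.rstrip("/")] + parts) + "/"


if __name__ == "__main__":
    print(partition_path("/table_path", [("country", "USA"), ("state", "California")]))
```

The order of the pairs matters: each partition column adds one directory level, in declaration order.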

29 $What happens if a user executes an INSERT statement into a table with dynamic partitioning enabled, but hive.exec.dynamic.partition.mode is set to strict?$

Hive partitioning Medium

A.

The query will fail completely because strict mode disables all dynamic partitions.

B.

The query requires the user to specify at least one static partition column.

C.

Hive will dynamically determine all partitions but limit the creation to 100 folders.

D.

Hive overrides the strict mode silently and proceeds in nonstrict mode.

30 $Which of the following accurately describes a major operational drawback of excessive partitioning (over-partitioning) on a high-cardinality column in Hive?$

Hive partitioning Medium

A.

It prevents the usage of complex data types like STRUCT or MAP.

B.

It creates a large number of small files, leading to high memory overhead on the Hadoop NameNode.

C.

It completely disables bucketing optimizations on the table.

D.

It forces Hive to bypass the execution engine and run natively on the NameNode.

31 $How does Hive determine the correct bucket for a specific row when a table is defined with the CLUSTERED BY (col_name) INTO N BUCKETS clause?$

Hive bucketing Medium

A.

By utilizing a MapReduce partitioner based on the file size of the incoming data

B.

By sorting the dataset and splitting it into perfectly equal parts

C.

By using the mathematical formula: hash_function(col_name) % N

D.

By distributing rows in a simple round-robin sequence across buckets

32 $Why is bucketing preferred over partitioning when distributing data based on a column with highly unique values, such as a user_id?$

Hive bucketing Medium

A.

Bucketing enables dynamic schema evolution for nested queries.

B.

Bucketing automatically sorts the data globally across the entire cluster.

C.

Bucketing utilizes higher data compression ratios natively than partitioning.

D.

Bucketing strictly ensures a fixed number of files, mitigating the small-file problem.

33 $Historically, which configuration property had to be set to true to ensure standard INSERT operations populated the appropriate number of buckets correctly in Hive?$

Hive bucketing Medium

A.

hive.exec.bucketing.enable

B.

hive.mapred.bucketing.strict

C.

hive.optimize.bucketmapjoin

D.

hive.enforce.bucketing

34 $If two tables are physically bucketed and sorted by the same key, and share a proportional number of buckets, which advanced Hive query execution strategy becomes possible?$

Hive bucketing Medium

A.

Skew Join

B.

Map-Side Aggregation

C.

Vectorized Query Execution

D.

Sort-Merge Bucket (SMB) Join

35 $What is the physical outcome of executing DROP TABLE my_data; if my_data was defined as an EXTERNAL table?$

HiveQL operations Medium

A.

Both the metadata in the metastore and the data files in HDFS are deleted.

B.

The metadata in the metastore is removed, but the underlying data remains intact in HDFS.

C.

The command throws an error because external tables can only be unlinked, not dropped.

D.

The underlying data in HDFS is deleted, but the table schema remains in the metastore.

36 $Which specific HiveQL clause allows a user to write queries that read only a random or bucket-based subset of a table for exploratory data analysis?$

HiveQL operations Medium

A.

LIMIT

B.

FETCH FIRST

C.

TABLESAMPLE

D.

SAMPLE BY

37 $How does an INSERT OVERWRITE operation differ mechanically from a standard INSERT INTO operation in HiveQL?$

HiveQL operations Medium

A.

INSERT OVERWRITE automatically ignores rows that contain identical primary keys.

B.

INSERT OVERWRITE requires an explicit transaction lock and updates row values based on a WHERE clause.

C.

INSERT OVERWRITE drops the table completely and recreates a new schema based on the input stream.

D.

INSERT OVERWRITE clears the existing data in the target table or partition before writing the new data.

38 $In HiveQL, what is the core purpose of using a LATERAL VIEW in conjunction with a UDTF (User-Defined Table-Generating Function) like explode()?$

HiveQL operations Medium

A.

It creates an immutable temporary view that behaves identically to an external table.

B.

It projects multiple row results into a compressed single ARRAY field for storage optimization.

C.

It allows expanding a complex data structure (like an array) into multiple rows and joining them back to the original source row.

D.

It restricts query execution to the map phase by skipping the reduce phase completely.

39 $Which of the following operators is known as the null-safe equality operator in HiveQL, capable of returning true if both operands happen to be NULL?$

Hive operators Medium

A.

<=>

B.

=!=

C.

IS NOT DISTINCT FROM

D.

==

40 $When evaluating a WHERE clause in HiveQL, what exact functionality does the RLIKE operator provide?$

Hive operators Medium

A.

It checks if a specified string correctly matches a Java regular expression pattern.

B.

It securely joins strings by ignoring leading and trailing white spaces.

C.

It evaluates case-insensitive exact equality between a string and a column.

D.

It dynamically checks the phonetic similarity of two strings using Soundex.

41 $When configuring a Hive installation for a highly concurrent production environment, which metastore configuration approach eliminates the single point of failure and allows multiple HiveServer2 instances to scale efficiently?$

Hive installation Hard

A.

In-Memory Metastore using Redis

B.

Local Metastore with MySQL

C.

Remote Metastore Server with an external RDBMS

D.

Embedded Metastore with Derby

42 $Which set of properties must be correctly configured in hive-site.xml to successfully enable HiveServer2 High Availability (HA) using Apache ZooKeeper?$

Hive installation Hard

A.

hive.server2.support.dynamic.service.discovery and hive.zookeeper.quorum

B.

hive.server2.enable.doAs and hive.server2.zookeeper.namespace

C.

hive.execution.engine and hive.zookeeper.client.port

D.

hive.metastore.uris and hive.server2.thrift.port

43 $Consider a Hive table containing a column events of type ARRAY<STRUCT<id:INT, type:STRING>>. Which of the following syntaxes correctly retrieves the type of the second element in the array?$

Hive data types Hard

A.

events{1}.type

B.

events.type[1]

C.

events[2].type

D.

events[1].type

44 $In Hive, what is the behavior when a calculation involves a DECIMAL(p1, s1) and a DECIMAL(p2, s2) using multiplication, and the resulting precision exceeds the maximum allowed precision of 38?$

Hive data types Hard

A.

Hive truncates the most significant digits, preserving the exact scale requested.

B.

Hive automatically casts the result to a DOUBLE to preserve the magnitude.

C.

Hive throws an ArithmeticException and fails the query.

D.

Hive truncates the scale to accommodate the integral part, potentially resulting in loss of fractional precision.

45 $Which of the following implicit casts succeeds without losing precision or failing in a standard Hive aggregate function like SUM()?$

Hive data types Hard

A.

TIMESTAMP to DECIMAL

B.

FLOAT to INT

C.

INT to DOUBLE

D.

STRING to BIGINT

46 $A user executes an INSERT OVERWRITE statement with dynamic partitioning enabled. The configuration hive.exec.dynamic.partition.mode is set to strict. What condition MUST be met for the query to execute successfully?$

Hive partitioning Hard

A.

The maximum number of dynamic partitions (hive.exec.max.dynamic.partitions) must be overridden.

B.

At least one partition column must be statically specified.

C.

The table must be bucketed alongside being partitioned.

D.

All partition columns must be dynamically evaluated.

47 $When working with highly scaled tables stored on an object store (like AWS S3), why might a data engineer prefer using ALTER TABLE ADD PARTITION instead of MSCK REPAIR TABLE to register new data directories?$

Hive partitioning Hard

A.

ALTER TABLE ADD PARTITION automatically formats the underlying data into ORC format.

B.

MSCK REPAIR TABLE does not support external tables.

C.

MSCK REPAIR TABLE locks the entire database, whereas ALTER TABLE only locks the specific partition.

D.

MSCK REPAIR TABLE recursively scans the entire table's file system tree, which is an extremely slow metadata operation on object stores.

48 $Assume a Hive table is partitioned by country (STRING) and year (INT). If data is inserted for country='US' and year=2023, what is the exact default directory structure created in HDFS?$

Hive partitioning Hard

A.

.../table_name/US/2023/

B.

.../table_name/year=2023/country=US/

C.

.../table_name/country='US'/year='2023'/

D.

.../table_name/country=US/year=2023/

49 $How does Hive determine the exact bucket number for a given row based on its bucketing column col and the total number of buckets B?$

Hive bucketing Hard

A.

(hash(col) & 0x7FFFFFFF) % B

B.

hash(col) % B

C.

length(col) % B

D.

murmur_hash(col) / B
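
To make the arithmetic behind bucket assignment concrete, here is a small Python sketch under stated assumptions. `toy_hash` is a hypothetical stand-in that wraps to a signed 32-bit integer the way a Java `int` hash would; it is not Hive's actual hash function, which depends on the column type and bucketing version. `bucket_for` shows why a sign-bit mask is useful before taking the modulus: a signed hash can be negative, and the mask keeps the bucket index in the range 0 to B-1.

```python
def toy_hash(value: str) -> int:
    """Hypothetical 31-multiplier hash wrapped to a signed 32-bit integer.

    This only imitates the shape of a Java-style int hash; it is an
    assumption for illustration, not Hive's implementation.
    """
    h = 0
    for byte in value.encode("utf-8"):
        h = (31 * h + byte) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as signed.
    return h - 0x100000000 if h >= 0x80000000 else h


def bucket_for(hash_value: int, num_buckets: int) -> int:
    """Clear the sign bit, then take the remainder by the bucket count."""
    return (hash_value & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    for user_id in ["user_1", "user_2", "user_42"]:
        print(user_id, bucket_for(toy_hash(user_id), 32))
```

Because the bucket index depends only on the hash and the fixed bucket count, rows with the same key always land in the same bucket file.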

50 $To perform a Sort-Merge-Bucket (SMB) Join optimally without shuffling data, what must be true about the two participating tables?$

Hive bucketing Hard

A.

One table must be small enough to fit entirely in memory to act as a hash table.

B.

They must be bucketed by the join key, sorted by the join key, and have the exact same number of buckets (or multiples of each other).

C.

They must be stored in TEXTFILE format and have hive.enforce.bucketing set to false.

D.

They must be partitioned by the same column and use the same execution engine.

51 $Consider a table sales bucketed into 32 buckets by customer_id. A user executes: SELECT * FROM sales TABLESAMPLE(BUCKET 3 OUT OF 8 ON customer_id); How many buckets are actually scanned by this query?$

Hive bucketing Hard

A.

4 buckets

B.

8 buckets

C.

3 buckets

D.

1 bucket
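
The bucket-sampling arithmetic can be sketched in Python as follows. This is an illustration only: `sampled_buckets` is a hypothetical helper, buckets are numbered from 1 as in the TABLESAMPLE clause, and the sketch assumes the sampling clause uses the table's bucketing column and that y divides the table's bucket count evenly.

```python
def sampled_buckets(total_buckets: int, x: int, y: int) -> list:
    """Return the 1-based bucket numbers read by TABLESAMPLE(BUCKET x OUT OF y).

    Assumes y evenly divides total_buckets (hypothetical helper).
    """
    if not 1 <= x <= y:
        raise ValueError("x must be between 1 and y")
    if total_buckets % y != 0:
        raise ValueError("this sketch only covers y dividing total_buckets")
    # Start at bucket x and step by y until the last bucket.
    return list(range(x, total_buckets + 1, y))


if __name__ == "__main__":
    print(sampled_buckets(32, 3, 8))
```

With 32 buckets and y = 8, every eighth bucket starting at bucket 3 is selected, so 32 / 8 buckets are read.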

52 $When using LATERAL VIEW explode(array_col) tbl AS val, what happens to rows in the base table where array_col is empty or NULL?$

HiveQL operations Hard

A.

They produce a single output row with a NULL value for val.

B.

The query throws a runtime error.

C.

They are completely dropped from the result set.

D.

They produce an output row with an empty string for val.
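
The row-level behaviour of explode under a lateral view can be modelled with a short Python sketch. The function name `lateral_view_explode` and the tuple-based row layout are assumptions made for illustration; the `outer` flag models HiveQL's LATERAL VIEW OUTER variant.

```python
def lateral_view_explode(rows, outer=False):
    """Model LATERAL VIEW explode(array_col) over (key, array) rows.

    rows: iterable of (key, list_or_None) pairs (hypothetical layout).
    outer=True models LATERAL VIEW OUTER, which keeps source rows whose
    array is empty or NULL by emitting a single row with a NULL value.
    """
    out = []
    for key, arr in rows:
        if arr:
            # One output row per array element, joined to the source row.
            out.extend((key, val) for val in arr)
        elif outer:
            out.append((key, None))
    return out


if __name__ == "__main__":
    data = [("a", [1, 2]), ("b", []), ("c", None)]
    print(lateral_view_explode(data))
    print(lateral_view_explode(data, outer=True))
```

Comparing the two calls shows which source rows survive in each variant.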

53 $Which of the following table property combinations is strictly REQUIRED to create an ACID transactional table in Hive 3.x?$

HiveQL operations Hard

A.

stored as parquet and tblproperties('transactional'='true')

B.

clustered by and stored as textfile

C.

partitioned by and tblproperties('transactional'='strict')

D.

stored as orc and tblproperties('transactional'='true')

54 $What is the primary operational difference between ORDER BY and SORT BY in HiveQL?$

HiveQL operations Hard

A.

ORDER BY requires a partition key, while SORT BY does not.

B.

ORDER BY guarantees total global ordering using a single reducer, while SORT BY guarantees partial ordering within each reducer.

C.

SORT BY performs sorting in memory before shuffling, whereas ORDER BY sorts on disk.

D.

SORT BY guarantees total global ordering across all reducers, while ORDER BY only orders within map tasks.
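
The difference between per-reducer and global ordering can be illustrated with a Python sketch. Both function names are hypothetical, and the round-robin split across reducers is an assumption chosen for simplicity; Hive's actual distribution of rows to reducers is not round-robin.

```python
def sort_by(rows, num_reducers):
    """Model SORT BY: each reducer sorts only the rows it received.

    Rows are dealt round-robin purely for illustration (assumption).
    """
    reducers = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        reducers[i % num_reducers].append(row)
    return [sorted(r) for r in reducers]


def order_by(rows):
    """Model ORDER BY: all rows pass through one sort, giving a total order."""
    return sorted(rows)


if __name__ == "__main__":
    values = [5, 1, 4, 2, 3]
    print(sort_by(values, 2))   # each inner list is sorted on its own
    print(order_by(values))     # one fully sorted list
```

Concatenating the per-reducer outputs does not, in general, yield a globally sorted result.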

55 $In a HiveQL query, what is the effect of using CLUSTER BY col_name?$

HiveQL operations Hard

A.

It distributes the data across reducers based on col_name's hash, but does not sort the data within the reducers.

B.

It optimizes the storage format of the table to ORC using col_name as the primary index.

C.

It is equivalent to DISTRIBUTE BY col_name SORT BY col_name DESC.

D.

It is a shortcut that is functionally equivalent to DISTRIBUTE BY col_name SORT BY col_name ASC.

56 $Which Hive operator should be used to safely compare two columns for equality where both columns might contain NULL values, ensuring it returns TRUE if both are NULL?$

Hive operators Hard

A.

==

B.

<=>

C.

IS NOT DISTINCT FROM

D.

=!=

57 $Evaluate the following Hive logical operation: SELECT NULL AND FALSE; What is the result returned by Hive?$

Hive operators Hard

A.

FALSE

B.

TRUE

C.

A syntax error is thrown.

D.

NULL
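
SQL-style three-valued logic, which HiveQL follows for AND, can be sketched in Python with `None` standing in for NULL. The function name `sql_and` is hypothetical.

```python
def sql_and(a, b):
    """Three-valued AND: None models SQL NULL (unknown).

    A single FALSE operand decides the result regardless of the other
    operand; otherwise any NULL makes the result NULL.
    """
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


if __name__ == "__main__":
    for left in (True, False, None):
        for right in (True, False, None):
            print(left, "AND", right, "=>", sql_and(left, right))
```

The printed truth table covers all nine operand combinations.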

58 $If a user applies the bitwise NOT operator ~ on a TINYINT column containing the value 1, what is the resulting integer value?$

Hive operators Hard

A.

254

B.

-1

C.

-2

D.

0
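
Bitwise NOT on an 8-bit signed integer can be worked through in Python as below. The helper `tinyint_not` is hypothetical; it assumes TINYINT behaves as a two's-complement signed byte with range -128 to 127.

```python
def tinyint_not(value: int) -> int:
    """Flip all 8 bits of a signed byte and reinterpret as signed.

    Assumes two's-complement representation (hypothetical helper).
    """
    if not -128 <= value <= 127:
        raise ValueError("value out of TINYINT range")
    bits = (~value) & 0xFF          # keep only the low 8 bits
    return bits - 256 if bits >= 128 else bits


if __name__ == "__main__":
    for v in (0, 1, 127, -128):
        print(v, "->", tinyint_not(v))
```

In two's complement the result always equals -(value) - 1, which is a quick way to check any row of the output.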

59 $Which of the following correctly describes the behavior of the RLIKE operator in Hive?$

Hive operators Hard

A.

It evaluates regular expressions but only returns TRUE if the entire string perfectly matches the pattern from start to finish.

B.

It evaluates regular expressions and returns TRUE if any substring of the string matches the pattern.

C.

It performs standard SQL wildcard matching using % and _.

D.

It performs a substring search optimized for large text fields without regex overhead.

60 $In a complex HiveQL query utilizing window functions, how does the frame specification ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING handle the first row of a partition?$

HiveQL operations Hard

A.

It throws an out-of-bounds error and stops query execution.

B.

It skips the first row entirely because the frame cannot be completely satisfied.

C.

It fills the 1 PRECEDING value with a NULL and processes it as part of the aggregation.

D.

It ignores the preceding bound, aggregating only the CURRENT ROW and the 1 FOLLOWING row.
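
How a ROWS frame behaves at the edges of a partition can be sketched in Python. The function `moving_sum` is a hypothetical model of SUM(...) OVER (ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) on a single, already ordered partition; it is not Hive code.

```python
def moving_sum(values, preceding=1, following=1):
    """Sum over a sliding row frame, clipped at the partition boundaries.

    Rows that would fall outside the partition are simply absent from
    the frame; nothing is padded in for them.
    """
    result = []
    for i in range(len(values)):
        lo = max(0, i - preceding)                    # clip at first row
        hi = min(len(values), i + following + 1)      # clip at last row
        result.append(sum(values[lo:hi]))
    return result


if __name__ == "__main__":
    print(moving_sum([10, 20, 30, 40]))
```

For the first row the frame holds only the current row and the next one, and for the last row only the previous row and the current one.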

Unit 4 - Practice Quiz