1

What is Apache Hive, and what are its primary use cases in the Big Data ecosystem?

2

Explain the architecture of Apache Hive in detail. Discuss its major components.

3

What is the Hive Metastore? Discuss the different modes of deploying the Metastore.

4

Distinguish between Managed (Internal) Tables and External Tables in Apache Hive.

5

What is Partitioning in Hive? Explain the difference between Static and Dynamic Partitioning.

6

Describe the concept of Bucketing in Apache Hive. How does it work mathematically?

7

Compare and contrast Partitioning and Bucketing in Apache Hive.

Feature	Partitioning	Bucketing
Concept	Divides data based on distinct values of a column (e.g., Country, Date).	Divides data based on a hash function of a column into a fixed number of buckets.
HDFS Structure	Creates separate sub-directories for each partition.	Creates separate files within the table/partition directory.
Use Case	Ideal for low-cardinality columns (few distinct values, e.g., Country).	Ideal for high-cardinality columns (many distinct values, e.g., User ID).
Problem Avoidance	Avoids full table scans through partition pruning.	Avoids the "too many small files" problem that partitioning on a high-cardinality column would cause.
Query Benefit	Faster filtering in WHERE clauses on the partition column.	Faster bucketed map-side joins and efficient sampling (TABLESAMPLE).
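
The layout difference in the table above can be sketched in a few lines of Python. This is only an illustrative model, not Hive's actual code: the helper names (`partition_dir`, `bucket_index`, `bucket_file`), the warehouse path, and the sample rows are all hypothetical. It assumes an integer bucketing column whose hash is the value itself, which matches Hive's classic bucketing for integers; newer Hive versions may use a different hash function, so real bucket numbers can differ.

```python
def partition_dir(table_dir, column, value):
    # Partitioning: each distinct column value gets its own sub-directory.
    return f"{table_dir}/{column}={value}"


def bucket_index(user_id, num_buckets):
    # Bucketing: hash the column and take it modulo the fixed bucket count.
    # Assumption: for an integer column the hash is the value itself;
    # the mask keeps the result non-negative.
    return (user_id & 0x7FFFFFFF) % num_buckets


def bucket_file(index):
    # Illustrative file name in the style Hive uses for bucket files.
    return f"{index:06d}_0"


NUM_BUCKETS = 4
rows = [("US", 101), ("US", 102), ("IN", 205), ("IN", 101)]  # (country, user_id)

# Map every row to the file it would land in:
# partition directory first, then bucket file inside it.
layout = {}
for country, user_id in rows:
    path = (
        partition_dir("/warehouse/users", "country", country)
        + "/"
        + bucket_file(bucket_index(user_id, NUM_BUCKETS))
    )
    layout.setdefault(path, []).append(user_id)

for path, ids in sorted(layout.items()):
    print(path, ids)
```

In this sketch the number of directories grows with the number of distinct countries, while the number of files per directory is capped at `NUM_BUCKETS` regardless of how many distinct user IDs exist. That is why the high-cardinality column is bucketed rather than partitioned.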

8

What is SerDe in Apache Hive? Explain its role during read and write operations.

9

Explain the different types of User Defined Functions (UDFs) available in Hive.

10

Discuss Map Join and Reduce-Side Join in Apache Hive. When should each be used?

11

Highlight the differences between Apache Hive and traditional RDBMS.

While Hive provides a SQL-like interface, it is fundamentally different from a traditional Relational Database Management System (RDBMS):

Feature	Traditional RDBMS	Apache Hive
Primary Use Case	OLTP (Online Transaction Processing)	OLAP (Online Analytical Processing)
Latency	Low latency, fast response times (milliseconds)	High latency, batch processing (minutes to hours)
Data Size	Gigabytes to Terabytes	Petabytes and beyond
Schema Enforcement	Schema on Write (validates data during insert)	Schema on Read (applies the schema only when data is queried; values that do not fit the declared type are read as NULL)
Updates/Deletes	Fully supported (Row-level)	Historically not supported; Hive 0.14 and later support row-level UPDATE/DELETE on transactional (ACID) tables, but they are not optimized for frequent row-level changes.
Scaling	Vertical scaling (Scale up)	Horizontal scaling (Scale out on commodity hardware)
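
The "Schema Enforcement" row is easier to see with a small sketch. The Python below is a toy model under stated assumptions, not how either system is implemented: `SCHEMA`, `write_with_schema`, and `read_with_schema` are hypothetical names, and the comma-separated sample lines are made up. It contrasts rejecting a bad row at insert time with accepting raw data and turning unconvertible values into `None` (standing in for NULL) at query time.

```python
SCHEMA = [("id", int), ("name", str), ("amount", float)]


def write_with_schema(storage, raw_line):
    """Schema on write: convert and validate before storing; bad rows are rejected."""
    fields = raw_line.split(",")
    if len(fields) != len(SCHEMA):
        raise ValueError("wrong number of columns")
    # Any failed conversion raises ValueError, so nothing invalid is stored.
    row = tuple(cast(value) for (_, cast), value in zip(SCHEMA, fields))
    storage.append(row)


def read_with_schema(raw_lines):
    """Schema on read: raw lines are stored as-is; the schema is applied per query."""
    for line in raw_lines:
        fields = line.split(",")
        row = []
        for i, (_, cast) in enumerate(SCHEMA):
            try:
                row.append(cast(fields[i]))
            except (IndexError, ValueError):
                # Missing or unconvertible value becomes None (NULL in the toy model).
                row.append(None)
        yield tuple(row)


raw_lines = ["1,alice,10.5", "two,bob,oops"]

# Schema on read: both lines are queryable; bad values surface as None.
print(list(read_with_schema(raw_lines)))

# Schema on write: the second line never makes it into storage.
table = []
for line in raw_lines:
    try:
        write_with_schema(table, line)
    except ValueError as err:
        print("rejected:", line, "-", err)
print(table)
```

The trade-off this illustrates is that schema on read makes loading cheap, since files are simply placed in storage, but moves data-quality problems to query time, where they show up as NULLs rather than as load errors.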

12

Compare Apache Pig and Apache Hive. In what scenarios would you choose one over the other?

13

Describe the complex data types available in Apache Hive with examples.

14

Explain the step-by-step process of Query Execution in Apache Hive.

15

What are the common File Formats supported by Apache Hive? Discuss ORC and Parquet in detail.

16

Discuss various optimization techniques available in Apache Hive to improve query performance.

17

What are Views in Hive? How do they differ from tables?

18

Provide examples of DDL and DML commands in HiveQL.

19

What is HCatalog in the Hadoop ecosystem, and how does it relate to Apache Hive?

20

Explain the significance of the Execution Engine in Apache Hive. Compare MapReduce and Tez in this context.

Unit4 - Subjective Questions