Unit 4 - Subjective Questions

INT312 • Practice Questions with Detailed Answers

1

What is Apache Hive, and what are its primary use cases in the Big Data ecosystem?

2

Explain the architecture of Apache Hive in detail. Discuss its major components.

3

What is the Hive Metastore? Discuss the different modes of deploying the Metastore.

4

Distinguish between Managed (Internal) Tables and External Tables in Apache Hive.

5

What is Partitioning in Hive? Explain the difference between Static and Dynamic Partitioning.

6

Describe the concept of Bucketing in Apache Hive. How does it work mathematically?

7

Compare and contrast Partitioning and Bucketing in Apache Hive.

8

What is a SerDe in Apache Hive? Explain its role during read and write operations.

9

Explain the different types of User Defined Functions (UDFs) available in Hive.

10

Discuss Map Join and Reduce-Side Join in Apache Hive. When should each be used?

11

Highlight the differences between Apache Hive and a traditional RDBMS.

12

Compare Apache Pig and Apache Hive. In what scenarios would you choose one over the other?

13

Describe the complex data types available in Apache Hive with examples.

14

Explain the step-by-step process of Query Execution in Apache Hive.

15

What are the common File Formats supported by Apache Hive? Discuss ORC and Parquet in detail.

16

Discuss various optimization techniques available in Apache Hive to improve query performance.

17

What are Views in Hive? How do they differ from tables?

18

Provide examples of DDL and DML commands in HiveQL.

19

What is HCatalog in the Hadoop ecosystem, and how does it relate to Apache Hive?

20

Explain the significance of the Execution Engine in Apache Hive. Compare MapReduce and Tez in this context.