Unit 1 - Subjective Questions

INT312 • Practice Questions with Detailed Answers

1. Define Big Data and explain the '5 V's' of Big Data.

2. What is Apache Hadoop? Briefly explain its core components.

3. Explain the architecture of HDFS in detail. What are the roles of the NameNode and DataNode?

4. Describe the MapReduce programming model. How does it process large datasets?
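To make the model concrete, here is a single-process Python sketch of the classic word-count job. It only simulates the data flow (map emits key/value pairs, the framework groups them by key, reduce aggregates each group); the function names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are hypothetical and are not part of Hadoop's Java API.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input record (a line of text).
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all intermediate values by key, as the framework does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: aggregate every value seen for one key.
    return key, sum(values)


def run_job(lines: list[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    # Keys reach the reducer in sorted order, mirroring Hadoop's sort step.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(run_job(["the quick fox", "the lazy dog"]))
```

In a real cluster each map call runs on a separate input split, ideally on the node that stores it, and the grouping step happens over the network between map tasks and reduce tasks.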

5. How does HDFS ensure fault tolerance and high availability?

6. What is YARN? Explain the architecture of YARN including ResourceManager and NodeManager.

7. Compare and contrast Hadoop 1.x and Hadoop 2.x architectures.

8. Explain the Heartbeat mechanism in Hadoop. Why is it important?
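DataNodes send a heartbeat to the NameNode every few seconds (3 s by default, `dfs.heartbeat.interval`). The sketch below assumes default settings and the rule that a node is declared dead once no heartbeat has arrived for `2 * dfs.namenode.heartbeat.recheck-interval + 10 * dfs.heartbeat.interval`, which with the defaults of 300000 ms and 3 s gives 630 s (10.5 minutes). The `HeartbeatMonitor` class and its method names are a hypothetical illustration, not NameNode code.

```python
from dataclasses import dataclass, field


def dead_node_timeout_s(heartbeat_interval_s: float = 3.0,
                        recheck_interval_ms: float = 300_000.0) -> float:
    # 2 * recheck interval + 10 * heartbeat interval, expressed in seconds.
    return 2 * recheck_interval_ms / 1000 + 10 * heartbeat_interval_s


@dataclass
class HeartbeatMonitor:
    timeout_s: float
    # node id -> time (in seconds) of the most recent heartbeat
    last_seen: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        # Each heartbeat simply refreshes the node's "last seen" timestamp.
        self.last_seen[node] = now

    def dead_nodes(self, now: float) -> list:
        # A node is treated as dead once its silence exceeds the timeout;
        # the NameNode would then re-replicate the blocks it held.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout_s)


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_s=dead_node_timeout_s())
    monitor.heartbeat("dn1", now=0)
    monitor.heartbeat("dn2", now=600)
    print(monitor.timeout_s, monitor.dead_nodes(now=700))
```

The long timeout is deliberate: a short network hiccup should not trigger a storm of unnecessary re-replication.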

9. What is the role of the Secondary NameNode? Is it a backup for the NameNode?
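The Secondary NameNode's main job is checkpointing: it periodically fetches the NameNode's fsimage and edit log, replays the edits onto the image, and sends the merged fsimage back so the edit log does not grow without bound. The sketch below models this with plain dictionaries; the record format (a path mapped to its replication factor, edits as `(op, path, replication)` tuples) and the operation names are hypothetical simplifications, since the real fsimage and edits files are binary.

```python
def apply_edit(image: dict, op: str, path: str, replication) -> None:
    # Apply one logged namespace change (hypothetical op names) to the image.
    if op in ("create", "set_replication"):
        image[path] = replication
    elif op == "delete":
        image.pop(path, None)
    else:
        raise ValueError(f"unknown edit operation: {op}")


def checkpoint(fsimage: dict, edits: list) -> tuple:
    # Replay the edit log over a copy of the last fsimage to produce a new one.
    new_image = dict(fsimage)
    for op, path, replication in edits:
        apply_edit(new_image, op, path, replication)
    # The edits folded into the checkpoint are no longer needed.
    return new_image, []


if __name__ == "__main__":
    image = {"/data/a.log": 3}
    edits = [("create", "/data/b.log", 2),
             ("delete", "/data/a.log", None),
             ("set_replication", "/data/b.log", 3)]
    print(checkpoint(image, edits))
```

The merged image is only a point-in-time snapshot: anything logged after the last checkpoint is not in it, which is why the Secondary NameNode is not a hot standby for the NameNode.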

10. What is a block in HDFS? Explain why HDFS uses such large block sizes (e.g., 128 MB).
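The arithmetic behind large blocks is easy to check. The helpers below (hypothetical names; sizes in bytes, with 1 MB taken as 1024 * 1024 bytes) split a file into HDFS-style blocks and count how many block entries the NameNode would have to track. Since every block is a metadata object held in NameNode memory, and each block is typically processed by one map task, a larger block size means far fewer objects and tasks for the same data.

```python
MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 128 * MB


def num_blocks(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> int:
    # Ceiling division: a partially filled last block still counts as a block.
    return -(-file_size // block_size)


def split_into_blocks(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> list:
    # Full blocks plus one smaller final block. HDFS does not pad the last
    # block, so it only occupies as much disk space as it actually holds.
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


if __name__ == "__main__":
    print([b // MB for b in split_into_blocks(500 * MB)])  # [128, 128, 128, 116]
    one_tb = 1024 ** 4
    print(num_blocks(one_tb))                              # 8192 blocks at 128 MB
    print(num_blocks(one_tb, block_size=4 * 1024))         # 268435456 blocks at 4 KB
```

A 1 TB file therefore needs 8,192 block entries at 128 MB but over 268 million at a 4 KB filesystem-style block size, and large blocks also keep disk seek time small relative to transfer time.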

11. Explain the anatomy of a file read and write operation in HDFS.

12. Discuss the Data Replication strategy in HDFS. Give an example with a replication factor of 3.
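With a replication factor of 3, the default placement rule is: first replica on the writer's node (or a random node if the client is outside the cluster), second replica on a node in a different rack, third replica on a different node in the same rack as the second. The sketch below encodes that rule over a toy topology; the function name and the topology dictionary are hypothetical, and it ignores details the real `BlockPlacementPolicyDefault` handles, such as node load, free space, and racks with too few nodes.

```python
import random
from typing import Optional


def place_replicas(topology: dict, writer: Optional[str], rng: random.Random) -> list:
    """Pick 3 DataNodes for a new block. topology maps rack id -> list of node ids."""
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    # Replica 1: the writer's own node if it is a DataNode, else a random node.
    first = writer if writer in rack_of else rng.choice(sorted(rack_of))
    # Replica 2 goes to a different rack. Assumption of this toy policy: that
    # rack has at least two nodes, so replica 3 can share it.
    remote_racks = sorted(r for r, nodes in topology.items()
                          if r != rack_of[first] and len(nodes) >= 2)
    if not remote_racks:
        raise ValueError("toy policy needs a second rack with at least two nodes")
    remote = rng.choice(remote_racks)
    # Replicas 2 and 3: two distinct nodes on that remote rack.
    second, third = rng.sample(topology[remote], 2)
    return [first, second, third]


if __name__ == "__main__":
    topology = {"/rack1": ["dn1", "dn2", "dn3"], "/rack2": ["dn4", "dn5", "dn6"]}
    print(place_replicas(topology, writer="dn1", rng=random.Random(42)))
```

This layout survives the loss of any single node or of a whole rack, while only one copy has to cross the inter-rack link during the write pipeline.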

13. Explain the High Availability (HA) architecture of NameNode introduced in Hadoop 2.x.

14. Detail the different phases of a MapReduce job (Map, Combine, Shuffle & Sort, Reduce).
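The four phases can be traced in a small single-process simulation. Each input split is mapped, then combined locally (pre-aggregated on the mapper side), then each pair is routed to a reducer by a partition function, and finally every reducer processes its keys in sorted order. The function names are hypothetical; Hadoop's default `HashPartitioner` uses the key's Java `hashCode()`, and `zlib.crc32` is only a stand-in so the result is stable across Python runs. A combiner is safe here because summation is commutative and associative.

```python
import zlib
from collections import Counter, defaultdict


def map_task(split: list) -> list:
    # Map: (word, 1) for every word in every line of this input split.
    return [(w, 1) for line in split for w in line.split()]


def combine(pairs: list) -> list:
    # Combine: local pre-aggregation of one mapper's output, which shrinks
    # the data that has to be shuffled across the network.
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return list(counts.items())


def partition(key: str, num_reducers: int) -> int:
    # Decide which reducer owns this key (stand-in for HashPartitioner).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(splits: list, num_reducers: int = 2) -> tuple:
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    shuffled_pairs = 0
    for split in splits:
        for key, value in combine(map_task(split)):
            # Shuffle: send each combined pair to its reducer's bucket.
            buckets[partition(key, num_reducers)][key].append(value)
            shuffled_pairs += 1
    output = {}
    for bucket in buckets:
        # Sort, then reduce: each reducer walks its keys in sorted order.
        for key in sorted(bucket):
            output[key] = sum(bucket[key])
    return output, shuffled_pairs


if __name__ == "__main__":
    splits = [["a b a", "b a"], ["a c"]]
    print(run_job(splits))
```

For this input the mappers emit 7 raw pairs, but after combining only 4 pairs are shuffled to the reducers.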

15. Briefly describe the components of the Hadoop Ecosystem (Hive, Pig, Sqoop, Flume, HBase).

16. Discuss the scenarios where Hadoop is NOT the right tool for data processing.

17. What is Speculative Execution in Hadoop MapReduce?
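Speculative execution launches a duplicate (backup) attempt of a task that is running much slower than its peers, keeps whichever attempt finishes first, and kills the other. The detection step can be sketched as below. The rate-based rule, the `slowness_factor` of 0.5 and the 60-second minimum are hypothetical choices for illustration only; Hadoop's own speculators use their own runtime estimates and are switched on or off by settings such as `mapreduce.map.speculative`.

```python
from dataclasses import dataclass


@dataclass
class TaskStatus:
    task_id: str
    progress: float   # fraction complete, 0.0 to 1.0
    elapsed_s: float  # seconds since the attempt started


def speculation_candidates(tasks, slowness_factor=0.5, min_elapsed_s=60.0):
    """Return ids of running tasks whose progress rate is far below the average."""
    # Only judge tasks that have run long enough for their rate to be meaningful.
    observed = [t for t in tasks if t.elapsed_s >= min_elapsed_s]
    if not observed:
        return []
    rates = {t.task_id: t.progress / t.elapsed_s for t in observed}
    mean_rate = sum(rates.values()) / len(rates)
    # Finished tasks help set the baseline but are never re-launched.
    return sorted(t.task_id for t in observed
                  if t.progress < 1.0 and rates[t.task_id] < slowness_factor * mean_rate)


if __name__ == "__main__":
    tasks = [TaskStatus("m1", 1.0, 100), TaskStatus("m2", 0.9, 100),
             TaskStatus("m3", 0.2, 100)]
    print(speculation_candidates(tasks))  # ['m3']
```

Speculation only helps with slow hardware or overloaded nodes; a task that is slow because its input is inherently larger will be just as slow in its backup attempt.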

18. Explain the concept of Rack Awareness in HDFS and its advantages.

19. Distinguish between traditional RDBMS and Apache Hadoop.

20. Discuss the concept of Data Locality in Hadoop. Why is it a fundamental principle of the framework?
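Data locality means the scheduler tries to run a task on a node that already holds its input block, falls back to a node in the same rack, and only then uses a node elsewhere in the cluster, so computation moves to the data instead of data crossing the network. The sketch below ranks pending tasks for a free node by those three levels (named after YARN's node-local, rack-local and off-switch levels). The function names and data layout are hypothetical, and real schedulers also weigh queue capacity and how long a task has already waited.

```python
NODE_LOCAL, RACK_LOCAL, OFF_SWITCH = 0, 1, 2  # lower is better


def locality_level(node: str, replica_hosts: list, rack_of: dict) -> int:
    # How close is this node to a copy of the task's input block?
    if node in replica_hosts:
        return NODE_LOCAL
    if rack_of[node] in {rack_of[h] for h in replica_hosts}:
        return RACK_LOCAL
    return OFF_SWITCH


def pick_task(free_node: str, pending: dict, rack_of: dict) -> str:
    """pending maps task id -> hosts holding a replica of that task's input block."""
    # Best locality wins; ties are broken by task id so the choice is deterministic.
    return min(sorted(pending), key=lambda t: locality_level(free_node, pending[t], rack_of))


if __name__ == "__main__":
    rack_of = {"dn1": "/r1", "dn2": "/r1", "dn3": "/r2", "dn4": "/r2"}
    pending = {"m1": ["dn3", "dn4"], "m2": ["dn2", "dn3"], "m3": ["dn1", "dn4"]}
    print(pick_task("dn1", pending, rack_of))  # m3: its block is on dn1 itself
```

If m3 were not pending, dn1 would get m2 (rack-local, via dn2), and only as a last resort m1, whose input would have to cross racks.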