Unit5 - Subjective Questions
CSE274 • Practice Questions with Detailed Answers
Define Ensemble Learning and explain the motivation behind using it. How does it address the Bias-Variance Tradeoff?
Definition:
Ensemble Learning is a machine learning paradigm where multiple models (often called "weak learners" or "base estimators") are trained to solve the same problem and combined to get better results. The main hypothesis is that combining multiple models often produces a much stronger model than any single model individually.
Motivation:
- Performance: It usually improves prediction accuracy.
- Robustness: It reduces the spread or dispersion of the predictions and model performance.
Bias-Variance Tradeoff:
- Bagging (e.g., Random Forest): Primarily reduces Variance. By averaging multiple trees trained on different subsets of data, the model becomes less sensitive to noise in specific training samples.
- Boosting (e.g., AdaBoost, XGBoost): Primarily reduces Bias. It sequentially corrects the errors of previous weak learners, allowing the ensemble to fit complex patterns that a simple model might miss.
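The variance-reduction effect of averaging can be made concrete with a quick simulation (a minimal NumPy sketch; the true value and noise level are arbitrary illustrative choices). Each "model" is an unbiased but noisy estimator of the same quantity; averaging 25 independent estimators shrinks the variance by roughly a factor of 25.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 5.0

# A single high-variance, unbiased "model": noisy estimates of the same quantity.
single = rng.normal(true_value, 2.0, size=10_000)

# An "ensemble" that averages 25 independent such models per prediction.
ensemble = rng.normal(true_value, 2.0, size=(10_000, 25)).mean(axis=1)

print(single.var())    # close to 4.0 (sigma^2)
print(ensemble.var())  # close to 4.0 / 25
```

This is exactly why uncorrelated base learners matter: the variance of an average of $n$ independent estimators is $\sigma^2 / n$, but correlation between the learners limits the reduction.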
Differentiate between Bagging and Boosting with respect to their training mechanisms and objectives.
| Feature | Bagging (Bootstrap Aggregating) | Boosting |
|---|---|---|
| Mechanism | Trains learners in parallel, effectively independent of each other. | Trains learners sequentially; each learner depends on the previous one. |
| Data Sampling | Random sampling with replacement (Bootstrap). | Reweights data: points misclassified by previous learners get higher weights (or fits residuals). |
| Objective | To decrease variance and prevent overfitting. | To decrease bias and improve accuracy on difficult instances. |
| Aggregation | Simple averaging (regression) or majority voting (classification). | Weighted sum of the weak learners' outputs. |
| Example | Random Forest. | AdaBoost, Gradient Boosting, XGBoost. |
Explain the concept of a Majority Voting Classifier. Distinguish between Hard Voting and Soft Voting.
Majority Voting Classifier:
A meta-classifier that combines the predictions of several individual classifiers to determine the final class label. It is conceptually similar to a democratic voting system.
Distinction:
- Hard Voting:
- Each classifier predicts a class label.
- The ensemble predicts the class that gets the most votes.
- Example: If Model A predicts 1, Model B predicts 1, and Model C predicts 0, the ensemble predicts 1.
- Soft Voting:
- Requires classifiers to predict probabilities (e.g., `predict_proba`).
- The ensemble averages the probabilities calculated by each classifier for each class.
- The class with the highest average probability is chosen.
- Note: Soft voting often achieves higher performance than hard voting because it gives more weight to highly confident votes.
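The distinction fits in a few lines of NumPy (illustrative helper names, not a library API): hard voting counts predicted labels, while soft voting averages probability vectors, so one highly confident model can outvote two lukewarm ones.

```python
import numpy as np

def hard_vote(label_preds):
    """Majority vote. label_preds: (n_models, n_samples) array of class labels."""
    preds = np.asarray(label_preds)
    return np.array([np.bincount(col).argmax() for col in preds.T])

def soft_vote(proba_preds):
    """Average class probabilities. proba_preds: (n_models, n_samples, n_classes)."""
    return np.mean(proba_preds, axis=0).argmax(axis=1)

# Three models, one sample, classes {0, 1}.
probas = np.array([[[0.10, 0.90]],   # very confident in class 1
                   [[0.55, 0.45]],   # leans slightly to class 0
                   [[0.55, 0.45]]])  # leans slightly to class 0
labels = probas.argmax(axis=2)       # [[1], [0], [0]]

print(hard_vote(labels))   # [0]  -> two weak votes beat one strong vote
print(soft_vote(probas))   # [1]  -> average probability (0.4, 0.6) favors class 1
```

The example shows why soft voting often wins: it preserves confidence information that hard voting throws away.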
Describe the working principle of the Random Forest algorithm. How does it introduce randomness to improve upon standard Bagging?
Working Principle:
Random Forest is an ensemble method that builds a "forest" of decision trees, usually trained with the "bagging" method. The general idea is that a crowd of "relatively uncorrelated" models will outperform any of the individual constituent models.
Mechanism:
- Bootstrap Sampling: Draw bootstrap samples from the training set.
- Tree Construction: Grow a Decision Tree for each sample.
- Aggregation: For classification, use majority vote; for regression, average the predictions.
Randomness (The "Random" in Random Forest):
Unlike standard Bagging where trees explore all features for the best split, Random Forest adds an extra layer of randomness:
- Feature Subsampling: When splitting a node, the algorithm searches for the best feature among a random subset of features (typically $\sqrt{m}$, where $m$ is the total number of features). This results in greater diversity among the trees (lower correlation), which further reduces the variance of the ensemble.
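A short scikit-learn comparison illustrates the point (a sketch on synthetic data; scores will vary with the dataset). The only conceptual difference between the two ensembles below is `max_features`, which restricts each split in the forest to a random subset of roughly $\sqrt{m}$ features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# Plain bagging: every tree may search all 20 features at each split.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            random_state=0)

# Random Forest: each split considers only a random subset
# (sqrt(20) ~ 4 features), decorrelating the trees.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0)

print("Bagging:", cross_val_score(bagging, X, y, cv=5).mean().round(3))
print("Forest: ", cross_val_score(forest, X, y, cv=5).mean().round(3))
```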
What is the Out-of-Bag (OOB) error in Random Forests, and why is it useful?
Definition:
In Random Forests, each tree is built using a bootstrap sample (random sampling with replacement). Statistically, about one-third of the training data ($\approx 36.8\%$, since $(1 - 1/n)^n \to e^{-1}$) is left out of the bootstrap sample for each tree. These left-out instances are called Out-of-Bag (OOB) samples.
Calculation:
To calculate the OOB error, each observation $x_i$ is predicted using only the subset of trees in the forest that did not contain $x_i$ in their bootstrap training sample. These predictions are aggregated to calculate an accuracy score or error rate.
Utility:
- Validation: It acts as an internal cross-validation mechanism.
- Efficiency: It removes the need to set aside a separate validation set, allowing the model to utilize the full dataset for training while still providing an unbiased estimate of the generalization error.
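In scikit-learn this is a one-flag feature (a minimal sketch on synthetic data): `oob_score=True` scores every training sample using only the trees whose bootstrap sample excluded it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)

# Internal generalization estimate -- no separate validation set was held out.
print("OOB accuracy:", round(rf.oob_score_, 3))
```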
Explain the AdaBoost (Adaptive Boosting) algorithm. How does it update weights for misclassified instances?
Concept:
AdaBoost fits a sequence of weak learners on repeatedly modified versions of the data. It focuses on difficult examples by increasing the weights of misclassified instances.
Algorithm Steps:
- Initialize Weights: Assign equal weights $w_i = \frac{1}{N}$ to all $N$ training examples.
- Iterate ($t = 1$ to $T$):
- Train a weak learner $h_t$ using the current weights.
- Calculate Error: Compute the weighted error rate $\epsilon_t = \frac{\sum_i w_i \cdot I(y_i \neq h_t(x_i))}{\sum_i w_i}$.
- Compute Learner Weight: Calculate the importance of the learner: $\alpha_t = \eta \cdot \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$ (where $\eta$ is the learning rate).
- Update Weights: Increase weights for misclassified points and decrease for correctly classified ones:
- $w_i \leftarrow w_i \cdot e^{\alpha_t}$ (if misclassified)
- $w_i \leftarrow w_i \cdot e^{-\alpha_t}$ (if correct)
- Normalize Weights: Ensure $\sum_i w_i = 1$.
- Final Prediction: Weighted sum of weak learners: $H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$.
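The weight-update loop above can be implemented from scratch (a simplified sketch using decision stumps as weak learners and $\eta = 1$; `fit_stump` and the other helper names are illustrative, not a library API):

```python
import numpy as np

def stump_predict(X, feat, thresh, sign):
    """Decision stump: predict +1/-1 from a single threshold on one feature."""
    return sign * np.where(X[:, feat] <= thresh, 1, -1)

def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest *weighted* error."""
    best = None
    for feat in range(X.shape[1]):
        for thresh in np.unique(X[:, feat]):
            for sign in (1, -1):
                err = np.sum(w[stump_predict(X, feat, thresh, sign) != y])
                if best is None or err < best[0]:
                    best = (err, feat, thresh, sign)
    return best

def adaboost_fit(X, y, T=10):
    """y must be in {-1, +1}. Returns a list of (alpha, stump parameters)."""
    n = len(y)
    w = np.full(n, 1.0 / n)                            # 1. equal initial weights
    learners = []
    for _ in range(T):
        err, feat, thresh, sign = fit_stump(X, y, w)   # 2. train on current weights
        err = max(err, 1e-10)                          #    guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)          # 3. learner importance
        pred = stump_predict(X, feat, thresh, sign)
        w = w * np.exp(-alpha * y * pred)              # 4. up-weight mistakes
        w = w / w.sum()                                # 5. normalize
        learners.append((alpha, feat, thresh, sign))
    return learners

def adaboost_predict(learners, X):
    """6. Sign of the alpha-weighted sum of stump outputs."""
    agg = sum(a * stump_predict(X, f, t, s) for a, f, t, s in learners)
    return np.sign(agg)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1, 1, -1, -1])
model = adaboost_fit(X, y, T=5)
print(adaboost_predict(model, X))
```

Note that step 4 collapses the two update rules into one expression: $y_i h_t(x_i) = -1$ for a mistake (weight grows by $e^{\alpha_t}$) and $+1$ for a correct prediction (weight shrinks by $e^{-\alpha_t}$).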
What is the fundamental intuition behind Gradient Boosting Machines (GBM)? How does it differ from AdaBoost?
Intuition behind GBM:
Gradient Boosting generalizes boosting to an arbitrary differentiable loss function. Instead of tweaking instance weights (like AdaBoost), GBM trains each new weak learner to predict the residual errors (the difference between the actual target and the current ensemble prediction) of the previous additive model.
Mathematically, it performs Gradient Descent in function space. If the loss function is Mean Squared Error (MSE), the negative gradient is exactly the residual $y - F(x)$.
Difference from AdaBoost:
- Method: AdaBoost minimizes exponential loss by changing sample weights. GBM fits new trees to the gradients/residuals of the loss function.
- Flexibility: GBM is more flexible as it works with various loss functions (Huber, Quantile, Deviance), whereas standard AdaBoost is specific to classification with exponential loss.
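The residual-fitting idea can be shown end to end with a tiny gradient booster for MSE loss (a from-scratch sketch; depth-1 regression stumps stand in for the usual CART trees, and all helper names are illustrative):

```python
import numpy as np

def fit_stump(X, r):
    """Depth-1 regression tree: the single split minimizing squared error."""
    best_err, best = np.inf, None
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f])[:-1]:   # splitting at the max leaves one side empty
            left = X[:, f] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            err = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if err < best_err:
                best_err, best = err, (f, t, lv, rv)
    return best

def stump_predict(stump, X):
    f, t, lv, rv = stump
    return np.where(X[:, f] <= t, lv, rv)

def gbm_fit(X, y, n_rounds=30, lr=0.5):
    f0 = y.mean()                        # initial constant prediction
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_rounds):
        s = fit_stump(X, y - pred)       # fit the *residuals* of the current model
        pred = pred + lr * stump_predict(s, X)
        stumps.append(s)
    return f0, stumps

def gbm_predict(f0, stumps, X, lr=0.5):
    return f0 + lr * sum(stump_predict(s, X) for s in stumps)

X = np.arange(10, dtype=float).reshape(-1, 1)
y = X[:, 0] ** 2
f0, stumps = gbm_fit(X, y)
print(np.mean((gbm_predict(f0, stumps, X) - y) ** 2))  # training MSE shrinks toward 0
```

Because the loss is MSE here, fitting `y - pred` is literally fitting the negative gradient; swapping in another differentiable loss only changes what the stumps are fitted to.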
Explain XGBoost (eXtreme Gradient Boosting) and list three key features that make it superior to standard GBM.
Description:
XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework.
Key Features:
- Regularization: XGBoost includes L1 (Lasso) and L2 (Ridge) penalty terms on the leaf weights in its objective function: $\mathrm{Obj} = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k)$, with $\Omega(f) = \gamma T + \alpha \sum_j |w_j| + \frac{1}{2} \lambda \sum_j w_j^2$. This prevents overfitting, unlike standard GBM, which has no intrinsic regularization.
- Sparsity Awareness: It has a default direction for missing values. If data is missing or sparse, the tree learns the best direction to handle those instances during training.
- Scalability & Parallelism: While boosting is sequential, XGBoost parallelizes the tree construction (finding the best split) across all features, making it significantly faster than traditional GBM implementations.
Compare Level-wise tree growth vs. Leaf-wise tree growth strategies. Which algorithm uses which strategy?
Level-wise (Depth-first) Growth:
- Mechanism: The tree grows level by level. It splits all leaves at the current depth before moving to the next level.
- Pros: Maintains a balanced tree, less prone to overfitting.
- Used by: XGBoost (historically, though newer versions support leaf-wise).
Leaf-wise (Best-first) Growth:
- Mechanism: It splits the leaf with the maximum loss reduction (max delta loss), regardless of the depth. The tree can become deep and asymmetrical.
- Pros: Converges faster and achieves lower loss.
- Cons: More prone to overfitting on small datasets (requires a `max_depth` limit).
- Used by: LightGBM.
Comparison: Leaf-wise growth is generally faster and more accurate for large datasets but requires careful tuning of regularization to prevent overfitting.
Discuss the two novel techniques introduced by LightGBM: GOSS and EFB.
LightGBM (Light Gradient Boosting Machine) focuses on speed and efficiency using two key techniques:
- GOSS (Gradient-based One-Side Sampling):
- Problem: Traditional GBM scans all data instances to estimate information gain.
- Solution: GOSS keeps all instances with large gradients (large errors) and performs random sampling on instances with small gradients (small errors). It then re-weights the small-gradient samples to maintain data distribution accuracy.
- Benefit: Reduces the number of data points used for split finding without sacrificing much accuracy.
- EFB (Exclusive Feature Bundling):
- Problem: High-dimensional data is often sparse (many zero values), and many features are mutually exclusive (they are essentially never non-zero at the same time).
- Solution: EFB bundles these mutually exclusive features into a single feature (using histogram binning logic) to reduce dimensionality.
- Benefit: Reduces the number of features, speeding up the training process significantly.
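The GOSS sampling step itself can be sketched directly in NumPy (an illustrative simplification of the procedure in the LightGBM paper; `a` and `b` are the large- and small-gradient sampling ratios):

```python
import numpy as np

def goss_sample(grad, a=0.2, b=0.1, rng=None):
    """Keep the top a-fraction of instances by |gradient|; randomly sample a
    b-fraction of the remainder and re-weight it by (1 - a) / b so the
    estimated information gain stays approximately unbiased."""
    rng = rng or np.random.default_rng(0)
    n = len(grad)
    order = np.argsort(-np.abs(grad))          # descending |gradient|
    top_k = int(a * n)
    large = order[:top_k]                      # all large-gradient instances
    small = rng.choice(order[top_k:], size=int(b * n), replace=False)
    idx = np.concatenate([large, small])
    weights = np.ones(len(idx))
    weights[top_k:] = (1 - a) / b              # compensate for under-sampling
    return idx, weights

grad = np.linspace(-1.0, 1.0, 100)             # toy per-instance gradients
idx, w = goss_sample(grad)
print(len(idx))                                # 30 of 100 instances survive
```

Split finding then runs on the 30 surviving instances with their weights, instead of all 100.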
What distinguishes CatBoost from other Gradient Boosting frameworks regarding categorical data handling?
CatBoost (Categorical Boosting):
Its primary distinction is how it handles categorical features automatically without explicit pre-processing like One-Hot Encoding.
- Ordered Target Statistics: Instead of standard Target Encoding (which causes data leakage and overfitting), CatBoost uses a permutation-based approach. It calculates the average target value for a category using only the examples that appear before the current one in a random permutation of the dataset.
- Ordered Boosting: Standard boosting suffers from prediction shift because residuals are calculated on the same data used to build the model. CatBoost uses disjoint subsamples to calculate residuals and train the model, reducing overfitting.
- Symmetric Trees: It builds balanced trees, which makes prediction extremely fast.
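The ordered target statistic can be sketched as follows (an illustrative simplification: one random permutation and a simple additive prior, whereas CatBoost averages over several permutations):

```python
import numpy as np

def ordered_target_stats(cats, y, prior=0.5, rng=None):
    """Encode each categorical value with the (smoothed) mean target of the
    examples that appear *earlier* in a random permutation -- so no row's
    encoding ever uses its own label."""
    rng = rng or np.random.default_rng(0)
    enc = np.empty(len(y))
    sums, counts = {}, {}
    for i in rng.permutation(len(y)):
        c = cats[i]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        enc[i] = (s + prior) / (n + 1)         # mean of earlier targets, smoothed
        sums[c], counts[c] = s + y[i], n + 1
    return enc

cats = np.array([0, 0, 1, 1, 0, 1])
y = np.array([1, 0, 1, 1, 1, 0])
print(ordered_target_stats(cats, y).round(2))
```

Ordinary target encoding would use the full-dataset category mean, letting each row's own label leak into its feature; the permutation ordering is what breaks that leak.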
Explain the concept of Stacking (Stacked Generalization). How is it different from Voting?
Stacking:
Stacking is an ensemble learning technique that uses a meta-model to combine the predictions of multiple base models.
Process:
- Split training data into two sets (or use Cross-Validation).
- Train several different base learners (Level-0 models) on the first part.
- Make predictions using these base learners on the second part.
- Use these predictions as input features to train a final "meta-learner" (Level-1 model, often a simple Linear Regression or Logistic Regression).
- The meta-learner learns how to best combine the base models' outputs to predict the final target.
Difference from Voting:
- Voting: Uses a fixed rule (average or majority vote) to combine predictions. No training takes place in the aggregation phase.
- Stacking: Uses a learnable model to combine predictions. It learns weights essentially, realizing that Model A might be better in one region of the feature space while Model B is better in another.
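In scikit-learn, `StackingClassifier` implements exactly this recipe (a minimal sketch on synthetic data): out-of-fold predictions from the Level-0 models become the training features for the Level-1 logistic regression.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svc", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(),   # Level-1 meta-learner
    cv=5,   # base-model meta-features come from out-of-fold CV, avoiding leakage
)
stack.fit(X_tr, y_tr)
print("Test accuracy:", round(stack.score(X_te, y_te), 3))
```

The `cv=5` argument is what distinguishes this from naively training the meta-learner on in-sample predictions, which would overweight overfitted base models.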
What are Machine Learning Pipelines? Why are they essential in the context of Hyperparameter Tuning and Cross-Validation?
Definition:
A Pipeline chains together multiple steps of an ML workflow (e.g., Imputation → Scaling → PCA → Classifier) into a single object that acts like a standard estimator.
Importance:
- Prevention of Data Leakage: This is the most critical aspect. When performing Cross-Validation, preprocessing (like scaling or imputing mean) must be fitted only on the training fold and applied to the validation fold. If you scale the whole dataset before CV, information from the validation set leaks into training.
- Without a Pipeline: the scaler is fitted on the entire dataset before cross-validation, so statistics from the validation folds contaminate training (Leakage!).
- With a Pipeline: each CV iteration re-fits the scaler on the training fold only and merely transforms the validation fold, so no leakage occurs.
- Reproducibility: It simplifies the code and ensures the exact same sequence of transformations is applied to new data.
- Joint Tuning: You can perform Grid Search over the hyperparameters of the preprocessing steps (e.g., number of PCA components) and the model simultaneously.
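A compact scikit-learn example ties these points together (a sketch on a built-in dataset): the pipeline is cross-validated as one estimator, and a single grid tunes a preprocessing parameter and a model parameter jointly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),            # fitted on the training fold only
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=5000)),
])

# One grid over preprocessing AND model hyperparameters, cross-validated safely.
grid = GridSearchCV(pipe,
                    {"pca__n_components": [5, 10], "clf__C": [0.1, 1.0]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

The `step__param` naming convention (`pca__n_components`) is how the grid reaches inside the pipeline.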
Compare Grid Search and Random Search for hyperparameter tuning. When would you prefer Random Search?
Grid Search:
- Method: Exhaustively searches through a manually specified subset of the hyperparameter space. It tries every unique combination.
- Pros: Guaranteed to find the best combination within the specified grid.
- Cons: Computationally expensive; suffers from the "Curse of Dimensionality" if parameters are many.
Random Search:
- Method: Samples a fixed number of parameter settings from specified probability distributions.
- Pros: More efficient. Research (Bergstra & Bengio) shows that for the same computational budget, Random Search finds better models because not all hyperparameters are equally important. It explores the continuous space more effectively.
Preference:
Prefer Random Search when:
- The search space is large or high-dimensional.
- Computational resources are limited.
- You suspect only a few hyperparameters significantly influence the model performance.
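A minimal `RandomizedSearchCV` sketch (synthetic data; the parameter ranges are illustrative): the grid below has 11 × 4 × 3 = 132 combinations, but only `n_iter=10` of them are sampled and evaluated.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, random_state=0)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": np.arange(50, 301, 25),        # 11 values
        "max_depth": [None, 3, 5, 10],                 # 4 values
        "max_features": ["sqrt", "log2", None],        # 3 values
    },
    n_iter=10,          # sample 10 settings instead of trying all 132
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

Replacing the lists with continuous distributions (e.g., from `scipy.stats`) is what lets Random Search explore the space more finely than any fixed grid.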
Explain Bayesian Optimization for hyperparameter tuning. What are the roles of the Surrogate Model and the Acquisition Function?
Bayesian Optimization:
Unlike Grid or Random search (which are "uninformed" and treat each iteration independently), Bayesian Optimization builds a probabilistic model of the function mapping hyperparameters to the objective metric (e.g., accuracy). It uses past evaluation results to choose the next hyperparameters to evaluate, aiming to find the global optimum with fewer iterations.
Components:
- Surrogate Model:
- A probabilistic model (usually a Gaussian Process or Tree-structured Parzen Estimator) that approximates the true objective function.
- It provides an estimate of the objective value and the uncertainty (variance) at potential points.
- Acquisition Function:
- A function that decides where to sample next based on the Surrogate Model.
- It balances Exploration (sampling areas with high uncertainty) and Exploitation (sampling areas predicted to be good).
- Examples: Expected Improvement (EI), Probability of Improvement (PI).
Describe K-Fold Cross-Validation and Stratified K-Fold Cross-Validation. When is Stratified K-Fold necessary?
K-Fold Cross-Validation:
- The dataset is randomly divided into $K$ equal-sized subsets (folds).
- The model is trained $K$ times.
- In each iteration, one fold is used for validation, and the remaining $K-1$ folds are used for training.
- The final performance metric is the average of the $K$ scores.
Stratified K-Fold:
- A variation of K-Fold where the folds are made by preserving the percentage of samples for each class.
- It ensures that each fold is a good representative of the whole.
When is Stratified Necessary?
It is crucial for Imbalanced Datasets (Classification). If a class is rare (e.g., 1% positive), standard random splitting might result in a validation fold with zero positive cases, leading to misleading evaluation scores. Stratification guarantees the class ratio remains consistent across folds.
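The difference is easy to demonstrate (a minimal sketch with a 90/10 class imbalance): plain `KFold` can produce folds with varying positive rates, while `StratifiedKFold` pins every fold at exactly 10% positives.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Imbalanced problem: 90 negatives, 10 positives.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)

for name, cv in [("KFold     ", KFold(5, shuffle=True, random_state=0)),
                 ("Stratified", StratifiedKFold(5, shuffle=True, random_state=0))]:
    ratios = [round(y[val].mean(), 2) for _, val in cv.split(X, y)]
    print(name, ratios)   # StratifiedKFold yields 0.1 in every fold
```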
How is Ensemble Regression performed? Discuss how Bagging and Boosting are adapted for regression tasks.
Ensemble Regression:
The goal is to predict a continuous numerical value by combining multiple regression models.
Bagging for Regression (e.g., Random Forest Regressor):
- Training: Builds multiple independent trees on bootstrap samples.
- Prediction: The final prediction is the average (mean) of the predictions from all individual trees.
- This averaging smooths out the variance.
Boosting for Regression (e.g., Gradient Boosting Regressor):
- Training: Fits trees sequentially. The first tree fits the target $y$. Subsequent trees fit the residuals (the error $y - \hat{y}$) of the previous combined model.
- Prediction: The final prediction is the sum of the base prediction plus the weighted sum of the updates.
- By fitting residuals, the model incrementally moves closer to the true value.
Derive or explain the XGBoost Objective Function considering the Loss term and the Regularization term.
In XGBoost, the objective function at step $t$ seeks the tree $f_t$ that minimizes:

$\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$

Where:
- $l$ is the differentiable convex loss function (e.g., MSE, LogLoss).
- $\Omega(f_t)$ is the regularization term.
Taylor Expansion:
XGBoost uses a second-order Taylor approximation to optimize this quickly:

$\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$

where $g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ is the first derivative (gradient) and $h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ is the second derivative (hessian) of the loss function.
Regularization Term ($\Omega$):

$\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$

- $\gamma T$: Penalizes the number of leaves ($T$) (Pruning).
- $\frac{1}{2} \lambda \sum_j w_j^2$: Penalizes the magnitude of the leaf weights ($w_j$) (L2 Regularization).
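Completing the derivation (the standard closed form from the XGBoost paper): for a fixed tree structure in which leaf $j$ contains the instance set $I_j$, the second-order approximation is minimized leaf by leaf, giving the optimal leaf weight and the objective value used to score candidate splits:

```latex
G_j = \sum_{i \in I_j} g_i, \qquad H_j = \sum_{i \in I_j} h_i

w_j^{*} = -\frac{G_j}{H_j + \lambda}

\mathrm{Obj}^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^{2}}{H_j + \lambda} + \gamma T
```

The "Gain" reported for each split is the improvement in $\mathrm{Obj}^{*}$ when one leaf is replaced by left and right children, minus the $\gamma$ cost of the extra leaf.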
What metrics are commonly used for Model Evaluation in Ensemble Learning for Classification and Regression problems?
Classification Metrics:
- Accuracy: Ratio of correct predictions. (Bad for imbalanced data).
- Precision & Recall (F1-Score): Vital for imbalanced datasets.
- ROC-AUC (Area Under ROC Curve): Measures the ability of the ensemble to distinguish between classes at various threshold settings. Ensembles like RF and GBM often aim to maximize AUC.
- Log-Loss: Penalizes confident wrong answers (crucial for Soft Voting/Probabilistic ensembles).
Regression Metrics:
- MSE (Mean Squared Error): Penalizes large errors heavily. Standard for Gradient Boosting.
- MAE (Mean Absolute Error): More robust to outliers.
- RMSE (Root Mean Squared Error): Interpretable in the same units as the target.
- $R^2$ Score: Measures the proportion of variance in the target explained by the model.
How do Tree-based Ensembles (Random Forest, XGBoost) calculate Feature Importance?
Feature importance allows us to understand which features contributed most to the predictive power of the ensemble.
1. Gini Importance (Mean Decrease in Impurity) - Used in Random Forest:
- Every time a node is split on a feature, the impurity (Gini or Entropy) decreases.
- For each feature, we calculate the total decrease in impurity averaged over all trees in the forest.
- Features with higher accumulated decrease are more important.
2. Gain / Coverage / Frequency - Used in XGBoost/LightGBM:
- Gain: The average gain (improvement in accuracy/reduction in loss) brought by a feature when it is used in splits.
- Frequency (Weight): The number of times a feature is used to split the data across all trees.
- Coverage: The number of observations (samples) affected by splits based on a specific feature.
Note: Permutation Importance is a model-agnostic alternative often used to validate these intrinsic metrics.
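Both flavors are accessible in scikit-learn (a sketch on synthetic data with 3 informative and 2 noise features; `shuffle=False` makes the informative features occupy the first columns):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Features 0-2 carry signal; features 3-4 are pure noise.
X, y = make_classification(n_samples=400, n_features=5, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X, y)

# Intrinsic: mean decrease in impurity, accumulated over all trees.
print("Gini importance:       ", rf.feature_importances_.round(3))

# Model-agnostic check: score drop when each feature's column is shuffled.
perm = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
print("Permutation importance:", perm.importances_mean.round(3))
```

Comparing the two rankings is a useful sanity check: impurity-based importance can inflate high-cardinality features, which permutation importance does not.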