1. Which component of the prediction error results from the model's assumptions being too simple to capture the underlying structure of the data?
A. Variance
B. Bias
C. Irreducible Error
D. Noise
Correct Answer: Bias
Explanation: Bias refers to the error introduced by approximating a real-world problem, which may be extremely complicated, by a much simpler model (e.g., linear regression). High bias often causes underfitting.
2. A model that captures random noise in the training data rather than the intended outputs is said to have:
A. High Bias
B. Low Variance
C. High Variance
D. High Bias and High Variance
Correct Answer: High Variance
Explanation: High variance means the model is highly sensitive to small fluctuations or noise in the training set, often leading to overfitting.
3. What is the relationship between model complexity and the bias-variance trade-off?
A. As complexity increases, bias increases and variance decreases.
B. As complexity increases, bias decreases and variance increases.
C. As complexity increases, both bias and variance decrease.
D. As complexity increases, both bias and variance increase.
Correct Answer: As complexity increases, bias decreases and variance increases.
Explanation: Simple models (low complexity) have high bias and low variance. Complex models (high complexity) fit the training data very well (low bias) but generalize poorly (high variance).
4. Which of the following describes 'Underfitting' in the context of the bias-variance trade-off?
A. Low Bias, High Variance
B. High Bias, Low Variance
C. Low Bias, Low Variance
D. High Bias, High Variance
Correct Answer: High Bias, Low Variance
Explanation: Underfitting occurs when a model is too simple to capture the data's pattern, resulting in high bias but stable results (low variance) across different training sets.
5. The mathematical decomposition of a model's total error consists of:
Correct Answer: Squared Bias + Variance + Irreducible Error
Explanation: The expected test mean squared error (MSE) can be decomposed into the squared bias of the estimator, the variance of the estimator, and the irreducible error.
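For reference, the decomposition in the explanation above can be written out explicitly. Using standard (assumed) notation, where f is the true function, f-hat the fitted model, and sigma^2 the noise variance, the expected test MSE at a point x_0 is:

```latex
\mathbb{E}\!\left[\left(y_0 - \hat{f}(x_0)\right)^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x_0)] - f(x_0)\right)^2}_{\text{Bias}^2}
  + \underbrace{\operatorname{Var}\!\left[\hat{f}(x_0)\right]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible error}}
```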
6. Which of the following errors cannot be reduced regardless of how good the model is?
A. Bias Error
B. Variance Error
C. Irreducible Error
D. Systematic Error
Correct Answer: Irreducible Error
Explanation: Irreducible error is the error associated with noise in the underlying system itself and cannot be eliminated by modeling.
7. What is the primary purpose of Cross-Validation?
A. To increase the size of the dataset
B. To assess how the results of a statistical analysis will generalize to an independent data set
C. To eliminate outliers in the data
D. To reduce the dimensionality of the data
Correct Answer: To assess how the results of a statistical analysis will generalize to an independent data set
Explanation: Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample to estimate the model's performance on unseen data.
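As a concrete illustration of the idea above, here is a minimal scikit-learn sketch; the dataset and classifier (iris, logistic regression) are illustrative assumptions, not part of the quiz.

```python
# Minimal cross-validation sketch using scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: each fold is held out once as a test set.
scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores)
print("Estimated generalization accuracy:", scores.mean())
```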
8. In K-fold cross-validation, if K equals the number of observations in the dataset (N), the method is known as:
Correct Answer: Leave-One-Out Cross-Validation (LOOCV)
Explanation: LOOCV is the special case of K-fold cross-validation in which K equals the total number of data points, so each fold consists of a single observation.
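A minimal LOOCV sketch, assuming scikit-learn and the same toy dataset as above:

```python
# LOOCV sketch: K equals the number of observations, so each "fold" is one sample.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

loo = LeaveOneOut()                           # equivalent to KFold(n_splits=len(X))
scores = cross_val_score(model, X, y, cv=loo)
print("Number of model fits:", len(scores))   # equals N
print("LOOCV accuracy estimate:", scores.mean())
```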
9. Which of the following is a major disadvantage of Leave-One-Out Cross-Validation (LOOCV) compared to K-fold cross-validation (where K=5 or 10)?
A. It has higher bias.
B. It is computationally expensive.
C. It wastes too much training data.
D. It is less accurate.
Correct Answer: It is computationally expensive.
Explanation: LOOCV requires fitting the model N times (where N is the number of observations), which can be extremely computationally expensive for large datasets.
10. In 5-fold cross-validation, what percentage of the data is used for testing in each iteration?
A. 10%
B. 20%
C. 25%
D. 50%
Correct Answer: 20%
Explanation: In K-fold cross-validation, the data is split into K parts. For 5-fold, 1/5th (or 20%) of the data is used for testing in each iteration.
11. Compared to LOOCV, 10-fold cross-validation typically has:
A. Lower bias and higher variance
B. Higher bias and lower variance
C. Higher bias and higher variance
D. Lower bias and lower variance
Correct Answer: Higher bias and lower variance
Explanation: LOOCV has very low bias (it uses nearly all the data) but high variance (its training sets are almost identical). 10-fold introduces slight bias (less training data) but significantly reduces variance.
12. What is 'Stratified' K-Fold Cross-Validation useful for?
A. Regression problems with continuous targets
B. Time-series data
C. Datasets with imbalanced class distributions
D. Reducing computational time
Correct Answer: Datasets with imbalanced class distributions
Explanation: Stratified K-Fold ensures that each fold of the dataset has the same proportion of observations with a given label as the whole dataset, which is crucial for imbalanced classes.
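A small sketch of how stratification preserves class proportions; the toy label vector with a 90/10 imbalance is an assumption for illustration.

```python
# Stratified K-Fold sketch: each fold preserves the overall class proportions.
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Illustrative imbalanced labels: 90 of class 0, 10 of class 1.
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100).reshape(-1, 1)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # Each test fold keeps roughly the 90/10 ratio of the full dataset.
    print(f"Fold {fold}: positives in test fold = {y[test_idx].sum()} of {len(test_idx)}")
```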
13. What does 'Bagging' stand for?
A. Bootstrap Aggregating
B. Binary Aggregating
C. Backward Aggregating
D. Boosted Aggregating
Correct Answer: Bootstrap Aggregating
Explanation: Bagging is an acronym for Bootstrap Aggregating, an ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms.
14. How does Bagging create different training sets?
A. By splitting the data into K distinct folds
B. By sampling with replacement from the original dataset
C. By sampling without replacement from the original dataset
D. By selecting only the most difficult instances
Correct Answer: By sampling with replacement from the original dataset
Explanation: Bagging generates multiple versions of a predictor and uses these to get an aggregated predictor. The versions are obtained by training on bootstrap replicates (sampling with replacement).
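The bootstrap step can be illustrated in a few lines of NumPy; the tiny 10-point dataset here is purely an illustrative assumption.

```python
# Bootstrap sampling sketch: draw N indices with replacement from an N-row dataset.
import numpy as np

rng = np.random.default_rng(0)
n = 10                                   # illustrative dataset size
X = np.arange(n)                         # stand-in for the original observations

bootstrap_idx = rng.choice(n, size=n, replace=True)   # sampling WITH replacement
bootstrap_sample = X[bootstrap_idx]

print("Bootstrap sample:", bootstrap_sample)           # duplicates are expected
print("Left out (out-of-bag):", np.setdiff1d(X, bootstrap_sample))
```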
15. Bagging is particularly effective at reducing which component of error?
A. Bias
B. Variance
C. Noise
D. Computation time
Correct Answer: Variance
Explanation: Bagging averages the predictions of multiple models, which helps to smooth out the noise and reduce variance, preventing overfitting.
16. In Bagging, how is the final prediction made for a regression problem?
A. Majority Voting
B. Weighted Voting
C. Averaging
D. Selecting the single best model
Correct Answer: Averaging
Explanation: For regression problems, the outputs of the individual models in the bagging ensemble are averaged to obtain the final prediction.
17. What is the 'Out-of-Bag' (OOB) error in Bagging?
A. The error on the training set
B. The error calculated using data not included in the bootstrap sample
C. The error calculated using an external validation set
D. The error due to missing values
Correct Answer: The error calculated using data not included in the bootstrap sample
Explanation: Since bootstrapping involves sampling with replacement, some data points are left out of each base model's training set. These 'out-of-bag' observations are used to estimate the prediction error.
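A minimal sketch of the OOB estimate, using scikit-learn's RandomForestClassifier (a bagged-tree ensemble) with oob_score=True; the synthetic dataset is an assumption for illustration.

```python
# OOB error sketch: scikit-learn estimates it automatically when oob_score=True.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)

# Each observation is scored only by trees whose bootstrap sample excluded it.
print("OOB accuracy estimate:", rf.oob_score_)
```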
18. Which ensemble method builds models sequentially, where each new model attempts to correct the errors of the previous one?
A. Bagging
B. Random Forests
C. Boosting
D. Cross-Validation
Correct Answer: Boosting
Explanation: Boosting is an ensemble technique that attempts to create a strong classifier from a number of weak classifiers by training them sequentially to correct prior errors.
19. Random Forest is an extension of which technique?
A. Boosting
B. Bagging
C. Linear Regression
D. K-Means Clustering
Correct Answer: Bagging
Explanation: Random Forest is a modification of Bagging (Bootstrap Aggregating) that introduces random feature selection to de-correlate the trees.
20. In Random Forests, how are features selected for splitting a node?
A. All features are considered at every split.
B. A random subset of features is considered at every split.
C. The single best feature from the entire dataset is always chosen.
D. Features are selected based on user preference.
Correct Answer: A random subset of features is considered at every split.
Explanation: To ensure the trees are less correlated, Random Forest searches for the best feature among a random subset of features (usually the square root of the total number of features) at each split.
21. Why are Random Forests generally better than a single Decision Tree?
A. They are easier to interpret.
B. They are faster to train.
C. They reduce overfitting and variance.
D. They provide a linear decision boundary.
Correct Answer: They reduce overfitting and variance.
Explanation: Single decision trees are prone to high variance (overfitting). Random Forests average many trees, which neutralizes individual errors and reduces variance.
22. In the context of Boosting, what is a 'weak learner'?
A. A model that performs slightly better than random guessing
B. A model that has 100% accuracy
C. A model with high variance
D. A model with complex architecture
Correct Answer: A model that performs slightly better than random guessing
Explanation: Boosting combines many weak learners (simple models, like shallow decision trees) that perform only slightly better than chance to form a strong learner.
23. How does AdaBoost (Adaptive Boosting) handle misclassified instances?
A. It discards them.
B. It decreases their weights.
C. It increases their weights.
D. It keeps their weights constant.
Correct Answer: It increases their weights.
Explanation: In AdaBoost, instances that are misclassified by the previous model are assigned higher weights so that the next model in the sequence focuses more on correcting them.
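A simplified, single-round sketch of the weight update described above, following the classic discrete AdaBoost formulation; the toy labels and predictions are assumptions for illustration.

```python
# One round of AdaBoost-style weight updating (a simplified sketch, not a full
# implementation): misclassified points get larger weights for the next learner.
import numpy as np

y_true = np.array([ 1, -1,  1,  1, -1])     # illustrative labels in {-1, +1}
y_pred = np.array([ 1,  1,  1, -1, -1])     # predictions of the current weak learner
w = np.full(len(y_true), 1 / len(y_true))   # start with uniform weights

miss = y_pred != y_true
err = np.sum(w[miss])                       # weighted error of the weak learner
alpha = 0.5 * np.log((1 - err) / err)       # learner's weight in the final vote

# Classic update: w_i <- w_i * exp(-alpha * y_i * h(x_i)), then renormalize.
w = w * np.exp(-alpha * y_true * y_pred)
w /= w.sum()

print("Updated weights:", w)                # misclassified points now weigh more
```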
24. Which of the following is a key difference between Bagging and Boosting?
A. Bagging trains models in parallel; Boosting trains models sequentially.
Correct Answer: Bagging trains models in parallel; Boosting trains models sequentially.
Explanation: Bagging models are independent and can be trained simultaneously (in parallel). Boosting models depend on the previous model's output and must be trained in order (sequentially).
25. Gradient Boosting improves the model by minimizing:
A. The weights of the features
B. A loss function using gradient descent
C. The number of trees
D. The variance of the data
Correct Answer: A loss function using gradient descent
Explanation: Gradient Boosting builds trees sequentially, where each new tree fits the negative gradient of the loss function (the pseudo-residuals) with respect to the current ensemble's predictions.
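A simplified sketch of this idea for squared-error loss, where the negative gradient is just the residual; the synthetic data, tree depth, and learning rate are illustrative assumptions.

```python
# Gradient boosting sketch for squared loss: each new tree is fit to the
# residuals, i.e. the negative gradient of the loss.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)   # illustrative data

learning_rate = 0.1
prediction = np.full_like(y, y.mean())     # start from a constant model
trees = []

for _ in range(100):
    residuals = y - prediction             # negative gradient of 0.5*(y - f)^2
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("Training MSE:", np.mean((y - prediction) ** 2))
```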
26. Which algorithm is most likely to overfit if the number of base estimators (iterations) is too large?
A. Bagging
B. Random Forest
C. Boosting
D. Leave-One-Out CV
Correct Answer: Boosting
Explanation: Because Boosting aggressively tries to reduce error by focusing on hard-to-predict examples, running it for too many iterations can lead to overfitting the noise in the training data.
27. Which hyperparameter in Random Forests controls the number of features to consider when looking for the best split?
A. n_estimators
B. max_depth
C. max_features (mtry)
D. min_samples_leaf
Correct Answer: max_features (mtry)
Explanation: The 'max_features' parameter (or 'mtry' in R) determines the size of the random subset of features to inspect at each split point.
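A minimal scikit-learn sketch of this hyperparameter; the synthetic dataset and parameter values are illustrative assumptions.

```python
# max_features sketch: restrict each split to a random subset of features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=16, random_state=0)

# "sqrt" considers roughly sqrt(16) = 4 candidate features at every split;
# this corresponds to mtry in the R randomForest package.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X, y)
print("Training accuracy:", rf.score(X, y))
```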
28. If a model has high bias, which ensemble method is most likely to improve performance?
A. Bagging
B. Boosting
C. Stratified Sampling
D. Pruning
Correct Answer: Boosting
Explanation: Boosting is primarily designed to reduce bias by sequentially correcting errors, making it ideal for converting weak, high-bias learners into strong ones.
29. If a model has high variance, which ensemble method is most likely to improve performance?
A. Bagging
B. Gradient Descent
C. Linear Regression
D. Boosting (without regularization)
Correct Answer: Bagging
Explanation: Bagging reduces variance by averaging multiple models trained on different samples, effectively smoothing out the decision boundary.
30. In K-fold cross-validation, what is the trade-off when increasing K?
A. Bias increases, Variance decreases, Computation time decreases.
B. Bias decreases, Variance increases, Computation time increases.
C. Bias decreases, Variance decreases, Computation time decreases.
D. Bias increases, Variance increases, Computation time increases.
Correct Answer: Bias decreases, Variance increases, Computation time increases.
Explanation: Higher K (approaching LOOCV) means larger training sets (lower bias) but higher overlap between training sets (higher variance) and more models to train (higher computation time).
31. What is the typical base learner used in Random Forests?
A. Linear Regression
B. Support Vector Machines
C. Decision Trees
D. Neural Networks
Correct Answer: Decision Trees
Explanation: Random Forests are an ensemble of Decision Trees.
32. Which of the following is NOT a benefit of Random Forests?
A. Handles high-dimensional data well
B. Provides feature importance estimates
C. Is easily interpretable visually like a single tree
D. Robust to outliers
Correct Answer: Is easily interpretable visually like a single tree
Explanation: While powerful, Random Forests are 'black boxes' compared to single decision trees because it is difficult to visualize or interpret the logic of hundreds of trees combined.
33. When using bootstrap sampling in Bagging, approximately what fraction of the unique observations from the original dataset is included in each sample?
A. 33%
B. 50%
C. 63.2%
D. 100%
Correct Answer: 63.2%
Explanation: As N approaches infinity, the probability that a given observation appears at least once in a bootstrap sample approaches 1 - 1/e, which is approximately 0.632.
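The 63.2% figure is easy to verify with a quick simulation; the sample size and number of repetitions below are arbitrary choices.

```python
# Quick check of the ~63.2% figure by simulation.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
fractions = []
for _ in range(100):
    sample = rng.integers(0, n, size=n)            # bootstrap: sample with replacement
    fractions.append(len(np.unique(sample)) / n)   # fraction of unique observations

print("Simulated fraction:", np.mean(fractions))   # close to 1 - 1/e
print("Theoretical limit: ", 1 - 1 / np.e)         # ~0.632
```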
34. Which Boosting algorithm uses a learning rate parameter to shrink the contribution of each tree?
A. AdaBoost
B. Gradient Boosting
C. Bagging
D. Random Forest
Correct Answer: Gradient Boosting
Explanation: Gradient Boosting often uses a learning rate (shrinkage) to scale the contribution of each new tree, which helps prevent overfitting.
35. In the bias-variance decomposition, if the total error is high and the training error is also high, the model suffers from:
A. High Variance
B. High Bias
C. Overfitting
D. Low Bias
Correct Answer: High Bias
Explanation: High training error indicates the model cannot capture the underlying pattern of the data even on the data it was trained on, which is the definition of high bias (underfitting).
36. Which cross-validation method involves randomly splitting the data into a training set and a test set without distinct 'folds'?
A. K-Fold CV
B. Leave-One-Out CV
C. Holdout Method
D. Bootstrap
Correct Answer: Holdout Method
Explanation: The Holdout method simply splits the data into two parts (train and test) once. It is not an iterative technique like K-fold.
37. What is 'Stacking' in the context of model performance?
A. Adding more features to the data
B. Combining predictions from multiple different models using a meta-model
C. Running Cross-Validation multiple times
D. Using a single Deep Neural Network
Correct Answer: Combining predictions from multiple different models using a meta-model
Explanation: Stacking involves training a new model (meta-learner) to combine the predictions of several base models to improve performance.
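A minimal stacking sketch using scikit-learn's StackingClassifier; the choice of base models, meta-model, and synthetic data is an illustrative assumption.

```python
# Stacking sketch: a logistic-regression meta-learner combines two base models.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),   # the meta-model
    cv=5,                                    # base predictions come from internal CV
)
stack.fit(X, y)
print("Training accuracy:", stack.score(X, y))
```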
38. In Random Forests, increasing the number of trees (n_estimators) typically:
A. Increases overfitting significantly
B. Stabilizes the error but increases training time
C. Decreases bias significantly
D. Decreases the computational cost
Correct Answer: Stabilizes the error but increases training time
Explanation: Unlike Boosting, adding more trees to a Random Forest does not usually cause overfitting; the error rate stabilizes, but computational cost increases.
39. Which of the following describes the 'Stump' often used in AdaBoost?
A. A tree with full depth
B. A tree with only one split (depth = 1)
C. A random forest with 10 trees
D. A linear regression model
Correct Answer: A tree with only one split (depth = 1)
Explanation: AdaBoost often uses 'decision stumps', which are decision trees with a single split, acting as very weak learners.
40. XGBoost is a popular implementation of which algorithm?
A. Random Forest
B. Gradient Boosting
C. K-Nearest Neighbors
D. Support Vector Machine
Correct Answer: Gradient Boosting
Explanation: XGBoost stands for Extreme Gradient Boosting, which is an optimized distributed gradient boosting library.
41. In K-fold Cross-Validation, the final performance metric is usually calculated by:
A. Taking the best score among the K folds
B. Taking the worst score among the K folds
C. Averaging the scores of the K folds
D. Summing the scores of the K folds
Correct Answer: Averaging the scores of the K folds
Explanation: To get a robust estimate of model performance, the accuracy (or error) from each of the K folds is averaged.
42. What is the primary motivation for using Cross-Validation over a simple Train/Test split?
A. It is faster.
B. It uses less data.
C. It provides a less biased estimate of model performance on unseen data.
D. It automatically tunes hyperparameters.
Correct Answer: It provides a less biased estimate of model performance on unseen data.
Explanation: A simple split depends heavily on which data points end up in the test set. CV evaluates the model on all data points, providing a more reliable estimate.
43. In the context of bias-variance, a very deep Decision Tree without pruning usually exhibits:
A. High Bias, Low Variance
B. Low Bias, High Variance
C. Low Bias, Low Variance
D. High Bias, High Variance
Correct Answer: Low Bias, High Variance
Explanation: A deep tree can perfectly memorize the training data (low bias) but will likely fail to generalize to new data due to capturing noise (high variance).
44. Why does Random Forest usually perform better than Bagging with Decision Trees?
A. It uses more trees.
B. It decorrelates the trees by restricting feature selection.
C. It uses a different loss function.
D. It does not use bootstrap sampling.
Correct Answer: It decorrelates the trees by restricting feature selection.
Explanation: In standard Bagging, if one feature is very strong, all trees will look similar. Random Forest forces trees to use different features, reducing correlation and improving variance reduction.
45. The process of tuning hyperparameters using Cross-Validation is often called:
A. Grid Search
B. Backpropagation
C. Forward Selection
D. Bagging
Correct Answer: Grid Search
Explanation: Grid Search CV involves exhaustively generating candidates from a grid of parameter values and evaluating them using cross-validation.
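A minimal GridSearchCV sketch; the estimator, parameter grid, and synthetic data are illustrative assumptions.

```python
# Grid search + cross-validation sketch over two Random Forest hyperparameters.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

param_grid = {
    "n_estimators": [100, 300],
    "max_features": ["sqrt", None],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", search.best_score_)
```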
46. When N is small (small dataset), which Cross-Validation method is preferred to maximize the data used for training?
A. Holdout (50/50 split)
B. 2-Fold CV
C. Leave-One-Out CV
D. Bootstrap
Correct Answer: Leave-One-Out CV
Explanation: LOOCV uses N-1 samples for training in every iteration, maximizing the training data, which is crucial when the dataset is very small.
47. Which technique allows for parallel processing during training?
A. Gradient Boosting
B. AdaBoost
C. Random Forest
D. Recurrent Neural Networks
Correct Answer: Random Forest
Explanation: Since the trees in a Random Forest are built independently of one another, they can be trained in parallel on different CPU cores.
48. What is the 'Learning Rate' in Boosting?
A. The speed at which the computer processes data
B. A parameter scaling the contribution of each tree to the final prediction
C. The percentage of data used for training
D. The depth of the trees
Correct Answer: A parameter scaling the contribution of each tree to the final prediction
Explanation: The learning rate shrinks the contribution of each new tree. Lower learning rates generally require more trees but lead to better generalization.
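A small sketch of the learning-rate/number-of-trees interaction using scikit-learn's GradientBoostingRegressor; the specific values compared and the synthetic data are illustrative assumptions, not tuning recommendations.

```python
# Learning-rate sketch: a smaller learning rate shrinks each tree's contribution,
# so more trees are typically needed.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

for lr, n_trees in [(1.0, 100), (0.1, 1000)]:
    model = GradientBoostingRegressor(learning_rate=lr, n_estimators=n_trees,
                                      random_state=0)
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"learning_rate={lr}, n_estimators={n_trees}: CV R^2 = {score:.3f}")
```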
49. Which of the following is true regarding the bias-variance trade-off in K-Nearest Neighbors (KNN)?
A. Large K results in High Variance.
B. Small K results in High Bias.
C. Small K results in Low Bias and High Variance.
D. K does not affect Bias or Variance.
Correct Answer: Small K results in Low Bias and High Variance.
Explanation: A small K means the model reacts to local noise (High Variance) but fits the training points closely (Low Bias). A large K smooths the boundary (High Bias, Low Variance).
50. If your training error is 1% and your test error is 20%, your model is likely:
A. Underfitting
B. Overfitting
C. Perfectly balanced
D. Experiencing high bias
Correct Answer: Overfitting
Explanation: A large gap between training performance (good) and test performance (bad) is the classic signature of overfitting (high variance).