1. Which component of the prediction error results from the model's assumptions being too simple to capture the underlying structure of the data?
A. Variance
B. Bias
C. Irreducible Error
D. Noise
Correct Answer: Bias
Explanation: Bias refers to the error introduced by approximating a real-world problem, which may be extremely complicated, by a much simpler model (e.g., linear regression). High bias often causes underfitting.
2. A model that captures random noise in the training data rather than the intended outputs is said to have:
A. High Bias
B. Low Variance
C. High Variance
D. High Bias and High Variance
Correct Answer: High Variance
Explanation: High variance means the model is highly sensitive to small fluctuations or noise in the training set, often leading to overfitting.
3. What is the relationship between model complexity and the bias-variance trade-off?
A. As complexity increases, bias increases and variance decreases.
B. As complexity increases, bias decreases and variance increases.
C. As complexity increases, both bias and variance decrease.
D. As complexity increases, both bias and variance increase.
Correct Answer: As complexity increases, bias decreases and variance increases.
Explanation: Simple models (low complexity) have high bias and low variance. Complex models (high complexity) fit the training data very well (low bias) but generalize poorly (high variance).
4. Which of the following describes 'Underfitting' in the context of the bias-variance trade-off?
A. Low Bias, High Variance
B. High Bias, Low Variance
C. Low Bias, Low Variance
D. High Bias, High Variance
Correct Answer: High Bias, Low Variance
Explanation: Underfitting occurs when a model is too simple to capture the data's pattern, resulting in high bias but stable results (low variance) across different training sets.
5. The mathematical decomposition of a model's total error consists of:
Correct Answer: Squared Bias + Variance + Irreducible Error
Explanation: The expected test mean squared error (MSE) can be decomposed into the squared bias of the estimator, the variance of the estimator, and the irreducible error.
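For reference, the decomposition in the explanation above can be written out explicitly. Using standard (assumed) notation, where f is the true function, f-hat the fitted model, and sigma^2 the noise variance, the expected test MSE at a point x_0 is:

```latex
\mathbb{E}\!\left[\left(y_0 - \hat{f}(x_0)\right)^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x_0)] - f(x_0)\right)^2}_{\text{Bias}^2}
  + \underbrace{\operatorname{Var}\!\left[\hat{f}(x_0)\right]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible error}}
```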
6. Which of the following errors cannot be reduced regardless of how good the model is?
A. Bias Error
B. Variance Error
C. Irreducible Error
D. Systematic Error
Correct Answer: Irreducible Error
Explanation: Irreducible error is the error associated with noise in the underlying system itself and cannot be eliminated by modeling.
7. What is the primary purpose of Cross-Validation?
A. To increase the size of the dataset
B. To assess how the results of a statistical analysis will generalize to an independent data set
C. To eliminate outliers in the data
D. To reduce the dimensionality of the data
Correct Answer: To assess how the results of a statistical analysis will generalize to an independent data set
Explanation: Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample to estimate the model's performance on unseen data.
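As a concrete illustration of the idea above, here is a minimal scikit-learn sketch; the dataset and classifier (iris, logistic regression) are illustrative assumptions, not part of the quiz.

```python
# Minimal cross-validation sketch using scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: each fold is held out once as a test set.
scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores)
print("Estimated generalization accuracy:", scores.mean())
```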
8. In K-fold cross-validation, if K equals the number of observations in the dataset (N), the method is known as:
Correct Answer: Leave-One-Out Cross-Validation (LOOCV)
Explanation: LOOCV is the special case of K-fold cross-validation in which K equals the total number of data points, so each fold consists of a single observation.
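A minimal LOOCV sketch, assuming scikit-learn and the same toy dataset as above:

```python
# LOOCV sketch: K equals the number of observations, so each "fold" is one sample.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

loo = LeaveOneOut()                           # equivalent to KFold(n_splits=len(X))
scores = cross_val_score(model, X, y, cv=loo)
print("Number of model fits:", len(scores))   # equals N
print("LOOCV accuracy estimate:", scores.mean())
```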
9. Which of the following is a major disadvantage of Leave-One-Out Cross-Validation (LOOCV) compared to K-fold cross-validation (where K=5 or 10)?
A. It has higher bias.
B. It is computationally expensive.
C. It wastes too much training data.
D. It is less accurate.
Correct Answer: It is computationally expensive.
Explanation: LOOCV requires fitting the model N times (where N is the number of observations), which can be extremely computationally expensive for large datasets.
10. In 5-fold cross-validation, what percentage of the data is used for testing in each iteration?
A. 10%
B. 20%
C. 25%
D. 50%
Correct Answer: 20%
Explanation: In K-fold cross-validation, the data is split into K parts. For 5-fold, 1/5th (or 20%) of the data is used for testing in each iteration.
11. Compared to LOOCV, 10-fold cross-validation typically has:
A. Lower bias and higher variance
B. Higher bias and lower variance
C. Higher bias and higher variance
D. Lower bias and lower variance
Correct Answer: Higher bias and lower variance
Explanation: LOOCV has very low bias (it uses nearly all the data) but high variance (its training sets are almost identical). 10-fold introduces slight bias (less training data) but significantly reduces variance.
12. What is 'Stratified' K-Fold Cross-Validation useful for?
A. Regression problems with continuous targets
B. Time-series data
C. Datasets with imbalanced class distributions
D. Reducing computational time
Correct Answer: Datasets with imbalanced class distributions
Explanation: Stratified K-Fold ensures that each fold of the dataset has the same proportion of observations with a given label as the whole dataset, which is crucial for imbalanced classes.
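A small sketch of how stratification preserves class proportions; the toy label vector with a 90/10 imbalance is an assumption for illustration.

```python
# Stratified K-Fold sketch: each fold preserves the overall class proportions.
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Illustrative imbalanced labels: 90 of class 0, 10 of class 1.
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100).reshape(-1, 1)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # Each test fold keeps roughly the 90/10 ratio of the full dataset.
    print(f"Fold {fold}: positives in test fold = {y[test_idx].sum()} of {len(test_idx)}")
```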
13. What does 'Bagging' stand for?
A. Bootstrap Aggregating
B. Binary Aggregating
C. Backward Aggregating
D. Boosted Aggregating
Correct Answer: Bootstrap Aggregating
Explanation: Bagging is an acronym for Bootstrap Aggregating, an ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms.
14. How does Bagging create different training sets?
A. By splitting the data into K distinct folds
B. By sampling with replacement from the original dataset
C. By sampling without replacement from the original dataset
D. By selecting only the most difficult instances
Correct Answer: By sampling with replacement from the original dataset
Explanation: Bagging generates multiple versions of a predictor and uses these to get an aggregated predictor. The versions are obtained by training on bootstrap replicates (sampling with replacement).
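The bootstrap step can be illustrated in a few lines of NumPy; the tiny 10-point dataset here is purely an illustrative assumption.

```python
# Bootstrap sampling sketch: draw N indices with replacement from an N-row dataset.
import numpy as np

rng = np.random.default_rng(0)
n = 10                                   # illustrative dataset size
X = np.arange(n)                         # stand-in for the original observations

bootstrap_idx = rng.choice(n, size=n, replace=True)   # sampling WITH replacement
bootstrap_sample = X[bootstrap_idx]

print("Bootstrap sample:", bootstrap_sample)           # duplicates are expected
print("Left out (out-of-bag):", np.setdiff1d(X, bootstrap_sample))
```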
15. Bagging is particularly effective at reducing which component of error?
A. Bias
B. Variance
C. Noise
D. Computation time
Correct Answer: Variance
Explanation: Bagging averages the predictions of multiple models, which helps to smooth out the noise and reduce variance, preventing overfitting.
16. In Bagging, how is the final prediction made for a regression problem?
A. Majority Voting
B. Weighted Voting
C. Averaging
D. Selecting the single best model
Correct Answer: Averaging
Explanation: For regression problems, the outputs of the individual models in the bagging ensemble are averaged to obtain the final prediction.
17. What is the 'Out-of-Bag' (OOB) error in Bagging?
A. The error on the training set
B. The error calculated using data not included in the bootstrap sample
C. The error calculated using an external validation set
D. The error due to missing values
Correct Answer: The error calculated using data not included in the bootstrap sample
Explanation: Since bootstrapping involves sampling with replacement, some data points are left out of each base model's training set. These 'out-of-bag' observations are used to estimate the prediction error.
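A minimal sketch of the OOB estimate, using scikit-learn's RandomForestClassifier (a bagged-tree ensemble) with oob_score=True; the synthetic dataset is an assumption for illustration.

```python
# OOB error sketch: scikit-learn estimates it automatically when oob_score=True.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)

# Each observation is scored only by trees whose bootstrap sample excluded it.
print("OOB accuracy estimate:", rf.oob_score_)
```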
18. Which ensemble method builds models sequentially, where each new model attempts to correct the errors of the previous one?
A. Bagging
B. Random Forests
C. Boosting
D. Cross-Validation
Correct Answer: Boosting
Explanation: Boosting is an ensemble technique that attempts to create a strong classifier from a number of weak classifiers by training them sequentially to correct prior errors.
19. Random Forest is an extension of which technique?
A. Boosting
B. Bagging
C. Linear Regression
D. K-Means Clustering
Correct Answer: Bagging
Explanation: Random Forest is a modification of Bagging (Bootstrap Aggregating) that introduces random feature selection to de-correlate the trees.
20. In Random Forests, how are features selected for splitting a node?
A. All features are considered at every split.
B. A random subset of features is considered at every split.
C. The single best feature from the entire dataset is always chosen.
D. Features are selected based on user preference.
Correct Answer: A random subset of features is considered at every split.
Explanation: To ensure the trees are less correlated, Random Forest searches for the best feature among a random subset of features (usually the square root of the total number of features) at each split.
21. Why are Random Forests generally better than a single Decision Tree?
A. They are easier to interpret.
B. They are faster to train.
C. They reduce overfitting and variance.
D. They provide a linear decision boundary.
Correct Answer: They reduce overfitting and variance.
Explanation: Single decision trees are prone to high variance (overfitting). Random Forests average many trees, which neutralizes individual errors and reduces variance.
22. In the context of Boosting, what is a 'weak learner'?
A. A model that performs slightly better than random guessing
B. A model that has 100% accuracy
C. A model with high variance
D. A model with complex architecture
Correct Answer: A model that performs slightly better than random guessing
Explanation: Boosting combines many weak learners (simple models, like shallow decision trees) that perform only slightly better than chance to form a strong learner.
23. How does AdaBoost (Adaptive Boosting) handle misclassified instances?
A. It discards them.
B. It decreases their weights.
C. It increases their weights.
D. It keeps their weights constant.
Correct Answer: It increases their weights.
Explanation: In AdaBoost, instances that are misclassified by the previous model are assigned higher weights so that the next model in the sequence focuses more on correcting them.
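A simplified, single-round sketch of the weight update described above, following the classic discrete AdaBoost formulation; the toy labels and predictions are assumptions for illustration.

```python
# One round of AdaBoost-style weight updating (a simplified sketch, not a full
# implementation): misclassified points get larger weights for the next learner.
import numpy as np

y_true = np.array([ 1, -1,  1,  1, -1])     # illustrative labels in {-1, +1}
y_pred = np.array([ 1,  1,  1, -1, -1])     # predictions of the current weak learner
w = np.full(len(y_true), 1 / len(y_true))   # start with uniform weights

miss = y_pred != y_true
err = np.sum(w[miss])                       # weighted error of the weak learner
alpha = 0.5 * np.log((1 - err) / err)       # learner's weight in the final vote

# Classic update: w_i <- w_i * exp(-alpha * y_i * h(x_i)), then renormalize.
w = w * np.exp(-alpha * y_true * y_pred)
w /= w.sum()

print("Updated weights:", w)                # misclassified points now weigh more
```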
24. Which of the following is a key difference between Bagging and Boosting?
A. Bagging trains models in parallel; Boosting trains models sequentially.
Correct Answer: Bagging trains models in parallel; Boosting trains models sequentially.
Explanation: Bagging models are independent and can be trained simultaneously (in parallel). Boosting models depend on the previous model's output and must be trained in order (sequentially).
25. Gradient Boosting improves the model by minimizing:
A. The weights of the features
B. A loss function using gradient descent
C. The number of trees
D. The variance of the data
Correct Answer: A loss function using gradient descent
Explanation: Gradient Boosting builds trees sequentially, where each new tree fits the negative gradient of the loss function (the pseudo-residuals) with respect to the current ensemble's predictions.
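A simplified sketch of this idea for squared-error loss, where the negative gradient is just the residual; the synthetic data, tree depth, and learning rate are illustrative assumptions.

```python
# Gradient boosting sketch for squared loss: each new tree is fit to the
# residuals, i.e. the negative gradient of the loss.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)   # illustrative data

learning_rate = 0.1
prediction = np.full_like(y, y.mean())     # start from a constant model
trees = []

for _ in range(100):
    residuals = y - prediction             # negative gradient of 0.5*(y - f)^2
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("Training MSE:", np.mean((y - prediction) ** 2))
```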
26. Which algorithm is most likely to overfit if the number of base estimators (iterations) is too large?
A. Bagging
B. Random Forest
C. Boosting
D. Leave-One-Out CV
Correct Answer: Boosting
Explanation: Because Boosting aggressively tries to reduce error by focusing on hard-to-predict examples, running it for too many iterations can lead to overfitting the noise in the training data.
27. Which hyperparameter in Random Forests controls the number of features to consider when looking for the best split?
A. n_estimators
B. max_depth
C. max_features (mtry)
D. min_samples_leaf
Correct Answer: max_features (mtry)
Explanation: The 'max_features' parameter (or 'mtry' in R) determines the size of the random subset of features to inspect at each split point.
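A minimal scikit-learn sketch of this hyperparameter; the synthetic dataset and parameter values are illustrative assumptions.

```python
# max_features sketch: restrict each split to a random subset of features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=16, random_state=0)

# "sqrt" considers roughly sqrt(16) = 4 candidate features at every split;
# this corresponds to mtry in the R randomForest package.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X, y)
print("Training accuracy:", rf.score(X, y))
```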
28. If a model has high bias, which ensemble method is most likely to improve performance?
A. Bagging
B. Boosting
C. Stratified Sampling
D. Pruning
Correct Answer: Boosting
Explanation: Boosting is primarily designed to reduce bias by sequentially correcting errors, making it ideal for converting weak, high-bias learners into strong ones.
29. If a model has high variance, which ensemble method is most likely to improve performance?
A. Bagging
B. Gradient Descent
C. Linear Regression
D. Boosting (without regularization)
Correct Answer: Bagging
Explanation: Bagging reduces variance by averaging multiple models trained on different samples, effectively smoothing out the decision boundary.
30. In K-fold cross-validation, what is the trade-off when increasing K?
A. Bias increases, Variance decreases, Computation time decreases.
B. Bias decreases, Variance increases, Computation time increases.
C. Bias decreases, Variance decreases, Computation time decreases.
D. Bias increases, Variance increases, Computation time increases.
Correct Answer: Bias decreases, Variance increases, Computation time increases.
Explanation: Higher K (approaching LOOCV) means larger training sets (lower bias) but higher overlap between training sets (higher variance) and more models to train (higher computation time).
31. What is the typical base learner used in Random Forests?
A. Linear Regression
B. Support Vector Machines
C. Decision Trees
D. Neural Networks
Correct Answer: Decision Trees
Explanation: Random Forests are an ensemble of Decision Trees.
32. Which of the following is NOT a benefit of Random Forests?
A. Handles high-dimensional data well
B. Provides feature importance estimates
C. Is easily interpretable visually like a single tree
D. Robust to outliers
Correct Answer: Is easily interpretable visually like a single tree
Explanation: While powerful, Random Forests are 'black boxes' compared to single decision trees because it is difficult to visualize or interpret the logic of hundreds of trees combined.
33. When using bootstrap sampling in Bagging, approximately what fraction of the unique observations from the original dataset is included in each sample?
A. 33%
B. 50%
C. 63.2%
D. 100%
Correct Answer: 63.2%
Explanation: As N approaches infinity, the probability that a given observation appears at least once in a bootstrap sample approaches 1 - 1/e, which is approximately 0.632.
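The 63.2% figure is easy to verify with a quick simulation; the sample size and number of repetitions below are arbitrary choices.

```python
# Quick check of the ~63.2% figure by simulation.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
fractions = []
for _ in range(100):
    sample = rng.integers(0, n, size=n)            # bootstrap: sample with replacement
    fractions.append(len(np.unique(sample)) / n)   # fraction of unique observations

print("Simulated fraction:", np.mean(fractions))   # close to 1 - 1/e
print("Theoretical limit: ", 1 - 1 / np.e)         # ~0.632
```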
34. Which Boosting algorithm uses a learning rate parameter to shrink the contribution of each tree?
A. AdaBoost
B. Gradient Boosting
C. Bagging
D. Random Forest
Correct Answer: Gradient Boosting
Explanation: Gradient Boosting often uses a learning rate (shrinkage) to scale the contribution of each new tree, which helps prevent overfitting.
35. In the bias-variance decomposition, if the total error is high and the training error is also high, the model suffers from:
A. High Variance
B. High Bias
C. Overfitting
D. Low Bias
Correct Answer: High Bias
Explanation: High training error indicates the model cannot capture the underlying pattern of the data even on the data it was trained on, which is the definition of high bias (underfitting).
36. Which cross-validation method involves randomly splitting the data into a training set and a test set without distinct 'folds'?
A. K-Fold CV
B. Leave-One-Out CV
C. Holdout Method
D. Bootstrap
Correct Answer: Holdout Method
Explanation: The Holdout method simply splits the data into two parts (train and test) once. It is not an iterative technique like K-fold.
37. What is 'Stacking' in the context of model performance?
A. Adding more features to the data
B. Combining predictions from multiple different models using a meta-model
C. Running Cross-Validation multiple times
D. Using a single Deep Neural Network
Correct Answer: Combining predictions from multiple different models using a meta-model
Explanation: Stacking involves training a new model (meta-learner) to combine the predictions of several base models to improve performance.
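A minimal stacking sketch using scikit-learn's StackingClassifier; the choice of base models, meta-model, and synthetic data is an illustrative assumption.

```python
# Stacking sketch: a logistic-regression meta-learner combines two base models.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),   # the meta-model
    cv=5,                                    # base predictions come from internal CV
)
stack.fit(X, y)
print("Training accuracy:", stack.score(X, y))
```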
38. In Random Forests, increasing the number of trees (n_estimators) typically:
A. Increases overfitting significantly
B. Stabilizes the error but increases training time
C. Decreases bias significantly
D. Decreases the computational cost
Correct Answer: Stabilizes the error but increases training time
Explanation: Unlike Boosting, adding more trees to a Random Forest does not usually cause overfitting; the error rate stabilizes, but computational cost increases.
39. Which of the following describes the 'Stump' often used in AdaBoost?
A. A tree with full depth
B. A tree with only one split (depth = 1)
C. A random forest with 10 trees
D. A linear regression model
Correct Answer: A tree with only one split (depth = 1)
Explanation: AdaBoost often uses 'decision stumps', which are decision trees with a single split, acting as very weak learners.
40. XGBoost is a popular implementation of which algorithm?
A. Random Forest
B. Gradient Boosting
C. K-Nearest Neighbors
D. Support Vector Machine
Correct Answer: Gradient Boosting
Explanation: XGBoost stands for Extreme Gradient Boosting, which is an optimized distributed gradient boosting library.
41. In K-fold Cross-Validation, the final performance metric is usually calculated by:
A. Taking the best score among the K folds
B. Taking the worst score among the K folds
C. Averaging the scores of the K folds
D. Summing the scores of the K folds
Correct Answer: Averaging the scores of the K folds
Explanation: To get a robust estimate of model performance, the accuracy (or error) from each of the K folds is averaged.
42. What is the primary motivation for using Cross-Validation over a simple Train/Test split?
A. It is faster.
B. It uses less data.
C. It provides a less biased estimate of model performance on unseen data.
D. It automatically tunes hyperparameters.
Correct Answer: It provides a less biased estimate of model performance on unseen data.
Explanation: A simple split depends heavily on which data points end up in the test set. CV evaluates the model on all data points, providing a more reliable estimate.
43. In the context of bias-variance, a very deep Decision Tree without pruning usually exhibits:
A. High Bias, Low Variance
B. Low Bias, High Variance
C. Low Bias, Low Variance
D. High Bias, High Variance
Correct Answer: Low Bias, High Variance
Explanation: A deep tree can perfectly memorize the training data (low bias) but will likely fail to generalize to new data due to capturing noise (high variance).
44. Why does Random Forest usually perform better than Bagging with Decision Trees?
A. It uses more trees.
B. It decorrelates the trees by restricting feature selection.
C. It uses a different loss function.
D. It does not use bootstrap sampling.
Correct Answer: It decorrelates the trees by restricting feature selection.
Explanation: In standard Bagging, if one feature is very strong, all trees will look similar. Random Forest forces trees to use different features, reducing correlation and improving variance reduction.
45. The process of tuning hyperparameters using Cross-Validation is often called:
A. Grid Search
B. Backpropagation
C. Forward Selection
D. Bagging
Correct Answer: Grid Search
Explanation: Grid Search CV involves exhaustively generating candidates from a grid of parameter values and evaluating them using cross-validation.
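A minimal GridSearchCV sketch; the estimator, parameter grid, and synthetic data are illustrative assumptions.

```python
# Grid search + cross-validation sketch over two Random Forest hyperparameters.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

param_grid = {
    "n_estimators": [100, 300],
    "max_features": ["sqrt", None],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", search.best_score_)
```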
46. When N is small (small dataset), which Cross-Validation method is preferred to maximize the data used for training?
A. Holdout (50/50 split)
B. 2-Fold CV
C. Leave-One-Out CV
D. Bootstrap
Correct Answer: Leave-One-Out CV
Explanation: LOOCV uses N-1 samples for training in every iteration, maximizing the training data, which is crucial when the dataset is very small.
47. Which technique allows for parallel processing during training?
A. Gradient Boosting
B. AdaBoost
C. Random Forest
D. Recurrent Neural Networks
Correct Answer: Random Forest
Explanation: Since the trees in a Random Forest are built independently of one another, they can be trained in parallel on different CPU cores.
48. What is the 'Learning Rate' in Boosting?
A. The speed at which the computer processes data
B. A parameter scaling the contribution of each tree to the final prediction
C. The percentage of data used for training
D. The depth of the trees
Correct Answer: A parameter scaling the contribution of each tree to the final prediction
Explanation: The learning rate shrinks the contribution of each new tree. Lower learning rates generally require more trees but lead to better generalization.
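A small sketch of the learning-rate/number-of-trees interaction using scikit-learn's GradientBoostingRegressor; the specific values compared and the synthetic data are illustrative assumptions, not tuning recommendations.

```python
# Learning-rate sketch: a smaller learning rate shrinks each tree's contribution,
# so more trees are typically needed.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

for lr, n_trees in [(1.0, 100), (0.1, 1000)]:
    model = GradientBoostingRegressor(learning_rate=lr, n_estimators=n_trees,
                                      random_state=0)
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"learning_rate={lr}, n_estimators={n_trees}: CV R^2 = {score:.3f}")
```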
49. Which of the following is true regarding the bias-variance trade-off in K-Nearest Neighbors (KNN)?
A. Large K results in High Variance.
B. Small K results in High Bias.
C. Small K results in Low Bias and High Variance.
D. K does not affect Bias or Variance.
Correct Answer: Small K results in Low Bias and High Variance.
Explanation: A small K means the model reacts to local noise (High Variance) but fits the training points closely (Low Bias). A large K smooths the boundary (High Bias, Low Variance).
50. If your training error is 1% and your test error is 20%, your model is likely:
A. Underfitting
B. Overfitting
C. Perfectly balanced
D. Experiencing high bias
Correct Answer: Overfitting
Explanation: A large gap between training performance (good) and test performance (bad) is the classic signature of overfitting (high variance).