Unit 5 - Practice Quiz

CSE274 50 Questions

1 What is the primary motivation behind using Ensemble Learning methods compared to a single model?

A. To increase the computational speed of training
B. To improve the predictive performance by combining the strengths of multiple models
C. To reduce the number of features required for training
D. To simplify the interpretability of the final model

2 In the context of ensemble learning, what is the Bias-Variance Trade-off implication for Bagging?

A. Bagging primarily reduces bias while variance remains high
B. Bagging primarily reduces variance while bias remains unchanged
C. Bagging increases both bias and variance
D. Bagging reduces bias but significantly increases variance

3 Which of the following describes Hard Voting in a classification ensemble?

A. Averaging the predicted probabilities of all classifiers
B. Weighting the votes based on the confidence of the classifier
C. Selecting the class that receives the majority of votes from the individual classifiers
D. Using a meta-classifier to learn from the predictions of base classifiers
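A minimal scikit-learn sketch of hard voting; the toy dataset and the three base estimators are illustrative choices, not part of the quiz:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# voting="hard": each classifier casts one vote; the majority class wins.
hard_vote = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("nb", GaussianNB()),
                ("dt", DecisionTreeClassifier(random_state=0))],
    voting="hard",
)
hard_vote.fit(X, y)
print(hard_vote.predict(X[:5]))
```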

4 What is Bootstrapping in the context of Bagging?

A. Sampling data subsets without replacement
B. Sampling data subsets with replacement
C. Sampling features only without touching rows
D. Training models sequentially on residuals
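A small NumPy sketch of a bootstrap sample, with the seed and array contents chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.arange(10)

# Bootstrap sample: same size as the original, drawn WITH replacement,
# so some rows repeat and others are never drawn ("out-of-bag").
boot = rng.choice(data, size=len(data), replace=True)
oob = np.setdiff1d(data, boot)  # rows absent from this bootstrap sample

print(boot, oob)
```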

5 In a Random Forest, how does the algorithm introduce randomness beyond simple Bagging?

A. By using different loss functions for each tree
B. By selecting a random subset of features to consider for each split
C. By randomly pruning the trees after construction
D. By assigning random weights to data points

6 What is the Out-of-Bag (OOB) Error in Random Forests?

A. The error calculated on a separate validation set
B. The training error averaged across all trees
C. The error calculated using data points that were not included in the bootstrap sample for a specific tree
D. The error resulting from missing values in the dataset
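The OOB error can be obtained directly in scikit-learn by setting `oob_score=True`; the dataset here is an illustrative toy problem:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=0)

# oob_score_ is the accuracy measured on the rows each tree did NOT see
# in its own bootstrap sample -- a built-in validation estimate.
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
rf.fit(X, y)
print(rf.oob_score_)
```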

7 Which algorithm is best described as an ensemble method that trains predictors sequentially, where each new predictor tries to correct the errors of its predecessor?

A. Bagging
B. Random Forest
C. Boosting
D. Stacking

8 In AdaBoost (Adaptive Boosting), how are the weights of misclassified instances updated?

A. Their weights are decreased
B. Their weights represent the average of neighbors
C. Their weights are increased so the next classifier focuses on them
D. Their weights remain constant throughout the process

9 What is the typical Base Estimator used in standard AdaBoost?

A. Deep Decision Trees
B. Decision Stumps (trees with depth 1)
C. Linear Regression models
D. Support Vector Machines with RBF kernels

10 Which loss function does AdaBoost essentially minimize?

A. Mean Squared Error (MSE)
B. Hinge Loss
C. Exponential Loss (L(y, f(x)) = e^(-y·f(x)))
D. Log Loss

11 How does Gradient Boosting differ from AdaBoost?

A. Gradient Boosting uses parallel processing, AdaBoost is sequential
B. Gradient Boosting fits new models to the residuals (negative gradients) of the previous model
C. Gradient Boosting only works for regression, AdaBoost only for classification
D. Gradient Boosting cannot use decision trees

12 In Gradient Boosting, what is the role of the Learning Rate (shrinkage)?

A. It determines the maximum depth of the trees
B. It scales the contribution of each new tree to the final prediction
C. It sets the number of cross-validation folds
D. It controls the ratio of features sampled
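A hand-rolled sketch of gradient boosting on squared error, showing both ideas from the two questions above: each tree fits the residuals of the current prediction, and the learning rate shrinks each tree's contribution. The data and hyperparameter values are arbitrary illustrations:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1          # shrinkage: scales each new tree's contribution
pred = np.zeros_like(y)      # start from a zero prediction
for _ in range(100):
    residual = y - pred                        # negative gradient of squared error
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learning_rate * tree.predict(X)    # shrunken additive update

print(np.mean((y - pred) ** 2))
```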

13 Which regularization technique is native to XGBoost but not standard Gradient Boosting (sklearn implementation)?

A. L1 and L2 regularization on leaf weights
B. Tree pruning based on max depth
C. Minimum samples per leaf
D. Bootstrap sampling

14 How does LightGBM typically grow its trees, differing from XGBoost's level-wise approach?

A. Depth-wise (column-wise)
B. Leaf-wise (best-first)
C. Breadth-first
D. Random growth

15 What is the primary feature of CatBoost that distinguishes it from other boosting libraries?

A. It only works on CPUs
B. It handles categorical features automatically using Ordered Target Statistics
C. It uses neural networks as base learners
D. It requires one-hot encoding for all inputs

16 In Ensemble Regression, if you have predictions from N base regressor models, what is the simplest way to combine them?

A. Majority voting
B. Calculating the standard deviation
C. Simple Averaging (ŷ = (1/N) · Σᵢ ŷᵢ)
D. Selecting the prediction with the highest value

17 When building an ensemble pipeline, why is it crucial to perform data preprocessing (e.g., scaling) inside the cross-validation loop?

A. To save memory
B. To prevent Data Leakage
C. To speed up the training process
D. To ensure the scaler fits to the test set
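Putting the scaler inside a scikit-learn Pipeline is the standard way to achieve this; a minimal sketch with an illustrative dataset and classifier:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)

# Because scaling lives inside the pipeline, each CV fold fits the scaler
# on that fold's training portion only -- nothing leaks from the held-out
# portion into preprocessing.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```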

18 What is Grid Search in the context of hyperparameter tuning?

A. An optimization algorithm using derivatives
B. A technique that tries every combination of a preset list of values for hyperparameters
C. A method of randomly selecting hyperparameters
D. A manual process of guessing parameters

19 What is the main advantage of Random Search over Grid Search?

A. It guarantees finding the global minimum
B. It is computationally more expensive
C. It is often more efficient because not all hyperparameters are equally important
D. It checks every possible combination
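Both searches are available in scikit-learn; a side-by-side sketch with an illustrative parameter space and dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=200, random_state=0)
params = {"n_estimators": [50, 100, 150, 200], "max_depth": [2, 4, 6, 8]}

# Grid Search tries every combination (4 x 4 = 16 candidates per CV round).
grid = GridSearchCV(RandomForestClassifier(random_state=0), params, cv=3)

# Random Search draws a fixed budget of candidates (4 here) from the same
# space, which often covers the important hyperparameters more cheaply.
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), params,
                          n_iter=4, cv=3, random_state=0)
grid.fit(X, y)
rand.fit(X, y)
print(grid.best_params_, rand.best_params_)
```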

20 Which method uses a probabilistic model (often a Gaussian Process) to model the objective function and decide which hyperparameters to evaluate next?

A. Grid Search
B. Random Search
C. Bayesian Optimization
D. Gradient Descent

21 In K-Fold Cross-Validation, if , what percentage of the data is used for validation in each iteration?

A. 10%
B. 20%
C. 25%
D. 50%

22 Which cross-validation strategy is recommended for imbalanced classification datasets?

A. Standard K-Fold
B. Leave-One-Out CV
C. Stratified K-Fold
D. TimeSeriesSplit
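A quick check that StratifiedKFold preserves the class ratio in every fold, using a deliberately imbalanced toy label vector:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Imbalanced labels: 90 of class 0, 10 of class 1.
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((100, 1))

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
# Each 20-sample validation fold keeps the 9:1 ratio (18 vs 2).
counts = [np.bincount(y[val_idx]) for _, val_idx in skf.split(X, y)]
print(counts)
```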

23 What is the relevance of Condorcet's Jury Theorem to Ensemble Learning?

A. It states that adding more weak learners always increases variance
B. It suggests that if individual classifiers are slightly better than random guessing and make independent errors, the accuracy of the majority vote approaches 100% as the number of voters increases
C. It proves that Neural Networks are superior to Decision Trees
D. It defines the stopping criteria for Boosting

24 In Stacking (Stacked Generalization), what is the 'Meta-Learner' trained on?

A. The original raw features
B. The residuals of the base models
C. The predictions (outputs) of the base models
D. Random noise
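Scikit-learn's StackingClassifier makes the arrangement explicit: the meta-learner (`final_estimator`) is trained on cross-validated predictions of the base models. Dataset and estimator choices below are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# final_estimator sees the base models' out-of-fold predictions,
# not the raw features.
stack = StackingClassifier(
    estimators=[("dt", DecisionTreeClassifier(random_state=0)),
                ("nb", GaussianNB())],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X, y)
print(stack.score(X, y))
```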

25 When using XGBoost, what does the parameter colsample_bytree control?

A. The fraction of rows to subsample
B. The learning rate
C. The fraction of columns (features) to be randomly sampled for each tree
D. The maximum depth of the tree

26 Which of the following is a disadvantage of Random Forests compared to a single Decision Tree?

A. Lower accuracy
B. Higher risk of overfitting
C. Lack of model interpretability/visualizability
D. Inability to handle categorical data

27 In the context of Pipelines, what is the purpose of the fit_transform() method?

A. It trains the model and makes predictions simultaneously
B. It fits the transformer to the data and then returns the transformed version of the data
C. It is used only for the final estimator in the pipeline
D. It transforms the data without learning any parameters

28 What is Nested Cross-Validation used for?

A. To tune hyperparameters only
B. To estimate the generalization error of the model while performing hyperparameter tuning, preventing bias
C. To visualize the decision boundary
D. To handle missing values in time series

29 What technique does LightGBM use to bundle mutually exclusive features to reduce dimensionality?

A. Gradient-based One-Side Sampling (GOSS)
B. Exclusive Feature Bundling (EFB)
C. Principal Component Analysis (PCA)
D. Feature hashing

30 In Bayesian Optimization, what is an Acquisition Function?

A. The function that calculates the training error
B. A function that guides the search by determining which point to evaluate next based on the surrogate model
C. The actual cost function of the machine learning model
D. A function to acquire data from the database

31 Which ensemble method is generally considered the fastest to train on large datasets among the following?

A. Standard Gradient Boosting (sklearn)
B. XGBoost (exact greedy algorithm)
C. LightGBM
D. Random Forest with 10000 trees

32 What is the effect of increasing n_estimators (number of trees) in Random Forest?

A. It causes severe overfitting
B. It increases the variance of the model
C. It stabilizes the error rate but increases computational cost
D. It reduces the bias significantly

33 What is the effect of increasing n_estimators in Boosting without adjusting the learning rate?

A. It always improves accuracy
B. It leads to overfitting
C. It decreases model complexity
D. It has no effect

34 What does GOSS stand for in LightGBM?

A. Global Optimization Search Strategy
B. Gradient-based One-Side Sampling
C. Generalized Ordered Subset Selection
D. Gaussian Over-Sampling Strategy

35 If an ensemble model uses Weighted Voting, how is the final class determined?

A. ŷ = argmax_c Σᵢ wᵢ · 1(hᵢ(x) = c), i.e., the class with the largest total vote weight
B. ŷ = argmax_c Σᵢ 1(hᵢ(x) = c), i.e., an unweighted majority vote
C. ŷ = (1/N) · Σᵢ hᵢ(x), i.e., the unweighted average of the raw outputs
D. Random selection

36 Which evaluation metric is most appropriate for a regression ensemble model predicting house prices?

A. Accuracy
B. F1-Score
C. Root Mean Squared Error (RMSE)
D. ROC-AUC

37 In Extremely Randomized Trees (ExtraTrees), how are splits chosen compared to Random Forest?

A. They calculate the optimal split for every feature
B. They select cut-points completely randomly for each feature and pick the best among them
C. They use the entire dataset instead of bootstrapping
D. They use Gradient Descent to find splits

38 What is Early Stopping in the context of training Gradient Boosting models?

A. Stopping training when the training error reaches zero
B. Stopping training when the validation score stops improving for a specified number of rounds
C. Stopping training after the first tree is built
D. Stopping training when CPU usage is too high
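In scikit-learn's GradientBoostingClassifier, early stopping is controlled by `n_iter_no_change` and `validation_fraction`; the dataset and values here are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)

# validation_fraction is held out internally; training stops once the
# validation score fails to improve for n_iter_no_change rounds.
gb = GradientBoostingClassifier(n_estimators=500, n_iter_no_change=10,
                                validation_fraction=0.2, random_state=0)
gb.fit(X, y)
print(gb.n_estimators_)  # number of trees actually fitted
```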

39 When performing Hyperparameter Tuning with sklearn, how do you access a parameter of a specific step in a Pipeline (e.g., n_estimators of a step named rf)?

A. rf.n_estimators
B. rf_n_estimators
C. rf__n_estimators (double underscore)
D. rf->n_estimators
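The double-underscore convention in action, using an illustrative pipeline whose step names (`scale`, `rf`) are arbitrary:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=150, random_state=0)
pipe = Pipeline([("scale", StandardScaler()),
                 ("rf", RandomForestClassifier(random_state=0))])

# <step name>__<param name> reaches into the named step of the pipeline.
search = GridSearchCV(pipe, {"rf__n_estimators": [10, 50]}, cv=3)
search.fit(X, y)
print(search.best_params_)
```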

40 Which of the following describes a Heterogeneous Ensemble?

A. An ensemble where all base learners are Decision Trees
B. An ensemble combining different types of algorithms (e.g., SVM + Naive Bayes + Decision Tree)
C. An ensemble used for regression only
D. An ensemble trained on different hardware

41 Why is Cross-Validation preferred over a single Train/Test split for model evaluation?

A. It is faster
B. It provides a more reliable estimate of model performance by reducing the variance associated with the data split
C. It eliminates the need for a test set completely
D. It automatically tunes hyperparameters

42 In CatBoost, what is the concept of Symmetric Trees?

A. Trees where the left child is always deeper than the right
B. Trees where the same split condition is applied to all nodes at the same depth
C. Trees that are mirror images of each other
D. Trees with only two leaves

43 What is the main drawback of Grid Search when the dimensionality of the hyperparameter space is high?

A. It is too inaccurate
B. It suffers from the Curse of Dimensionality and becomes computationally infeasible
C. It cannot handle categorical parameters
D. It requires GPU acceleration

44 Which method involves training a model on the full dataset and testing on a single observation, repeated for every observation?

A. K-Fold CV
B. Leave-One-Out Cross-Validation (LOOCV)
C. Stratified CV
D. Hold-out method
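LOOCV produces one split per observation, as a quick scikit-learn sketch on an arbitrary 10-row array shows:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

X = np.arange(10).reshape(-1, 1)

# One split per observation: train on n-1 rows, validate on the remaining one.
splits = list(LeaveOneOut().split(X))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```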

45 In the context of XGBoost, what is the purpose of the Gamma (γ) parameter?

A. It is the learning rate
B. It is the minimum loss reduction required to make a further partition on a leaf node
C. It is the maximum depth
D. It is the subsample ratio

46 When using TimeSeriesSplit for cross-validation, how are the training and validation sets created?

A. Randomly shuffling time points
B. Successive training sets are supersets of those that come before them, preserving temporal order
C. Standard K-Fold split
D. Using future data to predict past data
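A small sketch of scikit-learn's TimeSeriesSplit on an arbitrary 12-point series, showing the expanding, order-preserving training sets:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)
splits = list(TimeSeriesSplit(n_splits=3).split(X))
for train_idx, val_idx in splits:
    # Training indices always precede validation indices, and each training
    # set is a superset of the previous one.
    print(train_idx, val_idx)
```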

47 Which of the following is an example of Automated Machine Learning (AutoML) capabilities regarding ensembles?

A. Manually tuning a Decision Tree
B. Automatically selecting, tuning, and stacking multiple models to form an ensemble
C. Writing a loop for Grid Search
D. Calculating the mean of a column

48 In Soft Voting, if Classifier A predicts [0.9, 0.1] and Classifier B predicts [0.6, 0.4] for classes [0, 1], what is the averaged probability for Class 0?

A. 0.9
B. 0.75
C. 0.6
D. 1.5
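The arithmetic behind this question, worked out in NumPy:

```python
import numpy as np

# Soft voting averages the predicted class probabilities.
p_a = np.array([0.9, 0.1])  # Classifier A, classes [0, 1]
p_b = np.array([0.6, 0.4])  # Classifier B
avg = (p_a + p_b) / 2
print(avg)  # class 0 gets (0.9 + 0.6) / 2 = 0.75
```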

49 Why might one choose Random Search or Bayesian Optimization over manual tuning?

A. Manual tuning is always superior due to human intuition
B. To remove human bias and systematically explore the hyperparameter space more efficiently
C. Because manual tuning is not supported by Python libraries
D. To increase the bias of the model

50 What is the primary difference between Bagging and Pasting?

A. Bagging uses decision trees, Pasting uses SVMs
B. Bagging samples with replacement, Pasting samples without replacement
C. Bagging is for regression, Pasting is for classification
D. Pasting allows parallel processing, Bagging does not
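In scikit-learn, the same BaggingClassifier covers both schemes via its `bootstrap` flag; the dataset and estimator counts below are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# bootstrap=True  -> Bagging (sampling WITH replacement)
# bootstrap=False -> Pasting (sampling WITHOUT replacement)
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                            bootstrap=True, random_state=0).fit(X, y)
pasting = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                            max_samples=0.8, bootstrap=False,
                            random_state=0).fit(X, y)
print(bagging.score(X, y), pasting.score(X, y))
```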