Unit 3 - Practice Quiz

INT394

1 Which of the following statements best describes K-Nearest Neighbours (KNN)?

A. It is an Eager learning algorithm that builds a model during training.
B. It is a probabilistic algorithm based on Bayes' Theorem.
C. It is a Lazy learning algorithm that stores the dataset and performs computation only during prediction.
D. It is a linear regression model used for classification.
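
For reference, option C can be illustrated with a minimal plain-Python sketch of lazy learning; the function and variable names below are illustrative, not taken from any particular library:

import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    # "Training" stored nothing; all computation happens at prediction time.
    distances = [(math.dist(query, x), y) for x, y in zip(train_X, train_y)]
    # Take the k closest stored points and return the majority class among them.
    nearest = sorted(distances)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Illustrative data: two classes in 2-D.
train_X = [(1, 1), (1, 2), (5, 5), (6, 5)]
train_y = ["A", "A", "B", "B"]
print(knn_predict(train_X, train_y, (1.5, 1.5), k=3))  # expected: "A"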

2 In the K-Nearest Neighbours algorithm, what happens when the value of $K$ is very small (e.g., $K = 1$)?

A. The model becomes very simple and has high bias.
B. The decision boundary becomes smooth.
C. The model captures noise in the training data, leading to overfitting.
D. The model will always predict the majority class of the entire dataset.

3 Which distance metric is most commonly used in KNN for continuous variables, defined as $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$?

A. Manhattan Distance
B. Hamming Distance
C. Euclidean Distance
D. Minkowski Distance
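
As a quick check of the definition in this question, here is a short Python sketch of the two most commonly used metrics among the options (function names are illustrative):

def euclidean(x, y):
    # L2 norm: square root of the sum of squared coordinate differences.
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def manhattan(x, y):
    # L1 norm: sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(x, y))

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7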

4 Why is feature scaling (normalization/standardization) important in KNN?

A. To prevent the algorithm from running too slowly.
B. Because KNN is based on distance metrics, and features with larger scales will dominate the distance calculation.
C. To convert all categorical variables into numbers.
D. It is not required for KNN.
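
To see why option B holds, consider a small Python example with two features on very different scales (the numbers are made up purely for illustration):

# Feature 1 on a scale of ~100000 (e.g., income); Feature 2 on a scale of ~10 (e.g., age).
a = (300000, 25)
b = (310000, 60)

raw = ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
print(raw)  # ~10000.06 -- the distance is driven almost entirely by Feature 1

# After scaling both features to [0, 1] (assumed min-max scaled values), Feature 2 matters again.
a_s, b_s = (0.0, 0.0), (0.2, 1.0)
scaled = ((a_s[0] - b_s[0]) ** 2 + (a_s[1] - b_s[1]) ** 2) ** 0.5
print(scaled)  # ~1.02 -- both features now contribute to the distance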

5 What is the primary disadvantage of KNN as the dataset size grows?

A. Training time becomes exponentially high.
B. Prediction time becomes very slow because it scans the entire dataset.
C. It cannot handle multi-class classification.
D. It requires too many hyperparameters.

6 Which of the following is a specific strategy to choose the optimal value of $K$ in KNN?

A. Always choose $K = 1$.
B. Always choose $K$ equal to the square root of the number of features.
C. Use Cross-Validation (e.g., Elbow method) to minimize error.
D. Choose the largest odd number possible.

7 In a Decision Tree, what does a leaf node represent?

A. A decision rule.
B. A test on a specific attribute.
C. A class label or a continuous value.
D. The root of the tree.

8 Decision Trees are considered 'greedy' algorithms. What does this mean?

A. They consume a lot of memory.
B. They make the locally optimal choice at each step with the hope of finding a global optimum.
C. They revisit previous decisions to optimize the structure.
D. They require all features to be used in the tree.

9 Which metric does the ID3 algorithm use to select the best attribute for splitting?

A. Gini Index
B. Information Gain
C. Gain Ratio
D. Chi-Square

10 If a dataset has a completely homogeneous distribution (all examples belong to one class), what is its Entropy?

A. $0$
B. $1$
C. $0.5$
D. Undefined

11 Calculate the Entropy of a binary classification problem where $p_+$ (the positive-class probability) is $0.5$ and $p_-$ (the negative-class probability) is $0.5$.

A. $0$
B. $0.5$
C. $1$
D. $2$
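
Worked check for this question, using the standard base-2 entropy definition: $H = -p_+ \log_2 p_+ - p_- \log_2 p_- = -0.5 \log_2 0.5 - 0.5 \log_2 0.5 = 0.5 + 0.5 = 1$. This is the maximum entropy for a binary problem; by contrast, a perfectly pure node (as in the previous question) gives $-1 \log_2 1 = 0$.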

12 What is a major drawback of using Information Gain (as in ID3)?

A. It is computationally expensive.
B. It cannot handle binary data.
C. It is biased towards attributes with a large number of distinct values.
D. It only works for regression.

13 Which algorithm was introduced to overcome the bias of Information Gain towards attributes with many values?

A. CART
B. ID3
C. C4.5
D. KNN

14 Which impurity measure is used by the CART (Classification and Regression Trees) algorithm?

A. Entropy
B. Gini Index
C. Log-Loss
D. T-test

15 What is the range of the Gini Index for a binary classification problem?

A. $[0, 1]$
B. $[0, 0.5]$
C. $[-1, 1]$
D. $[0, \infty)$

16 How does the CART algorithm handle splits?

A. It creates multi-way splits based on all categories.
B. It produces only binary splits (two child nodes).
C. It splits based on the highest standard deviation.
D. It does not split; it uses clustering.

17 What is the primary purpose of 'Pruning' in Decision Trees?

A. To increase the depth of the tree.
B. To reduce the complexity of the tree and prevent overfitting.
C. To add more features to the dataset.
D. To speed up the training process.

18 Which of the following describes 'Pre-pruning'?

A. Growing the full tree and then removing nodes.
B. Halting the construction of the tree early if goodness measures fall below a threshold.
C. Converting the tree into rules.
D. Using ensemble methods instead of a single tree.

19 Which of the following is a technique used in Post-pruning?

A. Maximum Depth limiting
B. Cost Complexity Pruning (Weakest Link Pruning)
C. Minimum Samples Split
D. Maximum Leaf Nodes

20 Handling missing values in C4.5 involves:

A. Deleting the rows with missing values.
B. Replacing missing values with the global mean.
C. Distributing the instance to all child nodes with weights proportional to the population of the child nodes.
D. Stopping the algorithm.

21 Which of the following scenarios suggests a Decision Tree is overfitting?

A. Low training error, low testing error.
B. High training error, high testing error.
C. Low training error, high testing error.
D. High training error, low testing error.

22 What is the main idea behind Ensemble Learning?

A. To find the single best algorithm for a problem.
B. To combine multiple weak models to create a strong predictive model.
C. To reduce the size of the dataset.
D. To cluster data without supervision.

23 Ensemble methods generally aim to reduce which two sources of error?

A. Computation and Memory
B. Bias and Variance
C. Precision and Recall
D. False Positives and False Negatives

24 Which ensemble method relies on 'Bootstrap Aggregating'?

A. Boosting
B. Bagging
C. Stacking
D. Cascading

25 In Bagging, how are the datasets for the individual models created?

A. By splitting the data into disjoint folds.
B. By sampling with replacement from the original dataset.
C. By sampling without replacement.
D. By selecting only the difficult instances.
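
A minimal Python sketch of sampling with replacement (option B); the indices drawn below are illustrative, and the rows never drawn are exactly the out-of-bag rows asked about in Question 27:

import random

data = list(range(10))  # stand-in for 10 training rows
bootstrap = [random.choice(data) for _ in range(len(data))]  # sampled WITH replacement
out_of_bag = set(data) - set(bootstrap)                      # rows never drawn

print(bootstrap)    # e.g. [3, 3, 7, 0, 9, 1, 7, 2, 2, 5] -- duplicates are allowed
print(out_of_bag)   # e.g. {4, 6, 8} -- roughly a third of the rows on average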

26 Random Forest is a modification of Bagging. What specific feature does it add?

A. It uses neural networks as base learners.
B. It boosts the weight of misclassified samples.
C. It selects a random subset of features for splitting at each node.
D. It performs post-pruning on all trees.

27 What is Out-of-Bag (OOB) error in Random Forests?

A. The error calculated on the validation set.
B. The error calculated on the data samples that were not included in the bootstrap sample for a specific tree.
C. The error due to missing values.
D. The error calculated after pruning.

28 Which of the following is true about Boosting algorithms?

A. Models are trained in parallel.
B. Models are trained sequentially, with each correcting the errors of the predecessor.
C. It increases the variance of the model.
D. It does not use weights for instances.

29 In AdaBoost (Adaptive Boosting), how are weights updated?

A. All weights remain constant.
B. Weights of correctly classified instances are increased.
C. Weights of misclassified instances are increased.
D. Weights are assigned randomly.
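
For reference, the standard AdaBoost update behind option C (using the usual $\{-1, +1\}$ label convention): a weak learner $h_t$ with weighted error $\epsilon_t$ receives the vote weight $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$, and each instance weight is updated as $w_i \leftarrow w_i \, e^{-\alpha_t y_i h_t(x_i)}$ followed by renormalization, so misclassified instances ($y_i h_t(x_i) = -1$) have their weights increased while correctly classified instances have theirs decreased.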

30 What is the key difference between Bagging and Boosting regarding Bias and Variance?

A. Bagging reduces bias; Boosting reduces variance.
B. Bagging reduces variance; Boosting primarily reduces bias.
C. Both reduce only variance.
D. Both reduce only bias.

31 Gradient Boosting differs from AdaBoost in that it:

A. Updates instance weights directly.
B. Optimizes a loss function by training new models on the residual errors of previous models.
C. Uses a voting mechanism.
D. Cannot handle regression problems.

32 What is 'Stacking' (Stacked Generalization)?

A. A method to stack data vertically.
B. Combining multiple weak learners using a simple average.
C. Training a meta-model to learn how to combine the predictions of base models.
D. Using a stack data structure for decision trees.

33 In the context of Ensemble voting, what is 'Hard Voting'?

A. Averaging the probabilities.
B. Selecting the class with the majority of votes from the classifiers.
C. Using a weighted average.
D. Selecting the class predicted by the most complex model.

34 What is 'Soft Voting'?

A. Selecting the majority class.
B. Averaging the predicted class probabilities and choosing the class with the highest average probability.
C. Random selection.
D. Voting only on easy instances.
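
A small Python sketch contrasting hard voting (Question 33) and soft voting (Question 34); the probability values are made up for illustration:

# Predicted probabilities for classes [0, 1] from three classifiers.
probs = [
    [0.90, 0.10],   # classifier 1 strongly prefers class 0
    [0.40, 0.60],   # classifier 2 slightly prefers class 1
    [0.45, 0.55],   # classifier 3 slightly prefers class 1
]

# Hard voting: each classifier casts one vote for its most probable class.
votes = [p.index(max(p)) for p in probs]          # [0, 1, 1]
hard_winner = max(set(votes), key=votes.count)    # class 1

# Soft voting: average the probabilities, then take the highest average.
avg = [sum(col) / len(probs) for col in zip(*probs)]  # [0.583, 0.417]
soft_winner = avg.index(max(avg))                     # class 0

print(hard_winner, soft_winner)  # 1 0 -- the two schemes can disagree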

35 Which algorithm is a popular implementation of Gradient Boosting?

A. XGBoost
B. C4.5
C. Apriori
D. K-Means

36 The 'Curse of Dimensionality' negatively impacts KNN because:

A. It increases the bias.
B. In high-dimensional space, data becomes sparse, and all points tend to be equidistant.
C. It reduces the number of features.
D. KNN cannot handle more than 3 dimensions.

37 When building a decision tree for regression (e.g., CART), what is the typical splitting criterion?

A. Information Gain
B. Gini Index
C. Sum of Squared Errors (Variance reduction)
D. Gain Ratio

38 Can Decision Trees typically handle both numerical and categorical data?

A. No, only numerical.
B. No, only categorical.
C. Yes, they can handle both.
D. Only if converted to binary.

39 What is the definition of a 'Weak Learner' in boosting?

A. A model that performs slightly better than random guessing.
B. A model with 100% accuracy.
C. A model that is underfitted.
D. A model that takes a long time to train.

40 In ID3, what is the formula for the Information Gain of an attribute $A$ on a set $S$, given the Entropy $H(S)$?

A. $Gain(S, A) = H(S) + \sum_{v \in Values(A)} \frac{|S_v|}{|S|}\, H(S_v)$
B. $Gain(S, A) = H(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|}\, H(S_v)$
C. $Gain(S, A) = \sum_{v \in Values(A)} \frac{|S_v|}{|S|}\, H(S_v) - H(S)$
D. $Gain(S, A) = H(S) \times \sum_{v \in Values(A)} \frac{|S_v|}{|S|}\, H(S_v)$
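
A small worked example (numbers chosen for illustration): suppose $S$ has 10 examples, 5 positive and 5 negative, so $H(S) = 1$. If attribute $A$ splits $S$ into $S_1$ with 4 positive and 0 negative ($H(S_1) = 0$) and $S_2$ with 1 positive and 5 negative ($H(S_2) \approx 0.65$), then $Gain(S, A) = 1 - \frac{4}{10}(0) - \frac{6}{10}(0.65) \approx 0.61$.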

41 Which of the following is NOT an ensemble method?

A. Random Forest
B. AdaBoost
C. Logistic Regression
D. Gradient Boosting Machines

42 Why does Random Forest generally perform better than a single Decision Tree?

A. It is easier to interpret.
B. It reduces the risk of overfitting by averaging multiple trees.
C. It requires less training data.
D. It uses a deeper tree structure.

43 What is the role of the 'Learning Rate' in Gradient Boosting?

A. It determines the size of the tree.
B. It scales the contribution of each tree to the final prediction.
C. It determines the number of neighbors in KNN.
D. It sets the initial weights of the data points.
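
For context on option B: each stage of Gradient Boosting typically updates the ensemble as $F_m(x) = F_{m-1}(x) + \nu \, h_m(x)$, where $h_m$ is the new tree fitted to the current residuals and $\nu \in (0, 1]$ is the learning rate. A smaller $\nu$ shrinks each tree's contribution, usually requiring more trees but generalizing better.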

44 In the context of Weighted Majority Voting (Ensemble), how is the final prediction made?

A. All models have equal say.
B. Models are assigned weights based on their performance (e.g., accuracy), and the weighted sum determines the class.
C. The model with the highest weight decides alone.
D. The user manually selects the best model.

45 Which distance metric corresponds to the $L_1$ norm?

A. Euclidean Distance
B. Manhattan Distance
C. Chebyshev Distance
D. Minkowski Distance

46 If a Decision Tree is fully grown until every leaf is pure, what is the likely outcome?

A. High Bias
B. High Variance (Overfitting)
C. Low Variance
D. Underfitting

47 Which algorithm uses the concept of 'Stump' (a one-level decision tree) as its typical weak learner?

A. Random Forest
B. AdaBoost
C. KNN
D. Stacking

48 How does 'Averaging' differ from 'Voting' in ensembles?

A. Averaging is for regression; Voting is for classification.
B. Averaging is for classification; Voting is for regression.
C. They are exactly the same.
D. Averaging is used only in Boosting.

49 Which equation represents the Gini Index for a node with class probabilities $p_i$?

A. $Gini = \sum_{i} p_i^2$
B. $Gini = 1 - \sum_{i} p_i^2$
C. $Gini = -\sum_{i} p_i \log_2 p_i$
D. $Gini = \max_{i} p_i$
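
A quick worked check: for a binary node with $p_1 = p_2 = 0.5$, $Gini = 1 - (0.5^2 + 0.5^2) = 0.5$, the maximum impurity for two classes; for a pure node with $p_1 = 1$, $Gini = 1 - 1 = 0$. This is also why the binary Gini Index ranges over $[0, 0.5]$, as asked in Question 15.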

50 What is the primary benefit of Stacking over Voting/Averaging?

A. It is faster to train.
B. It learns the optimal combination of base models rather than assuming equal or fixed weights.
C. It requires fewer base models.
D. It does not require a validation set.