Unit 3 - Practice Quiz

INT394 50 Questions

1 Which of the following statements best describes K-Nearest Neighbours (KNN)?

A. It is a linear regression model used for classification.
B. It is an Eager learning algorithm that builds a model during training.
C. It is a probabilistic algorithm based on Bayes' Theorem.
D. It is a Lazy learning algorithm that stores the dataset and performs computation only during prediction.
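To see the "lazy" idea concretely, here is a minimal illustrative sketch (the function name `knn_predict` is my own): "training" is just storing the data, and all distance work happens at prediction time.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    Note there is no training step: the dataset is simply stored, and all
    computation happens here, at prediction time."""
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]

X = [(1, 1), (1, 2), (6, 6), (7, 7)]
y = ["A", "A", "B", "B"]
print(knn_predict(X, y, (2, 2), k=3))  # → "A" (two of the three nearest are "A")
```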

2 In the K-Nearest Neighbours algorithm, what happens when the value of $k$ is very small (e.g., $k = 1$)?

A. The decision boundary becomes smooth.
B. The model captures noise in the training data, leading to overfitting.
C. The model becomes very simple and has high bias.
D. The model will always predict the majority class of the entire dataset.

3 Which distance metric is most commonly used in KNN for continuous variables, defined as $\sqrt{\sum_{i=1}^{n}(x_i - y_i)^2}$?

A. Euclidean Distance
B. Hamming Distance
C. Minkowski Distance
D. Manhattan Distance
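The options above are all special cases of one family. A short illustrative sketch (the function name is my own): the Minkowski distance with $p = 1$ gives Manhattan and $p = 2$ gives Euclidean.

```python
def minkowski(x, y, p):
    """Minkowski distance: (sum |x_i - y_i|^p)^(1/p).
    p=1 is Manhattan (L1), p=2 is Euclidean (L2)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

x, y = (0, 0), (3, 4)
print(minkowski(x, y, 1))  # Manhattan: 7.0
print(minkowski(x, y, 2))  # Euclidean: 5.0
```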

4 Why is feature scaling (normalization/standardization) important in KNN?

A. To convert all categorical variables into numbers.
B. To prevent the algorithm from running too slowly.
C. It is not required for KNN.
D. Because KNN is based on distance metrics, and features with larger scales will dominate the distance calculation.
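Why scaling matters for distance-based methods can be seen in a small sketch (a toy standardization, not a library call): a feature measured in thousands would dominate raw Euclidean distance, but after z-score scaling both features contribute comparably.

```python
def standardize(column):
    """Z-score scaling: (x - mean) / std, computed per feature."""
    mean = sum(column) / len(column)
    std = (sum((v - mean) ** 2 for v in column) / len(column)) ** 0.5
    return [(v - mean) / std for v in column]

# On the raw scale, income (in thousands) dwarfs age in any distance
# calculation; after standardization the two columns are comparable.
income = [30, 50, 70, 90]
age = [25, 35, 45, 55]
print(standardize(income))
print(standardize(age))
```

Here both columns happen to standardize to the same z-scores, since each is an evenly spaced sequence.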

5 What is the primary disadvantage of KNN as the dataset size grows?

A. Training time becomes exponentially high.
B. It cannot handle multi-class classification.
C. It requires too many hyperparameters.
D. Prediction time becomes very slow because it scans the entire dataset.

6 Which of the following is a specific strategy to choose the optimal value of $k$ in KNN?

A. Always choose $k = 1$.
B. Always choose $k$ equal to the square root of the number of features.
C. Use Cross-Validation (e.g., Elbow method) to minimize error.
D. Choose the largest odd number possible.

7 In a Decision Tree, what does a leaf node represent?

A. A decision rule.
B. A test on a specific attribute.
C. The root of the tree.
D. A class label or a continuous value.

8 Decision Trees are considered 'greedy' algorithms. What does this mean?

A. They consume a lot of memory.
B. They make the locally optimal choice at each step with the hope of finding a global optimum.
C. They require all features to be used in the tree.
D. They revisit previous decisions to optimize the structure.

9 Which metric does the ID3 algorithm use to select the best attribute for splitting?

A. Gini Index
B. Gain Ratio
C. Chi-Square
D. Information Gain

10 If a dataset has a completely homogeneous distribution (all examples belong to one class), what is its Entropy?

A. $1$
B. Undefined
C. $0.5$
D. $0$

11 Calculate the Entropy of a binary classification problem where $p_+$ (positive probability) is $0.5$ and $p_-$ (negative probability) is $0.5$.

A. $1$
B. $0$
C. $2$
D. $0.5$
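Questions 10 and 11 can be checked numerically with a small sketch of the binary entropy formula $H = -p_+ \log_2 p_+ - p_- \log_2 p_-$ (the function name is my own):

```python
import math

def entropy(p_pos):
    """Binary entropy H = -p log2(p) - (1-p) log2(1-p).
    Zero-probability terms contribute 0 by convention."""
    total = 0.0
    for p in (p_pos, 1 - p_pos):
        if p > 0:
            total -= p * math.log2(p)
    return total

print(entropy(0.5))  # → 1.0  (maximum uncertainty — Q11)
print(entropy(1.0))  # → 0.0  (homogeneous node — Q10)
```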

12 What is a major drawback of using Information Gain (as in ID3)?

A. It cannot handle binary data.
B. It is biased towards attributes with a large number of distinct values.
C. It only works for regression.
D. It is computationally expensive.

13 Which algorithm was introduced to overcome the bias of Information Gain towards attributes with many values?

A. C4.5
B. ID3
C. KNN
D. CART

14 Which impurity measure is used by the CART (Classification and Regression Trees) algorithm?

A. Log-Loss
B. Gini Index
C. Entropy
D. T-test
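The Gini Index that CART uses is cheap to compute: $1 - \sum_i p_i^2$. A quick illustrative sketch (function name is my own) also shows the binary range asked about in the next question — $0$ for a pure node, $0.5$ at a 50/50 split.

```python
def gini(probs):
    """Gini index = 1 - sum(p_i^2) over the class probabilities at a node."""
    return 1 - sum(p * p for p in probs)

print(gini([1.0, 0.0]))  # → 0.0  (pure node)
print(gini([0.5, 0.5]))  # → 0.5  (binary worst case)
```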

15 What is the range of the Gini Index for a binary classification problem?

A.
B.
C.
D.

16 How does the CART algorithm handle splits?

A. It does not split; it uses clustering.
B. It creates multi-way splits based on all categories.
C. It produces only binary splits (two child nodes).
D. It splits based on the highest standard deviation.

17 What is the primary purpose of 'Pruning' in Decision Trees?

A. To reduce the complexity of the tree and prevent overfitting.
B. To increase the depth of the tree.
C. To add more features to the dataset.
D. To speed up the training process.

18 Which of the following describes 'Pre-pruning'?

A. Using ensemble methods instead of a single tree.
B. Growing the full tree and then removing nodes.
C. Converting the tree into rules.
D. Halting the construction of the tree early if goodness measures fall below a threshold.

19 Which of the following is a technique used in Post-pruning?

A. Maximum Leaf Nodes
B. Cost Complexity Pruning (Weakest Link Pruning)
C. Maximum Depth limiting
D. Minimum Samples Split

20 Handling missing values in C4.5 involves:

A. Replacing missing values with the global mean.
B. Distributing the instance to all child nodes with weights proportional to the population of the child nodes.
C. Deleting the rows with missing values.
D. Stopping the algorithm.

21 Which of the following scenarios suggests a Decision Tree is overfitting?

A. High training error, high testing error.
B. Low training error, high testing error.
C. High training error, low testing error.
D. Low training error, low testing error.

22 What is the main idea behind Ensemble Learning?

A. To combine multiple weak models to create a strong predictive model.
B. To reduce the size of the dataset.
C. To cluster data without supervision.
D. To find the single best algorithm for a problem.

23 Ensemble methods generally aim to reduce which two sources of error?

A. Bias and Variance
B. Precision and Recall
C. False Positives and False Negatives
D. Computation and Memory

24 Which ensemble method relies on 'Bootstrap Aggregating'?

A. Cascading
B. Stacking
C. Boosting
D. Bagging

25 In Bagging, how are the datasets for the individual models created?

A. By splitting the data into disjoint folds.
B. By sampling without replacement.
C. By sampling with replacement from the original dataset.
D. By selecting only the difficult instances.

26 Random Forest is a modification of Bagging. What specific feature does it add?

A. It boosts the weight of misclassified samples.
B. It uses neural networks as base learners.
C. It performs post-pruning on all trees.
D. It selects a random subset of features for splitting at each node.

27 What is Out-of-Bag (OOB) error in Random Forests?

A. The error calculated after pruning.
B. The error calculated on the validation set.
C. The error due to missing values.
D. The error calculated on the data samples that were not included in the bootstrap sample for a specific tree.
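Bootstrap sampling (Q25) and the Out-of-Bag set (Q27) are two sides of the same draw. An illustrative sketch (names are my own): sampling $n$ points with replacement leaves roughly $(1 - 1/n)^n \approx 36.8\%$ of the originals out, and those points form the OOB set for that tree.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) points WITH replacement; duplicates are expected."""
    return [rng.choice(data) for _ in data]

rng = random.Random(0)
data = list(range(1000))
sample = bootstrap_sample(data, rng)
oob = set(data) - set(sample)          # points never drawn: the OOB set
print(len(sample), len(oob) / len(data))  # OOB fraction is near 0.368
```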

28 Which of the following is true about Boosting algorithms?

A. Models are trained sequentially, with each correcting the errors of the predecessor.
B. Models are trained in parallel.
C. It increases the variance of the model.
D. It does not use weights for instances.

29 In AdaBoost (Adaptive Boosting), how are weights updated?

A. Weights of correctly classified instances are increased.
B. Weights are assigned randomly.
C. All weights remain constant.
D. Weights of misclassified instances are increased.
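The AdaBoost-style reweighting can be sketched in a few lines (a simplified illustration, not the full algorithm; names are my own): misclassified weights are multiplied by $e^{\alpha}$, correct ones by $e^{-\alpha}$, and the weights are renormalized, so the next weak learner focuses on the hard instances.

```python
import math

def update_weights(weights, correct, alpha):
    """Scale misclassified weights up by e^alpha and correct ones down
    by e^-alpha, then renormalize so the weights sum to 1."""
    new = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new]

w = [0.25, 0.25, 0.25, 0.25]
print(update_weights(w, [True, True, True, False], alpha=0.5))
```

After the update, the one misclassified instance carries the largest weight.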

30 What is the key difference between Bagging and Boosting regarding Bias and Variance?

A. Both reduce only variance.
B. Both reduce only bias.
C. Bagging reduces bias; Boosting reduces variance.
D. Bagging reduces variance; Boosting primarily reduces bias.

31 Gradient Boosting differs from AdaBoost in that it:

A. Uses a voting mechanism.
B. Cannot handle regression problems.
C. Updates instance weights directly.
D. Optimizes a loss function by training new models on the residual errors of previous models.
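The residual-fitting idea can be shown with a deliberately tiny model: below, each "tree" is just the best constant fit to the current residuals, scaled by a learning rate before being added to the running prediction (an illustrative toy, not a real implementation; names are my own).

```python
def boost_constant(y, rounds=50, lr=0.1):
    """Toy gradient boosting for squared error: each round fits a constant
    to the residuals and adds it, scaled by the learning rate."""
    pred = [0.0] * len(y)
    for _ in range(rounds):
        residuals = [t - p for t, p in zip(y, pred)]
        step = sum(residuals) / len(residuals)  # best constant fit
        pred = [p + lr * step for p in pred]
    return pred

print(boost_constant([3.0, 5.0, 7.0]))  # predictions converge toward the mean, 5.0
```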

32 What is 'Stacking' (Stacked Generalization)?

A. Combining multiple weak learners using a simple average.
B. A method to stack data vertically.
C. Training a meta-model to learn how to combine the predictions of base models.
D. Using a stack data structure for decision trees.

33 In the context of Ensemble voting, what is 'Hard Voting'?

A. Selecting the class with the majority of votes from the classifiers.
B. Using a weighted average.
C. Selecting the class predicted by the most complex model.
D. Averaging the probabilities.

34 What is 'Soft Voting'?

A. Selecting the majority class.
B. Averaging the predicted class probabilities and choosing the class with the highest average probability.
C. Voting only on easy instances.
D. Random selection.
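Hard and soft voting (Q33 and Q34) can disagree, which makes the distinction concrete. An illustrative sketch (function names are my own): one very confident classifier can win the soft vote even while losing the hard vote.

```python
from collections import Counter

def hard_vote(predictions):
    """Pick the class predicted by the majority of classifiers."""
    return Counter(predictions).most_common(1)[0][0]

def soft_vote(prob_rows, classes):
    """Average each class's probability across classifiers; pick the highest."""
    avg = [sum(col) / len(col) for col in zip(*prob_rows)]
    return classes[avg.index(max(avg))]

probs = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
labels = ["A" if p[0] > p[1] else "B" for p in probs]
print(hard_vote(labels))             # → "B" (two of three predict B)
print(soft_vote(probs, ["A", "B"]))  # → "A" (A's average probability is higher)
```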

35 Which algorithm is a popular implementation of Gradient Boosting?

A. XGBoost
B. C4.5
C. K-Means
D. Apriori

36 The 'Curse of Dimensionality' negatively impacts KNN because:

A. In high-dimensional space, data becomes sparse, and all points tend to be equidistant.
B. It reduces the number of features.
C. It increases the bias.
D. KNN cannot handle more than 3 dimensions.
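The "all points tend to be equidistant" claim can be observed empirically. A rough illustrative experiment (names and the spread measure are my own): for random points in the unit cube, the spread of distances relative to the smallest distance shrinks as the dimension grows, so "nearest" neighbours become barely nearer than anything else.

```python
import random

def dist_spread(dim, n=200):
    """(max - min) / min over distances from the origin to n random points
    in the unit cube; smaller values mean distances have concentrated."""
    rng = random.Random(42)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n)]
    dists = [sum(c * c for c in p) ** 0.5 for p in pts]
    return (max(dists) - min(dists)) / min(dists)

print(dist_spread(2))    # large relative spread in low dimension
print(dist_spread(100))  # much smaller relative spread in high dimension
```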

37 When building a decision tree for regression (e.g., CART), what is the typical splitting criterion?

A. Gini Index
B. Gain Ratio
C. Sum of Squared Errors (Variance reduction)
D. Information Gain

38 Can Decision Trees typically handle both numerical and categorical data?

A. Only if converted to binary.
B. No, only numerical.
C. Yes, they can handle both.
D. No, only categorical.

39 What is the definition of a 'Weak Learner' in boosting?

A. A model with 100% accuracy.
B. A model that takes a long time to train.
C. A model that performs slightly better than random guessing.
D. A model that is underfitted.

40 In ID3, what is the formula for Information Gain given Entropy $H(S)$?

A.
B.
C.
D.

41 Which of the following is NOT an ensemble method?

A. Gradient Boosting Machines
B. AdaBoost
C. Logistic Regression
D. Random Forest

42 Why does Random Forest generally perform better than a single Decision Tree?

A. It is easier to interpret.
B. It reduces the risk of overfitting by averaging multiple trees.
C. It uses a deeper tree structure.
D. It requires less training data.

43 What is the role of the 'Learning Rate' in Gradient Boosting?

A. It scales the contribution of each tree to the final prediction.
B. It determines the number of neighbors in KNN.
C. It sets the initial weights of the data points.
D. It determines the size of the tree.

44 In the context of Weighted Majority Voting (Ensemble), how is the final prediction made?

A. Models are assigned weights based on their performance (e.g., accuracy), and the weighted sum determines the class.
B. The model with the highest weight decides alone.
C. All models have equal say.
D. The user manually selects the best model.
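Weighted majority voting can be sketched in a few lines (function name is my own): each classifier's weight is added to the score of the class it predicts, and the heaviest class wins, so two weak models can still outvote one strong model when their combined weight is larger.

```python
def weighted_vote(predictions, weights):
    """Add each classifier's weight to its predicted class; return the
    class with the largest total weight."""
    scores = {}
    for label, w in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

# One strong model (0.9) outweighs two weak ones (0.3 + 0.3).
print(weighted_vote(["A", "B", "B"], [0.9, 0.3, 0.3]))  # → "A"
```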

45 Which distance metric corresponds to the $L_1$ norm?

A. Manhattan Distance
B. Chebyshev Distance
C. Euclidean Distance
D. Minkowski Distance

46 If a Decision Tree is fully grown until every leaf is pure, what is the likely outcome?

A. Underfitting
B. High Bias
C. High Variance (Overfitting)
D. Low Variance

47 Which algorithm uses the concept of 'Stump' (a one-level decision tree) as its typical weak learner?

A. KNN
B. AdaBoost
C. Random Forest
D. Stacking

48 How does 'Averaging' differ from 'Voting' in ensembles?

A. Averaging is for classification; Voting is for regression.
B. Averaging is used only in Boosting.
C. They are exactly the same.
D. Averaging is for regression; Voting is for classification.

49 Which equation represents the Gini Index for a node with class probabilities $p_i$?

A.
B.
C.
D.

50 What is the primary benefit of Stacking over Voting/Averaging?

A. It does not require a validation set.
B. It is faster to train.
C. It learns the optimal combination of base models rather than assuming equal or fixed weights.
D. It requires fewer base models.