1. Which of the following statements best describes K-Nearest Neighbours (KNN)?
A.It is an Eager learning algorithm that builds a model during training.
B.It is a probabilistic algorithm based on Bayes' Theorem.
C.It is a Lazy learning algorithm that stores the dataset and performs computation only during prediction.
D.It is a linear regression model used for classification.
Correct Answer: It is a Lazy learning algorithm that stores the dataset and performs computation only during prediction.
Explanation:KNN is a lazy learner (instance-based learning) because it does not learn a discriminative function from the training data but memorizes the training dataset instead.
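To make the lazy-learning idea concrete, here is a minimal sketch (a toy Python implementation on made-up data, not any particular library's API): fit only memorizes the training set, and all distance computation happens at prediction time.

from collections import Counter
import numpy as np

class LazyKNN:
    def __init__(self, k=3):
        self.k = k
    def fit(self, X, y):
        # "Training" builds no model -- the data is simply memorized.
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        return self
    def predict_one(self, query):
        # The real work happens here: distances to every stored point.
        dists = np.linalg.norm(self.X - np.asarray(query, dtype=float), axis=1)
        nearest = np.argsort(dists)[:self.k]
        return Counter(self.y[nearest]).most_common(1)[0][0]

model = LazyKNN(k=3).fit([[0, 0], [0, 1], [5, 5], [6, 5]], [0, 0, 1, 1])
print(model.predict_one([5.5, 5.0]))  # majority of the 3 nearest neighbours -> class 1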
2. In the K-Nearest Neighbours algorithm, what happens when the value of $K$ is very small (e.g., $K = 1$)?
A.The model becomes very simple and has high bias.
B.The decision boundary becomes smooth.
C.The model captures noise in the training data, leading to overfitting.
D.The model will always predict the majority class of the entire dataset.
Correct Answer: The model captures noise in the training data, leading to overfitting.
Explanation:With a very small $K$ (like $K = 1$), the algorithm is highly sensitive to noise and local outliers, resulting in a complex decision boundary and overfitting (high variance).
3. Which distance metric is most commonly used in KNN for continuous variables, defined as $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$?
A.Manhattan Distance
B.Hamming Distance
C.Euclidean Distance
D.Minkowski Distance
Correct Answer: Euclidean Distance
Explanation:The formula represents the Euclidean Distance, which is the straight-line distance between two points in Euclidean space.
4. Why is feature scaling (normalization/standardization) important in KNN?
A.To prevent the algorithm from running too slowly.
B.Because KNN is based on distance metrics, and features with larger scales will dominate the distance calculation.
C.To convert all categorical variables into numbers.
D.It is not required for KNN.
Correct Answer: Because KNN is based on distance metrics, and features with larger scales will dominate the distance calculation.
Explanation:If one feature has a range of 0-1000 and another 0-1, the distance calculation will be heavily biased toward the first feature. Scaling ensures all features contribute equally.
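A small illustration with made-up values: one feature spans 0-1000, the other 0-1, and the unscaled distance is driven almost entirely by the first feature.

import numpy as np

a = np.array([900.0, 0.2])   # feature 1 in [0, 1000], feature 2 in [0, 1]
b = np.array([100.0, 0.9])
print(np.linalg.norm(a - b))                 # ~800.0, feature 2 barely matters

# After min-max scaling both features to [0, 1], they contribute comparably.
a_scaled, b_scaled = np.array([0.9, 0.2]), np.array([0.1, 0.9])
print(np.linalg.norm(a_scaled - b_scaled))   # ~1.06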
5. What is the primary disadvantage of KNN as the dataset size grows?
A.Training time becomes exponentially high.
B.Prediction time becomes very slow because it scans the entire dataset.
C.It cannot handle multi-class classification.
D.It requires too many hyperparameters.
Correct Answer: Prediction time becomes very slow because it scans the entire dataset.
Explanation:Since KNN is a lazy learner, the computational cost happens at the inference (prediction) stage. For every new query, it must calculate distances to all training points, making it slow for large datasets.
6. Which of the following is a specific strategy to choose the optimal value of $K$ in KNN?
A.Always choose $K = 1$.
B.Always choose $K$ equal to the square root of the number of features.
C.Use Cross-Validation (e.g., Elbow method) to minimize error.
D.Choose the largest odd number possible.
Correct Answer: Use Cross-Validation (e.g., Elbow method) to minimize error.
Explanation:Cross-validation involves testing different $K$ values and selecting the one that minimizes the validation error rate to balance bias and variance.
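A sketch of this selection loop, assuming scikit-learn is available (the dataset and the range of K values are only illustrative).

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
errors = {}
for k in range(1, 21):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    errors[k] = 1 - scores.mean()        # cross-validated error for this K
best_k = min(errors, key=errors.get)     # K with the lowest validation error
print(best_k, round(errors[best_k], 3))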
7. In a Decision Tree, what does a leaf node represent?
A.A decision rule.
B.A test on a specific attribute.
C.A class label or a continuous value.
D.The root of the tree.
Correct Answer: A class label or a continuous value.
Explanation:Leaf nodes (terminal nodes) represent the final outcome, which is a class label (in classification) or a numerical value (in regression).
8. Decision Trees are considered 'greedy' algorithms. What does this mean?
A.They consume a lot of memory.
B.They make the locally optimal choice at each step with the hope of finding a global optimum.
C.They revisit previous decisions to optimize the structure.
D.They require all features to be used in the tree.
Correct Answer: They make the locally optimal choice at each step with the hope of finding a global optimum.
Explanation:A greedy algorithm makes the best immediate decision at the current node (e.g., best split) without considering future consequences, often leading to a suboptimal global tree but offering computational efficiency.
9. Which metric does the ID3 algorithm use to select the best attribute for splitting?
A.Gini Index
B.Information Gain
C.Gain Ratio
D.Chi-Square
Correct Answer: Information Gain
Explanation:ID3 uses Information Gain, which relies on the reduction of Entropy, to determine the best attribute to split the data.
10. If a dataset has a completely homogeneous distribution (all examples belong to one class), what is its Entropy?
A.$0$
B.$1$
C.$0.5$
D.Undefined
Correct Answer: $0$
Explanation:Entropy measures impurity. If all examples are of the same class, there is no impurity/uncertainty, so Entropy is 0.
11. Calculate the Entropy of a binary classification problem where $p_+$ (positive probability) is $0.5$ and $p_-$ (negative probability) is $0.5$.
A.$0$
B.$0.5$
C.$1$
D.$2$
Correct Answer: $1$
Explanation:Using the formula $E = -p_+ \log_2 p_+ - p_- \log_2 p_-$, calculation: $E = -(0.5 \log_2 0.5) - (0.5 \log_2 0.5) = 0.5 + 0.5 = 1$. This represents maximum impurity.
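The arithmetic can be checked with a couple of lines of Python (binary entropy as defined above; the helper name is just for illustration).

import math

def binary_entropy(p_pos):
    p_neg = 1 - p_pos
    return sum(-p * math.log2(p) if p > 0 else 0.0 for p in (p_pos, p_neg))

print(binary_entropy(0.5))   # 1.0 -- maximum impurity
print(binary_entropy(1.0))   # 0.0 -- a pure node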
12. What is a major drawback of using Information Gain (as in ID3)?
A.It is computationally expensive.
B.It cannot handle binary data.
C.It is biased towards attributes with a large number of distinct values.
D.It only works for regression.
Correct Answer: It is biased towards attributes with a large number of distinct values.
Explanation:Information Gain favors attributes like 'ID' or 'Date' which have unique values for every instance, resulting in pure but useless splits. This bias is corrected by Gain Ratio.
13. Which algorithm was introduced to overcome the bias of Information Gain towards attributes with many values?
A.CART
B.ID3
C.C4.5
D.KNN
Correct Answer: C4.5
Explanation:The C4.5 algorithm introduced Gain Ratio, which normalizes the Information Gain by the 'Split Info' (intrinsic information) to penalize attributes with many branches.
14. Which impurity measure is used by the CART (Classification and Regression Trees) algorithm?
A.Entropy
B.Gini Index
C.Log-Loss
D.T-test
Correct Answer: Gini Index
Explanation:CART uses the Gini Index (or Gini Impurity) to measure the likelihood of incorrect classification of a new instance if it were randomly classified according to the distribution of class labels.
15. What is the range of the Gini Index for a binary classification problem?
A.$0$ to $1$
B.$0$ to $0.5$
C.$-1$ to $1$
D.$0$ to $\infty$
Correct Answer: $0$ to $0.5$
Explanation:For binary classification, Gini is $1 - p_+^2 - p_-^2$. The minimum is $0$ (pure) and the maximum is $0.5$ (perfectly mixed, $p_+ = p_- = 0.5$; $1 - 0.25 - 0.25 = 0.5$).
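A quick numerical check of this range, sweeping the positive-class probability from 0 to 1.

def gini(p):
    return 1 - (p ** 2 + (1 - p) ** 2)

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, gini(p))   # 0.0, 0.375, 0.5, 0.375, 0.0 -- peaks at 0.5 when classes are mixed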
16. How does the CART algorithm handle splits?
A.It creates multi-way splits based on all categories.
B.It produces only binary splits (two child nodes).
C.It splits based on the highest standard deviation.
D.It does not split; it uses clustering.
Correct Answer: It produces only binary splits (two child nodes).
Explanation:CART constructs binary trees. Even if a categorical attribute has multiple values, CART groups them into two subsets for the split.
17. What is the primary purpose of 'Pruning' in Decision Trees?
A.To increase the depth of the tree.
B.To reduce the complexity of the tree and prevent overfitting.
C.To add more features to the dataset.
D.To speed up the training process.
Correct Answer: To reduce the complexity of the tree and prevent overfitting.
Explanation:Pruning removes sections of the tree that provide little power to classify instances, thereby reducing complexity and improving generalization (preventing overfitting).
18. Which of the following describes 'Pre-pruning'?
A.Growing the full tree and then removing nodes.
B.Halting the construction of the tree early if a goodness measure falls below a threshold.
C.Converting the tree into rules.
D.Using ensemble methods instead of a single tree.
Correct Answer: Halt the construction of the tree early if goodness measures fall below a threshold.
Explanation:Pre-pruning stops the tree generation process before it fully classifies the training set (e.g., max depth reached, min samples per split), whereas post-pruning cuts back a fully grown tree.
19. Which of the following is a technique used in Post-pruning?
A.Maximum Depth limiting
B.Cost Complexity Pruning (Weakest Link Pruning)
C.Minimum Samples Split
D.Maximum Leaf Nodes
Correct Answer: Cost Complexity Pruning (Weakest Link Pruning)
Explanation:Cost Complexity Pruning is a post-pruning method that adds a penalty for the number of terminal nodes to the error rate, removing subtrees that do not justify their complexity.
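A sketch of cost complexity pruning in practice, assuming scikit-learn (the dataset is illustrative); larger values of ccp_alpha prune more of the tree.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Candidate alphas come from the cost-complexity pruning path of the full tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
for alpha in path.ccp_alphas[::5]:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
    print(f"alpha={alpha:.4f} leaves={tree.get_n_leaves()} test acc={tree.score(X_te, y_te):.3f}")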
20. Handling missing values in C4.5 involves:
A.Deleting the rows with missing values.
B.Replacing missing values with the global mean.
C.Distributing the instance to all child nodes with weights proportional to the population of the child nodes.
D.Stopping the algorithm.
Correct Answer: Distributing the instance to all child nodes with weights proportional to the population of the child nodes.
Explanation:C4.5 handles missing attribute values by sending the instance down all branches, weighted by the probability of seeing that branch's value in the training data.
21. Which of the following scenarios suggests a Decision Tree is overfitting?
A.Low training error, low testing error.
B.High training error, high testing error.
C.Low training error, high testing error.
D.High training error, low testing error.
Correct Answer: Low training error, high testing error.
Explanation:Overfitting occurs when the model memorizes the training data (low training error) but fails to generalize to unseen data (high testing error).
22. What is the main idea behind Ensemble Learning?
A.To find the single best algorithm for a problem.
B.To combine multiple weak models to create a strong predictive model.
C.To reduce the size of the dataset.
D.To cluster data in an unsupervised manner.
Correct Answer: To combine multiple weak models to create a strong predictive model.
Explanation:Ensemble learning aggregates the predictions of multiple base estimators (weak learners) to improve generalizability and robustness over a single estimator.
23. Ensemble methods generally aim to reduce which two sources of error?
A.Computation and Memory
B.Bias and Variance
C.Precision and Recall
D.False Positives and False Negatives
Correct Answer: Bias and Variance
Explanation:Ensembles aim to lower the Bias (error from erroneous assumptions) and Variance (sensitivity to small fluctuations in the training set).
24. Which ensemble method relies on 'Bootstrap Aggregating'?
A.Boosting
B.Bagging
C.Stacking
D.Cascading
Correct Answer: Bagging
Explanation:Bagging stands for Bootstrap Aggregating. It involves training models on random subsets of the data with replacement.
25. In Bagging, how are the datasets for the individual models created?
A.By splitting the data into disjoint folds.
B.By sampling with replacement from the original dataset.
C.By sampling without replacement.
D.By selecting only the difficult instances.
Correct Answer: By sampling with replacement from the original dataset.
Explanation:Bagging uses sampling with replacement, meaning the same data point can appear multiple times in a single training subset.
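A sketch of one bootstrap draw with NumPy (sizes are illustrative): indices are sampled with replacement, so some rows repeat and others are never picked.

import numpy as np

rng = np.random.default_rng(0)
n = 10
indices = rng.integers(0, n, size=n)             # draw n row indices WITH replacement
out_of_bag = np.setdiff1d(np.arange(n), indices)
print(indices)        # duplicates are expected
print(out_of_bag)     # rows never drawn this round (the out-of-bag samples)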
26. Random Forest is a modification of Bagging. What specific feature does it add?
A.It uses neural networks as base learners.
B.It boosts the weight of misclassified samples.
C.It selects a random subset of features for splitting at each node.
D.It performs post-pruning on all trees.
Correct Answer: It selects a random subset of features for splitting at each node.
Explanation:Random Forest decorrelates the trees by considering only a random subset of features at each split, preventing a single dominant feature from dictating the structure of all trees.
27. What is Out-of-Bag (OOB) error in Random Forests?
A.The error calculated on the validation set.
B.The error calculated on the data samples that were not included in the bootstrap sample for a specific tree.
C.The error due to missing values.
D.The error calculated after pruning.
Correct Answer: The error calculated on the data samples that were not included in the bootstrap sample for a specific tree.
Explanation:Since Bagging uses sampling with replacement, some data is left out for each tree. These OOB samples act as an internal validation set to estimate prediction error.
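A sketch of reading that internal estimate, assuming scikit-learn (the dataset is illustrative).

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0).fit(X, y)
print(forest.oob_score_)   # accuracy estimated only from each tree's out-of-bag samples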
28. Which of the following is true about Boosting algorithms?
A.Models are trained in parallel.
B.Models are trained sequentially, with each correcting the errors of the predecessor.
C.It increases the variance of the model.
D.It does not use weights for instances.
Correct Answer: Models are trained sequentially, with each correcting the errors of the predecessor.
Explanation:Boosting is a sequential process where subsequent models focus on the instances that were misclassified by previous models.
29. In AdaBoost (Adaptive Boosting), how are weights updated?
A.All weights remain constant.
B.Weights of correctly classified instances are increased.
C.Weights of misclassified instances are increased.
D.Weights are assigned randomly.
Correct Answer: Weights of misclassified instances are increased.
Explanation:AdaBoost increases the weights of misclassified instances so that the next weak learner focuses more on getting these difficult cases right.
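A sketch of one round of this update using the standard AdaBoost formulas (the weights and the misclassification mask are made up for illustration).

import numpy as np

weights = np.full(5, 1 / 5)                          # start from uniform weights
misclassified = np.array([False, True, False, False, True])

err = weights[misclassified].sum()                   # weighted error of the weak learner
alpha = 0.5 * np.log((1 - err) / err)                # that learner's say in the final vote

weights *= np.exp(alpha * np.where(misclassified, 1, -1))
weights /= weights.sum()                             # renormalize to a distribution
print(weights)   # the two misclassified points now carry more weight than the rest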
30. What is the key difference between Bagging and Boosting regarding Bias and Variance?
Correct Answer: Bagging primarily reduces variance, while Boosting primarily reduces bias.
Explanation:Bagging (like Random Forest) averages high-variance models to reduce variance. Boosting takes high-bias weak learners and improves them sequentially to reduce bias.
31. Gradient Boosting differs from AdaBoost in that it:
A.Updates instance weights directly.
B.Optimizes a loss function by training new models on the residual errors of previous models.
C.Uses a voting mechanism.
D.Cannot handle regression problems.
Correct Answer: Optimizes a loss function by training new models on the residual errors of previous models.
Explanation:Gradient Boosting uses a gradient descent approach. Instead of updating weights, it trains the next model to predict the residuals (errors) of the current ensemble.
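A sketch of this residual-fitting loop for regression with squared loss, assuming scikit-learn for the base trees (data and hyperparameters are illustrative).

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate, trees = 0.1, []
pred = np.full_like(y, y.mean())                 # start from a constant prediction
for _ in range(100):
    residuals = y - pred                         # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    trees.append(tree)                           # kept so new points can be scored later
    pred += learning_rate * tree.predict(X)      # shrink each tree's contribution
print(np.mean((y - pred) ** 2))                  # training MSE shrinks as trees are added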
32. What is 'Stacking' (Stacked Generalization)?
A.A method to stack data vertically.
B.Combining multiple weak learners using a simple average.
C.Training a meta-model to learn how to combine the predictions of base models.
D.Using a stack data structure for decision trees.
Correct Answer: Training a meta-model to learn how to combine the predictions of base models.
Explanation:Stacking involves training base models and then training a meta-learner (or blender) that takes the predictions of the base models as input features to make the final prediction.
33. In the context of Ensemble voting, what is 'Hard Voting'?
A.Averaging the probabilities.
B.Selecting the class with the majority of votes from the classifiers.
C.Using a weighted average.
D.Selecting the class predicted by the most complex model.
Correct Answer: Selecting the class with the majority of votes from the classifiers.
Explanation:Hard Voting predicts the final class label based on the mode (majority vote) of the class labels predicted by the individual classifiers.
34. What is 'Soft Voting'?
A.Selecting the majority class.
B.Averaging the predicted class probabilities and choosing the class with the highest average probability.
C.Random selection.
D.Voting only on easy instances.
Correct Answer: Averaging the predicted class probabilities and choosing the class with the highest average probability.
Explanation:Soft Voting considers the confidence of the classifiers by averaging their probability outputs, often yielding better results than hard voting if the classifiers are well-calibrated.
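A sketch contrasting hard and soft voting on a single instance (the three classifiers' probability vectors are made up): a single confident model can sway soft voting.

import numpy as np

# Predicted [P(class 0), P(class 1)] from three classifiers for one instance.
probs = np.array([[0.45, 0.55],
                  [0.40, 0.60],
                  [0.90, 0.10]])

hard_votes = probs.argmax(axis=1)                   # per-model labels: [1, 1, 0]
hard_pred = np.bincount(hard_votes).argmax()        # majority vote -> class 1
soft_pred = probs.mean(axis=0).argmax()             # averaged probs [0.58, 0.42] -> class 0
print(hard_pred, soft_pred)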
35. Which algorithm is a popular implementation of Gradient Boosting?
A.XGBoost
B.C4.5
C.Apriori
D.K-Means
Correct Answer: XGBoost
Explanation:XGBoost (eXtreme Gradient Boosting) is a highly efficient and scalable implementation of gradient boosting.
36. The 'Curse of Dimensionality' negatively impacts KNN because:
A.It increases the bias.
B.In high-dimensional space, data becomes sparse, and all points tend to be equidistant.
C.It reduces the number of features.
D.KNN cannot handle more than 3 dimensions.
Correct Answer: In high-dimensional space, data becomes sparse, and all points tend to be equidistant.
Explanation:As dimensions increase, the volume of the space increases exponentially, making data sparse. Distance metrics lose meaning as the ratio of the distance to the nearest neighbor vs. the farthest neighbor approaches 1.
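A sketch of this distance-concentration effect on uniform random data (dimensions and sample size are illustrative): the nearest-to-farthest distance ratio creeps toward 1 as dimensionality grows.

import numpy as np

rng = np.random.default_rng(0)
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(500, d))
    dists = np.linalg.norm(X[1:] - X[0], axis=1)   # distances from one query point
    print(d, round(dists.min() / dists.max(), 3))  # approaches 1 in high dimensions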
37. When building a decision tree for regression (e.g., CART), what is the typical splitting criterion?
A.Information Gain
B.Gini Index
C.Sum of Squared Errors (Variance reduction)
D.Gain Ratio
Correct Answer: Sum of Squared Errors (Variance reduction)
Explanation:For regression trees, the goal is to minimize the variance within the nodes. This is usually measured by the Sum of Squared Errors (SSE) or Mean Squared Error.
38. Can Decision Trees typically handle both numerical and categorical data?
A.No, only numerical.
B.No, only categorical.
C.Yes, they can handle both.
D.Only if converted to binary.
Correct Answer: Yes, they can handle both.
Explanation:Decision Trees are versatile and can natively handle both numerical and categorical attributes (though implementation details vary between ID3, C4.5, and CART).
39. What is the definition of a 'Weak Learner' in boosting?
A.A model that performs slightly better than random guessing.
B.A model with 100% accuracy.
C.A model that is underfitted.
D.A model that takes a long time to train.
Correct Answer: A model that performs slightly better than random guessing.
Explanation:In Boosting theory, a weak learner is a classifier that is only required to be slightly better than random chance (accuracy > 0.5 for binary classification).
40. In ID3, what is the formula for Information Gain given Entropy $E(S)$?
A.$Gain(S, A) = E(S) + \sum_{v \in Values(A)} \frac{|S_v|}{|S|} E(S_v)$
B.$Gain(S, A) = E(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} E(S_v)$
C.$Gain(S, A) = \sum_{v \in Values(A)} \frac{|S_v|}{|S|} E(S_v) - E(S)$
D.$Gain(S, A) = E(S) \times \sum_{v \in Values(A)} \frac{|S_v|}{|S|} E(S_v)$
Correct Answer: $Gain(S, A) = E(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} E(S_v)$
Explanation:Information Gain is the entropy of the parent node minus the weighted sum of the entropy of the child nodes resulting from the split.
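A sketch of this computation on a toy node (the 9-positive/5-negative class counts and the candidate split are made up for illustration).

from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())

parent = ["yes"] * 9 + ["no"] * 5                                 # the parent node
children = [["yes"] * 6 + ["no"] * 2, ["yes"] * 3 + ["no"] * 3]   # one candidate split

weighted_child_entropy = sum(len(c) / len(parent) * entropy(c) for c in children)
gain = entropy(parent) - weighted_child_entropy
print(round(gain, 3))   # information gain of this split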
41. Which of the following is NOT an ensemble method?
A.Random Forest
B.AdaBoost
C.Logistic Regression
D.Gradient Boosting Machines
Correct Answer: Logistic Regression
Explanation:Logistic Regression is a standalone parametric linear classification model, not an ensemble method.
42. Why does Random Forest generally perform better than a single Decision Tree?
A.It is easier to interpret.
B.It reduces the risk of overfitting by averaging multiple trees.
C.It requires less training data.
D.It uses a deeper tree structure.
Correct Answer: It reduces the risk of overfitting by averaging multiple trees.
Explanation:A single deep decision tree has high variance (overfitting). Random Forest averages many uncorrelated trees, effectively reducing variance and improving generalization.
43. What is the role of the 'Learning Rate' in Gradient Boosting?
A.It determines the size of the tree.
B.It scales the contribution of each tree to the final prediction.
C.It determines the number of neighbors in KNN.
D.It sets the initial weights of the data points.
Correct Answer: It scales the contribution of each tree to the final prediction.
Explanation:The learning rate (shrinkage) shrinks the contribution of each new tree added to the model. A lower learning rate requires more trees but usually leads to better generalization.
44. In the context of Weighted Majority Voting (Ensemble), how is the final prediction made?
A.All models have equal say.
B.Models are assigned weights based on their performance (e.g., accuracy), and the weighted sum determines the class.
C.The model with the highest weight decides alone.
D.The user manually selects the best model.
Correct Answer: Models are assigned weights based on their performance (e.g., accuracy), and the weighted sum determines the class.
Explanation:In Weighted Voting, better-performing models are trusted more. Their votes are multiplied by a weight derived from their validation performance.
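A sketch of weighted majority voting (model weights and predictions are made up; in practice the weights would come from, e.g., validation accuracy).

import numpy as np

model_weights = np.array([0.9, 0.6, 0.55])   # trust score per model
predictions = np.array([0, 1, 1])            # class predicted by each model

scores = np.zeros(2)                         # one slot per class
for w, c in zip(model_weights, predictions):
    scores[c] += w                           # each model adds its weight to its class
print(scores, scores.argmax())               # [0.9, 1.15] -> class 1 wins the weighted vote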
45. Which distance metric corresponds to the $L_1$ norm?
A.Euclidean Distance
B.Manhattan Distance
C.Chebyshev Distance
D.Minkowski Distance
Correct Answer: Manhattan Distance
Explanation:Manhattan Distance is the sum of absolute differences ($\sum_i |x_i - y_i|$), which corresponds to the $L_1$ norm.
46. If a Decision Tree is fully grown until every leaf is pure, what is the likely outcome?
A.High Bias
B.High Variance (Overfitting)
C.Low Variance
D.Underfitting
Correct Answer: High Variance (Overfitting)
Explanation:A fully grown tree captures every distinct pattern and noise in the training data, leading to high variance and overfitting.
47. Which algorithm uses the concept of 'Stump' (a one-level decision tree) as its typical weak learner?
A.Random Forest
B.AdaBoost
C.KNN
D.Stacking
Correct Answer: AdaBoost
Explanation:AdaBoost typically uses Decision Stumps—trees with only one split—as the weak learners.
48. How does 'Averaging' differ from 'Voting' in ensembles?
A.Averaging is for regression; Voting is for classification.
B.Averaging is for classification; Voting is for regression.
C.They are exactly the same.
D.Averaging is used only in Boosting.
Correct Answer: Averaging is for regression; Voting is for classification.
Explanation:Generally, Voting (counting discrete class labels) is used for classification tasks, while Averaging (taking the mean of numerical predictions) is used for regression tasks.
49. Which equation represents the Gini Index for a node with class probabilities $p_i$?
A.$1 - \sum_i p_i^2$
B.$\sum_i p_i^2 - 1$
C.$-\sum_i p_i \log_2 p_i$
D.$1 - \sum_i p_i$
Correct Answer: $1 - \sum_i p_i^2$
Explanation:The Gini Index is calculated by subtracting the sum of the squared probabilities of each class from one.
50. What is the primary benefit of Stacking over Voting/Averaging?
A.It is faster to train.
B.It learns the optimal combination of base models rather than assuming equal or fixed weights.
C.It requires fewer base models.
D.It does not require a validation set.
Correct Answer: It learns the optimal combination of base models rather than assuming equal or fixed weights.
Explanation:Stacking uses a meta-learner (second-level model) to learn the best way to combine predictions, effectively correcting biases that simple voting or averaging might miss.