1. Which of the following algorithms is categorized as a 'Lazy Learner'?
A. K-Nearest Neighbors
B. Support Vector Machines
C. Naïve Bayes
D. Decision Trees
Correct Answer: K-Nearest Neighbors
Explanation:
K-Nearest Neighbors (k-NN) is a lazy learner because it does not build a model during the training phase; instead, it stores the training data and performs computation only when a prediction is requested.
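The lazy-learning idea can be illustrated with a minimal from-scratch sketch in Python. The class and method names and the toy data are hypothetical, chosen for illustration; libraries such as scikit-learn provide production implementations. Note that `fit` only stores the data and every call to `predict` scans all of it.

```python
import math
from collections import Counter


class KNNClassifier:
    """Minimal k-NN: "training" only stores the data (lazy learning)."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # No model is built here; the training set is simply kept in memory.
        self.X = list(X)
        self.y = list(y)
        return self

    def predict_one(self, x):
        # All the work happens at prediction time: distance to every stored point.
        dists = sorted(
            ((math.dist(x, xi), yi) for xi, yi in zip(self.X, self.y)),
            key=lambda pair: pair[0],
        )
        nearest = [label for _, label in dists[: self.k]]
        # Majority vote among the k nearest neighbours.
        return Counter(nearest).most_common(1)[0][0]

    def predict(self, X):
        return [self.predict_one(x) for x in X]


X_train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y_train = ["a", "a", "a", "b", "b", "b"]
model = KNNClassifier(k=3).fit(X_train, y_train)
print(model.predict([(0.5, 0.5), (5.5, 5.5)]))  # ['a', 'b']
```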
2. In K-Nearest Neighbors, what is the likely effect of choosing a very small value for 'k' (e.g., k=1)?
A. The model becomes too simple
B. The model ignores local patterns
C. High variance and overfitting
D. High bias and low variance
Correct Answer: High variance and overfitting
Explanation:
A very small 'k' makes the model extremely sensitive to noise in the training data, leading to complex decision boundaries and overfitting (high variance).
3. Which distance metric is most commonly used in k-NN for continuous numerical variables?
A. Euclidean distance
B. Cosine similarity
C. Hamming distance
D. Jaccard similarity
Correct Answer: Euclidean distance
Explanation:
Euclidean distance is the standard metric used to calculate the straight-line distance between two points in Euclidean space for continuous numerical data.
4. Why is feature scaling (normalization/standardization) important in k-NN?
A. To convert categorical data to numerical
B. To increase the value of k
C. To prevent features with larger scales from dominating the distance calculation
D. To reduce the size of the dataset
Correct Answer: To prevent features with larger scales from dominating the distance calculation
Explanation:
Since k-NN relies on distance calculations, features with large magnitudes will outweigh features with smaller magnitudes unless they are scaled to a comparable range.
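A small worked example with made-up (age, income) values shows the effect. Before scaling, the income column alone decides which point is nearest; after min-max scaling each column to [0, 1], the large age difference counts too. The helper name `min_max_scale` is hypothetical, used only for this sketch.

```python
import math


def min_max_scale(rows):
    """Rescale every column to [0, 1] using that column's min and max."""
    cols = list(zip(*rows))
    lows = [min(col) for col in cols]
    highs = [max(col) for col in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


# Hypothetical rows: (age in years, annual income in dollars)
data = [(25, 50_000), (65, 51_000), (27, 54_000), (45, 90_000)]
a, b, c, _ = data
print(math.dist(a, b), math.dist(a, c))    # ~1000.8 vs ~4000.0: b looks closer to a

sa, sb, sc, _ = min_max_scale(data)
print(math.dist(sa, sb), math.dist(sa, sc))  # ~1.000 vs ~0.112: c is now closer to a
```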
5. The 'Naïve' in Naïve Bayes refers to which fundamental assumption?
A. All features are conditionally independent given the class
B. The prior probabilities are equal
C. The algorithm is simple to implement
D. All features are dependent on each other
Correct Answer: All features are conditionally independent given the class
Explanation:
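The algorithm assumes that, once the class is known, the value of one feature tells us nothing about the value of any other feature. This lets the class-conditional likelihood be written as a product of per-feature probabilities, which greatly simplifies the calculation of posterior probabilities.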
6. Naïve Bayes is based on which mathematical theorem?
A. Central Limit Theorem
B. Taylor's Theorem
C. Bayes' Theorem
D. Pythagorean Theorem
Correct Answer: Bayes' Theorem
Explanation:
Naïve Bayes classifiers apply Bayes' theorem to calculate the probability of a class given a set of feature values.
7. What is the purpose of Laplace Smoothing in Naïve Bayes?
A. To handle the problem of zero probability for unseen features
B. To normalize the data
C. To reduce the dimensionality
D. To handle missing values
Correct Answer: To handle the problem of zero probability for unseen features
Explanation:
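If a feature value never occurs with a specific class in the training set, its estimated conditional probability is zero, and because Naïve Bayes multiplies the per-feature probabilities, that single zero wipes out the entire product for that class. Laplace smoothing adds a small count (usually 1) to every frequency count before the probabilities are computed, so no estimate is ever exactly zero.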
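A minimal sketch of a categorical Naïve Bayes classifier with add-alpha (Laplace) smoothing, assuming a tiny made-up weather dataset; all function names are hypothetical. It applies Bayes' theorem (prior times likelihoods, then normalization) under the naïve independence assumption. With `alpha=0` the unseen pair ('rainy', class 'no') gets probability 0; with `alpha=1` it gets a small non-zero value.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels, alpha=1.0):
    """Collect the counts needed for categorical Naive Bayes."""
    class_counts = Counter(labels)
    counts = defaultdict(lambda: defaultdict(Counter))  # counts[c][j][v]
    values = defaultdict(set)  # distinct values seen for each feature j
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            counts[c][j][v] += 1
            values[j].add(v)
    return class_counts, counts, values, alpha


def likelihood(model, c, j, v):
    """Smoothed P(feature j = v | class c)."""
    class_counts, counts, values, alpha = model
    # Adding alpha to every count keeps unseen (value, class) pairs above zero.
    return (counts[c][j][v] + alpha) / (class_counts[c] + alpha * len(values[j]))


def posterior(model, row):
    """P(class | row) via Bayes' theorem with the naive independence assumption."""
    class_counts = model[0]
    n = sum(class_counts.values())
    scores = {}
    for c, nc in class_counts.items():
        p = nc / n  # prior P(c)
        for j, v in enumerate(row):
            p *= likelihood(model, c, j, v)  # multiply per-feature likelihoods
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}  # normalise


rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool"), ("sunny", "cool")]
labels = ["no", "no", "yes", "yes", "yes"]
print(likelihood(train_nb(rows, labels, alpha=0.0), "no", 0, "rainy"))  # 0.0
print(likelihood(train_nb(rows, labels, alpha=1.0), "no", 0, "rainy"))  # 0.25
print(posterior(train_nb(rows, labels), ("rainy", "hot")))
```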
8. Which strategy is primarily used to build Decision Trees?
A. Lazy learning
B. Divide and Conquer
C. Gradient Descent
D. Backpropagation
Correct Answer: Divide and Conquer
Explanation:
Decision trees use a recursive divide-and-conquer strategy (recursive partitioning) to split the data into subsets based on feature values.
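The divide-and-conquer recursion can be sketched in a few lines of Python for categorical features. To keep it short, this sketch picks each split by counting the training errors left after splitting (each child predicts its majority class); that criterion is a simplification chosen for illustration, whereas ID3, C4.5 and CART use information gain or the Gini index. Function names and data are hypothetical.

```python
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def split_errors(rows, labels, j):
    """Training errors left if we split on feature j and each child predicts its majority."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[j], []).append(y)
    return sum(len(g) - Counter(g).most_common(1)[0][1] for g in groups.values())


def build_tree(rows, labels, features):
    """Returns a class label (leaf) or a (feature, {value: subtree}) pair."""
    # Stop when the node is pure or no features are left: make a leaf.
    if len(set(labels)) == 1 or not features:
        return majority(labels)
    # Divide: choose the feature whose split leaves the fewest errors.
    best = min(features, key=lambda j: split_errors(rows, labels, j))
    remaining = [j for j in features if j != best]
    children = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        # Conquer: recurse on each subset independently.
        children[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return (best, children)


def predict(tree, row):
    # Follow branches until a leaf (a plain label) is reached.
    while isinstance(tree, tuple):
        feature, children = tree
        tree = children[row[feature]]  # KeyError for a value never seen in training
    return tree


rows = [("sunny", "high"), ("sunny", "normal"), ("rainy", "high"), ("rainy", "normal")]
labels = ["no", "yes", "no", "yes"]
tree = build_tree(rows, labels, features=[0, 1])  # splits on feature 1 (humidity)
print(predict(tree, ("rainy", "normal")))  # yes
```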
9. In a Decision Tree, what does a leaf node represent?
A. The entropy value
B. The root of the tree
C. A class label or decision
D. A feature to split on
Correct Answer: A class label or decision
Explanation:
Leaf nodes (terminal nodes) represent the final outcome, class label, or decision value after traversing the tree rules.
10. Which metric is commonly used to measure impurity in Decision Trees?
A. R-squared
B. Gini Index
C. Euclidean distance
D. Correlation Coefficient
Correct Answer: Gini Index
Explanation:
The Gini Index (or Entropy) is used to measure the impurity or disorder of a set of samples to determine the best split.
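Both impurity measures are short formulas over the class proportions p_k in a node. A minimal Python sketch (function names are illustrative): a 50/50 node is maximally impure for two classes, and a pure node scores 0 on both measures.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 (0 for a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: sum_k p_k * log2(1 / p_k) (0 for a pure node)."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


print(gini(["a", "a", "b", "b"]), entropy(["a", "a", "b", "b"]))  # 0.5 1.0
print(gini(["a", "a", "a", "a"]), entropy(["a", "a", "a", "a"]))  # 0.0 0.0
```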
11. The process of removing branches from a decision tree to prevent overfitting is called:
A. Scaling
B. Pruning
C. Regularization
D. Boosting
Correct Answer: Pruning
Explanation:
Pruning reduces the size of decision trees by removing sections of the tree that provide little power to classify instances, thereby reducing complexity and overfitting.
12. Which concept represents the expected reduction in entropy caused by partitioning the examples according to an attribute?
A. Maximum Margin
B. Information Gain
C. Log Loss
D. Gini Impurity
Correct Answer: Information Gain
Explanation:
Information Gain is the entropy of the parent node minus the size-weighted average entropy of the child nodes produced by splitting on an attribute. The attribute with the highest information gain is chosen as the best split.
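The definition translates directly into code. A minimal Python sketch with hypothetical labels and attribute values: a split that separates the classes perfectly recovers the full parent entropy (1 bit), while a split that leaves each child 50/50 gains nothing.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(labels, attribute_values):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(labels)
    groups = {}
    for value, y in zip(attribute_values, labels):
        groups.setdefault(value, []).append(y)
    children = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - children


labels = ["yes", "yes", "no", "no"]
print(information_gain(labels, ["x", "x", "y", "y"]))  # 1.0: a perfect split
print(information_gain(labels, ["x", "y", "x", "y"]))  # 0.0: the split tells us nothing
```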
13. In rule-based classification, what does 'coverage' refer to?
A. The number of instances that satisfy the rule's condition
B. The number of features used
C. The complexity of the rule
D. The accuracy of the rule
Correct Answer: The number of instances that satisfy the rule's condition
Explanation:
Coverage is the fraction or number of instances in the dataset that trigger the rule (i.e., satisfy the antecedent conditions).
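A minimal Python sketch, assuming a hypothetical rule and made-up data, that computes coverage (as a fraction) alongside accuracy, which is measured only over the covered instances. This keeps the two distractor-prone notions apart.

```python
def rule_stats(rows, labels, condition, predicted_class):
    """Coverage and accuracy of the rule IF condition(row) THEN predicted_class."""
    covered = [y for row, y in zip(rows, labels) if condition(row)]
    coverage = len(covered) / len(rows)
    # Accuracy only counts the instances the rule actually fires on.
    accuracy = covered.count(predicted_class) / len(covered) if covered else 0.0
    return coverage, accuracy


# Hypothetical data: (outlook, humidity) -> play?
rows = [("sunny", "high"), ("sunny", "normal"), ("rainy", "high"), ("overcast", "high")]
labels = ["no", "yes", "no", "yes"]
# Rule: IF humidity == "high" THEN play = "no"
cov, acc = rule_stats(rows, labels, lambda r: r[1] == "high", "no")
print(cov, acc)  # coverage 0.75 (3 of 4 rows), accuracy 2/3 of the covered rows
```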
14. The OneR (One Rule) algorithm generates rules based on:
A. The nearest neighbors
B. All attributes simultaneously
C. A random attribute
D. The single most informative attribute
Correct Answer: The single most informative attribute
Explanation:
OneR generates a simple rule set based on only one attribute—the one that yields the minimum error rate on the training data.
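A minimal Python sketch of OneR for categorical attributes, using made-up data; the function name is hypothetical. For each attribute it maps every value to its majority class, counts the resulting training errors, and keeps the attribute with the fewest errors.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Return (attribute index, {value: class}, training errors) for the best single attribute."""
    best = None
    for j in range(len(rows[0])):
        by_value = defaultdict(Counter)
        for row, y in zip(rows, labels):
            by_value[row[j]][y] += 1
        # One rule per attribute value: predict that value's majority class.
        rule = {v: cnt.most_common(1)[0][0] for v, cnt in by_value.items()}
        errors = sum(sum(cnt.values()) - cnt[rule[v]] for v, cnt in by_value.items())
        if best is None or errors < best[2]:
            best = (j, rule, errors)
    return best


rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
print(one_r(rows, labels))  # attribute 0 (outlook) with 0 training errors
```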
15. What is the primary objective of a Support Vector Machine (SVM)?
A. Find a hyperplane that maximizes the margin between classes
B. Create the deepest possible decision tree
C. Maximize the posterior probability
D. Minimize the number of features
Correct Answer: Find a hyperplane that maximizes the margin between classes
Explanation:
SVM aims to find the optimal hyperplane that separates data points of different classes with the maximum distance (margin) between the nearest points of each class.
16. The data points that lie closest to the decision boundary in an SVM are known as:
A. Outliers
B. Support Vectors
C. Noise
D. Centroids
Correct Answer: Support Vectors
Explanation:
Support vectors are the data points closest to the hyperplane. They are critical because they define the position and orientation of the hyperplane.
17. What technique does SVM use to handle non-linearly separable data?
A. Smoothing
B. Bagging
C. Pruning
D. Kernel Trick
Correct Answer: Kernel Trick
Explanation:
The Kernel Trick uses a kernel function to compute inner products as if the data had been mapped into a higher-dimensional space, where it may become linearly separable, without ever explicitly computing the coordinates in that space.
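A small numerical check makes the idea concrete. For 2-D inputs, the degree-2 polynomial kernel K(x, z) = (x · z)² equals an ordinary dot product after the explicit map φ(x) = (x₁², √2·x₁x₂, x₂²), yet the kernel never builds φ. The function names and input values below are illustrative.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2-D input (x1, x2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """Homogeneous polynomial kernel K(x, z) = (x . z)^2, computed in 2-D."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


x, z = (1.0, 2.0), (3.0, -1.0)
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))  # dot product in the 3-D space
implicit = poly2_kernel(x, z)                          # never constructs phi(x) or phi(z)
print(explicit, implicit)  # both equal (x . z)^2 = 1, up to floating-point rounding
```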
18. In SVM, what is the role of the 'C' hyperparameter?
A. It controls the trade-off between maximizing the margin and minimizing classification errors
B. It determines the number of kernels
C. It calculates the Euclidean distance
D. It sets the depth of the tree
Correct Answer: It controls the trade-off between maximizing the margin and minimizing classification errors
Explanation:
A low 'C' tolerates more misclassifications in exchange for a wider margin (a softer margin), while a high 'C' penalizes misclassifications heavily and approaches a hard margin, potentially leading to overfitting.
19. In a Confusion Matrix, what does 'False Positive' (Type I Error) represent?
A. Incorrectly predicting the positive class when it is actually negative
B. Correctly predicting the positive class
C. Incorrectly predicting the negative class when it is actually positive
D. Correctly predicting the negative class
Correct Answer: Incorrectly predicting the positive class when it is actually negative
Explanation:
A False Positive occurs when the model predicts the positive class, but the true value is negative (e.g., diagnosing a healthy person as sick).
20. Which formula correctly calculates Accuracy?
A. 2 × (Precision × Recall) / (Precision + Recall)
B. (TP + TN) / (TP + TN + FP + FN)
C. TP / (TP + FP)
D. TP / (TP + FN)
Correct Answer: (TP + TN) / (TP + TN + FP + FN)
Explanation:
Accuracy is the ratio of correctly predicted observations (both True Positives and True Negatives) to the total observations.
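A minimal Python sketch, with made-up binary labels, that counts the four confusion-matrix cells and applies the accuracy formula. The helper names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)  # Type I error
    fn = sum(t == positive and p != positive for t, p in pairs)  # Type II error
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)              # (2, 4, 1, 1)
print(accuracy(*counts))   # 0.75
```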
21. Accuracy is often a misleading metric when:
A. The dataset is imbalanced
B. The model is a decision tree
C. The dataset is small
D. The dataset is perfectly balanced
Correct Answer: The dataset is imbalanced
Explanation:
In imbalanced datasets (e.g., 99% Class A, 1% Class B), a model predicting only Class A achieves 99% accuracy but fails to detect Class B, making accuracy misleading.
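The 99%/1% example can be reproduced directly. This short Python sketch uses hypothetical labels and a "model" that always predicts the majority class: accuracy looks excellent while recall on the minority class is zero.

```python
# Hypothetical imbalanced labels: 99 negatives (class A = 0), 1 positive (class B = 1)
y_true = [0] * 99 + [1]
y_pred = [0] * 100  # always predict the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = tp / (tp + fn)

print(accuracy, recall)  # 0.99 0.0
```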
22. Which metric represents the ratio of correctly predicted positive observations to the total predicted positives?
A. Accuracy
B. Precision
C. Recall
D. Specificity
Correct Answer: Precision
Explanation:
Precision = TP / (TP + FP). It answers the question: 'Of all the instances predicted as positive, how many were actually positive?'
23. Recall is also known as:
A. Specificity
B. Sensitivity
C. F1 Score
D. Precision
Correct Answer: Sensitivity
Explanation:
Recall (TP / (TP + FN)) is synonymous with Sensitivity or True Positive Rate.
24. The F1 Score is the harmonic mean of which two metrics?
A. Precision and Recall
B. Accuracy and Error Rate
C. Sensitivity and Specificity
D. True Positive Rate and False Positive Rate
Correct Answer: Precision and Recall
Explanation:
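The F1 Score combines Precision and Recall into a single metric. Because the harmonic mean is pulled toward the smaller of the two values, a model cannot achieve a high F1 Score by excelling at only one of them.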
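The count-based metrics from this section fit in a few lines of Python. This sketch uses hypothetical counts; the function names are illustrative. The last line shows the harmonic-mean effect: with Precision 1.0 and Recall 0.1, F1 is about 0.18, whereas the arithmetic mean would be 0.55.

```python
def precision(tp, fp):
    return tp / (tp + fp)


def recall(tp, fn):  # also called sensitivity or true positive rate
    return tp / (tp + fn)


def specificity(tn, fp):  # true negative rate
    return tn / (tn + fp)


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


tp, tn, fp, fn = 8, 85, 2, 5  # hypothetical counts
p, r = precision(tp, fp), recall(tp, fn)
print(p, r, f1(p, r), specificity(tn, fp))
print(f1(1.0, 1.0))  # 1.0
print(f1(1.0, 0.1))  # ~0.18: dominated by the weaker metric
```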
25. Which metric would be most important for a spam detection system where it is acceptable to miss some spam, but critical not to delete legitimate emails (high cost of False Positive)?
A. Precision
B. Log Loss
C. Recall
D. Sensitivity
Correct Answer: Precision
Explanation:
High Precision minimizes False Positives. In this scenario, marking a legitimate email as spam (False Positive) is the worst outcome.
26. Which metric would be most important for cancer detection where missing a positive case is dangerous (high cost of False Negative)?
A. Precision
B. Accuracy
C. Specificity
D. Recall
Correct Answer: Recall
Explanation:
High Recall ensures that most actual positive cases are detected, minimizing False Negatives.
27. What does AUC stand for in the context of model evaluation?
A. Accuracy Under Classification
B. Area Under the Curve
C. Algorithm User Context
D. Average Unit Cost
Correct Answer: Area Under the Curve
Explanation:
AUC refers to the Area Under the ROC (Receiver Operating Characteristic) Curve.
28. The ROC curve plots which two metrics against each other?
A. Sensitivity vs Specificity
B. Precision vs Recall
C. Accuracy vs Loss
D. True Positive Rate vs False Positive Rate
Correct Answer: True Positive Rate vs False Positive Rate
Explanation:
The ROC curve shows the trade-off between the True Positive Rate (Recall) on the y-axis and the False Positive Rate on the x-axis at various threshold settings.
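A minimal Python sketch, with made-up labels and scores, that sweeps the threshold over every distinct score to trace (FPR, TPR) points and then integrates them with the trapezoidal rule to get the AUC. Function names are illustrative. A model that gives every example the same score produces only the diagonal, so its AUC is 0.5.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs from sweeping the threshold over every distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
pts = roc_points(y_true, scores)
print(auc(pts))                                     # ~0.889 (= 8/9)
print(auc(roc_points([1, 0, 1, 0], [0.5] * 4)))     # 0.5: no discrimination
```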
29. An AUC score of 0.5 indicates:
A. A model with high precision
B. A model that predicts randomly
C. A perfect model
D. A model with zero error
Correct Answer: A model that predicts randomly
Explanation:
An AUC of 0.5 suggests the model has no discriminative ability: it ranks positive instances above negative ones no more often than random guessing (a coin toss) would. A perfect model has an AUC of 1.0.
30. Logarithmic Loss (Log Loss) penalizes a classifier based on:
A. The confidence of the predicted probabilities
B. The number of misclassifications only
C. The depth of the tree
D. The number of support vectors
Correct Answer: The confidence of the predicted probabilities
Explanation:
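Log Loss is computed from the probability the model assigns to the true label: each prediction costs -log(p), where p is the predicted probability of the correct class. The penalty is small for confident correct predictions and grows sharply for confident wrong ones.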
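A minimal Python sketch of binary Log Loss using the natural logarithm; the function name and the clipping constant `eps` are illustrative choices. The three single-example calls show the penalty for a confident correct, a hesitant correct, and a confident wrong prediction.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy: mean of -[y*log(p) + (1-y)*log(1-p)]."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(log_loss([1], [0.9]))  # ~0.105: confident and right
print(log_loss([1], [0.6]))  # ~0.511: right but hesitant
print(log_loss([1], [0.1]))  # ~2.303: confident and wrong
```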
31. What is the ideal value for Logarithmic Loss?
A. 0.5
B. 0
C. 100
D. 1
Correct Answer: 0
Explanation:
Log Loss measures error, so lower is better. A value of 0 represents a perfect model.
32. Which of the following is a disadvantage of Decision Trees?
A. Requires feature scaling
B. Difficult to interpret
C. Prone to overfitting if not pruned
D. Cannot handle categorical data
Correct Answer: Prone to overfitting if not pruned
Explanation:
Decision trees can become very complex and memorize noise in the training data (overfitting) if constraints or pruning are not applied.
33. Which algorithm is generally considered a 'Black Box' model due to low interpretability?
A. Rules (RIPPER)
B. Linear Regression
C. Decision Trees
D. Support Vector Machines (with RBF kernel)
Correct Answer: Support Vector Machines (with RBF kernel)
Explanation:
SVMs, especially with non-linear kernels, operate in high-dimensional spaces that are difficult for humans to visualize or interpret compared to logic-based trees or rules.
34. In Naïve Bayes, what is the 'Posterior Probability'?
A. The probability of the evidence regardless of class
B. The probability of the class given the evidence
C. The probability of the evidence given the class
D. The initial probability of the class
Correct Answer: The probability of the class given the evidence
Explanation:
The posterior probability P(Class|Features) is the updated probability of the class after observing the feature data.
35. Which classification algorithm is parametric?
A. None of the above
B. Decision Trees
C. Naïve Bayes
D. K-Nearest Neighbors
Correct Answer: Naïve Bayes
Explanation:
Naïve Bayes is a parametric algorithm because it summarizes data with a fixed set of parameters (means and variances for Gaussian NB, or probabilities) rather than storing the data itself.
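A minimal Gaussian Naïve Bayes sketch in Python illustrates the point: fitting reduces each class to a prior plus a per-feature mean and (population) variance, after which the training rows are no longer needed. The function names, the toy data, and the small variance floor `1e-9` are assumptions for this illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Summarise each class by its prior and per-feature mean and variance."""
    groups = defaultdict(list)
    for row, c in zip(X, y):
        groups[c].append(row)
    params = {}
    for c, rows in groups.items():
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        # Population variance plus a small floor so we never divide by zero.
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(cols, means)]
        params[c] = (len(rows) / len(X), means, variances)
    return params  # a fixed set of numbers; the training data can be discarded


def predict(params, x):
    """Pick the class with the highest log prior + sum of Gaussian log-likelihoods."""
    def log_joint(c):
        prior, means, variances = params[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances))
    return max(params, key=log_joint)


X = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (5.0, 6.0), (5.2, 5.8), (4.8, 6.2)]
y = [0, 0, 0, 1, 1, 1]
params = fit_gaussian_nb(X, y)
print(predict(params, (1.1, 2.0)), predict(params, (5.1, 6.1)))  # 0 1
```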
36. What is the 'Hinge Loss' function associated with?
A. K-Means
B. Decision Trees
C. Logistic Regression
D. Support Vector Machines
Correct Answer: Support Vector Machines
Explanation:
Hinge loss is the loss function used for training classifiers like SVMs, penalizing predictions that are on the wrong side of the margin.
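A minimal Python sketch of the hinge loss for labels in {-1, +1}, together with the primal soft-margin linear SVM objective 0.5·‖w‖² + C·Σ hinge, which also shows how the 'C' hyperparameter weights margin violations against margin width. Function names, weights and data are hypothetical; this evaluates the objective but does not train an SVM.

```python
def hinge_loss(y, score):
    """Hinge loss for a label y in {-1, +1} and a raw score w.x + b."""
    return max(0.0, 1.0 - y * score)


def soft_margin_objective(w, b, X, y, C):
    """Primal linear SVM objective: 0.5*||w||^2 + C * (sum of hinge losses)."""
    reg = 0.5 * sum(wi * wi for wi in w)
    losses = sum(hinge_loss(yi, sum(wj * xj for wj, xj in zip(w, xi)) + b)
                 for xi, yi in zip(X, y))
    return reg + C * losses


print(hinge_loss(+1, 2.0))   # 0.0: correct side, outside the margin
print(hinge_loss(+1, 0.5))   # 0.5: correct side but inside the margin
print(hinge_loss(+1, -1.0))  # 2.0: wrong side of the boundary

# A larger C makes the same margin violation cost more.
X, y = [(2.0, 0.0), (-0.5, 1.0)], [1, -1]
print(soft_margin_objective([1.0, 0.0], 0.0, X, y, C=1.0))   # 1.0
print(soft_margin_objective([1.0, 0.0], 0.0, X, y, C=10.0))  # 5.5
```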
37. What happens to the computational cost of k-NN during the prediction phase as the dataset size grows?
A. It increases significantly
B. It remains constant
C. It decreases
D. It becomes zero
Correct Answer: It increases significantly
Explanation:
Because k-NN is a lazy learner, a basic (brute-force) implementation computes the distance from each new query to every training example, so prediction cost grows with the size of the training set and becomes slow for large datasets.
38. Entropy in Information Theory is a measure of:
A. Accuracy
B. Disorder or Uncertainty
C. Distance
D. Margin width
Correct Answer: Disorder or Uncertainty
Explanation:
Entropy quantifies the amount of uncertainty or impurity in a dataset. A completely homogeneous set has an entropy of 0.
39. RIPPER (Repeated Incremental Pruning to Produce Error Reduction) is an algorithm used for:
A. Clustering
B. Rule Induction
C. Regression
D. Dimensionality Reduction
Correct Answer: Rule Induction
Explanation:
RIPPER is a popular algorithm for generating a set of IF-THEN rules to classify data.
40. Which of the following describes a 'False Negative' (Type II Error)?
A. Predicting Positive when actually Positive
B. Predicting Positive when actually Negative
C. Predicting Negative when actually Negative
D. Predicting Negative when actually Positive
Correct Answer: Predicting Negative when actually Positive
Explanation:
A False Negative misses a positive instance, classifying it as negative (e.g., failing to detect a fire).
41. If Precision = 1.0 and Recall = 1.0, what is the F1 Score?
A. 0.5
B. 0
C. 1.0
D. 2.0
Correct Answer: 1.0
Explanation:
F1 Score is the harmonic mean of Precision and Recall. If both inputs are 1, the mean is 1: (2 × 1 × 1) / (1 + 1) = 1.
42. Which evaluation metric calculates the proportion of actual negatives that are correctly identified?
A. Recall
B. Specificity
C. Sensitivity
D. Precision
Correct Answer: Specificity
Explanation:
Specificity = TN / (TN + FP). It measures the model's ability to identify negative instances.
43. In a decision tree, if a node contains only samples from a single class, its entropy is:
A. Infinite
B. 0.5
C. 0
D. 1
Correct Answer: 0
Explanation:
If a node is pure (all samples belong to one class), there is no disorder, so Entropy is 0.
44. Which kernel is the default for non-linear SVMs in many libraries?
A. Polynomial
B. Radial Basis Function (RBF)
C. Sigmoid
D. Linear
Correct Answer: Radial Basis Function (RBF)
Explanation:
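RBF is the default kernel in widely used SVM libraries such as LIBSVM and scikit-learn's SVC. It corresponds to an implicit infinite-dimensional feature space, which lets it model very flexible non-linear boundaries while adding only one kernel parameter (gamma).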
45. Generative models like Naïve Bayes model:
A. The distribution of individual classes (Joint probability)
B. The error gradients
C. The boundary between classes directly
D. The distance between points
Correct Answer: The distribution of individual classes (Joint probability)
Explanation:
Generative models learn how the data is generated (P(X|Y) and P(Y)) to estimate P(Y|X), whereas discriminative models learn the boundary directly.
46. Recursive Partitioning is a technique synonymous with:
A. Building Decision Trees
B. Calculating Bayes probabilities
C. Optimizing SVM margins
D. Calculating k-NN distances
Correct Answer: Building Decision Trees
Explanation:
Recursive partitioning splits the data into smaller subsets repeatedly, which is the process used to construct decision trees.
47. When interpreting a Confusion Matrix for a multi-class problem (e.g., 3 classes), the matrix dimensions are:
A. 2x2
B. 3x1
C. 1x3
D. 3x3
Correct Answer: 3x3
Explanation:
A confusion matrix is N × N, where N is the number of classes: one axis lists the actual classes and the other the predicted classes.
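A minimal Python sketch that builds the N × N matrix for a made-up three-class problem. This sketch puts actual classes on the rows and predicted classes on the columns (the same layout scikit-learn uses); the function name and labels are illustrative.

```python
def confusion_matrix(y_true, y_pred, classes):
    """N x N counts: rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        m[index[t]][index[p]] += 1
    return m


classes = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "dog", "dog", "bird", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "cat"]
for row in confusion_matrix(y_true, y_pred, classes):
    print(row)
# [1, 1, 0]
# [0, 2, 0]
# [1, 0, 1]
```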
48. Which algorithm is most sensitive to outliers?
A. Naïve Bayes
B. Rules
C. K-Nearest Neighbors
D. Decision Trees
Correct Answer: K-Nearest Neighbors
Explanation:
Since k-NN relies on local distance to neighbors, a single outlier can significantly skew predictions if 'k' is small.
49. What is the relationship between Error Rate and Accuracy?
A. Error Rate = Accuracy / 2
B. Error Rate = 1 + Accuracy
C. Error Rate = Accuracy
D. Error Rate = 1 - Accuracy
Correct Answer: Error Rate = 1 - Accuracy
Explanation:
They are complementary. If Accuracy is 90% (0.9), the Error Rate is 10% (0.1).
50. The 'Zero Frequency' problem in Naïve Bayes is solved using:
A. Feature Scaling
B. Laplace Smoothing
C. Kernel Trick
D. Pruning
Correct Answer: Laplace Smoothing
Explanation:
Laplace smoothing adds a small count (usually 1) to every frequency count so that no estimated probability is ever exactly zero.