1. Which of the following algorithms is categorized as a 'Lazy Learner'?
A. Decision Trees
B. Naïve Bayes
C. K-Nearest Neighbors
D. Support Vector Machines
Correct Answer: K-Nearest Neighbors
Explanation: K-Nearest Neighbors (k-NN) is a lazy learner because it does not build a model during the training phase; instead, it stores the training data and performs computation only when a prediction is requested.
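For illustration, a minimal sketch of the lazy-learning idea in Python (the LazyKNN class below is hypothetical, not from any library): "fitting" only memorizes the data, and every distance computation is deferred to prediction time.

```python
import numpy as np

class LazyKNN:
    """Minimal 1-NN sketch: training stores the data, nothing more."""

    def fit(self, X, y):
        # Lazy learning: no model is built here; the data is memorized.
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        return self

    def predict(self, x):
        # All the work happens now: distance from x to every stored point.
        dists = np.linalg.norm(self.X - np.asarray(x, dtype=float), axis=1)
        return self.y[np.argmin(dists)]

clf = LazyKNN().fit([[0, 0], [1, 1], [5, 5]], ["a", "a", "b"])
print(clf.predict([4, 4]))  # 'b' -- the nearest stored point is (5, 5)
```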
2. In K-Nearest Neighbors, what is the likely effect of choosing a very small value for 'k' (e.g., k=1)?
A. High bias and low variance
B. The model becomes too simple
C. High variance and overfitting
D. The model ignores local patterns
Correct Answer: High variance and overfitting
Explanation: A very small 'k' makes the model extremely sensitive to noise in the training data, leading to complex decision boundaries and overfitting (high variance).
3. Which distance metric is most commonly used in k-NN for continuous numerical variables?
A. Hamming distance
B. Euclidean distance
C. Jaccard similarity
D. Cosine similarity
Correct Answer: Euclidean distance
Explanation: Euclidean distance is the standard metric used to calculate the straight-line distance between two points in Euclidean space for continuous numerical data.
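A quick worked example of the formula d(a, b) = sqrt(Σ(aᵢ − bᵢ)²), as a Python sketch:

```python
import math

def euclidean(a, b):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Points (1, 2) and (4, 6): sqrt(3^2 + 4^2) = 5.0
print(euclidean((1, 2), (4, 6)))
```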
4. Why is feature scaling (normalization/standardization) important in k-NN?
A. To increase the value of k
B. To prevent features with larger scales from dominating the distance calculation
C. To convert categorical data to numerical
D. To reduce the size of the dataset
Correct Answer: To prevent features with larger scales from dominating the distance calculation
Explanation: Since k-NN relies on distance calculations, features with large magnitudes will outweigh features with smaller magnitudes unless they are scaled to a comparable range.
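A small sketch of the effect, using made-up income/age values (the numbers are illustrative only):

```python
import numpy as np

# Two features on very different scales: income (~tens of thousands) and age.
X = np.array([[50_000.0, 25.0],
              [51_000.0, 60.0],
              [90_000.0, 26.0]])

# Z-score standardization: (x - mean) / std, computed per feature.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# Distances from row 0 to the others, before and after scaling.
for data, label in ((X, "raw   "), (X_scaled, "scaled")):
    print(label, np.linalg.norm(data - data[0], axis=1).round(2))
# Raw distances are driven almost entirely by income; after scaling,
# the 35-year age difference in row 1 is no longer drowned out.
```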
5. The 'Naïve' in Naïve Bayes refers to which fundamental assumption?
A. The algorithm is simple to implement
B. All features are dependent on each other
C. All features are conditionally independent given the class
D. The prior probabilities are equal
Correct Answer: All features are conditionally independent given the class
Explanation: The algorithm assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature, simplifying the calculation of posterior probabilities.
6. Naïve Bayes is based on which mathematical theorem?
A. Pythagorean Theorem
B. Central Limit Theorem
C. Bayes' Theorem
D. Taylor's Theorem
Correct Answer: Bayes' Theorem
Explanation: Naïve Bayes classifiers apply Bayes' theorem to calculate the probability of a class given a set of feature values.
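A small worked example of the theorem, P(class | evidence) = P(evidence | class) × P(class) / P(evidence), using made-up spam-filter numbers:

```python
# Illustrative priors/likelihoods (invented for the example):
p_spam = 0.2                # 20% of all mail is spam
p_word_given_spam = 0.6     # the word "free" appears in 60% of spam
p_word_given_ham = 0.05     # ...and in 5% of legitimate mail

# Total probability of seeing the word (law of total probability).
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: posterior probability of spam given the word.
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))  # 0.75 -- the word raises P(spam) from 0.2 to 0.75
```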
7. What is the purpose of Laplace Smoothing in Naïve Bayes?
A. To handle missing values
B. To handle the problem of zero probability for unseen features
C. To normalize the data
D. To reduce the dimensionality
Correct Answer: To handle the problem of zero probability for unseen features
Explanation: If a feature value never occurs with a specific class in the training set, its estimated probability becomes zero, wiping out the entire product of probabilities. Laplace smoothing adds a small count to every frequency to prevent this.
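A minimal sketch of add-one smoothing (the counts below are invented):

```python
def smoothed_prob(count, class_total, n_values, alpha=1):
    # Laplace (add-alpha) smoothing: every feature value receives a
    # pseudo-count of alpha, so an unseen value gets a small non-zero
    # probability instead of zeroing out the whole Naive Bayes product.
    return (count + alpha) / (class_total + alpha * n_values)

# A word never seen in 100 spam emails, with a 50-word vocabulary:
print(smoothed_prob(0, 100, 50))   # ~0.0067 rather than 0
print(smoothed_prob(40, 100, 50))  # ~0.273 (unsmoothed: 0.4)
```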
8. Which strategy is primarily used to build Decision Trees?
A. Lazy learning
B. Divide and Conquer
C. Backpropagation
D. Gradient Descent
Correct Answer: Divide and Conquer
Explanation: Decision trees use a recursive divide-and-conquer strategy (recursive partitioning) to split the data into subsets based on feature values.
9. In a Decision Tree, what does a leaf node represent?
A. A feature to split on
B. The root of the tree
C. A class label or decision
D. The entropy value
Correct Answer: A class label or decision
Explanation: Leaf nodes (terminal nodes) represent the final outcome, class label, or decision value after traversing the tree rules.
10. Which metric is commonly used to measure impurity in Decision Trees?
A. Euclidean distance
B. Gini Index
C. Correlation Coefficient
D. R-squared
Correct Answer: Gini Index
Explanation: The Gini Index (or Entropy) is used to measure the impurity or disorder of a set of samples to determine the best split.
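The Gini Index is simple to compute by hand: Gini = 1 − Σ pᵢ², where pᵢ is the proportion of class i at the node. A short sketch:

```python
def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

print(gini(["yes"] * 10))              # 0.0  -- pure node
print(gini(["yes"] * 5 + ["no"] * 5))  # 0.5  -- maximally mixed (two classes)
print(gini(["yes"] * 9 + ["no"] * 1))  # 0.18 -- mostly pure
```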
11. The process of removing branches from a decision tree to prevent overfitting is called:
A. Scaling
B. Regularization
C. Pruning
D. Boosting
Correct Answer: Pruning
Explanation: Pruning reduces the size of decision trees by removing sections of the tree that provide little power to classify instances, thereby reducing complexity and overfitting.
12. Which concept represents the expected reduction in entropy caused by partitioning the examples according to an attribute?
A. Gini Impurity
B. Information Gain
C. Log Loss
D. Maximum Margin
Correct Answer: Information Gain
Explanation: Information Gain measures the drop from the entropy of the parent node to the weighted average entropy of the subsets produced by a split; the attribute with the highest gain is selected as the best split attribute.
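A sketch of the computation, Gain = H(parent) − Σ (|childᵢ|/|parent|) × H(childᵢ), with invented labels:

```python
import math

def entropy(labels):
    # H = -sum(p * log2(p)) over the class proportions at the node.
    n = len(labels)
    counts = (labels.count(v) for v in set(labels))
    return -sum((c / n) * math.log2(c / n) for c in counts)

def information_gain(parent, children):
    # Parent entropy minus the size-weighted entropy of the children.
    n = len(parent)
    return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

parent = ["yes"] * 5 + ["no"] * 5                        # H = 1.0 bit
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]  # two 4:1 subsets
print(round(information_gain(parent, children), 3))      # ~0.278 bits gained
```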
13. In rule-based classification, what does 'coverage' refer to?
A. The accuracy of the rule
B. The number of instances that satisfy the rule's condition
C. The complexity of the rule
D. The number of features used
Correct Answer: The number of instances that satisfy the rule's condition
Explanation: Coverage is the fraction or number of instances in the dataset that trigger the rule (i.e., satisfy the antecedent conditions).
14. The OneR (One Rule) algorithm generates rules based on:
A. All attributes simultaneously
B. The single most informative attribute
C. A random attribute
D. The nearest neighbors
Correct Answer: The single most informative attribute
Explanation: OneR generates a simple rule set based on only one attribute: the one that yields the minimum error rate on the training data.
15. What is the primary objective of a Support Vector Machine (SVM)?
A. Minimize the number of features
B. Maximize the posterior probability
C. Find a hyperplane that maximizes the margin between classes
D. Create the deepest possible decision tree
Correct Answer: Find a hyperplane that maximizes the margin between classes
Explanation: SVM aims to find the optimal hyperplane that separates data points of different classes with the maximum distance (margin) between the nearest points of each class.
16. The data points that lie closest to the decision boundary in an SVM are known as:
A. Outliers
B. Support Vectors
C. Centroids
D. Noise
Correct Answer: Support Vectors
Explanation: Support vectors are the data points closest to the hyperplane. They are critical because they define the position and orientation of the hyperplane.
17. What technique does SVM use to handle non-linearly separable data?
A. Pruning
B. Kernel Trick
C. Smoothing
D. Bagging
Correct Answer: Kernel Trick
Explanation: The Kernel Trick projects data into a higher-dimensional space where it becomes linearly separable, without explicitly computing the coordinates in that space.
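To see why no explicit projection is needed: a kernel returns the inner product of two points as if they had been mapped into the higher-dimensional space. A sketch of the RBF kernel:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # K(a, b) = exp(-gamma * ||a - b||^2): the inner product of a and b in
    # an implicit infinite-dimensional feature space, computed directly
    # from the original coordinates -- that shortcut is the "trick".
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))

print(rbf_kernel([0, 0], [0, 0]))  # 1.0      -- identical points
print(rbf_kernel([0, 0], [3, 4]))  # ~1.4e-11 -- distant points, similarity ~0
```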
18. In SVM, what is the role of the 'C' hyperparameter?
A. It determines the number of kernels
B. It controls the trade-off between maximizing the margin and minimizing classification errors
C. It sets the depth of the tree
D. It calculates the Euclidean distance
Correct Answer: It controls the trade-off between maximizing the margin and minimizing classification errors
Explanation: A low 'C' allows more misclassifications (soft margin) for a wider margin, while a high 'C' penalizes misclassifications heavily (hard margin), potentially leading to overfitting.
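A sketch of the effect using scikit-learn (assuming it is installed; the dataset is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Noisy two-feature data (flip_y injects 10% label noise).
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, flip_y=0.1, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=C).fit(X, y)
    # Training accuracy typically climbs with C: large C penalizes every
    # margin violation and fits the noise; small C keeps a wider margin.
    print(f"C={C:>6}: train accuracy = {clf.score(X, y):.3f}")
```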
19. In a Confusion Matrix, what does 'False Positive' (Type I Error) represent?
A. Correctly predicting the positive class
B. Correctly predicting the negative class
C. Incorrectly predicting the positive class when it is actually negative
D. Incorrectly predicting the negative class when it is actually positive
Correct Answer: Incorrectly predicting the positive class when it is actually negative
Explanation: A False Positive occurs when the model predicts the positive class, but the true value is negative (e.g., diagnosing a healthy person as sick).
20. Which formula correctly calculates Accuracy?
A. (TP + TN) / (TP + TN + FP + FN)
B. TP / (TP + FP)
C. TP / (TP + FN)
D. 2 × (Precision × Recall) / (Precision + Recall)
Correct Answer: (TP + TN) / (TP + TN + FP + FN)
Explanation: Accuracy is the ratio of correctly predicted observations (both True Positives and True Negatives) to the total observations.
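A one-line helper with invented counts:

```python
def accuracy(tp, tn, fp, fn):
    # Fraction of all predictions that were correct.
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(tp=50, tn=40, fp=5, fn=5))  # 0.9
```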
21. Accuracy is often a misleading metric when:
A. The dataset is small
B. The dataset is imbalanced
C. The dataset is perfectly balanced
D. The model is a decision tree
Correct Answer: The dataset is imbalanced
Explanation: In imbalanced datasets (e.g., 99% Class A, 1% Class B), a model predicting only Class A achieves 99% accuracy but fails to detect Class B, making accuracy misleading.
22. Which metric represents the ratio of correctly predicted positive observations to the total predicted positives?
A. Recall
B. Precision
C. Accuracy
D. Specificity
Correct Answer: Precision
Explanation: Precision = TP / (TP + FP). It answers the question: 'Of all the instances predicted as positive, how many were actually positive?'
23. Recall is also known as:
A. Precision
B. Specificity
C. Sensitivity
D. F1 Score
Correct Answer: Sensitivity
Explanation: Recall (TP / (TP + FN)) is synonymous with Sensitivity or True Positive Rate.
24. The F1 Score is the harmonic mean of which two metrics?
A. Accuracy and Error Rate
B. Precision and Recall
C. Sensitivity and Specificity
D. True Positive Rate and False Positive Rate
Correct Answer: Precision and Recall
Explanation: F1 Score balances Precision and Recall, providing a single metric that penalizes extreme values in either.
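The three formulas side by side, computed from invented confusion-matrix counts:

```python
def precision(tp, fp):
    return tp / (tp + fp)        # of predicted positives, how many were right

def recall(tp, fn):
    return tp / (tp + fn)        # of actual positives, how many were found

def f1(p, r):
    return 2 * p * r / (p + r)   # harmonic mean of precision and recall

p, r = precision(tp=40, fp=10), recall(tp=40, fn=20)
print(round(p, 2), round(r, 2), round(f1(p, r), 2))  # 0.8 0.67 0.73
```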
25. Which metric would be most important for a spam detection system where it is acceptable to miss some spam, but critical not to delete legitimate emails (high cost of False Positive)?
A. Recall
B. Precision
C. Log Loss
D. Sensitivity
Correct Answer: Precision
Explanation: High Precision minimizes False Positives. In this scenario, marking a legitimate email as spam (False Positive) is the worst outcome.
26. Which metric would be most important for cancer detection where missing a positive case is dangerous (high cost of False Negative)?
A. Precision
B. Recall
C. Specificity
D. Accuracy
Correct Answer: Recall
Explanation: High Recall ensures that most actual positive cases are detected, minimizing False Negatives.
27. What does AUC stand for in the context of model evaluation?
A. Area Under the Curve
B. Average Unit Cost
C. Algorithm User Context
D. Accuracy Under Classification
Correct Answer: Area Under the Curve
Explanation: AUC refers to the Area Under the ROC (Receiver Operating Characteristic) Curve.
28. The ROC curve plots which two metrics against each other?
A. Precision vs Recall
B. True Positive Rate vs False Positive Rate
C. Accuracy vs Loss
D. Sensitivity vs Specificity
Correct Answer: True Positive Rate vs False Positive Rate
Explanation: The ROC curve shows the trade-off between the True Positive Rate (Recall) on the y-axis and the False Positive Rate on the x-axis at various threshold settings.
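A sketch with scikit-learn (assuming it is available; labels and scores are invented) showing how the curve is traced by sweeping the decision threshold:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]  # predicted probabilities

# Each threshold yields one (FPR, TPR) point on the curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("FPR:", fpr.round(2))  # x-axis values
print("TPR:", tpr.round(2))  # y-axis values
print("AUC:", round(roc_auc_score(y_true, y_score), 3))
```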
29. An AUC score of 0.5 indicates:
A. A perfect model
B. A model that predicts randomly
C. A model with high precision
D. A model with zero error
Correct Answer: A model that predicts randomly
Explanation: An AUC of 0.5 suggests the model has no discriminative ability, effectively guessing as well as a random coin toss. A perfect model has an AUC of 1.0.
30. Logarithmic Loss (Log Loss) penalizes a classifier based on:
A. The number of misclassifications only
B. The confidence of the predicted probabilities
C. The depth of the tree
D. The number of support vectors
Correct Answer: The confidence of the predicted probabilities
Explanation: Log Loss accounts for the confidence of each prediction by measuring how far the predicted probability is from the actual label. A confident wrong prediction is penalized heavily.
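A minimal implementation of binary log loss, L = −(1/N) Σ [y log p + (1 − y) log(1 − p)]:

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    # Average negative log-likelihood of the true labels.
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

print(round(log_loss([1], [0.9]), 3))  # 0.105 -- confident and correct
print(round(log_loss([1], [0.1]), 3))  # 2.303 -- confident and wrong: heavy penalty
```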
31. What is the ideal value for Logarithmic Loss?
A. 0
B. 0.5
C. 1
D. 100
Correct Answer: 0
Explanation: Log Loss measures error, so lower is better. A value of 0 represents a perfect model.
32. Which of the following is a disadvantage of Decision Trees?
A. Difficult to interpret
B. Requires feature scaling
C. Prone to overfitting if not pruned
D. Cannot handle categorical data
Correct Answer: Prone to overfitting if not pruned
Explanation: Decision trees can become very complex and memorize noise in the training data (overfitting) if constraints or pruning are not applied.
33. Which algorithm is generally considered a 'Black Box' model due to low interpretability?
A. Decision Trees
B. Rules (RIPPER)
C. Support Vector Machines (with RBF kernel)
D. Linear Regression
Correct Answer: Support Vector Machines (with RBF kernel)
Explanation: SVMs, especially with non-linear kernels, operate in high-dimensional spaces that are difficult for humans to visualize or interpret compared to logic-based trees or rules.
34. In Naïve Bayes, what is the 'Posterior Probability'?
A. The probability of the evidence given the class
B. The initial probability of the class
C. The probability of the class given the evidence
D. The probability of the evidence regardless of class
Correct Answer: The probability of the class given the evidence
Explanation: The posterior probability P(Class|Features) is the updated probability of the class after observing the feature data.
35. Which classification algorithm is parametric?
A. K-Nearest Neighbors
B. Decision Trees
C. Naïve Bayes
D. None of the above
Correct Answer: Naïve Bayes
Explanation: Naïve Bayes is a parametric algorithm because it summarizes data with a fixed set of parameters (means and variances for Gaussian NB, or probabilities) rather than storing the data itself.
36. What is the 'Hinge Loss' function associated with?
A. Logistic Regression
B. Decision Trees
C. Support Vector Machines
D. K-Means
Correct Answer: Support Vector Machines
Explanation: Hinge loss is the loss function used for training classifiers like SVMs, penalizing predictions that are on the wrong side of the margin.
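The function itself is one line, L(y, f(x)) = max(0, 1 − y·f(x)) with labels in {−1, +1}:

```python
def hinge_loss(y, score):
    # y in {-1, +1}; score is the raw SVM output w.x + b.
    # Loss is zero only when the point is on the correct side of the
    # margin (y * score >= 1); it grows linearly beyond that.
    return max(0.0, 1.0 - y * score)

print(hinge_loss(+1, 2.0))   # 0.0 -- correct and outside the margin
print(hinge_loss(+1, 0.5))   # 0.5 -- correct side, but inside the margin
print(hinge_loss(+1, -1.0))  # 2.0 -- misclassified, heavily penalized
```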
37. What happens to the computational cost of k-NN during the prediction phase as the dataset size grows?
A. It remains constant
B. It decreases
C. It increases significantly
D. It becomes zero
Correct Answer: It increases significantly
Explanation: Because k-NN is a lazy learner, it scans the entire training dataset to calculate distances for every new prediction, making it slow for large datasets.
38. Entropy in Information Theory is a measure of:
A. Distance
B. Disorder or Uncertainty
C. Accuracy
D. Margin width
Correct Answer: Disorder or Uncertainty
Explanation: Entropy quantifies the amount of uncertainty or impurity in a dataset. A completely homogeneous set has an entropy of 0.
39. RIPPER (Repeated Incremental Pruning to Produce Error Reduction) is an algorithm used for:
A. Rule Induction
B. Clustering
C. Regression
D. Dimensionality Reduction
Correct Answer: Rule Induction
Explanation: RIPPER is a popular algorithm for generating a set of IF-THEN rules to classify data.
40. Which of the following describes a 'False Negative' (Type II Error)?
A. Predicting Positive when actually Positive
B. Predicting Negative when actually Negative
C. Predicting Positive when actually Negative
D. Predicting Negative when actually Positive
Correct Answer: Predicting Negative when actually Positive
Explanation: A False Negative misses a positive instance, classifying it as negative (e.g., failing to detect a fire).
41. If Precision = 1.0 and Recall = 1.0, what is the F1 Score?
A. 0
B. 0.5
C. 1.0
D. 2.0
Correct Answer: 1.0
Explanation: F1 Score is the harmonic mean. If both inputs are 1, the mean is 1: (2 × 1 × 1) / (1 + 1) = 1.
42. Which evaluation metric calculates the proportion of actual negatives that are correctly identified?
A. Sensitivity
B. Recall
C. Specificity
D. Precision
Correct Answer: Specificity
Explanation: Specificity = TN / (TN + FP). It measures the model's ability to identify negative instances.
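A one-line helper with invented counts:

```python
def specificity(tn, fp):
    # True Negative Rate: fraction of actual negatives labelled negative.
    return tn / (tn + fp)

print(specificity(tn=90, fp=10))  # 0.9
```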
43. In a decision tree, if a node contains only samples from a single class, its entropy is:
A. 0
B. 0.5
C. 1
D. Infinite
Correct Answer: 0
Explanation: If a node is pure (all samples belong to one class), there is no disorder, so Entropy is 0.
44. Which kernel is the default for non-linear SVMs in many libraries?
A. Linear
B. Polynomial
C. Radial Basis Function (RBF)
D. Sigmoid
Correct Answer: Radial Basis Function (RBF)
Explanation: RBF is the most commonly used kernel for non-linear SVM classification because it corresponds to an implicit mapping into an infinite-dimensional feature space, allowing very flexible decision boundaries.
45. Generative models like Naïve Bayes model:
A. The boundary between classes directly
B. The distribution of individual classes (Joint probability)
C. The distance between points
D. The error gradients
Correct Answer: The distribution of individual classes (Joint probability)
Explanation: Generative models learn how the data is generated (P(X|Y) and P(Y)) to estimate P(Y|X), whereas discriminative models learn the boundary directly.
46. Recursive Partitioning is a technique synonymous with:
A. Building Decision Trees
B. Calculating k-NN distances
C. Optimizing SVM margins
D. Calculating Bayes probabilities
Correct Answer: Building Decision Trees
Explanation: Recursive partitioning splits the data into smaller subsets repeatedly, which is the process used to construct decision trees.
47. When interpreting a Confusion Matrix for a multi-class problem (e.g., 3 classes), the matrix dimensions are:
A. 2x2
B. 3x3
C. 1x3
D. 3x1
Correct Answer: 3x3
Explanation: A confusion matrix size is N x N, where N is the number of classes.
48. Which algorithm is most sensitive to outliers?
A. Decision Trees
B. K-Nearest Neighbors
C. Naïve Bayes
D. Rules
Correct Answer: K-Nearest Neighbors
Explanation: Since k-NN relies on local distance to neighbors, a single outlier can significantly skew predictions if 'k' is small.
49. What is the relationship between Error Rate and Accuracy?
A. Error Rate = Accuracy
B. Error Rate = 1 - Accuracy
C. Error Rate = 1 + Accuracy
D. Error Rate = Accuracy / 2
Correct Answer: Error Rate = 1 - Accuracy
Explanation: They are complementary. If Accuracy is 90% (0.9), the Error Rate is 10% (0.1).
50. The 'Zero Frequency' problem in Naïve Bayes is solved using:
A. Pruning
B. Laplace Smoothing
C. Feature Scaling
D. Kernel Trick
Correct Answer: Laplace Smoothing
Explanation: Laplace smoothing adds a small count (usually 1) to frequency counts to ensure that no probability is ever exactly zero.