1. Which of the following best describes the goal of Supervised Learning?
A. To find hidden structures in unlabeled data.
B. To learn a mapping from input variables (X) to an output variable (Y) using labeled training data.
C. To maximize a reward signal through interaction with an environment.
D. To reduce the dimensionality of the dataset.
Correct Answer: To learn a mapping from input variables (X) to an output variable (Y) using labeled training data.
Explanation: Supervised learning algorithms build a mathematical model from a set of data that contains both the inputs and the desired outputs (labels).
2. In the Perceptron algorithm, what is the update rule for the weight vector w, given a learning rate η, target label y, and predicted label ŷ?
A.
B.
C.
D.
Correct Answer: w ← w + η(y − ŷ)x
Explanation: The Perceptron update rule adjusts the weights in the direction of the error, scaled by the input vector x and the learning rate η.
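The update rule can be sketched in a few lines of Python (a minimal illustration with made-up weights; the function name `perceptron_update` and the default `eta` are our own choices, not from any library):

```python
import numpy as np

def perceptron_update(w, x, y, y_hat, eta=0.1):
    """One Perceptron step: w <- w + eta * (y - y_hat) * x."""
    return w + eta * (y - y_hat) * np.asarray(x)

w = np.array([0.5, -0.2])

# Correct prediction (y == y_hat): the error term is 0, so w is unchanged.
w_same = perceptron_update(w, [1.0, 2.0], y=1, y_hat=1)

# Wrong prediction: the weights move by eta * x toward the target.
w_new = perceptron_update(w, [1.0, 2.0], y=1, y_hat=0)
```

Note that when the prediction is correct, (y − ŷ) = 0 and the weights stay put; only misclassified points change the model.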
3. A single-layer Perceptron can only classify data correctly if the data is:
A. Non-linearly separable
B. Linearly separable
C. High dimensional
D. Normally distributed
Correct Answer: Linearly separable
Explanation: A single-layer perceptron forms a linear decision boundary. It cannot solve problems like XOR, which are not linearly separable.
4. Which activation function is primarily used in Logistic Regression to map predictions to probabilities between 0 and 1?
A. ReLU
B. Tanh
C. Sigmoid
D. Linear
Correct Answer: Sigmoid
Explanation: The Sigmoid function, defined as σ(z) = 1 / (1 + e^(−z)), maps any real-valued number into the range (0, 1).
5. What is the mathematical formulation of the Sigmoid function σ(z)?
A.
B.
C.
D.
Correct Answer: σ(z) = 1 / (1 + e^(−z))
Explanation: The standard logistic or sigmoid function is σ(z) = 1 / (1 + e^(−z)).
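The sigmoid is short enough to write directly (a minimal sketch using only the standard library):

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# sigmoid(0) is exactly 0.5; large positive z approaches 1, large negative z approaches 0.
```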
6. Which cost function is typically minimized in Logistic Regression?
A. Mean Squared Error (MSE)
B. Hinge Loss
C. Binary Cross-Entropy (Log Loss)
D. Gini Impurity
Correct Answer: Binary Cross-Entropy (Log Loss)
Explanation: Logistic regression uses Log Loss (Binary Cross-Entropy) because it is a convex function for sigmoid outputs, ensuring a global minimum during optimization.
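Binary cross-entropy can be sketched as follows (a minimal standard-library version; the `eps` clamp is our own guard against log(0), not part of the formula itself):

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean log loss: -(y*log(p) + (1-y)*log(1-p)), averaged over the batch."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

Confident correct predictions give a small loss; confident wrong predictions are penalized heavily, which is why log loss pairs well with sigmoid outputs.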
7. In Naïve Bayes, what is the fundamental assumption made about the features?
A. Features are dependent on each other.
B. Features are mutually exclusive.
C. Features are conditionally independent given the class label.
D. Features must be categorical.
Correct Answer: Features are conditionally independent given the class label.
Explanation: The 'Naïve' aspect of Naïve Bayes is the assumption that the presence of a particular feature in a class is unrelated to the presence of any other feature.
8. What is the formula for Bayes' Theorem regarding the probability of class C given data X?
A.
B.
C.
D.
Correct Answer: P(C|X) = P(X|C) · P(C) / P(X)
Explanation: Bayes' theorem states that the posterior P(C|X) is the likelihood P(X|C) times the prior P(C), divided by the evidence P(X).
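The theorem is a one-liner in code (the numbers below are hypothetical, chosen only to illustrate the arithmetic):

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(C|X) = P(X|C) * P(C) / P(X)."""
    return likelihood * prior / evidence

# Hypothetical example: P(X|C) = 0.8, P(C) = 0.3, P(X) = 0.4
p_c_given_x = posterior(0.8, 0.3, 0.4)  # 0.24 / 0.4 = 0.6
```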
9. Why is Laplace Smoothing (Additive Smoothing) used in Naïve Bayes?
A. To handle non-linear data.
B. To prevent the probability from becoming zero for unseen features.
C. To normalize the dataset.
D. To reduce the dimensionality of the data.
Correct Answer: To prevent the probability from becoming zero for unseen features.
Explanation: If a categorical feature value was not present in the training set for a specific class, the likelihood becomes zero, wiping out the entire probability calculation. Laplace smoothing adds a small count to avoid this.
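A smoothed likelihood estimate looks like this (a minimal sketch; the parameter names are our own, with `alpha = 1` giving classic Laplace smoothing):

```python
def smoothed_likelihood(count_xy, count_y, n_values, alpha=1.0):
    """P(x|y) with additive smoothing:
    (count(x, y) + alpha) / (count(y) + alpha * n_values),
    where n_values is the number of distinct values the feature can take."""
    return (count_xy + alpha) / (count_y + alpha * n_values)

# A feature value never seen with this class (count 0) still gets
# a small nonzero probability instead of zeroing out the whole product.
p_unseen = smoothed_likelihood(0, 10, 3)  # 1 / 13
```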
10. In a Confusion Matrix, what does a False Positive (FP) represent?
A. The model correctly predicted the positive class.
B. The model incorrectly predicted the positive class (actual was negative).
C. The model incorrectly predicted the negative class (actual was positive).
D. The model correctly predicted the negative class.
Correct Answer: The model incorrectly predicted the positive class (actual was negative).
Explanation: A False Positive is a 'Type I error' where the model predicts the positive class, but the ground truth is negative.
11. How is Precision calculated?
A.
B.
C.
D.
Correct Answer: Precision = TP / (TP + FP)
Explanation: Precision measures the accuracy of positive predictions: True Positives divided by all predicted positives, TP / (TP + FP).
12. How is Recall (Sensitivity) calculated?
A.
B.
C.
D.
Correct Answer: Recall = TP / (TP + FN)
Explanation: Recall measures the ability of the model to find all the positive samples: True Positives divided by all actual positives, TP / (TP + FN).
13. The F1-Score is the harmonic mean of which two metrics?
A. Accuracy and Precision
B. Specificity and Sensitivity
C. Precision and Recall
D. TPR and FPR
Correct Answer: Precision and Recall
Explanation: The F1-Score balances Precision and Recall, calculated as F1 = 2 · (Precision · Recall) / (Precision + Recall).
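The three metrics from questions 11–13 can be computed directly from confusion-matrix counts (a minimal sketch; the counts in the comment are made up for illustration):

```python
def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives the model found."""
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# Example counts: TP=8, FP=2, FN=8 -> precision 0.8, recall 0.5
```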
14. Which of the following models is known as a Lazy Learner?
A. Logistic Regression
B. Support Vector Machine
C. K-Nearest Neighbors (KNN)
D. Decision Tree
Correct Answer: K-Nearest Neighbors (KNN)
Explanation: KNN is a lazy learner because it does not learn a discriminative function from the training data but memorizes the training dataset instead.
15. In K-Nearest Neighbors (KNN), what is the effect of choosing a very small value for K (e.g., K = 1)?
A. High Bias, Low Variance
B. The decision boundary becomes very smooth.
C. The model becomes computationally cheaper.
D. High Variance, Low Bias (Overfitting)
Correct Answer: High Variance, Low Bias (Overfitting)
Explanation: With a small K, the model captures local noise and outliers, leading to a complex decision boundary (Overfitting/High Variance).
16. Which distance metric is most commonly used in KNN for continuous variables?
A. Jaccard Distance
B. Hamming Distance
C. Euclidean Distance
D. Cosine Similarity
Correct Answer: Euclidean Distance
Explanation: Euclidean distance (the L2 norm) is the straight-line distance between two points and is the default for continuous data in KNN.
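A bare-bones KNN classifier with Euclidean distance fits in a few lines (a minimal sketch using only the standard library; the toy data points are our own):

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line (L2) distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training points nearest to x."""
    dists = sorted(zip((euclidean(xt, x) for xt in X_train), y_train))
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]

# Toy data: two tight clusters with labels 'a' and 'b'.
X = [(0, 0), (0, 1), (5, 5), (6, 5)]
y = ['a', 'a', 'b', 'b']
```

Note there is no training step at all, which is exactly the "lazy learner" behavior from question 14.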
17. Why is feature scaling (normalization/standardization) crucial for distance-based models like KNN and SVM?
A. It is required for the code to compile.
B. Distance calculations are dominated by features with larger magnitudes.
C. It converts all features to categorical data.
D. It increases the number of dimensions.
Correct Answer: Distance calculations are dominated by features with larger magnitudes.
Explanation: Without scaling, a feature ranging from 0-1000 will overpower a feature ranging from 0-1 in distance calculations, biasing the model.
18. In a Decision Tree, which metric is commonly used to measure impurity for classification tasks?
A. Mean Squared Error
B. Gini Impurity
C. R-Squared
D. Euclidean Distance
Correct Answer: Gini Impurity
Explanation: Gini Impurity (and Entropy) are the standard metrics used to evaluate how 'pure' a node is during the splitting process in classification trees.
19. What is the formula for Entropy for a binary classification problem with positive probability p and negative probability q?
A.
B.
C.
D.
Correct Answer: H = −p log₂ p − q log₂ q
Explanation: Entropy measures disorder or uncertainty. The formula is the summation of −pᵢ log₂ pᵢ over all classes.
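Binary entropy can be sketched like this (a minimal standard-library version; the explicit check for pure nodes avoids log(0)):

```python
import math

def entropy(p_pos):
    """Binary entropy H = -p*log2(p) - (1-p)*log2(1-p)."""
    if p_pos in (0.0, 1.0):
        return 0.0  # a pure node has no uncertainty
    p_neg = 1.0 - p_pos
    return -p_pos * math.log2(p_pos) - p_neg * math.log2(p_neg)
```

Entropy peaks at 1 bit for a 50/50 split and falls to 0 for a pure node, matching question 41 below.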
20. What is Information Gain in the context of Decision Trees?
A. The total number of nodes in the tree.
B. The decrease in entropy (or impurity) achieved by splitting a node.
C. The increase in accuracy on the test set.
D. The depth of the tree.
Correct Answer: The decrease in entropy (or impurity) achieved by splitting a node.
Explanation: Information Gain is the difference between the entropy of the parent node and the weighted average entropy of the child nodes. The algorithm chooses the split with the highest gain.
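The "parent entropy minus weighted child entropy" definition translates directly to code (a minimal sketch for a binary split; the `(n_samples, positive_fraction)` representation of a child node is our own convention):

```python
import math

def binary_entropy(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_gain(parent_p, left, right):
    """Entropy drop from splitting a parent node into two children.
    left/right are (n_samples, positive_fraction) pairs."""
    (n_l, p_l), (n_r, p_r) = left, right
    n = n_l + n_r
    child = (n_l / n) * binary_entropy(p_l) + (n_r / n) * binary_entropy(p_r)
    return binary_entropy(parent_p) - child
```

A perfect split of a 50/50 parent into two pure children yields the maximum gain of 1 bit; a split that leaves both children at 50/50 yields zero gain.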
21. What technique is used to reduce overfitting in Decision Trees by removing sections of the tree that provide little power to classify instances?
A. Boosting
B. Bagging
C. Pruning
D. Scaling
Correct Answer: Pruning
Explanation: Pruning involves cutting back branches of the tree that use features with low importance or result in overfitting to noise.
22. What is the primary objective of a Support Vector Machine (SVM)?
A. To minimize the number of support vectors.
B. To find the hyperplane that maximizes the margin between classes.
C. To find the hyperplane that minimizes the distance to all points.
D. To calculate the conditional probability of classes.
Correct Answer: To find the hyperplane that maximizes the margin between classes.
Explanation: SVM aims to find the optimal decision boundary (hyperplane) that has the largest distance (margin) to the nearest training data points of any class.
23. In SVM, what are the Support Vectors?
A. The data points furthest from the decision boundary.
B. The data points that lie closest to the decision boundary.
C. The center points of each class cluster.
D. The incorrectly classified points only.
Correct Answer: The data points that lie closest to the decision boundary.
Explanation: Support vectors are the critical elements of the training set; they lie on the margin boundaries and define the position of the hyperplane.
24. What allows SVMs to classify non-linearly separable data by mapping inputs into a higher-dimensional space?
A. Regularization
B. The Kernel Trick
C. Gradient Descent
D. Backpropagation
Correct Answer: The Kernel Trick
Explanation: The Kernel Trick allows SVM to compute the dot product in a high-dimensional feature space without explicitly calculating the transformation, enabling non-linear classification.
25. In SVM, what is the role of the regularization parameter C?
A. It determines the number of kernels.
B. It controls the trade-off between maximizing the margin and minimizing training classification errors.
C. It sets the threshold for probability.
D. It initializes the weights to zero.
Correct Answer: It controls the trade-off between maximizing the margin and minimizing training classification errors.
Explanation: A large C penalizes misclassifications heavily (hard margin), while a small C allows more misclassifications to achieve a wider margin (soft margin).
26. The ROC Curve plots which two metrics against each other?
A. Precision vs Recall
B. True Positive Rate (TPR) vs False Positive Rate (FPR)
Correct Answer: True Positive Rate (TPR) vs False Positive Rate (FPR)
Explanation: The Receiver Operating Characteristic (ROC) curve is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied, plotting TPR against FPR.
27. What does an AUC (Area Under the Curve) of 0.5 indicate?
A. Perfect classification.
B. The model performs no better than random guessing.
C. The model predicts all negatives.
D. High precision but low recall.
Correct Answer: The model performs no better than random guessing.
Explanation: The diagonal line on an ROC curve represents random guessing, which has an area of 0.5.
28. When is the Precision-Recall (PR) Curve preferred over the ROC Curve?
A. When the classes are perfectly balanced.
B. When the dataset is small.
C. When there is a severe class imbalance (e.g., rare disease detection).
D. When using a Decision Tree.
Correct Answer: When there is a severe class imbalance (e.g., rare disease detection).
Explanation: ROC curves can present an overly optimistic view of performance on imbalanced datasets. The PR curve focuses on the minority class (Positives) and is more informative in these cases.
29. What is the formula for the False Positive Rate (FPR)?
A.
B.
C.
D.
Correct Answer: FPR = FP / (FP + TN)
Explanation: FPR is the ratio of negative instances that are incorrectly classified as positive: FP / (FP + TN).
30. Which of the following is a parametric model?
A. K-Nearest Neighbors
B. Decision Tree
C. Logistic Regression
D. Random Forest
Correct Answer: Logistic Regression
Explanation: Logistic Regression summarizes data with a fixed set of parameters (weights). KNN and Decision Trees are non-parametric as the model structure grows with the data.
31. In the context of the Confusion Matrix, what is Specificity?
A. True Positive Rate
B. True Negative Rate (TN / (TN + FP))
C. Precision
D. Accuracy
Correct Answer: True Negative Rate (TN / (TN + FP))
Explanation: Specificity measures the proportion of actual negatives that are correctly identified.
32. Which model produces decision boundaries that are always orthogonal to the feature axes (axis-aligned)?
A. Logistic Regression
B. Linear SVM
C. Decision Tree
D. Perceptron
Correct Answer: Decision Tree
Explanation: Decision trees split data based on a threshold of a single feature at a time, resulting in rectangular decision regions aligned with the axes.
33. If a classifier has high Precision but low Recall, what does this imply?
A. It captures most positive cases but has many false alarms.
B. It rarely predicts positive, but when it does, it is usually correct.
C. It performs randomly.
D. It predicts positive for almost everything.
Correct Answer: It rarely predicts positive, but when it does, it is usually correct.
Explanation: High precision means few False Positives. Low recall means many False Negatives. The model is conservative/strict in labeling positives.
34. For a Gaussian Naïve Bayes classifier, the likelihood of a continuous feature is calculated using:
A. The binomial distribution formula.
B. The probability density function (PDF) of a normal distribution.
C. Simple counting of occurrences.
D. The sigmoid function.
Correct Answer: The probability density function (PDF) of a normal distribution.
Explanation: Gaussian NB assumes continuous features follow a normal (Gaussian) distribution and uses the Gaussian PDF to estimate likelihoods.
35. What is the Hinge Loss function used for?
A. Linear Regression
B. Logistic Regression
C. Support Vector Machines (SVM)
D. Decision Trees
Correct Answer: Support Vector Machines (SVM)
Explanation: Hinge Loss, defined as L = max(0, 1 − y·f(x)) with labels y ∈ {−1, +1}, is the standard loss function for SVMs.
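Hinge loss is a one-liner (a minimal sketch; `score` stands for the raw decision value f(x)):

```python
def hinge_loss(y, score):
    """Hinge loss max(0, 1 - y * f(x)), with labels y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)

# Correct predictions beyond the margin (y * f(x) >= 1) incur zero loss;
# points inside the margin or on the wrong side are penalized linearly.
```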
36. Which point on the ROC space represents a perfect classifier?
A. (0, 0)
B. (1, 1)
C. (0, 1) [Top-Left corner]
D. (1, 0) [Bottom-Right corner]
Correct Answer: (0, 1) [Top-Left corner]
Explanation: A perfect classifier has a True Positive Rate of 1 and a False Positive Rate of 0.
37. In Logistic Regression, the 'Log-Odds' or 'Logit' is linear with respect to:
A. The input features x.
B. The predicted probability p.
C. The error term.
D. The number of iterations.
Correct Answer: The input features x.
Explanation: log(p / (1 − p)) = wᵀx + b. The log-odds is a linear combination of the inputs.
38. When using KNN, increasing the value of K generally leads to:
A. More complex decision boundaries.
B. Smoother decision boundaries.
C. Zero training error.
D. Higher variance.
Correct Answer: Smoother decision boundaries.
Explanation: A larger K averages over more neighbors, smoothing out the decision boundary and reducing variance (but potentially increasing bias).
39. Which algorithm constructs a separating hyperplane using a One-vs-Rest (OvR) or One-vs-One (OvO) strategy for multi-class classification?
A. Decision Trees
B. Naïve Bayes
C. SVM (and other binary linear classifiers)
D. KNN
Correct Answer: SVM (and other binary linear classifiers)
Explanation: SVM is inherently a binary classifier. To handle multiple classes, strategies like OvR (one classifier per class) or OvO (one classifier per pair of classes) are used.
40. Which of the following is considered a Distance-Based Model?
A. Decision Tree
B. Naïve Bayes
C. K-Nearest Neighbors
D. Random Forest
Correct Answer: K-Nearest Neighbors
Explanation: KNN relies entirely on calculating distances between data points to make predictions.
41. In a decision tree, if a node contains only samples from a single class, its Entropy is:
A. 1
B. 0
C. 0.5
D. Infinite
Correct Answer: 0
Explanation: If a node is pure (all samples belong to one class), there is no uncertainty, so Entropy is 0.
42. The Bias term (b or w₀) in a linear model allows the hyperplane to:
A. Pass through the origin only.
B. Shift away from the origin.
C. Become non-linear.
D. Rotate but not shift.
Correct Answer: Shift away from the origin.
Explanation: Without a bias term, a linear model is forced to pass through the origin. The bias allows the decision boundary to be offset.
43. Which metric corresponds to the Area Under the Precision-Recall Curve (AUC-PR)?
A. Accuracy
B. Average Precision (AP)
C. F1-Score
D. Balanced Accuracy
Correct Answer: Average Precision (AP)
Explanation: The area under the PR curve is often summarized as Average Precision, which provides a single score for the quality of the model across all thresholds.
44. What is a disadvantage of the Perceptron algorithm compared to Logistic Regression?
A. It cannot handle continuous data.
B. It does not provide probabilistic outputs.
C. It is computationally more expensive.
D. It always overfits.
Correct Answer: It does not provide probabilistic outputs.
Explanation: Perceptrons output hard class labels (0 or 1) based on a step function, whereas Logistic Regression outputs probabilities.
45. In Naïve Bayes, if , , and , what is ?
A. 0.5
B. 0.8
C. 1.0
D. 0.2
Correct Answer: 1.0
Explanation: Applying Bayes' Theorem (likelihood times prior, divided by evidence) to the given values yields a posterior of 1.0.
46. Which of the following is a Generative Model?
A. Logistic Regression
B. Support Vector Machine
C. Naïve Bayes
D. Perceptron
Correct Answer: Naïve Bayes
Explanation: Naïve Bayes models the joint probability P(X, Y) (how the data is generated), while the others are Discriminative models modeling P(Y|X).
47. What is the computational complexity of the Training phase for KNN?
A. O(1), or effectively zero.
B.
C.
D.
Correct Answer: O(1), or effectively zero.
Explanation: KNN is a lazy learner; training simply involves storing the dataset. The computational cost occurs during the prediction (inference) phase.
48. Which kernel function is used to create a Radial Basis Function (RBF) SVM?
A.
B.
C.
D.
Correct Answer: K(x, x′) = exp(−γ ‖x − x′‖²)
Explanation: The RBF kernel (or Gaussian kernel) uses the exponential of the negative squared Euclidean distance: K(x, x′) = exp(−γ ‖x − x′‖²).
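The RBF kernel can be sketched as follows (a minimal standard-library version; `gamma` defaults to 1.0 here purely for illustration):

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Identical points have similarity 1; similarity decays toward 0 with distance.
```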
49. If you have a dataset with 1000 negative samples and 10 positive samples, which evaluation metric is misleading?
A. Precision
B. Recall
C. F1-Score
D. Accuracy
Correct Answer: Accuracy
Explanation: A model that simply predicts 'Negative' for everything will have ~99% accuracy but is useless. Accuracy is misleading on imbalanced datasets.
50. What is the relationship between Decision Tree depth and Overfitting?
A. Deeper trees are less likely to overfit.
B. Deeper trees are more likely to overfit.
C. Depth has no impact on overfitting.
D. Shallow trees have high variance.
Correct Answer: Deeper trees are more likely to overfit.
Explanation: A deeper tree can create very complex decision boundaries that memorize the training data (including noise), leading to overfitting.