Unit 3 - Practice Quiz

CSE274 50 Questions

1 Which of the following best describes the goal of Supervised Learning?

A. To find hidden structures in unlabeled data.
B. To learn a mapping from input variables (X) to an output variable (Y) using labeled training data.
C. To maximize a reward signal through interaction with an environment.
D. To reduce the dimensionality of the dataset.

2 In the Perceptron algorithm, what is the update rule for the weight vector w, given a learning rate η, target label y, and predicted label ŷ?

A. w ← w − η(y − ŷ)x
B. w ← w + η(y − ŷ)x
C. w ← w + η(y + ŷ)x
D. w ← η(y − ŷ)x
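The Perceptron update asked about in question 2 can be sketched in a few lines of NumPy; the names eta, y, and y_hat are illustrative, and a real trainer would loop over examples until convergence.

```python
import numpy as np

def perceptron_update(w, x, y, y_hat, eta=0.1):
    """One Perceptron step: w <- w + eta * (y - y_hat) * x.
    When the prediction is already correct, y - y_hat is 0 and w is unchanged."""
    return w + eta * (y - y_hat) * x

w = np.array([0.5, -0.2])
x = np.array([1.0, 2.0])
w_new = perceptron_update(w, x, y=1, y_hat=0)  # misclassified: weights move toward x
```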

3 A single-layer Perceptron can only classify data correctly if the data is:

A. Non-linearly separable
B. Linearly separable
C. High dimensional
D. Normally distributed

4 Which activation function is primarily used in Logistic Regression to map predictions to probabilities between 0 and 1?

A. ReLU
B. Tanh
C. Sigmoid
D. Linear

5 What is the mathematical formulation of the Sigmoid function σ(z)?

A. σ(z) = z / (1 + |z|)
B. σ(z) = 1 / (1 + e^(−z))
C. σ(z) = (e^z − e^(−z)) / (e^z + e^(−z))
D. σ(z) = max(0, z)
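As a quick sketch (plain Python, nothing beyond the standard library), the sigmoid squashes any real number into the open interval (0, 1):

```python
import math

def sigmoid(z):
    """Logistic sigmoid: sigma(z) = 1 / (1 + e^(-z)), with range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# sigmoid(0) = 0.5; large positive z saturates toward 1, large negative toward 0.
```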

6 Which cost function is typically minimized in Logistic Regression?

A. Mean Squared Error (MSE)
B. Hinge Loss
C. Binary Cross-Entropy (Log Loss)
D. Gini Impurity

7 In Naïve Bayes, what is the fundamental assumption made about the features?

A. Features are dependent on each other.
B. Features are mutually exclusive.
C. Features are conditionally independent given the class label.
D. Features must be categorical.

8 What is the formula for Bayes' Theorem regarding the probability of class C given data X?

A. P(C|X) = P(X|C) P(C) / P(X)
B. P(C|X) = P(X) P(C) / P(X|C)
C. P(C|X) = P(X|C) / P(C)
D. P(C|X) = P(C) / P(X)
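Numerically, Bayes' Theorem is one line of arithmetic: posterior = likelihood × prior / evidence. The values below are made up purely for illustration.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' Theorem: P(C|X) = P(X|C) * P(C) / P(X)."""
    return likelihood * prior / evidence

# Hypothetical numbers: P(X|C) = 0.9, P(C) = 0.2, P(X) = 0.3.
p_c_given_x = posterior(0.9, 0.2, 0.3)  # ≈ 0.6
```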

9 Why is Laplace Smoothing (Additive Smoothing) used in Naïve Bayes?

A. To handle non-linear data.
B. To prevent the probability from becoming zero for unseen features.
C. To normalize the dataset.
D. To reduce the dimensionality of the data.
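A minimal sketch of additive smoothing (alpha = 1 gives Laplace smoothing; the counts are invented for illustration):

```python
def smoothed_prob(count, total, vocab_size, alpha=1):
    """Additive (Laplace) smoothing: (count + alpha) / (total + alpha * vocab_size).
    A feature never seen with a class gets a small nonzero probability
    instead of zeroing out the whole product of likelihoods."""
    return (count + alpha) / (total + alpha * vocab_size)

p_unseen = smoothed_prob(count=0, total=100, vocab_size=50)  # 1/150, small but nonzero
```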

10 In a Confusion Matrix, what does a False Positive (FP) represent?

A. The model correctly predicted the positive class.
B. The model incorrectly predicted the positive class (actual was negative).
C. The model incorrectly predicted the negative class (actual was positive).
D. The model correctly predicted the negative class.

11 How is Precision calculated?

A. TP / (TP + FP)
B. TP / (TP + FN)
C. TN / (TN + FP)
D. (TP + TN) / (TP + TN + FP + FN)

12 How is Recall (Sensitivity) calculated?

A. TP / (TP + FP)
B. TP / (TP + FN)
C. TN / (TN + FN)
D. (TP + TN) / (TP + TN + FP + FN)

13 The F1-Score is the harmonic mean of which two metrics?

A. Accuracy and Precision
B. Specificity and Sensitivity
C. Precision and Recall
D. TPR and FPR
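Questions 11–13 boil down to three lines of arithmetic on confusion-matrix counts; the counts here are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN),
    F1 = harmonic mean of Precision and Recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=40, fp=10, fn=20)  # ≈ 0.8, 0.667, 0.727
```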

14 Which of the following models is known as a Lazy Learner?

A. Logistic Regression
B. Support Vector Machine
C. K-Nearest Neighbors (KNN)
D. Decision Tree

15 In K-Nearest Neighbors (KNN), what is the effect of choosing a very small value for K (e.g., K = 1)?

A. High Bias, Low Variance
B. The decision boundary becomes very smooth.
C. The model becomes computationally cheaper.
D. High Variance, Low Bias (Overfitting)

16 Which distance metric is most commonly used in KNN for continuous variables?

A. Jaccard Distance
B. Hamming Distance
C. Euclidean Distance
D. Cosine Similarity

17 Why is feature scaling (normalization/standardization) crucial for distance-based models like KNN and SVM?

A. It is required for the code to compile.
B. Distance calculations are dominated by features with larger magnitudes.
C. It converts all features to categorical data.
D. It increases the number of dimensions.
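A tiny demonstration of why scaling matters for question 17 (the feature names and ranges are invented): without scaling, the income axis swamps the age axis in the Euclidean distance.

```python
import math

a = [30, 50_000]   # [age in years, income in dollars]
b = [60, 50_100]

# Unscaled: the 100-dollar income gap contributes far more than the 30-year age gap.
unscaled = math.dist(a, b)

# After rescaling each feature to a comparable range, the age gap dominates instead.
a_scaled = [30 / 100, 50_000 / 100_000]
b_scaled = [60 / 100, 50_100 / 100_000]
scaled = math.dist(a_scaled, b_scaled)
```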

18 In a Decision Tree, which metric is commonly used to measure impurity for classification tasks?

A. Mean Squared Error
B. Gini Impurity
C. R-Squared
D. Euclidean Distance

19 What is the formula for Entropy for a binary classification problem with positive probability p₊ and negative probability p₋?

A. H = p₊ log₂(p₊) + p₋ log₂(p₋)
B. H = −p₊ log₂(p₊) − p₋ log₂(p₋)
C. H = 1 − (p₊² + p₋²)
D. H = p₊ · p₋

20 What is Information Gain in the context of Decision Trees?

A. The total number of nodes in the tree.
B. The decrease in entropy (or impurity) achieved by splitting a node.
C. The increase in accuracy on the test set.
D. The depth of the tree.
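Entropy and Information Gain (questions 19–20) can be sketched together; the split below is a hypothetical best case.

```python
import math

def entropy(p_pos):
    """Binary entropy: H = -p*log2(p) - (1-p)*log2(1-p); 0 for a pure node."""
    if p_pos in (0.0, 1.0):
        return 0.0
    return -p_pos * math.log2(p_pos) - (1 - p_pos) * math.log2(1 - p_pos)

def information_gain(parent_p, left, right):
    """Gain = parent entropy minus size-weighted child entropies.
    left and right are (num_samples, p_pos) pairs for the child nodes."""
    (n_l, p_l), (n_r, p_r) = left, right
    n = n_l + n_r
    return entropy(parent_p) - (n_l / n) * entropy(p_l) - (n_r / n) * entropy(p_r)

# A 50/50 parent split into two pure children: the maximum gain of 1 bit.
gain = information_gain(0.5, left=(10, 1.0), right=(10, 0.0))
```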

21 What technique is used to reduce overfitting in Decision Trees by removing sections of the tree that provide little power to classify instances?

A. Boosting
B. Bagging
C. Pruning
D. Scaling

22 What is the primary objective of a Support Vector Machine (SVM)?

A. To minimize the number of support vectors.
B. To find the hyperplane that maximizes the margin between classes.
C. To find the hyperplane that minimizes the distance to all points.
D. To calculate the conditional probability of classes.

23 In SVM, what are the Support Vectors?

A. The data points furthest from the decision boundary.
B. The data points that lie closest to the decision boundary.
C. The center points of each class cluster.
D. The incorrectly classified points only.

24 What allows SVMs to classify non-linearly separable data by mapping inputs into a higher-dimensional space?

A. Regularization
B. The Kernel Trick
C. Gradient Descent
D. Backpropagation

25 In SVM, what is the role of the regularization parameter C?

A. It determines the number of kernels.
B. It controls the trade-off between maximizing the margin and minimizing training classification errors.
C. It sets the threshold for probability.
D. It initializes the weights to zero.

26 The ROC Curve plots which two metrics against each other?

A. Precision vs Recall
B. True Positive Rate (TPR) vs False Positive Rate (FPR)
C. Sensitivity vs Specificity
D. Accuracy vs F1-Score

27 What does an AUC (Area Under the Curve) of 0.5 indicate?

A. Perfect classification.
B. The model performs no better than random guessing.
C. The model predicts all negatives.
D. High precision but low recall.

28 When is the Precision-Recall (PR) Curve preferred over the ROC Curve?

A. When the classes are perfectly balanced.
B. When the dataset is small.
C. When there is a severe class imbalance (e.g., rare disease detection).
D. When using a Decision Tree.

29 What is the formula for the False Positive Rate (FPR)?

A. FP / (FP + TN)
B. FP / (FP + TP)
C. FN / (FN + TP)
D. TN / (TN + FP)
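TPR and FPR together give one point in ROC space (questions 26, 29, and 36); a sketch with invented counts:

```python
def roc_point(tp, fp, fn, tn):
    """One point in ROC space: (FPR, TPR), where
    TPR = TP / (TP + FN) and FPR = FP / (FP + TN)."""
    return fp / (fp + tn), tp / (tp + fn)

# A perfect classifier (no FP, no FN) lands at the top-left corner (0, 1).
fpr, tpr = roc_point(tp=10, fp=0, fn=0, tn=990)
```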

30 Which of the following is a parametric model?

A. K-Nearest Neighbors
B. Decision Tree
C. Logistic Regression
D. Random Forest

31 In the context of the Confusion Matrix, what is Specificity?

A. True Positive Rate
B. True Negative Rate (TN / (TN + FP))
C. Precision
D. Accuracy

32 Which model produces decision boundaries that are always orthogonal to the feature axes (axis-aligned)?

A. Logistic Regression
B. Linear SVM
C. Decision Tree
D. Perceptron

33 If a classifier has high Precision but low Recall, what does this imply?

A. It captures most positive cases but has many false alarms.
B. It rarely predicts positive, but when it does, it is usually correct.
C. It performs randomly.
D. It predicts positive for almost everything.

34 For a Gaussian Naïve Bayes classifier, the likelihood of a continuous feature is calculated using:

A. The binomial distribution formula.
B. The probability density function (PDF) of a normal distribution.
C. Simple counting of occurrences.
D. The sigmoid function.

35 What is the Hinge Loss function used for?

A. Linear Regression
B. Logistic Regression
C. Support Vector Machines (SVM)
D. Decision Trees

36 Which point on the ROC space represents a perfect classifier?

A. (0, 0)
B. (1, 1)
C. (0, 1) [Top-Left corner]
D. (1, 0) [Bottom-Right corner]

37 In Logistic Regression, the 'Log-Odds' or 'Logit' is linear with respect to:

A. The input features x.
B. The predicted probability p.
C. The error term.
D. The number of iterations.

38 When using KNN, increasing the value of K generally leads to:

A. More complex decision boundaries.
B. Smoother decision boundaries.
C. Zero training error.
D. Higher variance.

39 Which algorithm constructs a separating hyperplane using a One-vs-Rest (OvR) or One-vs-One (OvO) strategy for multi-class classification?

A. Decision Trees
B. Naïve Bayes
C. SVM (and other binary linear classifiers)
D. KNN

40 Which of the following is considered a Distance-Based Model?

A. Decision Tree
B. Naïve Bayes
C. K-Nearest Neighbors
D. Random Forest

41 In a decision tree, if a node contains only samples from a single class, its Entropy is:

A. 1
B. 0
C. 0.5
D. Infinite

42 The Bias term (b or w₀) in a linear model allows the hyperplane to:

A. Pass through the origin only.
B. Shift away from the origin.
C. Become non-linear.
D. Rotate but not shift.

43 Which metric corresponds to the Area Under the Precision-Recall Curve (AUC-PR)?

A. Accuracy
B. Average Precision (AP)
C. F1-Score
D. Balanced Accuracy

44 What is a disadvantage of the Perceptron algorithm compared to Logistic Regression?

A. It cannot handle continuous data.
B. It does not provide probabilistic outputs.
C. It is computationally more expensive.
D. It always overfits.

45 In Naïve Bayes, if P(B|A) = 0.8, P(A) = 0.5, and P(B) = 0.5, what is P(A|B)?

A. 0.5
B. 0.8
C. 1.0
D. 0.2

46 Which of the following is a Generative Model?

A. Logistic Regression
B. Support Vector Machine
C. Naïve Bayes
D. Perceptron

47 What is the computational complexity of the Training phase for KNN?

A. O(1), or effectively zero.
B. O(n log n)
C. O(n²)
D. O(n·d)

48 Which kernel function is used to create a Radial Basis Function (RBF) SVM?

A. K(x, x′) = xᵀx′
B. K(x, x′) = (xᵀx′ + c)^d
C. K(x, x′) = exp(−γ‖x − x′‖²)
D. K(x, x′) = tanh(κ xᵀx′ + c)
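A sketch of the RBF kernel from question 48 (gamma is a hyperparameter; the helper name is illustrative):

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """RBF (Gaussian) kernel: K(x, z) = exp(-gamma * ||x - z||^2).
    Similarity is 1 for identical points and decays toward 0 with distance."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)
```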

49 If you have a dataset with 1000 negative samples and 10 positive samples, which evaluation metric is misleading?

A. Precision
B. Recall
C. F1-Score
D. Accuracy
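Working question 49's own numbers shows why accuracy misleads here: a model that always predicts "negative" scores about 99% accuracy while catching zero positives.

```python
# Always-negative classifier on 1000 negative / 10 positive samples.
tn, fp, fn, tp = 1000, 0, 10, 0

accuracy = (tp + tn) / (tp + tn + fp + fn)  # ≈ 0.99, looks excellent
recall = tp / (tp + fn)                     # 0.0, exposes the failure
```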

50 What is the relationship between Decision Tree depth and Overfitting?

A. Deeper trees are less likely to overfit.
B. Deeper trees are more likely to overfit.
C. Depth has no impact on overfitting.
D. Shallow trees have high variance.