Unit 2 - Practice Quiz

INT394

1 In the context of supervised learning, what is the primary goal of Classification?

A. To predict a continuous numerical value based on input features.
B. To group similar data points together without predefined labels.
C. To map input variables to discrete output categories or classes.
D. To reduce the dimensionality of the dataset.

2 What does a Decision Boundary represent in a classification problem?

A. The limit of the computational power required to train the model.
B. A hypersurface that partitions the underlying vector space into two or more sets, one for each class.
C. The boundary where the training data ends and the testing data begins.
D. The maximum error rate acceptable for the model.

3 Consider a linear classifier in a 2-dimensional feature space, defined by f(x) = w1·x1 + w2·x2 + b. What geometric shape is the decision boundary?

A. A parabola
B. A circle
C. A straight line
D. A hyperbola

4 For a Linear Classifier, the decision rule is often given by f(x) = sign(w^T x + b). What is the role of the bias term b?

A. It rotates the decision boundary around the origin.
B. It scales the length of the weight vector w.
C. It translates the decision boundary away from the origin.
D. It creates non-linear curves in the boundary.
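
A minimal sketch in Python, with made-up weights, of the rule f(x) = w^T x + b referenced above, illustrating how changing only the bias b moves the boundary w^T x + b = 0 without changing its orientation:

    # Hypothetical 2-D linear classifier: f(x) = w1*x1 + w2*x2 + b
    w = [2.0, -1.0]           # assumed weight vector
    b = 0.5                   # assumed bias term

    def f(x, bias=b):
        return w[0] * x[0] + w[1] * x[1] + bias

    point = [1.0, 1.0]
    print(f(point))           # 1.5  -> positive side of the boundary
    print(f(point, bias=-3))  # -2.0 -> same w, different b: the same point now
                              #         falls on the other side; the boundary
                              #         shifted, but its slope did not change.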

5 In the One-vs-All (One-vs-Rest) strategy for multi-class classification with K classes, how many binary classifiers are trained?

A.
B.
C.
D.

6 In the One-vs-One strategy for multi-class classification with K classes, how many binary classifiers are trained?

A.
B.
C.
D.

7 When using the One-vs-One strategy, how is the final classification decision typically made for a new data point?

A. By selecting the class with the highest probability from a single classifier.
B. By averaging the regression outputs of all classifiers.
C. By a voting scheme where the class with the most 'wins' is selected.
D. By choosing the class that was trained last.
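
A minimal sketch, assuming hypothetical pairwise winners, of the One-vs-One voting step: with K classes there are K(K-1)/2 pairwise classifiers, and the class that collects the most wins is returned.

    from collections import Counter
    from itertools import combinations

    classes = ["A", "B", "C"]            # K = 3 -> K(K-1)/2 = 3 pairwise classifiers

    # Hypothetical winner reported by each pairwise classifier for one test point.
    pairwise_winner = {("A", "B"): "A", ("A", "C"): "C", ("B", "C"): "C"}

    votes = Counter(pairwise_winner[pair] for pair in combinations(classes, 2))
    print(votes)                         # Counter({'C': 2, 'A': 1})
    print(votes.most_common(1)[0][0])    # 'C' -> the class with the most wins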

8 Which of the following is a potential disadvantage of the One-vs-All strategy when classes are imbalanced?

A. It requires too many classifiers to be trained.
B. The datasets for the binary classifiers become heavily skewed (e.g., 1 vs 99 others).
C. It cannot handle linear decision boundaries.
D. It is computationally more expensive than One-vs-One during inference.

9 In Bayes Theorem, given by P(C|x) = P(x|C)·P(C) / P(x), what is the term ___ called?

A. Posterior
B. Prior
C. Likelihood
D. Evidence

10 In Bayes Theorem, P(C|x) = P(x|C)·P(C) / P(x), what does ___ represent?

A. The probability of the data occurring regardless of the class.
B. The probability of the class before observing any data (Prior).
C. The probability of the class after observing the data (Posterior).
D. The conditional dependence of x on C.

11 What is the fundamental assumption of the Naïve Bayes classifier?

A. Features are dependent on each other given the class label.
B. All features contribute equally to the decision boundary regardless of the class.
C. Features are conditionally independent given the class label.
D. The prior probabilities of all classes are equal.

12 Which of the following equations represents the decision rule for a Naïve Bayes classifier (ignoring the evidence as it is constant for all classes)?

A.
B.
C.
D.
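
A minimal sketch, with made-up priors and per-feature likelihood tables, of the standard Naïve Bayes decision rule: choose the class c maximizing P(c) · Π_i P(x_i | c), dropping the evidence P(x) because it is the same for every class.

    # Hypothetical priors and likelihood tables for two classes.
    prior = {"spam": 0.4, "ham": 0.6}
    likelihood = {
        "spam": {"offer": 0.30, "meeting": 0.05},
        "ham":  {"offer": 0.05, "meeting": 0.25},
    }

    def predict(features):
        scores = {}
        for c in prior:
            p = prior[c]
            for feat in features:
                p *= likelihood[c][feat]      # naive conditional-independence assumption
            scores[c] = p                     # proportional to the posterior P(c | x)
        return max(scores, key=scores.get)    # argmax over classes

    print(predict(["offer"]))                 # 'spam'  (0.12 vs 0.03)
    print(predict(["meeting"]))               # 'ham'   (0.02 vs 0.15)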

13 In a Naïve Bayes classifier, what is the Zero Frequency Problem?

A. When the prior probability of a class is zero.
B. When a feature value appears in the test set but was never observed with a specific class in the training set, resulting in a zero likelihood.
C. When the entire dataset has zero variance.
D. When the computation results in a divide-by-zero error during normalization.

14 What technique is commonly used to solve the Zero Frequency Problem in Naïve Bayes?

A. Feature Scaling
B. Gradient Descent
C. Laplace Smoothing (Additive Smoothing)
D. Pruning
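
A minimal sketch of Laplace (add-one) smoothing under assumed counts: adding 1 to every count (and the vocabulary size to the denominator) keeps a feature value that was never seen with a class from contributing a zero likelihood that would wipe out the entire product.

    # Hypothetical word counts observed with the class 'spam' during training.
    counts = {"offer": 7, "winner": 3, "meeting": 0}   # 'meeting' never seen with spam
    vocab_size = len(counts)
    total = sum(counts.values())

    def smoothed_likelihood(word, alpha=1):
        # P(word | spam) = (count + alpha) / (total + alpha * |V|)
        return (counts[word] + alpha) / (total + alpha * vocab_size)

    print(smoothed_likelihood("meeting"))   # ~0.077 instead of 0
    print(smoothed_likelihood("offer"))     # ~0.615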

15 Which variation of Naïve Bayes is most appropriate when feature values are continuous and assumed to follow a normal distribution?

A. Multinomial Naïve Bayes
B. Bernoulli Naïve Bayes
C. Gaussian Naïve Bayes
D. Poisson Naïve Bayes

16 In Bayesian Decision Theory, the concept of Risk is defined as:

A. The probability of choosing the wrong class.
B. The expected loss associated with a decision rule.
C. The computational complexity of the algorithm.
D. The inverse of the likelihood function.

17 If we use the Zero-One Loss function (loss is 0 for correct classification, 1 for incorrect), minimizing the Risk is equivalent to:

A. Minimizing the squared error.
B. Maximizing the likelihood.
C. Minimizing the probability of error.
D. Maximizing the entropy.

18 Given the formula for the Gaussian Naïve Bayes likelihood, P(x_i | C) = (1 / sqrt(2πσ²)) · exp(-(x_i - μ)² / (2σ²)), what parameters need to be estimated from the training data?

A. The weights w and bias b.
B. The mean μ and variance σ² for each class and feature.
C. The median and mode.
D. The min and max values of x_i.
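
A minimal sketch, with made-up training values, of estimating the per-class mean and variance of a continuous feature and plugging them into the Gaussian density from the question:

    import math

    # Hypothetical values of one continuous feature for the training samples of class C.
    samples = [4.8, 5.1, 5.0, 4.9, 5.2]

    mu = sum(samples) / len(samples)                          # estimated mean
    var = sum((s - mu) ** 2 for s in samples) / len(samples)  # estimated variance

    def gaussian_likelihood(x):
        return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    print(mu, var)                    # ~5.0, ~0.02
    print(gaussian_likelihood(5.0))   # largest density at the class mean
    print(gaussian_likelihood(6.0))   # far from the mean -> density close to 0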

19 Why is the Naïve Bayes classifier considered a Generative Model?

A. Because it generates new training data to balance classes.
B. Because it models the joint probability P(x, C) (via the likelihood P(x|C) and the prior P(C)) and captures how the data is generated.
C. Because it directly learns the decision boundary without modeling densities.
D. Because it uses genetic algorithms for optimization.

20 In the context of multi-class classification, if the decision regions are separated by linear boundaries, the classifier is known as:

A. A Linear Classifier
B. A Quadratic Classifier
C. A Non-parametric Classifier
D. A Decision Tree

21 Which version of Naïve Bayes is best suited for binary feature vectors (e.g., word presence/absence in text classification)?

A. Gaussian Naïve Bayes
B. Multinomial Naïve Bayes
C. Bernoulli Naïve Bayes
D. Linear Naïve Bayes

22 What is MAP estimation in the context of Bayesian classification?

A. Maximum Average Precision
B. Minimum Absolute Posteriori
C. Maximum A Posteriori
D. Mean Average Probability

23 How does Maximum Likelihood (ML) estimation differ from MAP estimation?

A. ML assumes a uniform prior (or ignores the prior), while MAP accounts for the prior P(C).
B. ML is for regression, MAP is for classification.
C. MAP assumes a uniform likelihood, ML calculates likelihood.
D. There is no difference; they are identical.

24 What is the computational complexity of predicting a class for a single instance using Naïve Bayes with d features and K classes?

A.
B.
C.
D.

25 In Bayesian Decision Theory, λ(α_i | ω_j) typically denotes:

A. The probability of class ω_j.
B. The loss incurred by taking action α_i when the true state of nature is ω_j.
C. The likelihood of feature x.
D. The learning rate.

26 Which of the following is true regarding the Decision Boundary of a Gaussian Naïve Bayes classifier if all classes share the same covariance matrix?

A. The boundary is quadratic.
B. The boundary is linear.
C. The boundary is circular.
D. There is no decision boundary.

27 Why do we often work with Log-Probabilities (sums of logs) instead of direct probabilities (products) in Naïve Bayes?

A. To make the math harder.
B. Because logs are always positive.
C. To prevent numerical underflow when multiplying many small probabilities.
D. Because log probabilities are required by the Bayes theorem definition.
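
A minimal sketch of why sums of log-probabilities are used: multiplying many small likelihoods underflows to 0.0 in floating point, while the equivalent sum of logs stays finite and still ranks classes correctly.

    import math

    likelihoods = [1e-5] * 100          # hypothetical per-feature likelihoods

    product = 1.0
    for p in likelihoods:
        product *= p
    print(product)                      # 0.0 -- the product has underflowed

    log_score = sum(math.log(p) for p in likelihoods)
    print(log_score)                    # about -1151.3 -- still usable in an argmax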

28 Which of the following text classification scenarios is Multinomial Naïve Bayes typically used for?

A. When features represent the presence/absence of words (binary).
B. When features represent word counts or term frequencies.
C. When features are continuous word embeddings.
D. When the text length is infinite.

29 A classifier that distinguishes between 'Spam' and 'Not Spam' is an example of:

A. Clustering
B. Regression
C. Binary Classification
D. Reinforcement Learning

30 In the formulation f(x) = w^T x + b, if w = ___ and x = ___, which side of the boundary does the point fall on?

A. Positive side (f(x) > 0)
B. Negative side (f(x) < 0)
C. On the boundary (f(x) = 0)
D. Undefined
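
A minimal sketch, using hypothetical values for w, b, and x, of reading off the side of the boundary from the sign of w^T x + b:

    w = [1.0, -2.0]      # assumed weight vector
    b = 0.5              # assumed bias
    x = [3.0, 1.0]       # assumed test point

    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    print(score)         # 1.5
    if score > 0:
        print("positive side")
    elif score < 0:
        print("negative side")
    else:
        print("on the boundary")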

31 In Bayesian Decision Theory, the Evidence P(x) acts as a:

A. Weighting factor for the likelihood.
B. Normalization constant to ensure probabilities sum to 1.
C. Prior belief about the feature distribution.
D. Loss function.

32 Which of the following statements about Decision Regions is correct?

A. Decision regions can never be disjoint.
B. The union of all decision regions must cover the entire feature space.
C. Decision regions must always be convex.
D. Decision regions are only defined for training data.

33 The One-vs-One strategy generally requires more space to store models than One-vs-All (K(K-1)/2 classifiers vs K). Why might it still be preferred?

A. Each individual classifier is trained on a smaller subset of data (only two classes), potentially making training faster.
B. It is the only method that supports Neural Networks.
C. It does not require labels.
D. It guarantees 100% accuracy.

34 In a probabilistic classifier, if P(C1|x) = ___ and P(C2|x) = ___, and the loss for misclassifying class 1 is much higher than misclassifying class 2, Bayesian Decision Theory might suggest:

A. Choosing class 1 regardless of cost.
B. Choosing class 2 if the expected risk is lower, even if the probability is lower.
C. Choosing the class with the highest probability always.
D. Refusing to classify.

35 What is the weight vector w in a linear classifier geometrically orthogonal to?

A. The x-axis.
B. The y-axis.
C. The decision boundary (hyperplane).
D. The data points.

36 Which term calculates ___?

A. Gaussian Likelihood
B. Gini Impurity
C. Laplace Smoothed Multinomial Likelihood
D. Posterior Probability

37 Why does the Naïve Bayes independence assumption often work well in practice even when features are somewhat dependent?

A. Because dependencies cancel each other out.
B. Because classification relies on the correct sign/ranking of the posterior, not the exact probability value.
C. Because real-world data is always independent.
D. Because the algorithm corrects the dependencies during training.

38 If a classifier produces a probability P(C|x) rather than only a class label, it is known as a:

A. Hard Classifier
B. Soft (Probabilistic) Classifier
C. Deterministic Classifier
D. Regressive Classifier

39 In the context of Bayes Theorem, if the Prior P(C) is uniform for all classes, the MAP estimate is equivalent to maximizing:

A. The Evidence
B. The Likelihood
C. The Loss Function
D. The Variance

40 What is the dimension of the decision boundary for a binary classification problem with 10 input features?

A. 1
B. 2
C. 9
D. 10

41 Which of the following is NOT a property of a Linear Classifier?

A. Computationally efficient (fast inference).
B. Simple to interpret (weights indicate feature importance).
C. Can learn complex XOR relationships directly without feature engineering.
D. Less prone to overfitting compared to high-degree polynomial classifiers.

42 When applying Naïve Bayes, how is the Prior P(C) usually estimated from training data?

A. Average value of the features for class C.
B. Fraction of training samples belonging to class C.
C. Correlation coefficient of class .
D. It is always set to 0.5.
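
A minimal sketch of estimating the priors from assumed training labels as relative class frequencies:

    from collections import Counter

    # Hypothetical training labels.
    labels = ["spam", "ham", "ham", "spam", "ham", "ham", "ham", "spam"]

    counts = Counter(labels)
    priors = {c: n / len(labels) for c, n in counts.items()}
    print(priors)        # {'spam': 0.375, 'ham': 0.625}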

43 Which statement best describes the Bayes Error Rate?

A. The error rate of a Naïve Bayes classifier.
B. The lowest possible error rate for any classifier on a given distribution.
C. The error rate when ___ is ignored.
D. The rate at which the algorithm converges.

44 In a 3-class problem using One-vs-Rest, if the outputs of the three classifiers for a point are f1(x) = ___, f2(x) = ___, and f3(x) = ___, which class is predicted?

A. Class 1
B. Class 2
C. Class 3
D. None

45 If features x_i and x_j are duplicates (x_i = x_j), how does this affect Naïve Bayes?

A. It has no effect.
B. It improves accuracy by reinforcing the signal.
C. It violates the independence assumption and 'double counts' the importance of that feature.
D. It causes a division by zero.

46 What is a Reject Option in classification?

A. Removing outliers from the training set.
B. Refraining from making a prediction if the posterior probability is below a certain threshold.
C. Rejecting the null hypothesis.
D. Deleting features that are not useful.

47 Geometrically, what does the likelihood P(x|C) in a Gaussian Naïve Bayes represent?

A. The distance of from the decision boundary.
B. The density of the data point x within the cluster of class C.
C. The probability of the class C.
D. The volume of the dataset.

48 Which of the following is an example of a Discriminative approach to classification?

A. Naïve Bayes
B. Logistic Regression
C. Hidden Markov Models
D. Gaussian Mixture Models

49 In the context of the One-vs-One strategy, if there is a tie in the voting (e.g., class A and class B both get the same number of votes), how is it typically resolved?

A. Random selection or based on highest aggregate confidence score.
B. The model crashes.
C. Both classes are returned.
D. The process is restarted.

50 What happens to the decision boundary in a Linear Classifier if we multiply all weights and the bias by a positive constant c?

A. The boundary shifts.
B. The boundary rotates.
C. The boundary remains unchanged.
D. The boundary becomes non-linear.
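
A minimal sketch, with made-up numbers, showing that scaling both w and b by a positive constant c leaves the sign of w^T x + b, and hence the decision boundary, unchanged:

    w, b = [2.0, -1.0], 0.5
    c = 10.0                          # any positive constant

    def positive_side(weights, bias, x):
        return sum(wi * xi for wi, xi in zip(weights, x)) + bias > 0

    x = [1.0, 1.0]
    print(positive_side(w, b, x))                          # True
    print(positive_side([c * wi for wi in w], c * b, x))   # True -- same side, same boundary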