Unit 2 - Practice Quiz

INT394 50 Questions

1 In the context of supervised learning, what is the primary goal of Classification?

A. To group similar data points together without predefined labels.
B. To map input variables to discrete output categories or classes.
C. To reduce the dimensionality of the dataset.
D. To predict a continuous numerical value based on input features.

2 What does a Decision Boundary represent in a classification problem?

A. The boundary where the training data ends and the testing data begins.
B. The maximum error rate acceptable for the model.
C. The limit of the computational power required to train the model.
D. A hypersurface that partitions the underlying vector space into two or more sets, one for each class.

3 Consider a linear classifier in a 2-dimensional feature space defined by w1·x1 + w2·x2 + b = 0. What geometric shape is the decision boundary?

A. A circle
B. A parabola
C. A hyperbola
D. A straight line

4 For a Linear Classifier, the decision rule is often given by f(x) = sign(w·x + b). What is the role of the bias term b?

A. It translates the decision boundary away from the origin.
B. It rotates the decision boundary around the origin.
C. It scales the length of the weight vector w.
D. It creates non-linear curves in the boundary.
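The sign(w·x + b) rule from questions 3-4 can be sketched in a few lines; the weights, bias, and test points below are made-up illustrative values, not from the quiz:

```python
# Minimal linear classifier: predicts +1 or -1 from sign(w . x + b).
# The weights, bias, and points are illustrative values.

def linear_classify(w, b, x):
    """Return +1 if the point lies on the positive side of w.x + b = 0, else -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1

w = [1.0, 1.0]   # normal vector: orthogonal to the boundary x1 + x2 + b = 0
b = -2.0         # bias: translates the line away from the origin

# With b = 0 the line x1 + x2 = 0 passes through the origin;
# with b = -2 it is shifted to x1 + x2 = 2.
print(linear_classify(w, b, [3.0, 3.0]))  # above the shifted line -> 1
print(linear_classify(w, b, [0.0, 0.0]))  # origin is now on the negative side -> -1
```

Note that changing b moves the line without rotating it, which is exactly the "translates the boundary away from the origin" behaviour in question 4.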

5 In the One-vs-All (One-vs-Rest) strategy for multi-class classification with K classes, how many binary classifiers are trained?

A. K - 1
B. K
C. K(K-1)/2
D. 2K

6 In the One-vs-One strategy for multi-class classification with K classes, how many binary classifiers are trained?

A. K
B. K - 1
C. K(K-1)/2
D. K²
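The classifier counts in questions 5-6 can be checked directly by enumerating the training tasks; the class names are illustrative:

```python
from itertools import combinations

# One-vs-All trains one binary classifier per class; One-vs-One trains
# one per unordered pair of classes. The class names are illustrative.
classes = ["cat", "dog", "bird", "fish"]  # K = 4

ova_tasks = [(c, "rest") for c in classes]      # each class vs all others
ovo_tasks = list(combinations(classes, 2))      # every unordered pair

K = len(classes)
print(len(ova_tasks))  # K = 4 classifiers for One-vs-All
print(len(ovo_tasks))  # K(K-1)/2 = 6 classifiers for One-vs-One
```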

7 When using the One-vs-One strategy, how is the final classification decision typically made for a new data point?

A. By a voting scheme where the class with the most 'wins' is selected.
B. By averaging the regression outputs of all classifiers.
C. By choosing the class that was trained last.
D. By selecting the class with the highest probability from a single classifier.
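The voting scheme in question 7 amounts to counting pairwise 'wins'; a minimal sketch, where the per-pair winners are made-up outcomes rather than real model outputs:

```python
from collections import Counter

def ovo_predict(pairwise_winners):
    """Majority vote over the outputs of all pairwise (one-vs-one) classifiers."""
    votes = Counter(pairwise_winners)
    return votes.most_common(1)[0][0]

# For K = 3 classes there are 3 pairwise classifiers; these winners are
# illustrative, not actual model outputs.
winners = ["A",  # (A vs B) -> A
           "A",  # (A vs C) -> A
           "C"]  # (B vs C) -> C
print(ovo_predict(winners))  # A wins with 2 votes
```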

8 Which of the following is a potential disadvantage of the One-vs-All strategy when classes are imbalanced?

A. The datasets for the binary classifiers become heavily skewed (e.g., 1 vs 99 others).
B. It requires too many classifiers to be trained.
C. It cannot handle linear decision boundaries.
D. It is computationally more expensive than One-vs-One during inference.

9 In Bayes Theorem, given by P(C|x) = P(x|C) P(C) / P(x), what is the term P(x|C) called?

A. Evidence
B. Likelihood
C. Posterior
D. Prior

10 In Bayes Theorem, P(C|x) = P(x|C) P(C) / P(x), what does P(C) represent?

A. The probability of the data occurring regardless of the class.
B. The probability of the class after observing the data (Posterior).
C. The probability of the class before observing any data (Prior).
D. The conditional dependence of x on C.

11 What is the fundamental assumption of the Naïve Bayes classifier?

A. The prior probabilities of all classes are equal.
B. Features are conditionally independent given the class label.
C. All features contribute equally to the decision boundary regardless of the class.
D. Features are dependent on each other given the class label.

12 Which of the following equations represents the decision rule for a Naïve Bayes classifier (ignoring the evidence P(x) as it is constant for all classes)?

A. ŷ = argmax_C P(C) Π_i P(x_i | C)
B. ŷ = argmin_C P(C) Π_i P(x_i | C)
C. ŷ = argmax_C Σ_i P(x_i | C)
D. ŷ = argmax_C P(x)

13 In a Naïve Bayes classifier, what is the Zero Frequency Problem?

A. When the entire dataset has zero variance.
B. When a feature value appears in the test set but was never observed with a specific class in the training set, resulting in a zero likelihood.
C. When the prior probability of a class is zero.
D. When the computation results in a divide-by-zero error during normalization.

14 What technique is commonly used to solve the Zero Frequency Problem in Naïve Bayes?

A. Laplace Smoothing (Additive Smoothing)
B. Gradient Descent
C. Feature Scaling
D. Pruning
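The Laplace smoothing of question 14 replaces raw frequency ratios with add-one counts so that an unseen feature value never yields a zero likelihood; the counts below are made-up illustrative numbers:

```python
# Laplace (add-one) smoothing for a categorical likelihood P(x_i = v | C):
# add alpha to every count so an unseen value never gets probability zero.
# The counts are illustrative.

def smoothed_likelihood(count_value_in_class, count_class, n_distinct_values, alpha=1):
    return (count_value_in_class + alpha) / (count_class + alpha * n_distinct_values)

# A feature value never observed with this class in training:
p_unseen = smoothed_likelihood(0, 100, 5)
print(p_unseen)  # 1/105 -- small but nonzero, so the product is not zeroed out
```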

15 Which variation of Naïve Bayes is most appropriate when feature values are continuous and assumed to follow a normal distribution?

A. Multinomial Naïve Bayes
B. Gaussian Naïve Bayes
C. Bernoulli Naïve Bayes
D. Poisson Naïve Bayes

16 In Bayesian Decision Theory, the concept of Risk is defined as:

A. The computational complexity of the algorithm.
B. The probability of choosing the wrong class.
C. The expected loss associated with a decision rule.
D. The inverse of the likelihood function.

17 If we use the Zero-One Loss function (loss is 0 for correct classification, 1 for incorrect), minimizing the Risk is equivalent to:

A. Maximizing the entropy.
B. Minimizing the probability of error.
C. Maximizing the likelihood.
D. Minimizing the squared error.

18 Given the formula for the Gaussian Naïve Bayes likelihood: P(x_i | C) = (1 / sqrt(2π σ_C²)) · exp(−(x_i − μ_C)² / (2σ_C²)), what parameters need to be estimated from the training data?

A. The median and mode.
B. The min and max values of x_i.
C. The weights w and bias b.
D. The mean μ and variance σ² for each class and feature.
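The estimation step asked about in question 18 can be sketched directly: fit a mean and variance per class, then evaluate the normal density for a new point. The sample values are illustrative:

```python
import math

# Gaussian Naive Bayes likelihood: per class and per feature, estimate the
# mean and variance from training data, then evaluate the normal density.
# The sample values are illustrative.

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

samples = [4.8, 5.1, 5.0, 4.9, 5.2]   # feature values seen with one class
mu = sum(samples) / len(samples)      # estimated mean (here 5.0)
var = sum((s - mu) ** 2 for s in samples) / len(samples)  # estimated variance

# Density of a new point under this class's Gaussian -- points near the
# class mean get a higher likelihood than points far from it:
print(gaussian_pdf(5.0, mu, var))
print(gaussian_pdf(6.0, mu, var))
```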

19 Why is the Naïve Bayes classifier considered a Generative Model?

A. Because it directly learns the decision boundary without modeling densities.
B. Because it generates new training data to balance classes.
C. Because it models the joint probability P(x, C) (via P(x|C) P(C)) and captures how the data is generated.
D. Because it uses genetic algorithms for optimization.

20 In the context of multi-class classification, if the decision regions are separated by linear boundaries, the classifier is known as:

A. A Linear Classifier
B. A Quadratic Classifier
C. A Non-parametric Classifier
D. A Decision Tree

21 Which version of Naïve Bayes is best suited for binary feature vectors (e.g., word presence/absence in text classification)?

A. Bernoulli Naïve Bayes
B. Linear Naïve Bayes
C. Multinomial Naïve Bayes
D. Gaussian Naïve Bayes

22 What is MAP estimation in the context of Bayesian classification?

A. Maximum Average Precision
B. Minimum Absolute Posteriori
C. Mean Average Probability
D. Maximum A Posteriori

23 How does Maximum Likelihood (ML) estimation differ from MAP estimation?

A. There is no difference; they are identical.
B. MAP assumes a uniform likelihood, ML calculates likelihood.
C. ML is for regression, MAP is for classification.
D. ML assumes a uniform prior (or ignores the prior), while MAP accounts for the prior P(C).

24 What is the computational complexity of predicting a class for a single instance using Naïve Bayes with d features and K classes?

A. O(dK)
B. O(d + K)
C. O(d^K)
D. O(2^d)

25 In Bayesian Decision Theory, λ(α_i | ω_j) typically denotes:

A. The likelihood of feature x.
B. The learning rate.
C. The loss incurred by taking action α_i when the true state of nature is ω_j.
D. The probability of class ω_j.

26 Which of the following is true regarding the Decision Boundary of a Gaussian Naïve Bayes classifier if all classes share the same covariance matrix?

A. The boundary is linear.
B. The boundary is quadratic.
C. There is no decision boundary.
D. The boundary is circular.

27 Why do we often work with Log-Probabilities (sums of logs) instead of direct probabilities (products) in Naïve Bayes?

A. Because log probabilities are required by the Bayes theorem definition.
B. To prevent numerical underflow when multiplying many small probabilities.
C. Because logs are always positive.
D. To make the math harder.
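The underflow problem behind question 27 is easy to demonstrate: multiplying one hundred small likelihoods collapses to exactly 0.0 in floating point, while summing their logs stays finite. The probabilities are illustrative:

```python
import math

# Multiplying many small probabilities underflows to 0.0 in floating point;
# summing their logs does not. The probabilities are illustrative.
probs = [1e-5] * 100

product = 1.0
for p in probs:
    product *= p
print(product)  # 0.0 -- the true value 1e-500 is below the float range

log_sum = sum(math.log(p) for p in probs)
print(log_sum)  # about -1151.3 -- still usable for comparing classes
```

Because log is monotonic, comparing log-posterior sums across classes picks the same winner as comparing the (underflow-prone) products would.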

28 Which of the following text classification scenarios is Multinomial Naïve Bayes typically used for?

A. When the text length is infinite.
B. When features represent the presence/absence of words (binary).
C. When features are continuous word embeddings.
D. When features represent word counts or term frequencies.

29 A classifier that distinguishes between 'Spam' and 'Not Spam' is an example of:

A. Binary Classification
B. Reinforcement Learning
C. Clustering
D. Regression

30 In the formulation f(x) = w·x + b, if evaluating f at a point x gives f(x) > 0, which side of the boundary does the point fall on?

A. Positive side (f(x) > 0)
B. Negative side (f(x) < 0)
C. On the boundary (f(x) = 0)
D. Undefined

31 In Bayesian Decision Theory, the Evidence acts as a:

A. Prior belief about the feature distribution.
B. Loss function.
C. Weighting factor for the likelihood.
D. Normalization constant to ensure probabilities sum to 1.

32 Which of the following statements about Decision Regions is correct?

A. Decision regions are only defined for training data.
B. Decision regions must always be convex.
C. The union of all decision regions must cover the entire feature space.
D. Decision regions can never be disjoint.

33 The One-vs-One strategy generally requires more space to store models than One-vs-All (K(K-1)/2 models vs K). Why might it still be preferred?

A. It guarantees 100% accuracy.
B. Each individual classifier is trained on a smaller subset of data (only two classes), potentially making training faster.
C. It is the only method that supports Neural Networks.
D. It does not require labels.

34 In a probabilistic classifier, if P(C1|x) > P(C2|x) but the loss for wrongly predicting class 1 is much higher than for wrongly predicting class 2, Bayesian Decision Theory might suggest:

A. Choosing class 1 regardless of cost.
B. Choosing the class with the highest probability always.
C. Refusing to classify.
D. Choosing class 2 if the expected risk is lower, even if the probability is lower.

35 What is the weight vector w in a linear classifier geometrically orthogonal to?

A. The data points.
B. The decision boundary (hyperplane).
C. The y-axis.
D. The x-axis.

36 Which quantity does the expression (count(x_i, C) + 1) / (count(C) + V) calculate, where V is the number of distinct feature values?

A. Laplace Smoothed Multinomial Likelihood
B. Gaussian Likelihood
C. Gini Impurity
D. Posterior Probability

37 Why does the Naïve Bayes independence assumption often work well in practice even when features are somewhat dependent?

A. Because the algorithm corrects the dependencies during training.
B. Because classification relies on the correct sign/ranking of the posterior, not the exact probability value.
C. Because real-world data is always independent.
D. Because dependencies cancel each other out.

38 If a classifier produces a probability P(C|x), it is known as a:

A. Hard Classifier
B. Deterministic Classifier
C. Regressive Classifier
D. Soft (Probabilistic) Classifier

39 In the context of Bayes Theorem, if the Prior is uniform for all classes, the MAP estimate is equivalent to maximizing:

A. The Variance
B. The Likelihood
C. The Evidence
D. The Loss Function

40 What is the dimension of the decision boundary for a binary classification problem with 10 input features?

A. 1
B. 9
C. 2
D. 10

41 Which of the following is NOT a property of a Linear Classifier?

A. Simple to interpret (weights indicate feature importance).
B. Computationally efficient (fast inference).
C. Can learn complex XOR relationships directly without feature engineering.
D. Less prone to overfitting compared to high-degree polynomial classifiers.

42 When applying Naïve Bayes, how is the Prior P(C) usually estimated from training data?

A. Correlation coefficient of class C.
B. It is always set to 0.5.
C. Fraction of training samples belonging to class C.
D. Average value of features for class C.
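The prior estimate asked about in question 42 is a one-liner over the training labels; the toy label set below is illustrative:

```python
from collections import Counter

# Prior P(C) estimated as the fraction of training samples with label C.
# The labels form an illustrative toy training set.
labels = ["spam", "ham", "ham", "spam", "ham", "ham", "ham", "ham"]

counts = Counter(labels)
priors = {c: n / len(labels) for c, n in counts.items()}
print(priors)  # {'spam': 0.25, 'ham': 0.75}
```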

43 Which statement best describes the Bayes Error Rate?

A. The rate at which the algorithm converges.
B. The error rate when the prior P(C) is ignored.
C. The error rate of a Naïve Bayes classifier.
D. The lowest possible error rate for any classifier on a given distribution.

44 In a 3-class problem using One-vs-Rest, if the confidence outputs of the three classifiers for a point satisfy f2(x) > f3(x) > f1(x), which class is predicted?

A. Class 3
B. None
C. Class 1
D. Class 2
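One-vs-Rest prediction, as in question 44, simply takes the class whose classifier reports the highest confidence; the scores below are made-up values:

```python
# One-vs-Rest prediction: run every per-class classifier and pick the class
# whose classifier reports the highest confidence. The scores are made-up.
scores = {"class 1": -0.3, "class 2": 0.8, "class 3": 0.1}

predicted = max(scores, key=scores.get)
print(predicted)  # class 2 -- the highest score wins
```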

45 If features x_i and x_j are duplicates (x_i = x_j), how does this affect Naïve Bayes?

A. It causes a division by zero.
B. It has no effect.
C. It violates the independence assumption and 'double counts' the importance of that feature.
D. It improves accuracy by reinforcing the signal.

46 What is a Reject Option in classification?

A. Deleting features that are not useful.
B. Rejecting the null hypothesis.
C. Refraining from making a prediction if the posterior probability is below a certain threshold.
D. Removing outliers from the training set.

47 Geometrically, what does the likelihood P(x|C) in a Gaussian Naïve Bayes represent?

A. The probability of the class C.
B. The volume of the dataset.
C. The density of the data point x within the cluster of class C.
D. The distance of x from the decision boundary.

48 Which of the following is an example of a Discriminative approach to classification?

A. Gaussian Mixture Models
B. Hidden Markov Models
C. Logistic Regression
D. Naïve Bayes

49 In the context of the One-vs-One strategy, if there is a tie in the voting (e.g., class A and class B both get same number of votes), how is it typically resolved?

A. The process is restarted.
B. Random selection or based on highest aggregate confidence score.
C. Both classes are returned.
D. The model crashes.

50 What happens to the decision boundary in a Linear Classifier if we multiply all weights and the bias by a positive constant c?

A. The boundary shifts.
B. The boundary remains unchanged.
C. The boundary becomes non-linear.
D. The boundary rotates.
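The invariance behind question 50 can be verified numerically: scaling w and b by the same positive constant scales every score by that constant, so the sign of f(x), and hence the boundary, never changes. The weights and points are illustrative:

```python
# Scaling w and b by the same positive constant c scales every score f(x)
# by c, so sign(f(x)) -- and therefore the boundary -- is unchanged.
# The weights, bias, constant, and points are illustrative.

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

w, b = [2.0, -1.0], 0.5
c = 10.0
w_scaled = [c * wi for wi in w]
b_scaled = c * b

for x in ([1.0, 1.0], [-1.0, 2.0], [0.0, 0.0]):
    s1, s2 = score(w, b, x), score(w_scaled, b_scaled, x)
    print(x, s1, s2)            # scores differ by the factor c...
    assert (s1 > 0) == (s2 > 0) # ...but every point stays on the same side
```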