Unit 2 - Practice Quiz

INT394 50 Questions

1 In the context of supervised learning, what is the primary goal of Classification?

A. To group similar data points together without predefined labels.
B. To reduce the dimensionality of the dataset.
C. To map input variables to discrete output categories or classes.
D. To predict a continuous numerical value based on input features.

2 What does a Decision Boundary represent in a classification problem?

A. A hypersurface that partitions the underlying vector space into two or more sets, one for each class.
B. The boundary where the training data ends and the testing data begins.
C. The limit of the computational power required to train the model.
D. The maximum error rate acceptable for the model.

3 Consider a linear classifier in a 2-dimensional feature space defined by w1·x1 + w2·x2 + b = 0. What geometric shape is the decision boundary?

A. A parabola
B. A hyperbola
C. A circle
D. A straight line

4 For a Linear Classifier, the decision rule is often given by f(x) = sign(w·x + b). What is the role of the bias term b?

A. It creates non-linear curves in the boundary.
B. It rotates the decision boundary around the origin.
C. It scales the length of the weight vector w.
D. It translates the decision boundary away from the origin.

5 In the One-vs-All (One-vs-Rest) strategy for multi-class classification with K classes, how many binary classifiers are trained?

A. K
B. K - 1
C. K(K - 1)/2
D. 2K

6 In the One-vs-One strategy for multi-class classification with K classes, how many binary classifiers are trained?

A. K
B. K(K - 1)/2
C. K^2
D. 2K
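The counting rules behind the two strategies can be sketched in a few lines of Python (illustrative function names, not from any library):

```python
# Number of binary classifiers trained for a K-class problem.

def one_vs_all_count(K):
    # One-vs-All: one classifier per class (class k vs. all the rest).
    return K

def one_vs_one_count(K):
    # One-vs-One: one classifier per unordered pair of classes.
    return K * (K - 1) // 2

print(one_vs_all_count(4), one_vs_one_count(4))  # 4 6
```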

7 When using the One-vs-One strategy, how is the final classification decision typically made for a new data point?

A. By a voting scheme where the class with the most 'wins' is selected.
B. By averaging the regression outputs of all classifiers.
C. By choosing the class that was trained last.
D. By selecting the class with the highest probability from a single classifier.
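A minimal sketch of the voting scheme, assuming each pairwise classifier has already reported its winning class:

```python
from collections import Counter

def ovo_vote(pairwise_winners):
    # pairwise_winners: one winning label per One-vs-One classifier.
    # The final prediction is the class with the most 'wins'.
    return Counter(pairwise_winners).most_common(1)[0][0]

# 3 classes -> 3 pairwise contests; 'B' wins two of them.
print(ovo_vote(['A', 'B', 'B']))  # B
```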

8 Which of the following is a potential disadvantage of the One-vs-All strategy when classes are imbalanced?

A. It requires too many classifiers to be trained.
B. It cannot handle linear decision boundaries.
C. It is computationally more expensive than One-vs-One during inference.
D. The datasets for the binary classifiers become heavily skewed (e.g., 1 vs 99 others).

9 In Bayes Theorem, given by P(C|x) = P(x|C) P(C) / P(x), what is the term P(x|C) called?

A. Evidence
B. Posterior
C. Prior
D. Likelihood

10 In Bayes Theorem, P(C|x) = P(x|C) P(C) / P(x), what does P(C) represent?

A. The probability of the data occurring regardless of the class.
B. The probability of the class after observing the data (Posterior).
C. The conditional dependence of x on C.
D. The probability of the class before observing any data (Prior).

11 What is the fundamental assumption of the Naïve Bayes classifier?

A. All features contribute equally to the decision boundary regardless of the class.
B. Features are conditionally independent given the class label.
C. Features are dependent on each other given the class label.
D. The prior probabilities of all classes are equal.

12 Which of the following equations represents the decision rule for a Naïve Bayes classifier (ignoring the evidence as it is constant for all classes)?

A. C* = argmax over C of P(C) × Π_i P(x_i | C)
B. C* = argmin over C of P(C) × Π_i P(x_i | C)
C. C* = argmax over C of Σ_i P(x_i | C)
D. C* = argmax over C of P(x | C) / P(C)
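The Naïve Bayes decision rule — pick the class maximizing the prior times the product of per-feature likelihoods — can be sketched directly (a toy example with made-up probabilities, not a trained model):

```python
import math

def naive_bayes_predict(priors, likelihoods):
    # priors: {class: P(C)}; likelihoods: {class: [P(x_i|C) per feature]}
    # Decision rule: argmax over C of P(C) * prod_i P(x_i | C)
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = prior
        for p in likelihoods[c]:
            score *= p
        if score > best_score:
            best_class, best_score = c, score
    return best_class

priors = {'spam': 0.4, 'ham': 0.6}
likelihoods = {'spam': [0.8, 0.7], 'ham': [0.1, 0.2]}
print(naive_bayes_predict(priors, likelihoods))  # spam (0.224 vs 0.012)
```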

13 In a Naïve Bayes classifier, what is the Zero Frequency Problem?

A. When the prior probability of a class is zero.
B. When the computation results in a divide-by-zero error during normalization.
C. When a feature value appears in the test set but was never observed with a specific class in the training set, resulting in a zero likelihood.
D. When the entire dataset has zero variance.

14 What technique is commonly used to solve the Zero Frequency Problem in Naïve Bayes?

A. Pruning
B. Laplace Smoothing (Additive Smoothing)
C. Gradient Descent
D. Feature Scaling
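Laplace smoothing is a one-line formula; the sketch below uses hypothetical count names:

```python
def smoothed_likelihood(count_feature_class, count_class, num_values, alpha=1):
    # Laplace (additive) smoothing: never returns zero, even for an
    # unseen feature/class combination (count_feature_class == 0).
    # num_values: number of possible values the feature can take.
    return (count_feature_class + alpha) / (count_class + alpha * num_values)

print(smoothed_likelihood(0, 100, 50))  # unseen case still gets > 0
```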

15 Which variation of Naïve Bayes is most appropriate when feature values are continuous and assumed to follow a normal distribution?

A. Poisson Naïve Bayes
B. Gaussian Naïve Bayes
C. Multinomial Naïve Bayes
D. Bernoulli Naïve Bayes

16 In Bayesian Decision Theory, the concept of Risk is defined as:

A. The probability of choosing the wrong class.
B. The expected loss associated with a decision rule.
C. The computational complexity of the algorithm.
D. The inverse of the likelihood function.

17 If we use the Zero-One Loss function (loss is 0 for correct classification, 1 for incorrect), minimizing the Risk is equivalent to:

A. Minimizing the squared error.
B. Minimizing the probability of error.
C. Maximizing the likelihood.
D. Maximizing the entropy.

18 Given the formula for the Gaussian Naïve Bayes likelihood, P(x_i | C) = (1 / √(2πσ²)) · exp(−(x_i − μ)² / (2σ²)), what parameters need to be estimated from the training data?

A. The median and mode.
B. The mean μ and variance σ² for each class and feature.
C. The weights w and bias b.
D. The min and max values of x.
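The Gaussian likelihood from question 18, written out with Python's math module (mu and sigma2 stand for the per-class, per-feature estimates):

```python
import math

def gaussian_likelihood(x, mu, sigma2):
    # P(x_i | C) under a normal distribution with class-conditional
    # mean mu and variance sigma2, both estimated from training data.
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# The density is highest at the class mean.
print(gaussian_likelihood(0.0, 0.0, 1.0))  # ~0.3989
```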

19 Why is the Naïve Bayes classifier considered a Generative Model?

A. Because it uses genetic algorithms for optimization.
B. Because it directly learns the decision boundary without modeling densities.
C. Because it generates new training data to balance classes.
D. Because it models the joint probability P(x, C) (via P(x|C) P(C)) and captures how the data is generated.

20 In the context of multi-class classification, if the decision regions are separated by linear boundaries, the classifier is known as:

A. A Decision Tree
B. A Linear Classifier
C. A Quadratic Classifier
D. A Non-parametric Classifier

21 Which version of Naïve Bayes is best suited for binary feature vectors (e.g., word presence/absence in text classification)?

A. Bernoulli Naïve Bayes
B. Gaussian Naïve Bayes
C. Linear Naïve Bayes
D. Multinomial Naïve Bayes

22 What is MAP estimation in the context of Bayesian classification?

A. Mean Average Probability
B. Maximum Average Precision
C. Minimum Absolute Posteriori
D. Maximum A Posteriori

23 How does Maximum Likelihood (ML) estimation differ from MAP estimation?

A. ML assumes a uniform prior (or ignores the prior), while MAP accounts for the prior .
B. There is no difference; they are identical.
C. MAP assumes a uniform likelihood, ML calculates likelihood.
D. ML is for regression, MAP is for classification.

24 What is the computational complexity of predicting a class for a single instance using Naïve Bayes with d features and K classes?

A. O(d × K)
B. O(d + K)
C. O(d^K)
D. O(K log d)

25 In Bayesian Decision Theory, λ(α_i | C_j) typically denotes:

A. The learning rate.
B. The likelihood of feature x_i.
C. The loss incurred by taking action α_i when the true state of nature is C_j.
D. The probability of class C_j.

26 Which of the following is true regarding the Decision Boundary of a Gaussian Naïve Bayes classifier if all classes share the same covariance matrix?

A. The boundary is circular.
B. The boundary is quadratic.
C. The boundary is linear.
D. There is no decision boundary.

27 Why do we often work with Log-Probabilities (sums of logs) instead of direct probabilities (products) in Naïve Bayes?

A. Because log probabilities are required by the Bayes theorem definition.
B. To make the math harder.
C. To prevent numerical underflow when multiplying many small probabilities.
D. Because logs are always positive.
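The underflow issue in option C is easy to demonstrate:

```python
import math

# Multiplying many small probabilities underflows to 0.0 in floating
# point; summing their logs does not.
probs = [1e-5] * 100
product = 1.0
for p in probs:
    product *= p
log_sum = sum(math.log(p) for p in probs)

print(product)   # 0.0 (underflow)
print(log_sum)   # ~ -1151.3, still usable for argmax comparisons
```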

28 Which of the following text classification scenarios is Multinomial Naïve Bayes typically used for?

A. When features are continuous word embeddings.
B. When features represent word counts or term frequencies.
C. When the text length is infinite.
D. When features represent the presence/absence of words (binary).

29 A classifier that distinguishes between 'Spam' and 'Not Spam' is an example of:

A. Regression
B. Clustering
C. Binary Classification
D. Reinforcement Learning

30 In the formulation f(x) = w·x + b, if w·x + b > 0 for a given point x, which side of the boundary does the point fall on?

A. Positive side (f(x) > 0)
B. Negative side (f(x) < 0)
C. Undefined
D. On the boundary (f(x) = 0)
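The sign test behind this question, sketched with made-up weights and a made-up point:

```python
def side_of_boundary(w, b, x):
    # f(x) = w . x + b : positive side, negative side, or on the boundary.
    f = sum(wi * xi for wi, xi in zip(w, x)) + b
    if f > 0:
        return 'positive'
    if f < 0:
        return 'negative'
    return 'boundary'

print(side_of_boundary([1.0, 1.0], -1.0, [2.0, 3.0]))  # positive (f = 4)
```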

31 In Bayesian Decision Theory, the Evidence P(x) acts as a:

A. Prior belief about the feature distribution.
B. Weighting factor for the likelihood.
C. Normalization constant to ensure probabilities sum to 1.
D. Loss function.

32 Which of the following statements about Decision Regions is correct?

A. Decision regions can never be disjoint.
B. Decision regions are only defined for training data.
C. The union of all decision regions must cover the entire feature space.
D. Decision regions must always be convex.

33 The One-vs-One strategy generally requires more space to store models than One-vs-All (K(K − 1)/2 models vs K). Why might it still be preferred?

A. It guarantees 100% accuracy.
B. It is the only method that supports Neural Networks.
C. It does not require labels.
D. Each individual classifier is trained on a smaller subset of data (only two classes), potentially making training faster.

34 In a probabilistic classifier, if P(C1 | x) > P(C2 | x), and the loss for misclassifying class 1 is much higher than misclassifying class 2, Bayesian Decision Theory might suggest:

A. Choosing the class with the highest probability always.
B. Refusing to classify.
C. Choosing class 2 if the expected risk is lower, even if the probability is lower.
D. Choosing class 1 regardless of cost.
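The idea in option C, sketched with illustrative posteriors and an asymmetric loss table (all numbers made up):

```python
def expected_risk(action, posteriors, loss):
    # R(action | x) = sum over true classes C of loss(action, C) * P(C | x)
    return sum(loss[(action, c)] * p for c, p in posteriors.items())

posteriors = {1: 0.6, 2: 0.4}
# Misclassification costs keyed by (predicted, true): wrongly predicting
# class 1 when the truth is class 2 is assumed 10x as costly.
loss = {(1, 1): 0, (1, 2): 10, (2, 1): 1, (2, 2): 0}
r1 = expected_risk(1, posteriors, loss)  # 10 * 0.4 = 4.0
r2 = expected_risk(2, posteriors, loss)  # 1 * 0.6 = 0.6
print(min((r1, 1), (r2, 2))[1])  # class 2 has the lower expected risk
```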

35 What is the weight vector w in a linear classifier geometrically orthogonal to?

A. The y-axis.
B. The decision boundary (hyperplane).
C. The x-axis.
D. The data points.

36 Which term calculates P(x_i | C) = (count(x_i, C) + 1) / (count(C) + V), where V is the number of possible feature values?

A. Posterior Probability
B. Gini Impurity
C. Gaussian Likelihood
D. Laplace Smoothed Multinomial Likelihood

37 Why does the Naïve Bayes independence assumption often work well in practice even when features are somewhat dependent?

A. Because the algorithm corrects the dependencies during training.
B. Because dependencies cancel each other out.
C. Because real-world data is always independent.
D. Because classification relies on the correct sign/ranking of the posterior, not the exact probability value.

38 If a classifier produces a probability P(C | x) rather than only a hard label, it is known as a:

A. Hard Classifier
B. Deterministic Classifier
C. Soft (Probabilistic) Classifier
D. Regressive Classifier

39 In the context of Bayes Theorem, if the Prior P(C) is uniform for all classes, the MAP estimate is equivalent to maximizing:

A. The Variance
B. The Loss Function
C. The Evidence
D. The Likelihood

40 What is the dimension of the decision boundary for a binary classification problem with 10 input features?

A. 2
B. 9
C. 10
D. 1

41 Which of the following is NOT a property of a Linear Classifier?

A. Less prone to overfitting compared to high-degree polynomial classifiers.
B. Can learn complex XOR relationships directly without feature engineering.
C. Computationally efficient (fast inference).
D. Simple to interpret (weights indicate feature importance).

42 When applying Naïve Bayes, how is the Prior P(C) usually estimated from training data?

A. Average value of features for class C.
B. Fraction of training samples belonging to class C.
C. Correlation coefficient of class C.
D. It is always set to 0.5.
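Estimating the prior as a class frequency is a simple count (toy labels, illustrative function name):

```python
from collections import Counter

def estimate_priors(labels):
    # P(C) = fraction of training samples carrying label C.
    counts = Counter(labels)
    n = len(labels)
    return {c: counts[c] / n for c in counts}

print(estimate_priors(['spam', 'ham', 'ham', 'ham']))  # {'spam': 0.25, 'ham': 0.75}
```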

43 Which statement best describes the Bayes Error Rate?

A. The rate at which the algorithm converges.
B. The error rate of a Naïve Bayes classifier.
C. The lowest possible error rate for any classifier on a given distribution.
D. The error rate when the prior P(C) is ignored.

44 In a 3-class problem using One-vs-Rest, if the confidence outputs of the three classifiers for a point are f1(x), f2(x), and f3(x), with f1(x) the largest, which class is predicted?

A. None
B. Class 3
C. Class 2
D. Class 1

45 If features x_i and x_j are duplicates (x_i = x_j), how does this affect Naïve Bayes?

A. It has no effect.
B. It improves accuracy by reinforcing the signal.
C. It causes a division by zero.
D. It violates the independence assumption and 'double counts' the importance of that feature.

46 What is a Reject Option in classification?

A. Rejecting the null hypothesis.
B. Deleting features that are not useful.
C. Refraining from making a prediction if the posterior probability is below a certain threshold.
D. Removing outliers from the training set.

47 Geometrically, what does the likelihood P(x | C) in a Gaussian Naïve Bayes represent?

A. The density of the data point x within the cluster of class C.
B. The distance of x from the decision boundary.
C. The probability of the class C.
D. The volume of the dataset.

48 Which of the following is an example of a Discriminative approach to classification?

A. Hidden Markov Models
B. Logistic Regression
C. Naïve Bayes
D. Gaussian Mixture Models

49 In the context of the One-vs-One strategy, if there is a tie in the voting (e.g., class A and class B both get same number of votes), how is it typically resolved?

A. Random selection or based on highest aggregate confidence score.
B. The model crashes.
C. Both classes are returned.
D. The process is restarted.

50 What happens to the decision boundary in a Linear Classifier if we multiply all weights and the bias by a positive constant c > 0?

A. The boundary rotates.
B. The boundary becomes non-linear.
C. The boundary remains unchanged.
D. The boundary shifts.
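The invariance in option C can be checked numerically (arbitrary example values): scaling every parameter by c > 0 multiplies f(x) by c, which never flips its sign, so every point stays on the same side of the boundary.

```python
def f(w, b, x):
    # Linear decision function f(x) = w . x + b.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

w, b, x, c = [2.0, -1.0], 0.5, [1.0, 3.0], 10.0
before = f(w, b, x)
after = f([c * wi for wi in w], c * b, x)
# The scaled score is exactly c times the original, so the sign agrees.
print(before, after)
```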