Unit 2 - Practice Quiz

INT395

1 In the context of supervised learning, what distinguishes a classification problem from a regression problem?

A. The input variables are continuous.
B. The target variable is continuous.
C. The target variable is categorical or discrete.
D. The training data is unlabeled.

2 Which scikit-learn method is primarily used to train a classifier on a dataset?

A. model.predict(X, y)
B. model.fit(X, y)
C. model.transform(X, y)
D. model.score(X, y)

3 What is the standard shape of the input matrix expected by scikit-learn classifiers?

A. (n_features, n_samples)
B. (n_samples, n_features)
C. (n_samples, n_samples)
D. (n_features, n_classes)

4 Which of the following metrics is defined as the ratio of correctly predicted observations to the total observations?

A. Recall
B. Precision
C. Accuracy
D. F1-Score

5 In a Confusion Matrix, what does a False Positive (FP) represent?

A. The model predicted Positive, and the actual class was Positive.
B. The model predicted Negative, and the actual class was Positive.
C. The model predicted Positive, but the actual class was Negative.
D. The model predicted Negative, and the actual class was Negative.

6 Which metric is best suited for a classification problem where False Negatives are much more costly than False Positives (e.g., detecting a deadly disease)?

A. Precision
B. Recall
C. Specificity
D. Accuracy

7 Calculate the Precision given: , , .

A. 0.83
B. 0.91
C. 0.50
D. 0.20

8 The F1-Score is the harmonic mean of which two metrics?

A. Accuracy and Recall
B. Precision and Recall
C. Specificity and Sensitivity
D. TPR and FPR
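
For readers checking their work on the confusion-matrix metrics above, the following is a minimal sketch in plain Python. The function name `classification_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not taken from any question in this quiz.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from binary confusion-matrix counts.

    Assumes every denominator is non-zero; a production version
    would need to guard against division by zero.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # correct / all observations
    precision = tp / (tp + fp)                   # correct among predicted positives
    recall = tp / (tp + fn)                      # correct among actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=8, fp=2, fn=4, tn=6)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In scikit-learn, the functions in `sklearn.metrics` (for example `precision_score` and `recall_score`) compute the same quantities from label arrays rather than from raw counts.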

9 In an ROC curve, the x-axis and y-axis represent which metrics respectively?

A. Precision vs Recall
B. False Positive Rate vs True Positive Rate
C. True Negative Rate vs True Positive Rate
D. Recall vs Accuracy

10 What does an AUC (Area Under Curve) score of 0.5 imply about a classifier?

A. It is a perfect classifier.
B. It has zero errors.
C. It performs no better than random guessing.
D. It predicts the negative class always.

11 When using classification_report in scikit-learn, what does the macro avg represent?

A. The weighted average based on support size.
B. The unweighted mean of the metric for each label.
C. The accuracy of the model.
D. The standard deviation of the metric.

12 Which issue makes Accuracy a misleading metric?

A. High computational cost.
B. It cannot be calculated for multiclass problems.
C. Imbalanced datasets.
D. Linearly separable data.

13 What is the activation function used in the standard Perceptron algorithm for binary classification?

A. Sigmoid function
B. ReLU function
C. Heaviside step function
D. Tanh function

14 The Perceptron algorithm is guaranteed to converge only if:

A. The learning rate is greater than 1.
B. The data is linearly separable.
C. The data is normally distributed.
D. The weights are initialized to zero.
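
As a study aid for the Perceptron questions above, here is a minimal sketch of the classic update rule in plain Python. The function names, the learning rate, the epoch cap, and the toy AND-gate dataset are all assumptions made for illustration.

```python
def step(z):
    """Threshold activation: 1 if z is non-negative, else 0."""
    return 1 if z >= 0 else 0


def train_perceptron(samples, labels, lr=1.0, max_epochs=100):
    """Train a single perceptron; stops early after an error-free epoch."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in zip(samples, labels):
            pred = step(sum(w * xi for w, xi in zip(weights, x)) + bias)
            update = lr * (y - pred)  # zero when the prediction is correct
            if update != 0:
                weights = [w + update * xi for w, xi in zip(weights, x)]
                bias += update
                errors += 1
        if errors == 0:  # every sample classified correctly
            break
    return weights, bias


# Toy dataset (logical AND), used only as an illustration.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
weights, bias = train_perceptron(X, y)
predictions = [step(sum(w * xi for w, xi in zip(weights, x)) + bias) for x in X]
print(predictions)
```

On data such as XOR, which no single line can separate, the same loop would simply run until `max_epochs` without reaching an error-free epoch.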

15 In Logistic Regression, which function maps the output of a linear equation to a probability value in [0, 1]?

A. Logarithm
B. Sigmoid (Logistic)
C. Softmax
D. Step

16 The decision boundary generated by a standard Logistic Regression model is:

A. Linear
B. Circular
C. Polynomial
D. Irregular

17 In scikit-learn's LogisticRegression, what is the purpose of the parameter C?

A. It controls the learning rate.
B. It is the inverse of regularization strength.
C. It sets the number of iterations.
D. It determines the kernel type.

18 Which loss function is minimized in Logistic Regression?

A. Mean Squared Error
B. Hinge Loss
C. Log Loss (Cross-Entropy)
D. Gini Impurity

19 How does k-Nearest Neighbors (k-NN) classify a new data point?

A. By finding the best splitting feature.
B. By calculating the probability using Bayes' theorem.
C. By taking a majority vote of the closest training examples.
D. By projecting the point onto a hyperplane.
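
The k-NN prediction step can be sketched in a few lines of plain Python. This is an illustrative sketch, not scikit-learn's implementation; the function name `knn_predict`, the choice of Euclidean distance, and the toy points are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` from its k nearest training points."""
    # There is no training step: all work happens at prediction time.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # Most common label among the k neighbours.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data, for illustration only.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, query=(2, 2)))
print(knn_predict(points, labels, query=(6, 5)))
```

Because the prediction depends entirely on raw distances, a feature measured on a much larger scale than the others would dominate the neighbour search in this sketch.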

20 Why is k-NN often referred to as a lazy learner?

A. It trains very slowly.
B. It only generalizes the data during the prediction phase.
C. It uses a simple distance metric.
D. It ignores outliers.

21 In k-NN, what is the effect of choosing a very small value for k (e.g., k = 1)?

A. High Bias, Low Variance (Underfitting)
B. Low Bias, High Variance (Overfitting)
C. The model becomes a linear classifier.
D. The decision boundary becomes smooth.

22 Which preprocessing step is critical for k-NN performance?

A. Feature Scaling
B. One-hot encoding target labels
C. Increasing the number of features
D. Removing correlations

23 Which distance metric is calculated as ?

A. Euclidean Distance
B. Manhattan Distance
C. Minkowski Distance
D. Cosine Similarity

24 In a Decision Tree, what does a leaf node represent?

A. A feature to split on.
B. A decision rule.
C. A class label or probability.
D. The root of the tree.

25 Which metric does the CART algorithm (used by scikit-learn for Decision Trees) use by default to measure impurity?

A. Entropy
B. Gini Impurity
C. Log Loss
D. Mean Squared Error

26 Calculate the Gini Impurity of a node containing 3 positive samples and 3 negative samples.

A. 0.0
B. 0.25
C. 0.5
D. 1.0
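
For checking Gini calculations by hand, a minimal sketch in plain Python follows. The function name `gini_impurity` and the example counts are hypothetical, and the counts deliberately differ from those in the question above.

```python
def gini_impurity(class_counts):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


# Hypothetical node with 4 samples of one class and 1 of another.
mixed_node = gini_impurity([4, 1])
# A pure node (all samples in one class) has zero impurity.
pure_node = gini_impurity([5, 0])
print(round(mixed_node, 2), pure_node)
```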

27 Which hyperparameter in DecisionTreeClassifier can be used to control overfitting?

A. learning_rate
B. max_depth
C. kernel
D. C

28 What is the concept of Information Gain in Decision Trees?

A. The increase in accuracy after a split.
B. The reduction in entropy (or impurity) achieved by a split.
C. The total number of nodes in the tree.
D. The time taken to train the tree.

29 Decision Trees split the feature space into regions using boundaries that are:

A. Curved
B. Orthogonal to the feature axes
C. Diagonal
D. Circular

30 The primary objective of a Support Vector Machine (SVM) is to find a hyperplane that:

A. Minimizes the number of support vectors.
B. Maximizes the margin between classes.
C. Passes through the mean of the data.
D. Separates data with zero error regardless of margin.

31 What are Support Vectors in SVM?

A. The data points furthest from the decision boundary.
B. The data points closest to the decision boundary.
C. The misclassified data points.
D. The centroids of the classes.

32 Which technique allows SVM to perform non-linear classification?

A. Gradient Descent
B. The Kernel Trick
C. Bagging
D. Pruning

33 In SVC (Support Vector Classifier), what does a high value of Gamma (γ) imply for an RBF kernel?

A. The decision boundary will be nearly linear.
B. Each training example has a wide-reaching influence.
C. The model fits the training data very closely (potential overfitting).
D. The margin becomes wider.

34 Which scikit-learn class is used for Support Vector Classification?

A. sklearn.svm.SVR
B. sklearn.svm.SVC
C. sklearn.linear_model.SGDClassifier
D. sklearn.tree.DecisionTreeClassifier

35 The Naïve Bayes classifier is based on which statistical theorem?

A. Central Limit Theorem
B. Bayes' Theorem
C. Pythagorean Theorem
D. Gauss-Markov Theorem

36 What is the "Naïve" assumption in Naïve Bayes?

A. All features are equally important.
B. All features are mutually independent given the class.
C. The data follows a normal distribution.
D. The classes are balanced.

37 Which variant of Naïve Bayes is best suited for continuous data assuming a bell-curve distribution?

A. MultinomialNB
B. BernoulliNB
C. GaussianNB
D. ComplementNB

38 In Text Classification with word counts, which Naïve Bayes variant is typically used?

A. GaussianNB
B. MultinomialNB
C. LinearNB
D. LogisticNB

39 What is Laplace Smoothing used for in Naïve Bayes?

A. To handle continuous variables.
B. To prevent zero probabilities for unseen features.
C. To reduce the number of features.
D. To normalize the dataset.
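
The effect of Laplace (additive) smoothing on a word likelihood can be sketched as below. The function name `smoothed_likelihood` and all counts are hypothetical; `alpha` plays the same role as the `alpha` parameter of scikit-learn's `MultinomialNB`, though this sketch is not that library's implementation.

```python
def smoothed_likelihood(word_count, total_count, vocab_size, alpha=1.0):
    """Estimate P(word | class) with additive smoothing.

    word_count  : occurrences of the word in documents of the class
    total_count : total word occurrences in documents of the class
    vocab_size  : number of distinct words in the vocabulary
    alpha       : smoothing strength (alpha=1 is Laplace smoothing)
    """
    return (word_count + alpha) / (total_count + alpha * vocab_size)


# A word never seen with this class (hypothetical counts).
unsmoothed = smoothed_likelihood(0, 10, 5, alpha=0.0)
smoothed = smoothed_likelihood(0, 10, 5, alpha=1.0)
print(unsmoothed, smoothed)
```

Without smoothing, a single zero likelihood would zero out the whole product of per-word probabilities for that class.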

40 Which of the following classifiers is a Generative Model?

A. Logistic Regression
B. Support Vector Machine
C. Naïve Bayes
D. Decision Tree

41 To handle a multi-class classification problem with a binary classifier like Logistic Regression, which strategy is commonly used?

A. One-vs-Rest (OvR)
B. Gradient Boosting
C. Pruning
D. Kernel Trick

42 Which metric is calculated using the formula: ?

A. Accuracy
B. F1-Score
C. Specificity
D. Matthews Correlation Coefficient

43 If a Decision Tree is fully grown until all leaves are pure, it is likely to have:

A. High Bias
B. Low Variance
C. High Variance (Overfitting)
D. Low Accuracy on training data

44 In the context of the Confusion Matrix, Specificity is also known as:

A. True Positive Rate
B. True Negative Rate
C. False Positive Rate
D. Precision

45 Which scikit-learn utility is best used to split data into training and testing sets?

A. cross_val_score
B. train_test_split
C. GridSearchCV
D. StandardScaler

46 What happens to a Logistic Regression model if the regularization parameter C is very small?

A. The model overfits.
B. The coefficients become large.
C. The model underfits (high bias).
D. The boundary becomes non-linear.

47 Which of the following algorithms does NOT produce a linear decision boundary (without kernels)?

A. Logistic Regression
B. Linear Perceptron
C. Linear SVM
D. k-Nearest Neighbors

48 In SVM, which kernel is defined as ?

A. Linear Kernel
B. RBF Kernel
C. Polynomial Kernel
D. Sigmoid Kernel

49 What is the primary advantage of Naïve Bayes classifiers regarding training time?

A. They are very slow due to iterative optimization.
B. They are fast because they require a single pass over the data.
C. They are slow because they calculate distances between all points.
D. They depend on the number of support vectors.