Unit 2 - Practice Quiz

INT395 49 Questions

1 In the context of supervised learning, what distinguishes a classification problem from a regression problem?

A. The training data is unlabeled.
B. The target variable is categorical or discrete.
C. The input variables are continuous.
D. The target variable is continuous.

2 Which scikit-learn method is primarily used to train a classifier on a dataset?

A. model.transform(X, y)
B. model.fit(X, y)
C. model.predict(X, y)
D. model.score(X, y)
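
A minimal sketch of the fit/predict workflow the question refers to, using a tiny made-up dataset (not from the quiz):

```python
# fit(X, y) trains the estimator in place and returns it;
# predict(X) takes features only, never labels.
from sklearn.linear_model import LogisticRegression

X = [[0.0], [1.0], [2.0], [3.0]]  # shape (n_samples, n_features)
y = [0, 0, 1, 1]                  # one label per sample

model = LogisticRegression()
model.fit(X, y)
print(model.predict([[2.5]]))
```

By contrast, transform(X) belongs to preprocessors, and score(X, y) evaluates an already-fitted model.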

3 What is the standard shape of the input matrix expected by scikit-learn classifiers?

A. (n_features, n_classes)
B. (n_samples, n_samples)
C. (n_samples, n_features)
D. (n_features, n_samples)

4 Which of the following metrics is defined as the ratio of correctly predicted observations to the total observations?

A. Accuracy
B. Precision
C. F1-Score
D. Recall
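
A worked check of the definition in question 4, with toy labels chosen for illustration:

```python
# Accuracy = correctly predicted observations / total observations.
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]  # 4 of the 5 predictions match
acc = accuracy_score(y_true, y_pred)
print(acc)  # 4/5 = 0.8
```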

5 In a Confusion Matrix, what does a False Positive (FP) represent?

A. The model predicted Negative, and the actual class was Positive.
B. The model predicted Positive, and the actual class was Positive.
C. The model predicted Negative, and the actual class was Negative.
D. The model predicted Positive, but the actual class was Negative.

6 Which metric is best suited for a classification problem where False Negatives are much more costly than False Positives (e.g., detecting a deadly disease)?

A. Precision
B. Accuracy
C. Specificity
D. Recall

7 Calculate the Precision given: TP = …, FP = …, FN = ….

A. 0.91
B. 0.83
C. 0.20
D. 0.50
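
The counts for question 7 are missing from the source, so the TP/FP/FN values below are purely hypothetical; the sketch only demonstrates the formula:

```python
# Precision = TP / (TP + FP); Recall = TP / (TP + FN).
TP, FP, FN = 10, 2, 5  # hypothetical counts, not the quiz's values

precision = TP / (TP + FP)
recall = TP / (TP + FN)
print(round(precision, 2))  # 10/12 rounds to 0.83
```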

8 The F1-Score is the harmonic mean of which two metrics?

A. Precision and Recall
B. TPR and FPR
C. Specificity and Sensitivity
D. Accuracy and Recall

9 In an ROC curve, the x-axis and y-axis represent which metrics respectively?

A. True Negative Rate vs True Positive Rate
B. False Positive Rate vs True Positive Rate
C. Precision vs Recall
D. Recall vs Accuracy

10 What does an AUC (Area Under Curve) score of 0.5 imply about a classifier?

A. It is a perfect classifier.
B. It predicts the negative class always.
C. It has zero errors.
D. It performs no better than random guessing.
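
A quick empirical check of the claim in question 10, using random scores on synthetic labels:

```python
# Scores that carry no information about the labels give AUC near 0.5.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=10_000)
random_scores = rng.random(10_000)  # unrelated to y_true
auc = roc_auc_score(y_true, random_scores)
print(auc)  # close to 0.5
```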

11 When using classification_report in scikit-learn, what does the macro avg represent?

A. The unweighted mean of the metric for each label.
B. The weighted average based on support size.
C. The standard deviation of the metric.
D. The accuracy of the model.
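
A small illustration of the macro average on toy labels, using precision for concreteness:

```python
# macro avg = plain (unweighted) mean of the per-class scores,
# regardless of how many samples each class has.
from sklearn.metrics import precision_score

y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 1, 1]
# per-class precision: class 0 -> 3/3 = 1.0, class 1 -> 1/2 = 0.5
macro = precision_score(y_true, y_pred, average="macro")
print(macro)  # (1.0 + 0.5) / 2 = 0.75
```

The weighted avg in the same report would instead weight each class's score by its support.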

12 Which issue makes Accuracy a misleading metric?

A. Imbalanced datasets.
B. High computational cost.
C. Linearly separable data.
D. It cannot be calculated for multiclass problems.

13 What is the activation function used in the standard Perceptron algorithm for binary classification?

A. Tanh function
B. Sigmoid function
C. Heaviside step function
D. ReLU function

14 The Perceptron algorithm is guaranteed to converge only if:

A. The data is normally distributed.
B. The weights are initialized to zero.
C. The learning rate is greater than 1.
D. The data is linearly separable.

15 Which function maps the output of a linear equation to a probability value in (0, 1) in Logistic Regression?

A. Sigmoid (Logistic)
B. Step
C. Softmax
D. Logarithm

16 The decision boundary generated by a standard Logistic Regression model is:

A. Linear
B. Irregular
C. Circular
D. Polynomial

17 In scikit-learn's LogisticRegression, what is the purpose of the parameter C?

A. It controls the learning rate.
B. It determines the kernel type.
C. It is the inverse of regularization strength.
D. It sets the number of iterations.
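
A sketch of the effect of C on separable toy data: since C is the inverse of regularization strength, a smaller C regularizes more and shrinks the coefficients.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

weak = LogisticRegression(C=100.0).fit(X, y)   # weak regularization
strong = LogisticRegression(C=0.01).fit(X, y)  # strong regularization
# The weakly regularized model keeps a larger coefficient:
print(abs(weak.coef_[0][0]), abs(strong.coef_[0][0]))
```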

18 Which loss function is minimized in Logistic Regression?

A. Hinge Loss
B. Gini Impurity
C. Mean Squared Error
D. Log Loss (Cross-Entropy)

19 How does k-Nearest Neighbors (k-NN) classify a new data point?

A. By finding the best splitting feature.
B. By projecting the point onto a hyperplane.
C. By calculating the probability using Bayes' theorem.
D. By taking a majority vote of the closest training examples.

20 Why is k-NN often referred to as a lazy learner?

A. It uses a simple distance metric.
B. It only generalizes the data during the prediction phase.
C. It ignores outliers.
D. It trains very slowly.

21 In k-NN, what is the effect of choosing a very small value for k (e.g., k = 1)?

A. High Bias, Low Variance (Underfitting)
B. Low Bias, High Variance (Overfitting)
C. The model becomes a linear classifier.
D. The decision boundary becomes smooth.
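
The overfitting symptom in question 21 can be seen directly: with k = 1 the model memorizes the training set, even when the labels are noise.

```python
# k=1: each training point's nearest neighbor is itself,
# so training accuracy is perfect regardless of label noise.
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [3], [4], [5]]
y = [0, 1, 0, 1, 0, 1]  # alternating labels a smooth model could not fit
knn1 = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(knn1.score(X, y))  # 1.0 on the training data
```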

22 Which preprocessing step is critical for k-NN performance?

A. Removing correlations
B. Feature Scaling
C. One-hot encoding target labels
D. Increasing the number of features

23 Which distance metric is calculated as ?

A. Minkowski Distance
B. Manhattan Distance
C. Euclidean Distance
D. Cosine Similarity
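
The exact formula in question 23 was lost in extraction; for reference, the two most common metrics on a pair of illustrative points:

```python
# Euclidean: sqrt of summed squared differences.
# Manhattan: sum of absolute differences.
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])
euclidean = np.sqrt(np.sum((a - b) ** 2))  # sqrt(9 + 16) = 5.0
manhattan = np.sum(np.abs(a - b))          # 3 + 4 = 7.0
print(euclidean, manhattan)
```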

24 In a Decision Tree, what does a leaf node represent?

A. The root of the tree.
B. A class label or probability.
C. A feature to split on.
D. A decision rule.

25 Which metric does the CART algorithm (used by scikit-learn for Decision Trees) use by default to measure impurity?

A. Gini Impurity
B. Mean Squared Error
C. Entropy
D. Log Loss

26 Calculate the Gini Impurity of a node containing 3 positive samples and 3 negative samples.

A. 0.25
B. 0.0
C. 1.0
D. 0.5
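
A worked check of question 26 using the Gini formula:

```python
# Gini impurity = 1 - sum(p_i^2) over the classes in the node.
# With 3 positives and 3 negatives, p_pos = p_neg = 0.5.
p_pos, p_neg = 3 / 6, 3 / 6
gini = 1 - (p_pos ** 2 + p_neg ** 2)
print(gini)  # 0.5, the maximum impurity for two classes
```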

27 Which hyperparameter in DecisionTreeClassifier can be used to control overfitting?

A. learning_rate
B. max_depth
C. C
D. kernel
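
A sketch of max_depth as a regularizer, on deliberately noisy toy labels: an unconstrained tree grows until its leaves are pure, while a capped tree stops early.

```python
from sklearn.tree import DecisionTreeClassifier

X = [[i] for i in range(20)]
y = [i % 2 for i in range(20)]  # alternating (noisy) labels

deep = DecisionTreeClassifier(random_state=0).fit(X, y)
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(deep.get_depth(), shallow.get_depth())  # the capped tree is much shallower
```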

28 What is the concept of Information Gain in Decision Trees?

A. The time taken to train the tree.
B. The increase in accuracy after a split.
C. The reduction in entropy (or impurity) achieved by a split.
D. The total number of nodes in the tree.

29 Decision Trees split the feature space into regions using boundaries that are:

A. Circular
B. Curved
C. Diagonal
D. Orthogonal to the feature axes

30 The primary objective of a Support Vector Machine (SVM) is to find a hyperplane that:

A. Separates data with zero error regardless of margin.
B. Minimizes the number of support vectors.
C. Maximizes the margin between classes.
D. Passes through the mean of the data.

31 What are Support Vectors in SVM?

A. The misclassified data points.
B. The centroids of the classes.
C. The data points furthest from the decision boundary.
D. The data points closest to the decision boundary.

32 Which technique allows SVM to perform non-linear classification?

A. The Kernel Trick
B. Gradient Descent
C. Bagging
D. Pruning

33 In SVC (Support Vector Classifier), what does a high value of Gamma (γ) imply for an RBF kernel?

A. The margin becomes wider.
B. The model fits the training data very closely (potential overfitting).
C. Each training example has a wide-reaching influence.
D. The decision boundary will be nearly linear.

34 Which scikit-learn class is used for Support Vector Classification?

A. sklearn.tree.DecisionTreeClassifier
B. sklearn.svm.SVR
C. sklearn.svm.SVC
D. sklearn.linear_model.SGDClassifier
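
A minimal sketch of sklearn.svm.SVC with the RBF kernel on made-up points; gamma controls how local each training point's influence is:

```python
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [2, 2], [3, 3]]
y = [0, 0, 1, 1]
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)
# The fitted support vectors are the points that define the boundary:
print(clf.support_vectors_.shape)
```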

35 The Naïve Bayes classifier is based on which statistical theorem?

A. Central Limit Theorem
B. Pythagorean Theorem
C. Gauss-Markov Theorem
D. Bayes' Theorem

36 What is the "Naïve" assumption in Naïve Bayes?

A. All features are mutually independent given the class.
B. The classes are balanced.
C. All features are equally important.
D. The data follows a normal distribution.

37 Which variant of Naïve Bayes is best suited for continuous data assuming a bell-curve distribution?

A. MultinomialNB
B. GaussianNB
C. ComplementNB
D. BernoulliNB

38 In Text Classification with word counts, which Naïve Bayes variant is typically used?

A. GaussianNB
B. LinearNB
C. LogisticNB
D. MultinomialNB

39 What is Laplace Smoothing used for in Naïve Bayes?

A. To normalize the dataset.
B. To prevent zero probabilities for unseen features.
C. To reduce the number of features.
D. To handle continuous variables.
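
A sketch of Laplace smoothing via MultinomialNB's alpha parameter, on a toy word-count matrix: a word never seen in a class still gets a nonzero probability.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

X = np.array([[2, 1, 0],   # word counts per document (toy data)
              [0, 1, 3]])
y = np.array([0, 1])
nb = MultinomialNB(alpha=1.0).fit(X, y)  # alpha=1.0 -> Laplace smoothing

# Word 3 never appears in class 0, yet its smoothed probability is
# (0 + 1) / (3 + 3) = 1/6, not zero:
print(np.exp(nb.feature_log_prob_[0]))
```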

40 Which of the following classifiers is a Generative Model?

A. Logistic Regression
B. Support Vector Machine
C. Naïve Bayes
D. Decision Tree

41 To handle a multi-class classification problem with a binary classifier like Logistic Regression, which strategy is commonly used?

A. Gradient Boosting
B. Kernel Trick
C. One-vs-Rest (OvR)
D. Pruning

42 Which metric is calculated using the formula: ?

A. F1-Score
B. Matthews Correlation Coefficient
C. Specificity
D. Accuracy

43 If a Decision Tree is fully grown until all leaves are pure, it is likely to have:

A. High Variance (Overfitting)
B. High Bias
C. Low Variance
D. Low Accuracy on training data

44 In the context of the Confusion Matrix, Specificity is also known as:

A. False Positive Rate
B. True Positive Rate
C. True Negative Rate
D. Precision

45 Which scikit-learn utility is best used to split data into training and testing sets?

A. GridSearchCV
B. cross_val_score
C. StandardScaler
D. train_test_split
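
The canonical usage of train_test_split, with a fixed random_state for reproducibility:

```python
from sklearn.model_selection import train_test_split

X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
print(len(X_train), len(X_test))  # 7 3
```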

46 What happens to the decision boundary of a Logistic Regression model if the regularization parameter C is very small?

A. The boundary becomes non-linear.
B. The model underfits (high bias).
C. The model overfits.
D. The coefficients become large.

47 Which of the following algorithms does NOT produce a linear decision boundary (without kernels)?

A. k-Nearest Neighbors
B. Linear Perceptron
C. Logistic Regression
D. Linear SVM

48 In SVM, which kernel is defined as ?

A. Linear Kernel
B. Polynomial Kernel
C. RBF Kernel
D. Sigmoid Kernel

49 What is the primary advantage of Naïve Bayes classifiers regarding training time?

A. They depend on the number of support vectors.
B. They are fast because they require a single pass over the data.
C. They are very slow due to iterative optimization.
D. They are slow because they calculate distances between all points.