1. In the context of supervised learning, what distinguishes a classification problem from a regression problem?
A. The input variables are continuous.
B. The target variable is continuous.
C. The target variable is categorical or discrete.
D. The training data is unlabeled.
Correct Answer: The target variable is categorical or discrete.
Explanation: Classification involves predicting a discrete class label (e.g., Spam/Not Spam), whereas regression involves predicting a continuous quantity.
2. Which scikit-learn method is primarily used to train a classifier on a dataset?
A. model.predict(X, y)
B. model.fit(X, y)
C. model.transform(X, y)
D. model.score(X, y)
Correct Answer: model.fit(X, y)
Explanation: In scikit-learn, the fit method is used to train the model parameters using the training data and labels.
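As a minimal sketch (the iris dataset and LogisticRegression are just assumed stand-ins for any estimator), fit receives both features and labels, while predict receives features only:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)   # X: (n_samples, n_features), y: (n_samples,)

model = LogisticRegression(max_iter=1000)
model.fit(X, y)                     # training: parameters are learned from X and y
predictions = model.predict(X)      # inference: only the feature matrix is needed
```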
3. What is the standard shape of the input matrix expected by scikit-learn classifiers?
A. (n_features, n_samples)
B. (n_samples, n_features)
C. (n_samples, n_samples)
D. (n_features, n_classes)
Correct Answer: (n_samples, n_features)
Explanation: Scikit-learn expects data in a 2D array where rows represent samples and columns represent features.
4. Which of the following metrics is defined as the ratio of correctly predicted observations to the total observations?
A. Recall
B. Precision
C. Accuracy
D. F1-Score
Correct Answer: Accuracy
Explanation: Accuracy is calculated as (TP + TN) / (TP + TN + FP + FN).
5. In a Confusion Matrix, what does a False Positive (FP) represent?
A. The model predicted Positive, and the actual class was Positive.
B. The model predicted Negative, and the actual class was Positive.
C. The model predicted Positive, but the actual class was Negative.
D. The model predicted Negative, and the actual class was Negative.
Correct Answer: The model predicted Positive, but the actual class was Negative.
Explanation: A False Positive is a 'Type I error' where the model incorrectly predicts the positive class for a negative instance.
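A small sketch (with made-up labels) of how the four cells can be read off scikit-learn's confusion_matrix for a binary problem:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)   # 2 1 1 2 -> the single FP is a true 0 predicted as 1
```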
6. Which metric is best suited for a classification problem where False Negatives are much more costly than False Positives (e.g., detecting a deadly disease)?
A. Precision
B. Recall
C. Specificity
D. Accuracy
Correct Answer: Recall
Explanation: Recall (Sensitivity) measures the proportion of actual positives identified. High recall minimizes false negatives.
7. Calculate the Precision given the counts TP, FP, and FN.
A. 0.83
B. 0.91
C. 0.50
D. 0.20
Correct Answer: 0.83
Explanation: Precision is TP / (TP + FP).
8. The F1-Score is the harmonic mean of which two metrics?
A. Accuracy and Recall
B. Precision and Recall
C. Specificity and Sensitivity
D. TPR and FPR
Correct Answer: Precision and Recall
Explanation: The F1-Score balances Precision and Recall: F1 = 2 * (Precision * Recall) / (Precision + Recall).
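A quick check (toy labels) that scikit-learn's f1_score matches the harmonic mean of precision and recall computed by hand:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]

p = precision_score(y_true, y_pred)   # 2 / 3
r = recall_score(y_true, y_pred)      # 2 / 3
print(f1_score(y_true, y_pred))       # 0.666...
print(2 * p * r / (p + r))            # same value: harmonic mean of P and R
```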
9. In an ROC curve, the x-axis and y-axis represent which metrics respectively?
A. Precision vs Recall
B. False Positive Rate vs True Positive Rate
C. True Negative Rate vs True Positive Rate
D. Recall vs Accuracy
Correct Answer: False Positive Rate vs True Positive Rate
Explanation: The Receiver Operating Characteristic (ROC) curve plots the True Positive Rate (Sensitivity) against the False Positive Rate (1 - Specificity).
10. What does an AUC (Area Under Curve) score of 0.5 imply about a classifier?
A. It is a perfect classifier.
B. It has zero errors.
C. It performs no better than random guessing.
D. It predicts the negative class always.
Correct Answer: It performs no better than random guessing.
Explanation: An AUC of 0.5 represents a model with no discrimination capacity, equivalent to flipping a coin.
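A minimal sketch (with assumed probability scores for four samples) of computing the ROC points and the AUC:

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true   = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]    # predicted probabilities for the positive class

fpr, tpr, thresholds = roc_curve(y_true, y_scores)   # x-axis: FPR, y-axis: TPR
print(roc_auc_score(y_true, y_scores))               # 0.75 here; 0.5 would be chance level
```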
11. When using classification_report in scikit-learn, what does the macro avg represent?
A. The weighted average based on support size.
B. The unweighted mean of the metric for each label.
C. The accuracy of the model.
D. The standard deviation of the metric.
Correct Answer: The unweighted mean of the metric for each label.
Explanation: Macro average calculates metrics for each class independently and then takes the average, treating all classes equally regardless of imbalance.
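A short sketch (toy multi-class labels) showing where the macro avg and weighted avg rows appear in the report:

```python
from sklearn.metrics import classification_report

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]

# 'macro avg' is the plain mean of the per-class scores;
# 'weighted avg' weights each class by its support (number of true samples).
print(classification_report(y_true, y_pred))
```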
12. Which issue makes Accuracy a misleading metric?
A. High computational cost.
B. It cannot be calculated for multiclass problems.
C. Imbalanced datasets.
D. Linearly separable data.
Correct Answer: Imbalanced datasets.
Explanation: In highly imbalanced datasets (e.g., 99% Class A, 1% Class B), a model predicting only Class A achieves 99% accuracy but is useless.
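A small illustration (synthetic labels) of the 99%-accuracy trap described above:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# 99 samples of class 0 and a single sample of class 1.
y_true = np.array([0] * 99 + [1])
y_pred = np.zeros(100, dtype=int)      # a "model" that always predicts class 0

print(accuracy_score(y_true, y_pred))  # 0.99 -- looks excellent
print(recall_score(y_true, y_pred))    # 0.0  -- it never detects the rare class
```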
13. What is the activation function used in the standard Perceptron algorithm for binary classification?
A. Sigmoid function
B. ReLU function
C. Heaviside step function
D. Tanh function
Correct Answer: Heaviside step function
Explanation: The standard Perceptron uses a step function that outputs 1 if w·x + b >= 0 and 0 (or -1) otherwise.
14. The Perceptron algorithm is guaranteed to converge only if:
A. The learning rate is greater than 1.
B. The data is linearly separable.
C. The data is normally distributed.
D. The weights are initialized to zero.
Correct Answer: The data is linearly separable.
Explanation: The Perceptron Convergence Theorem states that if the data is linearly separable, the algorithm will find a separating hyperplane in a finite number of steps.
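A hedged sketch: on two well-separated synthetic blobs (an assumed setup that makes the data linearly separable), scikit-learn's Perceptron is expected to fit the training set perfectly:

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import Perceptron

# Two far-apart clusters, so a separating hyperplane exists.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=0)

clf = Perceptron().fit(X, y)
print(clf.score(X, y))   # expected to reach 1.0 on linearly separable data
```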
15. Which function maps the output of a linear equation to a probability value in (0, 1) in Logistic Regression?
A. Logarithm
B. Sigmoid (Logistic)
C. Softmax
D. Step
Correct Answer: Sigmoid (Logistic)
Explanation: The Sigmoid function, σ(z) = 1 / (1 + e^(-z)), maps real-valued numbers to the range (0, 1).
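A tiny sketch of the logistic function itself (plain NumPy, nothing scikit-learn-specific):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))   # approx. [0.0000454, 0.5, 0.9999546]
```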
16. The decision boundary generated by a standard Logistic Regression model is:
A. Linear
B. Circular
C. Polynomial
D. Irregular
Correct Answer: Linear
Explanation: Logistic regression is a linear classifier because the decision boundary is the set of points where the linear combination of inputs (w·x + b) equals zero.
17. In scikit-learn's LogisticRegression, what is the purpose of the parameter C?
A. It controls the learning rate.
B. It is the inverse of regularization strength.
C. It sets the number of iterations.
D. It determines the kernel type.
Correct Answer: It is the inverse of regularization strength.
Explanation: Smaller values of C specify stronger regularization (penalizing large weights), while larger values imply weaker regularization.
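A sketch (the breast-cancer dataset and a StandardScaler pipeline are assumed choices) showing that smaller C shrinks the learned coefficients:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

for C in (0.01, 1.0, 100.0):
    clf = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=1000))
    clf.fit(X, y)
    # Smaller C -> stronger regularization -> coefficients pulled toward zero.
    print(C, np.abs(clf[-1].coef_).mean())
```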
18. Which loss function is minimized in Logistic Regression?
A. Mean Squared Error
B. Hinge Loss
C. Log Loss (Cross-Entropy)
D. Gini Impurity
Correct Answer: Log Loss (Cross-Entropy)
Explanation: Logistic regression minimizes the negative log-likelihood, also known as Log Loss or Binary Cross-Entropy.
19. How does k-Nearest Neighbors (k-NN) classify a new data point?
A. By finding the best splitting feature.
B. By calculating the probability using Bayes' theorem.
C. By taking a majority vote of the closest training examples.
D. By projecting the point onto a hyperplane.
Correct Answer: By taking a majority vote of the closest training examples.
Explanation: k-NN identifies the k training samples closest to the query point and assigns the most frequent class among them.
20. Why is k-NN often referred to as a lazy learner?
A. It trains very slowly.
B. It only generalizes the data during the prediction phase.
C. It uses a simple distance metric.
D. It ignores outliers.
Correct Answer: It only generalizes the data during the prediction phase.
Explanation: Lazy learners do not build a model during the training phase; they store the training data and perform computation only when a prediction is requested.
21. In k-NN, what is the effect of choosing a very small value for k (e.g., k = 1)?
A. High Bias, Low Variance (Underfitting)
B. Low Bias, High Variance (Overfitting)
C. The model becomes a linear classifier.
D. The decision boundary becomes smooth.
Correct Answer: Low Bias, High Variance (Overfitting)
Explanation: With k = 1, the model is very sensitive to noise in the training data, leading to complex decision boundaries and potential overfitting (High Variance).
22. Which preprocessing step is critical for k-NN performance?
A. Feature Scaling
B. One-hot encoding target labels
C. Increasing the number of features
D. Removing correlations
Correct Answer: Feature Scaling
Explanation: Since k-NN relies on distance calculations (like Euclidean), features with larger scales can dominate the distance metric if not normalized or standardized.
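A hedged comparison (the wine dataset is an assumed example whose features differ in scale by orders of magnitude): k-NN with and without standardization:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

raw = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
scaled = make_pipeline(StandardScaler(),
                       KNeighborsClassifier(n_neighbors=5)).fit(X_train, y_train)

# Scaling typically improves k-NN noticeably on this kind of data.
print(raw.score(X_test, y_test), scaled.score(X_test, y_test))
```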
23. Which distance metric is calculated as Σ |x_i - y_i|?
A. Euclidean Distance
B. Manhattan Distance
C. Minkowski Distance
D. Cosine Similarity
Correct Answer: Manhattan Distance
Explanation: Manhattan distance (L1 norm) is the sum of the absolute differences between the coordinates.
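A one-function sketch (hypothetical helper name manhattan) of that L1 computation:

```python
import numpy as np

def manhattan(a, b):
    """L1 (Manhattan) distance: sum of absolute coordinate differences."""
    return np.sum(np.abs(np.asarray(a) - np.asarray(b)))

print(manhattan([1, 2], [4, 6]))   # |1-4| + |2-6| = 7
```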
24. In a Decision Tree, what does a leaf node represent?
A. A feature to split on.
B. A decision rule.
C. A class label or probability.
D. The root of the tree.
Correct Answer: A class label or probability.
Explanation: Leaf nodes are the terminal nodes of the tree where no further splitting occurs, providing the final prediction.
25. Which metric does the CART algorithm (used by scikit-learn for Decision Trees) use by default to measure impurity?
A. Entropy
B. Gini Impurity
C. Log Loss
D. Mean Squared Error
Correct Answer: Gini Impurity
Explanation: Scikit-learn's DecisionTreeClassifier uses criterion='gini' by default. It measures the probability of misclassifying a randomly chosen element.
26. Calculate the Gini Impurity of a node containing 3 positive samples and 3 negative samples.
A. 0.0
B. 0.25
C. 0.5
D. 1.0
Correct Answer: 0.5
Explanation: Gini = 1 - (0.5^2 + 0.5^2) = 1 - 0.5 = 0.5.
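A tiny helper (hypothetical name gini) that reproduces the arithmetic above from raw class counts:

```python
def gini(counts):
    """Gini impurity of a node given its class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([3, 3]))   # 1 - (0.5**2 + 0.5**2) = 0.5
print(gini([6, 0]))   # 0.0 for a perfectly pure node
```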
27. Which hyperparameter in DecisionTreeClassifier can be used to control overfitting?
A. learning_rate
B. max_depth
C. kernel
D. C
Correct Answer: max_depth
Explanation: Limiting max_depth prevents the tree from growing too complex and memorizing the training noise.
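A hedged comparison (breast-cancer dataset assumed) of an unconstrained tree versus one capped at max_depth=3:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)   # fully grown
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# The unconstrained tree typically scores 1.0 on training data but generalizes worse.
print(deep.score(X_train, y_train), deep.score(X_test, y_test))
print(shallow.score(X_train, y_train), shallow.score(X_test, y_test))
```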
28. What is the concept of Information Gain in Decision Trees?
A. The increase in accuracy after a split.
B. The reduction in entropy (or impurity) achieved by a split.
C. The total number of nodes in the tree.
D. The time taken to train the tree.
Correct Answer: The reduction in entropy (or impurity) achieved by a split.
Explanation: Information Gain measures how much information a feature provides about the class, calculated as Entropy(parent) - Weighted Average Entropy(children).
29. Decision Trees split the feature space into regions using boundaries that are:
A. Curved
B. Orthogonal to the feature axes
C. Diagonal
D. Circular
Correct Answer: Orthogonal to the feature axes
Explanation: Standard decision trees make splits based on single features (e.g., feature <= threshold), resulting in rectangular decision regions aligned with the axes.
30. The primary objective of a Support Vector Machine (SVM) is to find a hyperplane that:
A. Minimizes the number of support vectors.
B. Maximizes the margin between classes.
C. Passes through the mean of the data.
D. Separates data with zero error regardless of margin.
Correct Answer: Maximizes the margin between classes.
Explanation: SVM seeks the 'maximum margin hyperplane' to improve the generalization ability of the classifier.
31. What are Support Vectors in SVM?
A. The data points furthest from the decision boundary.
B. The data points closest to the decision boundary.
C. The misclassified data points.
D. The centroids of the classes.
Correct Answer: The data points closest to the decision boundary.
Explanation: Support vectors are the critical elements of the training set that lie on the margin boundaries; they essentially define the hyperplane.
32. Which technique allows SVM to perform non-linear classification?
A. Gradient Descent
B. The Kernel Trick
C. Bagging
D. Pruning
Correct Answer: The Kernel Trick
Explanation: The Kernel Trick maps input data into a higher-dimensional space where a linear separator can be found, without explicitly computing the coordinates.
33. In SVC (Support Vector Classifier), what does a high value of Gamma (γ) imply for an RBF kernel?
A. The decision boundary will be nearly linear.
B. Each training example has a wide-reaching influence.
C. The model fits the training data very closely (potential overfitting).
D. The margin becomes wider.
Correct Answer: The model fits the training data very closely (potential overfitting).
Explanation: A high gamma gives each training example a very short-range influence, so the decision boundary bends tightly around individual points and tends to capture noise.
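A hedged sketch (the two-moons toy dataset is an assumed example) of how a very large gamma widens the gap between training and test accuracy:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for gamma in (0.1, 1.0, 100.0):
    clf = SVC(kernel='rbf', gamma=gamma).fit(X_train, y_train)
    # A very high gamma tends to memorize the training set (train accuracy >> test accuracy).
    print(gamma, clf.score(X_train, y_train), clf.score(X_test, y_test))
```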
34. Which scikit-learn class is used for Support Vector Classification?
A. sklearn.svm.SVR
B. sklearn.svm.SVC
C. sklearn.linear_model.SGDClassifier
D. sklearn.tree.DecisionTreeClassifier
Correct Answer: sklearn.svm.SVC
Explanation: SVC stands for Support Vector Classification. SVR is for Regression.
35. The Naïve Bayes classifier is based on which statistical theorem?
A. Central Limit Theorem
B. Bayes' Theorem
C. Pythagorean Theorem
D. Gauss-Markov Theorem
Correct Answer: Bayes' Theorem
Explanation: Naïve Bayes applies Bayes' Theorem: P(y | x) = P(x | y) P(y) / P(x).
36. What is the "Naïve" assumption in Naïve Bayes?
A. All features are equally important.
B. All features are mutually independent given the class.
C. The data follows a normal distribution.
D. The classes are balanced.
Correct Answer: All features are mutually independent given the class.
Explanation: It assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature, simplifying the computation.
37. Which variant of Naïve Bayes is best suited for continuous data assuming a bell-curve distribution?
A. MultinomialNB
B. BernoulliNB
C. GaussianNB
D. ComplementNB
Correct Answer: GaussianNB
Explanation: Gaussian Naïve Bayes assumes that the likelihood of the features is Gaussian (normally distributed).
38. In Text Classification with word counts, which Naïve Bayes variant is typically used?
A. GaussianNB
B. MultinomialNB
C. LinearNB
D. LogisticNB
Correct Answer: MultinomialNB
Explanation: Multinomial Naïve Bayes is suitable for features that represent counts or discrete frequencies (like word counts in text).
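A minimal sketch (the four documents and their spam/ham labels are invented toy data) of the word-count pipeline; MultinomialNB's default alpha=1.0 is the Laplace smoothing discussed in the next question:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs   = ["free prize money now", "meeting schedule attached",
          "win money free", "project status meeting"]
labels = [1, 0, 1, 0]                    # 1 = spam, 0 = ham (toy labels)

vec = CountVectorizer()
X = vec.fit_transform(docs)              # sparse word-count matrix
clf = MultinomialNB(alpha=1.0).fit(X, labels)   # alpha=1.0 -> Laplace smoothing

print(clf.predict(vec.transform(["free money prize"])))   # expected: [1]
```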
39. What is Laplace Smoothing used for in Naïve Bayes?
A. To handle continuous variables.
B. To prevent zero probabilities for unseen features.
C. To reduce the number of features.
D. To normalize the dataset.
Correct Answer: To prevent zero probabilities for unseen features.
Explanation: If a feature value in the test set was not present in the training set for a class, the probability becomes 0. Smoothing adds a small count (usually 1) to avoid this.
40. Which of the following classifiers is a Generative Model?
A. Logistic Regression
B. Support Vector Machine
C. Naïve Bayes
D. Decision Tree
Correct Answer: Naïve Bayes
Explanation: Naïve Bayes models the joint probability P(x, y) (how the data is generated), whereas the others are Discriminative models that model P(y | x) directly.
41. To handle a multi-class classification problem with a binary classifier like Logistic Regression, which strategy is commonly used?
A. One-vs-Rest (OvR)
B. Gradient Boosting
C. Pruning
D. Kernel Trick
Correct Answer: One-vs-Rest (OvR)
Explanation: OvR trains one classifier per class (that class vs. all other classes) and selects the class with the highest confidence score.
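A short sketch (iris, three classes, with LogisticRegression as the assumed base estimator) using the explicit OneVsRestClassifier wrapper:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)    # 3 classes

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(len(ovr.estimators_))          # 3: one binary classifier per class
print(ovr.predict(X[:5]))            # the class with the highest confidence wins
```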
42. Which metric is calculated using the formula: 2·TP / (2·TP + FP + FN)?
A. Accuracy
B. F1-Score
C. Specificity
D. Matthews Correlation Coefficient
Correct Answer: F1-Score
Explanation: This is an algebraic rearrangement of the F1-Score formula (the harmonic mean of Precision and Recall).
43. If a Decision Tree is fully grown until all leaves are pure, it is likely to have:
A. High Bias
B. Low Variance
C. High Variance (Overfitting)
D. Low Accuracy on training data
Correct Answer: High Variance (Overfitting)
Explanation: A fully grown tree captures all the noise and specific patterns of the training data, leading to poor generalization (Overfitting/High Variance).
44. In the context of the Confusion Matrix, Specificity is also known as:
A. True Positive Rate
B. True Negative Rate
C. False Positive Rate
D. Precision
Correct Answer: True Negative Rate
Explanation: Specificity measures the proportion of actual negatives that are correctly identified: TN / (TN + FP).
45. Which scikit-learn utility is best used to split data into training and testing sets?
A. cross_val_score
B. train_test_split
C. GridSearchCV
D. StandardScaler
Correct Answer: train_test_split
Explanation: train_test_split from sklearn.model_selection is the standard function to shuffle and split arrays into training and testing subsets.
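A typical call (iris assumed; stratify and random_state are optional but common choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions identical in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

print(X_train.shape, X_test.shape)   # (112, 4) (38, 4)
```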
46. What happens to the decision boundary of a Logistic Regression model if the regularization parameter C is very small?
A. The model overfits.
B. The coefficients become large.
C. The model underfits (high bias).
D. The boundary becomes non-linear.
Correct Answer: The model underfits (high bias).
Explanation: A small C implies strong regularization, forcing weights to be small. This restricts model complexity, potentially leading to underfitting.
47. Which of the following algorithms does NOT produce a linear decision boundary (without kernels)?
A. Logistic Regression
B. Linear Perceptron
C. Linear SVM
D. k-Nearest Neighbors
Correct Answer: k-Nearest Neighbors
Explanation: k-NN produces complex, non-linear decision boundaries based on local neighborhoods, unlike the other linear methods.
48. In SVM, which kernel is defined as K(x, y) = (x·y + c)^d?
A. Linear Kernel
B. RBF Kernel
C. Polynomial Kernel
D. Sigmoid Kernel
Correct Answer: Polynomial Kernel
Explanation: This is the mathematical formulation for a Polynomial kernel of degree d.
49. What is the primary advantage of Naïve Bayes classifiers regarding training time?
A. They are very slow due to iterative optimization.
B. They are fast because they require a single pass over the data.
C. They are slow because they calculate distances between all points.
D. They depend on the number of support vectors.
Correct Answer: They are fast because they require a single pass over the data.
Explanation: Naïve Bayes is computationally efficient because it only requires calculating prior probabilities and conditional feature probabilities, which can be done in one pass.