1 $Which of the following algorithms is categorized as a 'Lazy Learner'?$

A.

Decision Trees

B.

Naïve Bayes

C.

K-Nearest Neighbors

D.

Support Vector Machines

2 $In K-Nearest Neighbors, what is the likely effect of choosing a very small value for 'k' (e.g., k=1)?$

A.

High bias and low variance

B.

The model becomes too simple

C.

High variance and overfitting

D.

The model ignores local patterns

3 $Which distance metric is most commonly used in k-NN for continuous numerical variables?$

A.

Hamming distance

B.

Euclidean distance

C.

Jaccard similarity

D.

Cosine similarity

4 $Why is feature scaling (normalization/standardization) important in k-NN?$

A.

To increase the value of k

B.

To prevent features with larger scales from dominating the distance calculation

C.

To convert categorical data to numerical

D.

To reduce the size of the dataset
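
Worked note for question 4: the effect of feature scale on a k-NN distance can be seen in a small sketch. The data, the `euclidean`, `standardize`, and `nearest` helpers, and the choice of z-score standardization are all illustrative assumptions, not part of the quiz material.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def standardize(rows):
    """Z-score each column (population standard deviation)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(v - m) / s if s else 0.0 for v, m, s in zip(row, means, stds)]
            for row in rows]

def nearest(rows, query_index=0):
    """Index of the row closest to rows[query_index], excluding itself."""
    q = rows[query_index]
    others = [i for i in range(len(rows)) if i != query_index]
    return min(others, key=lambda i: euclidean(q, rows[i]))

# Hypothetical data: [age in years, income in dollars]
people = [[25, 50_000], [60, 51_000], [26, 54_000], [40, 90_000], [35, 20_000]]

raw_nearest = nearest(people)                   # income differences dominate
scaled_nearest = nearest(standardize(people))   # both features contribute
```

On the raw values the query's nearest neighbor is the 60-year-old with a similar income, because a 1,000-dollar gap outweighs a 35-year gap; after standardization the 26-year-old becomes the nearest neighbor.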

5 $The 'Naïve' in Naïve Bayes refers to which fundamental assumption?$

A.

The algorithm is simple to implement

B.

All features are dependent on each other

C.

All features are conditionally independent given the class

D.

The prior probabilities are equal

6 $Naïve Bayes is based on which mathematical theorem?$

A.

Pythagorean Theorem

B.

Central Limit Theorem

C.

Bayes' Theorem

D.

Taylor's Theorem

7 $What is the purpose of Laplace Smoothing in Naïve Bayes?$

A.

To handle missing values

B.

To handle the problem of zero probability for unseen features

C.

To normalize the data

D.

To reduce the dimensionality
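
Worked note for question 7: a minimal sketch of add-one (Laplace) smoothing for a categorical likelihood estimate. The function name `smoothed_likelihood`, the `alpha` parameter, and the counts are hypothetical and chosen only for illustration.

```python
def smoothed_likelihood(count, class_total, n_values, alpha=1.0):
    """Estimate P(value | class) with additive smoothing.

    count       -- times this feature value was seen with the class
    class_total -- total observations of the feature within the class
    n_values    -- number of distinct values the feature can take
    alpha       -- smoothing strength (alpha=1 is Laplace smoothing)
    """
    return (count + alpha) / (class_total + alpha * n_values)

# Hypothetical counts: a feature with 3 possible values, observed 40 times
# in one class; the first value never occurred with that class.
counts = [0, 10, 30]
total = sum(counts)

unsmoothed = [c / total for c in counts]
smoothed = [smoothed_likelihood(c, total, len(counts)) for c in counts]
```

Without smoothing the unseen value has probability 0, which would zero out the whole product of likelihoods; with smoothing it receives a small non-zero probability and the estimates still sum to 1.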

8 $Which strategy is primarily used to build Decision Trees?$

A.

Lazy learning

B.

Divide and Conquer

C.

Backpropagation

D.

Gradient Descent

9 $In a Decision Tree, what does a leaf node represent?$

A.

A feature to split on

B.

The root of the tree

C.

A class label or decision

D.

The entropy value

10 $Which metric is commonly used to measure impurity in Decision Trees?$

A.

Euclidean distance

B.

Gini Index

C.

Correlation Coefficient

D.

R-squared

11 $The process of removing branches from a decision tree to prevent overfitting is called:$

A.

Scaling

B.

Regularization

C.

Pruning

D.

Boosting

12 $Which concept represents the expected reduction in entropy caused by partitioning the examples according to an attribute?$

A.

Gini Impurity

B.

Information Gain

C.

Log Loss

D.

Maximum Margin
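
Worked note for question 12: a short sketch of entropy and of the entropy reduction obtained by a split. The helper names `entropy` and `information_gain` and the toy labels are assumptions made for this example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

# Hypothetical node with 5 "yes" and 5 "no" samples
parent = ["yes"] * 5 + ["no"] * 5

# A split that separates the classes completely
perfect_split = [["yes"] * 5, ["no"] * 5]
# A split whose children keep the same 50/50 class mix as the parent
useless_split = [["yes", "yes", "no", "no"], ["yes"] * 3 + ["no"] * 3]

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
```

In this toy case the perfect split removes all uncertainty (1 bit), while the second split leaves the class mix unchanged and removes none.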

13 $In rule-based classification, what does 'coverage' refer to?$

A.

The accuracy of the rule

B.

The number of instances that satisfy the rule's condition

C.

The complexity of the rule

D.

The number of features used

14 $The OneR (One Rule) algorithm generates rules based on:$

A.

All attributes simultaneously

B.

The single attribute whose rule has the lowest error rate

C.

A random attribute

D.

The nearest neighbors

15 $What is the primary objective of a Support Vector Machine (SVM)?$

A.

Minimize the number of features

B.

Maximize the posterior probability

C.

Find a hyperplane that maximizes the margin between classes

D.

Create the deepest possible decision tree

16 $The data points that lie closest to the decision boundary in an SVM are known as:$

A.

Outliers

B.

Support Vectors

C.

Centroids

D.

Noise

17 $What technique does SVM use to handle non-linearly separable data?$

A.

Pruning

B.

Kernel Trick

C.

Smoothing

D.

Bagging

18 $In SVM, what is the role of the 'C' hyperparameter?$

A.

It determines the number of kernels

B.

It controls the trade-off between maximizing the margin and minimizing classification errors

C.

It sets the depth of the tree

D.

It calculates the Euclidean distance

19 $In a Confusion Matrix, what does 'False Positive' (Type I Error) represent?$

A.

Correctly predicting the positive class

B.

Correctly predicting the negative class

C.

Incorrectly predicting the positive class when it is actually negative

D.

Incorrectly predicting the negative class when it is actually positive

20 $Which formula correctly calculates Accuracy?$

A.

(TP + TN) / (TP + TN + FP + FN)

B.

TP / (TP + FP)

C.

TP / (TP + FN)

D.

2 × (Precision × Recall) / (Precision + Recall)
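
Worked note for question 20: the four formulas listed in the options can be evaluated side by side from confusion-matrix counts. This is a sketch only; the function name `classification_metrics` and the counts are hypothetical, and zero denominators are not handled.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute common metrics from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for a binary classifier evaluated on 100 samples
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these counts accuracy is 0.85, precision is 40/45, recall is 0.80, and the F1 score is 16/19, which lies between precision and recall.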

21 $Accuracy is often a misleading metric when:$

A.

The dataset is small

B.

The dataset is imbalanced

C.

The dataset is perfectly balanced

D.

The model is a decision tree

22 $Which metric represents the ratio of correctly predicted positive observations to the total predicted positives?$

A.

Recall

B.

Precision

C.

Accuracy

D.

Specificity

23 $Recall is also known as:$

A.

Precision

B.

Specificity

C.

Sensitivity

D.

F1 Score

24 $The F1 Score is the harmonic mean of which two metrics?$

A.

Accuracy and Error Rate

B.

Precision and Recall

C.

Sensitivity and Specificity

D.

True Positive Rate and False Positive Rate

25 $Which metric would be most important for a spam detection system where it is acceptable to miss some spam, but critical not to delete legitimate emails (high cost of False Positive)?$

A.

Recall

B.

Precision

C.

Log Loss

D.

Sensitivity

26 $Which metric would be most important for cancer detection where missing a positive case is dangerous (high cost of False Negative)?$

A.

Precision

B.

Recall

C.

Specificity

D.

Accuracy

27 $What does AUC stand for in the context of model evaluation?$

A.

Area Under the Curve

B.

Average Unit Cost

C.

Algorithm User Context

D.

Accuracy Under Classification

28 $The ROC curve plots which two metrics against each other?$

A.

Precision vs Recall

B.

True Positive Rate vs False Positive Rate

C.

Accuracy vs Loss

D.

Sensitivity vs Specificity

29 $An AUC score of 0.5 indicates:$

A.

A perfect model

B.

A model that predicts randomly

C.

A model with high precision

D.

A model with zero error

30 $Logarithmic Loss (Log Loss) penalizes a classifier based on:$

A.

The number of misclassifications only

B.

The confidence of the predicted probabilities

C.

The depth of the tree

D.

The number of support vectors
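
Worked note for question 30: a minimal sketch of binary log loss, comparing two sets of predicted probabilities that make the same mistake with different confidence. The function name `log_loss`, the clipping constant `eps`, and the example probabilities are assumptions for illustration.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; y_prob holds predicted P(class = 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

y_true = [1, 0, 1]

# Both classifiers get the third sample wrong at a 0.5 threshold,
# but the second one is far more confident about its wrong answer.
cautious = log_loss(y_true, [0.6, 0.4, 0.4])
overconfident = log_loss(y_true, [0.99, 0.01, 0.01])
```

Both prediction sets misclassify exactly one sample, yet the overconfident one receives a much larger loss, and predictions that are both correct and fully confident drive the loss toward 0.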

31 $What is the ideal value for Logarithmic Loss?$

A.

0

B.

0.5

C.

1

D.

100

32 $Which of the following is a disadvantage of Decision Trees?$

A.

Difficult to interpret

B.

Requires feature scaling

C.

Prone to overfitting if not pruned

D.

Cannot handle categorical data

33 $Which algorithm is generally considered a 'Black Box' model due to low interpretability?$

A.

Decision Trees

B.

Rules (RIPPER)

C.

Support Vector Machines (with RBF kernel)

D.

Linear Regression

34 $In Naïve Bayes, what is the 'Posterior Probability'?$

A.

The probability of the evidence given the class

B.

The initial probability of the class

C.

The probability of the class given the evidence

D.

The probability of the evidence regardless of class

35 $Which classification algorithm is parametric?$

A.

K-Nearest Neighbors

B.

Decision Trees

C.

Naïve Bayes

D.

None of the above

36 $What is the 'Hinge Loss' function associated with?$

A.

Logistic Regression

B.

Decision Trees

C.

Support Vector Machines

D.

K-Means

37 $What happens to the computational cost of k-NN during the prediction phase as the dataset size grows?$

A.

It remains constant

B.

It decreases

C.

It increases significantly

D.

It becomes zero

38 $Entropy in Information Theory is a measure of:$

A.

Distance

B.

Disorder or Uncertainty

C.

Accuracy

D.

Margin width

39 $RIPPER (Repeated Incremental Pruning to Produce Error Reduction) is an algorithm used for:$

A.

Rule Induction

B.

Clustering

C.

Regression

D.

Dimensionality Reduction

40 $Which of the following describes a 'False Negative' (Type II Error)?$

A.

Predicting Positive when actually Positive

B.

Predicting Negative when actually Negative

C.

Predicting Positive when actually Negative

D.

Predicting Negative when actually Positive

41 $If Precision = 1.0 and Recall = 1.0, what is the F1 Score?$

A.

0

B.

0.5

C.

1.0

D.

2.0

42 $Which evaluation metric calculates the proportion of actual negatives that are correctly identified?$

A.

Sensitivity

B.

Recall

C.

Specificity

D.

Precision

43 $In a decision tree, if a node contains only samples from a single class, its entropy is:$

A.

0

B.

0.5

C.

1

D.

Infinite

44 $Which kernel is the default for non-linear SVMs in many libraries?$

A.

Linear

B.

Polynomial

C.

Radial Basis Function (RBF)

D.

Sigmoid

45 $Generative models like Naïve Bayes model:$

A.

The boundary between classes directly

B.

The joint probability distribution of the features and the class

C.

The distance between points

D.

The error gradients

46 $Recursive Partitioning is a technique synonymous with:$

A.

Building Decision Trees

B.

Calculating k-NN distances

C.

Optimizing SVM margins

D.

Calculating Bayes probabilities

47 $When interpreting a Confusion Matrix for a multi-class problem (e.g., 3 classes), the matrix dimensions are:$

A.

2x2

B.

3x3

C.

1x3

D.

3x1

48 $Which algorithm is most sensitive to outliers?$

A.

Decision Trees

B.

K-Nearest Neighbors

C.

Naïve Bayes

D.

Rules

49 $What is the relationship between Error Rate and Accuracy?$

A.

Error Rate = Accuracy

B.

Error Rate = 1 - Accuracy

C.

Error Rate = 1 + Accuracy

D.

Error Rate = Accuracy / 2

50 $The 'Zero Frequency' problem in Naïve Bayes is solved using:$

A.

Pruning

B.

Laplace Smoothing

C.

Feature Scaling

D.

Kernel Trick

Unit 3 - Practice Quiz