Unit 3 - Practice Quiz

INT234 50 Questions

1 Which of the following algorithms is categorized as a 'Lazy Learner'?

A. K-Nearest Neighbors
B. Support Vector Machines
C. Naïve Bayes
D. Decision Trees

2 In K-Nearest Neighbors, what is the likely effect of choosing a very small value for 'k' (e.g., k=1)?

A. The model becomes too simple
B. The model ignores local patterns
C. High variance and overfitting
D. High bias and low variance
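
To see the k=1 effect empirically, here is a minimal sketch (assuming scikit-learn and a synthetic dataset, neither of which comes from the quiz itself) comparing a very small k against a larger one:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

for k in (1, 15):
    # k=1 effectively memorizes individual training points (high variance);
    # a larger k averages over more neighbors and smooths the decision boundary.
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k={k}: mean cross-validated accuracy = {scores.mean():.3f}")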

3 Which distance metric is most commonly used in k-NN for continuous numerical variables?

A. Euclidean distance
B. Cosine similarity
C. Hamming distance
D. Jaccard similarity
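
For reference, the Euclidean distance between two points x and y with n features is sqrt((x1 - y1)^2 + (x2 - y2)^2 + ... + (xn - yn)^2), i.e. the straight-line distance in feature space.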

4 Why is feature scaling (normalization/standardization) important in k-NN?

A. To convert categorical data to numerical
B. To increase the value of k
C. To prevent features with larger scales from dominating the distance calculation
D. To reduce the size of the dataset
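
A minimal sketch of the scaling effect, assuming scikit-learn and deliberately inflating one feature's scale (both are illustrative assumptions, not part of the quiz):

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X[:, 0] *= 1000  # one feature now dominates raw Euclidean distances

unscaled = cross_val_score(KNeighborsClassifier(), X, y, cv=5).mean()
scaled = cross_val_score(make_pipeline(StandardScaler(), KNeighborsClassifier()), X, y, cv=5).mean()
print(f"without scaling: {unscaled:.3f}  with scaling: {scaled:.3f}")

With the inflated feature, distances are driven almost entirely by that one column until StandardScaler puts all features on a comparable scale.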

5 The 'Naïve' in Naïve Bayes refers to which fundamental assumption?

A. All features are conditionally independent given the class
B. The prior probabilities are equal
C. The algorithm is simple to implement
D. All features are dependent on each other

6 Naïve Bayes is based on which mathematical theorem?

A. Central Limit Theorem
B. Taylor's Theorem
C. Bayes' Theorem
D. Pythagorean Theorem
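
For reference, Bayes' Theorem expresses the posterior in terms of the likelihood, the prior, and the evidence:

P(class | features) = P(features | class) × P(class) / P(features)

Naïve Bayes applies this with the independence assumption from question 5, so P(features | class) factorizes into a product of per-feature probabilities.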

7 What is the purpose of Laplace Smoothing in Naïve Bayes?

A. To handle the problem of zero probability for unseen features
B. To normalize the data
C. To reduce the dimensionality
D. To handle missing values
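
In its common add-one form, Laplace Smoothing replaces the raw frequency estimate with:

P(feature = v | class) = (count of v in class + 1) / (count of class + V)

where V is the number of distinct values the feature can take. A value never seen with a class then gets a small non-zero probability instead of zeroing out the whole product of likelihoods.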

8 Which strategy is primarily used to build Decision Trees?

A. Lazy learning
B. Divide and Conquer
C. Gradient Descent
D. Backpropagation

9 In a Decision Tree, what does a leaf node represent?

A. The entropy value
B. The root of the tree
C. A class label or decision
D. A feature to split on

10 Which metric is commonly used to measure impurity in Decision Trees?

A. R-squared
B. Gini Index
C. Euclidean distance
D. Correlation Coefficient

11 The process of removing branches from a decision tree to prevent overfitting is called:

A. Scaling
B. Pruning
C. Regularization
D. Boosting

12 Which concept represents the expected reduction in entropy caused by partitioning the examples according to an attribute?

A. Maximum Margin
B. Information Gain
C. Log Loss
D. Gini Impurity
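
A small worked sketch of entropy and information gain (the toy labels and candidate split below are made up purely for illustration):

import numpy as np

def entropy(labels):
    # H = -sum(p * log2(p)) over the class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

parent = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])  # 5 positives, 5 negatives -> entropy 1.0
left   = np.array([1, 1, 1, 1, 0])                 # one branch of a candidate split
right  = np.array([1, 0, 0, 0, 0])                 # the other branch

weighted_children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(parent)
gain = entropy(parent) - weighted_children
print(f"parent entropy = {entropy(parent):.3f}, information gain = {gain:.3f}")

Information gain is the drop from the parent's entropy to the weighted average entropy of the children; the attribute with the largest gain is chosen for the split.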

13 In rule-based classification, what does 'coverage' refer to?

A. The number of instances that satisfy the rule's condition
B. The number of features used
C. The complexity of the rule
D. The accuracy of the rule

14 The OneR (One Rule) algorithm generates rules based on:

A. The nearest neighbors
B. All attributes simultaneously
C. A random attribute
D. The single most informative attribute

15 What is the primary objective of a Support Vector Machine (SVM)?

A. Find a hyperplane that maximizes the margin between classes
B. Create the deepest possible decision tree
C. Maximize the posterior probability
D. Minimize the number of features

16 The data points that lie closest to the decision boundary in an SVM are known as:

A. Outliers
B. Support Vectors
C. Noise
D. Centroids

17 What technique does SVM use to handle non-linearly separable data?

A. Smoothing
B. Bagging
C. Pruning
D. Kernel Trick

18 In SVM, what is the role of the 'C' hyperparameter?

A. It controls the trade-off between maximizing the margin and minimizing classification errors
B. It determines the number of kernels
C. It calculates the Euclidean distance
D. It sets the depth of the tree
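
A minimal sketch of the trade-off, assuming scikit-learn and the synthetic make_moons dataset (both are assumptions for illustration):

from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)

for C in (0.01, 1.0, 100.0):
    # Small C tolerates margin violations (wider margin, simpler boundary);
    # large C punishes misclassified training points heavily (narrower margin, more variance).
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C))
    print(f"C={C}: mean cross-validated accuracy = {cross_val_score(model, X, y, cv=5).mean():.3f}")

The kernel="rbf" argument is the kernel trick from question 17 in action: the boundary is linear in the implicit feature space but non-linear in the original two dimensions.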

19 In a Confusion Matrix, what does 'False Positive' (Type I Error) represent?

A. Incorrectly predicting the positive class when it is actually negative
B. Correctly predicting the positive class
C. Incorrectly predicting the negative class when it is actually positive
D. Correctly predicting the negative class

20 Which formula correctly calculates Accuracy?

A. 2 × (Precision × Recall) / (Precision + Recall)
B. (TP + TN) / (TP + TN + FP + FN)
C. TP / (TP + FP)
D. TP / (TP + FN)
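
A small sketch tying questions 20-24 together; the TP/TN/FP/FN counts below are hypothetical numbers chosen only to exercise the formulas:

# Hypothetical confusion-matrix counts for a binary classifier.
TP, TN, FP, FN = 40, 45, 5, 10

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)   # of everything predicted positive, how much was right
recall    = TP / (TP + FN)   # of everything actually positive, how much was found (sensitivity)
f1        = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")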

21 Accuracy is often a misleading metric when:

A. The dataset is imbalanced
B. The model is a decision tree
C. The dataset is small
D. The dataset is perfectly balanced

22 Which metric represents the ratio of correctly predicted positive observations to the total predicted positives?

A. Accuracy
B. Precision
C. Recall
D. Specificity

23 Recall is also known as:

A. Specificity
B. Sensitivity
C. F1 Score
D. Precision

24 The F1 Score is the harmonic mean of which two metrics?

A. Precision and Recall
B. Accuracy and Error Rate
C. Sensitivity and Specificity
D. True Positive Rate and False Positive Rate

25 Which metric would be most important for a spam detection system where it is acceptable to miss some spam, but critical not to delete legitimate emails (high cost of False Positive)?

A. Precision
B. Log Loss
C. Recall
D. Sensitivity

26 Which metric would be most important for cancer detection where missing a positive case is dangerous (high cost of False Negative)?

A. Precision
B. Accuracy
C. Specificity
D. Recall

27 What does AUC stand for in the context of model evaluation?

A. Accuracy Under Classification
B. Area Under the Curve
C. Algorithm User Context
D. Average Unit Cost

28 The ROC curve plots which two metrics against each other?

A. Sensitivity vs Specificity
B. Precision vs Recall
C. Accuracy vs Loss
D. True Positive Rate vs False Positive Rate

29 An AUC score of 0.5 indicates:

A. A model with high precision
B. A model that predicts randomly
C. A perfect model
D. A model with zero error
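
A minimal sketch of why an AUC near 0.5 means "no better than chance", assuming scikit-learn and synthetic labels (both illustrative assumptions):

import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)

random_scores = rng.random(1000)                         # scores unrelated to the labels
informative_scores = y_true + rng.normal(0, 0.5, 1000)   # scores correlated with the labels

print("unrelated scores AUC:", round(roc_auc_score(y_true, random_scores), 2))       # close to 0.5
print("informative scores AUC:", round(roc_auc_score(y_true, informative_scores), 2))  # well above 0.5

The ROC curve behind this number plots the True Positive Rate against the False Positive Rate as the decision threshold is swept, which is the pairing asked about in question 28.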

30 Logarithmic Loss (Log Loss) penalizes a classifier based on:

A. The confidence of the predicted probabilities
B. The number of misclassifications only
C. The depth of the tree
D. The number of support vectors

31 What is the ideal value for Logarithmic Loss?

A. 0.5
B. 0
C. 100
D. 1
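
A short sketch of how Log Loss rewards well-calibrated confidence, assuming scikit-learn (the labels and probabilities are made-up illustrations):

from sklearn.metrics import log_loss

y_true = [1, 0, 1, 1]

confident_correct = [0.99, 0.01, 0.99, 0.99]  # confident and right -> loss close to the ideal 0
confident_wrong   = [0.01, 0.99, 0.01, 0.01]  # confident and wrong -> heavily penalized

print("confident, correct:", round(log_loss(y_true, confident_correct), 4))
print("confident, wrong:  ", round(log_loss(y_true, confident_wrong), 4))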

32 Which of the following is a disadvantage of Decision Trees?

A. Requires feature scaling
B. Difficult to interpret
C. Prone to overfitting if not pruned
D. Cannot handle categorical data

33 Which algorithm is generally considered a 'Black Box' model due to low interpretability?

A. Rules (RIPPER)
B. Linear Regression
C. Decision Trees
D. Support Vector Machines (with RBF kernel)

34 In Naïve Bayes, what is the 'Posterior Probability'?

A. The probability of the evidence regardless of class
B. The probability of the class given the evidence
C. The probability of the evidence given the class
D. The initial probability of the class

35 Which classification algorithm is parametric?

A. None of the above
B. Decision Trees
C. Naïve Bayes
D. K-Nearest Neighbors

36 What is the 'Hinge Loss' function associated with?

A. K-Means
B. Decision Trees
C. Logistic Regression
D. Support Vector Machines

37 What happens to the computational cost of k-NN during the prediction phase as the dataset size grows?

A. It increases significantly
B. It remains constant
C. It decreases
D. It becomes zero

38 Entropy in Information Theory is a measure of:

A. Accuracy
B. Disorder or Uncertainty
C. Distance
D. Margin width

39 RIPPER (Repeated Incremental Pruning to Produce Error Reduction) is an algorithm used for:

A. Clustering
B. Rule Induction
C. Regression
D. Dimensionality Reduction

40 Which of the following describes a 'False Negative' (Type II Error)?

A. Predicting Positive when actually Positive
B. Predicting Positive when actually Negative
C. Predicting Negative when actually Negative
D. Predicting Negative when actually Positive

41 If Precision = 1.0 and Recall = 1.0, what is the F1 Score?

A. 0.5
B. 0
C. 1.0
D. 2.0
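
Worked through the harmonic-mean formula from question 24: F1 = 2 × (1.0 × 1.0) / (1.0 + 1.0) = 2 / 2 = 1.0.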

42 Which evaluation metric calculates the proportion of actual negatives that are correctly identified?

A. Recall
B. Specificity
C. Sensitivity
D. Precision

43 In a decision tree, if a node contains only samples from a single class, its entropy is:

A. Infinite
B. 0.5
C. 0
D. 1
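
Worked out: with every sample in one class the proportion p = 1, and -(1 × log2(1)) = -(1 × 0) = 0, so a pure node has zero entropy.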

44 Which kernel is the default for non-linear SVMs in many libraries?

A. Polynomial
B. Radial Basis Function (RBF)
C. Sigmoid
D. Linear

45 Generative models like Naïve Bayes model:

A. The distribution of individual classes (Joint probability)
B. The error gradients
C. The boundary between classes directly
D. The distance between points

46 Recursive Partitioning is a technique synonymous with:

A. Building Decision Trees
B. Calculating Bayes probabilities
C. Optimizing SVM margins
D. Calculating k-NN distances

47 When interpreting a Confusion Matrix for a multi-class problem (e.g., 3 classes), the matrix dimensions are:

A. 2x2
B. 3x1
C. 1x3
D. 3x3

48 Which algorithm is most sensitive to outliers?

A. Naïve Bayes
B. Rules
C. K-Nearest Neighbors
D. Decision Trees

49 What is the relationship between Error Rate and Accuracy?

A. Error Rate = Accuracy / 2
B. Error Rate = 1 + Accuracy
C. Error Rate = Accuracy
D. Error Rate = 1 - Accuracy

50 The 'Zero Frequency' problem in Naïve Bayes is solved using:

A. Feature Scaling
B. Laplace Smoothing
C. Kernel Trick
D. Pruning