Unit 5 - Practice Quiz

INT255 60 Questions

1 What is the primary geometric goal of a Support Vector Machine (SVM) for classification?

Geometric interpretation of classification margins Easy
A. To find the line that passes through the most data points
B. To maximize the margin between classes
C. To connect all data points of the same class
D. To minimize the number of support vectors

2 In the context of SVM, what are "support vectors"?

Geometric interpretation of classification margins Easy
A. The data points that are misclassified
B. All the data points in the training set
C. The data points that lie on or closest to the margin boundaries
D. The data points that are furthest from the decision boundary

3 What is the decision boundary created by a linear SVM called?

Geometric interpretation of classification margins Easy
A. A centroid
B. A hyperplane
C. A regression line
D. A decision tree

4 A Hard Margin SVM is suitable only when the training data is...

Hard margin and soft margin SVM Easy
A. Perfectly linearly separable
B. Clustered into a single group
C. Very large
D. Not linearly separable

5 What is the main advantage of a Soft Margin SVM over a Hard Margin SVM?

Hard margin and soft margin SVM Easy
A. It always finds a wider margin
B. It can handle data that is not linearly separable and is more robust to outliers
C. It only works for non-linear data
D. It is computationally faster

6 In a Soft Margin SVM, what is the role of the slack variable ?

Hard margin and soft margin SVM Easy
A. It defines the width of the margin
B. It is the weight vector of the hyperplane
C. It measures how much a data point violates the margin
D. It is a random noise parameter

7 What does the hyperparameter C control in a Soft Margin SVM?

Hard margin and soft margin SVM Easy
A. The learning rate of the optimizer
B. The type of kernel to be used
C. The number of dimensions in the feature space
D. The trade-off between maximizing the margin and minimizing classification errors
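As an illustrative aside (not part of the quiz), the trade-off controlled by C in questions 5–7 can be made concrete with a minimal pure-Python sketch: evaluating the soft-margin objective ½‖w‖² + C·Σᵢ max(0, 1 − yᵢ(wᵀxᵢ + b)), where each hinge term plays the role of a slack variable ξᵢ. The toy data and candidate separator below are invented for the example.

```python
def hinge_objective(w, b, X, y, C):
    """Soft-margin SVM objective: 0.5*||w||^2 + C * sum of hinge losses.

    Each hinge term max(0, 1 - y_i*(w.x_i + b)) equals the slack
    variable xi_i at the optimum.
    """
    reg = 0.5 * sum(wj * wj for wj in w)
    slack = sum(max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
                for xi, yi in zip(X, y))
    return reg + C * slack

# Toy 1-feature data with one margin violator at x = 0.5 (label +1).
X = [[2.0], [-2.0], [0.5]]
y = [1, -1, 1]
w, b = [1.0], 0.0   # candidate separator

# A larger C weights the same margin violation more heavily,
# pushing the optimizer toward hard-margin behavior.
low_C = hinge_objective(w, b, X, y, C=0.1)
high_C = hinge_objective(w, b, X, y, C=10.0)
```

With this fixed separator, only the violator at x = 0.5 contributes slack (ξ = 0.5), so the objective grows linearly with C.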

8 What is the main purpose of using the Lagrangian formulation in the context of SVM optimization?

Lagrangian formulation Easy
A. To visualize the data in 2D
B. To select the best kernel function automatically
C. To convert a constrained optimization problem into a form that is easier to solve
D. To increase the number of features

9 In the Lagrangian formulation of SVM, what are the variables called?

Lagrangian formulation Easy
A. Slack variables
B. Lagrange multipliers
C. Bias terms
D. Weight vectors

10 According to the Karush-Kuhn-Tucker (KKT) conditions for SVM, if a data point is NOT a support vector, its corresponding Lagrange multiplier will be:

Lagrangian formulation Easy
A. Exactly zero (αᵢ = 0)
B. Strictly positive (αᵢ > 0)
C. Equal to the penalty parameter (αᵢ = C)
D. Negative (αᵢ < 0)

11 The primal optimization problem for a hard-margin SVM aims to minimize which quantity?

Primal and dual optimization problems Easy
A. The number of misclassified points
B. The bias term, b
C. The squared norm of the weight vector, ½‖w‖²
D. The sum of the distances from the margin

12 A primary motivation for solving the dual problem instead of the primal problem in SVMs is that it enables the use of:

Primal and dual optimization problems Easy
A. Gradient descent
B. Feature scaling
C. Regularization
D. The kernel trick

13 What is the fundamental idea behind the "kernel trick"?

Kernel trick and kernel functions Easy
A. To reduce the dimensionality of the data before classification
B. To convert a classification problem into a regression problem
C. To compute dot products in a high-dimensional feature space without explicitly transforming the data
D. To randomly guess the support vectors to speed up training
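The kernel trick in question 13 can be checked numerically; as an illustrative aside (inputs and feature map chosen for the example), the degree-2 polynomial kernel (xᵀz + 1)² equals the ordinary dot product of explicit 6-dimensional feature maps, so the kernel reproduces the high-dimensional dot product without ever constructing φ(x).

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel K(x, z) = (x.z + 1)^2 for 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1] + 1.0) ** 2

def phi(x):
    """Explicit 6-D feature map whose dot product reproduces poly_kernel."""
    r2 = math.sqrt(2.0)
    return [x[0] ** 2, x[1] ** 2, r2 * x[0] * x[1], r2 * x[0], r2 * x[1], 1.0]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x, z = [1.0, 2.0], [3.0, 0.5]
k_trick = poly_kernel(x, z)       # works in the original 2-D space
k_explicit = dot(phi(x), phi(z))  # same value via the mapped 6-D space
```

Both routes give the same number, which is exactly why the dual formulation (where data appear only through dot products) admits the trick.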

14 Which of the following is a widely used kernel function in SVMs for handling non-linear data?

Kernel trick and kernel functions Easy
A. Radial Basis Function (RBF) kernel
B. Mean Squared Error (MSE) kernel
C. Stochastic Gradient Descent (SGD) kernel
D. Cross-Entropy kernel

15 Using a linear kernel in an SVM is equivalent to...

Kernel trick and kernel functions Easy
A. Always misclassifying half the data
B. Using a very complex RBF kernel
C. Applying no non-linear transformation and finding a linear separator in the original feature space
D. Projecting the data into an infinite-dimensional space

16 When is it most appropriate to use a non-linear kernel like the Polynomial or RBF kernel?

Kernel trick and kernel functions Easy
A. When you have a very small number of features
B. When the decision boundary between the classes is likely non-linear
C. When the data is perfectly linearly separable
D. When you want the fastest possible training time

17 The task of training an SVM is fundamentally what type of mathematical problem?

Optimization perspective of SVM training Easy
A. A convex quadratic programming problem
B. A non-convex optimization problem
C. A linear programming problem
D. A system of linear equations

18 The objective function for a hard-margin SVM is to minimize ½‖w‖². This is equivalent to maximizing what geometric quantity?

Optimization perspective of SVM training Easy
A. The angle between the support vectors
B. The distance to the origin
C. The margin, which is proportional to 1/‖w‖
D. The number of support vectors
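The margin/norm relationship in question 18 can be sketched in a few lines (an illustrative aside; the weight vectors are made up): for a canonical hyperplane the margin width is 2/‖w‖, so doubling ‖w‖ halves the margin, which is why minimizing ‖w‖ maximizes it.

```python
import math

def margin_width(w):
    """Width of the margin for a canonical hyperplane w.x + b = 0: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))

w_unit = [0.6, 0.8]    # ||w|| = 1 -> margin width 2
w_double = [1.2, 1.6]  # ||w|| = 2 -> margin width 1
```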

19 In the dual formulation of SVM, the final decision function for a new data point x depends on...

Primal and dual optimization problems Easy
A. The dot product of x with only the support vectors
B. The average of all feature vectors
C. All the data points in the training set
D. Only the bias term

20 For a 2D dataset, a linear SVM's margin is visually represented by the region between two...

Geometric interpretation of classification margins Easy
A. Points
B. Concentric squares
C. Circles
D. Parallel lines

21 In a linearly separable dataset, if we scale all feature vectors by a factor of 2 (i.e., xᵢ → 2xᵢ), how does the maximal geometric margin of a hard-margin SVM change?

Geometric interpretation of classification margins Medium
A. It is halved.
B. It is squared.
C. It remains unchanged.
D. It is doubled.

22 In a soft-margin SVM, what is the effect of choosing a very large value for the hyperparameter C?

Hard margin and soft margin SVM Medium
A. It reduces the number of support vectors to zero.
B. It leads to a narrower margin and penalizes margin violations more heavily, behaving more like a hard-margin SVM.
C. It makes the decision boundary completely linear, regardless of the kernel used.
D. It leads to a wider margin and allows more margin violations.

23 What is the primary motivation for solving the dual optimization problem of an SVM instead of the primal problem?

Primal and dual optimization problems Medium
A. The primal problem is not a convex optimization problem, while the dual is.
B. The dual formulation allows the use of the kernel trick to handle non-linearly separable data.
C. The dual problem's objective function is simpler to differentiate.
D. The dual problem always has fewer constraints than the primal.

24 In the context of the SVM dual problem, the Karush-Kuhn-Tucker (KKT) conditions imply that for a data point that is NOT a support vector, its corresponding Lagrange multiplier must be:

Lagrangian formulation Medium
A. αᵢ = 0
B. 0 < αᵢ < C
C. αᵢ = C
D. αᵢ > C

25 Consider a polynomial kernel K(x, z) = (xᵀz + c)ᵈ. What does the parameter d control?

Kernel trick and kernel functions Medium
A. The width of the margin.
B. The radial influence of a single training example.
C. The degree of the polynomial in the higher-dimensional feature space, influencing the complexity of the decision boundary.
D. The penalty for misclassification.

26 The primal optimization problem for a hard-margin SVM is to minimize ½‖w‖² subject to yᵢ(wᵀxᵢ + b) ≥ 1 for all i. This type of problem is best classified as:

Optimization perspective of SVM training Medium
A. Linear Programming (LP)
B. Integer Programming (IP)
C. Unconstrained Optimization
D. Quadratic Programming (QP)

27 Which of the following statements correctly describes the support vectors in a hard-margin linear SVM?

Geometric interpretation of classification margins Medium
A. They are the data points that lie exactly on the margin boundaries.
B. They are the data points that are misclassified by the hyperplane.
C. They are all the data points in the training set.
D. They are the data points furthest away from the decision boundary.

28 In a soft-margin SVM, a data point has a corresponding slack variable ξᵢ > 1. What can you conclude about this point?

Hard margin and soft margin SVM Medium
A. The point lies on the correct side of the hyperplane but inside the margin.
B. The point is misclassified (on the wrong side of the hyperplane).
C. The point lies exactly on the decision boundary.
D. The point is correctly classified and outside the margin.

29 The objective function of the SVM dual problem is to maximize Σᵢ αᵢ − ½ Σᵢ Σⱼ αᵢ αⱼ yᵢ yⱼ xᵢᵀxⱼ. What do the variables αᵢ represent?

Lagrangian formulation Medium
A. The bias term of the hyperplane.
B. The slack variables for each data point.
C. The components of the weight vector w.
D. The Lagrange multipliers associated with the margin constraints.

30 The Radial Basis Function (RBF) kernel is given by K(x, z) = exp(−γ‖x − z‖²). What is the effect of a very small γ value?

Kernel trick and kernel functions Medium
A. It forces all data points to become support vectors.
B. It makes the decision boundary smoother and less complex, behaving like a linear classifier.
C. It creates a very complex, high-variance decision boundary that overfits the data.
D. It has no effect on the model's complexity.
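The γ behaviour asked about in question 30 is easy to see numerically; as an illustrative aside (points and γ values chosen for the example), a small γ keeps K(x, z) close to 1 even for distant points (a smooth, nearly linear similarity), while a large γ drives it toward 0 (each support vector's influence becomes extremely localized).

```python
import math

def rbf(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xj - zj) ** 2 for xj, zj in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = [0.0, 0.0], [3.0, 4.0]          # squared distance 25
near_linear = rbf(x, z, gamma=0.001)   # small gamma: similarity stays high
localized = rbf(x, z, gamma=10.0)      # large gamma: similarity collapses
```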

31 In the SVM dual formulation, the weight vector w can be expressed as a linear combination of which data points?

Primal and dual optimization problems Medium
A. Only the data points that are misclassified.
B. Only the data points that are not support vectors.
C. Only the support vectors.
D. All data points in the training set.

32 You train a soft-margin SVM and find that the optimal solution has many support vectors with αᵢ = C. What does this imply about your choice of C?

Hard margin and soft margin SVM Medium
A. The choice of C is irrelevant to the number of support vectors.
B. The value of C is appropriately chosen for this dataset.
C. C might be too small, allowing for a soft margin that misclassifies or violates the margin for many points.
D. C is too large, causing the model to overfit.

33 Maximizing the geometric margin in a hard-margin SVM is equivalent to minimizing which of the following expressions?

Geometric interpretation of classification margins Medium
A. ‖w‖² (equivalently ½‖w‖²)
B. 1/‖w‖
C. wᵀx + b
D. Σᵢ ξᵢ

34 Which of the following is NOT a valid Mercer kernel (i.e., cannot be used as a kernel function in an SVM)?

Kernel trick and kernel functions Medium
A. K(x, z) = xᵀz
B. K(x, z) = tanh(a·xᵀz + c) for some values of c
C. K(x, z) = exp(−γ‖x − z‖²)
D. K(x, z) = (xᵀz + 1)²

35 If an SVM is trained on n data points with d features, and n ≫ d, which formulation is generally more computationally efficient to solve?

Primal and dual optimization problems Medium
A. The dual problem.
B. Neither can be solved efficiently in this case.
C. The primal problem.
D. Both have the same computational complexity.

36 The decision function for a kernel SVM is given by f(x) = sign(Σᵢ∈SV αᵢ yᵢ K(xᵢ, x) + b). Why is this function efficient to evaluate for a new point x even in a very high-dimensional feature space?

Optimization perspective of SVM training Medium
A. Because the number of support vectors (SV) is typically much smaller than the total number of training points.
B. Because the kernel function simplifies to a linear operation.
C. Because the Lagrange multipliers are always equal to 1.
D. Because the bias term is always zero.

37 In the soft-margin SVM Lagrangian, the term C Σᵢ ξᵢ is added to the objective function. What is the role of this term?

Lagrangian formulation Medium
A. It forces the weight vector to have a smaller magnitude.
B. It normalizes the input features.
C. It acts as a penalty term to minimize the sum of slack variables, thereby reducing classification errors and margin violations.
D. It ensures the margin is as wide as possible.

38 You are working with text data represented by high-dimensional but sparse TF-IDF vectors. Which kernel is often a good starting choice for an SVM classifier in this scenario?

Kernel trick and kernel functions Medium
A. Sigmoid kernel
B. Radial Basis Function (RBF) kernel
C. Linear kernel
D. Polynomial kernel of a high degree

39 For a non-linearly separable dataset, which of the following statements is true?

Hard margin and soft margin SVM Medium
A. A hard-margin SVM will find a solution by ignoring the outliers.
B. A hard-margin SVM will find the optimal non-linear boundary.
C. A hard-margin SVM has no feasible solution.
D. A soft-margin SVM will perform identically to a hard-margin SVM.

40 The strong duality principle holds for the SVM optimization problem. What does this imply?

Primal and dual optimization problems Medium
A. The optimal value of the primal objective function is equal to the optimal value of the dual objective function.
B. The number of primal variables is equal to the number of dual variables.
C. The dual problem is always easier to solve than the primal problem.
D. The solution to the primal problem is always zero.

41 Consider a hard-margin SVM trained on a linearly separable dataset. If every feature vector xᵢ is transformed to x′ᵢ = Dxᵢ, where D is a non-singular diagonal matrix whose diagonal entries are not all equal, how does this non-uniform scaling affect the geometric margin and the set of support vectors?

Geometric interpretation of classification margins Hard
A. The geometric margin will change, but the set of support vectors will remain unchanged.
B. The geometric margin will remain unchanged, but the set of support vectors may change.
C. Both the geometric margin and the set of support vectors are guaranteed to remain unchanged.
D. The geometric margin will change, and the set of support vectors may also change.

42 In a soft-margin SVM, the objective is to minimize ½‖w‖² + C Σᵢ ξᵢ. What is the precise consequence of setting the hyperparameter C to a very large value (i.e., C → ∞) for a dataset that is not linearly separable?

Hard margin and soft margin SVM Hard
A. The optimization will result in a weight vector that is close to zero.
B. The model converges to the hard-margin SVM solution.
C. The optimization problem becomes infeasible.
D. The decision boundary will have a very small margin and will be highly sensitive to individual data points.

43 The dual formulation of the SVM is often preferred over the primal. Under which scenario is solving the primal problem using methods like stochastic gradient descent on the hinge loss formulation computationally more advantageous than solving the dual?

Primal and dual optimization problems Hard
A. When the number of features (d) is much larger than the number of training samples (n), and a complex non-linear kernel is used.
B. When the number of training samples (n) is much larger than the number of features (d), and a linear kernel is used.
C. The dual is always computationally superior to the primal when using a kernel.
D. When the Gram matrix is sparse.

44 Let K₁ and K₂ be two valid Mercer kernels. Which of the following operations is NOT guaranteed to produce a valid Mercer kernel?

Kernel trick and kernel functions Hard
A. K₁(x, z) + K₂(x, z)
B. K₁(x, z) · K₂(x, z)
C. c·K₁(x, z) for a constant c
D. p(K₁(x, z)), where p is a polynomial with non-negative coefficients.

45 In the dual formulation of a soft-margin SVM, consider the Karush-Kuhn-Tucker (KKT) conditions. If for a particular data point xᵢ, its corresponding Lagrange multiplier αᵢ is found to be exactly equal to the hyperparameter C (i.e., αᵢ = C), what can be definitively concluded about this point?

Lagrangian formulation Hard
A. The point is a support vector that is either inside the margin or is misclassified, with slack variable ξᵢ > 0.
B. The point is not a support vector and is correctly classified.
C. The point is a support vector that lies exactly on the margin.
D. The point is an outlier that has been ignored by the model.

46 The SVM optimization problem is a Quadratic Programming (QP) problem. What is the primary implication of this for the uniqueness of the solution?

Optimization perspective of SVM training Hard
A. The uniqueness of the solution depends entirely on the choice of the QP solver.
B. The solution is never unique, as multiple hyperplanes can achieve the same margin.
C. If a solution exists, the value of the objective function (the margin) is unique, and if the objective is strictly convex, the optimal weight vector is also unique.
D. A unique solution for both the weight vector and bias is always guaranteed.

47 In a hard-margin linear SVM, the margin is given by 2/‖w‖. How does the dimensionality of the feature space theoretically affect the maximum possible margin for a given dataset of n points?

Geometric interpretation of classification margins Hard
A. Higher dimensionality always decreases the maximum possible margin due to the curse of dimensionality.
B. The margin is only dependent on the number of support vectors, not the dimensionality.
C. The dimensionality of the feature space has no theoretical relationship with the maximum possible margin.
D. Higher dimensionality generally allows for a larger maximum margin, as it provides more degrees of freedom to find a separating hyperplane.

48 A soft-margin SVM with a non-linear kernel is trained on a dataset. If you remove a data point which is correctly classified and lies strictly outside the margin (i.e., yᵢ(wᵀxᵢ + b) > 1), what is the most likely outcome upon retraining the SVM with the same hyperparameters?

Hard margin and soft margin SVM Hard
A. The model will now overfit the remaining data.
B. The decision boundary will remain exactly the same.
C. The decision boundary will change significantly.
D. The margin will decrease.

49 What is the primary reason that the kernel trick can be applied to the dual formulation of the SVM but not directly to the primal formulation?

Primal and dual optimization problems Hard
A. The dual objective function and the decision rule depend on the data only through dot products of feature vectors, whereas the primal depends on the feature vectors themselves.
B. The primal formulation does not involve a bias term b.
C. The primal problem is non-convex, while the dual is convex.
D. The dual problem has fewer constraints than the primal problem.

50 Consider the Radial Basis Function (RBF) kernel with parameter γ. What happens to the decision boundary of an SVM as γ → 0?

Kernel trick and kernel functions Hard
A. The SVM fails to find any support vectors.
B. The decision boundary approaches a linear hyperplane.
C. The decision boundary becomes highly complex and overfits the data.
D. The influence of each support vector becomes extremely localized.

51 The Lagrangian for the hard-margin SVM primal problem is L(w, b, α) = ½‖w‖² − Σᵢ αᵢ[yᵢ(wᵀxᵢ + b) − 1]. What is the interpretation of the stationarity condition ∂L/∂w = 0?

Lagrangian formulation Hard
A. It determines the value of the optimal bias term .
B. It shows that the optimal weight vector must be a linear combination of the feature vectors of the support vectors.
C. It establishes the constraint in the dual problem.
D. It proves that the optimization problem is convex.

52 Consider the unconstrained hinge loss formulation of a linear SVM: minimize over w the quantity λ‖w‖₂² + Σᵢ max(0, 1 − yᵢ wᵀxᵢ). How does the solution change if the regularization term is changed from the L2-norm squared (‖w‖₂²) to the L1-norm (‖w‖₁)?

Hard margin and soft margin SVM Hard
A. The problem becomes non-convex and difficult to solve.
B. The margin is no longer maximized, and the model focuses only on minimizing classification errors.
C. The solution remains identical, as both are convex regularizers.
D. The optimization problem is no longer a QP problem, and the resulting weight vector is likely to be sparse (have many zero components).

53 Mercer's theorem provides the conditions for a function K(x, z) to be a valid kernel. It states that K must be a continuous, symmetric function such that the Gram matrix K with entries Kᵢⱼ = K(xᵢ, xⱼ) is positive semi-definite for any finite set of points {x₁, …, xₙ}. What does positive semi-definite imply in this context?

Kernel trick and kernel functions Hard
A. The kernel function corresponds to a dot product in a finite-dimensional space.
B. For any non-zero vector c, the quadratic form cᵀKc ≥ 0.
C. The determinant of the Gram matrix must be strictly positive.
D. All entries of the Gram matrix must be non-negative.
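The positive semi-definiteness condition in question 53 can be probed directly; in this illustrative sketch (points and test vectors are made up), the Gram matrix of the linear kernel satisfies cᵀKc = ‖Σᵢ cᵢxᵢ‖² ≥ 0 for every vector c, which is exactly the Mercer requirement.

```python
def linear(a, b):
    """Linear kernel: plain dot product."""
    return sum(ai * bi for ai, bi in zip(a, b))

def gram(X, k):
    """Gram matrix K_ij = k(x_i, x_j) for a finite point set."""
    return [[k(a, b) for b in X] for a in X]

def quad_form(K, c):
    """c^T K c; must be >= 0 for every c if k is a Mercer kernel."""
    n = len(K)
    return sum(c[i] * K[i][j] * c[j] for i in range(n) for j in range(n))

X = [[1.0, 0.0], [0.5, 0.5], [-1.0, 2.0]]
K = gram(X, linear)

# For the linear kernel, c^T K c equals ||sum_i c_i x_i||^2, hence >= 0.
vals = [quad_form(K, c) for c in ([1, -1, 1], [2, 0, -1], [-1, 3, 0.5])]
```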

54 If you add a new data point to a perfectly linearly separable dataset, under which condition is the hard-margin SVM decision boundary guaranteed to NOT change?

Geometric interpretation of classification margins Hard
A. If the new point is correctly classified and lies outside the existing margin.
B. If the new point is from the positive class.
C. If the new point lies exactly on the decision boundary.
D. The decision boundary will always change when a new data point is added.

55 In the context of SVMs, what is the 'duality gap' and what does it mean if it is zero?

Primal and dual optimization problems Hard
A. It is the difference between the primal objective value and the dual objective value. A zero gap (strong duality) means the optimal solutions to both problems are equivalent.
B. It is the difference in performance between a linear SVM and a kernelized SVM.
C. It is the number of misclassified points in a soft-margin SVM.
D. It is the geometric distance between the two margin boundaries.

56 In the soft-margin SVM, what is the role of the Lagrange multipliers μᵢ associated with the constraints ξᵢ ≥ 0?

Lagrangian formulation Hard
A. They enforce the relationship C − αᵢ − μᵢ = 0, which leads to the box constraint 0 ≤ αᵢ ≤ C in the dual.
B. They are the weights of the support vectors in the final decision function.
C. They directly measure the geometric margin of the classifier.
D. They are hyperparameters that need to be tuned using cross-validation.

57 The Sequential Minimal Optimization (SMO) algorithm iteratively picks pairs of Lagrange multipliers to optimize. What is a common heuristic for choosing the first multiplier, α₁, of the pair?

Optimization perspective of SVM training Hard
A. Choose the αᵢ with the largest current value.
B. Choose an αᵢ at random.
C. Choose an αᵢ corresponding to the point furthest from the current decision boundary.
D. Choose an αᵢ corresponding to a data point that most violates the KKT conditions.

58 Which of the following functions, where x, z ∈ ℝᵈ, is NOT a valid Mercer kernel?

Kernel trick and kernel functions Hard
A. K(x, z) = xᵀz
B. K(x, z) = (xᵀz)ᵈ for integer d ≥ 1
C. K(x, z) = −‖x − z‖²
D. K(x, z) = Σⱼ min(xⱼ, zⱼ) for non-negative vectors

59 Consider training a soft-margin linear SVM. If the dataset is perfectly linearly separable with a large margin, and you choose a very small value for the hyperparameter C (e.g., C → 0), what is the likely outcome?

Hard margin and soft margin SVM Hard
A. All data points will become support vectors.
B. The model will be identical to the hard-margin SVM because the data is separable.
C. The optimization will fail because is too small.
D. The model may produce a decision boundary with a wider margin than the hard-margin SVM, potentially misclassifying some points even though a perfect separation is possible.

60 After solving the dual SVM problem and obtaining the optimal Lagrange multipliers αᵢ*, the weight vector is constructed as w = Σᵢ αᵢ* yᵢ xᵢ. What happens to the norm of this weight vector, ‖w‖, as the regularization parameter C increases?

Primal and dual optimization problems Hard
A. It oscillates unpredictably.
B. It generally increases or stays the same.
C. It is independent of .
D. It generally decreases.