1. What is the primary geometric goal of a Support Vector Machine (SVM) for classification?
Geometric interpretation of classification margins
Easy
A.To find the line that passes through the most data points
B.To maximize the margin between classes
C.To connect all data points of the same class
D.To minimize the number of support vectors
Correct Answer: To maximize the margin between classes
Explanation:
The core idea of SVM is to find the optimal hyperplane that has the largest possible distance, or margin, to the nearest data points of any class, which improves the model's generalization.
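The margin-maximization idea is easy to check numerically. Below is a minimal sketch (not part of the original quiz), assuming scikit-learn and NumPy are available; the toy dataset and the very large C value, which approximates a hard margin, are illustrative choices:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters in 2D (toy data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]])
y = np.array([-1, -1, 1, 1])

# A very large C approximates a hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
margin = 2.0 / np.linalg.norm(w)  # geometric margin width = 2 / ||w||
print(margin)
```

For this dataset the optimal boundary is the vertical line x = 1.5, so the margin width comes out to about 3.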
2. In the context of SVM, what are "support vectors"?
Geometric interpretation of classification margins
Easy
A.The data points that are misclassified
B.All the data points in the training set
C.The data points that lie on or closest to the margin boundaries
D.The data points that are furthest from the decision boundary
Correct Answer: The data points that lie on or closest to the margin boundaries
Explanation:
Support vectors are the critical data points that "support" or define the position of the hyperplane. They are the points that are closest to the decision boundary.
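To see that only a few points define the boundary, one can inspect a fitted model's support vectors. A small sketch assuming scikit-learn; the 1-D toy data is invented for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Most points sit far from the class boundary; only the closest ones matter.
X = np.array([[0.0], [0.5], [1.0], [4.0], [4.5], [5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Only the points nearest the boundary are retained as support vectors.
print(clf.support_vectors_)
print(clf.n_support_)  # number of support vectors per class
```

Here only x = 1.0 and x = 4.0, the points closest to the separating threshold, end up as support vectors.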
3. What is the decision boundary created by a linear SVM called?
Geometric interpretation of classification margins
Easy
A.A centroid
B.A hyperplane
C.A regression line
D.A decision tree
Correct Answer: A hyperplane
Explanation:
A linear SVM separates data using a flat boundary. In two dimensions this is a line, in three dimensions a plane, and in higher dimensions it is called a hyperplane.
4. A Hard Margin SVM is suitable only when the training data is...
Hard margin and soft margin SVM
Easy
A.Perfectly linearly separable
B.Clustered into a single group
C.Very large
D.Not linearly separable
Correct Answer: Perfectly linearly separable
Explanation:
A Hard Margin SVM requires that all data points are correctly classified with a margin, which is only possible if the data is perfectly separable by a hyperplane without any errors.
5. What is the main advantage of a Soft Margin SVM over a Hard Margin SVM?
Hard margin and soft margin SVM
Easy
A.It always finds a wider margin
B.It can handle data that is not linearly separable and is more robust to outliers
C.It only works for non-linear data
D.It is computationally faster
Correct Answer: It can handle data that is not linearly separable and is more robust to outliers
Explanation:
Soft Margin SVM introduces slack variables to allow for some misclassifications, making it flexible enough to handle overlapping classes and outliers which would prevent a hard-margin SVM from finding a solution.
6. In a Soft Margin SVM, what is the role of the slack variable ξᵢ?
Hard margin and soft margin SVM
Easy
A.It defines the width of the margin
B.It is the weight vector of the hyperplane
C.It measures how much a data point violates the margin
D.It is a random noise parameter
Correct Answer: It measures how much a data point violates the margin
Explanation:
A slack variable ξᵢ is introduced for each data point xᵢ. If ξᵢ > 0, it measures the degree to which the point is either inside the margin or on the wrong side of the hyperplane.
7. What does the hyperparameter C control in a Soft Margin SVM?
Hard margin and soft margin SVM
Easy
A.The learning rate of the optimizer
B.The type of kernel to be used
C.The number of dimensions in the feature space
D.The trade-off between maximizing the margin and minimizing classification errors
Correct Answer: The trade-off between maximizing the margin and minimizing classification errors
Explanation:
A small C value creates a wider margin but allows more margin violations (a "softer" margin). A large C value penalizes violations more heavily, resulting in a narrower margin (closer to a "hard" margin).
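The trade-off controlled by C can be observed by counting support vectors at two extremes. A hedged sketch with scikit-learn; the overlapping Gaussian blobs and the two C values are arbitrary illustrative choices:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs: not perfectly separable.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(1.5, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

n_sv = {}
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    n_sv[C] = clf.n_support_.sum()

# A small C tolerates many margin violations -> wider margin, more points
# end up inside the margin, so more support vectors.
print(n_sv)
```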
8. What is the main purpose of using the Lagrangian formulation in the context of SVM optimization?
Lagrangian formulation
Easy
A.To visualize the data in 2D
B.To select the best kernel function automatically
C.To convert a constrained optimization problem into a form that is easier to solve
D.To increase the number of features
Correct Answer: To convert a constrained optimization problem into a form that is easier to solve
Explanation:
The Lagrangian method combines the objective function and its constraints into a single function. This allows us to solve the problem by finding the saddle point of this function, often by moving to the dual problem.
9. In the Lagrangian formulation of SVM, what are the variables αᵢ called?
Lagrangian formulation
Easy
A.Slack variables
B.Lagrange multipliers
C.Bias terms
D.Weight vectors
Correct Answer: Lagrange multipliers
Explanation:
The variables αᵢ, one for each data point, are introduced to incorporate the classification constraints into the objective function and are known as Lagrange multipliers.
10. According to the Karush-Kuhn-Tucker (KKT) conditions for SVM, if a data point is NOT a support vector, its corresponding Lagrange multiplier will be:
Lagrangian formulation
Easy
A.αᵢ > 0
B.αᵢ = 0
C.αᵢ = C
D.αᵢ < 0
Correct Answer: αᵢ = 0
Explanation:
A key result of the KKT conditions is that only support vectors can have non-zero Lagrange multipliers. For all other points that are correctly classified and lie outside the margin, αᵢ = 0.
11. The primal optimization problem for a hard-margin SVM aims to minimize which quantity?
Primal and dual optimization problems
Easy
A.The number of misclassified points
B.The bias term, b
C.The norm of the weight vector, ‖w‖
D.The sum of the distances from the margin
Correct Answer: The norm of the weight vector, ‖w‖
Explanation:
The primal problem is formulated as minimizing ½‖w‖² subject to yᵢ(w·xᵢ + b) ≥ 1 for all i. Minimizing the norm of the weight vector, ‖w‖, is mathematically equivalent to maximizing the margin, which is 2/‖w‖.
12. A primary motivation for solving the dual problem instead of the primal problem in SVMs is that it enables the use of:
Primal and dual optimization problems
Easy
A.Gradient descent
B.Feature scaling
C.Regularization
D.The kernel trick
Correct Answer: The kernel trick
Explanation:
The dual formulation expresses the problem in terms of dot products of the input data points. This structure allows us to replace the dot product with a kernel function, which is the essence of the kernel trick for non-linear classification.
13. What is the fundamental idea behind the "kernel trick"?
Kernel trick and kernel functions
Easy
A.To reduce the dimensionality of the data before classification
B.To convert a classification problem into a regression problem
C.To compute dot products in a high-dimensional feature space without explicitly transforming the data
D.To randomly guess the support vectors to speed up training
Correct Answer: To compute dot products in a high-dimensional feature space without explicitly transforming the data
Explanation:
The kernel trick is a clever mathematical technique that uses a kernel function to calculate the result of a dot product in a high-dimensional space, avoiding the computationally expensive step of explicit data transformation.
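The kernel trick can be verified directly for the degree-2 homogeneous polynomial kernel in 2-D, whose explicit feature map is known in closed form. A small NumPy sketch (the vectors x and z are arbitrary examples):

```python
import numpy as np

# Explicit feature map for the degree-2 homogeneous polynomial kernel in 2D:
# phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), so that phi(x).phi(z) = (x.z)^2
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

explicit = phi(x) @ phi(z)  # dot product after explicit mapping
kernel = (x @ z) ** 2       # kernel trick: same number, no mapping needed
print(explicit, kernel)     # both equal 16.0
```

The kernel evaluates the high-dimensional dot product at the cost of one dot product in the original space.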
14. Which of the following is a widely used kernel function in SVMs for handling non-linear data?
Kernel trick and kernel functions
Easy
A.Radial Basis Function (RBF) kernel
B.Mean Squared Error (MSE) kernel
C.Stochastic Gradient Descent (SGD) kernel
D.Cross-Entropy kernel
Correct Answer: Radial Basis Function (RBF) kernel
Explanation:
The Radial Basis Function (RBF) kernel is a popular default choice for SVMs because of its flexibility in modeling complex, non-linear relationships. Other common kernels include Linear, Polynomial, and Sigmoid.
15. Using a linear kernel in an SVM is equivalent to...
Kernel trick and kernel functions
Easy
A.Always misclassifying half the data
B.Using a very complex RBF kernel
C.Applying no non-linear transformation and finding a linear separator in the original feature space
D.Projecting the data into an infinite-dimensional space
Correct Answer: Applying no non-linear transformation and finding a linear separator in the original feature space
Explanation:
The linear kernel, defined as K(x, z) = x·z, is simply the dot product in the original feature space. This means the SVM operates without any non-linear mapping, creating a linear decision boundary.
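One way to see this equivalence in practice: an SVM given a precomputed Gram matrix of plain dot products makes the same predictions as one using kernel='linear'. A sketch assuming scikit-learn; the toy data is illustrative:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 2.0], [4.0, 3.0]])
y = np.array([-1, -1, 1, 1])

# The linear kernel is just the Gram matrix of dot products
# in the original feature space.
lin = SVC(kernel="linear", C=1.0).fit(X, y)
pre = SVC(kernel="precomputed", C=1.0).fit(X @ X.T, y)

print(lin.predict(X))
print(pre.predict(X @ X.T))  # identical decisions
```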
16. When is it most appropriate to use a non-linear kernel like the Polynomial or RBF kernel?
Kernel trick and kernel functions
Easy
A.When you have a very small number of features
B.When the decision boundary between the classes is likely non-linear
C.When the data is perfectly linearly separable
D.When you want the fastest possible training time
Correct Answer: When the decision boundary between the classes is likely non-linear
Explanation:
Non-linear kernels are used to map data into a higher-dimensional space where a linear separator might exist. This is necessary when the classes cannot be separated by a simple line or plane in their original space.
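The classic XOR pattern illustrates when a non-linear kernel is needed. A sketch assuming scikit-learn; the gamma and C values are illustrative:

```python
import numpy as np
from sklearn.svm import SVC

# XOR: no single line can separate these two classes.
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = np.array([0, 0, 1, 1])

linear = SVC(kernel="linear", C=10.0).fit(X, y)
rbf = SVC(kernel="rbf", gamma=1.0, C=10.0).fit(X, y)

print(linear.score(X, y))  # < 1.0: a linear boundary cannot fit XOR
print(rbf.score(X, y))     # 1.0: the RBF kernel separates it
```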
17. The task of training an SVM is fundamentally what type of mathematical problem?
Optimization perspective of SVM training
Easy
A.A convex quadratic programming problem
B.A non-convex optimization problem
C.A linear programming problem
D.A system of linear equations
Correct Answer: A convex quadratic programming problem
Explanation:
SVM training involves minimizing a quadratic objective function (½‖w‖²) subject to linear inequality constraints. This specific type of problem is known as a convex quadratic program, which guarantees a unique global minimum.
18. The objective function for a hard-margin SVM is to minimize ½‖w‖². This is equivalent to maximizing what geometric quantity?
Optimization perspective of SVM training
Easy
A.The angle between the support vectors
B.The distance to the origin
C.The margin, which is proportional to 1/‖w‖
D.The number of support vectors
Correct Answer: The margin, which is proportional to 1/‖w‖
Explanation:
The margin of an SVM is geometrically defined as 2/‖w‖. Therefore, minimizing ‖w‖ (or its squared value for mathematical convenience) is the same as maximizing the margin.
19. In the dual formulation of SVM, the final decision function for a new data point x depends on...
Primal and dual optimization problems
Easy
A.The dot product of x with only the support vectors
B.The average of all feature vectors
C.All the data points in the training set
D.Only the bias term
Correct Answer: The dot product of x with only the support vectors
Explanation:
Because the Lagrange multipliers are zero for non-support vectors, the sum in the decision function only needs to be computed over the support vectors. This makes prediction efficient, as it doesn't depend on the entire dataset.
20. For a 2D dataset, a linear SVM's margin is visually represented by the region between two...
Geometric interpretation of classification margins
Easy
A.Points
B.Concentric squares
C.Circles
D.Parallel lines
Correct Answer: Parallel lines
Explanation:
In a 2D space, the decision boundary is a line. The margin is the region between two other lines that are parallel to the decision boundary and pass through the closest points of each class (the support vectors).
21. In a linearly separable dataset, if we scale all feature vectors by a factor of 2 (i.e., xᵢ → 2xᵢ), how does the maximal geometric margin of a hard-margin SVM change?
Geometric interpretation of classification margins
Medium
A.It is halved.
B.It is squared.
C.It remains unchanged.
D.It is doubled.
Correct Answer: It is doubled.
Explanation:
The geometric margin is given by 2/‖w‖. The optimal hyperplane is defined by w·x + b = 0. When the data points are scaled to 2xᵢ, the optimal weight vector scales to w/2 (with b unchanged) to preserve the same separating decision boundary. The new margin becomes 2/‖w/2‖ = 4/‖w‖, which is double the original margin.
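This doubling can be confirmed numerically by fitting a near-hard-margin SVM before and after scaling. A sketch assuming scikit-learn; the dataset and the C=1e6 hard-margin approximation are illustrative:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]])
y = np.array([-1, -1, 1, 1])

def hard_margin_width(X, y):
    # Very large C approximates the hard-margin solution.
    clf = SVC(kernel="linear", C=1e6).fit(X, y)
    return 2.0 / np.linalg.norm(clf.coef_)

m1 = hard_margin_width(X, y)
m2 = hard_margin_width(2 * X, y)  # scale every feature vector by 2
print(m1, m2)                     # m2 is twice m1
```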
22. In a soft-margin SVM, what is the effect of choosing a very large value for the hyperparameter C?
Hard margin and soft margin SVM
Medium
A.It reduces the number of support vectors to zero.
B.It leads to a narrower margin and penalizes margin violations more heavily, behaving more like a hard-margin SVM.
C.It makes the decision boundary completely linear, regardless of the kernel used.
D.It leads to a wider margin and allows more margin violations.
Correct Answer: It leads to a narrower margin and penalizes margin violations more heavily, behaving more like a hard-margin SVM.
Explanation:
The hyperparameter C is a regularization parameter that controls the trade-off between maximizing the margin and minimizing the classification error. A large C assigns a high penalty to misclassified points and points within the margin, forcing the optimizer to find a solution with fewer margin violations, which typically results in a narrower margin, similar to a hard-margin SVM.
23. What is the primary motivation for solving the dual optimization problem of an SVM instead of the primal problem?
Primal and dual optimization problems
Medium
A.The primal problem is not a convex optimization problem, while the dual is.
B.The dual formulation allows the use of the kernel trick to handle non-linearly separable data.
C.The dual problem's objective function is simpler to differentiate.
D.The dual problem always has fewer constraints than the primal.
Correct Answer: The dual formulation allows the use of the kernel trick to handle non-linearly separable data.
Explanation:
The dual formulation expresses the optimization problem in terms of dot products of the input feature vectors (xᵢ·xⱼ). This structure is key because the kernel trick works by replacing this dot product with a kernel function K(xᵢ, xⱼ) = φ(xᵢ)·φ(xⱼ), allowing SVMs to efficiently learn non-linear decision boundaries in high-dimensional feature spaces without explicitly computing the transformations.
24. In the context of the SVM dual problem, the Karush-Kuhn-Tucker (KKT) conditions imply that for a data point that is NOT a support vector, its corresponding Lagrange multiplier must be:
Lagrangian formulation
Medium
A.αᵢ > 0
B.αᵢ = 0
C.αᵢ = C
D.αᵢ < 0
Correct Answer: αᵢ = 0
Explanation:
The KKT conditions for SVMs include the complementary slackness condition: αᵢ[yᵢ(w·xᵢ + b) − 1] = 0. A point that is not a support vector is correctly classified and lies outside the margin, so yᵢ(w·xᵢ + b) − 1 > 0. For the product to be zero, its corresponding Lagrange multiplier αᵢ must be zero.
25. Consider a polynomial kernel K(x, z) = (x·z + c)^d. What does the parameter d control?
Kernel trick and kernel functions
Medium
A.The width of the margin.
B.The radial influence of a single training example.
C.The degree of the polynomial in the higher-dimensional feature space, influencing the complexity of the decision boundary.
D.The penalty for misclassification.
Correct Answer: The degree of the polynomial in the higher-dimensional feature space, influencing the complexity of the decision boundary.
Explanation:
The parameter d in the polynomial kernel represents the degree of the polynomial. A higher value of d corresponds to a more complex decision boundary, which can fit more intricate patterns in the data but also has a higher risk of overfitting.
26. The primal optimization problem for a hard-margin SVM is to minimize ½‖w‖² subject to yᵢ(w·xᵢ + b) ≥ 1. This type of problem is best classified as:
Optimization perspective of SVM training
Medium
A.Linear Programming (LP)
B.Integer Programming (IP)
C.Unconstrained Optimization
D.Quadratic Programming (QP)
Correct Answer: Quadratic Programming (QP)
Explanation:
This is a Quadratic Programming (QP) problem because the objective function (½‖w‖²) is a quadratic function of the variables (the components of w), and all the constraints (yᵢ(w·xᵢ + b) ≥ 1) are linear with respect to the optimization variables w and b.
27. Which of the following statements correctly describes the support vectors in a hard-margin linear SVM?
Geometric interpretation of classification margins
Medium
A.They are the data points that lie exactly on the margin boundaries.
B.They are the data points that are misclassified by the hyperplane.
C.They are all the data points in the training set.
D.They are the data points furthest away from the decision boundary.
Correct Answer: They are the data points that lie exactly on the margin boundaries.
Explanation:
In a hard-margin SVM, the support vectors are the critical data points that lie precisely on the margin hyperplanes (i.e., where yᵢ(w·xᵢ + b) = 1). These points alone define the position and orientation of the optimal separating hyperplane.
28. In a soft-margin SVM, a data point xᵢ has a corresponding slack variable ξᵢ > 1. What can you conclude about this point?
Hard margin and soft margin SVM
Medium
A.The point lies on the correct side of the hyperplane but inside the margin.
B.The point is misclassified (on the wrong side of the hyperplane).
C.The point lies exactly on the decision boundary.
D.The point is correctly classified and outside the margin.
Correct Answer: The point is misclassified (on the wrong side of the hyperplane).
Explanation:
The constraint for the soft-margin SVM is yᵢ(w·xᵢ + b) ≥ 1 − ξᵢ. A point is correctly classified if yᵢ(w·xᵢ + b) > 0. If ξᵢ > 1, then 1 − ξᵢ < 0, so the constraint can be satisfied even when yᵢ(w·xᵢ + b) is negative, meaning the point is on the wrong side of the hyperplane. At the optimum, ξᵢ = max(0, 1 − yᵢ(w·xᵢ + b)), so ξᵢ > 1 always indicates a misclassified point.
29. The objective function of the SVM dual problem is W(α) = Σᵢ αᵢ − ½ Σᵢ Σⱼ αᵢαⱼyᵢyⱼ(xᵢ·xⱼ). What do the variables αᵢ represent?
Lagrangian formulation
Medium
A.The bias term of the hyperplane.
B.The slack variables for each data point.
C.The components of the weight vector .
D.The Lagrange multipliers associated with the margin constraints.
Correct Answer: The Lagrange multipliers associated with the margin constraints.
Explanation:
The dual problem is derived using the method of Lagrange multipliers. Each αᵢ is a Lagrange multiplier introduced for the margin constraint corresponding to the data point xᵢ. Their optimal values determine which points are support vectors.
30. The Radial Basis Function (RBF) kernel is given by K(x, z) = exp(−γ‖x − z‖²). What is the effect of a very small γ value?
Kernel trick and kernel functions
Medium
A.It forces all data points to become support vectors.
B.It makes the decision boundary smoother and less complex, behaving like a linear classifier.
C.It creates a very complex, high-variance decision boundary that overfits the data.
D.It has no effect on the model's complexity.
Correct Answer: It makes the decision boundary smoother and less complex, behaving like a linear classifier.
Explanation:
The parameter γ defines how much influence a single training example has. A small γ means that the influence is large and far-reaching, resulting in a smoother, less complex decision boundary. As γ approaches 0, the RBF kernel SVM behaves similarly to a linear SVM.
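The far-reaching influence of a small γ is visible in the kernel values themselves. A small NumPy sketch (the points and γ values are chosen arbitrarily):

```python
import numpy as np

def rbf(x, z, gamma):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.array([0.0, 0.0])
z = np.array([3.0, 4.0])  # distance 5 from x

# Small gamma: even distant points keep a kernel value near 1
# (far-reaching influence, smooth boundary).
# Large gamma: influence dies off almost immediately.
print(rbf(x, z, gamma=0.001))  # close to 1
print(rbf(x, z, gamma=10.0))   # close to 0
```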
31. In the SVM dual formulation, the weight vector w can be expressed as a linear combination of which data points?
Primal and dual optimization problems
Medium
A.Only the data points that are misclassified.
B.Only the data points that are not support vectors.
C.Only the support vectors.
D.All data points in the training set.
Correct Answer: Only the support vectors.
Explanation:
From the KKT conditions, the optimal weight vector is given by w = Σᵢ αᵢyᵢxᵢ. Since the Lagrange multipliers αᵢ are zero for all non-support vectors, the sum runs only over the support vectors (where αᵢ > 0). Therefore, w is a linear combination of only the support vectors.
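This identity can be checked against a fitted model: scikit-learn exposes αᵢyᵢ for the support vectors as dual_coef_, so summing over the support vectors should reproduce the weight vector. A sketch with an invented random dataset:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
y = np.array([-1] * 30 + [1] * 30)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors only;
# w = sum over support vectors of alpha_i * y_i * x_i.
w_reconstructed = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w_reconstructed, clf.coef_))  # True
```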
32. You train a soft-margin SVM and find that the optimal solution has many support vectors with αᵢ = C. What does this imply about your choice of C?
Hard margin and soft margin SVM
Medium
A.The choice of is irrelevant to the number of support vectors.
B.The value of is appropriately chosen for this dataset.
C.C might be too small, allowing for a soft margin that misclassifies or violates the margin for many points.
D.C is too large, causing the model to overfit.
Correct Answer: C might be too small, allowing for a soft margin that misclassifies or violates the margin for many points.
Explanation:
In a soft-margin SVM, the dual constraints are 0 ≤ αᵢ ≤ C. When αᵢ = C, the point is a margin-violating support vector (either misclassified or inside the margin). If many points have αᵢ = C, it suggests that the penalty for misclassification is not high enough to enforce a stricter margin, meaning C is likely too small for the desired level of accuracy.
33. Maximizing the geometric margin in a hard-margin SVM is equivalent to minimizing which of the following expressions?
Geometric interpretation of classification margins
Medium
A.2/‖w‖
B.½‖w‖²
C.−‖w‖²
D.1/‖w‖²
Correct Answer: ½‖w‖²
Explanation:
Maximizing the margin 2/‖w‖ is equivalent to minimizing ‖w‖. For mathematical convenience (to make the objective function differentiable and remove the square root), this is further transformed into minimizing ½‖w‖². This is a strictly convex function, which guarantees a unique global minimum, and the factor of 1/2 simplifies the derivative during optimization.
34. Which of the following is NOT a valid Mercer kernel (i.e., cannot be used as a kernel function in an SVM)?
Kernel trick and kernel functions
Medium
A.K(x, z) = (x·z + 1)²
B.K(x, z) = tanh(x·z + c) for some values of c
C.K(x, z) = exp(−‖x − z‖²)
D.K(x, z) = ‖x − z‖²
Correct Answer: K(x, z) = ‖x − z‖²
Explanation:
A function is a valid kernel if its corresponding Gram matrix is positive semi-definite for any set of data points. While the polynomial, RBF, and (for some parameters) sigmoid kernels satisfy this condition, the squared Euclidean distance ‖x − z‖² does not generally produce a positive semi-definite Gram matrix and thus does not correspond to a dot product in some feature space. It violates Mercer's theorem.
35. If an SVM is trained on n data points with d features, and n ≫ d, which formulation is generally more computationally efficient to solve?
Primal and dual optimization problems
Medium
A.The dual problem.
B.Neither can be solved efficiently in this case.
C.The primal problem.
D.Both have the same computational complexity.
Correct Answer: The primal problem.
Explanation:
The primal problem optimizes over d + 1 variables (w and b). The dual problem optimizes over n variables (the αᵢ). When the number of data points n is much larger than the number of features d, solving the primal problem with d + 1 variables is often more efficient than solving the dual with n variables, especially when using a linear kernel.
36. The decision function for a kernel SVM is given by f(x) = Σᵢ∈SV αᵢyᵢK(xᵢ, x) + b. Why is this function efficient to evaluate for a new point x even in a very high-dimensional feature space?
Optimization perspective of SVM training
Medium
A.Because the number of support vectors (SV) is typically much smaller than the total number of training points.
B.Because the kernel function simplifies to a linear operation.
C.Because the Lagrange multipliers are always equal to 1.
D.Because the bias term is always zero.
Correct Answer: Because the number of support vectors (SV) is typically much smaller than the total number of training points.
Explanation:
The prediction for a new point depends only on the kernel evaluations between the new point and the support vectors. Since the set of support vectors is often a small subset of the entire training dataset, the summation is over a relatively small number of terms, making the prediction computationally efficient regardless of the dimensionality of the feature space.
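This can be demonstrated by recomputing the decision function by hand using only the support vectors. A sketch assuming scikit-learn; the dataset and γ value are illustrative:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1, (40, 2)), rng.normal(1, 1, (40, 2))])
y = np.array([-1] * 40 + [1] * 40)

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

# Prediction only touches the support vectors, not the full training set:
# f(x) = sum_i (alpha_i * y_i) K(x_i, x) + b over support vectors.
X_new = rng.normal(0, 1, (5, 2))
K = rbf_kernel(X_new, clf.support_vectors_, gamma=gamma)
f_manual = K @ clf.dual_coef_.ravel() + clf.intercept_

print(np.allclose(f_manual, clf.decision_function(X_new)))  # True
```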
37. In the soft-margin SVM Lagrangian, the term C Σᵢ ξᵢ is added to the objective function. What is the role of this term?
Lagrangian formulation
Medium
A.It forces the weight vector to have a smaller magnitude.
B.It normalizes the input features.
C.It acts as a penalty term to minimize the sum of slack variables, thereby reducing classification errors and margin violations.
D.It ensures the margin is as wide as possible.
Correct Answer: It acts as a penalty term to minimize the sum of slack variables, thereby reducing classification errors and margin violations.
Explanation:
The primal objective for the soft-margin SVM is to minimize ½‖w‖² + C Σᵢ ξᵢ. The term C Σᵢ ξᵢ penalizes the model for having non-zero slack variables. Since ξᵢ > 0 corresponds to a point violating the margin, minimizing this sum encourages the model to classify points correctly and keep them outside the margin.
38. You are working with text data represented by high-dimensional but sparse TF-IDF vectors. Which kernel is often a good starting choice for an SVM classifier in this scenario?
Kernel trick and kernel functions
Medium
A.Sigmoid kernel
B.Radial Basis Function (RBF) kernel
C.Linear kernel
D.Polynomial kernel of a high degree
Correct Answer: Linear kernel
Explanation:
For high-dimensional and sparse data like text features, linear models often perform very well and are computationally efficient. In such high-dimensional spaces the classes are frequently already close to linearly separable, so no kernel-induced mapping is needed. Non-linear kernels like RBF can be less effective and much slower to train in this context.
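A minimal version of this text-classification setup, assuming scikit-learn; the four tiny documents and their labels are invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = [
    "great movie loved it",
    "wonderful film great acting",
    "terrible movie hated it",
    "awful film bad acting",
]
labels = [1, 1, 0, 0]

# TF-IDF yields high-dimensional sparse vectors; a linear SVM handles
# them efficiently without any kernel expansion.
X = TfidfVectorizer().fit_transform(docs)
clf = LinearSVC(C=1.0).fit(X, labels)
print(clf.score(X, labels))
```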
39. For a non-linearly separable dataset, which of the following statements is true?
Hard margin and soft margin SVM
Medium
A.A hard-margin SVM will find a solution by ignoring the outliers.
B.A hard-margin SVM will find the optimal non-linear boundary.
C.A hard-margin SVM has no feasible solution.
D.A soft-margin SVM will perform identically to a hard-margin SVM.
Correct Answer: A hard-margin SVM has no feasible solution.
Explanation:
The hard-margin SVM requires that every data point is correctly classified with a margin of at least 1. If the data is not linearly separable, it is impossible to find a hyperplane that satisfies this condition for all points. Therefore, the optimization problem has no feasible solution.
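The infeasibility of a hard margin, and how slack variables rescue the soft-margin problem, can be seen on a 1-D overlapping dataset. A sketch assuming scikit-learn; the data points are invented:

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping classes: no threshold classifies every point correctly,
# so a hard margin is infeasible; slack variables make the soft-margin
# problem solvable anyway.
X = np.array([[0.0], [1.0], [2.0], [3.0], [1.6], [1.4]])
y = np.array([-1, -1, 1, 1, -1, 1])  # the last two points overlap

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Slack at the optimum: xi_i = max(0, 1 - y_i * f(x_i));
# a positive xi marks a margin violation.
xi = np.maximum(0, 1 - y * clf.decision_function(X))
print(xi)
```

Since the data is non-separable, at least one slack variable must come out positive.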
40. The strong duality principle holds for the SVM optimization problem. What does this imply?
Primal and dual optimization problems
Medium
A.The optimal value of the primal objective function is equal to the optimal value of the dual objective function.
B.The number of primal variables is equal to the number of dual variables.
C.The dual problem is always easier to solve than the primal problem.
D.The solution to the primal problem is always zero.
Correct Answer: The optimal value of the primal objective function is equal to the optimal value of the dual objective function.
Explanation:
Strong duality means that there is zero gap between the primal and dual solutions. The maximum value of the dual objective is equal to the minimum value of the primal objective. This is a crucial property that allows us to solve the dual problem to find the solution for the primal, which is particularly useful for applying the kernel trick.
41. Consider a hard-margin SVM trained on a linearly separable dataset. If every feature vector xᵢ is transformed to x′ᵢ = Dxᵢ, where D is a non-singular diagonal matrix whose diagonal entries dⱼ > 0 are not all equal, how does this non-uniform scaling affect the geometric margin and the set of support vectors?
Geometric interpretation of classification margins
Hard
A.The geometric margin will change, but the set of support vectors will remain unchanged.
B.The geometric margin will remain unchanged, but the set of support vectors may change.
C.Both the geometric margin and the set of support vectors are guaranteed to remain unchanged.
D.The geometric margin will change, and the set of support vectors may also change.
Correct Answer: The geometric margin will change, and the set of support vectors may also change.
Explanation:
Non-uniform scaling (D is not a multiple of the identity matrix) changes the geometry of the feature space, altering distances and angles. This will change the optimal separating hyperplane's orientation and position relative to the data points. Consequently, the maximal geometric margin will change, and the set of points that become support vectors can also change, as different points may become closest to the new optimal boundary.
42. In a soft-margin SVM, the objective is to minimize ½‖w‖² + C Σᵢ ξᵢ. What is the precise consequence of setting the hyperparameter C to a very large value (i.e., C → ∞) for a dataset that is not linearly separable?
Hard margin and soft margin SVM
Hard
A.The optimization will result in a weight vector that is close to zero.
B.The model converges to the hard-margin SVM solution.
C.The optimization problem becomes infeasible.
D.The decision boundary will have a very small margin and will be highly sensitive to individual data points.
Correct Answer: The decision boundary will have a very small margin and will be highly sensitive to individual data points.
Explanation:
As C → ∞, the penalty for misclassification becomes infinitely high. The optimizer will try desperately to make all slack variables equal to zero. For non-linearly separable data, this is impossible. The model will compromise by finding a hyperplane with a very small margin (large ‖w‖) that attempts to thread the needle between the classes, making the boundary highly influenced by every single point and leading to extreme overfitting.
43. The dual formulation of the SVM is often preferred over the primal. Under which scenario is solving the primal problem using methods like stochastic gradient descent on the hinge loss formulation computationally more advantageous than solving the dual?
Primal and dual optimization problems
Hard
A.When the number of features (d) is much larger than the number of training samples (n), and a complex non-linear kernel is used.
B.When the number of training samples (n) is much larger than the number of features (d), and a linear kernel is used.
C.The dual is always computationally superior to the primal when using a kernel.
D.When the Gram matrix is sparse.
Correct Answer: When the number of training samples (n) is much larger than the number of features (d), and a linear kernel is used.
Explanation:
The primal problem's complexity depends on the number of features d (solving for w ∈ ℝᵈ). The dual problem's complexity depends on the number of samples n (solving for α ∈ ℝⁿ, often requiring an n × n Gram matrix). When n ≫ d and a linear kernel is used, solving the primal directly is much more efficient, as you are optimizing over a d-dimensional space instead of an n-dimensional one.
44. Let K₁ and K₂ be two valid Mercer kernels. Which of the following operations is NOT guaranteed to produce a valid Mercer kernel?
Kernel trick and kernel functions
Hard
A.K₁(x, z) + K₂(x, z)
B.K₁(x, z) − K₂(x, z)
C.cK₁(x, z) for a constant c > 0
D.p(K₁(x, z)), where p is a polynomial with non-negative coefficients.
Correct Answer: K₁(x, z) − K₂(x, z)
Explanation:
The set of valid kernels (functions that produce a positive semi-definite Gram matrix) is closed under positive scaling, addition, and element-wise product (Hadamard product). However, the difference K₁ − K₂ of two kernels is not guaranteed to be a valid kernel because the resulting Gram matrix may not be positive semi-definite (it could have negative eigenvalues).
45. In the dual formulation of a soft-margin SVM, consider the Karush-Kuhn-Tucker (KKT) conditions. If for a particular data point xᵢ, its corresponding Lagrange multiplier αᵢ is found to be exactly equal to the hyperparameter C (i.e., αᵢ = C), what can be definitively concluded about this point?
Lagrangian formulation
Hard
A.The point is a support vector that is either inside the margin or is misclassified, with slack variable ξᵢ > 0.
B.The point is not a support vector and is correctly classified.
C.The point is a support vector that lies exactly on the margin.
D.The point is an outlier that has been ignored by the model.
Correct Answer: The point is a support vector that is either inside the margin or is misclassified, with slack variable ξᵢ > 0.
Explanation:
The KKT conditions for the soft-margin SVM include αᵢ + μᵢ = C and μᵢξᵢ = 0, where μᵢ ≥ 0 is the multiplier for the constraint ξᵢ ≥ 0. If αᵢ = C, then μᵢ = 0. This means the condition μᵢξᵢ = 0 is satisfied for any ξᵢ ≥ 0. Since αᵢ > 0, the point is a support vector. Such points are often called 'bounded' support vectors and represent margin violators (either misclassified or correctly classified but inside the margin).
46The SVM optimization problem is a Quadratic Programming (QP) problem. What is the primary implication of this for the uniqueness of the solution?
Optimization perspective of SVM training
Hard
A.The uniqueness of the solution depends entirely on the choice of the QP solver.
B.The solution is never unique, as multiple hyperplanes can achieve the same margin.
C.If a solution exists, the value of the objective function (the margin) is unique, and if the objective is strictly convex, the optimal weight vector is also unique.
D.A unique solution for both the weight vector and bias is always guaranteed.
Correct Answer: If a solution exists, the value of the objective function (the margin) is unique, and if the objective is strictly convex, the optimal weight vector is also unique.
Explanation:
The SVM objective function (1/2)||w||^2 is strictly convex with respect to w. For a convex optimization problem, any local minimum is a global minimum, and the minimum value of the objective function is unique. Because the objective is strictly convex, the optimal solution for the primal variables (the weight vector w) is also unique. The bias b may not be unique in certain edge cases (e.g., if there are no support vectors with 0 < α_i < C).
Incorrect! Try again.
47In a hard-margin linear SVM, the margin is given by 2/||w||. How does the dimensionality of the feature space theoretically affect the maximum possible margin for a given dataset of n points?
Geometric interpretation of classification margins
Hard
A.Higher dimensionality always decreases the maximum possible margin due to the curse of dimensionality.
B.The margin is only dependent on the number of support vectors, not the dimensionality.
C.The dimensionality of the feature space has no theoretical relationship with the maximum possible margin.
D.Higher dimensionality generally allows for a larger maximum margin, as it provides more degrees of freedom to find a separating hyperplane.
Correct Answer: Higher dimensionality generally allows for a larger maximum margin, as it provides more degrees of freedom to find a separating hyperplane.
Explanation:
In a higher-dimensional space, data points become sparser, and it's more likely that a separating hyperplane can be found. With more dimensions (degrees of freedom), there is more 'room' to place a hyperplane that is further away from all data points. This is a key concept behind Cover's theorem and the motivation for using kernels to map data to higher dimensions, where linear separability and larger margins become more probable.
Incorrect! Try again.
48A soft-margin SVM with a non-linear kernel is trained on a dataset. If you remove a data point which is correctly classified and lies strictly outside the margin (i.e., y_i f(x_i) > 1), what is the most likely outcome upon retraining the SVM with the same hyperparameters?
Hard margin and soft margin SVM
Hard
A.The model will now overfit the remaining data.
B.The decision boundary will remain exactly the same.
C.The decision boundary will change significantly.
D.The margin will decrease.
Correct Answer: The decision boundary will remain exactly the same.
Explanation:
The SVM decision boundary is determined solely by the support vectors. A point correctly classified and strictly outside the margin has a corresponding Lagrange multiplier α_i = 0. Such a point is not a support vector. Removing it from the dataset does not change the set of active constraints in the optimization problem. Therefore, retraining the model will yield the exact same decision boundary.
Incorrect! Try again.
49What is the primary reason that the kernel trick can be applied to the dual formulation of the SVM but not directly to the primal formulation?
Primal and dual optimization problems
Hard
A.The dual objective function and the decision rule depend on the data only through dot products of feature vectors, whereas the primal depends on the feature vectors themselves.
B.The primal formulation does not involve a bias term .
C.The primal problem is non-convex, while the dual is convex.
D.The dual problem has fewer constraints than the primal problem.
Correct Answer: The dual objective function and the decision rule depend on the data only through dot products of feature vectors, whereas the primal depends on the feature vectors themselves.
Explanation:
The dual formulation's objective function involves terms of the form (x_i · x_j). The final decision rule is also based on dot products: f(x) = sign(Σ_i α_i y_i (x_i · x) + b). The kernel trick works by replacing these dot products with a kernel function K(x_i, x_j) = φ(x_i) · φ(x_j). The primal formulation, which solves for w directly, involves the feature vectors themselves in its constraints (y_i(w · x_i + b) ≥ 1) and is not structured in a way that only involves dot products, making the kernel trick inapplicable.
Incorrect! Try again.
50Consider the Radial Basis Function (RBF) kernel K(x, z) = exp(-γ||x - z||^2) with parameter γ. What happens to the decision boundary of an SVM as γ → 0?
Kernel trick and kernel functions
Hard
A.The SVM fails to find any support vectors.
B.The decision boundary approaches a linear hyperplane.
C.The decision boundary becomes highly complex and overfits the data.
D.The influence of each support vector becomes extremely localized.
Correct Answer: The decision boundary approaches a linear hyperplane.
Explanation:
As γ → 0, the exponent -γ||x - z||^2 also approaches 0. Using a Taylor expansion, e^u ≈ 1 + u for small u. So, K(x, z) ≈ 1 - γ||x - z||^2 = 1 - γ||x||^2 - γ||z||^2 + 2γ(x · z). In the SVM decision function, many of these terms are constant or depend on the norm of a single point, behaving like offsets. The dominant term that depends on both x and z is the linear dot product x · z. Therefore, the behavior of the kernel becomes increasingly linear, and the decision boundary approaches a linear hyperplane.
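The expansion can be checked numerically (a sketch; the random points are arbitrary): subtracting the constant and single-point norm terms from (K - 1)/γ should recover the plain dot product as γ shrinks.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))

sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # ||x - z||^2
norms = (X ** 2).sum(1)
linear = X @ X.T

def recovered_linear(gamma):
    K = np.exp(-gamma * sq)
    # (K - 1)/gamma -> -||x - z||^2 = -||x||^2 - ||z||^2 + 2 (x . z),
    # so adding the norms back and halving recovers the dot product.
    return ((K - 1) / gamma + norms[:, None] + norms[None, :]) / 2

for gamma in (1e-2, 1e-4, 1e-6):
    err = np.abs(recovered_linear(gamma) - linear).max()
    print(f"gamma={gamma:g}  max error={err:.2e}")  # shrinks with gamma
```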
Incorrect! Try again.
51The Lagrangian for the hard-margin SVM primal problem is L(w, b, α) = (1/2)||w||^2 - Σ_i α_i [y_i(w · x_i + b) - 1]. What is the interpretation of the stationarity condition ∂L/∂w = 0?
Lagrangian formulation
Hard
A.It determines the value of the optimal bias term b.
B.It shows that the optimal weight vector must be a linear combination of the feature vectors of the support vectors.
C.It establishes the constraint Σ_i α_i y_i = 0 in the dual problem.
D.It proves that the optimization problem is convex.
Correct Answer: It shows that the optimal weight vector must be a linear combination of the feature vectors of the support vectors.
Explanation:
Taking the derivative of the Lagrangian with respect to w and setting it to zero gives: w - Σ_i α_i y_i x_i = 0. This directly implies that the optimal weight vector is w = Σ_i α_i y_i x_i. From the KKT conditions, we know that α_i > 0 only for support vectors, meaning w is a linear combination of the feature vectors of only the support vectors.
Incorrect! Try again.
52Consider the unconstrained hinge loss formulation of a linear SVM: min_{w,b} (1/2)||w||^2 + C Σ_i max(0, 1 - y_i(w · x_i + b)). How does the solution change if the regularization term is changed from the L2-norm squared (||w||_2^2) to the L1-norm (||w||_1)?
Hard margin and soft margin SVM
Hard
A.The problem becomes non-convex and difficult to solve.
B.The margin is no longer maximized, and the model focuses only on minimizing classification errors.
C.The solution remains identical, as both are convex regularizers.
D.The optimization problem is no longer a QP problem, and the resulting weight vector is likely to be sparse (have many zero components).
Correct Answer: The optimization problem is no longer a QP problem, and the resulting weight vector is likely to be sparse (have many zero components).
Explanation:
Changing the regularization to the L1-norm results in what is known as an L1-SVM. The objective function is no longer quadratic, so it is not a standard QP problem (though it is still a convex problem). The L1-norm is known for inducing sparsity, meaning it encourages many of the components of the weight vector to be exactly zero. This has the effect of performing automatic feature selection.
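A sketch of this sparsity effect using scikit-learn's LinearSVC (note it uses the squared hinge loss rather than the plain hinge, and the synthetic dataset with mostly irrelevant features is an assumption):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# 50 features but only 5 carry signal: L1 should zero out many weights.
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           n_redundant=0, random_state=0)

common = dict(loss="squared_hinge", dual=False, C=0.1, max_iter=20000)
l2 = LinearSVC(penalty="l2", **common).fit(X, y)
l1 = LinearSVC(penalty="l1", **common).fit(X, y)

n_zero = lambda m: int((np.abs(m.coef_) < 1e-8).sum())
print("exact-zero weights  L2:", n_zero(l2), " L1:", n_zero(l1))
```

The L2-regularized weights are generically all non-zero, while the L1-regularized model drives many irrelevant weights to exactly zero, acting as feature selection.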
Incorrect! Try again.
53Mercer's theorem provides the conditions for a function K(x, z) to be a valid kernel. It states that K must be a continuous, symmetric function such that the Gram matrix K_ij = K(x_i, x_j) is positive semi-definite for any finite set of points {x_1, ..., x_n}. What does positive semi-definite imply in this context?
Kernel trick and kernel functions
Hard
A.The kernel function corresponds to a dot product in a finite-dimensional space.
B.For any non-zero vector c, the quadratic form c^T K c ≥ 0.
C.The determinant of the Gram matrix must be strictly positive.
D.All entries of the Gram matrix must be non-negative.
Correct Answer: For any non-zero vector c, the quadratic form c^T K c ≥ 0.
Explanation:
This is the definition of a positive semi-definite (PSD) matrix. It ensures that the kernel function behaves like a dot product in some (possibly infinite-dimensional) feature space, which is crucial for the geometry of the SVM. It guarantees that squared distances in the feature space are non-negative and that the dual optimization problem is convex and well-posed.
Incorrect! Try again.
54If you add a new data point to a perfectly linearly separable dataset, under which condition is the hard-margin SVM decision boundary guaranteed to NOT change?
Geometric interpretation of classification margins
Hard
A.If the new point is correctly classified and lies outside the existing margin.
B.If the new point is from the positive class.
C.If the new point lies exactly on the decision boundary.
D.The decision boundary will always change when a new data point is added.
Correct Answer: If the new point is correctly classified and lies outside the existing margin.
Explanation:
The hard-margin SVM boundary is defined by the support vectors, which are the points lying exactly on the margin hyperplanes (y_i(w · x_i + b) = 1). If a new point satisfies the margin constraint of the existing solution, i.e., y(w · x + b) ≥ 1, it does not violate any constraints and does not provide any new information that would force the hyperplane to move. It is not a support vector, so the optimal solution remains unchanged.
Incorrect! Try again.
55In the context of SVMs, what is the 'duality gap' and what does it mean if it is zero?
Primal and dual optimization problems
Hard
A.It is the difference between the primal objective value and the dual objective value. A zero gap (strong duality) means the optimal solutions to both problems are equivalent.
B.It is the difference in performance between a linear SVM and a kernelized SVM.
C.It is the number of misclassified points in a soft-margin SVM.
D.It is the geometric distance between the two margin boundaries.
Correct Answer: It is the difference between the primal objective value and the dual objective value. A zero gap (strong duality) means the optimal solutions to both problems are equivalent.
Explanation:
In optimization theory, weak duality states that the optimal value of the dual problem provides a lower bound on the optimal value of the primal problem. The difference between these values is the duality gap. For SVMs, the problem is constructed such that strong duality holds, meaning the duality gap is zero. This guarantees that solving the easier dual problem gives us the same optimal solution as solving the primal problem.
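The zero gap can be verified numerically; in the sketch below the blob data is an assumption, and scikit-learn's dual_coef_ stores y_i · α_i for the support vectors:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, size=(30, 2)), rng.normal(2, 1, size=(30, 2))])
y = np.array([-1] * 30 + [1] * 30)
C = 1.0

clf = SVC(kernel="linear", C=C, tol=1e-8).fit(X, y)
dc = clf.dual_coef_.ravel()     # y_i * alpha_i for the support vectors
sv = X[clf.support_]

# Dual objective: sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)
dual = np.abs(dc).sum() - 0.5 * dc @ (sv @ sv.T) @ dc

# Primal objective: 1/2 ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b))
w = dc @ sv
b = clf.intercept_[0]
primal = 0.5 * w @ w + C * np.maximum(0, 1 - y * (X @ w + b)).sum()

print(primal, dual)  # strong duality: the two values coincide
assert abs(primal - dual) < 1e-3
```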
Incorrect! Try again.
56In the soft-margin SVM, what is the role of the Lagrange multipliers μ_i associated with the constraints ξ_i ≥ 0?
Lagrangian formulation
Hard
A.They enforce the relationship α_i = C - μ_i, which leads to the box constraint 0 ≤ α_i ≤ C in the dual.
B.They are the weights of the support vectors in the final decision function.
C.They directly measure the geometric margin of the classifier.
D.They are hyperparameters that need to be tuned using cross-validation.
Correct Answer: They enforce the relationship α_i = C - μ_i, which leads to the box constraint 0 ≤ α_i ≤ C in the dual.
Explanation:
In the Lagrangian of the primal problem, we have the terms C Σ_i ξ_i, -Σ_i α_i ξ_i, and -Σ_i μ_i ξ_i. The stationarity condition with respect to ξ_i yields C - α_i - μ_i = 0, which gives α_i = C - μ_i. Since the KKT conditions require α_i ≥ 0 and μ_i ≥ 0, this relationship directly implies that α_i must be less than or equal to C, creating the box constraint 0 ≤ α_i ≤ C in the dual problem.
Incorrect! Try again.
57The Sequential Minimal Optimization (SMO) algorithm iteratively picks pairs of Lagrange multipliers to optimize. What is a common heuristic for choosing the first multiplier, α_1?
Optimization perspective of SVM training
Hard
A.Choose the α_i with the largest current value.
B.Choose an α_i at random.
C.Choose an α_i corresponding to the point furthest from the current decision boundary.
D.Choose an α_i corresponding to a data point that most violates the KKT conditions.
Correct Answer: Choose an α_i corresponding to a data point that most violates the KKT conditions.
Explanation:
To make the fastest progress towards the global optimum, SMO prioritizes the multipliers that are currently 'most wrong'. The KKT conditions must hold at the optimal solution. Therefore, a good heuristic is to select a point that currently violates these conditions by the largest amount. This strategy helps the algorithm converge more quickly than random or sequential selection.
Incorrect! Try again.
58Which of the following functions k(x, z), where x, z ∈ R^d, is NOT a valid Mercer kernel?
Kernel trick and kernel functions
Hard
A.k(x, z) = -(x · z)
B.k(x, z) = (x · z + 1)^d for integer d ≥ 1
C.k(x, z) = exp(-γ||x - z||^2) for γ > 0
D.k(x, z) = Σ_j min(x_j, z_j) for non-negative vectors x, z
Correct Answer: k(x, z) = -(x · z)
Explanation:
A function is a valid Mercer kernel if its Gram matrix is positive semi-definite (PSD) for any set of points. The function k(x, z) = -(x · z) is not a valid kernel. The diagonal elements of its Gram matrix are -||x_i||^2, which are negative for any non-zero x_i. A PSD matrix must have non-negative diagonal elements. More formally, the Gram matrix for this function can have negative eigenvalues, violating the PSD condition. The other options are the well-known Polynomial kernel, RBF kernel, and a valid intersection-style kernel.
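A minimal numeric check of this claim (the sample points are arbitrary):

```python
import numpy as np

X = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
K = -(X @ X.T)  # Gram matrix of the candidate kernel k(x, z) = -(x . z)

print(np.diag(K))             # [-1. -4. -2.]  (= -||x_i||^2, all negative)
print(np.linalg.eigvalsh(K))  # contains negative eigenvalues, so not PSD
```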
Incorrect! Try again.
59Consider training a soft-margin linear SVM. If the dataset is perfectly linearly separable with a large margin, and you choose a very small value for the hyperparameter C (e.g., C → 0), what is the likely outcome?
Hard margin and soft margin SVM
Hard
A.All data points will become support vectors.
B.The model will be identical to the hard-margin SVM because the data is separable.
C.The optimization will fail because C is too small.
D.The model may produce a decision boundary with a wider margin than the hard-margin SVM, potentially misclassifying some points even though a perfect separation is possible.
Correct Answer: The model may produce a decision boundary with a wider margin than the hard-margin SVM, potentially misclassifying some points even though a perfect separation is possible.
Explanation:
A very small C places a high emphasis on minimizing (1/2)||w||^2 (maximizing the margin) and a very low emphasis on penalizing the slack variables ξ_i. The optimizer might find that it can achieve a much smaller ||w|| (a much wider margin) by allowing a few points to fall within the margin or even be misclassified, as the penalty for doing so is negligible. It prioritizes a simple, wide-margin solution over perfect classification.
Incorrect! Try again.
60After solving the dual SVM problem and obtaining the optimal Lagrange multipliers α_i, the weight vector is constructed as w = Σ_i α_i y_i x_i. What happens to the norm of this weight vector, ||w||, as the regularization parameter C increases?
Primal and dual optimization problems
Hard
A.It oscillates unpredictably.
B.It generally increases or stays the same.
C.It is independent of C.
D.It generally decreases.
Correct Answer: It generally increases or stays the same.
Explanation:
A larger C places a higher penalty on misclassification errors (the slack terms ξ_i). To reduce these errors, the model will try to fit the data more closely. This often requires a more complex decision boundary, which in the linear case corresponds to a smaller margin. Since the margin 2/||w|| is inversely related to ||w||, a smaller margin implies a larger ||w||. Therefore, as C increases, the model tolerates a larger ||w|| in order to reduce classification errors, causing ||w|| to increase or stay the same.
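A sketch of this trend with scikit-learn's LinearSVC (which uses the squared hinge loss; the synthetic overlapping data is an assumption):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Overlapping classes, so larger C trades margin width for fewer errors.
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

norms = [np.linalg.norm(LinearSVC(C=C, max_iter=50000).fit(X, y).coef_)
         for C in (0.01, 1.0, 100.0)]
print(norms)  # ||w|| grows (or plateaus) as C increases
```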