1. In the context of a typical machine learning dataset, what does a single row of a data matrix usually represent?
Vectors, matrices, and tensors in machine learning
Easy
A.A single data point or sample
B.A single feature for all samples
C.The entire dataset
D.The model's hyperparameters
Correct Answer: A single data point or sample
Explanation:
Conventionally, in a data matrix, each row corresponds to a single observation or data point (e.g., a user, a house, an image), while the columns represent the features of that data point.
2. Which of these data types would be best represented by a 3rd-order tensor?
Vectors, matrices, and tensors in machine learning
Easy
A.A list of housing prices
B.A grayscale image (height, width)
C.A color image (height, width, color channels)
D.A user's age
Correct Answer: A color image (height, width, color channels)
Explanation:
A scalar (like age) is a 0th-order tensor, a vector (like prices) is a 1st-order tensor, and a matrix (like a grayscale image) is a 2nd-order tensor. A color image has three dimensions (height, width, color channels), making it a 3rd-order tensor.
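These tensor orders can be checked directly with NumPy's `ndim` attribute (the array values below are illustrative stand-ins, not data from the question):

```python
import numpy as np

age = np.array(42.0)                      # scalar: 0th-order tensor
prices = np.array([250e3, 310e3, 199e3])  # vector: 1st-order tensor
gray = np.zeros((28, 28))                 # grayscale image (H, W): 2nd-order tensor
color = np.zeros((28, 28, 3))             # color image (H, W, C): 3rd-order tensor

print(age.ndim, prices.ndim, gray.ndim, color.ndim)  # 0 1 2 3
```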
3. What is the primary difference between a vector and a scalar?
Vectors, matrices, and tensors in machine learning
Easy
A.A vector has both magnitude and direction, while a scalar has only magnitude.
B.A vector has a single element, while a scalar has multiple elements.
C.A vector can only contain integers, while a scalar can be any number.
D.A vector is a matrix, while a scalar is a single number.
Correct Answer: A vector has both magnitude and direction, while a scalar has only magnitude.
Explanation:
A scalar is a single numerical value (e.g., temperature), representing only magnitude. A vector is an array of numbers that represents both magnitude and direction in a space.
4. If a matrix has dimensions m × n, what does m represent?
Vectors, matrices, and tensors in machine learning
Easy
A.The number of columns
B.The number of cells
C.The determinant of the matrix
D.The number of rows
Correct Answer: The number of rows
Explanation:
In the standard m × n notation for matrix dimensions, m always represents the number of rows and n represents the number of columns.
5. In machine learning, a vector of feature values for a single data point is often called a...
Vectors, matrices, and tensors in machine learning
Easy
A.Loss function
B.Weight matrix
C.Scalar tensor
D.Feature vector
Correct Answer: Feature vector
Explanation:
A feature vector is a vector whose elements represent the specific features of a single instance or data point. For example, for a house, the feature vector might be [size, number_of_bedrooms, age].
6. Which of the following is a necessary condition for a set of vectors to be considered a vector space?
Vector spaces and subspaces
Easy
A.It must contain at least 100 vectors.
B.All vectors in the space must have a norm of 1.
C.The space must be two-dimensional.
D.It must be closed under vector addition and scalar multiplication.
Correct Answer: It must be closed under vector addition and scalar multiplication.
Explanation:
A fundamental property of a vector space is closure. This means that if you add any two vectors from the space, the result is still in the space, and if you multiply any vector by a scalar, the result is also still in the space.
7. What is the zero vector in the vector space ℝ³?
Vector spaces and subspaces
Easy
A.[0, 0, 0]
B.0
C.[0]
D.[1, 1, 1]
Correct Answer: [0, 0, 0]
Explanation:
The zero vector is the additive identity in a vector space. In ℝ³, which represents 3-dimensional space, the zero vector is a vector with three components, all of which are zero.
8. The number of vectors in a basis for a vector space is called the...?
Vector spaces and subspaces
Easy
A.Dimension of the space
B.Subspace of the space
C.Span of the space
D.Norm of the space
Correct Answer: Dimension of the space
Explanation:
The dimension of a vector space is defined as the number of linearly independent vectors required to span the entire space. This set of vectors is known as a basis.
9. Which of the following sets forms a subspace of ℝ²?
Vector spaces and subspaces
Easy
A.A line that does not pass through the origin
B.A line passing through the origin (0, 0)
C.A single point at (1, 1)
D.The first quadrant (where x >= 0 and y >= 0)
Correct Answer: A line passing through the origin (0, 0)
Explanation:
A subspace must contain the zero vector and be closed under addition and scalar multiplication. A line through the origin satisfies these conditions, whereas a line not through the origin does not contain the zero vector.
10. What does it mean for a set of vectors to 'span' a vector space?
Vector spaces and subspaces
Easy
A.The set has more vectors than the dimension of the space.
B.All vectors in the set are perpendicular to each other.
C.Any vector in the space can be written as a linear combination of the vectors in the set.
D.The set contains the zero vector.
Correct Answer: Any vector in the space can be written as a linear combination of the vectors in the set.
Explanation:
The span of a set of vectors is the set of all possible linear combinations of those vectors. If this span is equal to the entire vector space, the set is said to span the space.
11. What is the L2 norm (Euclidean norm) of the vector [3, 4]?
Norms (L1, L2) and projections
Easy
A.7
B.1
C.25
D.5
Correct Answer: 5
Explanation:
The L2 norm is calculated as the square root of the sum of the squared components: √(3² + 4²) = √25 = 5.
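A quick NumPy check, assuming the vector in question is [3, 4] (consistent with the answer 5 and the distractors 7 and 25):

```python
import numpy as np

v = np.array([3.0, 4.0])
l2 = np.linalg.norm(v)  # default ord=2: sqrt(3**2 + 4**2) = sqrt(25)
print(l2)  # 5.0
```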
12. Which norm is also known as the 'Manhattan distance' or 'taxicab norm'?
Norms (L1, L2) and projections
Easy
A.L2 Norm
B.L1 Norm
C.Infinity Norm
D.Frobenius Norm
Correct Answer: L1 Norm
Explanation:
The L1 norm calculates distance by summing the absolute differences of the components, which is analogous to moving along a rectangular grid, like the streets of Manhattan.
13. What is the L1 norm of the vector [1, -2, 3]?
Norms (L1, L2) and projections
Easy
A.4
B.14
C.0
D.6
Correct Answer: 6
Explanation:
The L1 norm is the sum of the absolute values of the components: |1| + |-2| + |3| = 6.
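The same computation in NumPy, using a hypothetical vector whose L1 norm is 6 (the original vector is not shown in the source):

```python
import numpy as np

v = np.array([1.0, -2.0, 3.0])   # hypothetical example vector
l1 = np.linalg.norm(v, ord=1)    # |1| + |-2| + |3|
print(l1)  # 6.0
```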
14. In machine learning, using the L1 norm in regularization (like in LASSO regression) often leads to what kind of models?
Norms (L1, L2) and projections
Easy
A.Complex non-linear models
B.Models with very large weights
C.Dense models (no zero weights)
D.Sparse models (many zero weights)
Correct Answer: Sparse models (many zero weights)
Explanation:
L1 regularization adds a penalty equal to the L1 norm of the weights. This has the effect of encouraging some model weights to become exactly zero, leading to a 'sparse' model that performs feature selection.
15. Geometrically, the L2 norm of a vector represents its...
Norms (L1, L2) and projections
Easy
A.Sum of its components
B.Angle with the x-axis
C.Length from the origin in Euclidean space
D.Projection onto the y-axis
Correct Answer: Length from the origin in Euclidean space
Explanation:
The L2 norm corresponds to our intuitive understanding of length in a straight line from the origin to the point defined by the vector's coordinates.
16. Multiplying a vector by a matrix is an example of a...
Linear operators and transformations in ML
Easy
A.Non-linear transformation
B.Linear transformation
C.Scalar multiplication
D.Vector normalization
Correct Answer: Linear transformation
Explanation:
Matrix multiplication is the standard way to represent a linear transformation. It maps a vector from one vector space to another while preserving the properties of linearity (additivity and homogeneity).
17. Which of the following matrices represents a scaling transformation that doubles the size of a 2D vector along both the x and y axes?
Linear operators and transformations in ML
Easy
A.
B.
C.
D.
Correct Answer: [[2, 0], [0, 2]]
Explanation:
A scaling matrix has the scaling factors on its main diagonal. To double the size along both axes, the scaling factor for both x (top-left) and y (bottom-right) should be 2.
18. What is the result of applying the identity matrix I to any vector v?
Linear operators and transformations in ML
Easy
A.The vector itself
B.The zero vector
C.A vector with all components equal to 1
D.The vector is rotated by 90 degrees
Correct Answer: The vector itself
Explanation:
The identity matrix I is the matrix equivalent of the number 1. Multiplying any vector by the identity matrix (Iv = v) results in the original vector unchanged.
19. A function f is a linear transformation if it satisfies f(αu + βv) = αf(u) + βf(v) for any vectors u, v and scalars α, β. What property does this demonstrate?
Linear operators and transformations in ML
Easy
A.Superposition (Additivity and Homogeneity)
B.Invertibility
C.Orthogonality
D.Normalization
Correct Answer: Superposition (Additivity and Homogeneity)
Explanation:
This defining equation of linearity combines two properties: additivity (f(u + v) = f(u) + f(v)) and homogeneity (f(αu) = αf(u)). Together, this is known as the principle of superposition.
20. In the context of machine learning, the weights of a single layer in a neural network can often be represented by a:
Linear operators and transformations in ML
Easy
A.Scalar
B.Single vector
C.Matrix
D.Norm
Correct Answer: Matrix
Explanation:
The weights connecting one layer of neurons to the next can be organized into a matrix. Applying this layer to an input vector (of activations from the previous layer) is a linear transformation performed by multiplying the input vector by the weight matrix.
21. A feed-forward neural network layer processes a batch of 64 data points, where each data point is a vector of 128 features. The layer's weight matrix transforms this input to an output where each data point is a vector of 32 features. Assuming the transformation is computed as Y = XW, what are the dimensions of the weight matrix W?
Vectors, matrices, and tensors in machine learning
Medium
A.
B.
C.
D.
Correct Answer: 128 × 32
Explanation:
The input matrix X has dimensions (batch size × input features), which is 64 × 128. The output matrix Y has dimensions (batch size × output features), which is 64 × 32. For the matrix multiplication Y = XW to be valid, the dimensions must align: (64 × 128)(128 × 32) = (64 × 32). This requires W to have 128 rows and 32 columns. Thus, the dimensions of W are 128 × 32.
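The shape arithmetic can be verified with a quick NumPy sketch (random values, since only the shapes matter):

```python
import numpy as np

X = np.random.randn(64, 128)   # batch of 64 inputs, 128 features each
W = np.random.randn(128, 32)   # weights: (input features, output features)
Y = X @ W                      # Y = XW
print(Y.shape)  # (64, 32)
```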
22. For a computer vision task, you have a dataset of 1,000 color images, each with a resolution of 64 × 64 pixels. The images use the RGB color model. What is the most appropriate shape for a tensor representing this entire dataset, following the common 'channels-last' convention (batch, height, width, channels)?
Vectors, matrices, and tensors in machine learning
Medium
A.
B.
C.
D.
Correct Answer: (1000, 64, 64, 3)
Explanation:
A tensor is a multi-dimensional array. For a batch of images, the standard convention is a 4D tensor. Following the 'channels-last' format, the dimensions are ordered as (batch size, image height, image width, number of color channels). Here, the batch size is 1000, height is 64, width is 64, and there are 3 channels (R, G, B). Therefore, the shape is (1000, 64, 64, 3).
23. Given two non-zero column vectors u ∈ ℝᵐ and v ∈ ℝⁿ, what is the rank of the matrix formed by their outer product, uvᵀ?
Vectors, matrices, and tensors in machine learning
Medium
A.
B.1
C.0
D.
Correct Answer: 1
Explanation:
The outer product uvᵀ results in an m × n matrix where every column is a scalar multiple of the vector u. Since all columns are linearly dependent (they are all multiples of a single vector), the dimension of the column space is 1. Therefore, the rank of the matrix is 1.
24. The Hadamard product (element-wise product) is used in various ML algorithms, such as in the gates of an LSTM cell. Given matrices A and B, what is their Hadamard product A ⊙ B?
Vectors, matrices, and tensors in machine learning
Medium
A.
B.
C.
D.This operation is undefined
Correct Answer:
Explanation:
The Hadamard product is calculated by multiplying the corresponding elements of the two matrices: (A ⊙ B)ᵢⱼ = Aᵢⱼ · Bᵢⱼ.
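In NumPy, `*` on arrays is the Hadamard product, while `@` is matrix multiplication. The 2×2 matrices below are hypothetical (the originals from the question are not shown):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])  # hypothetical example matrices
B = np.array([[5, 6], [7, 8]])
H = A * B   # element-wise (Hadamard) product, NOT matrix multiplication
print(H)    # [[ 5 12], [21 32]]
```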
25. In a linear regression model, the normal equation to find the optimal coefficients is θ = (XᵀX)⁻¹Xᵀy. If you have a dataset with 500 samples (n = 500) and 10 features (p = 10), what are the dimensions of the matrix XᵀX?
Vectors, matrices, and tensors in machine learning
Medium
A.
B.
C.
D.
Correct Answer: 10 × 10
Explanation:
The data matrix X has dimensions n × p, which is 500 × 10. Its transpose, Xᵀ, has dimensions p × n, which is 10 × 500. The matrix multiplication XᵀX therefore has dimensions (10 × 500)(500 × 10), which results in a 10 × 10 matrix.
26. Which of the following sets is a subspace of ℝ³?
Vector spaces and subspaces
Medium
A.
B.
C.
D.
Correct Answer:
Explanation:
A set is a subspace if it contains the zero vector, is closed under vector addition, and is closed under scalar multiplication. The set in option B represents a plane passing through the origin, which satisfies all three conditions for being a subspace. The other options fail: A) does not contain the zero vector; C) is not closed under multiplication by negative scalars; D) is a single point that is not the origin.
27. Consider the vectors and in . What geometric object does the span of these two vectors, , represent?
Vector spaces and subspaces
Medium
A.A line through the origin
B.The origin point only
C.All of
D.A plane through the origin
Correct Answer: A line through the origin
Explanation:
The second vector is a scalar multiple of the first, so the two vectors are linearly dependent and point along the same line. The span of a set of linearly dependent vectors is determined by its largest linearly independent subset; here that subset is a single vector, so the span is the line passing through the origin in its direction.
28. In machine learning, feature spaces are represented as vector spaces. Which of the following sets of vectors cannot form a basis for the vector space ℝ³?
Vector spaces and subspaces
Medium
A.
B.
C.
D.
Correct Answer:
Explanation:
A basis for a vector space must satisfy two conditions: the vectors must be linearly independent, and they must span the entire space. For ℝ³, any basis must contain exactly 3 vectors. Option C contains only two vectors, so it cannot span the entire 3-dimensional space. Therefore, it cannot be a basis for ℝ³.
29. In a linear regression model represented by the equation y = Xβ + ε, the vector of predicted values, ŷ, is calculated as ŷ = Xβ̂ where β̂ is the estimated coefficient vector. The vector ŷ must belong to which fundamental subspace?
Vector spaces and subspaces
Medium
A.The row space of X
B.The null space of X
C.The column space of X
D.The null space of Xᵀ
Correct Answer: The column space of X
Explanation:
The predicted value vector ŷ is computed by multiplying the matrix X by the vector β̂. This operation is equivalent to taking a linear combination of the columns of X, with the coefficients given by the elements of β̂. By definition, the set of all possible linear combinations of the columns of X is the column space of X.
30. A linear transformation used for dimensionality reduction maps data from ℝ¹⁰ to ℝᵐ and is represented by a matrix A. If the dimension of the column space (rank) of A is 3, what is the dimension of the null space (kernel) of this transformation?
Vector spaces and subspaces
Medium
A.1
B.6
C.4
D.7
Correct Answer: 7
Explanation:
The Rank-Nullity Theorem states that for a matrix A with n columns, the rank of A plus the dimension of the null space of A (nullity) equals n. Here, the transformation is from ℝ¹⁰, so the matrix has 10 columns (n = 10). Given that the rank is 3, we have: 3 + nullity = 10. Solving for the nullity gives a dimension of 7 for the null space.
31. Lasso regression uses L1 regularization. A key feature of Lasso is that it can produce sparse models, where some coefficients become exactly zero. What is the geometric reason for this?
Norms (L1, L2) and projections
Medium
A.The L1 norm constraint is a hypersphere, which is smooth.
B.The L1 norm constraint is a diamond or cross-polytope, whose sharp corners align with the axes, making intersections on an axis (where a coefficient is zero) more probable.
C.The L1 norm constraint is a hypercube, which tends to intersect error function level curves at the axes.
D.The L1 norm is always smaller than the L2 norm, forcing coefficients to zero.
Correct Answer: The L1 norm constraint is a diamond or cross-polytope, whose sharp corners align with the axes, making intersections on an axis (where a coefficient is zero) more probable.
Explanation:
L1 regularization corresponds to a constraint region shaped like a diamond (in 2D) or cross-polytope (in higher dimensions). This shape has sharp corners that lie on the axes. The optimal solution is found where the elliptical level curves of the error function first touch this constraint region. It is geometrically probable that this first contact happens at a corner, where one or more coefficients are zero.
32. A feature vector in a machine learning model is given by x = [5, 12]. What are its Manhattan distance (L1 norm) and Euclidean distance (L2 norm) from the origin?
Norms (L1, L2) and projections
Medium
A.L1 = 17, L2 = 13
B.L1 = 13, L2 = 17
C.L1 = 7, L2 = 169
D.L1 = 17, L2 =
Correct Answer: L1 = 17, L2 = 13
Explanation:
The L1 norm (Manhattan distance) is the sum of the absolute values of the components: |5| + |12| = 17. The L2 norm (Euclidean distance) is the square root of the sum of the squares of the components: √(5² + 12²) = √169 = 13.
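Checking both norms in NumPy, assuming the vector is [5, 12] (the unique non-negative pair consistent with L1 = 17, L2 = 13, and the distractor 169):

```python
import numpy as np

x = np.array([5.0, 12.0])
print(np.linalg.norm(x, ord=1))  # 17.0 (Manhattan)
print(np.linalg.norm(x))         # 13.0 (Euclidean)
```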
33. In the Gram-Schmidt process, vector projection is a key step. What is the projection of vector a onto vector b?
Norms (L1, L2) and projections
Medium
A.
B.
C.
D.
Correct Answer:
Explanation:
The formula for the projection of vector a onto vector b is proj_b(a) = (a · b / ‖b‖²) b. First, calculate the dot product a · b. Next, calculate the squared L2 norm of b, ‖b‖². Finally, compute the projection by scaling b by the ratio of these two quantities.
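The projection formula can be sketched in NumPy; the vectors here are hypothetical, since the originals from the question are not shown:

```python
import numpy as np

def project(a, b):
    """Orthogonal projection of a onto b: (a.b / ||b||^2) * b."""
    return (a @ b) / (b @ b) * b

a = np.array([3.0, 4.0])   # hypothetical example vectors
b = np.array([1.0, 0.0])
p = project(a, b)
print(p)                    # [3. 0.]
print(np.dot(a - p, b))     # 0.0 -> the residual is orthogonal to b
```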
34. Let ŷ be the projection of the true data vector y onto the column space of a feature matrix X. In linear regression, the residual vector is e = y − ŷ. What is the relationship between the residual vector e and the column space of X?
Norms (L1, L2) and projections
Medium
A.e is orthogonal to the column space of X
B.e is parallel to ŷ
C.e is always the zero vector
D.e is in the column space of X
Correct Answer: e is orthogonal to the column space of X
Explanation:
The projection of y onto the column space of X is the vector in that space closest to y. The vector connecting y to its closest point in the subspace, which is the residual e = y − ŷ, must be orthogonal to the subspace itself. Therefore, the residual vector is orthogonal to every vector in the column space of X.
35. In neural networks, the Frobenius norm is often used for weight decay, a form of regularization. Calculate the Frobenius norm of the weight matrix W.
Norms (L1, L2) and projections
Medium
A.8
B.√50
C.4
D.50
Correct Answer: √50
Explanation:
The Frobenius norm of a matrix is the square root of the sum of the squares of all its elements, analogous to the L2 norm for vectors. Here the squared entries sum to 50, so ‖W‖_F = √50 = 5√2 ≈ 7.07.
36. A 2D dataset is transformed by first reflecting it across the y-axis and then rotating it 90 degrees clockwise. What single matrix represents this combined linear transformation?
Linear operators and transformations in ML
Medium
A.
B.
C.
D.
Correct Answer: [[0, 1], [1, 0]]
Explanation:
We can find the matrix by tracking the transformation of the basis vectors e₁ = (1, 0) and e₂ = (0, 1). First, e₁ reflected across the y-axis becomes (−1, 0). Then, rotating 90 degrees clockwise maps (x, y) to (y, −x), resulting in (0, 1). Second, e₂ is unchanged by reflection across the y-axis. Then, rotating 90 degrees clockwise gives (1, 0). The resulting columns are (0, 1) and (1, 0), forming the matrix [[0, 1], [1, 0]].
37. In Principal Component Analysis (PCA), we compute the eigenvectors of the data's covariance matrix. What is the significance of the eigenvalues associated with these eigenvectors?
Linear operators and transformations in ML
Medium
A.They are always equal to 1, indicating a change of basis.
B.They indicate the direction of maximum variance.
C.They measure the amount of variance in the data along the direction of the corresponding eigenvector.
D.They represent the new feature values for the transformed data.
Correct Answer: They measure the amount of variance in the data along the direction of the corresponding eigenvector.
Explanation:
In PCA, the eigenvectors of the covariance matrix represent the principal components (the new orthogonal axes of the data). The corresponding eigenvalue for each eigenvector indicates the amount of variance in the original data that is captured when projected onto that eigenvector. Larger eigenvalues correspond to principal components that explain more variance.
38. For a given linear transformation represented by matrix A, a non-zero vector v is called an eigenvector if Av = λv. What is the effect of the transformation on its eigenvector v?
Linear operators and transformations in ML
Medium
A.It only scales the vector by the factor λ, without changing its direction.
B.It inverts the vector v.
C.It projects the vector onto another space.
D.It rotates the vector by an angle determined by λ.
Correct Answer: It only scales the vector by the factor λ, without changing its direction.
Explanation:
The defining equation for an eigenvector, Av = λv, states that when the matrix A transforms the vector v, the resulting vector is simply a scalar multiple (λ) of the original vector v. This means the transformation only stretches or shrinks the vector along its original direction (and may flip it if λ < 0), but it does not change the direction itself.
39. Singular Value Decomposition (SVD) factorizes a matrix A into A = UΣVᵀ. It is widely used for dimensionality reduction. How is the best rank-k approximation of A (denoted A_k) constructed from its SVD components?
Linear operators and transformations in ML
Medium
A.By averaging the first k singular values.
B.By taking the first k columns of U and the first k rows of Vᵀ only.
C.By keeping only the first k rows of U, Σ, and Vᵀ.
D.By setting all but the k largest singular values in Σ to zero and reconstructing the matrix.
Correct Answer: By setting all but the k largest singular values in Σ to zero and reconstructing the matrix.
Explanation:
The Eckart-Young-Mirsky theorem states that the best rank-k approximation of a matrix is obtained from its SVD. We form a new diagonal matrix Σ_k by keeping the k largest singular values from Σ and setting all others to zero. The approximation is then calculated as A_k = UΣ_kVᵀ. This effectively captures the most significant components of the original matrix.
40. A 2D linear transformation is applied to a dataset, represented by the matrix A. If this transformation is applied to a unit square (area = 1), what will be the area of the resulting parallelogram?
Linear operators and transformations in ML
Medium
A.5
B.4
C.3
D.2
Correct Answer: 3
Explanation:
The absolute value of the determinant of a transformation matrix gives the scaling factor for areas (in 2D). The determinant of matrix A works out to 3, so the transformation scales the area of any shape by a factor of 3. Therefore, a unit square with an area of 1 will be transformed into a parallelogram with an area of 3 × 1 = 3.
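A NumPy sketch of the area computation, using a hypothetical 2×2 matrix with determinant 3 (the matrix from the question is not shown):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])  # hypothetical matrix, det = 2*2 - 1*1 = 3
area_scale = abs(np.linalg.det(A))      # |det A| is the area scaling factor
print(area_scale)                       # 3.0 (up to floating-point error)
```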
41. In a Convolutional Neural Network (CNN) processing a batch of RGB images, a 4th-order tensor T is used. N is batch size, H is height, W is width, and C is channels. What is the interpretation of the slice T[i, :, :, j]?
Vectors, matrices, and tensors in machine learning
Hard
A.A single pixel at position (i, j) across all images in the batch.
B.The j-th color channel of the i-th image in the batch.
C.The entire batch of images for a single color channel j.
D.The i-th row of pixels across all images and all channels.
Correct Answer: The j-th color channel of the i-th image in the batch.
Explanation:
The tensor is indexed as (batch, height, width, channel). Fixing the batch index to i selects the i-th image. Fixing the channel index to j selects the j-th channel (e.g., Red, Green, or Blue). The colons : act as wildcards, selecting all elements along the height and width dimensions. This results in a 2D matrix representing a single color plane of a specific image.
42. Consider the outer product of two non-zero vectors u ∈ ℝᵐ and v ∈ ℝⁿ, resulting in a matrix A = uvᵀ. What is the rank of matrix A, and what does this imply about its column space, C(A)?
Vectors, matrices, and tensors in machine learning
Hard
A.Rank is 1. The column space is the line spanned by the vector u.
B.Rank is min(m, n). The column space is a subspace of ℝᵐ.
C.Rank is 1. The column space is the line spanned by the vector v.
D.Rank can be 0 or 1. If 1, the column space is spanned by u and v.
Correct Answer: Rank is 1. The column space is the line spanned by the vector u.
Explanation:
Every column of the matrix A = uvᵀ is a scalar multiple of the vector u. Specifically, the j-th column of A is vⱼu. Since all columns are multiples of a single non-zero vector u, the column space is one-dimensional and is spanned by u. Therefore, the rank of A is 1.
43. Let V₁ and V₂ be two distinct 2-dimensional subspaces of ℝ³. What are the possible dimensions of their intersection, V₁ ∩ V₂?
Vector spaces and subspaces
Hard
A.0 or 1
B.Exactly 2
C.Exactly 1
D.1 or 2
Correct Answer: Exactly 1
Explanation:
In ℝ³, a 2D subspace is a plane passing through the origin. Since V₁ and V₂ are distinct planes through the origin, their intersection must be a line passing through the origin, which is a 1-dimensional subspace. Using the dimension theorem, dim(V₁ + V₂) = dim(V₁) + dim(V₂) − dim(V₁ ∩ V₂) = 4 − dim(V₁ ∩ V₂). Since V₁ + V₂ is a subspace of ℝ³, dim(V₁ + V₂) ≤ 3. So, 4 − dim(V₁ ∩ V₂) ≤ 3, which implies dim(V₁ ∩ V₂) ≥ 1. Since V₁ ≠ V₂, dim(V₁ ∩ V₂) cannot be 2. Therefore, the dimension must be exactly 1.
44. A linear transformation is represented by a matrix A ∈ ℝᵐˣⁿ with SVD A = UΣVᵀ. What is the geometric interpretation of applying this transformation to the set of all unit vectors (a unit sphere) in ℝⁿ?
Linear operators and transformations in ML
Hard
A.The result is a hyperellipse in ℝᵐ whose principal axes are the columns of U scaled by the singular values in Σ.
B.The result is a rotated version of the unit sphere, defined by the rotation matrix Vᵀ.
C.The result is a unit sphere in ℝᵐ.
D.The result is a hyperellipse in ℝⁿ whose principal axes are the columns of V scaled by the singular values in Σ.
Correct Answer: The result is a hyperellipse in ℝᵐ whose principal axes are the columns of U scaled by the singular values in Σ.
Explanation:
The SVD decomposes the transformation into three steps: 1) A rotation/reflection in the domain space (Vᵀ), which maps the unit sphere to itself. 2) A scaling along the new coordinate axes by the singular values (Σ), which transforms the sphere into a hyperellipse. 3) A rotation/reflection in the codomain space (U), which orients the hyperellipse in ℝᵐ. The final shape is a hyperellipse in ℝᵐ.
45. Given a vector y ∈ ℝⁿ and a subspace spanned by the orthonormal columns of a matrix Q (QᵀQ = I), the projection of y onto this subspace is ŷ = QQᵀy. What is the squared L2 norm of the residual vector, ‖y − ŷ‖²?
Norms (L1, L2) and projections
Hard
A.
B.
C.
D.
Correct Answer: ‖y‖² − ‖Qᵀy‖²
Explanation:
The residual vector y − ŷ is orthogonal to the projection ŷ. By the Pythagorean theorem for vectors, ‖y‖² = ‖ŷ‖² + ‖y − ŷ‖². Therefore, ‖y − ŷ‖² = ‖y‖² − ‖ŷ‖². Let's compute ‖ŷ‖²: ‖QQᵀy‖² = yᵀQQᵀQQᵀy. Since the columns of Q are orthonormal, QᵀQ = I (the identity matrix). So, ‖ŷ‖² = yᵀQQᵀy = ‖Qᵀy‖². Substituting back, we get ‖y − ŷ‖² = ‖y‖² − ‖Qᵀy‖².
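This identity can be verified numerically with a random matrix whose columns are made orthonormal via QR (all values here are randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 2)))  # 5x2 with orthonormal columns
y = rng.standard_normal(5)

y_hat = Q @ (Q.T @ y)                            # projection onto col space of Q
lhs = np.sum((y - y_hat) ** 2)                   # ||y - y_hat||^2
rhs = np.sum(y ** 2) - np.sum((Q.T @ y) ** 2)    # ||y||^2 - ||Q^T y||^2
print(np.isclose(lhs, rhs))  # True
```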
46. The column space of a matrix A, denoted C(A), and the null space of its transpose, N(Aᵀ), are fundamental subspaces. What is the relationship between these two subspaces in ℝᵐ?
Vector spaces and subspaces
Hard
A.Their intersection is the zero vector, but they are not necessarily orthogonal complements.
B.N(Aᵀ) is the orthogonal complement of C(A).
C.C(A) and N(Aᵀ) are the same subspace.
D.C(A) is a subset of N(Aᵀ).
Correct Answer: N(Aᵀ) is the orthogonal complement of C(A).
Explanation:
This is a key part of the Fundamental Theorem of Linear Algebra. The null space of Aᵀ, also called the left null space of A, contains all vectors x such that Aᵀx = 0. This condition means x is orthogonal to every row of Aᵀ, which are the columns of A. Therefore, any vector in N(Aᵀ) is orthogonal to every vector in the column space C(A). Together, they span the entire space ℝᵐ, making them orthogonal complements.
47. In Support Vector Machines (SVMs), the kernel function allows computations in a high-dimensional feature space. A condition for a function k to be a valid kernel is that the Gram matrix K (where Kᵢⱼ = k(xᵢ, xⱼ)) must be positive semi-definite for any set of inputs. Why is this property crucial?
Linear operators and transformations in ML
Hard
A.It ensures that the mapping is linear.
B.It guarantees that the decision boundary will be a hyperplane.
C.It ensures the SVM dual optimization problem is convex, guaranteeing a unique global minimum.
D.It is required for the matrix to be invertible.
Correct Answer: It ensures the SVM dual optimization problem is convex, guaranteeing a unique global minimum.
Explanation:
The dual formulation of the SVM optimization problem involves minimizing a quadratic form αᵀPα, where the matrix P is constructed from the Gram matrix K. For this quadratic program to be convex, the matrix P must be positive semi-definite. The structure of P is such that it is positive semi-definite if and only if the kernel matrix K is. Convexity is essential because it ensures that any local minimum found by an optimization algorithm is also the global minimum.
48. The optimization problem for Lasso regression is min_w ‖y − Xw‖₂² + λ‖w‖₁. If the matrix X has orthonormal columns (i.e., XᵀX = I), what is the closed-form solution for the j-th component of the optimal weight vector, w*ⱼ?
Norms (L1, L2) and projections
Hard
A.
B.
C.
D.
Correct Answer: w*ⱼ = sign((Xᵀy)ⱼ) · max(|(Xᵀy)ⱼ| − λ/2, 0)
Explanation:
When X has orthonormal columns, XᵀX = I. The quadratic term simplifies: ‖y − Xw‖₂² = ‖y‖₂² − 2wᵀXᵀy + ‖w‖₂². The objective becomes minimizing, up to a constant, the sum over j of wⱼ² − 2wⱼ(Xᵀy)ⱼ + λ|wⱼ|. This decouples for each component wⱼ. The minimum for each wⱼ is found by the soft-thresholding operator, w*ⱼ = sign((Xᵀy)ⱼ) · max(|(Xᵀy)ⱼ| − λ/2, 0), applied to each component of Xᵀy.
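A minimal sketch of the soft-thresholding operator (the threshold λ/2 assumes the unscaled objective above; other texts scale the squared term by 1/2 and threshold at λ):

```python
import numpy as np

def soft_threshold(a, t):
    """Soft-thresholding: sign(a) * max(|a| - t, 0), applied element-wise."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

# Hypothetical correlations X^T y; small entries are zeroed out (sparsity).
a = np.array([3.0, -0.2, 0.5])
w = soft_threshold(a, 1.0)
print(w)
```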
49. A weight matrix W in a neural network is updated via a rank-one update: W_new = W + ηuvᵀ, where u, v are vectors and η is a scalar. If W is invertible, under what condition is W_new guaranteed to be invertible according to the Sherman-Morrison formula?
Vectors, matrices, and tensors in machine learning
Hard
A.1 + ηvᵀW⁻¹u ≠ 0
B. and must be linearly independent
C. must be symmetric positive definite
D. and are non-zero vectors
Correct Answer: 1 + ηvᵀW⁻¹u ≠ 0
Explanation:
The Sherman-Morrison formula provides an expression for the inverse of a rank-one update of a matrix: (A + uvᵀ)⁻¹ = A⁻¹ − (A⁻¹uvᵀA⁻¹) / (1 + vᵀA⁻¹u). In our case, A = W and the update is ηuvᵀ = (ηu)vᵀ. The formula becomes (W + (ηu)vᵀ)⁻¹ = W⁻¹ − (ηW⁻¹uvᵀW⁻¹) / (1 + ηvᵀW⁻¹u). The inverse exists if and only if the denominator is non-zero, which is the condition 1 + ηvᵀW⁻¹u ≠ 0.
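The formula can be verified numerically against a direct inverse (all matrices and vectors below are randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
W = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # invertible base matrix
u = rng.standard_normal(n)
v = rng.standard_normal(n)
eta = 0.5

W_inv = np.linalg.inv(W)
denom = 1.0 + eta * (v @ W_inv @ u)  # must be non-zero for invertibility
# Sherman-Morrison inverse of W + eta * u v^T:
sm_inv = W_inv - eta * np.outer(W_inv @ u, v @ W_inv) / denom
direct = np.linalg.inv(W + eta * np.outer(u, v))
print(np.allclose(sm_inv, direct))  # True
```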
50. A 2D rotation by an angle θ is a linear transformation represented by the matrix R(θ) = [[cos θ, −sin θ], [sin θ, cos θ]]. What are the eigenvalues of this matrix for a general θ?
Linear operators and transformations in ML
Hard
A. and
B. and
C. and
D. and
Correct Answer: e^(iθ) and e^(−iθ) (i.e., cos θ ± i sin θ)
Explanation:
The eigenvalues are the roots of the characteristic equation det(R − λI) = 0. This gives (cos θ − λ)² + sin²θ = 0, which simplifies to λ² − 2λ cos θ + cos²θ + sin²θ = 0, or λ² − 2λ cos θ + 1 = 0. Using the quadratic formula, λ = cos θ ± i sin θ. By Euler's formula, these are e^(iθ) and e^(−iθ).
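A numerical check with an arbitrary angle:

```python
import numpy as np

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
eigvals = np.linalg.eigvals(R)
expected = [np.exp(-1j * theta), np.exp(1j * theta)]  # e^{-i theta}, e^{i theta}
got = sorted(eigvals, key=lambda z: z.imag)
print(np.allclose(got, expected))  # True
```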
51. Consider the set S = {(x, y) ∈ ℝ² : xy = 0}. Is S a subspace of ℝ², and why?
Vector spaces and subspaces
Hard
A.No, because it is not closed under scalar multiplication.
B.No, because it does not contain the zero vector.
C.Yes, because it contains the zero vector and is closed under addition and scalar multiplication.
D.No, because it is not closed under vector addition.
Correct Answer: No, because it is not closed under vector addition.
Explanation:
To be a subspace, S must satisfy three axioms: contain the zero vector, be closed under addition, and be closed under scalar multiplication. S contains (0, 0) since 0 · 0 = 0. It is also closed under scalar multiplication. However, it is not closed under addition. For example, let u = (1, 0) and v = (0, 1). Both are in S since 1 · 0 = 0 and 0 · 1 = 0. Their sum is u + v = (1, 1). For this to be in S, we need 1 · 1 = 0, but 1 · 1 = 1 ≠ 0. Thus, S is not closed under addition and is not a subspace.
52. A projection matrix P projects vectors onto a subspace S. Which of the following statements about P is necessarily FALSE?
Norms (L1, L2) and projections
Hard
A.If it's an orthogonal projection, P is symmetric (P = Pᵀ)
B.P is idempotent (P² = P)
C.P is invertible (unless P = I)
D.The eigenvalues of P are only 0 or 1
Correct Answer: is invertible (unless )
Explanation:
A projection matrix maps a vector space onto a proper subspace (a subspace of lower dimension), unless it is the identity matrix $I$. This means the transformation collapses some dimensions, mapping non-zero vectors (from the orthogonal complement of the subspace) to the zero vector. Therefore, the null space of $P$ is non-trivial, which means its determinant is zero and it is not invertible. The only exception is when the subspace is the entire space, in which case $P = I$, which is invertible. All other properties are true for projection matrices.
Incorrect! Try again.
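All four properties can be checked on a concrete orthogonal projector $P = A(A^\top A)^{-1}A^\top$ onto the column space of a random matrix $A$ (the dimensions and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 2))              # basis for a 2D subspace of R^5
P = A @ np.linalg.inv(A.T @ A) @ A.T         # orthogonal projector onto col(A)

print(np.allclose(P, P.T))                   # True: symmetric
print(np.allclose(P @ P, P))                 # True: idempotent
print(np.isclose(np.linalg.det(P), 0.0))     # True: singular, hence NOT invertible
eigs = np.sort(np.linalg.eigvalsh(P))
print(np.allclose(eigs, [0, 0, 0, 1, 1]))    # True: eigenvalues are only 0 or 1
```

The eigenvalue pattern makes the rank visible: the number of unit eigenvalues equals the dimension of the subspace being projected onto.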
53The covariance matrix $\Sigma$ of a dataset is real and symmetric. It can be diagonalized by an orthogonal matrix $Q$ as $\Sigma = Q \Lambda Q^\top$, where $\Lambda$ is a diagonal matrix of eigenvalues. In Principal Component Analysis (PCA), what does this transformation represent?
Linear operators and transformations in ML
Hard
A.A linear regression fit to the data.
B.A projection of the data onto a random lower-dimensional subspace.
C.A normalization of the data so that each feature has zero mean and unit variance.
D.A change of basis to a new coordinate system where the axes are the principal components and the data is uncorrelated.
Correct Answer: A change of basis to a new coordinate system where the axes are the principal components and the data is uncorrelated.
Explanation:
The columns of the orthogonal matrix $Q$ are the eigenvectors of the covariance matrix $\Sigma$. These eigenvectors are the principal components, which form a new orthonormal basis for the data. The diagonalization $\Sigma = Q \Lambda Q^\top$ (or equivalently $\Lambda = Q^\top \Sigma Q$) shows that in this new basis, the covariance matrix is diagonal. A diagonal covariance matrix means that the new features (the projections of the data onto the principal components) are mutually uncorrelated.
Incorrect! Try again.
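The decorrelation effect can be demonstrated directly: rotate correlated data into the eigenvector basis and the sample covariance becomes diagonal. A sketch (the synthetic data-generating recipe and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
# Correlated 2D data: the second feature depends on the first
x = rng.standard_normal(1000)
X = np.column_stack([x, 0.8 * x + 0.3 * rng.standard_normal(1000)])
X -= X.mean(axis=0)

Sigma = np.cov(X, rowvar=False)
eigvals, Q = np.linalg.eigh(Sigma)           # columns of Q are the principal components

Z = X @ Q                                    # change of basis: coordinates in the PC basis
Sigma_new = np.cov(Z, rowvar=False)

# Off-diagonal entries vanish: the new features are uncorrelated
print(np.allclose(Sigma_new, np.diag(np.diag(Sigma_new)), atol=1e-10))  # True
```

Because the sample covariance is bilinear, $\operatorname{cov}(XQ) = Q^\top \Sigma Q = \Lambda$ holds exactly for the same data that produced $\Sigma$.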
54Two vectors $u, v \in \mathbb{R}^n$ (for $n > 1$) are constructed such that their L2 norms are equal, $\|u\|_2 = \|v\|_2 = 1$. However, their L1 norms are at the theoretical extremes for a unit L2 vector: $\|u\|_1 = 1$ and $\|v\|_1 = \sqrt{n}$. What is the structure of these vectors?
Norms (L1, L2) and projections
Hard
A.This scenario is impossible as the L1 norm cannot exceed the L2 norm.
B.$u$ has entries of equal magnitude, and $v$ is a standard basis vector.
C.$u$ is a standard basis vector (e.g., $(1, 0, \dots, 0)$), and $v$ has entries of equal magnitude (e.g., $(1/\sqrt{n}, \dots, 1/\sqrt{n})$).
D.Both $u$ and $v$ are standard basis vectors.
Correct Answer: $u$ is a standard basis vector (e.g., $(1, 0, \dots, 0)$), and $v$ has entries of equal magnitude (e.g., $(1/\sqrt{n}, \dots, 1/\sqrt{n})$).
Explanation:
For any vector $x \in \mathbb{R}^n$, we have the inequality $\|x\|_2 \le \|x\|_1 \le \sqrt{n}\,\|x\|_2$. For a unit L2 vector ($\|x\|_2 = 1$), this becomes $1 \le \|x\|_1 \le \sqrt{n}$. The lower bound is achieved when all the vector's mass is concentrated in a single entry, which is a standard basis vector (e.g., for $n = 2$, $u = (1, 0)$ gives $\|u\|_1 = 1$). The upper bound is achieved when the mass is spread out as evenly as possible, which occurs when all entries have equal magnitude (e.g., for $n = 2$, $v = (1/\sqrt{2}, 1/\sqrt{2})$ gives $\|v\|_1 = \sqrt{2}$).
Incorrect! Try again.
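Both extremes are easy to exhibit numerically; here with $n = 16$ (an arbitrary choice) so the upper bound is exactly $\sqrt{16} = 4$:

```python
import numpy as np

n = 16
u = np.zeros(n); u[0] = 1.0                  # standard basis vector
v = np.full(n, 1.0 / np.sqrt(n))             # all entries of equal magnitude

# Both are unit vectors in the L2 norm ...
print(np.isclose(np.linalg.norm(u, 2), 1.0), np.isclose(np.linalg.norm(v, 2), 1.0))

# ... but their L1 norms sit at the two extremes 1 and sqrt(n)
print(np.isclose(np.linalg.norm(u, 1), 1.0))            # True
print(np.isclose(np.linalg.norm(v, 1), np.sqrt(n)))     # True: sqrt(16) = 4
```

This L1-vs-L2 gap is exactly what sparsity-promoting methods exploit: for a fixed L2 budget, concentrated vectors have the smallest L1 norm.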
55In a linear regression model $\hat{y} = Xw$, where $X \in \mathbb{R}^{n \times d}$ is the design matrix and $w \in \mathbb{R}^d$ are the weights, the vector of predicted values $\hat{y}$ must lie in a specific subspace of $\mathbb{R}^n$. What is this subspace, and what is the geometric interpretation of the ordinary least squares (OLS) solution?
Vector spaces and subspaces
Hard
A.The null space of $X$. OLS finds the component of $y$ that is orthogonal to this subspace.
B.The entire space $\mathbb{R}^n$. OLS is only applicable if $y$ is already in the column space of $X$.
C.The row space of $X$. OLS finds the orthogonal projection of $y$ onto the row space.
D.The column space of $X$. OLS finds the orthogonal projection of the true target vector $y$ onto this subspace.
Correct Answer: The column space of $X$. OLS finds the orthogonal projection of the true target vector $y$ onto this subspace.
Explanation:
By definition, the predicted vector $\hat{y} = Xw$ is a linear combination of the columns of $X$, with the weights $w$ as the coefficients. The set of all possible linear combinations of the columns of $X$ is its column space, $\mathrm{Col}(X)$. The goal of ordinary least squares is to minimize the squared L2 distance between the true vector $y$ and the prediction $\hat{y}$, i.e., $\min_w \|y - Xw\|_2^2$. The vector in $\mathrm{Col}(X)$ that is closest to $y$ is the orthogonal projection of $y$ onto $\mathrm{Col}(X)$.
Incorrect! Try again.
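The projection interpretation has a testable signature: at the OLS solution the residual $y - \hat{y}$ is orthogonal to every column of $X$ (the normal equations $X^\top(y - Xw) = 0$). A sketch with random data (sizes and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 50, 3
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w, *_ = np.linalg.lstsq(X, y, rcond=None)    # OLS solution
y_hat = X @ w                                # prediction lies in Col(X) by construction
residual = y - y_hat

# Orthogonal projection: the residual is perpendicular to every column of X
print(np.allclose(X.T @ residual, 0.0, atol=1e-10))  # True
```

Equivalently, $\hat{y} = X(X^\top X)^{-1}X^\top y$ applies exactly the projection matrix from question 52.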
56From a geometric perspective, why does L1 regularization (Lasso) tend to produce sparse solutions (i.e., many zero weights), whereas L2 regularization (Ridge) does not?
Norms (L1, L2) and projections
Hard
A.The L1 norm is non-differentiable everywhere, which causes optimization algorithms to set weights to zero.
B.The L2 norm ball is convex, while the L1 norm ball is not, forcing solutions to be on an axis.
C.The L1 norm penalizes large weights more heavily than the L2 norm, forcing them to become exactly zero.
D.The L1 norm ball is a hyperdiamond with sharp corners that are more likely to intersect the elliptical contours of the loss function, while the L2 norm ball is a smooth hypersphere.
Correct Answer: The L1 norm ball is a hyperdiamond with sharp corners that are more likely to intersect the elliptical contours of the loss function, while the L2 norm ball is a smooth hypersphere.
Explanation:
The regularization problem can be viewed as finding the point where the level sets (contours) of the loss function first touch the constraint region defined by the norm ball ($\|w\|_p \le t$). The L1 norm ball ($\|w\|_1 \le t$) has sharp corners/vertices on the axes. The elliptical contours of the sum-of-squares error are likely to make first contact with this shape at one of its corners, where one or more weight coordinates are zero. In contrast, the L2 norm ball ($\|w\|_2 \le t$) is a smooth sphere, and the first point of contact will typically not be on an axis, resulting in small but non-zero weights.
Incorrect! Try again.
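The same geometry shows up algebraically in the one-dimensional proximal operators of the two penalties: the L1 step (soft-thresholding) maps small weights to exactly zero, while the L2 step only shrinks them. A minimal sketch (the weight values and penalty strength `lam` are arbitrary choices):

```python
import numpy as np

def prox_l1(w, lam):
    """Soft-thresholding: the proximal operator of lam * ||w||_1."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def prox_l2(w, lam):
    """Proximal operator of (lam/2) * ||w||_2^2: pure shrinkage."""
    return w / (1.0 + lam)

w = np.array([0.05, -0.3, 1.2, -0.01])
lam = 0.1

print(prox_l1(w, lam))  # small entries become exactly 0: [ 0.  -0.2  1.1  0. ]
print(prox_l2(w, lam))  # every entry merely shrinks; none becomes exactly zero
```

Soft-thresholding is the per-coordinate update used by coordinate-descent Lasso solvers, which is why Lasso weights land exactly on zero rather than merely near it.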
57A linear transformation $T: \mathbb{R}^n \to \mathbb{R}^m$ is represented by a matrix $A \in \mathbb{R}^{m \times n}$ which is not full rank. Let $r = \mathrm{rank}(A)$. Which statement accurately describes the geometry of this transformation?
Linear operators and transformations in ML
Hard
A.The transformation is invertible.
B.The transformation collapses the entire input space $\mathbb{R}^n$ into an $r$-dimensional subspace of the output space $\mathbb{R}^m$.
C.The transformation maps $\mathbb{R}^n$ to the entire output space $\mathbb{R}^m$.
D.The null space of the transformation is the zero vector only.
Correct Answer: The transformation collapses the entire input space $\mathbb{R}^n$ into an $r$-dimensional subspace of the output space $\mathbb{R}^m$.
Explanation:
The rank $r$ of the matrix is the dimension of its column space (or range). The column space is the set of all possible output vectors $Ax$. Therefore, the entire input space $\mathbb{R}^n$ is mapped into this $r$-dimensional subspace of $\mathbb{R}^m$. Since $A$ is not full rank, $r < \min(m, n)$. Because $r < n$, the null space is non-trivial, meaning a whole subspace of input vectors is mapped to zero. Because $r < m$, the range does not span the entire codomain. In all cases, a dimensionality reduction occurs.
Incorrect! Try again.
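A rank-deficient map is easy to construct as a product of two thin factors; the sketch below (sizes and seed are arbitrary choices) confirms that every output lands in an $r$-dimensional subspace and that the null space has dimension $n - r$ (rank-nullity):

```python
import numpy as np

rng = np.random.default_rng(4)
# Build a rank-2 matrix mapping R^4 -> R^3 as a product of thin factors
B = rng.standard_normal((3, 2))
C = rng.standard_normal((2, 4))
A = B @ C

r = int(np.linalg.matrix_rank(A))
print(r)                                      # 2: not full rank, since min(3, 4) = 3

X = rng.standard_normal((4, 100))             # 100 random inputs
outputs = A @ X
print(int(np.linalg.matrix_rank(outputs)))    # 2: outputs span only a 2D subspace of R^3

print(A.shape[1] - r)                         # 2: dimension of the null space
```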
58The trace of a square matrix, $\mathrm{Tr}(A) = \sum_i A_{ii}$, is the sum of its diagonal elements. It is also equal to the sum of its eigenvalues. Which of the following trace properties is NOT always true for general matrices $A, B, C$?
Vectors, matrices, and tensors in machine learning
Hard
A.$\mathrm{Tr}(cA) = c\,\mathrm{Tr}(A)$ (for A square, c scalar)
B.$\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$
C.$\mathrm{Tr}(ABC) = \mathrm{Tr}(BAC)$
D.$\mathrm{Tr}(A + B) = \mathrm{Tr}(A) + \mathrm{Tr}(B)$ (for A, B square)
Correct Answer: $\mathrm{Tr}(ABC) = \mathrm{Tr}(BAC)$
Explanation:
The trace operator has the cyclic property: $\mathrm{Tr}(ABC) = \mathrm{Tr}(CAB) = \mathrm{Tr}(BCA)$. You may cyclically permute the factors inside the trace, but you may not swap an arbitrary pair. $\mathrm{Tr}(BAC)$ is a non-cyclic permutation of $\mathrm{Tr}(ABC)$, and the two are not equal in general. For a counterexample, take $A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, $B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$, $C = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$: then $\mathrm{Tr}(ABC) = 1$ but $\mathrm{Tr}(BAC) = 0$. The other properties (scaling, additivity, and $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$) hold whenever the sums and products are defined.
Incorrect! Try again.
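The counterexample from the explanation, checked in NumPy:

```python
import numpy as np

A = np.array([[0, 1], [0, 0]])
B = np.array([[0, 0], [1, 0]])
C = np.array([[1, 0], [0, 0]])

# Cyclic permutations all agree ...
print(np.trace(A @ B @ C), np.trace(C @ A @ B), np.trace(B @ C @ A))  # 1 1 1

# ... but the non-cyclic swap of A and B does not
print(np.trace(B @ A @ C))  # 0
```

The cyclic property is used constantly in ML derivations, e.g. rewriting $x^\top A x = \mathrm{Tr}(A x x^\top)$ when taking expectations of quadratic forms.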
59In Principal Component Analysis (PCA), the data is projected onto a subspace spanned by the eigenvectors corresponding to the largest eigenvalues of the covariance matrix. If the covariance matrix of a 3D dataset has eigenvalues $\lambda_1 > \lambda_2 > \lambda_3$ with corresponding eigenvectors $v_1, v_2, v_3$, what is the geometric interpretation of the subspace $\mathrm{span}\{v_1, v_2\}$?
Vector spaces and subspaces
Hard
A.It is a 2D plane that minimizes the reconstruction error when measured with the L1 norm.
B.It is the 2D plane passing through the origin that captures the maximum variance in the data.
C.It is an arbitrary 2D subspace; any pair of eigenvectors could be chosen.
D.It is the line that captures the minimum variance in the data.
Correct Answer: It is the 2D plane passing through the origin that captures the maximum variance in the data.
Explanation:
The eigenvalues of the covariance matrix represent the amount of variance in the data along the direction of the corresponding eigenvectors (the principal components). By selecting the eigenvectors associated with the largest eigenvalues, we are choosing the directions of maximum variance. The subspace spanned by these eigenvectors is the lower-dimensional hyperplane that best approximates the data, in the sense that projecting the data onto it preserves the most variance.
Incorrect! Try again.
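A numerical illustration: for 3D data with one nearly-flat direction, projecting onto the top two eigenvectors retains almost all of the variance. The synthetic per-axis scales and the seed below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(5)
# 3D data with very different variances along the three coordinate directions
X = rng.standard_normal((2000, 3)) * np.array([3.0, 1.5, 0.2])
X -= X.mean(axis=0)

Sigma = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(Sigma)     # eigenvalues in ascending order
top2 = eigvecs[:, -2:]                       # v1, v2: directions of largest variance

Z = X @ top2                                 # project onto span{v1, v2}
captured = np.trace(np.cov(Z, rowvar=False)) / np.trace(Sigma)
print(captured)                              # close to 1: most variance is preserved
```

The fraction printed is $(\lambda_1 + \lambda_2)/(\lambda_1 + \lambda_2 + \lambda_3)$, the standard "explained variance ratio" reported by PCA implementations.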
60A single-channel (grayscale) image is represented by a matrix $I \in \mathbb{R}^{H \times W}$. A transformation is applied such that the new value of each pixel is a weighted average of itself and its four cardinal neighbors: $I'_{i,j} = w_c I_{i,j} + w_n I_{i-1,j} + w_s I_{i+1,j} + w_w I_{i,j-1} + w_e I_{i,j+1}$. This operation is a convolution. How can this entire transformation be expressed using matrix multiplication on a vectorized version of the image, $x = \mathrm{vec}(I) \in \mathbb{R}^{HW}$?
Vectors, matrices, and tensors in machine learning
Hard
A.As an element-wise product with a weight matrix, $I' = W \odot I$.
B.This operation cannot be represented as a single matrix multiplication.
C.As a standard matrix multiplication, $I' = KI$, for some small matrix $K$.
D.As a multiplication by a large, sparse Toeplitz matrix, $x' = Mx$.
Correct Answer: As a multiplication by a large, sparse Toeplitz matrix, $x' = Mx$.
Explanation:
Any linear transformation can be represented by a matrix multiplication. A convolution is a linear transformation. When the input image is flattened (vectorized) into a single column vector of size $HW$, the 2D convolution operation can be represented as a multiplication with a large transformation matrix $M$ of size $HW \times HW$. Due to the local and repeated nature of the convolution kernel, this matrix has a special structure: it is very sparse (mostly zeros) and exhibits a block Toeplitz structure, where diagonals are constant.
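To make this concrete, the sketch below builds $M$ explicitly for a tiny image and checks it against the direct stencil computation. The specific weights and the zero padding at the borders are assumptions for illustration:

```python
import numpy as np

H, W = 4, 5
# Hypothetical 5-point stencil weights (center, north, south, west, east)
w_c, w_n, w_s, w_w, w_e = 0.4, 0.15, 0.15, 0.15, 0.15

def idx(i, j):
    return i * W + j                          # row-major vectorization

# Assemble the HW x HW convolution matrix, one row per output pixel
M = np.zeros((H * W, H * W))
for i in range(H):
    for j in range(W):
        M[idx(i, j), idx(i, j)] = w_c
        for di, dj, w in [(-1, 0, w_n), (1, 0, w_s), (0, -1, w_w), (0, 1, w_e)]:
            ni, nj = i + di, j + dj
            if 0 <= ni < H and 0 <= nj < W:   # out-of-bounds neighbors contribute 0
                M[idx(i, j), idx(ni, nj)] = w

rng = np.random.default_rng(6)
I = rng.standard_normal((H, W))

# Direct stencil computation with zero padding at the image borders
P = np.pad(I, 1)
direct = (w_c * I + w_n * P[:-2, 1:-1] + w_s * P[2:, 1:-1]
          + w_w * P[1:-1, :-2] + w_e * P[1:-1, 2:])

x_new = M @ I.reshape(-1)                     # same result via the big matrix
print(np.allclose(x_new, direct.reshape(-1)))  # True
print(np.mean(M == 0))                        # mostly zeros: M is sparse
```

For a realistic image, $M$ is never formed explicitly; the sparse block-Toeplitz structure is exactly what makes direct convolution far cheaper than a dense $HW \times HW$ multiply.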