1. What is the fundamental equation relating a matrix A, an eigenvector v, and its corresponding eigenvalue λ?
Eigen decomposition and its limitations in ML
Easy
A. Av = λv
B. Av = v + λ
C. A^-1 v = λv
D. Av = λ^2 v
Correct Answer: Av = λv
Explanation:
An eigenvector of a matrix A is a non-zero vector v that, when multiplied by A, results in a scaled version of itself: Av = λv. The scaling factor λ is the eigenvalue.
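The defining equation can be checked numerically. A minimal sketch with NumPy (the matrix values here are illustrative, not taken from the quiz):

```python
import numpy as np

# A small symmetric matrix (illustrative values).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns eigenvalues w and eigenvectors as the COLUMNS of V.
w, V = np.linalg.eig(A)

# Verify the defining equation A v = lambda v for each eigenpair.
for i in range(len(w)):
    assert np.allclose(A @ V[:, i], w[i] * V[:, i])
```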
2. A major limitation of eigen decomposition is that it can only be applied to what kind of matrices?
Eigen decomposition and its limitations in ML
Easy
A. Zero matrices
B. Rectangular matrices
C. Square matrices
D. Identity matrices
Correct Answer: Square matrices
Explanation:
Eigen decomposition is defined only for square (n × n) matrices, which limits its direct application in many machine learning scenarios where data matrices are often rectangular.
3. In the eigen decomposition of a matrix A as A = PDP^-1, what does the diagonal matrix D contain?
Eigen decomposition and its limitations in ML
Easy
A. The eigenvalues of A
B. The eigenvectors of A
C. The singular values of A
D. The inverse of A
Correct Answer: The eigenvalues of A
Explanation:
The eigen decomposition of a matrix A factors it into A = PDP^-1, where P is the matrix whose columns are the eigenvectors of A and D is a diagonal matrix containing the corresponding eigenvalues.
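The factorization can be verified directly. A sketch, assuming a small diagonalizable matrix with distinct eigenvalues (illustrative values):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

w, P = np.linalg.eig(A)   # eigenvalues in w, eigenvectors in the columns of P
D = np.diag(w)            # D: diagonal matrix of eigenvalues

# Reconstruct A from its eigen decomposition A = P D P^-1.
A_rec = P @ D @ np.linalg.inv(P)
assert np.allclose(A, A_rec)
```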
4. What do the eigenvectors of a covariance matrix represent in the context of data?
Eigen decomposition and its limitations in ML
Easy
A. The mean of the data
B. The number of data points
C. The directions of maximum variance in the data
D. The median of the data
Correct Answer: The directions of maximum variance in the data
Explanation:
The eigenvectors of a covariance matrix point in the directions of the greatest variance (spread) of the data. The corresponding eigenvalues indicate the magnitude of this variance. This is the core idea behind PCA.
5. Singular Value Decomposition (SVD) can be applied to which type of matrices?
Singular value decomposition (SVD)
Easy
A. Only diagonal matrices
B. Only square matrices
C. Only symmetric matrices
D. Any rectangular matrix
Correct Answer: Any rectangular matrix
Explanation:
Unlike eigen decomposition, SVD is a general factorization that can be applied to any rectangular matrix, making it more widely applicable in machine learning.
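A quick sketch of this generality: NumPy factors a rectangular matrix that has no eigen decomposition at all.

```python
import numpy as np

# A 4 x 3 rectangular matrix: eigen decomposition is undefined, SVD is not.
A = np.arange(12, dtype=float).reshape(4, 3)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

assert U.shape == (4, 3) and s.shape == (3,) and Vt.shape == (3, 3)
# Singular values are non-negative and sorted in descending order.
assert np.all(s >= 0) and np.all(np.diff(s) <= 0)
# The factorization reconstructs A exactly: A = U diag(s) V^T.
assert np.allclose(A, U @ np.diag(s) @ Vt)
```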
6. In the SVD of a matrix A = UΣV^T, what does the matrix Σ contain?
Singular value decomposition (SVD)
Easy
A. Eigenvectors
B. Class labels
C. Singular values
D. Eigenvalues
Correct Answer: Singular values
Explanation:
The SVD of a matrix A is given by A = UΣV^T. The matrix Σ is a diagonal matrix containing the singular values of A, which are always non-negative and are ordered from largest to smallest.
7. What property do the matrices U and V have in the SVD of a matrix A = UΣV^T?
Singular value decomposition (SVD)
Easy
A. They are zero matrices.
B. They are diagonal matrices.
C. They are inverse matrices of each other.
D. They are orthogonal matrices.
Correct Answer: They are orthogonal matrices.
Explanation:
In SVD, both U and V are orthogonal matrices. This means their columns are orthonormal vectors, and their transpose is equal to their inverse (U^-1 = U^T and V^-1 = V^T).
8. What do larger singular values in SVD generally represent?
Singular value decomposition (SVD)
Easy
A. Less important information in the matrix
B. The dimensions of the matrix
C. More important information or structure in the matrix
D. Noise in the data
Correct Answer: More important information or structure in the matrix
Explanation:
The magnitude of the singular values indicates their importance. Larger singular values correspond to directions that capture more of the variance or "energy" of the data in the matrix.
9. What is the primary goal of Principal Component Analysis (PCA)?
Principal component analysis (PCA) from a geometric and optimization perspective
Easy
A. To predict a continuous target variable
B. To reduce the dimensionality of the data while preserving the most variance
C. To classify data points into different groups
D. To find the mean of the dataset
Correct Answer: To reduce the dimensionality of the data while preserving the most variance
Explanation:
PCA is a dimensionality reduction technique used to transform a high-dimensional dataset into a lower-dimensional one by finding new uncorrelated variables (principal components) that capture the maximum variance.
10. Geometrically, what does PCA find?
Principal component analysis (PCA) from a geometric and optimization perspective
Easy
A. The outliers in the dataset
B. The shortest path between data points
C. The clusters present in the data
D. A new coordinate system where axes point in the directions of maximum variance
Correct Answer: A new coordinate system where axes point in the directions of maximum variance
Explanation:
Geometrically, PCA performs a rotation of the original coordinate system to a new one. The axes of this new system, called principal components, are aligned with the directions of maximum variance in the data.
11. The first principal component (PC1) is the direction that...
Principal component analysis (PCA) from a geometric and optimization perspective
Easy
A. Points towards the origin.
B. Is parallel to one of the original axes.
C. Maximizes the variance of the projected data.
D. Minimizes the variance of the projected data.
Correct Answer: Maximizes the variance of the projected data.
Explanation:
From an optimization perspective, the first principal component is defined as the direction (a linear combination of the original features) along which the projected data has the largest possible variance.
12. Principal components are calculated as the eigenvectors of which matrix?
Principal component analysis (PCA) from a geometric and optimization perspective
Easy
A. The original data matrix
B. The covariance matrix of the data
C. The inverse of the data matrix
D. The identity matrix
Correct Answer: The covariance matrix of the data
Explanation:
The principal components are the eigenvectors of the data's covariance matrix (or the correlation matrix, if the data is standardized). The eigenvalues indicate the amount of variance captured by each component.
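This recipe (covariance matrix, then eigenvectors) can be run by hand. A sketch on synthetic correlated data, where the first principal component should land near the diagonal direction [1, 1]/sqrt(2):

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data: the second feature is a noisy copy of the first.
x = rng.normal(size=200)
X = np.column_stack([x, x + 0.1 * rng.normal(size=200)])

Xc = X - X.mean(axis=0)          # mean-center first
C = np.cov(Xc, rowvar=False)     # covariance matrix of the data
w, V = np.linalg.eigh(C)         # eigh: the symmetric-matrix routine

# The eigenvector with the largest eigenvalue is the first principal
# component; here it should point close to [1, 1] / sqrt(2).
pc1 = V[:, np.argmax(w)]
assert abs(abs(pc1 @ np.array([1.0, 1.0])) / np.sqrt(2) - 1.0) < 0.01
```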
13. Are the principal components found by PCA correlated with each other?
Principal component analysis (PCA) from a geometric and optimization perspective
Easy
A. Only the first two components are correlated.
B. It depends on the dataset.
C. No, they are uncorrelated (orthogonal).
D. Yes, they are highly correlated.
Correct Answer: No, they are uncorrelated (orthogonal).
Explanation:
By construction, the principal components are orthogonal to each other. This means that in the new feature space, the resulting variables are linearly uncorrelated, which is a desirable property.
14. What is the primary goal of Linear Discriminant Analysis (LDA)?
Linear discriminant analysis (LDA)
Easy
A. To reduce dimensions by ignoring class labels
B. To cluster unlabeled data
C. To find a lower-dimensional space that maximizes the separability between classes
D. To maximize the variance within each class
Correct Answer: To find a lower-dimensional space that maximizes the separability between classes
Explanation:
LDA is a supervised dimensionality reduction technique. Its main objective is to project data onto a lower-dimensional space in a way that maximizes the distance between the means of different classes while minimizing the variance within each class.
15. How does LDA differ fundamentally from PCA?
Linear discriminant analysis (LDA)
Easy
A. There is no fundamental difference.
B. LDA is supervised (uses class labels), while PCA is unsupervised.
C. LDA is unsupervised, while PCA is supervised.
D. LDA always finds more dimensions than PCA.
Correct Answer: LDA is supervised (uses class labels), while PCA is unsupervised.
Explanation:
The key difference is that LDA is a supervised algorithm because it uses the class labels of the data to find the best projection for classification. PCA is unsupervised and only considers the variance of the data, ignoring any class labels.
16. To achieve good class separation, LDA aims to maximize the ratio of...
Linear discriminant analysis (LDA)
Easy
A. between-class variance to total variance.
B. within-class variance to between-class variance.
C. total variance to within-class variance.
D. between-class variance to within-class variance.
Correct Answer: between-class variance to within-class variance.
Explanation:
The optimization objective of LDA is to find a projection that maximizes the distance between the centers of the different classes (between-class variance) and simultaneously minimizes the spread of data within each class (within-class variance).
17. What is a common application of LDA?
Linear discriminant analysis (LDA)
Easy
A. Pre-processing for classification tasks
B. Recommending products to users
C. Anomaly detection
D. Data compression for storage
Correct Answer: Pre-processing for classification tasks
Explanation:
Because LDA explicitly tries to model the difference between classes, it is often used as a dimensionality reduction step before applying a classification model, such as in face recognition or text classification.
18. In the context of a recommendation system, what does the user-item interaction matrix typically contain?
Applications of matrix factorization in recommendation systems
Easy
A. Ratings that users have given to items
B. The number of items in stock
C. User demographics
D. Item prices
Correct Answer: Ratings that users have given to items
Explanation:
A user-item interaction matrix is a common way to represent user preferences. The rows represent users, the columns represent items, and the cells contain the ratings users have given to items (or a 1 if they've interacted with it, 0 otherwise).
19. When we factorize a user-item matrix into two smaller matrices (a user-feature matrix and an item-feature matrix), what do the "features" represent?
Applications of matrix factorization in recommendation systems
Easy
A. The number of users and items
B. Latent (hidden) features that describe users and items
C. Explicit features like genre or price
D. The original ratings
Correct Answer: Latent (hidden) features that describe users and items
Explanation:
Matrix factorization for recommendation systems learns latent features. These are not explicitly defined but are abstract characteristics that the model discovers to explain the observed ratings. For example, a latent feature for movies might correspond to the "amount of action" or "suitability for children".
20. How can matrix factorization be used to predict a rating for an item a user has not yet seen?
Applications of matrix factorization in recommendation systems
Easy
A. By taking the dot product of the user's latent feature vector and the item's latent feature vector
B. By finding the average rating of that item
C. By copying the rating from the most similar user
D. It cannot be used for prediction, only for data compression.
Correct Answer: By taking the dot product of the user's latent feature vector and the item's latent feature vector
Explanation:
After decomposing the user-item matrix into a user-feature matrix P and an item-feature matrix Q, the predicted rating for user i and item j is calculated by taking the dot product of the user's latent vector (row i of P) and the item's latent vector (row j of Q). This reconstructs the original matrix, filling in the missing values.
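The prediction step above can be sketched with hypothetical latent factors (the values below are made up for illustration):

```python
import numpy as np

# Hypothetical latent factors: 3 users, 4 items, k = 2 latent features.
P = np.array([[1.0, 0.5],   # user latent vectors (one row per user)
              [0.2, 1.0],
              [0.8, 0.8]])
Q = np.array([[1.0, 0.0],   # item latent vectors (one row per item)
              [0.0, 1.0],
              [0.5, 0.5],
              [1.0, 1.0]])

# Predicted rating for user i, item j is the dot product p_i . q_j,
# so the full predicted matrix is P Q^T (including unseen pairs).
R_hat = P @ Q.T
assert R_hat.shape == (3, 4)
assert np.isclose(R_hat[0, 3], P[0] @ Q[3])  # user 0, item 3: 1.0*1.0 + 0.5*1.0
```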
21. A data matrix X is tall and thin (n × p with n > p). Why can't we directly compute the eigen decomposition of X?
Eigen decomposition and its limitations in ML
Medium
A. The matrix does not have a full set of linearly independent columns.
B. Eigen decomposition is computationally too expensive for tall matrices.
C. The matrix must be symmetric for eigen decomposition.
D. Eigen decomposition is only defined for square matrices.
Correct Answer: Eigen decomposition is only defined for square matrices.
Explanation:
Eigen decomposition is a factorization of a matrix into its eigenvectors and eigenvalues, defined as A = PDP^-1. This definition is strictly for square matrices, as non-square matrices do not have eigenvalues in the usual sense. To analyze a non-square matrix like X, we often use SVD or compute the eigen decomposition of the square covariance matrix X^T X.
22. If v is an eigenvector of a matrix A with eigenvalue λ, what is the corresponding eigenvalue for the matrix A^3?
Eigen decomposition and its limitations in ML
Medium
A. It cannot be determined without knowing A.
B. λ^3
C. 3λ
D. λ
Correct Answer: λ^3
Explanation:
By definition, Av = λv. If we apply A again, we get A^2 v = A(λv) = λ(Av) = λ^2 v. Applying it a third time gives A^3 v = λ^3 v. Thus, v is an eigenvector of A^3 with eigenvalue λ^3.
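A numeric sanity check of this identity, on an illustrative triangular matrix whose eigenvalues sit on the diagonal:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

w, V = np.linalg.eig(A)
lam, v = w[0], V[:, 0]

# If A v = lam v, then A^3 v = lam^3 v.
A3 = np.linalg.matrix_power(A, 3)
assert np.allclose(A3 @ v, lam**3 * v)
```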
23. For a real symmetric matrix, what is the geometric relationship between eigenvectors corresponding to distinct (different) eigenvalues?
Eigen decomposition and its limitations in ML
Medium
A. They form an acute angle.
B. They are orthogonal.
C. They are parallel.
D. There is no guaranteed relationship.
Correct Answer: They are orthogonal.
Explanation:
A key property of real symmetric matrices is that their eigenvectors corresponding to distinct eigenvalues are mutually orthogonal. This property is fundamental to Principal Component Analysis (PCA), where the eigenvectors of the symmetric covariance matrix form an orthogonal basis representing the principal components.
24. Eigen decomposition is often applied to a covariance matrix in machine learning. What is a significant limitation of this approach if the features have vastly different scales (e.g., one feature in meters and another in kilometers)?
Eigen decomposition and its limitations in ML
Medium
A. The eigenvalues become negative, which is not interpretable.
B. The eigenvector corresponding to the feature with the largest scale will dominate the analysis.
C. The computation of eigenvectors becomes numerically unstable.
D. The covariance matrix becomes non-symmetric, making decomposition impossible.
Correct Answer: The eigenvector corresponding to the feature with the largest scale will dominate the analysis.
Explanation:
Eigen decomposition of a covariance matrix is sensitive to the scale of the features. A feature with a much larger variance (due to scale) will dominate the first principal component, as PCA (which uses this decomposition) seeks to maximize variance. This can mask the contribution of other important but smaller-scale features. Standardization is typically required to mitigate this.
25. Given the SVD of an m × n matrix A as A = UΣV^T, where Σ is an m × n diagonal matrix, the singular values in Σ are the square roots of the non-zero eigenvalues of which matrix?
Singular value decomposition (SVD)
Medium
A. A^T A
B. A + A^T
C. A itself
D. A^-1
Correct Answer: A^T A
Explanation:
The SVD is closely related to the eigen decompositions of A^T A and AA^T. Specifically, A^T A = VΣ^T ΣV^T, which is the eigen decomposition of A^T A. The diagonal entries of Σ^T Σ are the squared singular values (σ_i^2), which are the eigenvalues of A^T A. Therefore, the singular values are the square roots of the eigenvalues of A^T A.
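This relationship is easy to confirm numerically, assuming an arbitrary random rectangular matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))

s = np.linalg.svd(A, compute_uv=False)      # singular values, descending
evals = np.linalg.eigvalsh(A.T @ A)[::-1]   # eigenvalues of A^T A, descending

# sigma_i = sqrt(lambda_i(A^T A)); clip guards tiny negative round-off.
assert np.allclose(s, np.sqrt(np.clip(evals, 0.0, None)))
```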
26. You perform SVD on a 1000 × 500 matrix A. What is the maximum possible number of non-zero singular values?
Singular value decomposition (SVD)
Medium
A. 500
B. 1000
C. 250
D. 1500
Correct Answer: 500
Explanation:
The number of non-zero singular values of a matrix is equal to its rank. For an m × n matrix, the rank is at most min(m, n). In this case, m = 1000 and n = 500, so the maximum possible rank (and thus the maximum number of non-zero singular values) is min(1000, 500) = 500.
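The rank-counting argument can be sketched on a small matrix whose rank is known by construction (two rank-one terms):

```python
import numpy as np

rng = np.random.default_rng(2)
# A 6 x 4 matrix built from two rank-one terms has rank 2.
A = (np.outer(rng.normal(size=6), rng.normal(size=4))
     + np.outer(rng.normal(size=6), rng.normal(size=4)))

s = np.linalg.svd(A, compute_uv=False)
assert s.shape == (4,)              # min(m, n) singular values are reported
assert np.sum(s > 1e-10) == 2       # but only rank-many of them are non-zero
```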
27. In the context of low-rank approximation, truncating the SVD of a matrix A to keep the top k singular values gives a matrix A_k. What optimization problem does A_k solve?
Singular value decomposition (SVD)
Medium
A. It maximizes the determinant of A_k.
B. It minimizes the Frobenius norm ||A - B||_F among all rank-k matrices B.
C. It minimizes the sum of the singular values of A_k.
D. It ensures that A_k is an orthogonal matrix.
Correct Answer: It minimizes the Frobenius norm ||A - B||_F among all rank-k matrices B.
Explanation:
The Eckart-Young-Mirsky theorem states that the best rank-k approximation of a matrix A, in the sense of minimizing the Frobenius norm (or the spectral norm), is obtained by truncating the SVD. That is, A_k is the closest rank-k matrix to A.
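A sketch of the truncation, checking the known identity that the Frobenius error equals the root-sum-square of the discarded singular values (random illustrative matrix):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k truncated SVD

# Frobenius error of the truncation = sqrt(sum of the discarded sigma_i^2);
# by Eckart-Young-Mirsky, no rank-k matrix can do better.
err = np.linalg.norm(A - A_k, 'fro')
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
```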
28. If a square matrix A is invertible, what is the relationship between the singular values of A and those of its inverse A^-1?
Singular value decomposition (SVD)
Medium
A. The singular values of A^-1 are the negatives of the singular values of A.
B. The singular values of A^-1 are the same as the singular values of A.
C. The singular values of A^-1 are the reciprocals of the singular values of A.
D. There is no direct relationship between them.
Correct Answer: The singular values of A^-1 are the reciprocals of the singular values of A.
Explanation:
If A = UΣV^T, then its inverse is A^-1 = VΣ^-1 U^T. The singular values of A^-1 are the diagonal entries of Σ^-1, which are 1/σ_i, where σ_i are the singular values of A. This holds because an invertible matrix must have all non-zero singular values.
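A two-line numerical check on an illustrative invertible matrix:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])   # invertible: det = 5

s_A = np.linalg.svd(A, compute_uv=False)
s_inv = np.linalg.svd(np.linalg.inv(A), compute_uv=False)

# Singular values of A^-1 are the reciprocals of those of A; the descending
# sort order reverses, hence the flip before comparing.
assert np.allclose(s_inv, (1.0 / s_A)[::-1])
```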
29. From a geometric perspective, what do the principal components of a dataset represent?
Principal component analysis (PCA) from a geometric and optimization perspective
Medium
A. A sequence of orthogonal directions that capture the maximum variance in the data.
B. The vectors pointing from the origin to the densest regions of the data.
C. The axes of the original feature space.
D. Directions that best separate the different classes in the data.
Correct Answer: A sequence of orthogonal directions that capture the maximum variance in the data.
Explanation:
Geometrically, PCA performs a rotation of the coordinate system. The first principal component is the direction in which the data varies the most. The second principal component is the direction orthogonal to the first that captures the most remaining variance, and so on. These components form a new orthogonal basis for the data.
30. PCA can be viewed as an optimization problem where we seek to minimize the reconstruction error. What does this reconstruction error physically represent?
Principal component analysis (PCA) from a geometric and optimization perspective
Medium
A. The number of data points misclassified by the projection.
B. The variance of the data projected onto the last principal component.
C. The total variance of the original dataset.
D. The sum of squared distances from each data point to its projection onto the principal component subspace.
Correct Answer: The sum of squared distances from each data point to its projection onto the principal component subspace.
Explanation:
Minimizing the reconstruction error (the error between the original data points and their compressed-and-decompressed versions) is equivalent to maximizing the variance of the projected data. This error is precisely the sum of the squared Euclidean distances from the original points to their lower-dimensional projections.
31. You apply PCA to a dataset and find the eigenvalues of the covariance matrix are [10, 8, 0.1, 0.05]. What does this suggest about the dimensionality of your data?
Principal component analysis (PCA) from a geometric and optimization perspective
Medium
A. The features are completely uncorrelated.
B. The data is uniformly distributed in a 4-dimensional space.
C. The data can be effectively represented in 2 dimensions with minimal information loss.
D. The data requires all 4 dimensions for an accurate representation.
Correct Answer: The data can be effectively represented in 2 dimensions with minimal information loss.
Explanation:
The eigenvalues of the covariance matrix represent the variance captured by each corresponding principal component. The large drop-off after the second eigenvalue (from 8 to 0.1) indicates that the first two components capture the vast majority of the variance in the data (18 of the total 18.15, about 99%). The remaining two components contribute very little (a total of 0.15). This implies the data is intrinsically low-dimensional and can be reduced to 2D.
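The explained-variance arithmetic from the explanation, spelled out with the question's eigenvalues:

```python
import numpy as np

evals = np.array([10.0, 8.0, 0.1, 0.05])   # eigenvalues from the question

# Cumulative fraction of total variance captured by the first k components.
explained = np.cumsum(evals) / evals.sum()
# The first two components already capture about 99% of the variance.
assert explained[1] > 0.99
```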
32. What is the primary reason for mean-centering the data (subtracting the mean of each feature) before performing PCA?
Principal component analysis (PCA) from a geometric and optimization perspective
Medium
A. To reduce the number of principal components needed.
B. To make the data matrix invertible.
C. To ensure all eigenvalues are positive.
D. To ensure the first principal component describes the direction of maximum variance, not the mean of the data.
Correct Answer: To ensure the first principal component describes the direction of maximum variance, not the mean of the data.
Explanation:
PCA calculates the eigenvectors of the covariance matrix. The formula for covariance involves the mean of the data. If the data is not centered at the origin, the first principal component might simply point from the origin to the center of the data cloud, rather than capturing the direction of maximum variance within the data cloud. Mean-centering ensures that the analysis focuses solely on the variance.
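The failure mode described above can be demonstrated with SVD on a synthetic data cloud placed far from the origin (values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
t = rng.normal(size=n)
# Data cloud centered far from the origin at [100, 100], with its true
# direction of maximum variance along [1, -1].
X = np.array([100.0, 100.0]) + np.column_stack([t, -t + 0.1 * rng.normal(size=n)])

# Without centering, the top right-singular vector of X points at the mean.
v_raw = np.linalg.svd(X, full_matrices=False)[2][0]
# After centering, it recovers the true variance direction.
v_cen = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)[2][0]

assert abs(abs(v_raw @ np.array([1.0, 1.0])) / np.sqrt(2) - 1) < 0.01
assert abs(abs(v_cen @ np.array([1.0, -1.0])) / np.sqrt(2) - 1) < 0.01
```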
33. What is the primary objective of Linear Discriminant Analysis (LDA) in the context of dimensionality reduction?
Linear discriminant analysis (LDA)
Medium
A. To find a projection that maximizes the separation between classes.
B. To find a projection that minimizes the within-class variance, regardless of class separation.
C. To find a projection that maximizes the variance of the entire dataset.
D. To find a projection that makes the features uncorrelated.
Correct Answer: To find a projection that maximizes the separation between classes.
Explanation:
Unlike PCA, which is an unsupervised method that maximizes total variance, LDA is a supervised method that explicitly uses class labels. Its goal is to find a lower-dimensional space where the classes are as well-separated as possible. It achieves this by maximizing the ratio of between-class scatter (variance between class means) to within-class scatter (variance within each class).
34. You are working on a classification problem with 4 distinct classes. What is the maximum number of dimensions you can reduce your data to using LDA?
Linear discriminant analysis (LDA)
Medium
A. 3
B. Dependent on the number of features.
C. 4
D. 2
Correct Answer: 3
Explanation:
The number of linear discriminants (the dimensions of the output space) in LDA is at most C - 1, where C is the number of classes. This is because with C classes, there are at most C - 1 degrees of freedom for the positions of the class means relative to each other. For 4 classes, the maximum number of dimensions is 4 - 1 = 3.
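The C - 1 cap comes from the rank of the between-class scatter matrix. A sketch on synthetic 4-class data (the class layout is illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
C, p, n_per = 4, 10, 30   # 4 classes, 10 features, 30 samples per class

means = rng.normal(size=(C, p))          # class means in general position
X = np.vstack([m + 0.1 * rng.normal(size=(n_per, p)) for m in means])
y = np.repeat(np.arange(C), n_per)

# Between-class scatter S_B = sum_c n_c (mu_c - mu)(mu_c - mu)^T.
mu = X.mean(axis=0)
S_B = sum(n_per * np.outer(X[y == c].mean(axis=0) - mu,
                           X[y == c].mean(axis=0) - mu)
          for c in range(C))

# The weighted centered class means sum to zero, so rank(S_B) <= C - 1,
# capping the number of useful LDA directions at 3 here.
assert np.linalg.matrix_rank(S_B) == C - 1
```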
35. LDA finds its projection vectors by solving a generalized eigenvalue problem of the form S_B w = λ S_W w. What do S_B and S_W represent?
Linear discriminant analysis (LDA)
Medium
A. S_B is the sample covariance matrix and S_W is the identity matrix.
B. S_B is the data matrix X and S_W is its transpose X^T.
C. S_B is the between-class scatter matrix and S_W is the within-class scatter matrix.
D. S_B is the between-class scatter matrix and S_W is the total scatter matrix.
Correct Answer: S_B is the between-class scatter matrix and S_W is the within-class scatter matrix.
Explanation:
The core of LDA is to find a projection that maximizes the ratio of between-class scatter to within-class scatter. This optimization problem is mathematically equivalent to solving the generalized eigenvalue problem S_B w = λ S_W w, or S_W^-1 S_B w = λw. The eigenvectors w are the linear discriminants.
36. Under what condition would PCA and LDA produce very similar results for dimensionality reduction in a classification task?
Linear discriminant analysis (LDA)
Medium
A. When the number of features is much larger than the number of samples.
B. PCA and LDA can never produce similar results because their objectives are fundamentally different.
C. When the direction of maximum variance in the data also happens to be the direction that best separates the classes.
D. When the data is perfectly balanced across all classes.
Correct Answer: When the direction of maximum variance in the data also happens to be the direction that best separates the classes.
Explanation:
PCA seeks the direction of maximum variance, while LDA seeks the direction of maximum class separability. If the primary source of variance in the dataset is the difference between the classes themselves, then the direction that maximizes variance (PC1) will likely be very similar to the direction that maximizes class separation (LD1).
37. In a recommendation system based on matrix factorization, we decompose a user-item rating matrix R into two lower-rank matrices P (users) and Q (items). What is the main purpose of this decomposition?
Applications of matrix factorization in recommendation systems
Medium
A. To find the exact, original ratings for every user-item pair.
B. To learn latent features for users and items that can be used to predict missing ratings.
C. To reduce the storage space of the rating matrix by a guaranteed factor.
D. To identify the most popular items across all users.
Correct Answer: To learn latent features for users and items that can be used to predict missing ratings.
Explanation:
The factorization learns a low-dimensional representation (latent features) for each user (rows of P) and each item (rows of Q). The core idea is that user preferences and item attributes can be described by these latent factors. The dot product of a user's latent vector and an item's latent vector then gives a prediction for a missing rating.
38. A key challenge in matrix factorization for recommendation systems is the sparsity of the user-item matrix. How do algorithms typically handle the many missing entries during the training process?
Applications of matrix factorization in recommendation systems
Medium
A. They treat all missing entries as a rating of zero.
B. They remove all users and items with too many missing ratings.
C. They calculate the prediction error and update the model parameters only on the observed ratings.
D. They fill the missing entries with the global average rating before factorization.
Correct Answer: They calculate the prediction error and update the model parameters only on the observed ratings.
Explanation:
A crucial aspect of matrix factorization algorithms like Alternating Least Squares (ALS) or Stochastic Gradient Descent (SGD) is that they only evaluate the loss function over the known ratings. The objective is to find latent factors that best reconstruct the observed part of the matrix, and then use these factors to generalize and predict the unobserved ratings.
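A minimal SGD sketch of this idea, under assumed toy data and hyperparameters (learning rate, regularization, and the rating triples are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
n_users, n_items, k = 5, 6, 2
# Observed ratings only, as (user, item, rating) triples; everything
# else in the 5 x 6 matrix is treated as missing, not as zero.
obs = [(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0), (1, 5, 1.0),
       (2, 0, 1.0), (2, 3, 4.0), (3, 4, 2.0), (4, 5, 5.0)]

P = 0.1 * rng.normal(size=(n_users, k))
Q = 0.1 * rng.normal(size=(n_items, k))
lr, reg = 0.05, 0.01

# SGD: the loss (and hence every gradient step) touches observed cells only.
for _ in range(500):
    for i, j, r in obs:
        err = r - P[i] @ Q[j]
        p_old = P[i].copy()
        P[i] += lr * (err * Q[j] - reg * P[i])
        Q[j] += lr * (err * p_old - reg * Q[j])

mse = np.mean([(r - P[i] @ Q[j]) ** 2 for i, j, r in obs])
assert mse < 0.1   # the observed entries are fit closely
```

Predictions for the unobserved cells then come from the same factors, e.g. `P[3] @ Q[0]`, which never appeared in the training loop.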
39. In a matrix factorization model, we have a user latent vector u and two item latent vectors, v_A and v_B. Assuming ratings are predicted by the dot product, which item would be recommended to the user and why?
Applications of matrix factorization in recommendation systems
Medium
A. Both items, because they have positive values in their latent vectors.
B. Item A, because its dot product with the user vector is higher.
C. Item B, because its dot product with the user vector is higher.
D. Neither item, because the user vector is not a unit vector.
Correct Answer: The item whose dot product with the user vector is higher.
Explanation:
The predicted rating is calculated by the dot product. For item A, the predicted rating is u · v_A; for item B, it is u · v_B. The model predicts a higher preference for whichever item yields the larger dot product, so that item would be recommended.
40. What is a common method to prevent overfitting in matrix factorization models for recommendation systems?
Applications of matrix factorization in recommendation systems
Medium
A. Initializing the user and item matrices with random noise from a uniform distribution.
B. Using only users who have rated a large number of items.
C. Increasing the number of latent factors (k) until the training error is zero.
D. Adding a regularization term (e.g., L2 regularization) to the loss function.
Correct Answer: Adding a regularization term (e.g., L2 regularization) to the loss function.
Explanation:
Overfitting occurs when the model learns the training data too well, including its noise, and fails to generalize to unseen data. Regularization is a standard technique to combat this. By adding a penalty term to the loss function (e.g., λ(||P||_F^2 + ||Q||_F^2)), we discourage the model from learning excessively large values in the latent factor matrices, promoting a simpler and more generalizable model.
41. A non-symmetric matrix A is not guaranteed to be diagonalizable. In the context of a machine learning model where A represents state transitions in a discrete-time dynamical system, what is the primary implication if its eigen decomposition does not exist?
Eigen decomposition and its limitations in ML
Hard
A. The system is inherently unstable and the state will always diverge to infinity.
B. The Jordan Normal Form must be used to understand the system's dynamics, which may involve transformations beyond simple scaling along eigenvector directions (e.g., shear transformations).
C. The model's long-term behavior cannot be analyzed using powers of A.
D. The matrix must be singular, meaning its determinant is zero.
Correct Answer: The Jordan Normal Form must be used to understand the system's dynamics, which may involve transformations beyond simple scaling along eigenvector directions (e.g., shear transformations).
Explanation:
If a matrix is not diagonalizable, it lacks a full set of linearly independent eigenvectors. However, its behavior can still be fully characterized by the Jordan Normal Form. This decomposition reveals that the system's evolution may not be simple scaling along axes, but can also include shearing effects represented by Jordan blocks, which are crucial for understanding the system's true dynamics.
42. Given the SVD of a matrix A = UΣV^T, where A is m × n and rank(A) = r, the pseudoinverse is A^+ = VΣ^+ U^T. What is the precise geometric interpretation of the operator represented by the projection matrix AA^+?
Singular value decomposition (SVD)
Hard
A. It is an orthogonal projection from R^m onto the column space of A (the range of A).
B. It is the identity operator on the column space of A.
C. It is an orthogonal projection from R^n onto the row space of A (the range of A^T).
D. It is an orthogonal projection from R^m onto the null space of A^T (the left null space of A).
Correct Answer: It is an orthogonal projection from R^m onto the column space of A (the range of A).
Explanation:
The matrix AA^+ simplifies to UΣΣ^+ U^T. The matrix ΣΣ^+ is an m × m diagonal matrix with ones in the first r diagonal positions and zeros elsewhere, so AA^+ = U_r U_r^T, where U_r holds the first r columns of U. This is the standard form for an orthogonal projection matrix onto the subspace spanned by the first r columns of U, which is, by definition, the column space of A.
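The projector properties (symmetric, idempotent, fixes the column space) can be checked directly on an illustrative random matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.normal(size=(5, 2))      # full column rank (almost surely)

P_col = A @ np.linalg.pinv(A)    # the m x m operator A A^+

# An orthogonal projector is symmetric and idempotent...
assert np.allclose(P_col, P_col.T)
assert np.allclose(P_col @ P_col, P_col)
# ...and it fixes any vector already in the column space of A.
b = A @ rng.normal(size=2)
assert np.allclose(P_col @ b, b)
```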
43Consider a dataset where the covariance matrix has eigenvalues . You apply a linear transformation to the data , where is an orthogonal matrix (). What are the principal components and their corresponding variances (eigenvalues) for the transformed data ?
Principal component analysis (PCA) from a geometric and optimization perspective
Hard
A.The principal components are the eigenvectors of , and the variances are unchanged.
B.The principal components are the columns of , and the variances are the diagonal elements of .
C.The principal components are rotated, but the variances (eigenvalues) remain unchanged.
D.Both the principal components and their variances change unpredictably.
Correct Answer: The principal components are rotated, but the variances (eigenvalues) remain unchanged.
Explanation:
The covariance matrix of the new data is . Since is orthogonal, is similar to . Similar matrices have the same eigenvalues. Therefore, the variances of the principal components (the eigenvalues) remain exactly the same. The new principal components themselves (the eigenvectors of ) are rotated versions of the original ones.
Incorrect! Try again.
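The similarity argument can be verified numerically (a sketch with random data; the orthogonal Q is drawn via a QR decomposition):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 4))
C = np.cov(X, rowvar=False)

# random orthogonal matrix Q from a QR decomposition
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
Y = X @ Q
C_Y = np.cov(Y, rowvar=False)            # equals Qᵀ C Q up to rounding

evals_C = np.sort(np.linalg.eigvalsh(C))
evals_CY = np.sort(np.linalg.eigvalsh(C_Y))
assert np.allclose(evals_C, evals_CY)    # spectra match: variances unchanged
```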
44. In Fisher's LDA, we maximize J(w) = (wᵀS_Bw)/(wᵀS_Ww), where S_B is the between-class scatter and S_W is the within-class scatter. If S_W is singular, which can happen in high-dimensional settings (p > N), what is the most robust and standard procedure to make LDA applicable?
Linear discriminant analysis (LDA)
Hard
A.Add a small multiple of the identity matrix to S_W (i.e., use S_W + εI) to make it invertible, a form of regularization.
B.Use the Moore-Penrose pseudoinverse of S_W to solve the generalized eigenvalue problem.
C.The problem is unsolvable as the Fisher criterion is undefined.
D.First apply PCA to reduce dimensionality to a subspace where S_W becomes non-singular, then apply LDA in that subspace.
Correct Answer: First apply PCA to reduce dimensionality to a subspace where S_W becomes non-singular, then apply LDA in that subspace.
Explanation:
When the number of features p is greater than the number of samples N, the data lies in a subspace of dimension at most N − 1, causing S_W to be singular. The standard and most principled approach is to first use PCA to project the data into the (N − C)-dimensional subspace where class structure is preserved and S_W is non-singular (C is the number of classes). After this pre-processing step, LDA can be successfully applied. This two-stage process is often called PCA+LDA (the approach behind Fisherfaces).
Incorrect! Try again.
45. In a recommendation system using matrix factorization, the objective function is L = Σ_{(u,i)∈observed} (r_{ui} − p_uᵀq_i)² + λ(Σ_u‖p_u‖² + Σ_i‖q_i‖²). If a user u has rated only one item i with a rating of 5, and the regularization parameter λ is very large (approaching infinity), what will the learned latent vector p_u converge to?
Applications of matrix factorization in recommendation systems
Hard
A.A vector with a very large norm, pointing in the same direction as q_i.
B.A near-zero vector.
C.A vector with a very large norm, orthogonal to q_i.
D.The exact zero vector.
Correct Answer: The exact zero vector.
Explanation:
The objective function balances reconstruction error with a regularization penalty on the magnitudes of the latent vectors. As λ → ∞, the penalty term dominates the objective function. To minimize the total loss, the model must make this penalty term as small as possible. The only way to do this is to force ‖p_u‖² to zero, which means p_u must converge to the zero vector, regardless of the reconstruction error for the single rating.
Incorrect! Try again.
46. A real symmetric matrix is always diagonalizable by an orthogonal matrix. If a real matrix A is known to be diagonalizable, but is not symmetric, what can we definitively conclude about its eigenvectors?
Eigen decomposition and its limitations in ML
Hard
A.The matrix of eigenvectors can always be chosen to be an orthogonal matrix.
B.The eigenvectors corresponding to distinct real eigenvalues are orthogonal.
C.The eigenvectors are linearly independent but may not be orthogonal.
D.The eigenvectors are guaranteed to be orthogonal.
Correct Answer: The eigenvectors are linearly independent but may not be orthogonal.
Explanation:
The condition for a matrix to be diagonalizable is that it must possess a full set of linearly independent eigenvectors that can form a basis for the vector space. Orthogonality of eigenvectors is a stronger condition guaranteed only for normal matrices (AAᵀ = AᵀA), a class which includes symmetric matrices but not all diagonalizable matrices. Therefore, for a general non-symmetric diagonalizable matrix, we can only be sure of linear independence.
Incorrect! Try again.
47. According to the Eckart-Young-Mirsky theorem, the best rank-k approximation of a matrix in the Frobenius norm is A_k = Σ_{i=1}^{k} σ_i u_i v_iᵀ. If A is a square, invertible matrix of size n×n with singular values σ₁ > σ₂ > … > σ_n > 0, what is the exact Frobenius norm of the error of the best rank-(n−1) approximation, ‖A − A_{n−1}‖_F?
Singular value decomposition (SVD)
Hard
A.σ_{n−1}
B.σ_n
C.√(σ_{n−1}² + σ_n²)
D.0
Correct Answer: σ_n
Explanation:
The error of the best rank-k approximation is given by the Frobenius norm of the matrix formed by the discarded singular components. In this case, A − A_{n−1} = σ_n u_n v_nᵀ. The Frobenius norm of a rank-1 matrix σuvᵀ is σ‖u‖‖v‖. Since u_n and v_n are unit vectors, the norm is simply the remaining singular value σ_n. More formally, ‖A − A_k‖_F = √(σ_{k+1}² + … + σ_n²). For k = n − 1, this sum only has one term, σ_n², so the error is σ_n.
Incorrect! Try again.
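A small numerical confirmation (random 4×4 matrix; the rank-(n−1) truncation is built directly from the SVD):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
U, s, Vt = np.linalg.svd(A)

# best rank-(n-1) approximation: drop the smallest singular value
A_k = U[:, :n-1] @ np.diag(s[:n-1]) @ Vt[:n-1, :]
err = np.linalg.norm(A - A_k, 'fro')
assert np.isclose(err, s[-1])            # Frobenius error equals sigma_n
```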
48. You perform PCA on a dataset with D features and N samples (N > D). You find that the last k eigenvalues of the data's covariance matrix are exactly zero (λ_{D−k+1} = … = λ_D = 0). What is the most precise geometric interpretation of this result?
Principal component analysis (PCA) from a geometric and optimization perspective
Hard
A.The data points lie perfectly within a (D − k)-dimensional affine subspace of the original D-dimensional space.
B.The first D − k components capture all the information, and the last k can be discarded with no information loss.
C.The dataset contains categorical features that were improperly encoded.
D.The dataset has k features that are pure noise.
Correct Answer: The data points lie perfectly within a (D − k)-dimensional affine subspace of the original D-dimensional space.
Explanation:
An eigenvalue of the covariance matrix represents the variance of the data along the corresponding eigenvector's direction. A zero eigenvalue means there is zero variance in that direction. This is only possible if all data points are confined to a hyperplane (an affine subspace) orthogonal to that eigenvector. If there are k zero eigenvalues, the data lies in the intersection of k such hyperplanes, which defines an affine subspace of dimension D − k. This indicates perfect multicollinearity among the features.
Incorrect! Try again.
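This is easy to demonstrate: below, a fourth feature is constructed as an exact linear combination of the others (coefficients chosen arbitrarily), so the data lies in a 3-dimensional affine subspace of ℝ⁴ and the covariance matrix has exactly one zero eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(3)
# 3 free features; the 4th is a linear combination plus a constant offset
Z = rng.standard_normal((100, 3))
X = np.column_stack([Z, Z @ np.array([1.0, -2.0, 0.5]) + 7.0])

evals = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))
assert np.isclose(evals[0], 0.0, atol=1e-8)   # one zero eigenvalue
assert evals[1] > 1e-6                         # other directions have variance
```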
49. Consider a binary classification problem where the two classes have identical, spherical covariance matrices (i.e., Σ₁ = Σ₂ = σ²I) and their means are μ₁ and μ₂. In this specific scenario, the optimal projection vector w found by LDA is parallel to which vector?
Linear discriminant analysis (LDA)
Hard
A.A vector orthogonal to the vector connecting the class means.
B.The vector connecting the class means, μ₂ − μ₁.
C.The direction is undefined because the within-class scatter matrix is proportional to the identity matrix.
D.The first principal component of the combined dataset.
Correct Answer: The vector connecting the class means, μ₂ − μ₁.
Explanation:
The LDA projection vector is the leading eigenvector of S_W⁻¹S_B. For two classes, S_B ∝ (μ₂ − μ₁)(μ₂ − μ₁)ᵀ. The within-class scatter is S_W ∝ σ²I. Thus, S_W⁻¹S_B ∝ (μ₂ − μ₁)(μ₂ − μ₁)ᵀ. This is a rank-1 matrix, and its only eigenvector with a non-zero eigenvalue must lie in the direction of the vector that spans its column space, which is μ₂ − μ₁.
Incorrect! Try again.
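A sketch with hypothetical 2-D means (values chosen for illustration): with spherical within-class scatter, the leading eigenvector of S_W⁻¹S_B is parallel to μ₂ − μ₁.

```python
import numpy as np

mu1, mu2 = np.array([0.0, 0.0]), np.array([3.0, 1.0])
Sw = 2.0 * np.eye(2)                         # spherical within-class scatter
Sb = np.outer(mu2 - mu1, mu2 - mu1)

M = np.linalg.solve(Sw, Sb)                  # Sw^{-1} Sb (symmetric here)
evals, evecs = np.linalg.eigh(M)
w = evecs[:, -1]                             # leading eigenvector

# w is parallel to mu2 - mu1: the 2-D cross product vanishes
d = mu2 - mu1
assert np.isclose(w[0] * d[1] - w[1] * d[0], 0.0, atol=1e-12)
```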
50. When using Alternating Least Squares (ALS) for matrix factorization, the algorithm alternates between solving for user factors P (given item factors Q) and item factors Q (given P). Why is this approach particularly well-suited for large-scale distributed computation compared to simultaneous Stochastic Gradient Descent (SGD)?
Applications of matrix factorization in recommendation systems
Hard
A.ALS requires significantly fewer iterations to converge than SGD.
B.Each step of ALS involves solving for user (or item) factors independently, which are embarrassingly parallel subproblems.
C.SGD cannot handle the sparse rating matrix, while ALS is specifically designed for it.
D.ALS is guaranteed to find the global minimum of the non-convex problem, whereas SGD is not.
Correct Answer: Each step of ALS involves solving for user (or item) factors independently, which are embarrassingly parallel subproblems.
Explanation:
The key advantage of ALS is its parallel structure. When the item factors Q are fixed, the objective function decouples into independent quadratic (ridge-regression) minimization problems, one for each user factor p_u. This means all user factors can be computed in parallel. The same holds true when solving for item factors. This property allows ALS to scale effectively on distributed computing platforms like Spark, where the independent computations can be spread across many machines, making it more efficient than the inherently sequential updates of SGD in such environments.
Incorrect! Try again.
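The decoupling can be made concrete. This sketch (toy sizes, made-up ratings) implements one ALS half-step: each loop iteration is an independent ridge-regression solve for one user, so the loop body could run on separate machines.

```python
import numpy as np

def als_user_step(R, mask, Q, lam):
    """One ALS half-step: an independent ridge problem per user.
    R: ratings (n_users x n_items), mask: observed entries, Q: item factors."""
    k = Q.shape[1]
    P = np.zeros((R.shape[0], k))
    for u in range(R.shape[0]):          # users are independent -> parallel
        obs = mask[u]
        Qo = Q[obs]                      # factors of items user u rated
        A = Qo.T @ Qo + lam * np.eye(k)
        b = Qo.T @ R[u, obs]
        P[u] = np.linalg.solve(A, b)     # closed-form ridge solution
    return P

rng = np.random.default_rng(4)
R = rng.uniform(1, 5, size=(6, 8))
mask = rng.random((6, 8)) < 0.5          # sparse observation pattern
Q = rng.standard_normal((8, 2))
P = als_user_step(R, mask, Q, lam=0.1)
assert P.shape == (6, 2)
```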
51. A Markov chain is described by a row-stochastic transition matrix P. The Perron-Frobenius theorem guarantees that its largest eigenvalue is 1. What is the significance of the corresponding left eigenvector π, which satisfies πᵀP = πᵀ?
Eigen decomposition and its limitations in ML
Hard
A.It is the stationary distribution of the Markov chain, describing the long-term probability of being in each state.
B.It represents the initial distribution of the states.
C.It is always a uniform distribution, indicating all states are equally likely in the long run.
D.It is a vector of all ones, which is the right eigenvector for the eigenvalue 1.
Correct Answer: It is the stationary distribution of the Markov chain, describing the long-term probability of being in each state.
Explanation:
The equation πᵀP = πᵀ defines the stationary distribution of a Markov chain. It means that if the probability distribution over states is given by the vector π, then after one transition step, the new distribution will still be π. This vector represents the equilibrium state of the system, where the probability of being in any given state becomes constant over time.
Incorrect! Try again.
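A sketch with a hypothetical two-state chain: the left eigenvector of P for eigenvalue 1 (computed as a right eigenvector of Pᵀ), normalized to sum to 1, matches the long-run distribution obtained by iterating the chain.

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])               # row-stochastic transition matrix

# left eigenvector for eigenvalue 1 = right eigenvector of P transposed
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()                        # normalize to a distribution

assert np.allclose(pi @ P, pi)            # stationary: pi P = pi
# agrees with the long-run distribution from repeated transitions
assert np.allclose(np.linalg.matrix_power(P, 100)[0], pi)
```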
52. A very large, dense matrix has a singular value spectrum that decays exponentially fast (e.g., σ_k ∝ e^{−αk} for some α > 0). What is the most important practical implication of this property?
Singular value decomposition (SVD)
Hard
A.The matrix columns are nearly orthogonal, making it well-conditioned.
B.The matrix represents a chaotic system with high intrinsic dimensionality.
C.The matrix can be accurately approximated by a matrix of very low rank, enabling significant data compression and faster computations.
D.The matrix is nearly singular and numerically difficult to invert.
Correct Answer: The matrix can be accurately approximated by a matrix of very low rank, enabling significant data compression and faster computations.
Explanation:
A rapid decay in singular values signifies that most of the matrix's 'energy' or information is captured by the first few singular values and vectors. This means that a truncated SVD A_k = U_kΣ_kV_kᵀ, for a small k, will be an excellent approximation of the full matrix A. This is the basis for techniques like PCA and data compression, as the high-dimensional matrix can be effectively represented by the much smaller matrices U_k, Σ_k, and V_k.
Incorrect! Try again.
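A sketch that constructs a matrix with an exponentially decaying spectrum (decay rate chosen for illustration) and shows that a rank-10 truncation already reconstructs it to better than 1% relative error:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.exp(-0.5 * np.arange(n))          # exponentially decaying spectrum
A = U @ np.diag(s) @ V.T

k = 10                                    # keep only the top 10 components
A_k = U[:, :k] @ np.diag(s[:k]) @ V[:, :k].T
rel_err = np.linalg.norm(A - A_k, 'fro') / np.linalg.norm(A, 'fro')
assert rel_err < 1e-2                     # rank 10 out of 50 suffices
```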
53. How does Probabilistic PCA (PPCA) fundamentally differ from standard PCA in its formulation and assumptions?
Principal component analysis (PCA) from a geometric and optimization perspective
Hard
A.PPCA allows for non-orthogonal principal components, while standard PCA enforces orthogonality.
B.Standard PCA is an iterative algorithm while PPCA has a closed-form solution.
C.PPCA maximizes data likelihood under a generative latent variable model with isotropic Gaussian noise, while standard PCA is the deterministic, zero-noise limit of this model.
D.Standard PCA minimizes the L2 reconstruction error, whereas PPCA minimizes the L1 reconstruction error, making it more robust to outliers.
Correct Answer: PPCA maximizes data likelihood under a generative latent variable model with isotropic Gaussian noise, while standard PCA is the deterministic, zero-noise limit of this model.
Explanation:
Standard PCA is an algorithm that finds a low-dimensional projection maximizing variance. PPCA is a generative model that assumes observed data is generated from lower-dimensional latent variables via a linear map plus Gaussian noise. PPCA's parameters are fit by maximizing data likelihood. This probabilistic framing allows for handling missing data naturally and provides a measure of uncertainty. Standard PCA can be recovered as a special case of PPCA when the variance of the noise term approaches zero.
Incorrect! Try again.
54. For a multi-class classification problem with C classes and D features (D ≥ C), what is the maximum rank of the between-class scatter matrix S_B, and what is the direct consequence of this for the dimensionality reduction performed by LDA?
Linear discriminant analysis (LDA)
Hard
A.The rank is at most N − 1, where N is the number of samples.
B.The rank is at most C, so LDA can project to at most C dimensions.
C.The rank is at most C − 1, so LDA can project to at most C − 1 dimensions.
D.The rank is at most D − 1, so LDA can project to at most D − 1 dimensions.
Correct Answer: The rank is at most C − 1, so LDA can project to at most C − 1 dimensions.
Explanation:
The between-class scatter matrix S_B = Σ_c N_c(μ_c − μ)(μ_c − μ)ᵀ is formed by a sum of outer products of the C vectors (μ_c − μ), where μ_c are the class means and μ is the overall mean. These vectors are linearly dependent (they sum to zero when weighted by the class sizes N_c) and thus span a subspace of dimension at most C − 1. Since LDA finds eigenvectors of S_W⁻¹S_B, and the rank of this product is limited by the rank of S_B, there can be at most C − 1 non-zero eigenvalues and thus at most C − 1 useful discriminant directions.
Incorrect! Try again.
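The rank cap can be checked directly. This sketch (synthetic data, 4 classes in 10 dimensions) builds S_B from class means and confirms its rank is C − 1 = 3, not C:

```python
import numpy as np

rng = np.random.default_rng(6)
C, D, n_per = 4, 10, 30                   # 4 classes, 10 features
X = np.vstack([rng.standard_normal((n_per, D)) + rng.normal(0, 5, D)
               for _ in range(C)])
labels = np.repeat(np.arange(C), n_per)

mu = X.mean(axis=0)
S_B = np.zeros((D, D))
for c in range(C):
    diff = X[labels == c].mean(axis=0) - mu
    S_B += n_per * np.outer(diff, diff)

# rank is C - 1, because the weighted mean deviations sum to zero
assert np.linalg.matrix_rank(S_B) == C - 1
```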
55. In collaborative filtering, a common first step before applying SVD is to fill missing ratings in the user-item matrix R. How does the naive strategy of imputing missing values with the global mean rating bias the resulting model, especially for users with very few ratings?
Applications of matrix factorization in recommendation systems
Hard
A.It causes the model to shrink all predictions towards the global mean, an effect identical to L2 regularization.
B.It has no significant effect as SVD is robust to such imputations.
C.It correctly centers the data, leading to a more accurate model.
D.It strongly biases the latent factor vectors of sparse users toward a 'generic' profile that primarily reflects average rating behavior, obscuring their unique tastes.
Correct Answer: It strongly biases the latent factor vectors of sparse users toward a 'generic' profile that primarily reflects average rating behavior, obscuring their unique tastes.
Explanation:
Mean imputation forces the factorization model to learn latent factors that can reconstruct the global mean for a large number of entries. For a sparse user with only a few ratings, the imputed mean values will dominate their row in the matrix. Consequently, their learned latent vector will be optimized to represent an 'average' user, not their specific, sparse preferences. This washes out individual taste and leads to generic recommendations for these users.
Incorrect! Try again.
56. You are given a 2D dataset with two classes that form two long, thin, parallel clusters. The direction of maximum variance for the combined data is along the length of the clusters, while the direction that best separates them is orthogonal to their length. If you must reduce the data to 1 dimension for classification, which statement is most accurate?
Principal component analysis (PCA) from a geometric and optimization perspective
Hard
A.PCA will perform better because it captures the global structure of the data.
B.LDA will perform much better because it will find the projection that maximizes the separation between the class means.
C.Both will perform equally well as they will identify the same primary axis.
D.Neither will be effective; a non-linear method like kernel PCA is required.
Correct Answer: LDA will perform much better because it will find the projection that maximizes the separation between the class means.
Explanation:
This is a classic scenario highlighting the difference between unsupervised PCA and supervised LDA. PCA will find the direction of maximum variance, which is along the clusters' length. Projecting onto this axis will cause the two classes to overlap completely, making classification impossible. LDA, being supervised, will ignore the direction of high variance and instead find the direction that maximizes class separability. This direction is orthogonal to the clusters' length, and projecting onto it will perfectly separate the two classes.
Incorrect! Try again.
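This scenario is easy to simulate (cluster shapes and offsets chosen for illustration): the first principal component follows the clusters' long axis, so projecting onto it mixes the classes, while projecting onto the orthogonal axis (the LDA direction here) separates them.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500
# two long thin clusters, elongated along x, separated along y
c0 = np.column_stack([rng.normal(0, 10, n), rng.normal(0.0, 0.3, n)])
c1 = np.column_stack([rng.normal(0, 10, n), rng.normal(3.0, 0.3, n)])
X = np.vstack([c0, c1])

# PCA's first component is the direction of max variance: the x axis
evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
pc1 = evecs[:, -1]
assert abs(pc1[0]) > abs(pc1[1])          # dominated by the x direction

# class separation along pc1 vs along the y axis (the LDA direction here)
gap_pca = abs((c0 @ pc1).mean() - (c1 @ pc1).mean())
gap_lda = abs(c0[:, 1].mean() - c1[:, 1].mean())
assert gap_lda > gap_pca                  # LDA's axis separates the classes
```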
57. For a tall matrix A ∈ ℝ^{m×n} with m ≫ n, computing the full SVD (A = UΣVᵀ with U ∈ ℝ^{m×m}) is inefficient due to the size of U. How does the 'Thin SVD' (or 'Economy SVD') provide a more efficient but still exact representation?
Singular value decomposition (SVD)
Hard
A.Thin SVD sets all singular values below a threshold to zero, yielding a low-rank approximation.
B.Thin SVD computes the SVD of the smaller n×n matrix AᵀA to avoid dealing with the large dimension m.
C.Thin SVD only computes the first n columns of U (as U_n ∈ ℝ^{m×n}) and the top-left n×n block of Σ, which is sufficient to perfectly reconstruct A.
D.Thin SVD is an iterative algorithm that approximates the SVD, while full SVD is a direct method.
Correct Answer: Thin SVD only computes the first n columns of U (as U_n ∈ ℝ^{m×n}) and the top-left n×n block of Σ, which is sufficient to perfectly reconstruct A.
Explanation:
In the SVD of a tall matrix (m > n), there are at most n non-zero singular values. In the product UΣVᵀ, the last m − n columns of U are always multiplied by zero rows of Σ. The Thin SVD avoids computing these unnecessary columns of U and the zero rows of Σ. It produces U_n ∈ ℝ^{m×n}, Σ_n ∈ ℝ^{n×n}, and V ∈ ℝ^{n×n} such that A = U_nΣ_nVᵀ. This decomposition is exact (not an approximation) and more memory- and computationally-efficient.
Incorrect! Try again.
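In NumPy, the thin SVD is obtained with `full_matrices=False`; a quick check that the shapes shrink while the reconstruction stays exact:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((1000, 20))       # tall matrix, m >> n

# thin SVD: U is m x n instead of m x m
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert U.shape == (1000, 20) and s.shape == (20,) and Vt.shape == (20, 20)

# still an exact reconstruction, not an approximation
assert np.allclose(U @ np.diag(s) @ Vt, A)
```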
58. The standard Power Iteration algorithm finds the eigenvector corresponding to the eigenvalue with the largest magnitude. How can this method be adapted to find the eigenvalue of a matrix A that is closest to a specific target value μ?
Eigen decomposition and its limitations in ML
Hard
A.By applying Power Iteration to the matrix A⁻¹ and taking the reciprocal of the result.
B.By applying Power Iteration to the matrix A − μI.
C.It is not possible; Power Iteration is fundamentally limited to finding the dominant eigenvalue.
D.By applying Power Iteration to the matrix (A − μI)⁻¹, an approach known as Inverse Iteration with a shift.
Correct Answer: By applying Power Iteration to the matrix (A − μI)⁻¹, an approach known as Inverse Iteration with a shift.
Explanation:
The eigenvalues of the matrix (A − μI)⁻¹ are 1/(λ_i − μ), where λ_i are the eigenvalues of A. If λ_j is the eigenvalue of A closest to the shift μ, then |λ_j − μ| will be the smallest among all |λ_i − μ|. Consequently, its reciprocal, 1/|λ_j − μ|, will be the largest. Therefore, applying Power Iteration to the shifted-inverse matrix will cause it to converge to the eigenvector corresponding to this dominant eigenvalue, which is the same eigenvector corresponding to the desired eigenvalue λ_j of the original matrix A.
Incorrect! Try again.
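A minimal sketch of shifted inverse iteration on a diagonal matrix (eigenvalues chosen so the result is obvious; a production implementation would factor A − μI and solve rather than invert):

```python
import numpy as np

A = np.diag([1.0, 4.0, 10.0])             # known eigenvalues for clarity
mu = 3.5                                   # target: closest eigenvalue is 4

M = np.linalg.inv(A - mu * np.eye(3))      # shifted inverse
v = np.ones(3)
for _ in range(50):                        # plain power iteration on M
    v = M @ v
    v = v / np.linalg.norm(v)

lam = v @ A @ v                            # Rayleigh quotient recovers lambda
assert np.isclose(lam, 4.0)
```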
59. From an optimization perspective, PCA can be derived by finding a low-dimensional representation Z of data X and a transformation matrix W that minimizes the reconstruction error ‖X − ZWᵀ‖²_F. What essential constraint must be placed on the columns of W for the solution to be the standard PCA projection?
Principal component analysis (PCA) from a geometric and optimization perspective
Hard
A.W must be a lower triangular matrix to ensure a unique solution.
B.The columns of W must form an orthonormal set (WᵀW = I).
C.No constraints are needed; ordinary least squares minimization automatically yields the principal components.
D.The rows of W must form an orthonormal set (WWᵀ = I).
Correct Answer: The columns of W must form an orthonormal set (WᵀW = I).
Explanation:
Without constraints, the problem is ill-posed because one could arbitrarily scale up the latent factors Z and scale down the transformation matrix W without changing the product ZWᵀ. To obtain a unique solution that corresponds to principal components, we require the basis vectors (the columns of W) to be orthonormal. This constraint forces the solution for W to be the matrix whose columns are the top k eigenvectors of the covariance matrix of X, which is the definition of the principal components.
Incorrect! Try again.
60. In modern matrix factorization models, the prediction for a rating is often modeled as r̂_{ui} = μ + b_u + b_i + p_uᵀq_i. What is the primary motivation for explicitly modeling the global bias μ, user bias b_u, and item bias b_i?
Applications of matrix factorization in recommendation systems
Hard
A.To ensure the latent factors p_u and q_i have a zero mean and unit variance.
B.It is a form of regularization that is more effective at preventing overfitting than a simple L2 penalty.
C.To make the overall optimization problem convex, guaranteeing a global minimum.
D.To account for systematic rating tendencies (e.g., some users are consistently harsh raters, some items are universally popular) so that the latent factors can model true user-item preference interactions.
Correct Answer: To account for systematic rating tendencies (e.g., some users are consistently harsh raters, some items are universally popular) so that the latent factors can model true user-item preference interactions.
Explanation:
Different users have different rating baselines, and different items have different average ratings. These are significant sources of variance in the data. By explicitly modeling these main effects with bias terms, we allow the more complex interaction term, p_uᵀq_i, to focus on modeling the residual signal: the specific affinity of a user for an item that isn't explained by their general tendencies. This separation of concerns leads to a more accurate and interpretable model.