Unit 2 - Practice Quiz

INT255 60 Questions

1 What is the fundamental equation relating a matrix A, one of its eigenvectors v, and the corresponding eigenvalue λ?

Eigen decomposition and its limitations in ML Easy
A. Av = λv
B. Av = λ + v
C. Av = λ^(-1) v
D. A + v = λv

2 A major limitation of eigen decomposition is that it can only be applied to what kind of matrices?

Eigen decomposition and its limitations in ML Easy
A. Zero matrices
B. Rectangular matrices
C. Square matrices
D. Identity matrices

3 In the eigen decomposition of a matrix A as A = QΛQ^(-1), what does the diagonal matrix Λ contain?

Eigen decomposition and its limitations in ML Easy
A. The eigenvalues of A
B. The eigenvectors of A
C. The singular values of A
D. The inverse of A
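As a quick illustration of the decomposition A = QΛQ^(-1) asked about above, a minimal NumPy sketch (the matrix here is an arbitrary example):

```python
import numpy as np

# Arbitrary symmetric example matrix.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Columns of Q are eigenvectors; eigvals are the corresponding eigenvalues.
eigvals, Q = np.linalg.eig(A)

# Rebuild A from its decomposition: A = Q @ diag(eigvals) @ Q^(-1).
A_rebuilt = Q @ np.diag(eigvals) @ np.linalg.inv(Q)
assert np.allclose(A, A_rebuilt)
```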

4 What do the eigenvectors of a covariance matrix represent in the context of data?

Eigen decomposition and its limitations in ML Easy
A. The mean of the data
B. The number of data points
C. The directions of maximum variance in the data
D. The median of the data

5 Singular Value Decomposition (SVD) can be applied to which type of matrices?

Singular value decomposition (SVD) Easy
A. Only diagonal matrices
B. Only square matrices
C. Only symmetric matrices
D. Any rectangular matrix

6 In the SVD of a matrix A = UΣV^T, what does the matrix Σ contain?

Singular value decomposition (SVD) Easy
A. Eigenvectors
B. Class labels
C. Singular values
D. Eigenvalues
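Unlike eigen decomposition, SVD applies to rectangular matrices, as questions 5 and 6 note. A small NumPy sketch on an arbitrary 2x3 example:

```python
import numpy as np

# A 2x3 (rectangular) example matrix.
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])

# s holds the singular values in descending order; U and Vt have
# orthonormal columns/rows.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

assert np.allclose(A, U @ np.diag(s) @ Vt)   # exact reconstruction
assert np.allclose(U.T @ U, np.eye(2))       # orthonormal columns
```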

7 What property do the matrices U and V have in the SVD of a matrix A = UΣV^T?

Singular value decomposition (SVD) Easy
A. They are zero matrices.
B. They are diagonal matrices.
C. They are inverse matrices of each other.
D. They are orthogonal matrices.

8 What do larger singular values in SVD generally represent?

Singular value decomposition (SVD) Easy
A. Less important information in the matrix
B. The dimensions of the matrix
C. More important information or structure in the matrix
D. Noise in the data

9 What is the primary goal of Principal Component Analysis (PCA)?

Principal component analysis (PCA) from a geometric and optimization perspective Easy
A. To predict a continuous target variable
B. To reduce the dimensionality of the data while preserving the most variance
C. To classify data points into different groups
D. To find the mean of the dataset

10 Geometrically, what does PCA find?

Principal component analysis (PCA) from a geometric and optimization perspective Easy
A. The outliers in the dataset
B. The shortest path between data points
C. The clusters present in the data
D. A new coordinate system where axes point in the directions of maximum variance

11 The first principal component (PC1) is the direction that...

Principal component analysis (PCA) from a geometric and optimization perspective Easy
A. Points towards the origin.
B. Is parallel to one of the original axes.
C. Maximizes the variance of the projected data.
D. Minimizes the variance of the projected data.

12 Principal components are calculated as the eigenvectors of which matrix?

Principal component analysis (PCA) from a geometric and optimization perspective Easy
A. The original data matrix
B. The covariance matrix of the data
C. The inverse of the data matrix
D. The identity matrix
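Questions 9-12 describe the standard PCA recipe: mean-center, form the covariance matrix, take its eigenvectors. A sketch on synthetic data where the dominant direction is constructed to be roughly [1, 2]:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
# Two correlated features: the second is ~2x the first, plus small noise.
X = np.column_stack([x, 2 * x + 0.1 * rng.normal(size=200)])

Xc = X - X.mean(axis=0)                 # 1. mean-center
C = np.cov(Xc, rowvar=False)            # 2. covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)    # 3. eigen decomposition (eigh: C is symmetric)

# Principal components = eigenvectors of C, ordered by descending eigenvalue.
order = np.argsort(eigvals)[::-1]
pc1 = eigvecs[:, order[0]]              # direction of maximum variance
```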

13 Are the principal components found by PCA correlated with each other?

Principal component analysis (PCA) from a geometric and optimization perspective Easy
A. Only the first two components are correlated.
B. It depends on the dataset.
C. No, they are uncorrelated (orthogonal).
D. Yes, they are highly correlated.

14 What is the primary goal of Linear Discriminant Analysis (LDA)?

Linear discriminant analysis (LDA) Easy
A. To reduce dimensions by ignoring class labels
B. To cluster unlabeled data
C. To find a lower-dimensional space that maximizes the separability between classes
D. To maximize the variance within each class

15 How does LDA differ fundamentally from PCA?

Linear discriminant analysis (LDA) Easy
A. There is no fundamental difference.
B. LDA is supervised (uses class labels), while PCA is unsupervised.
C. LDA is unsupervised, while PCA is supervised.
D. LDA always finds more dimensions than PCA.

16 To achieve good class separation, LDA aims to maximize the ratio of...

Linear discriminant analysis (LDA) Easy
A. between-class variance to total variance.
B. within-class variance to between-class variance.
C. total variance to within-class variance.
D. between-class variance to within-class variance.

17 What is a common application of LDA?

Linear discriminant analysis (LDA) Easy
A. Pre-processing for classification tasks
B. Recommending products to users
C. Anomaly detection
D. Data compression for storage

18 In the context of a recommendation system, what does the user-item interaction matrix typically contain?

Applications of matrix factorization in recommendation systems Easy
A. Ratings that users have given to items
B. The number of items in stock
C. User demographics
D. Item prices

19 When we factorize a user-item matrix into two smaller matrices (a user-feature matrix and an item-feature matrix), what do the "features" represent?

Applications of matrix factorization in recommendation systems Easy
A. The number of users and items
B. Latent (hidden) features that describe users and items
C. Explicit features like genre or price
D. The original ratings

20 How can matrix factorization be used to predict a rating for an item a user has not yet seen?

Applications of matrix factorization in recommendation systems Easy
A. By taking the dot product of the user's latent feature vector and the item's latent feature vector
B. By finding the average rating of that item
C. By copying the rating from the most similar user
D. It cannot be used for prediction, only for data compression.
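The dot-product prediction in question 20 is one line of code. The latent vectors below are hypothetical values, purely for illustration:

```python
import numpy as np

# Hypothetical learned latent vectors (k = 2), for illustration only.
user_vec = np.array([0.9, 0.3])   # a user's latent features
item_vec = np.array([1.2, 0.4])   # an unseen item's latent features

# Predicted rating for the unseen item = dot product of the two vectors.
predicted_rating = float(user_vec @ item_vec)   # 0.9*1.2 + 0.3*0.4
```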

21 A data matrix X is tall and thin (n x p with n > p). Why can't we directly compute the eigen decomposition of X?

Eigen decomposition and its limitations in ML Medium
A. The matrix does not have a full set of linearly independent columns.
B. Eigen decomposition is computationally too expensive for tall matrices.
C. The matrix must be symmetric for eigen decomposition.
D. Eigen decomposition is only defined for square matrices.

22 If v is an eigenvector of a matrix A with eigenvalue λ, what is the corresponding eigenvalue for the matrix A^2?

Eigen decomposition and its limitations in ML Medium
A. It cannot be determined without knowing v.
B. λ^2
C. 2λ
D. λ
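Whatever the distractors, the underlying identity is easy to verify numerically: if Av = λv, then A(Av) = λ(Av) = λ^2 v, so v is also an eigenvector of A^2, with eigenvalue λ^2:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, Q = np.linalg.eig(A)
lam, v = eigvals[0], Q[:, 0]

# (A @ A) v = A (lam v) = lam^2 v
assert np.allclose(A @ A @ v, lam ** 2 * v)
```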

23 For a real symmetric matrix, what is the geometric relationship between eigenvectors corresponding to distinct (different) eigenvalues?

Eigen decomposition and its limitations in ML Medium
A. They form an acute angle.
B. They are orthogonal.
C. They are parallel.
D. There is no guaranteed relationship.

24 Eigen decomposition is often applied to a covariance matrix in machine learning. What is a significant limitation of this approach if the features have vastly different scales (e.g., one feature in meters and another in kilometers)?

Eigen decomposition and its limitations in ML Medium
A. The eigenvalues become negative, which is not interpretable.
B. The eigenvector corresponding to the feature with the largest scale will dominate the analysis.
C. The computation of eigenvectors becomes numerically unstable.
D. The covariance matrix becomes non-symmetric, making decomposition impossible.

25 Given the SVD of a matrix A as A = UΣV^T, where A is an m x n matrix. The singular values in Σ are the square roots of the non-zero eigenvalues of which matrix?

Singular value decomposition (SVD) Medium
A. A^T A
B. A + A^T
C. A itself
D. A^(-1)

26 You perform SVD on a 1000 x 500 matrix A. What is the maximum possible number of non-zero singular values?

Singular value decomposition (SVD) Medium
A. 500
B. 1000
C. 250
D. 1500

27 In the context of low-rank approximation, truncating the SVD of a matrix A to keep the top k singular values gives a matrix A_k. What optimization problem does A_k solve?

Singular value decomposition (SVD) Medium
A. It maximizes the determinant of A_k.
B. It minimizes the Frobenius norm ||A - A_k||_F among all rank-k matrices.
C. It minimizes the sum of the singular values of A_k.
D. It ensures that A_k is an orthogonal matrix.
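The truncated-SVD property asked about in question 27 can be checked directly: the Frobenius error of the rank-k truncation equals the root-sum-of-squares of the discarded singular values (Eckart-Young-Mirsky). A sketch on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Best rank-k approximation: keep only the top-k singular triplets.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius error = sqrt(s[k]^2 + ... + s[-1]^2), the discarded spectrum.
err = np.linalg.norm(A - A_k, 'fro')
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
```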

28 If a square matrix A is invertible, what is the relationship between the singular values of A and those of its inverse A^(-1)?

Singular value decomposition (SVD) Medium
A. The singular values of A^(-1) are the negatives of the singular values of A.
B. The singular values of A^(-1) are the same as the singular values of A.
C. The singular values of A^(-1) are the reciprocals of the singular values of A.
D. There is no direct relationship between them.

29 From a geometric perspective, what do the principal components of a dataset represent?

Principal component analysis (PCA) from a geometric and optimization perspective Medium
A. A sequence of orthogonal directions that capture the maximum variance in the data.
B. The vectors pointing from the origin to the densest regions of the data.
C. The axes of the original feature space.
D. Directions that best separate the different classes in the data.

30 PCA can be viewed as an optimization problem where we seek to minimize the reconstruction error. What does this reconstruction error physically represent?

Principal component analysis (PCA) from a geometric and optimization perspective Medium
A. The number of data points misclassified by the projection.
B. The variance of the data projected onto the last principal component.
C. The total variance of the original dataset.
D. The sum of squared distances from each data point to its projection onto the principal component subspace.

31 You apply PCA to a dataset and find the eigenvalues of the covariance matrix are [10, 8, 0.1, 0.05]. What does this suggest about the dimensionality of your data?

Principal component analysis (PCA) from a geometric and optimization perspective Medium
A. The features are completely uncorrelated.
B. The data is uniformly distributed in a 4-dimensional space.
C. The data can be effectively represented in 2 dimensions with minimal information loss.
D. The data requires all 4 dimensions for an accurate representation.
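The reasoning behind question 31 is a quick computation of explained-variance ratios from the given eigenvalues:

```python
import numpy as np

eigvals = np.array([10.0, 8.0, 0.1, 0.05])
explained = eigvals / eigvals.sum()    # fraction of total variance per component

top2 = explained[:2].sum()   # share of variance kept by a 2-D projection
# top2 is over 99%, so 2 dimensions suffice with minimal information loss.
```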

32 What is the primary reason for mean-centering the data (subtracting the mean of each feature) before performing PCA?

Principal component analysis (PCA) from a geometric and optimization perspective Medium
A. To reduce the number of principal components needed.
B. To make the data matrix invertible.
C. To ensure all eigenvalues are positive.
D. To ensure the first principal component describes the direction of maximum variance, not the mean of the data.

33 What is the primary objective of Linear Discriminant Analysis (LDA) in the context of dimensionality reduction?

Linear discriminant analysis (LDA) Medium
A. To find a projection that maximizes the separation between classes.
B. To find a projection that minimizes the within-class variance, regardless of class separation.
C. To find a projection that maximizes the variance of the entire dataset.
D. To find a projection that makes the features uncorrelated.

34 You are working on a classification problem with 4 distinct classes. What is the maximum number of dimensions you can reduce your data to using LDA?

Linear discriminant analysis (LDA) Medium
A. 3
B. Dependent on the number of features.
C. 4
D. 2

35 LDA finds its projection vectors by solving a generalized eigenvalue problem of the form S_B w = λ S_W w. What do S_B and S_W represent?

Linear discriminant analysis (LDA) Medium
A. S_B is the sample covariance matrix and S_W is the identity matrix.
B. S_B is the data matrix X and S_W is its transpose X^T.
C. S_B is the between-class scatter matrix and S_W is the within-class scatter matrix.
D. S_B is the between-class scatter matrix and S_W is the total scatter matrix.

36 Under what condition would PCA and LDA produce very similar results for dimensionality reduction in a classification task?

Linear discriminant analysis (LDA) Medium
A. When the number of features is much larger than the number of samples.
B. PCA and LDA can never produce similar results because their objectives are fundamentally different.
C. When the direction of maximum variance in the data also happens to be the direction that best separates the classes.
D. When the data is perfectly balanced across all classes.

37 In a recommendation system based on matrix factorization, we decompose a user-item rating matrix R into two lower-rank matrices P (users) and Q (items). What is the main purpose of this decomposition?

Applications of matrix factorization in recommendation systems Medium
A. To find the exact, original ratings for every user-item pair.
B. To learn latent features for users and items that can be used to predict missing ratings.
C. To reduce the storage space of the rating matrix by a guaranteed factor.
D. To identify the most popular items across all users.

38 A key challenge in matrix factorization for recommendation systems is the sparsity of the user-item matrix. How do algorithms typically handle the many missing entries during the training process?

Applications of matrix factorization in recommendation systems Medium
A. They treat all missing entries as a rating of zero.
B. They remove all users and items with too many missing ratings.
C. They calculate the prediction error and update the model parameters only on the observed ratings.
D. They fill the missing entries with the global average rating before factorization.
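A minimal SGD sketch for questions 37-38, under the usual setup (hypothetical ratings dictionary, random initialization): the error and the gradient updates touch only the observed (user, item) pairs; missing entries are simply never visited.

```python
import numpy as np

rng = np.random.default_rng(2)
n_users, n_items, k = 4, 5, 2
# Sparse observed ratings, keyed by (user, item); unrated pairs are absent.
ratings = {(0, 1): 5.0, (0, 3): 3.0, (1, 0): 4.0, (2, 2): 2.0, (3, 4): 1.0}

P = 0.1 * rng.normal(size=(n_users, k))   # user latent factors
Q = 0.1 * rng.normal(size=(n_items, k))   # item latent factors
lr, reg = 0.05, 0.01

for _ in range(500):
    for (u, i), r in ratings.items():
        err = r - P[u] @ Q[i]          # error on an *observed* rating only
        pu = P[u].copy()
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * pu - reg * Q[i])
```

The `reg` term is the L2 regularization discussed in question 40; removing it lets the factors fit the observed ratings exactly but overfit badly on sparse data.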

39 In a matrix factorization model, we have a user latent vector u = [1, 2] and two item latent vectors, v1 = [2, 1] and v2 = [1, 3]. Assuming ratings are predicted by the dot product, which item would be recommended to the user and why?

Applications of matrix factorization in recommendation systems Medium
A. Both items, because they have positive values in their latent vectors.
B. Item v2, because its dot product with the user vector is higher.
C. Item v1, because its dot product with the user vector is higher.
D. Neither item, because the user vector is not a unit vector.

40 What is a common method to prevent overfitting in matrix factorization models for recommendation systems?

Applications of matrix factorization in recommendation systems Medium
A. Initializing the user and item matrices with random noise from a uniform distribution.
B. Using only users who have rated a large number of items.
C. Increasing the number of latent factors (k) until the training error is zero.
D. Adding a regularization term (e.g., L2 regularization) to the loss function.

41 A non-symmetric matrix A is not guaranteed to be diagonalizable. In the context of a machine learning model where A represents state transitions in a discrete-time dynamical system, what is the primary implication if its eigen decomposition does not exist?

Eigen decomposition and its limitations in ML Hard
A. The system is inherently unstable and the state will always diverge to infinity.
B. The Jordan Normal Form must be used to understand the system's dynamics, which may involve transformations beyond simple scaling along eigenvector directions (e.g., shear transformations).
C. The model's long-term behavior cannot be analyzed using powers of A.
D. The matrix must be singular, meaning its determinant is zero.

42 Given the SVD of a matrix A = UΣV^T, where A is an m x n matrix and rank(A) = r. The pseudoinverse is A^+ = VΣ^+U^T. What is the precise geometric interpretation of the operator represented by the projection matrix AA^+?

Singular value decomposition (SVD) Hard
A. It is an orthogonal projection from R^m onto the column space of A (the range of A).
B. It is the identity operator on the column space of A.
C. It is an orthogonal projection from R^n onto the row space of A (the range of A^T).
D. It is an orthogonal projection from R^m onto the null space of A^T (the left null space of A).

43 Consider a dataset X whose covariance matrix C has eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_p. You apply a linear transformation Y = XR to the data, where R is an orthogonal matrix (R^T R = I). What are the principal components and their corresponding variances (eigenvalues) for the transformed data Y?

Principal component analysis (PCA) from a geometric and optimization perspective Hard
A. The principal components are the eigenvectors of C, and the variances are unchanged.
B. The principal components are the columns of R, and the variances are the diagonal elements of C.
C. The principal components are rotated, but the variances (eigenvalues) remain unchanged.
D. Both the principal components and their variances change unpredictably.

44 In Fisher's LDA, we maximize J(w) = (w^T S_B w) / (w^T S_W w), where S_B is the between-class scatter and S_W is the within-class scatter. If S_W is singular, which can happen in high-dimensional settings (p > N), what is the most robust and standard procedure to make LDA applicable?

Linear discriminant analysis (LDA) Hard
A. Add a small multiple of the identity matrix to S_W (i.e., S_W + εI) to make it invertible, a form of regularization.
B. Use the Moore-Penrose pseudoinverse of S_W to solve the generalized eigenvalue problem.
C. The problem is unsolvable as the Fisher criterion is undefined.
D. First apply PCA to reduce dimensionality to a subspace where S_W becomes non-singular, then apply LDA in that subspace.

45 In a recommendation system using matrix factorization, the objective function is min Σ (r_ui - p_u · q_i)^2 + λ(||p_u||^2 + ||q_i||^2), summed over the observed ratings. If a user has rated only one item with a rating of 5, and the regularization parameter λ is very large (approaching infinity), what will the learned latent vector p_u converge to?

Applications of matrix factorization in recommendation systems Hard
A. A vector with a very large norm, pointing in the same direction as q_i.
B. A near-zero vector.
C. A vector with a very large norm, orthogonal to q_i.
D. The exact zero vector.

46 A real symmetric matrix is always diagonalizable by an orthogonal matrix. If a real matrix A is known to be diagonalizable, but is not symmetric, what can we definitively conclude about its eigenvectors?

Eigen decomposition and its limitations in ML Hard
A. The matrix of eigenvectors can always be chosen to be an orthogonal matrix.
B. The eigenvectors corresponding to distinct real eigenvalues are orthogonal.
C. The eigenvectors are linearly independent but may not be orthogonal.
D. The eigenvectors are guaranteed to be orthogonal.

47 According to the Eckart-Young-Mirsky theorem, the best rank-k approximation of a matrix A in the Frobenius norm is A_k = Σ_{i=1..k} σ_i u_i v_i^T. If A is a square, invertible matrix of size n x n with singular values σ_1 ≥ σ_2 ≥ ... ≥ σ_n > 0, what is the exact Frobenius norm of the error of the best rank-(n-1) approximation, ||A - A_{n-1}||_F?

Singular value decomposition (SVD) Hard
A. σ_n
B. σ_1
C. sqrt(σ_1^2 + ... + σ_{n-1}^2)
D. 0

48 You perform PCA on a dataset with p features and n samples (n > p). You find that the last k eigenvalues of the data's covariance matrix are exactly zero (λ_{p-k+1} = ... = λ_p = 0). What is the most precise geometric interpretation of this result?

Principal component analysis (PCA) from a geometric and optimization perspective Hard
A. The data points lie perfectly within a (p - k)-dimensional affine subspace of the original p-dimensional space.
B. The first p - k components capture all the information, and the last k can be discarded with no information loss.
C. The dataset contains categorical features that were improperly encoded.
D. The dataset has k features that are pure noise.

49 Consider a binary classification problem where the two classes have identical, spherical covariance matrices (i.e., Σ_1 = Σ_2 = σ^2 I) and their means are μ_1 and μ_2. In this specific scenario, the optimal projection vector w found by LDA is parallel to which vector?

Linear discriminant analysis (LDA) Hard
A. A vector orthogonal to the vector connecting the class means.
B. The vector connecting the class means, μ_2 - μ_1.
C. The direction is undefined because the within-class scatter matrix is proportional to the identity matrix.
D. The first principal component of the combined dataset.

50 When using Alternating Least Squares (ALS) for matrix factorization, the algorithm alternates between solving for user factors P (given item factors Q) and item factors Q (given P). Why is this approach particularly well-suited for large-scale distributed computation compared to simultaneous Stochastic Gradient Descent (SGD)?

Applications of matrix factorization in recommendation systems Hard
A. ALS requires significantly fewer iterations to converge than SGD.
B. Each step of ALS involves solving for user (or item) factors independently, which are embarrassingly parallel subproblems.
C. SGD cannot handle the sparse rating matrix, while ALS is specifically designed for it.
D. ALS is guaranteed to find the global minimum of the non-convex problem, whereas SGD is not.

51 A Markov chain is described by a row-stochastic transition matrix P. The Perron-Frobenius theorem guarantees that its largest eigenvalue is 1. What is the significance of the corresponding left eigenvector, π, which satisfies πP = π?

Eigen decomposition and its limitations in ML Hard
A. It is the stationary distribution of the Markov chain, describing the long-term probability of being in each state.
B. It represents the initial distribution of the states.
C. It is always a uniform distribution, indicating all states are equally likely in the long run.
D. It is a vector of all ones, which is the right eigenvector for the eigenvalue 1.
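The stationary distribution in question 51 can be found by repeatedly applying π <- πP (power iteration on the left eigenvector for eigenvalue 1). A small two-state example:

```python
import numpy as np

# Row-stochastic transition matrix: each row sums to 1.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

pi = np.array([0.5, 0.5])   # any initial distribution works
for _ in range(100):
    pi = pi @ P              # left-multiplication: pi <- pi P

# pi has converged to the stationary distribution: pi P = pi.
assert np.allclose(pi, pi @ P)
```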

52 A very large, dense matrix has a singular value spectrum that decays exponentially fast (e.g., σ_k ∝ e^(-αk) for some α > 0). What is the most important practical implication of this property?

Singular value decomposition (SVD) Hard
A. The matrix columns are nearly orthogonal, making it well-conditioned.
B. The matrix represents a chaotic system with high intrinsic dimensionality.
C. The matrix can be accurately approximated by a matrix of very low rank, enabling significant data compression and faster computations.
D. The matrix is nearly singular and numerically difficult to invert.

53 How does Probabilistic PCA (PPCA) fundamentally differ from standard PCA in its formulation and assumptions?

Principal component analysis (PCA) from a geometric and optimization perspective Hard
A. PPCA allows for non-orthogonal principal components, while standard PCA enforces orthogonality.
B. Standard PCA is an iterative algorithm while PPCA has a closed-form solution.
C. PPCA maximizes data likelihood under a generative latent variable model with isotropic Gaussian noise, while standard PCA is the deterministic, zero-noise limit of this model.
D. Standard PCA minimizes the L2 reconstruction error, whereas PPCA minimizes the L1 reconstruction error, making it more robust to outliers.

54 For a multi-class classification problem with C classes and p features, what is the maximum rank of the between-class scatter matrix S_B, and what is the direct consequence of this for the dimensionality reduction performed by LDA?

Linear discriminant analysis (LDA) Hard
A. The rank is at most N - 1, where N is the number of samples.
B. The rank is at most C, so LDA can project to at most C dimensions.
C. The rank is at most C - 1, so LDA can project to at most C - 1 dimensions.
D. The rank is at most p, so LDA can project to at most p dimensions.

55 In collaborative filtering, a common first step before applying SVD is to fill missing ratings in the user-item matrix R. How does the naive strategy of imputing missing values with the global mean rating bias the resulting model, especially for users with very few ratings?

Applications of matrix factorization in recommendation systems Hard
A. It causes the model to shrink all predictions towards the global mean, an effect identical to L2 regularization.
B. It has no significant effect as SVD is robust to such imputations.
C. It correctly centers the data, leading to a more accurate model.
D. It strongly biases the latent factor vectors of sparse users toward a 'generic' profile that primarily reflects average rating behavior, obscuring their unique tastes.

56 You are given a 2D dataset with two classes that form two long, thin, parallel clusters. The direction of maximum variance for the combined data is along the length of the clusters, while the direction that best separates them is orthogonal to their length. If you must reduce the data to 1 dimension for classification, which statement is most accurate?

Principal component analysis (PCA) from a geometric and optimization perspective Hard
A. PCA will perform better because it captures the global structure of the data.
B. LDA will perform much better because it will find the projection that maximizes the separation between the class means.
C. Both will perform equally well as they will identify the same primary axis.
D. Neither will be effective; a non-linear method like kernel PCA is required.

57 For a tall m x n matrix A with m >> n, computing the full SVD (A = UΣV^T) is inefficient due to the size of the m x m matrix U. How does the 'Thin SVD' (or 'Economy SVD') provide a more efficient but still exact representation?

Singular value decomposition (SVD) Hard
A. Thin SVD sets all singular values below a threshold to zero, yielding a low-rank approximation.
B. Thin SVD computes the SVD of the smaller matrix A^T A to avoid dealing with the large dimension m.
C. Thin SVD only computes the first n columns of U (an m x n matrix) and the top-left n x n block of Σ, which is sufficient to perfectly reconstruct A.
D. Thin SVD is an iterative algorithm that approximates the SVD, while full SVD is a direct method.
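NumPy's `full_matrices=False` flag computes exactly this thin SVD; U comes back m x n rather than m x m, yet the reconstruction is still exact:

```python
import numpy as np

m, n = 1000, 5
A = np.random.default_rng(3).normal(size=(m, n))

# Thin (economy) SVD: U is m x n, not m x m.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

assert U.shape == (1000, 5)
assert np.allclose(A, U @ np.diag(s) @ Vt)   # still an exact representation
```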

58 The standard Power Iteration algorithm finds the eigenvector corresponding to the eigenvalue with the largest magnitude. How can this method be adapted to find the eigenvalue of a matrix A that is closest to a specific target value μ?

Eigen decomposition and its limitations in ML Hard
A. By applying Power Iteration to the matrix A^(-1) and taking the reciprocal of the result.
B. By applying Power Iteration to the matrix A - μI.
C. It is not possible; Power Iteration is fundamentally limited to finding the dominant eigenvalue.
D. By applying Power Iteration to the matrix (A - μI)^(-1), an approach known as Inverse Iteration with a shift.
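Inverse Iteration with a shift takes only a few lines: power-iterate on (A - μI)^(-1), whose dominant eigenvector is the eigenvector of A nearest μ. A sketch with an arbitrary matrix and target:

```python
import numpy as np

# Arbitrary symmetric example; its eigenvalues are (7 +/- sqrt(5)) / 2.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
mu = 2.0   # target: find the eigenvalue of A closest to 2

# Power iteration on the shifted inverse (A - mu*I)^(-1).
M = np.linalg.inv(A - mu * np.eye(2))
v = np.array([1.0, 0.0])
for _ in range(50):
    v = M @ v
    v /= np.linalg.norm(v)

lam = v @ A @ v   # Rayleigh quotient recovers the eigenvalue of A itself
```

In practice one solves the linear system (A - μI)x = v at each step instead of forming the explicit inverse, but the idea is the same.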

59 From an optimization perspective, PCA can be derived by finding a low-dimensional representation Z of the data X and a transformation matrix W that minimizes the reconstruction error ||X - ZW^T||_F^2. What essential constraint must be placed on the columns of W for the solution to be the standard PCA projection?

Principal component analysis (PCA) from a geometric and optimization perspective Hard
A. W must be a lower triangular matrix to ensure a unique solution.
B. The columns of W must form an orthonormal set (W^T W = I).
C. No constraints are needed; ordinary least squares minimization automatically yields the principal components.
D. The rows of W must form an orthonormal set (W W^T = I).

60 In modern matrix factorization models, the prediction for a rating is often modeled as r_ui = μ + b_u + b_i + p_u · q_i. What is the primary motivation for explicitly modeling the global bias μ, user bias b_u, and item bias b_i?

Applications of matrix factorization in recommendation systems Hard
A. To ensure the latent factors p_u and q_i have a zero mean and unit variance.
B. It is a form of regularization that is more effective at preventing overfitting than a simple L2 penalty.
C. To make the overall optimization problem convex, guaranteeing a global minimum.
D. To account for systematic rating tendencies (e.g., some users are consistently harsh raters, some items are universally popular) so that the latent factors can model true user-item preference interactions.