1 $In the context of linear algebra, which of the following best describes a scalar?$

Scalars Easy

A.

A two-dimensional array of numbers

B.

A single number

C.

A multi-dimensional array of numbers

D.

A one-dimensional array of numbers

2 $A 2-dimensional array of numbers, arranged in rows and columns, is known as a:$

Matrices Easy

A.

Scalar

B.

Matrix

C.

Vector

D.

Tensor

3 $What is a vector?$

Vectors Easy

A.

A single number with magnitude only

B.

A mathematical constant

C.

A 2D grid of numbers

D.

An ordered list of numbers

4 $How is a tensor best described in relation to scalars, vectors, and matrices?$

Tensors Easy

A.

A tensor is a generalization of scalars, vectors, and matrices to any number of dimensions.

B.

A tensor is another name for a scalar.

C.

A tensor is a specific type of vector.

D.

A tensor is always a 3-dimensional array.

5 $In the equation Av = λv, where A is a matrix and v is a non-zero vector, what does λ represent?$

Eigenvalues and Eigenvectors Easy

A.

Eigenvalue

B.

Eigenmatrix

C.

Eigenvector

D.

Eigendirection

6 $The probability of an event is always a number between:$

Probability Foundations Easy

A.

0 and 1

B.

0 and infinity

C.

0 and 100

D.

-1 and 1

7 $What is a random variable?$

Random variables Easy

A.

A constant value in an experiment

B.

A variable whose value is a numerical outcome of a random phenomenon

C.

A variable that has no defined value

D.

A variable that is not a number

8 $A function that describes the likelihood of all possible outcomes for a random variable is called a:$

Probability distribution Easy

A.

Mean function

B.

Loss function

C.

Probability distribution

D.

Activation function

9 $The mean, or expected value, of a dataset represents its:$

Mean Easy

A.

Spread or dispersion

B.

Most frequently occurring value

C.

Middle value

D.

Central tendency or average

10 $In statistics, what does variance measure?$

Variance Easy

A.

The spread of the data around the mean

B.

The most common value in the dataset

C.

The average value of the data

D.

The relationship between two different variables

11 $A positive covariance between two variables indicates that:$

Covariance Easy

A.

As one variable increases, the other tends to decrease

B.

As one variable increases, the other tends to increase

C.

The two variables are not related

D.

Both variables are always positive

12 $The probability of an event A occurring, given that event B has already occurred, is known as:$

Joint, Marginal and Conditional Probability Easy

A.

Joint Probability

B.

Prior Probability

C.

Conditional Probability

D.

Marginal Probability

13 $What is the primary purpose of Bayes' Theorem?$

Bayes' Theorem Easy

A.

To measure the spread of a probability distribution

B.

To calculate the average of a probability distribution

C.

To define the independence of two events

D.

To update the probability of a hypothesis based on new evidence

14 $Which statement is true when comparing Likelihood and Probability?$

Likelihood vs Probability Easy

A.

Probability is for discrete data and Likelihood is for continuous data.

B.

The sum of all likelihoods must equal 1.

C.

Probability refers to future outcomes given fixed parameters, while Likelihood refers to parameters given observed outcomes.

D.

Likelihood and Probability are interchangeable terms.

15 $In mathematics, a function is a rule that:$

Functions Easy

A.

Assigns exactly one output to each input

B.

Is a collection of random numbers

C.

Can assign multiple outputs to a single input

D.

Always returns a positive number

16 $The gradient of a function at a point gives the direction of the:$

Gradient Easy

A.

Steepest ascent

B.

Curve

C.

Minimum value

D.

Steepest descent

17 $When taking the partial derivative of a multivariable function with respect to one variable, how are the other variables treated?$

Partial derivatives Easy

A.

They are ignored

B.

As variables to be differentiated

C.

As constants

D.

As zero

18 $The Chain Rule is a formula for computing the derivative of a:$

Chain Rule Easy

A.

Sum of two functions

B.

Quotient of two functions

C.

Composite function

D.

Product of two functions

19 $The probability of two or more events occurring together, such as P(A ∩ B), is called:$

Joint, Marginal and Conditional Probability Easy

A.

Independent Probability

B.

Conditional Probability

C.

Joint Probability

D.

Marginal Probability

20 $In training a machine learning model, what is the primary role of the gradient of the loss function?$

Calculus for ML Easy

A.

To select the best features from the input data

B.

To determine the number of training epochs

C.

To initialize the model's weights

D.

To guide the updating of model parameters to minimize the loss

21 $Let A be a square matrix and v be an eigenvector of A with a corresponding eigenvalue λ. What is the result of the transformation Av?$

Eigenvalues and Eigenvectors Medium

A.

The vector is scaled by the eigenvalue, resulting in λv.

B.

The vector is unchanged.

C.

The result is the scalar value λ.

D.

The vector is rotated but not scaled.
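The eigenvector relation can be checked numerically on a small case. The sketch below uses a hypothetical 2x2 matrix and vector chosen for illustration (they are not part of the quiz) and compares the matrix-vector product with the scaled vector.

```python
# Hypothetical 2x2 example: v = (1, 1) is an eigenvector of A with eigenvalue 3.
A = [[2.0, 1.0], [1.0, 2.0]]
v = [1.0, 1.0]
lam = 3.0

# Matrix-vector product A v, computed row by row.
Av = [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]
# The same vector scaled by the eigenvalue.
scaled = [lam * x for x in v]
print(Av, scaled)
```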

22 $What is the gradient of the function at the point ?$

Gradient Medium

A.

B.

C.

D.

23 $A spam filter is 90% accurate at detecting spam (True Positive Rate), and has a 95% accuracy for not marking a non-spam email as spam (True Negative Rate). If 10% of all emails are spam, what is the probability that an email flagged as spam is actually non-spam?$

Bayes' Theorem Medium

A.

~48.7%

B.

~33.3%

C.

~90.0%

D.

~5.2%

24 $A rare disease affects 1 in 1000 people. A test for this disease has a 99% true positive rate (sensitivity) and a 98% true negative rate (specificity). If a randomly selected individual tests positive, what is the approximate probability that they actually have the disease?$

Bayes' Theorem Medium

A.

~1.0%

B.

~82.5%

C.

~4.7%

D.

~99.0%
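Questions of this kind can be checked numerically with Bayes' theorem. The sketch below is a minimal illustration; the helper name `posterior_given_positive` and its parameter names are assumptions made here, not part of the quiz.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """P(condition | positive test) via Bayes' theorem."""
    true_pos = sensitivity * prior                    # P(+ | D) * P(D)
    false_pos = (1.0 - specificity) * (1.0 - prior)   # P(+ | not D) * P(not D)
    return true_pos / (true_pos + false_pos)

# Numbers from the question: prevalence 1/1000, sensitivity 99%, specificity 98%.
p = posterior_given_positive(prior=0.001, sensitivity=0.99, specificity=0.98)
print(f"{p:.1%}")
```

Because the disease is rare, false positives from the large healthy population outnumber true positives, which is why the posterior stays small despite an accurate test.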

25 $If the covariance between two random variables, X and Y, is calculated to be zero (Cov(X, Y) = 0), what can be definitively concluded?$

Covariance Medium

A.

The variance of X is equal to the variance of Y.

B.

Either X or Y must be a constant.

C.

There is no linear relationship between X and Y.

D.

X and Y are statistically independent.
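Zero covariance and independence are easy to confuse, so a small exact example may help. The sketch below assumes a hypothetical pair of variables (one uniform on {-1, 0, 1}, the other its square) that is not taken from the quiz.

```python
from fractions import Fraction

# Hypothetical example: X is uniform on {-1, 0, 1} and Y = X**2.
xs = [-1, 0, 1]
prob = Fraction(1, 3)

e_x = sum(prob * x for x in xs)            # E[X]
e_y = sum(prob * x**2 for x in xs)         # E[Y]
e_xy = sum(prob * x * x**2 for x in xs)    # E[XY] = E[X^3]
cov_xy = e_xy - e_x * e_y

# Independence would require P(X=0, Y=0) == P(X=0) * P(Y=0).
p_joint = prob            # Y = 0 exactly when X = 0
p_product = prob * prob   # P(X=0) * P(Y=0) = 1/3 * 1/3
print(cov_xy, p_joint == p_product)
```

Here the covariance is exactly zero even though one variable is a deterministic function of the other, so only the absence of a linear relationship follows.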

26 $For the function, find the partial derivative with respect to, denoted as .$

Partial derivatives Medium

A.

B.

C.

D.

27 $Consider the following joint probability distribution for two discrete random variables, X and Y:$

| | Y=0 | Y=1 |
|:---:|:---:|:---:|
| X=0 | 0.1 | 0.4 |
| X=1 | 0.2 | 0.3 |

$What is the conditional probability P(X=1 | Y=1)?$

Joint, Marginal and Conditional Probability Medium

A.

0.429

B.

0.7

C.

0.3

D.

0.5
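A conditional probability from a joint table is the joint entry divided by the matching marginal. The sketch below takes P(X=1 | Y=1) as an illustrative query; the dictionary layout and the function name `conditional_x_given_y` are assumptions made for this example.

```python
# Joint table from the question: joint[(x, y)] = P(X=x, Y=y).
joint = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.2, (1, 1): 0.3}

def conditional_x_given_y(x, y):
    """P(X=x | Y=y) = P(X=x, Y=y) / P(Y=y)."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)  # marginal of Y
    return joint[(x, y)] / p_y

print(round(conditional_x_given_y(1, 1), 3))
```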

28 $If matrix has dimensions and matrix has dimensions, what are the dimensions of the resulting matrix product ?$

Matrices Medium

A.

B.

C.

D.

The product is not defined.

29 $In a simple neural network, the output is, and the loss is . What is the partial derivative of the loss with respect to the weight, ?$

Chain Rule Medium

A.

B.

C.

D.

30 $You toss a coin 10 times and observe 7 heads. Let be the probability of getting a head on a single toss. The function is best described as:$

Likelihood vs Probability Medium

A.

The joint probability of the data and the parameter .

B.

The probability of the parameter given the observed data.

C.

The likelihood of the parameter given the observed data.

D.

The probability of observing the data.
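The difference between a likelihood and a probability distribution shows up when the data are held fixed and the parameter varies. The sketch below assumes a binomial model for the 7-heads-in-10 data and uses the name `p` for the head probability; both are illustrative choices.

```python
from math import comb

def likelihood(p, heads=7, tosses=10):
    """Binomial likelihood of p for the fixed observed data (7 heads in 10)."""
    return comb(tosses, heads) * p**heads * (1 - p)**(tosses - heads)

# Evaluate on a grid of candidate parameter values in [0, 1].
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=likelihood)
# Crude Riemann sum of the likelihood over [0, 1].
area = sum(likelihood(p) for p in grid) / 100
print(best_p, round(area, 3))
```

The grid maximum sits at the observed head frequency, and the area under the curve over the parameter range is well below 1, so the curve is not a probability density in the parameter.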

31 $What are the eigenvalues of the matrix ?$

Eigenvalues and Eigenvectors Medium

A.

B.

C.

D.

32 $For a continuous random variable X with a probability density function (PDF) f(x), which statement is correct regarding the probability of X taking on a specific value a?$

Probability distribution Medium

A.

P(X = a) cannot be determined.

B.

P(X = a) is the area under the curve at point a.

C.

D.

33 $A batch of 32 grayscale images, each with a resolution of 64x64 pixels, is to be fed into a neural network. What is the rank (or number of axes) of the tensor required to represent this data?$

Tensors Medium

A.

4

B.

1

C.

2

D.

3

34 $Given a random variable with and a constant . What is the variance of the new random variable, i.e., ?$

Variance Medium

A.

36

B.

41

C.

18

D.

23

35 $In the context of machine learning optimization, why is the negative gradient (−∇J) used in the gradient descent algorithm?$

Gradient Medium

A.

It points towards the global maximum of the cost function.

B.

It is orthogonal to the direction of steepest descent.

C.

It is always a vector of negative values, which simplifies calculations.

D.

It points in the direction of the steepest descent of the cost function.
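The role of the negative gradient can be seen on a one-dimensional toy problem. The sketch below assumes a hypothetical cost (w - 3)**2 and an arbitrary learning rate; neither comes from the quiz.

```python
def grad(w):
    """Gradient of the toy cost J(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)

w = 0.0     # arbitrary starting point
lr = 0.1    # learning rate (assumed value)
for _ in range(100):
    w -= lr * grad(w)   # step along the negative gradient

print(round(w, 4))
```

Each step moves against the gradient, so the iterate approaches the minimizer of the toy cost at w = 3.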

36 $Given two vectors and . What is the dot product ?$

Vectors Medium

A.

-14

B.

12

C.

The dot product is not defined for these vectors.

D.

32

37 $In the formulation of Bayes' Theorem,, the term is known as what?$

Bayes' Theorem Medium

A.

Prior Probability

B.

Posterior Probability

C.

Likelihood

D.

Evidence

38 $In Principal Component Analysis (PCA), the principal components of a dataset are found by computing the eigenvectors of the data's covariance matrix. How are these principal components typically ordered?$

Eigenvalues and Eigenvectors Medium

A.

By the ascending order of their corresponding eigenvalues.

B.

By the descending order of their corresponding eigenvalues.

C.

Randomly, as the order does not matter.

D.

Alphabetically by the name of the original features.

39 $A data scientist is measuring the exact time (in seconds) it takes for a user to click a 'buy' button after a webpage loads. What type of random variable is this measurement?$

Random variables Medium

A.

Continuous random variable

B.

Bernoulli random variable

C.

Discrete random variable

D.

Categorical random variable

40 $The cost function for Linear Regression is Mean Squared Error (MSE), given by . This function is widely used because it has a special property that guarantees gradient descent will find the global minimum. What is this property?$

Functions Medium

A.

It is a linear function.

B.

It is a non-negative function.

C.

It is a discontinuous function.

D.

It is a convex function.

41 $If two events, A and B, are independent, which of the following statements correctly describes their joint probability, P(A ∩ B)?$

Joint, Marginal and Conditional Probability Medium

A.

B.

C.

D.

42 $Let be a real symmetric matrix with eigenvalues . The Rayleigh quotient is defined as for a non-zero vector . What is the maximum value of and for which vector is it achieved?$

Eigenvalues and Eigenvectors Hard

A.

The maximum value is, achieved when is a linear combination of the corresponding eigenvectors.

B.

The maximum value is, achieved when is the eigenvector corresponding to .

C.

The maximum value is the trace of A,, achieved when is a vector of all ones.

D.

The maximum value is, achieved when is the eigenvector corresponding to .

43 $Consider the L2 regularized loss function for linear regression:, where,,, and is a scalar regularization parameter. What is the gradient ?$

Gradient Hard

A.

B.

C.

D.

44 $In a generative model for classification, we model the class-conditional densities and the class priors . The posterior probability is then derived using Bayes' theorem. If we assume that for all classes, the class-conditional densities are Gaussian distributions with a shared covariance matrix but different means, what form does the decision boundary between any two classes and take?$

Bayes' Theorem Hard

A.

Quadratic

B.

A combination of exponential functions

C.

Circular

D.

Linear

45 $If the covariance matrix of a random vector is diagonal, which of the following statements is the most precise and universally true conclusion?$

Covariance Hard

A.

The random variables and are uncorrelated.

B.

The variances of and must be equal.

C.

The random variables and are independent.

D.

The joint probability distribution must be a Gaussian distribution.

46 $Let be a scalar loss function. The output of a layer is a vector, where . Here, is a weight matrix, is the input, is the bias, and is an element-wise activation function. Using the chain rule, what is the partial derivative of the loss with respect to the weight matrix, ?$

Chain Rule Hard

A.

B.

C.

D.

47 $Consider a coin toss experiment modeled by a Bernoulli distribution with parameter (probability of heads). You observe a sequence of outcomes: D = {Heads, Tails, Heads}. Which of the following statements correctly describes the likelihood function ?$

Likelihood vs Probability Hard

A.

The likelihood is, which is calculated by normalizing over all possible data .

B.

. As a function of, it is a valid probability density function.

C.

is the probability of observing the data, and it must be less than or equal to 1.

D.

. As a function of, it is not a probability distribution and its integral over is not necessarily 1.
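Whether a likelihood integrates to 1 over the parameter can be checked directly for this data set. The sketch below assumes independent tosses and uses the name `theta` for the probability of heads; both are illustrative assumptions.

```python
from fractions import Fraction

def likelihood(theta):
    """Likelihood of theta for D = {Heads, Tails, Heads} under independent tosses."""
    return theta * (1 - theta) * theta

# Exact integral over [0, 1]: the antiderivative of theta^2 - theta^3
# is theta^3/3 - theta^4/4, which evaluates to 1/3 - 1/4 at theta = 1.
integral = Fraction(1, 3) - Fraction(1, 4)
print(likelihood(0.5), integral)
```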

48 $A real matrix A is positive semi-definite (PSD) if xᵀAx ≥ 0 for all non-zero vectors x. Which of the following conditions is NOT equivalent to matrix A being PSD?$

Matrices Hard

A.

The matrix A can be decomposed as A = BᵀB for some matrix B.

B.

The determinant of A is non-negative.

C.

All principal minors of A are non-negative.

D.

All eigenvalues of A are non-negative.
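A small counterexample shows why a sign condition on the determinant alone says little about the quadratic form. The matrix below is a hypothetical choice for illustration and is not taken from the quiz.

```python
# Hypothetical counterexample: A = -I (2x2) is symmetric with det(A) = 1 >= 0.
A = [[-1.0, 0.0], [0.0, -1.0]]

def quad_form(A, x):
    """Compute x^T A x for a 2x2 matrix A and a length-2 vector x."""
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
q = quad_form(A, [1.0, 0.0])
print(det, q)
```

The determinant is positive, yet the quadratic form is negative for the chosen vector, so this matrix is not positive semi-definite.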

49 $In tensor algebra, a contraction is a generalization of the matrix trace operation. Consider a rank-3 tensor with components and a rank-2 tensor (matrix) with components . The operation defined by represents a contraction. What is the rank of the resulting tensor ?$

Tensors Hard

A.

Rank 3

B.

Rank 2

C.

Rank 1

D.

Rank 5

50 $Let and be continuous random variables with a joint PDF that is non-zero only in the square region where and . Given within this region, what is the conditional probability density function ?$

Joint, Marginal and Conditional Probability Hard

A.

B.

C.

D.

51 $For a twice-differentiable multivariable function f used as a loss function in machine learning, a critical point (where ∇f = 0) is a saddle point if the Hessian matrix is:$

Functions Hard

A.

The zero matrix.

B.

Negative semi-definite but not negative definite.

C.

Positive semi-definite but not positive definite.

D.

Indefinite (has both positive and negative eigenvalues).
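The behaviour near a saddle point can be illustrated with a standard toy function. The sketch below assumes the hypothetical example x**2 - y**2, whose critical point is the origin; it is not part of the quiz.

```python
def f(x, y):
    """Toy function with a critical point at the origin."""
    return x**2 - y**2

# The Hessian of f is constant: [[2, 0], [0, -2]], with eigenvalues 2 and -2.
hessian_eigenvalues = [2.0, -2.0]
step = 0.1
up = f(step, 0.0) - f(0.0, 0.0)     # moving along x raises f
down = f(0.0, step) - f(0.0, 0.0)   # moving along y lowers f
print(up > 0, down < 0)
```

The function rises in one direction and falls in another from the same critical point, which matches a Hessian with eigenvalues of mixed sign.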

52 $If a square matrix A is idempotent (A² = A) and is not the identity matrix or the zero matrix, what can be definitively concluded about its eigenvalues?$

Eigenvalues and Eigenvectors Hard

A.

All eigenvalues must be real and positive.

B.

All eigenvalues must be either 0 or 1.

C.

All eigenvalues must be 1.

D.

The matrix must have at least one eigenvalue equal to 0 and at least one eigenvalue equal to 1.
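An idempotent matrix can be inspected by hand in the 2x2 case. The sketch below assumes a hypothetical projection matrix onto the line spanned by (1, 1) and finds its eigenvalues from the characteristic polynomial; the matrix and helper name are illustrative.

```python
from math import sqrt

# Hypothetical example: projection onto the line spanned by (1, 1).
P = [[0.5, 0.5], [0.5, 0.5]]

def matmul2(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# 2x2 eigenvalues from the characteristic polynomial t^2 - trace*t + det = 0.
trace = P[0][0] + P[1][1]
det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
disc = sqrt(trace**2 - 4 * det)
eigenvalues = sorted([(trace - disc) / 2, (trace + disc) / 2])
print(matmul2(P, P) == P, eigenvalues)
```

If Av = λv and A² = A, then λ²v = λv, which forces λ to be 0 or 1; the example above has one eigenvalue of each kind.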

53 $The moment generating function (MGF) for a random variable is given by . What are the mean and variance of ?$

Probability distribution Hard

A.

Mean = 3, Variance = 8

B.

Mean = 6, Variance = 8

C.

Mean = 3, Variance = 2

D.

Mean = 3, Variance = 4

54 $For a function where A is a symmetric positive definite matrix, the gradient with respect to matrix A,, is known to be . How does this result change if A is not restricted to be symmetric?$

Gradient Hard

A.

B.

C.

The gradient is undefined for non-symmetric matrices.

D.

55 $Consider the softmax function applied to a vector, where the -th component is . What is the partial derivative for the case where ?$

Partial derivatives Hard

A.

B.

C.

D.

56 $Let and where,, and . The Jacobians are and . According to the multivariate chain rule, what is the Jacobian of the composite function, denoted ?$

Chain Rule Hard

A.

B.

C.

D.

57 $Two independent random variables and are exponentially distributed with the same rate parameter . Their PDFs are for and for . Let . What is the probability density function of ?$

Joint, Marginal and Conditional Probability Hard

A.

A Chi-squared distribution:

B.

An Exponential distribution:

C.

A Normal distribution due to the Central Limit Theorem.

D.

A Gamma distribution:

58 $In a very high-dimensional Euclidean space (e.g.,), what is the approximate angle between two vectors and drawn independently from an isotropic Gaussian distribution ?$

Vectors Hard

A.

The angle is uniformly distributed between and .

B.

(radians)

C.

or (0 or radians)

D.

(radians)

59 $Let and be two random variables with variances,, and covariance . What is the variance of the random variable ?$

Mean, Variance, Covariance Hard

A.

B.

C.

D.

60 $Using the change of variable technique, if a random variable has a probability density function, and where is a strictly monotonic and differentiable function, what is the PDF of, ?$

Random variables Hard

A.

B.

C.

D.

61 $In Bayesian inference, the posterior distribution is proportional to the product of the likelihood and the prior: P(θ | D) ∝ P(D | θ) P(θ). If we choose a conjugate prior for a given likelihood function, what is the primary computational advantage?$

Bayes' Theorem Hard

A.

The posterior distribution belongs to the same family of distributions as the prior, making updates simple and analytical.

B.

The prior and posterior distributions become independent of the data.

C.

It eliminates the need to calculate the evidence term P(D).

D.

The resulting model is guaranteed to have a lower generalization error.

Unit 4 - Practice Quiz