1 $In Machine Learning, what does the dot product of two vectors typically represent?$

A.

The sum of their dimensions

B.

The measure of similarity or projection of one vector onto another

C.

The division of individual elements

D.

The probability of the vectors being independent
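
A minimal sketch in plain Python (the vectors `u` and `v` are made-up examples). It shows the dot product used two ways: as a similarity measure via cosine similarity, and as the length of the projection of one vector onto another.

```python
import math

def dot(u, v):
    """Sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Dot product normalised by both vector lengths; ranges from -1 to 1."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def scalar_projection(u, v):
    """Signed length of the projection of u onto the direction of v."""
    return dot(u, v) / math.sqrt(dot(v, v))

u = [3.0, 4.0]
v = [1.0, 0.0]
print(dot(u, v))                # 3.0
print(cosine_similarity(u, v))  # 0.6
print(scalar_projection(u, v))  # 3.0
```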

2 $Given a matrix of size and a matrix of size, what are the dimensions of the resulting matrix ?$

A.

B.

C.

D.

3 $Which concept in linear algebra is fundamental to Principal Component Analysis (PCA) for dimensionality reduction?$

A.

Matrix addition

B.

Eigenvalues and Eigenvectors

C.

Scalar multiplication

D.

Cross product
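
In PCA, the principal components are the eigenvectors of the data's covariance matrix, ranked by their eigenvalues. The sketch below works on a small made-up 2-D dataset and finds the leading eigenvector with power iteration. Power iteration is just one simple way to do this; production libraries typically compute PCA through an SVD instead.

```python
import math

def covariance_matrix(points):
    """2x2 sample covariance matrix of a list of (x, y) points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def top_eigenpair(m, iterations=200):
    """Power iteration: repeatedly multiply by m and renormalise."""
    v = [1.0, 0.0]
    for _ in range(iterations):
        w = [m[0][0] * v[0] + m[0][1] * v[1],
             m[1][0] * v[0] + m[1][1] * v[1]]
        norm = math.hypot(w[0], w[1])
        v = [w[0] / norm, w[1] / norm]
    # The Rayleigh quotient v^T m v gives the eigenvalue for the unit vector v.
    mv = [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]
    return v[0] * mv[0] + v[1] * mv[1], v

# Made-up points lying roughly along the line y = x.
points = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1)]
cov = covariance_matrix(points)
value, vector = top_eigenpair(cov)
print(value, vector)  # first principal component points roughly along (1, 1)
```

Projecting each centred point onto `vector` would reduce the data from two dimensions to one.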

4 $What is the formula for Bayes' Theorem?$

A.

B.

C.

D.

5 $In Bayesian inference, what does the term Prior Probability represent?$

A.

The probability of the evidence given the hypothesis

B.

The updated probability after observing evidence

C.

The initial probability of a hypothesis before observing new evidence

D.

The total probability of all events
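
In Bayes' theorem, the prior P(H) is the belief held before any evidence arrives. Observing evidence E updates it into the posterior P(H|E). A minimal sketch; the prevalence and test-accuracy numbers are hypothetical.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    # P(E) by the law of total probability over H and not-H.
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical: a condition with 1% prevalence (the prior), a test with
# 90% sensitivity P(E|H), and a 5% false-positive rate P(E|not H).
p = posterior(prior=0.01, likelihood=0.90, likelihood_if_not=0.05)
print(round(p, 3))  # 0.154
```

Even with a fairly accurate test, the low prior keeps the posterior well below 50%.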

6 $Two events A and B are statistically independent if:$

A.

B.

C.

D.
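
The independence condition P(A and B) = P(A) * P(B) can be checked exactly by enumerating a small sample space. A minimal sketch using two fair dice; the event names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event given as a predicate on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def independent(a, b):
    """A and B are independent iff P(A and B) == P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

def first_is_six(o):
    return o[0] == 6

def second_is_even(o):
    return o[1] % 2 == 0

def sum_is_twelve(o):
    return o[0] + o[1] == 12

print(independent(first_is_six, second_is_even))  # True
print(independent(first_is_six, sum_is_twelve))   # False
```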

7 $In a Bayesian Network, what do the nodes represent?$

A.

Probabilistic dependencies

B.

Random variables

C.

Causal arrows

D.

Neural weights

8 $What does a directed edge from Node A to Node B in a Bayesian Network imply?$

A.

A is conditionally independent of B

B.

B causes A

C.

A has a direct influence on B

D.

A and B are mutually exclusive
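
In a Bayesian Network, each node is a random variable and each edge points from a parent to a variable it directly influences. The joint distribution factors into one term per node, P(node | parents). A minimal sketch of a hypothetical two-node network, Rain -> WetGrass, with made-up probability tables:

```python
# Hypothetical conditional probability tables for Rain -> WetGrass.
p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {True: {True: 0.9, False: 0.1},
                    False: {True: 0.2, False: 0.8}}

def joint(rain, wet):
    """Chain rule for this network: P(R, W) = P(R) * P(W | R)."""
    return p_rain[rain] * p_wet_given_rain[rain][wet]

# Marginal P(WetGrass) and the reverse inference P(Rain | WetGrass).
p_wet = joint(True, True) + joint(False, True)
print(round(p_wet, 2))                      # 0.34
print(round(joint(True, True) / p_wet, 3))  # 0.529
```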

9 $What is the primary goal of Supervised Learning?$

A.

To group similar data points without labels

B.

To learn a mapping from input variables to a target variable using labeled data

C.

To maximize a reward signal through trial and error

D.

To reduce the number of features in a dataset

10 $Which of the following is a classic Classification problem?$

A.

Predicting house prices based on square footage

B.

Grouping customers by purchasing behavior

C.

Identifying whether an email is 'Spam' or 'Not Spam'

D.

Teaching a robot to walk

11 $Which of the following is a Regression task?$

A.

Predicting the temperature for tomorrow (in degrees)

B.

Recognizing handwritten digits (0-9)

C.

Diagnosing a disease (Yes/No)

D.

Segmenting an image into objects

12 $What characterizes Unsupervised Learning?$

A.

The data consists of input-output pairs

B.

The algorithm receives a reward or penalty

C.

The data is unlabeled, and the system looks for patterns

D.

It requires a human supervisor to correct errors constantly

13 $Which algorithm is commonly used for Clustering?$

A.

Linear Regression

B.

K-Means

C.

Logistic Regression

D.

Naive Bayes
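
K-Means alternates between two steps. First it assigns each point to its nearest centroid, then it moves each centroid to the mean of its assigned points. A minimal one-dimensional sketch of this loop (Lloyd's algorithm); the data and starting centroids are made up.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Lloyd's algorithm in one dimension with given starting centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for x in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[nearest].append(x)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans_1d([1.0, 1.5, 2.0, 8.0, 8.5, 9.0], centroids=[0.0, 5.0])
print(centroids)  # [1.5, 8.5]
```

No labels are used anywhere: the groups emerge from distances alone.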

14 $In Reinforcement Learning, what is the learner called?$

A.

The Supervisor

B.

The Critic

C.

The Agent

D.

The Cluster

15 $The Exploration vs. Exploitation trade-off is a core challenge in:$

A.

Supervised Learning

B.

Unsupervised Learning

C.

Reinforcement Learning

D.

Linear Algebra

16 $What is One-Hot Encoding used for in Feature Engineering?$

A.

Replacing missing values with the mean

B.

Converting categorical variables into a numerical format

C.

Scaling numerical variables to a 0-1 range

D.

Removing outliers from the dataset
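
A minimal sketch of one-hot encoding in plain Python. Each category becomes its own 0/1 column; sorting the categories to fix the column order is an illustrative choice, not a requirement.

```python
def one_hot(values):
    """Map each category to a 0/1 vector with a single 1."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows

cats, encoded = one_hot(["red", "green", "blue", "green"])
print(cats)     # ['blue', 'green', 'red']
print(encoded)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```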

17 $Why is Data Normalization (or Scaling) important for algorithms like K-Nearest Neighbors?$

A.

It prevents overfitting

B.

It converts text to numbers

C.

It ensures that features with larger numeric ranges do not dominate distance calculations

D.

It increases the amount of training data
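
To see why scaling matters for distance-based methods, the sketch below finds a nearest neighbour before and after min-max scaling. The (age, income) rows are hypothetical. On raw values, income differences swamp age differences; after scaling, both features contribute comparably and the nearest neighbour changes.

```python
import math

def min_max_scale(rows):
    """Rescale every column to [0, 1]; assumes no column is constant."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(x - lo) / (hi - lo) for x, lo, hi in zip(r, lows, highs)]
            for r in rows]

def nearest(rows, q):
    """Index of the row closest to rows[q] (Euclidean), excluding q itself."""
    return min((i for i in range(len(rows)) if i != q),
               key=lambda i: math.dist(rows[q], rows[i]))

# Hypothetical features: (age in years, income in dollars).
people = [[30, 50_000], [31, 60_000], [65, 51_000], [40, 90_000]]
print(nearest(people, 0))                 # 2 (income dominates the raw distance)
print(nearest(min_max_scale(people), 0))  # 1
```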

18 $What is the purpose of Cross-Validation?$

A.

To mix supervised and unsupervised learning

B.

To assess how the results of a statistical analysis will generalize to an independent data set

C.

To increase the speed of training

D.

To automatically generate new features
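
In k-fold cross-validation, each fold serves once as the held-out validation set while the model trains on the remaining folds. A minimal sketch of the index bookkeeping, with a hypothetical helper name. It assigns contiguous folds without shuffling; in practice the data is often shuffled first.

```python
def k_fold_indices(n, k):
    """Yield (train, validation) index lists for k roughly equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, validation
        start += size

for train, val in k_fold_indices(n=7, k=3):
    print(val)
# [0, 1, 2]
# [3, 4]
# [5, 6]
```

Averaging the validation scores across the k folds estimates how well the model generalizes to unseen data.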

19 $In a confusion matrix, what does True Positive (TP) represent?$

A.

The model predicted positive, and the actual value was positive

B.

The model predicted positive, but the actual value was negative

C.

The model predicted negative, and the actual value was negative

D.

The model predicted negative, but the actual value was positive
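
A minimal sketch that tallies the four confusion-matrix cells for a binary classifier. It assumes the positive class is labelled `1`; the label lists are made up.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count (TP, FP, TN, FN) for a binary classifier."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1   # predicted positive, actually positive
        elif p == positive:
            fp += 1   # predicted positive, actually negative
        elif a == positive:
            fn += 1   # predicted negative, actually positive
        else:
            tn += 1   # predicted negative, actually negative
    return tp, fp, tn, fn

actual    = [1, 0, 1, 1, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]
print(confusion_counts(actual, predicted))  # (2, 1, 2, 1)
```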

20 $Which metric is calculated as ?$

A.

Recall

B.

Accuracy

C.

Precision

D.

F1 Score

21 $Which metric is calculated as ?$

A.

Recall

B.

Precision

C.

Specificity

D.

AUC

22 $If a dataset is heavily imbalanced (e.g., 99% benign, 1% fraud), why is Accuracy a poor metric?$

A.

It cannot be calculated for binary classification

B.

A model predicting the majority class for all inputs will still have 99% accuracy

C.

Accuracy only works for regression problems

D.

It takes too long to compute
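
A minimal sketch of the majority-class trap, using hypothetical labels in the 99%/1% ratio from the question. A "model" that always predicts benign scores high accuracy while catching no fraud at all.

```python
# Hypothetical labels: 990 benign (0) and 10 fraud (1) transactions.
labels = [0] * 990 + [1] * 10
predictions = [0] * len(labels)  # a "model" that always predicts benign

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
frauds_caught = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
print(accuracy)       # 0.99
print(frauds_caught)  # 0
```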

23 $What is the F1 Score?$

A.

The arithmetic mean of Precision and Recall

B.

The harmonic mean of Precision and Recall

C.

The difference between True Positives and False Positives

D.

The sum of Accuracy and Error Rate
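
A minimal sketch computing precision, recall, and their harmonic mean (F1) from confusion-matrix counts. The counts are made up. The harmonic mean is pulled toward the smaller of the two values, so F1 falls below the arithmetic mean whenever precision and recall differ.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 (harmonic mean of the two)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(p, r)                  # 0.8 0.5
print(round(f1, 3))          # 0.615
print(round((p + r) / 2, 3)) # 0.65 (arithmetic mean, for comparison)
```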

24 $Which of the following is a real-world application of Unsupervised Learning?$

A.

Face recognition unlocking a phone

B.

Credit card fraud detection using labeled history

C.

Customer segmentation for targeted marketing

D.

Self-driving car navigation

25 $What is Overfitting?$

A.

When a model performs poorly on both training and testing data

B.

When a model learns the training data (including noise) too well and performs poorly on new data

C.

When the learning rate is too high

D.

When the model is too simple to capture the underlying structure

26 $Which statistical measure describes the spread or dispersion of a dataset?$

A.

Mean

B.

Mode

C.

Standard Deviation

D.

Median

27 $In probabilistic reasoning, what does the notation represent?$

A.

Joint probability of A and B occurring together

B.

Conditional probability of A given B

C.

Probability of A or B

D.

Probability of A minus B

28 $In a Bayesian Network, the set consisting of a node’s parents, children, and children’s parents is known as its:$

A.

Neighborhood Watch

B.

Markov Blanket

C.

Decision Boundary

D.

Hidden Layer
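
A minimal sketch that collects a node's Markov blanket (its parents, its children, and its children's other parents) from a parent list. The helper and the small alarm-style network are hypothetical examples.

```python
def markov_blanket(node, parents):
    """Parents, children, and children's other parents of `node`.

    `parents` maps each node to the list of its parent nodes.
    """
    children = [n for n, ps in parents.items() if node in ps]
    blanket = set(parents.get(node, []))
    blanket.update(children)
    for child in children:
        blanket.update(parents[child])
    blanket.discard(node)  # a node is not part of its own blanket
    return blanket

# Hypothetical network: Burglary -> Alarm <- Earthquake, Alarm -> JohnCalls
network = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
}
print(sorted(markov_blanket("Burglary", network)))  # ['Alarm', 'Earthquake']
print(sorted(markov_blanket("Alarm", network)))     # ['Burglary', 'Earthquake', 'JohnCalls']
```

Given its Markov blanket, a node is conditionally independent of every other node in the network.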

29 $Which technique fills in missing data values with a statistical estimate (like the mean)?$

A.

Pruning

B.

Imputation

C.

Dropout

D.

Augmentation
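
A minimal sketch of mean imputation. It assumes missing values are represented as `None` in a plain list; real datasets may use other markers such as NaN.

```python
import statistics

def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [x for x in column if x is not None]
    fill = statistics.mean(observed)
    return [fill if x is None else x for x in column]

print(impute_mean([4.0, None, 6.0, 8.0, None]))  # [4.0, 6.0, 6.0, 8.0, 6.0]
```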

30 $A standard Linear Regression model assumes that the relationship between the dependent and independent variables is:$

A.

Exponential

B.

Linear

C.

Circular

D.

Logarithmic
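
A minimal sketch of ordinary least squares for a single feature. It fits y = w * x + b, where w is the slope and b the intercept (bias term). The data points are made up and lie exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = w * x + b with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx            # slope
    b = mean_y - w * mean_x  # intercept (bias term)
    return w, b

w, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(w, b)  # 2.0 1.0
```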

31 $In Reinforcement Learning, the Policy defines:$

A.

The reward function

B.

The physics of the environment

C.

The agent's behavior or strategy for picking actions

D.

The final goal state

32 $What is Dimensionality Reduction (e.g., PCA) often used for?$

A.

To increase the complexity of the model

B.

To visualize high-dimensional data and reduce computation time

C.

To add more features to the dataset

D.

To classify images

33 $Which of the following represents a Continuous probability distribution?$

A.

Bernoulli Distribution

B.

Binomial Distribution

C.

Gaussian (Normal) Distribution

D.

Rolling a die

34 $The Law of Large Numbers states that:$

A.

As sample size increases, the sample mean gets closer to the expected value (population mean)

B.

You need large numbers to do machine learning

C.

Probability cannot be calculated for small datasets

D.

Variance increases with sample size
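
A minimal simulation of the Law of Large Numbers with fair die rolls, whose expected value is 3.5. The seed is fixed only so the run is repeatable. The exact printed means depend on the random stream, but they should cluster ever closer to 3.5 as n grows.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_mean_of_die_rolls(n):
    """Average of n fair six-sided die rolls (expected value 3.5)."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    print(n, sample_mean_of_die_rolls(n))
```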

35 $In feature engineering, Binning involves:$

A.

Deleting data features

B.

Converting continuous variables into discrete intervals/buckets

C.

Multiplying two features together

D.

Separating training and test data
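
A minimal sketch of binning a continuous variable with the standard-library `bisect` module. The age edges and bucket labels are illustrative assumptions.

```python
import bisect

def bin_values(values, edges, labels):
    """Assign each value to the interval defined by sorted bin edges.

    With edges [18, 65]: x < 18 -> labels[0], 18 <= x < 65 -> labels[1],
    x >= 65 -> labels[2].
    """
    return [labels[bisect.bisect_right(edges, v)] for v in values]

ages = [5, 18, 42, 65, 80]
print(bin_values(ages, edges=[18, 65], labels=["child", "adult", "senior"]))
# ['child', 'adult', 'adult', 'senior', 'senior']
```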

36 $What does the Learning Rate control in machine learning optimization?$

A.

The number of features used

B.

The step size at each iteration while moving toward a minimum of a loss function

C.

The split ratio of training vs testing data

D.

The accuracy of the final model
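
A minimal sketch of how the learning rate scales each gradient-descent step. It minimises the toy function f(x) = (x - 3)^2, whose minimum is at x = 3. With 20 steps from x = 0:

- a rate of 0.01 creeps slowly toward 3;
- a rate of 0.1 lands close to 3;
- a rate of 1.1 overshoots further on every step and diverges.

```python
def gradient_descent(learning_rate, steps=20, x=0.0):
    """Minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)."""
    for _ in range(steps):
        x -= learning_rate * 2 * (x - 3)  # step against the gradient
    return x

for lr in (0.01, 0.1, 1.1):
    print(lr, gradient_descent(lr))
```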

37 $Which is a common real-world use case for Reinforcement Learning?$

A.

Predicting housing market trends

B.

Email spam filtering

C.

Game playing AI (e.g., AlphaGo)

D.

Clustering news articles

38 $If events and are Mutually Exclusive, then is:$

A.

1

B.

0.5

C.

D.

39 $What is the Median of the dataset ?$

A.

5

B.

6

C.

3

D.

9

40 $In the context of matrices, the Identity Matrix has which property?$

A.

All elements are 1

B.

All elements are 0

C.

Diagonal elements are 1, others are 0

D.

Diagonal elements are 0, others are 1
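
A minimal sketch, using nested Python lists as matrices, of the identity matrix and its defining property: multiplying any compatible matrix by it leaves that matrix unchanged. The matrix `A` is a made-up example.

```python
def identity(n):
    """n x n matrix with 1 on the diagonal and 0 elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[2, 3], [4, 5]]
print(identity(2))                  # [[1, 0], [0, 1]]
print(matmul(A, identity(2)) == A)  # True
```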

41 $Which vector operation is defined as ?$

A.

Vector Addition

B.

Dot Product

C.

Scalar Multiplication

D.

Normalization

42 $Probabilistic Reasoning handles uncertainty by:$

A.

Ignoring unknown variables

B.

Using degrees of belief (0 to 1) instead of True/False logic

C.

Assuming all events are equally likely

D.

Running infinite loops

43 $In a Receiver Operating Characteristic (ROC) curve, what is plotted on the axes?$

A.

Precision vs Recall

B.

True Positive Rate vs False Positive Rate

C.

Accuracy vs Loss

D.

Predicted vs Actual

44 $What does AUC (Area Under the Curve) indicate?$

A.

The time taken to train the model

B.

The degree of separability, i.e., the classifier's ability to distinguish between classes

C.

The number of features in the model

D.

The percentage of missing data
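
The area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. A minimal sketch computing AUC from that pairwise definition. The scores and labels are made up, and the O(P * N) pair loop only suits small data.

```python
def roc_auc(scores, labels):
    """AUC as P(random positive outranks random negative); ties count half."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(round(roc_auc(scores, labels), 3))  # 0.889 (8 of 9 pairs ranked correctly)
```

An AUC of 1.0 means perfect separation of the classes; 0.5 is no better than random ranking.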

45 $Feature Selection (as opposed to extraction) implies:$

A.

Creating new features from old ones

B.

Selecting a subset of the most relevant features and discarding the rest

C.

Cleaning dirty data

D.

Changing the labels of the data

46 $Which of the following best describes Semi-Supervised Learning?$

A.

Using a small amount of labeled data and a large amount of unlabeled data

B.

Using no data at all

C.

Using only labeled data

D.

Using reinforcement learning without a reward

47 $In a Bayesian Network, if every path between two nodes is blocked given the observed evidence (i.e., the nodes are d-separated), they are:$

A.

Conditionally Dependent

B.

Conditionally Independent

C.

Correlated

D.

Causally Linked

48 $What is a Decision Tree?$

A.

A linear equation classifier

B.

A flowchart-like structure where internal nodes represent feature tests and leaf nodes represent class labels

C.

A clustering algorithm

D.

A neural network activation function

49 $Why do we split data into Training and Testing sets?$

A.

To double the amount of data

B.

To evaluate the model on data it has never seen before

C.

To make the code run faster

D.

To ensure the model overfits

50 $In the equation used in linear algebra for ML, what does represent?$

A.

The basis vector

B.

The bias term (intercept)

C.

The beta coefficient

D.

The batch size

Unit 3 - Practice Quiz
