1. In the context of Machine Learning, what does the dot product of two vectors typically represent in applied linear algebra?
A.The sum of their dimensions
B.The measure of similarity or projection of one vector onto another
C.The division of individual elements
D.The probability of the vectors being independent
Correct Answer: The measure of similarity or projection of one vector onto another
Explanation:Geometrically, the dot product relates to the angle between vectors, often used to determine similarity.
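The link between the dot product and similarity can be sketched in a few lines of plain Python. The vectors `a`, `b`, and `c` and the helper names below are illustrative assumptions, not part of the quiz.

```python
import math

def dot(u, v):
    """Sum of element-wise products of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(u, v))

def cosine_similarity(u, v):
    """Dot product divided by the product of the two vector lengths."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]    # points in the same direction as a
c = [3.0, 0.0, -1.0]   # orthogonal to a

print(dot(a, b))                # 28.0
print(cosine_similarity(a, b))  # approximately 1.0 (same direction)
print(dot(a, c))                # 0.0 (no projection of a onto c)
```

A cosine similarity near 1 means the vectors point the same way, and a dot product of 0 means they are orthogonal.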
2. Given a matrix $A$ of size $m \times n$ and a matrix $B$ of size $n \times p$, what are the dimensions of the resulting matrix $AB$?
A.
B.
C.
D.
Correct Answer:
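Explanation:In matrix multiplication, the inner dimensions must match: an $m \times n$ matrix can be multiplied by an $n \times p$ matrix because both inner dimensions are $n$. The resulting matrix takes the row count of the first and the column count of the second ($m \times p$).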
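The shape rule can be checked with a minimal pure-Python multiplication. The function `matmul` and the sample matrices are assumptions made for illustration; a real project would normally use a numerical library instead.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    rows_a, cols_a = len(a), len(a[0])
    rows_b, cols_b = len(b), len(b[0])
    if cols_a != rows_b:
        # The inner dimensions must agree, otherwise the product is undefined.
        raise ValueError(f"cannot multiply {rows_a}x{cols_a} by {rows_b}x{cols_b}")
    return [
        [sum(a[i][k] * b[k][j] for k in range(cols_a)) for j in range(cols_b)]
        for i in range(rows_a)
    ]

a = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
b = [[1, 0, 0, 1],
     [0, 1, 0, 1],
     [0, 0, 1, 1]]       # 3 x 4

product = matmul(a, b)
print(len(product), "x", len(product[0]))  # 2 x 4
print(product)                             # [[1, 2, 3, 6], [4, 5, 6, 15]]
```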
3. Which concept in linear algebra is fundamental to Principal Component Analysis (PCA) for dimensionality reduction?
A.Matrix addition
B.Eigenvalues and Eigenvectors
C.Scalar multiplication
D.Cross product
Correct Answer: Eigenvalues and Eigenvectors
Explanation:PCA relies on finding the eigenvectors (principal components) of the covariance matrix to identify the directions of maximum variance.
4. What is the formula for Bayes' Theorem?
A.
B.
C.
D.
Correct Answer: $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
Explanation:Bayes' Theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event.
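A small numeric sketch may help make the update concrete. The function name and the three input probabilities below are invented for illustration only. The evidence term $P(E)$ is expanded with the law of total probability.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) = P(E | H) * P(H) / P(E).

    P(E) = P(E | H) * P(H) + P(E | not H) * P(not H)
    """
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: a rare condition (1%), a test that detects it 90% of
# the time, and a 5% chance of a positive result when the condition is absent.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(p, 4))  # 0.1538
```

Even with a fairly accurate test, the low prior keeps the posterior near 15% in this made-up scenario.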
5. In Bayesian inference, what does the term Prior Probability ($P(H)$) represent?
A.The probability of the evidence given the hypothesis
B.The updated probability after observing evidence
C.The initial probability of a hypothesis before observing new evidence
D.The total probability of all events
Correct Answer: The initial probability of a hypothesis before observing new evidence
Explanation:The prior is the initial belief or probability assigned to a hypothesis before new data is taken into account.
6. Two events $A$ and $B$ are statistically independent if:
A.
B.
C.
D.
Correct Answer:
Explanation:If events are independent, the occurrence of $A$ does not change the probability of $B$ occurring.
7. In a Bayesian Network, what do the nodes represent?
A.Probabilistic dependencies
B.Random variables
C.Causal arrows
D.Neural weights
Correct Answer: Random variables
Explanation:In a Bayesian Network (a directed acyclic graph), nodes represent random variables, and edges represent conditional dependencies.
8. What does a directed edge from Node A to Node B in a Bayesian Network imply?
A.A is conditionally independent of B
B.B causes A
C.A has a direct influence on B
D.A and B are mutually exclusive
Correct Answer: A has a direct influence on B
Explanation:An edge from A to B indicates that the variable A has a direct probabilistic influence on variable B.
9. What is the primary goal of Supervised Learning?
A.To group similar data points without labels
B.To learn a mapping from input variables to a target variable using labeled data
C.To maximize a reward signal through trial and error
D.To reduce the number of features in a dataset
Correct Answer: To learn a mapping from input variables to a target variable using labeled data
Explanation:Supervised learning algorithms build a mathematical model of a set of data that contains both the inputs and the desired outputs (labels).
10. Which of the following is a classic Classification problem?
A.Predicting house prices based on square footage
B.Grouping customers by purchasing behavior
C.Identifying whether an email is 'Spam' or 'Not Spam'
D.Teaching a robot to walk
Correct Answer: Identifying whether an email is 'Spam' or 'Not Spam'
Explanation:Classification involves predicting discrete categories (labels). Predicting prices is regression; grouping is clustering; robot walking is reinforcement learning.
11. Which of the following is a Regression task?
A.Predicting the temperature for tomorrow (in degrees)
B.Recognizing handwritten digits (0-9)
C.Diagnosing a disease (Yes/No)
D.Segmenting an image into objects
Correct Answer: Predicting the temperature for tomorrow (in degrees)
Explanation:Regression predicts continuous numerical values, such as temperature, price, or age.
12. What characterizes Unsupervised Learning?
A.The data consists of input-output pairs
B.The algorithm receives a reward or penalty
C.The data is unlabeled, and the system looks for patterns
D.It requires a human supervisor to correct errors constantly
Correct Answer: The data is unlabeled, and the system looks for patterns
Explanation:Unsupervised learning deals with data that has no historical labels, aiming to find structure (like clusters) within the data.
13. Which algorithm is commonly used for Clustering?
A.Linear Regression
B.K-Means
C.Logistic Regression
D.Naive Bayes
Correct Answer: K-Means
Explanation:K-Means is a popular unsupervised algorithm that partitions data into clusters based on distance to centroids.
14. In Reinforcement Learning, what is the learner called?
A.The Supervisor
B.The Critic
C.The Agent
D.The Cluster
Correct Answer: The Agent
Explanation:In RL, the 'Agent' takes actions in an environment to maximize cumulative reward.
15. The Exploration vs. Exploitation trade-off is a core challenge in:
A.Supervised Learning
B.Unsupervised Learning
C.Reinforcement Learning
D.Linear Algebra
Correct Answer: Reinforcement Learning
Explanation:The agent must decide whether to use known knowledge to gain reward (exploitation) or try new actions to discover better rewards (exploration).
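One common way to balance the two is an epsilon-greedy rule, sketched below. This is only one possible strategy; the quiz does not name it, and the value estimates and function name are assumptions for illustration.

```python
import random

def choose_action(value_estimates, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        # Exploration: pick any action uniformly at random.
        return rng.randrange(len(value_estimates))
    # Exploitation: pick the action with the highest estimated reward.
    return max(range(len(value_estimates)), key=lambda a: value_estimates[a])

rng = random.Random(0)          # seeded so the run is repeatable
estimates = [0.1, 0.9, 0.3]     # hypothetical reward estimates per action

greedy = choose_action(estimates, epsilon=0.0, rng=rng)
explored = [choose_action(estimates, epsilon=1.0, rng=rng) for _ in range(200)]

print(greedy)                # 1, the action with the highest estimate
print(sorted(set(explored)))
```

With `epsilon=0.0` the agent never explores, and with `epsilon=1.0` it never exploits; practical values usually sit somewhere in between.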
16. What is One-Hot Encoding used for in Feature Engineering?
A.Replacing missing values with the mean
B.Converting categorical variables into a numerical format
C.Scaling numerical variables to a 0-1 range
D.Removing outliers from the dataset
Correct Answer: Converting categorical variables into a numerical format
Explanation:One-Hot Encoding creates binary columns for each category in a categorical variable, making it suitable for ML algorithms.
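A minimal sketch of the idea, using only the standard library. The `colors` data and the function name are assumptions; libraries such as pandas or scikit-learn provide their own encoders.

```python
def one_hot_encode(values):
    """Return the sorted category list and one binary row per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1      # exactly one column is "hot" per row
        encoded.append(row)
    return categories, encoded

colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```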
17. Why is Data Normalization (or Scaling) important for algorithms like K-Nearest Neighbors?
A.It prevents overfitting
B.It converts text to numbers
C.It ensures that features with larger numeric ranges do not dominate distance calculations
D.It increases the amount of training data
Correct Answer: It ensures that features with larger numeric ranges do not dominate distance calculations
Explanation:Distance-based algorithms are sensitive to the scale of data; a feature ranging 0-1000 would overpower one ranging 0-1 without scaling.
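The effect can be seen on a toy dataset with one feature in the 0-1000 range and one in the 0-1 range. The data points and the choice of min-max scaling are assumptions for this sketch; other scalers (for example standardization) behave similarly.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def min_max_scale(rows):
    """Rescale every column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# Columns: a feature ranging 0-1000 and a feature ranging 0-1 (made-up values).
points = [[100.0, 0.1], [150.0, 0.9], [200.0, 0.1], [1000.0, 1.0], [0.0, 0.0]]
scaled = min_max_scale(points)

# Before scaling, point 1 looks closest to point 0 because only the large
# feature matters; after scaling, point 2 (same small feature) is closest.
print(euclidean(points[0], points[1]) < euclidean(points[0], points[2]))  # True
print(euclidean(scaled[0], scaled[2]) < euclidean(scaled[0], scaled[1]))  # True
```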
18. What is the purpose of Cross-Validation?
A.To mix supervised and unsupervised learning
B.To assess how the results of a statistical analysis will generalize to an independent data set
C.To increase the speed of training
D.To automatically generate new features
Correct Answer: To assess how the results of a statistical analysis will generalize to an independent data set
Explanation:Cross-validation (like K-Fold) estimates model performance on unseen data by training and evaluating on several different splits, which helps detect overfitting.
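The splitting step of K-Fold can be sketched as follows. This version does not shuffle the data and the function name is an assumption; it only shows how each sample ends up in the test set exactly once.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering all samples."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds

folds = k_fold_indices(n_samples=10, k=3)
for train, test in folds:
    print("train:", train, "test:", test)
```

A model would be trained on each `train` split and scored on the matching `test` split, and the scores averaged.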
19. In a confusion matrix, what does True Positive (TP) represent?
A.The model predicted positive, and the actual value was positive
B.The model predicted positive, but the actual value was negative
C.The model predicted negative, and the actual value was negative
D.The model predicted negative, but the actual value was positive
Correct Answer: The model predicted positive, and the actual value was positive
Explanation:TP indicates a correct prediction of the positive class.
20. Which metric is calculated as $\frac{TP}{TP + FP}$?
A.Recall
B.Accuracy
C.Precision
D.F1 Score
Correct Answer: Precision
Explanation:Precision measures the accuracy of the positive predictions (of all the times we predicted positive, how many were actually positive?).
21. Which metric is calculated as $\frac{TP}{TP + FN}$?
A.Recall
B.Precision
C.Specificity
D.AUC
Correct Answer: Recall
Explanation:Recall (or Sensitivity) measures the ability of the model to find all the relevant cases (of all actual positives, how many did we find?).
22. If a dataset is heavily imbalanced (e.g., 99% benign, 1% fraud), why is Accuracy a poor metric?
A.It cannot be calculated for binary classification
B.A model predicting the majority class for all inputs will still have 99% accuracy
C.Accuracy only works for regression problems
D.It takes too long to compute
Correct Answer: A model predicting the majority class for all inputs will still have 99% accuracy
Explanation:Accuracy is misleading in imbalanced datasets because a trivial model (predicting the majority class) achieves high accuracy but fails to detect the minority class.
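The 99% figure is easy to reproduce. The sketch below builds a made-up dataset of 99 benign cases (label 0) and 1 fraud case (label 1) and scores a trivial model that always predicts the majority class.

```python
# Hypothetical labels: 0 = benign, 1 = fraud.
y_true = [0] * 99 + [1]
y_pred = [0] * 100            # trivial model: always predict "benign"

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

fraud_cases = sum(1 for t in y_true if t == 1)
fraud_found = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fraud_recall = fraud_found / fraud_cases

print(accuracy)      # 0.99
print(fraud_recall)  # 0.0, the single fraud case is missed
```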
23. What is the F1 Score?
A.The arithmetic mean of Precision and Recall
B.The harmonic mean of Precision and Recall
C.The difference between True Positives and False Positives
D.The sum of Accuracy and Error Rate
Correct Answer: The harmonic mean of Precision and Recall
Explanation:The F1 Score balances Precision and Recall, providing a single metric that is useful when you need to balance both false positives and false negatives.
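Precision, Recall, and the F1 Score can all be computed from confusion-matrix counts. The labels below are invented for illustration, and the helper names are assumptions rather than any library's API.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, true negatives, false negatives."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), F1 = their harmonic mean."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, tn, fn)                                   # 2 1 5 2
print(round(precision, 3), recall, round(f1, 3))        # 0.667 0.5 0.571
```

Because the harmonic mean is pulled toward the smaller value, F1 (about 0.571) sits below the arithmetic mean of Precision and Recall (about 0.583).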
24. Which of the following is a real-world application of Unsupervised Learning?
A.Face recognition unlocking a phone
B.Credit card fraud detection using labeled history
C.Customer segmentation for targeted marketing
D.Self-driving car navigation
Correct Answer: Customer segmentation for targeted marketing
Explanation:Segmentation involves grouping customers with similar behaviors without pre-defined labels, a classic unsupervised task.
25. What is Overfitting?
A.When a model performs poorly on both training and testing data
B.When a model learns the training data (including noise) too well and performs poorly on new data
C.When the learning rate is too high
D.When the model is too simple to capture the underlying structure
Correct Answer: When a model learns the training data (including noise) too well and performs poorly on new data
Explanation:Overfitting occurs when a model memorizes the training data, losing the ability to generalize to unseen data.
26. Which statistical measure describes the spread or dispersion of a dataset?
A.Mean
B.Mode
C.Standard Deviation
D.Median
Correct Answer: Standard Deviation
Explanation:Standard deviation quantifies the amount of variation or dispersion of a set of values from the mean.
27. In probabilistic reasoning, what does the notation $P(A \cap B)$ represent?
A.Joint probability of A and B occurring together
B.Conditional probability of A given B
C.Probability of A or B
D.Probability of A minus B
Correct Answer: Joint probability of A and B occurring together
Explanation:$P(A \cap B)$ or $P(A, B)$ is the joint probability that both events happen.
28. In a Bayesian Network, the set consisting of a node’s parents, children, and children’s parents is known as its:
A.Neighborhood Watch
B.Markov Blanket
C.Decision Boundary
D.Hidden Layer
Correct Answer: Markov Blanket
Explanation:The Markov Blanket of a node shields it from the rest of the network; conditioned on its Markov Blanket, a node is independent of all other nodes.
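The definition translates directly into set operations on the graph. The network below is a hypothetical example (A → C ← B, C → D ← E, D → F), and the representation as a parent map is an assumption of this sketch.

```python
def markov_blanket(parents, node):
    """Parents, children, and the children's other parents of `node`.

    `parents` maps every node in the DAG to the set of its parent nodes.
    """
    children = {n for n, ps in parents.items() if node in ps}
    co_parents = {p for child in children for p in parents[child]}
    return (set(parents[node]) | children | co_parents) - {node}

# Hypothetical network: A -> C <- B, C -> D <- E, D -> F
parents = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"C", "E"},
    "E": set(),
    "F": {"D"},
}

print(sorted(markov_blanket(parents, "C")))  # ['A', 'B', 'D', 'E']
print(sorted(markov_blanket(parents, "A")))  # ['B', 'C']
```

Node F is not in the blanket of C: once D is known, F carries no extra information about C.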
29. Which technique fills in missing data values with a statistical estimate (like the mean)?
A.Pruning
B.Imputation
C.Dropout
D.Augmentation
Correct Answer: Imputation
Explanation:Imputation is the process of replacing missing data with substituted values.
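Mean imputation, the simplest variant, can be sketched like this. Missing values are represented as `None` here, which is an assumption of the example; the `ages` list is made up.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

ages = [20.0, None, 30.0, 40.0, None]
print(impute_mean(ages))  # [20.0, 30.0, 30.0, 40.0, 30.0]
```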
30. A standard Linear Regression model assumes that the relationship between the dependent and independent variables is:
A.Exponential
B.Linear
C.Circular
D.Logarithmic
Correct Answer: Linear
Explanation:Linear regression models the relationship as a straight line: $y = wx + b$ (in simple 1D cases).
31. In Reinforcement Learning, the Policy defines:
A.The reward function
B.The physics of the environment
C.The agent's behavior or strategy for picking actions
D.The final goal state
Correct Answer: The agent's behavior or strategy for picking actions
Explanation:A policy maps states to actions, effectively defining how the agent behaves.
32. What is Dimensionality Reduction (e.g., PCA) often used for?
A.To increase the complexity of the model
B.To visualize high-dimensional data and reduce computation time
C.To add more features to the dataset
D.To classify images
Correct Answer: To visualize high-dimensional data and reduce computation time
Explanation:Reducing dimensions helps remove redundant features (noise), speeds up training, and allows visualization (2D/3D).
33. Which of the following represents a Continuous probability distribution?
A.Bernoulli Distribution
B.Binomial Distribution
C.Gaussian (Normal) Distribution
D.Rolling a die
Correct Answer: Gaussian (Normal) Distribution
Explanation:Gaussian distribution deals with continuous variables (like height or weight), whereas Bernoulli and Binomial are discrete.
34. The Law of Large Numbers states that:
A.As sample size increases, the sample mean gets closer to the expected value (population mean)
B.You need large numbers to do machine learning
C.Probability cannot be calculated for small datasets
D.Variance increases with sample size
Correct Answer: As sample size increases, the sample mean gets closer to the expected value (population mean)
Explanation:It guarantees stable long-term results for the averages of some random events.
35. In feature engineering, Binning involves:
A.Deleting data features
B.Converting continuous variables into discrete intervals/buckets
C.Multiplying two features together
D.Separating training and test data
Correct Answer: Converting continuous variables into discrete intervals/buckets
Explanation:Binning transforms continuous data (e.g., age) into categorical bins (e.g., '18-25', '26-35').
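A small sketch of binning ages into labeled buckets. The first two bins come from the explanation above; the `36-50` bin and the `other` fallback are assumptions added so the example covers every input.

```python
# (low, high, label) with both bounds inclusive; the last bin is hypothetical.
AGE_BINS = [(18, 25, "18-25"), (26, 35, "26-35"), (36, 50, "36-50")]

def to_bin(age, bins=AGE_BINS):
    """Return the label of the bin containing `age`, or 'other' if none does."""
    for low, high, label in bins:
        if low <= age <= high:
            return label
    return "other"

ages = [19, 26, 35, 42, 70]
print([to_bin(age) for age in ages])
# ['18-25', '26-35', '26-35', '36-50', 'other']
```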
36. What does the Learning Rate control in machine learning optimization?
A.The number of features used
B.The step size at each iteration while moving toward a minimum of a loss function
C.The split ratio of training vs testing data
D.The accuracy of the final model
Correct Answer: The step size at each iteration while moving toward a minimum of a loss function
Explanation:The learning rate determines how much the model weights are updated during training.
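The effect of the step size shows up clearly on a one-dimensional toy problem. The loss $f(w) = (w - 3)^2$, its minimum at $w = 3$, and the three learning rates below are assumptions chosen for illustration.

```python
def gradient_descent(learning_rate, steps=50, start=0.0):
    """Minimize f(w) = (w - 3)^2 with plain gradient descent."""
    w = start
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)       # derivative of (w - 3)^2
        w -= learning_rate * gradient    # step size scaled by the learning rate
    return w

well_tuned = gradient_descent(learning_rate=0.1)
too_small = gradient_descent(learning_rate=0.01)
too_large = gradient_descent(learning_rate=1.1)

print(abs(well_tuned - 3.0) < 1e-3)   # True: converged close to the minimum
print(abs(too_small - 3.0) > 0.5)     # True: still far away after 50 steps
print(abs(too_large - 3.0) > 1000)    # True: each step overshoots and diverges
```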
37. Which is a common real-world use case for Reinforcement Learning?
A.Predicting housing market trends
B.Email spam filtering
C.Game playing AI (e.g., AlphaGo)
D.Clustering news articles
Correct Answer: Game playing AI (e.g., AlphaGo)
Explanation:Games provide a clear environment with rewards (winning/score), making them ideal for RL agents.
38. If events $A$ and $B$ are Mutually Exclusive, then $P(A \cap B)$ is:
A.1
B.0.5
C.
D.
Correct Answer: 0
Explanation:Mutually exclusive events cannot happen at the same time, so their joint probability is zero.
39. What is the Median of the dataset ?
A.5
B.6
C.3
D.9
Correct Answer: 5
Explanation:First, sort the data: . The middle value is 5.
40. In the context of matrices, the Identity Matrix has which property?
A.All elements are 1
B.All elements are 0
C.Diagonal elements are 1, others are 0
D.Diagonal elements are 0, others are 1
Correct Answer: Diagonal elements are 1, others are 0
Explanation:The identity matrix acts like the number 1 in matrix multiplication ($AI = A$).
41. Which vector operation is defined as $\mathbf{a} + \mathbf{b} = [a_1 + b_1,\ a_2 + b_2,\ \dots,\ a_n + b_n]$?
A.Vector Addition
B.Dot Product
C.Scalar Multiplication
D.Normalization
Correct Answer: Vector Addition
Explanation:Vector addition is performed element-wise.
42. Probabilistic Reasoning handles uncertainty by:
A.Ignoring unknown variables
B.Using degrees of belief (0 to 1) instead of True/False logic
C.Assuming all events are equally likely
D.Running infinite loops
Correct Answer: Using degrees of belief (0 to 1) instead of True/False logic
Explanation:It extends logic to handle uncertainty using probability theory.
43. In a Receiver Operating Characteristic (ROC) curve, what is plotted on the axes?
A.Precision vs Recall
B.True Positive Rate vs False Positive Rate
C.Accuracy vs Loss
D.Predicted vs Actual
Correct Answer: True Positive Rate vs False Positive Rate
Explanation:The ROC curve plots TPR (y-axis) against FPR (x-axis) at various threshold settings.
44. What does AUC (Area Under the Curve) indicate?
A.The time taken to train the model
B.The degree of separability/ability of the classifier to distinguish between classes
C.The number of features in the model
D.The percentage of missing data
Correct Answer: The degree of separability/ability of the classifier to distinguish between classes
Explanation:Higher AUC (close to 1) means the model is better at distinguishing between positive and negative classes.
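Both the ROC curve and its AUC can be computed by hand on a tiny example. The labels and scores below are made up, and the trapezoid rule is one common way to approximate the area; this is a sketch, not a replacement for a library routine.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct threshold, starting at (0, 0)."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Everything scoring at or above the threshold is predicted positive.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the piecewise-linear curve (trapezoid rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

y_true = [1, 1, 0, 1, 0, 0]               # hypothetical labels
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # hypothetical classifier scores

points = roc_points(y_true, scores)
print(points[-1])              # (1.0, 1.0): the lowest threshold flags everything
print(round(auc(points), 4))   # 0.8889
```

The value 8/9 matches the pairwise reading of AUC: in 8 of the 9 positive-negative pairs, the positive example receives the higher score.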
45. Feature Selection (as opposed to extraction) implies:
A.Creating new features from old ones
B.Selecting a subset of the most relevant features and discarding the rest
C.Cleaning dirty data
D.Changing the labels of the data
Correct Answer: Selecting a subset of the most relevant features and discarding the rest
Explanation:Feature selection keeps the original features but removes the irrelevant or redundant ones.
46. Which of the following best describes Semi-Supervised Learning?
A.Using a small amount of labeled data and a large amount of unlabeled data
B.Using no data at all
C.Using only labeled data
D.Using reinforcement learning without a reward
Correct Answer: Using a small amount of labeled data and a large amount of unlabeled data
Explanation:This approach sits between supervised and unsupervised learning, often used when labeling data is expensive.
47. In a Bayesian Network, if no active path exists between two nodes (taking into account edge directions and observed evidence), they are:
A.Conditionally Dependent
B.Conditionally Independent
C.Correlated
D.Causally Linked
Correct Answer: Conditionally Independent
Explanation:Separation in the graph (D-separation) implies conditional independence in the probability distribution.
48. What is a Decision Tree?
A.A linear equation classifier
B.A flowchart-like structure where internal nodes represent feature tests and leaf nodes represent class labels
C.A clustering algorithm
D.A neural network activation function
Correct Answer: A flowchart-like structure where internal nodes represent feature tests and leaf nodes represent class labels
Explanation:Decision Trees split data based on feature values to arrive at a prediction.
49. Why do we split data into Training and Testing sets?
A.To double the amount of data
B.To evaluate the model on data it has never seen before
C.To make the code run faster
D.To ensure the model overfits
Correct Answer: To evaluate the model on data it has never seen before
Explanation:Testing on unseen data provides an unbiased evaluation of the final model fit.
50. In the equation $y = wx + b$ used in linear algebra for ML, what does $b$ represent?
A.The basis vector
B.The bias term (intercept)
C.The beta coefficient
D.The batch size
Correct Answer: The bias term (intercept)
Explanation:The bias shifts the fitted line (or the input to an activation function) away from the origin, so the model can fit data that does not pass through zero.