1. According to Tom Mitchell's definition of machine learning, a computer program is said to learn from experience $E$ with respect to some class of tasks $T$ and performance measure $P$ if:
A.It can execute tasks in $T$ without any errors.
B.It can generate new tasks based on performance $P$.
C.Its performance at tasks in $T$, as measured by $P$, improves with experience $E$.
D.It minimizes the complexity of the code required for $T$.
Correct Answer: Its performance at tasks in $T$, as measured by $P$, improves with experience $E$.
Explanation:
Tom Mitchell provided a formal definition: 'A computer program is said to learn from experience $E$ with respect to some class of tasks $T$ and performance measure $P$, if its performance at tasks in $T$, as measured by $P$, improves with experience $E$.'
2. Which of the following best distinguishes Machine Learning from traditional programming?
A.Traditional programming deals with numbers, while Machine Learning deals with images.
B.Traditional programming uses data and rules to produce answers; Machine Learning uses data and answers to produce rules.
C.Machine Learning does not require a compiler.
D.Traditional programming is faster than Machine Learning.
Correct Answer: Traditional programming uses data and rules to produce answers; Machine Learning uses data and answers to produce rules.
Explanation:
In traditional programming, you input data and rules to get answers. In machine learning (specifically supervised learning), you input data and answers (labels) to learn the rules (the model).
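The contrast can be sketched in a few lines of Python. This is a minimal illustration, not a real spam filter: the threshold rule, the toy data, and the names `rule_based`, `learn_threshold`, and `learned_rule` are all assumptions made for the example.

```python
# Traditional programming: a human writes the rule.
def rule_based(num_links):
    # Hypothetical hand-written rule: more than 3 links means spam.
    return num_links > 3


# Machine learning: the rule (here, a threshold) is derived from data and answers.
def learn_threshold(examples):
    """Pick the threshold that makes the fewest mistakes on labeled examples.

    examples: list of (num_links, is_spam) pairs.
    """
    candidates = sorted({x for x, _ in examples})
    best_t, best_errors = None, None
    for t in candidates:
        # Count examples where the rule "x > t" disagrees with the label.
        errors = sum((x > t) != label for x, label in examples)
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Toy labeled data (assumed for illustration): (number of links, is spam?)
data = [(0, False), (1, False), (2, False), (5, True), (7, True), (9, True)]
threshold = learn_threshold(data)


def learned_rule(num_links):
    return num_links > threshold
```

In `rule_based` the cutoff is typed in by the programmer; in `learned_rule` the cutoff comes out of the labeled data, which is the "data and answers produce rules" direction described above.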
3. In the context of Supervised Learning, the dataset consists of:
A.Input vectors and associated target labels.
B.Input vectors only.
C.Unstructured text without annotations.
D.A reward signal only.
Correct Answer: Input vectors and associated target labels.
Explanation:
Supervised learning algorithms are trained on a labeled dataset, which means the data consists of input features ($x$) paired with the correct output or target labels ($y$).
4. Which of the following is a classic example of Unsupervised Learning?
Explanation:
Clustering is an unsupervised task where the goal is to group similar data points together without pre-existing labels. Spam filtering and price prediction are supervised, and playing chess is typically reinforcement learning.
5. In Reinforcement Learning, what does the agent maximize to learn the optimal policy?
A.The number of states visited.
B.The cumulative future reward.
C.The accuracy of prediction.
D.The immediate reward.
Correct Answer: The cumulative future reward.
Explanation:
The goal of a reinforcement learning agent is to learn a policy that maximizes the expected cumulative reward (return) over time, not just the immediate reward.
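To make "cumulative reward" concrete, here is a small sketch of the discounted return, one common way to define it. The discount factor `gamma` and both reward sequences are assumptions for illustration; the question itself does not fix either.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each weighted by gamma**t for its time step t."""
    total = 0.0
    # Work backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# A path with a small immediate reward but a large delayed one...
patient = [1, 0, 10]
# ...versus a path that takes a bigger reward right away and nothing after.
greedy = [5, 0, 0]

patient_return = discounted_return(patient)
greedy_return = discounted_return(greedy)
```

An agent that looked only at the first reward would prefer `greedy`; comparing the two returns favors `patient`, which is why the immediate reward alone is the wrong objective.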
6. Predicting a continuous output value, such as the temperature tomorrow, is known as:
A.Clustering
B.Classification
C.Dimensionality Reduction
D.Regression
Correct Answer: Regression
Explanation:
Regression is a type of supervised learning where the target variable is continuous. Classification deals with discrete categories.
7. Predicting whether an email is 'Spam' or 'Not Spam' is an example of:
A.Regression
B.Binary Classification
C.K-Means Clustering
D.Policy Search
Correct Answer: Binary Classification
Explanation:
This is a classification task because the output is a discrete category (class), and since there are only two classes, it is Binary Classification.
8. Which of the following is a major challenge in Machine Learning where the model learns the training data too well, including the noise, and performs poorly on new data?
A.Regularization
B.Underfitting
C.Convergence
D.Overfitting
Correct Answer: Overfitting
Explanation:
Overfitting occurs when a model fits the training data too closely, capturing noise and random fluctuations rather than the underlying pattern, which leads to poor generalization on unseen data.
9. The Curse of Dimensionality refers to:
A.The computational cost of adding more rows to a dataset.
B.The bias introduced by dimension reduction techniques.
C.The exponential increase in data volume required to generalize accurately as the number of features increases.
D.The difficulty of visualizing data in 2D.
Correct Answer: The exponential increase in data volume required to generalize accurately as the number of features increases.
Explanation:
As the dimensionality (number of features) of the input space increases, the volume of the space increases so fast that the available data becomes sparse, making it difficult to find statistical significance.
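A rough way to see the exponential growth: if each feature axis is cut into a fixed number of bins, the number of cells the data would have to cover is the bin count raised to the number of features. The helper name `cells_to_cover` and the choice of 10 bins per feature are assumptions for illustration.

```python
def cells_to_cover(num_features, bins_per_feature=10):
    """Number of grid cells when every feature axis is cut into equal bins."""
    return bins_per_feature ** num_features


# Print how the cell count grows as features are added.
for d in (1, 2, 3, 10):
    print(d, cells_to_cover(d))
```

With a fixed dataset of, say, 1,000 points, at most 1,000 cells can be non-empty, so in 10 dimensions almost every cell of the grid contains no data at all.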
10. In the Statistical Learning Framework, we assume the data is generated by:
A.A random number generator.
B.A deterministic linear function.
C.The learning algorithm itself.
D.An unknown joint probability distribution $P(X, Y)$.
Correct Answer: An unknown joint probability distribution $P(X, Y)$.
Explanation:
The standard assumption is that there is an unknown, fixed joint probability distribution $P(X, Y)$ from which the training and test data are drawn independently and identically distributed (i.i.d.).
11. The function that measures the penalty for predicting $\hat{y}$ when the true value is $y$ is called:
A.The Hypothesis
B.The Activation Function
C.The Regularizer
D.The Loss Function
Correct Answer: The Loss Function
Explanation:
A Loss Function (or cost function) quantifies the difference between the estimated value and the true value.
12. What is Generalization Error (or True Risk)?
A.The error on the test set.
B.The expected value of the loss function over the underlying data distribution.
C.The error on the training set.
D.The difference between the predicted value and the average value.
Correct Answer: The expected value of the loss function over the underlying data distribution.
Explanation:
Generalization error is the expectation of the loss over the entire distribution of data, representing how well the model performs on unseen data.
13. Mathematically, Empirical Risk for a hypothesis $h$ on a dataset of size $m$ is defined as:
A.
B.
C.
D.
Correct Answer: $\frac{1}{m} \sum_{i=1}^{m} L(h(x_i), y_i)$
Explanation:
Empirical Risk is the average loss computed over the observed training dataset, serving as a proxy for the true risk.
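As a minimal sketch of "average loss over the training set", the function below computes empirical risk for any hypothesis and loss. The names `empirical_risk`, `zero_one_loss`, `squared_loss`, and `sign_rule`, and the toy data, are assumptions made for this example.

```python
def zero_one_loss(prediction, target):
    """1 if the prediction is wrong, 0 if it is right."""
    return 0 if prediction == target else 1


def squared_loss(prediction, target):
    """Squared difference, a common choice for regression."""
    return (prediction - target) ** 2


def empirical_risk(hypothesis, dataset, loss):
    """Average loss of `hypothesis` over the (x, y) pairs in `dataset`."""
    if not dataset:
        raise ValueError("dataset must contain at least one example")
    return sum(loss(hypothesis(x), y) for x, y in dataset) / len(dataset)


# Toy classifier: predict 1 when x is positive, else 0.
def sign_rule(x):
    return 1 if x > 0 else 0


# The rule gets the last example wrong (it predicts 1, the label is 0).
train = [(-2, 0), (-1, 0), (1, 1), (3, 0)]
risk = empirical_risk(sign_rule, train, zero_one_loss)
```

Passing the loss in as an argument keeps the averaging step separate from the choice of loss, so the same function works for classification (`zero_one_loss`) and regression (`squared_loss`).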
14. Empirical Risk Minimization (ERM) is the principle of:
A.Choosing the hypothesis with the simplest structure.
B.Minimizing the error on the test set.
C.Minimizing the computational time.
D.Choosing the hypothesis that minimizes the loss on the training set.
Correct Answer: Choosing the hypothesis that minimizes the loss on the training set.
Explanation:
ERM is a learning strategy that selects the hypothesis from the hypothesis space that minimizes the empirical risk (training error).
15. Why is minimizing Empirical Risk not always sufficient to ensure good learning?
A.It is computationally too expensive.
B.It ignores the training labels.
C.It always results in underfitting.
D.It can lead to overfitting if the hypothesis space is too complex.
Correct Answer: It can lead to overfitting if the hypothesis space is too complex.
Explanation:
If the model is complex enough, it can memorize the training data (zero empirical risk) but fail to generalize to new data (high true risk). This is overfitting.
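The following sketch shows the failure mode with a deliberately extreme hypothesis: a lookup table that memorizes every training pair. The assumed "true" rule (label is 1 when x is at least 5), the data, and all names are chosen purely for illustration.

```python
def error_rate(hypothesis, dataset):
    """Fraction of (x, y) pairs the hypothesis gets wrong."""
    return sum(hypothesis(x) != y for x, y in dataset) / len(dataset)


# Assumed ground truth for this toy example: label is 1 when x >= 5.
def true_label(x):
    return 1 if x >= 5 else 0


train = [(x, true_label(x)) for x in (0, 2, 4, 6, 8)]
test = [(x, true_label(x)) for x in (1, 3, 5, 7, 9)]

# Memorizer: perfect recall of training pairs, a fixed guess of 0 elsewhere.
table = dict(train)


def memorizer(x):
    return table.get(x, 0)


# A simple threshold rule from a much smaller hypothesis space.
def threshold_rule(x):
    return 1 if x >= 5 else 0


train_err_mem = error_rate(memorizer, train)       # empirical risk
test_err_mem = error_rate(memorizer, test)         # estimate of true risk
train_err_thr = error_rate(threshold_rule, train)  # empirical risk
test_err_thr = error_rate(threshold_rule, test)    # estimate of true risk
```

Both hypotheses reach zero empirical risk, so ERM alone cannot tell them apart; only the held-out examples reveal that the memorizer has learned nothing that transfers.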
16. What is Inductive Bias?
A.The tendency of a model to underfit the data.
B.The bias term $b$ in the equation $y = wx + b$.
C.The error introduced by noise in the data.
D.The set of assumptions a learner makes to predict outputs for unseen inputs.
Correct Answer: The set of assumptions a learner makes to predict outputs for unseen inputs.
Explanation:
Inductive bias refers to the set of assumptions (like choosing a linear model or preferring simpler trees) that allows a learning algorithm to generalize beyond the training data.
17. Which of the following describes Occam's Razor in the context of Inductive Bias?
A.Complex models are always better.
B.We should always choose the hypothesis with the highest training error.
C.Data should be sliced into smaller chunks for processing.
D.Among competing hypotheses that fit the data equally well, the simplest one should be selected.
Correct Answer: Among competing hypotheses that fit the data equally well, the simplest one should be selected.
Explanation:
Occam's Razor is a common inductive bias preferring simplicity. It suggests that unnecessary complexity should be avoided.
18. Restricting the hypothesis space to include only Linear Classifiers is an example of:
A.Sampling Bias
B.Restriction Bias (Language Bias)
C.Preference Bias
D.Confirmation Bias
Correct Answer: Restriction Bias (Language Bias)
Explanation:
Restriction bias (or language bias) strictly limits the set of hypotheses the learner can consider (e.g., only linear functions), as opposed to preference bias which prefers certain hypotheses within the set.
19. In the context of the No Free Lunch Theorem, which statement is true?
A.Deep Learning is universally better than Decision Trees.
B.Averaged over all possible data generating distributions, every classification algorithm has the same error rate.
C.A single algorithm exists that is superior for all possible problems.
D.More data always guarantees a better model.
Correct Answer: Averaged over all possible data generating distributions, every classification algorithm has the same error rate.
Explanation:
The No Free Lunch Theorem states that there is no 'universal' best learning algorithm; if an algorithm performs well on a certain class of problems, it pays for that by performing poorly on the remaining problems.
20. What does PAC stand for in Learning Theory?
A.Perfectly Accurate Classification
B.Probably Approximately Correct
C.Probabilistic Algorithm Complexity
D.Pattern Analysis and Computing
Correct Answer: Probably Approximately Correct
Explanation:
PAC stands for Probably Approximately Correct, a framework for mathematical analysis of machine learning.
21. In the PAC framework, the parameter $\epsilon$ (epsilon) represents:
Correct Answer: The accuracy parameter (maximum allowable error).
Explanation:
$\epsilon$ represents the error bound. We want the true error of our hypothesis to be at most $\epsilon$.
22. In the PAC framework, the parameter $\delta$ (delta) represents:
A.The error rate.
B.The dimensionality of the data.
C.The confidence parameter (probability that the error is high).
D.The learning rate.
Correct Answer: The confidence parameter (probability that the error is high).
Explanation:
$\delta$ is the probability that the algorithm produces a hypothesis with error greater than $\epsilon$. Thus, $1 - \delta$ is the confidence.
23. A concept class is PAC-learnable if there exists an algorithm that outputs a hypothesis $h$ such that with probability at least $1 - \delta$:
A.
B.
C.
D.
Correct Answer: $\mathrm{error}(h) \le \epsilon$
Explanation:
The definition of PAC learning requires that the learner produces a hypothesis with error at most $\epsilon$ with high probability ($1 - \delta$).
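Written out as a single statement, the PAC condition reads roughly as follows. The notation is an assumption of this sketch: $R(h)$ is the true error of the output hypothesis $h$, $m$ is the number of training examples, and $m(\epsilon, \delta)$ is whatever sample size the learner needs for the chosen accuracy and confidence.

```latex
\Pr\big[\, R(h) \le \epsilon \,\big] \;\ge\; 1 - \delta
\qquad \text{for all } \epsilon, \delta \in (0, 1),
\text{ whenever } m \ge m(\epsilon, \delta).
```

The probability is taken over the random draw of the training sample, which is why the guarantee is only "probably" satisfied.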
24. In PAC learning, Sample Complexity refers to:
A.The complexity of the loss function.
B.The time complexity of the algorithm.
C.The number of training examples required to guarantee a valid hypothesis with high probability.
D.The number of features in the dataset.
Correct Answer: The number of training examples required to guarantee a valid hypothesis with high probability.
Explanation:
Sample complexity is the minimum number of examples needed to satisfy the PAC conditions for given $\epsilon$ and $\delta$.
25. For a finite hypothesis space $\mathcal{H}$, the number of samples required for consistent PAC learning is proportional to:
A.
B.
C.
D.
Correct Answer: $\ln|\mathcal{H}|$
Explanation:
The sample complexity bound for a finite hypothesis space is generally proportional to the natural logarithm of the size of the hypothesis space, specifically $\ln|\mathcal{H}|$.
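One standard form of this bound for a consistent learner over a finite hypothesis space is $m \ge \frac{1}{\epsilon}\left(\ln|\mathcal{H}| + \ln\frac{1}{\delta}\right)$. The sketch below only evaluates that expression; the function name `sample_complexity` and the example numbers are assumptions for illustration.

```python
import math


def sample_complexity(hypothesis_count, epsilon, delta):
    """Smallest integer m with m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    if hypothesis_count < 1 or not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("need |H| >= 1 and 0 < epsilon, delta < 1")
    bound = (math.log(hypothesis_count) + math.log(1 / delta)) / epsilon
    return math.ceil(bound)


# Same accuracy and confidence, but the second hypothesis space is squared in size.
m_small = sample_complexity(973, epsilon=0.1, delta=0.05)
m_large = sample_complexity(973 ** 2, epsilon=0.1, delta=0.05)
```

Squaring the size of the hypothesis space does not even double the required sample size here, which is the practical meaning of the logarithmic dependence on $|\mathcal{H}|$.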
26. Which of the following implies Agnostic PAC Learning?
A.The error must be exactly zero.
B.The target function is assumed to be within the hypothesis space $\mathcal{H}$.
C.The target function may not belong to $\mathcal{H}$, and we seek the hypothesis with minimum risk.
D.There is no noise in the data.
Correct Answer: The target function may not belong to $\mathcal{H}$, and we seek the hypothesis with minimum risk.
Explanation:
Agnostic learning relaxes the realizability assumption. We don't assume the true target function is in our set of hypotheses; we just want to find the best approximation within our set.
27. Which type of learning is characterized by the absence of labels but the presence of a goal to discover hidden structures?
A.Supervised Learning
B.Reinforcement Learning
C.Semisupervised Learning
D.Unsupervised Learning
Correct Answer: Unsupervised Learning
Explanation:
Unsupervised learning deals with unlabeled data and looks for patterns, structures, or groupings.
28. Consider a dataset where inputs are images of animals and labels are 'Cat', 'Dog', or 'Bird'. This is a:
A.Clustering problem
B.Multi-label Classification problem
C.Regression problem
D.Multi-class Classification problem
Correct Answer: Multi-class Classification problem
Explanation:
It is classification (discrete labels) and there are more than two mutually exclusive classes, making it Multi-class.
29. Underfitting is often a result of:
A.Too much training data.
B.The model being too complex.
C.Training for too many epochs.
D.The model being too simple to capture the underlying trend.
Correct Answer: The model being too simple to capture the underlying trend.
Explanation:
Underfitting happens when the model has high bias and is not complex enough to represent the underlying structure of the data (e.g., fitting a line to a curve).
30. Which of the following is NOT a component of the Statistical Learning Framework?
A.Input Space
B.Loss Function
C.Output Space
D.The exact formula of the Target Function
Correct Answer: The exact formula of the Target Function
Explanation:
In the framework, the Target Function (or the distribution producing the labels) is unknown. We try to approximate it, but we do not start with it.
31. Which statement best describes the Bias-Variance Tradeoff?
A.Bias and Variance are independent of model complexity.
B.Ideally, we want high bias and high variance.
C.Increasing model complexity decreases bias and increases variance.
D.Increasing model complexity increases bias and decreases variance.
Correct Answer: Increasing model complexity decreases bias and increases variance.
Explanation:
Simple models have high bias (rigid assumptions) and low variance. Complex models have low bias (can fit training data well) but high variance (sensitive to noise in training data).
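For squared-error loss, the tradeoff is usually summarized by the decomposition below. The symbols are assumptions of this sketch: $f$ is the unknown target function, $\hat{f}$ is the model learned from a random training set, and $\sigma^2$ is the variance of zero-mean noise in the labels.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

Raising model complexity typically shrinks the first term and grows the second, while the third term is unaffected by the choice of model.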
32. In the context of PAC learning, a hypothesis is consistent with the training data if:
A.It correctly classifies all training examples (Empirical error is 0).
B.It is a linear function.
C.It has the lowest generalization error.
D.It is chosen randomly.
Correct Answer: It correctly classifies all training examples (Empirical error is 0).
Explanation:
Consistency in learning theory implies that the hypothesis makes no errors on the provided training set.
33. Which of the following represents the Zero-One Loss function for binary classification?
A.
B. $L(\hat{y}, y) = 1$ if $\hat{y} \neq y$, else $0$
C.
D.
Correct Answer: $L(\hat{y}, y) = 1$ if $\hat{y} \neq y$, else $0$
Explanation:
Zero-One loss simply counts an error (1) if the prediction is wrong and (0) if it is correct.
34. In Semi-Supervised Learning:
A.All data is labeled.
B.The agent learns from rewards.
C.No data is labeled.
D.A small amount of labeled data is used with a large amount of unlabeled data.
Correct Answer: A small amount of labeled data is used with a large amount of unlabeled data.
Explanation:
Semi-supervised learning bridges supervised and unsupervised learning by using a mix of labeled and unlabeled data, typically to reduce the cost of labeling.
35. Which learning paradigm is most suitable for a robot learning to walk by trial and error?
A.Supervised Learning
B.Unsupervised Learning
C.Transductive Learning
D.Reinforcement Learning
Correct Answer: Reinforcement Learning
Explanation:
The robot acts in an environment, receives feedback (falling or moving forward), and adjusts its policy. This is classic RL.
36. What is the primary goal of the Validation Set?
A.To tune hyperparameters and evaluate the model during development to prevent overfitting.
B.To train the model parameters.
C.To report the final accuracy of the model.
D.To increase the size of the training data.
Correct Answer: To tune hyperparameters and evaluate the model during development to prevent overfitting.
Explanation:
The validation set is used for model selection and hyperparameter tuning, acting as a check against overfitting before final testing.
37. If a learning algorithm has high Variance, it implies:
A.The algorithm is very sensitive to specific sets of training data (small changes in data lead to large changes in the model).
B.The algorithm always produces the same model regardless of the data.
C.The algorithm pays very little attention to the training data.
D.The algorithm has high systematic error.
Correct Answer: The algorithm is very sensitive to specific sets of training data (small changes in data lead to large changes in the model).
Explanation:
High variance means the model captures random noise in the training data; changing the training set slightly results in a very different model.
38. The assumption that the training data and future test data are drawn from the same distribution is called:
A.The Bayesian assumption.
B.The i.i.d. assumption (Independent and Identically Distributed).
C.The Markov assumption.
D.The linearity assumption.
Correct Answer: The i.i.d. assumption (Independent and Identically Distributed).
Explanation:
Most statistical learning theory relies on the data being independent and identically distributed (i.i.d.) from a fixed distribution.
39. In the definition of PAC learning, the term 'Probably' refers to:
A.The confidence $1 - \delta$.
B.The error $\epsilon$.
C.The hypothesis space size.
D.The loss function.
Correct Answer: The confidence $1 - \delta$.
Explanation:
In PAC, 'Probably' refers to the requirement that the algorithm succeeds with high probability ($1 - \delta$). 'Approximately' refers to the error bound $\epsilon$.
40. Which of the following is a potential solution to Overfitting?
A.Regularization (e.g., adding a penalty for complexity).
B.Reducing the size of the training set.
C.Making the model more complex.
D.Increasing the number of features.
Correct Answer: Regularization (e.g., adding a penalty for complexity).
Explanation:
Regularization discourages complex models (e.g., large weights), helping to prevent overfitting.
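A minimal sketch of the idea, assuming an L2 (ridge-style) penalty: the training objective becomes the empirical risk plus a term that grows with the size of the weights. The function name `regularized_objective`, the penalty weight `lam`, and the numbers for the two models are illustrative assumptions.

```python
def regularized_objective(empirical_risk, weights, lam=0.1):
    """Empirical risk plus lam times the squared L2 norm of the weights."""
    penalty = lam * sum(w * w for w in weights)
    return empirical_risk + penalty


# Two hypothetical models: the complex one fits the training set slightly
# better but needs much larger weights to do so.
simple_model = regularized_objective(0.20, [0.5, -0.5])
complex_model = regularized_objective(0.15, [3.0, -4.0])
```

On training error alone the second model wins (0.15 versus 0.20); once the penalty is included, the first model has the lower objective, so the optimizer is pushed toward the simpler fit.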
41. In Linear Regression, the inductive bias typically includes the assumption that:
A.The relationship between input and output is linear.
B.The data is clustered.
C.The relationship is a high-degree polynomial.
D.The output is a discrete class label.
Correct Answer: The relationship between input and output is linear.
Explanation:
Linear regression assumes the target can be approximated by a weighted sum of the input features.
42. In the equation $y = f(x) + \epsilon$, the term $\epsilon$ represents:
A.Variance error.
B.Irreducible error.
C.Bias error.
D.Training error.
Correct Answer: Irreducible error.
Explanation:
This represents noise in the system that cannot be eliminated by any model, usually due to unobserved variables.
43. Which inequality is commonly used in PAC learning derivations to bound the probability of large deviations?
A.Euler's Identity
B.Hoeffding's Inequality
C.Newton's Second Law
D.Pythagorean Theorem
Correct Answer: Hoeffding's Inequality
Explanation:
Hoeffding's Inequality provides an upper bound on the probability that the sum of bounded random variables deviates from its expected value, crucial for bounding empirical risk against true risk.
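For a single fixed hypothesis $h$ and a loss bounded in $[0, 1]$, the inequality takes the following form. The notation is an assumption of this sketch: $\hat{R}(h)$ is the empirical risk on $m$ i.i.d. examples and $R(h)$ is the true risk.

```latex
\Pr\Big[\, \big|\hat{R}(h) - R(h)\big| \ge \epsilon \,\Big] \;\le\; 2\exp\!\big(-2 m \epsilon^{2}\big)
```

The right-hand side decays exponentially in $m$, so the chance that training error is a misleading estimate of true error shrinks quickly as more examples are collected.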
44. A hypothesis space is said to be infinite if:
A.It contains continuous parameters (e.g., all possible linear separators in $\mathbb{R}^n$).
B.It contains a finite number of hypotheses.
C.It is empty.
D.It only contains decision trees of depth 3.
Correct Answer: It contains continuous parameters (e.g., all possible linear separators in $\mathbb{R}^n$).
Explanation:
If parameters (weights) are real numbers, there are infinitely many possible combinations, making the hypothesis space infinite.
45. The difference between the True Risk and the Empirical Risk is often called:
A.Generalization Gap
B.Training Loss
C.Inductive Bias
D.Bayes Error
Correct Answer: Generalization Gap
Explanation:
The generalization gap is the difference between the error on the training set (empirical) and the error on the unseen data (true risk).
46. Which of the following datasets would be most appropriate for a Regression problem?
A.Emails labeled as spam/ham.
B.Handwritten digits 0-9.
C.Historical data of house sizes and their selling prices.
D.Photos labeled with names of people.
Correct Answer: Historical data of house sizes and their selling prices.
Explanation:
Selling prices are continuous numerical values, making this a regression problem.
47. In an Unsupervised Learning setting, Dimensionality Reduction aims to:
A.Cluster data into groups.
B.Increase the number of features to capture more detail.
C.Label the data automatically.
D.Reduce the number of random variables under consideration by obtaining a set of principal variables.
Correct Answer: Reduce the number of random variables under consideration by obtaining a set of principal variables.
Explanation:
Dimensionality reduction (e.g., PCA) projects data into a lower-dimensional space while preserving important information.
48. Why do we need a Test Set that is completely separate from the Training Set?
A.To calculate the gradient.
B.To make the training faster.
C.To provide an unbiased evaluation of the final model fit.
D.To use for hyperparameter tuning.
Correct Answer: To provide an unbiased evaluation of the final model fit.
Explanation:
If we test on training data, we cannot detect overfitting. A separate test set simulates how the model performs on real-world, unseen data.
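A simple way to keep the sets separate is to shuffle once and slice, as sketched below using only the standard library. The function name `split_dataset`, the 60/20/20 proportions, and the fixed seed are assumptions for illustration, not a recommended recipe.

```python
import random


def split_dataset(examples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut into disjoint train / validation / test lists."""
    if val_fraction + test_fraction >= 1:
        raise ValueError("fractions must leave room for training data")
    shuffled = list(examples)
    # A seeded generator makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = split_dataset(range(100))
```

The validation slice is used for tuning during development, while the test slice is touched only once at the end, so its error remains an unbiased estimate for the final model.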
49. In the context of Machine Learning scope, Computer Vision typically involves:
A.Optimizing database queries.
B.Analyzing text sentiment.
C.Predicting stock prices.
D.Extracting information from images and videos.
Correct Answer: Extracting information from images and videos.
Explanation:
Computer Vision is the field of ML dealing with visual inputs like images and videos.
50. The 'Realizability Assumption' in PAC learning states that:
A.The sample size is infinite.
B.The data is noiseless.
C.There exists a hypothesis $h^*$ in the hypothesis space $\mathcal{H}$ such that $R(h^*) = 0$.
D.The learning algorithm is efficient.
Correct Answer: There exists a hypothesis $h^*$ in the hypothesis space $\mathcal{H}$ such that $R(h^*) = 0$.
Explanation:
Realizability assumes that the true concept can be perfectly represented by some hypothesis within the class we are searching.