1. According to Tom Mitchell's definition of machine learning, a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if:
A. Its performance at tasks in T, as measured by P, improves with experience E.
B. It can execute tasks in T without any errors.
C. It can generate new tasks based on performance P.
D. It minimizes the complexity of the code required for T.
Correct Answer: Its performance at tasks in T, as measured by P, improves with experience E.
Explanation: Tom Mitchell provided a formal definition: 'A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.'
2. Which of the following best distinguishes Machine Learning from traditional programming?
A. Traditional programming uses data and rules to produce answers; Machine Learning uses data and answers to produce rules.
B. Traditional programming is faster than Machine Learning.
C. Machine Learning does not require a compiler.
D. Traditional programming deals with numbers, while Machine Learning deals with images.
Correct Answer: Traditional programming uses data and rules to produce answers; Machine Learning uses data and answers to produce rules.
Explanation: In traditional programming, you input data and rules to get answers. In machine learning (specifically supervised learning), you input data and answers (labels) to learn the rules (the model).
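To make the contrast concrete, here is a minimal toy sketch (our own illustration, not part of the quiz): a hand-written rule versus a "rule" derived from labeled data. The deliberately trivial threshold learner stands in for a real training algorithm.

    # Traditional programming: data + a hand-written rule -> answers.
    def is_hot(temp_c):
        return temp_c > 30  # rule authored by a human

    # Machine learning: data + answers -> a learned rule.
    temps = [10, 20, 25, 31, 35, 40]   # data
    labels = [0, 0, 0, 1, 1, 1]        # answers (1 = hot)

    # Trivial learner: place the boundary midway between the largest
    # "not hot" example and the smallest "hot" example.
    boundary = (max(t for t, y in zip(temps, labels) if y == 0) +
                min(t for t, y in zip(temps, labels) if y == 1)) / 2

    def learned_is_hot(temp_c):
        return temp_c > boundary  # rule produced from the data

    print(boundary)            # 28.0
    print(learned_is_hot(29))  # True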
3. In the context of Supervised Learning, the dataset consists of:
A. Input vectors only.
B. Input vectors and associated target labels.
C. A reward signal only.
D. Unstructured text without annotations.
Correct Answer: Input vectors and associated target labels.
Explanation: Supervised learning algorithms are trained on a labeled dataset, which means the data consists of input features (x) paired with the correct output or target labels (y).
4. Which of the following is a classic example of Unsupervised Learning?
Correct Answer: Clustering.
Explanation: Clustering is an unsupervised task where the goal is to group similar data points together without pre-existing labels. Spam filtering and price prediction are supervised, and playing chess is typically reinforcement learning.
5. In Reinforcement Learning, what does the agent maximize to learn the optimal policy?
A. The immediate reward.
B. The cumulative future reward.
C. The accuracy of prediction.
D. The number of states visited.
Correct Answer: The cumulative future reward.
Explanation: The goal of a reinforcement learning agent is to learn a policy that maximizes the expected cumulative reward (return) over time, not just the immediate reward.
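In discounted form, this return is commonly written as follows (standard notation, added here for reference):

    \[
    G_t \;=\; \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1},
    \qquad 0 \le \gamma < 1,
    \]

where R is the reward signal and the discount factor γ weights near-term rewards more heavily than distant ones.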
6. Predicting a continuous output value, such as the temperature tomorrow, is known as:
A. Classification
B. Regression
C. Clustering
D. Dimensionality Reduction
Correct Answer: Regression
Explanation: Regression is a type of supervised learning where the target variable is continuous. Classification deals with discrete categories.
7. Predicting whether an email is 'Spam' or 'Not Spam' is an example of:
A. Regression
B. Binary Classification
C. K-Means Clustering
D. Policy Search
Correct Answer: Binary Classification
Explanation: This is a classification task because the output is a discrete category (class), and since there are only two classes, it is Binary Classification.
8. Which of the following is a major challenge in Machine Learning where the model learns the training data too well, including the noise, and performs poorly on new data?
A. Underfitting
B. Overfitting
C. Regularization
D. Convergence
Correct Answer: Overfitting
Explanation: Overfitting occurs when a model fits the training data too closely, capturing noise and random fluctuations rather than the underlying pattern, leading to poor generalization on unseen data.
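A quick way to see this numerically (a toy sketch; the sine data and polynomial degrees are our own illustrative choices) is to fit polynomials of increasing degree and compare training versus test error:

    import numpy as np

    rng = np.random.default_rng(0)

    # Noisy samples of a simple underlying trend: y = sin(x) + noise.
    x_train = np.linspace(0, 3, 10)
    y_train = np.sin(x_train) + rng.normal(scale=0.1, size=10)
    x_test = np.linspace(0, 3, 100)
    y_test = np.sin(x_test)  # noise-free ground truth for evaluation

    for degree in (1, 3, 9):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(degree, train_err, test_err)

    # The degree-9 polynomial can pass through all 10 training points
    # (near-zero training error), yet its test error is typically the
    # worst of the three: it has memorized the noise.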
9. The Curse of Dimensionality refers to:
A. The difficulty of visualizing data in 2D.
B. The exponential increase in data volume required to generalize accurately as the number of features increases.
C. The computational cost of adding more rows to a dataset.
D. The bias introduced by dimension reduction techniques.
Correct Answer: The exponential increase in data volume required to generalize accurately as the number of features increases.
Explanation: As the dimensionality (number of features) of the input space increases, the volume of the space increases so fast that the available data becomes sparse, making it difficult to find statistical significance.
10. In the Statistical Learning Framework, we assume the data is generated by:
A. A random number generator.
B. An unknown joint probability distribution D over X × Y.
C. A deterministic linear function.
D. The learning algorithm itself.
Correct Answer: An unknown joint probability distribution D over X × Y.
Explanation: The standard assumption is that there is an unknown, fixed joint probability distribution D from which the training and test data are drawn independently and identically distributed (i.i.d.).
11. The function that measures the penalty for predicting ŷ when the true value is y is called:
A. The Activation Function
B. The Loss Function
C. The Hypothesis
D. The Regularizer
Correct Answer: The Loss Function
Explanation: A loss function (or cost function) L(ŷ, y) quantifies the difference between the estimated value ŷ and the true value y.
12. What is Generalization Error (or True Risk)?
A. The error on the training set.
B. The error on the test set.
C. The expected value of the loss function over the underlying data distribution.
D. The difference between the predicted value and the average value.
Correct Answer: The expected value of the loss function over the underlying data distribution.
Explanation: Generalization error is the expectation of the loss over the entire distribution of data, representing how well the model performs on unseen data.
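In symbols (standard notation, added for reference), the true risk of a hypothesis h under the data distribution D is:

    \[
    R(h) \;=\; \mathbb{E}_{(x,y)\sim D}\!\left[\, L(h(x), y) \,\right]
    \]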
13. Mathematically, Empirical Risk for a hypothesis h on a dataset of size n is defined as:
Correct Answer: R̂(h) = (1/n) Σᵢ₌₁ⁿ L(h(xᵢ), yᵢ)
Explanation: Empirical Risk is the average loss computed over the observed training dataset, serving as a proxy for the true risk.
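A direct translation of this definition into code (a toy sketch; the hypothesis, loss, and data below are our own illustrative choices):

    def empirical_risk(h, loss, data):
        """Average loss of hypothesis h over a dataset of (x, y) pairs."""
        return sum(loss(h(x), y) for x, y in data) / len(data)

    # Zero-one loss for classification (see question 33).
    zero_one = lambda y_pred, y_true: 0 if y_pred == y_true else 1

    # Toy hypothesis: predict 1 when x > 0.
    h = lambda x: 1 if x > 0 else 0
    data = [(-2, 0), (-1, 0), (1, 1), (3, 0)]  # last label disagrees with h

    print(empirical_risk(h, zero_one, data))   # 0.25: one mistake in four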
14. Empirical Risk Minimization (ERM) is the principle of:
A. Minimizing the error on the test set.
B. Choosing the hypothesis that minimizes the loss on the training set.
C. Choosing the hypothesis with the simplest structure.
D. Minimizing the computational time.
Correct Answer: Choosing the hypothesis that minimizes the loss on the training set.
Explanation: ERM is a learning strategy that selects the hypothesis from the hypothesis space that minimizes the empirical risk (training error).
15. Why is minimizing Empirical Risk not always sufficient to ensure good learning?
A. It is computationally too expensive.
B. It ignores the training labels.
C. It can lead to overfitting if the hypothesis space is too complex.
D. It always results in underfitting.
Correct Answer: It can lead to overfitting if the hypothesis space is too complex.
Explanation: If the model is complex enough, it can memorize the training data (zero empirical risk) but fail to generalize to new data (high true risk). This is overfitting.
16. What is Inductive Bias?
A. The error introduced by noise in the data.
B. The set of assumptions a learner makes to predict outputs for unseen inputs.
C. The bias term b in the equation y = wx + b.
D. The tendency of a model to underfit the data.
Correct Answer: The set of assumptions a learner makes to predict outputs for unseen inputs.
Explanation: Inductive bias refers to the set of assumptions (like choosing a linear model or preferring simpler trees) that allows a learning algorithm to generalize beyond the training data.
17. Which of the following describes Occam's Razor in the context of Inductive Bias?
A. Complex models are always better.
B. Among competing hypotheses that fit the data equally well, the simplest one should be selected.
C. We should always choose the hypothesis with the highest training error.
D. Data should be sliced into smaller chunks for processing.
Correct Answer: Among competing hypotheses that fit the data equally well, the simplest one should be selected.
Explanation: Occam's Razor is a common inductive bias preferring simplicity. It suggests that unnecessary complexity should be avoided.
18. Restricting the hypothesis space to include only Linear Classifiers is an example of:
A. Preference Bias
B. Restriction Bias (Language Bias)
C. Sampling Bias
D. Confirmation Bias
Correct Answer: Restriction Bias (Language Bias)
Explanation: Restriction bias (or language bias) strictly limits the set of hypotheses the learner can consider (e.g., only linear functions), as opposed to preference bias, which prefers certain hypotheses within the set.
19. In the context of the No Free Lunch Theorem, which statement is true?
A. A single algorithm exists that is superior for all possible problems.
B. Averaged over all possible data-generating distributions, every classification algorithm has the same error rate.
C. Deep Learning is universally better than Decision Trees.
D. More data always guarantees a better model.
Correct Answer: Averaged over all possible data-generating distributions, every classification algorithm has the same error rate.
Explanation: The No Free Lunch Theorem states that there is no 'universal' best learning algorithm; if an algorithm performs well on a certain class of problems, it pays for that by performing poorly on the remaining problems.
20. What does PAC stand for in Learning Theory?
A. Perfectly Accurate Classification
B. Probably Approximately Correct
C. Probabilistic Algorithm Complexity
D. Pattern Analysis and Computing
Correct Answer: Probably Approximately Correct
Explanation: PAC stands for Probably Approximately Correct, a framework for mathematical analysis of machine learning.
21. In the PAC framework, the parameter ε (epsilon) represents:
Correct Answer: The accuracy parameter (maximum allowable error).
Explanation: ε represents the error bound. We want the true error of our hypothesis to be at most ε.
22. In the PAC framework, the parameter δ (delta) represents:
A. The error rate.
B. The learning rate.
C. The confidence parameter (probability that the error is high).
D. The dimensionality of the data.
Correct Answer: The confidence parameter (probability that the error is high).
Explanation: δ is the probability that the algorithm produces a hypothesis with error greater than ε. Thus, 1 − δ is the confidence.
23. A concept class C is PAC-learnable if there exists an algorithm that outputs a hypothesis h such that, with probability at least 1 − δ:
Correct Answer: R(h) ≤ ε
Explanation: The definition of PAC learning requires that the learner produces a hypothesis with error at most ε with high probability (at least 1 − δ).
24. In PAC learning, Sample Complexity refers to:
A. The time complexity of the algorithm.
B. The number of training examples required to guarantee a valid hypothesis with high probability.
C. The number of features in the dataset.
D. The complexity of the loss function.
Correct Answer: The number of training examples required to guarantee a valid hypothesis with high probability.
Explanation: Sample complexity is the minimum number of examples needed to satisfy the PAC conditions for given ε and δ.
25. For a finite hypothesis space H, the number of samples required for consistent PAC learning is proportional to:
Correct Answer: ln|H|
Explanation: The sample complexity bound for a finite hypothesis space is generally proportional to the natural logarithm of the size of the hypothesis space, specifically m ≥ (1/ε)(ln|H| + ln(1/δ)).
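As a reference sketch (the standard textbook argument, not part of the original quiz), the bound follows from a union bound over the "bad" hypotheses under the realizability assumption:

    % Probability that some "bad" hypothesis (true error > epsilon) is
    % consistent with m i.i.d. examples, by the union bound:
    \[
    \Pr[\exists\, h \in H_{\text{bad}} \text{ consistent}]
    \;\le\; |H| (1 - \varepsilon)^{m}
    \;\le\; |H| e^{-\varepsilon m}.
    \]
    % Requiring this to be at most delta and solving for m gives
    \[
    m \;\ge\; \frac{1}{\varepsilon}\left(\ln |H| + \ln \frac{1}{\delta}\right).
    \]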
26. Which of the following implies Agnostic PAC Learning?
A. The target function is assumed to be within the hypothesis space H.
B. The target function may not belong to H, and we seek the hypothesis with minimum risk.
C. The error must be exactly zero.
D. There is no noise in the data.
Correct Answer: The target function may not belong to H, and we seek the hypothesis with minimum risk.
Explanation: Agnostic learning relaxes the realizability assumption. We don't assume the true target function is in our set of hypotheses; we just want to find the best approximation within our set.
27. Which type of learning is characterized by the absence of labels but the presence of a goal to discover hidden structures?
A. Supervised Learning
B. Unsupervised Learning
C. Reinforcement Learning
D. Semi-supervised Learning
Correct Answer: Unsupervised Learning
Explanation: Unsupervised learning deals with unlabeled data and looks for patterns, structures, or groupings.
28. Consider a dataset where inputs are images of animals and labels are 'Cat', 'Dog', or 'Bird'. This is a:
A. Multi-class Classification problem
B. Multi-label Classification problem
C. Regression problem
D. Clustering problem
Correct Answer: Multi-class Classification problem
Explanation: It is classification (discrete labels), and there are more than two mutually exclusive classes, making it Multi-class.
29. Underfitting is often a result of:
A. The model being too complex.
B. Too much training data.
C. The model being too simple to capture the underlying trend.
D. Training for too many epochs.
Correct Answer: The model being too simple to capture the underlying trend.
Explanation: Underfitting happens when the model has high bias and is not complex enough to represent the underlying structure of the data (e.g., fitting a line to a curve).
30. Which of the following is NOT a component of the Statistical Learning Framework?
A. Input Space
B. Output Space
C. The exact formula of the Target Function
D. Loss Function
Correct Answer: The exact formula of the Target Function
Explanation: In the framework, the Target Function (or the distribution producing the labels) is unknown. We try to approximate it, but we do not start with it.
31. Which statement best describes the Bias-Variance Tradeoff?
A. Increasing model complexity increases bias and decreases variance.
B. Increasing model complexity decreases bias and increases variance.
C. Ideally, we want high bias and high variance.
D. Bias and Variance are independent of model complexity.
Correct Answer: Increasing model complexity decreases bias and increases variance.
Explanation: Simple models have high bias (rigid assumptions) and low variance. Complex models have low bias (can fit training data well) but high variance (sensitive to noise in training data).
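The tradeoff is usually made precise by the standard decomposition of expected squared prediction error (textbook result, added here for reference; see also question 42):

    \[
    \mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
    = \underbrace{\bigl(\mathbb{E}[\hat{f}(x)] - f(x)\bigr)^{2}}_{\text{Bias}^2}
    + \underbrace{\mathbb{E}\!\left[\bigl(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\bigr)^{2}\right]}_{\text{Variance}}
    + \underbrace{\sigma^{2}}_{\text{Irreducible error}}
    \]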
32. In the context of PAC learning, a hypothesis is consistent with the training data if:
A. It has the lowest generalization error.
B. It correctly classifies all training examples (Empirical error is 0).
C. It is a linear function.
D. It is chosen randomly.
Correct Answer: It correctly classifies all training examples (Empirical error is 0).
Explanation: Consistency in learning theory implies that the hypothesis makes no errors on the provided training set.
33. Which of the following represents the Zero-One Loss function for binary classification?
Correct Answer: L(ŷ, y) = 1 if ŷ ≠ y, else 0
Explanation: Zero-One loss simply counts an error (1) if the prediction is wrong and no penalty (0) if it is correct.
34. In Semi-Supervised Learning:
A. All data is labeled.
B. No data is labeled.
C. A small amount of labeled data is used with a large amount of unlabeled data.
D. The agent learns from rewards.
Correct Answer: A small amount of labeled data is used with a large amount of unlabeled data.
Explanation: Semi-supervised learning bridges supervised and unsupervised learning by using a mix of labeled and unlabeled data, typically to reduce the cost of labeling.
35. Which learning paradigm is most suitable for a robot learning to walk by trial and error?
A. Supervised Learning
B. Reinforcement Learning
C. Unsupervised Learning
D. Transductive Learning
Correct Answer: Reinforcement Learning
Explanation: The robot acts in an environment, receives feedback (falling or moving forward), and adjusts its policy. This is classic RL.
36. What is the primary goal of the Validation Set?
A. To train the model parameters.
B. To tune hyperparameters and evaluate the model during development to prevent overfitting.
C. To report the final accuracy of the model.
D. To increase the size of the training data.
Correct Answer: To tune hyperparameters and evaluate the model during development to prevent overfitting.
Explanation: The validation set is used for model selection and hyperparameter tuning, acting as a check against overfitting before final testing.
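A common way to carve out the three sets in practice is shown below (a sketch using scikit-learn's train_test_split; the 60/20/20 split and the toy data are our own illustrative choices):

    import numpy as np
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))     # toy feature matrix
    y = (X[:, 0] > 0).astype(int)     # toy labels

    # Hold out the test set first, then split the remainder into
    # training and validation sets (60% / 20% / 20% overall).
    X_tmp, X_test, y_tmp, y_test = train_test_split(
        X, y, test_size=0.20, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(
        X_tmp, y_tmp, test_size=0.25, random_state=0)

    print(len(X_train), len(X_val), len(X_test))  # 60 20 20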
37. If a learning algorithm has high Variance, it implies:
A. The algorithm pays very little attention to the training data.
B. The algorithm is very sensitive to specific sets of training data (small changes in data lead to large changes in the model).
C. The algorithm always produces the same model regardless of the data.
D. The algorithm has high systematic error.
Correct Answer: The algorithm is very sensitive to specific sets of training data (small changes in data lead to large changes in the model).
Explanation: High variance means the model captures random noise in the training data; changing the training set slightly results in a very different model.
38. The assumption that the training data and future test data are drawn from the same distribution is called:
A. The i.i.d. assumption (Independent and Identically Distributed).
B. The linearity assumption.
C. The Markov assumption.
D. The Bayesian assumption.
Correct Answer: The i.i.d. assumption (Independent and Identically Distributed).
Explanation: Most statistical learning theory relies on the data being independent and identically distributed (i.i.d.) from a fixed distribution.
39. In the definition of PAC learning, the term 'Probably' refers to:
A. The error ε.
B. The confidence 1 − δ.
C. The hypothesis space size.
D. The loss function.
Correct Answer: The confidence 1 − δ.
Explanation: In PAC, 'Probably' refers to the requirement that the algorithm succeeds with high probability (at least 1 − δ). 'Approximately' refers to the error bound ε.
40. Which of the following is a potential solution to Overfitting?
A. Increasing the number of features.
B. Reducing the size of the training set.
C. Regularization (e.g., adding a penalty for complexity).
D. Making the model more complex.
Correct Answer: Regularization (e.g., adding a penalty for complexity).
Explanation: Regularization discourages complex models (e.g., large weights), helping to prevent overfitting.
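As one concrete instance (a minimal sketch of L2/ridge regularization, not taken from the quiz), the penalty is added directly to the training loss:

    import numpy as np

    def ridge_loss(w, X, y, lam):
        """Mean squared error plus an L2 penalty on the weights.

        A larger lam pushes the weights toward zero, discouraging
        overly complex (large-weight) models.
        """
        residuals = X @ w - y
        return np.mean(residuals ** 2) + lam * np.sum(w ** 2)

    w = np.array([0.5, -0.2])
    X = np.array([[1.0, 2.0], [3.0, 4.0]])
    y = np.array([0.1, 0.7])
    print(ridge_loss(w, X, y, lam=0.1))  # ~0.029: the fit is exact, only the penalty remains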
41. In Linear Regression, the inductive bias typically includes the assumption that:
A. The relationship between input and output is linear.
B. The relationship is a high-degree polynomial.
C. The output is a discrete class label.
D. The data is clustered.
Correct Answer: The relationship between input and output is linear.
Explanation: Linear regression assumes the target can be approximated by a weighted sum of the input features.
42. In the equation y = f(x) + ε, the noise term ε represents:
A. Irreducible error.
B. Bias error.
C. Variance error.
D. Training error.
Correct Answer: Irreducible error.
Explanation: This represents noise in the system that cannot be eliminated by any model, usually due to unobserved variables.
43. Which inequality is commonly used in PAC learning derivations to bound the probability of large deviations?
A. Pythagorean Theorem
B. Hoeffding's Inequality
C. Euler's Identity
D. Newton's Second Law
Correct Answer: Hoeffding's Inequality
Explanation: Hoeffding's Inequality provides an upper bound on the probability that the sum of bounded random variables deviates from its expected value, crucial for bounding empirical risk against true risk.
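For reference, the standard statement for i.i.d. random variables Zᵢ bounded in [0, 1] (our notation, added for context):

    \[
    \Pr\!\left(\left|\frac{1}{n}\sum_{i=1}^{n} Z_i - \mathbb{E}[Z_1]\right| \ge \epsilon\right)
    \;\le\; 2\, e^{-2 n \epsilon^{2}}
    \]

Setting Zᵢ = L(h(xᵢ), yᵢ) bounds how far the empirical risk can stray from the true risk.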
44. A hypothesis space is said to be infinite if:
A. It contains a finite number of hypotheses.
B. It contains continuous parameters (e.g., all possible linear separators in ℝᵈ).
C. It is empty.
D. It only contains decision trees of depth 3.
Correct Answer: It contains continuous parameters (e.g., all possible linear separators in ℝᵈ).
Explanation: If parameters (weights) are real numbers, there are infinitely many possible combinations, making the hypothesis space infinite.
45. The difference between the True Risk and the Empirical Risk is often called:
A. Generalization Gap
B. Training Loss
C. Bayes Error
D. Inductive Bias
Correct Answer: Generalization Gap
Explanation: The generalization gap is the difference between the error on the training set (empirical) and the error on unseen data (true risk).
46. Which of the following datasets would be most appropriate for a Regression problem?
A. Emails labeled as spam/ham.
B. Photos labeled with names of people.
C. Historical data of house sizes and their selling prices.
D. Handwritten digits 0-9.
Correct Answer: Historical data of house sizes and their selling prices.
Explanation: Selling prices are continuous numerical values, making this a regression problem.
47. In an Unsupervised Learning setting, Dimensionality Reduction aims to:
A. Increase the number of features to capture more detail.
B. Reduce the number of random variables under consideration by obtaining a set of principal variables.
C. Cluster data into groups.
D. Label the data automatically.
Correct Answer: Reduce the number of random variables under consideration by obtaining a set of principal variables.
Explanation: Dimensionality reduction (e.g., PCA) projects data into a lower-dimensional space while preserving important information.
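A minimal PCA sketch (our own illustration using scikit-learn; any PCA implementation would do, and the random data is just a placeholder):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))        # 200 samples, 10 features

    pca = PCA(n_components=2)             # keep 2 principal variables
    X_reduced = pca.fit_transform(X)

    print(X_reduced.shape)                # (200, 2)
    print(pca.explained_variance_ratio_)  # variance retained per component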
48. Why do we need a Test Set that is completely separate from the Training Set?
A. To make the training faster.
B. To provide an unbiased evaluation of the final model fit.
C. To use for hyperparameter tuning.
D. To calculate the gradient.
Correct Answer: To provide an unbiased evaluation of the final model fit.
Explanation: If we test on training data, we cannot detect overfitting. A separate test set simulates how the model performs on real-world, unseen data.
49. In the context of Machine Learning scope, Computer Vision typically involves:
A. Analyzing text sentiment.
B. Predicting stock prices.
C. Extracting information from images and videos.
D. Optimizing database queries.
Correct Answer: Extracting information from images and videos.
Explanation: Computer Vision is the field of ML dealing with visual inputs like images and videos.
50. The 'Realizability Assumption' in PAC learning states that:
A. The learning algorithm is efficient.
B. There exists a hypothesis h* in the hypothesis space H such that R(h*) = 0.
C. The data is noiseless.
D. The sample size is infinite.
Correct Answer: There exists a hypothesis h* in the hypothesis space H such that R(h*) = 0.
Explanation: Realizability assumes that the true concept can be perfectly represented by some hypothesis within the class we are searching.