Unit 1 - Practice Quiz

INT394

1 According to Tom Mitchell's definition of machine learning, a computer program is said to learn from experience $E$ with respect to some class of tasks $T$ and performance measure $P$ if:

A. Its performance at tasks in $T$, as measured by $P$, improves with experience $E$.
B. It can execute tasks in $T$ without any errors.
C. It can generate new tasks based on performance $P$.
D. It minimizes the complexity of the code required for $T$.

2 Which of the following best distinguishes Machine Learning from traditional programming?

A. Traditional programming uses data and rules to produce answers; Machine Learning uses data and answers to produce rules.
B. Traditional programming is faster than Machine Learning.
C. Machine Learning does not require a compiler.
D. Traditional programming deals with numbers, while Machine Learning deals with images.

3 In the context of Supervised Learning, the dataset consists of:

A. Input vectors only.
B. Input vectors and associated target labels.
C. A reward signal only.
D. Unstructured text without annotations.

4 Which of the following is a classic example of Unsupervised Learning?

A. Spam filtering.
B. House price prediction.
C. Customer segmentation (Clustering).
D. Playing Chess.

5 In Reinforcement Learning, what does the agent maximize to learn the optimal policy?

A. The immediate reward.
B. The cumulative future reward.
C. The accuracy of prediction.
D. The number of states visited.
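
For reference, the quantity in option B is usually written as a discounted return; one standard formulation (the discount factor $\gamma \in [0, 1)$ is our addition, not stated in the question) is:

    $G_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$, with the optimal policy $\pi^* = \arg\max_\pi \mathbb{E}_\pi[G_t]$.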

6 Predicting a continuous output value, such as the temperature tomorrow, is known as:

A. Classification
B. Regression
C. Clustering
D. Dimensionality Reduction

7 Predicting whether an email is 'Spam' or 'Not Spam' is an example of:

A. Regression
B. Binary Classification
C. K-Means Clustering
D. Policy Search

8 Which of the following is a major challenge in Machine Learning where the model learns the training data too well, including the noise, and performs poorly on new data?

A. Underfitting
B. Overfitting
C. Regularization
D. Convergence

9 The Curse of Dimensionality refers to:

A. The difficulty of visualizing data in 2D.
B. The exponential increase in data volume required to generalize accurately as the number of features increases.
C. The computational cost of adding more rows to a dataset.
D. The bias introduced by dimension reduction techniques.

10 In the Statistical Learning Framework, we assume the data is generated by:

A. A random number generator.
B. An unknown joint probability distribution $\mathcal{D}$ over $X \times Y$.
C. A deterministic linear function.
D. The learning algorithm itself.

11 The function that measures the penalty for predicting $\hat{y}$ when the true value is $y$ is called:

A. The Activation Function
B. The Loss Function
C. The Hypothesis
D. The Regularizer

12 What is Generalization Error (or True Risk)?

A. The error on the training set.
B. The error on the test set.
C. The expected value of the loss function over the underlying data distribution.
D. The difference between the predicted value and the average value.
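
Written out with the notation used in the following questions (hypothesis $h$, loss $L$, unknown data distribution $\mathcal{D}$), the quantity in option C is:

    $R(h) = \mathbb{E}_{(x,y) \sim \mathcal{D}}\left[L(h(x), y)\right]$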

13 Mathematically, Empirical Risk $\hat{R}_S(h)$ for a hypothesis $h$ on a dataset $S$ of size $m$ is defined as:

A. $\hat{R}_S(h) = \sum_{i=1}^{m} L(h(x_i), y_i)$
B. $\hat{R}_S(h) = \frac{1}{m} \sum_{i=1}^{m} L(h(x_i), y_i)$
C. $\hat{R}_S(h) = \mathbb{E}_{(x,y) \sim \mathcal{D}}\left[L(h(x), y)\right]$
D. $\hat{R}_S(h) = \max_{1 \le i \le m} L(h(x_i), y_i)$
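
As a concrete illustration of option B, a minimal Python sketch (the function names and the toy threshold classifier are ours, not part of the quiz):

    import numpy as np

    def empirical_risk(h, X, y, loss):
        # Average loss of hypothesis h over the finite sample (x_i, y_i).
        return np.mean([loss(h(x_i), y_i) for x_i, y_i in zip(X, y)])

    # Toy data: a threshold rule that happens to fit this sample exactly.
    h = lambda x: int(x > 0.5)
    zero_one = lambda y_hat, y: int(y_hat != y)
    X = np.array([0.1, 0.4, 0.6, 0.9])
    y = np.array([0, 0, 1, 1])
    print(empirical_risk(h, X, y, zero_one))  # 0.0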

14 Empirical Risk Minimization (ERM) is the principle of:

A. Minimizing the error on the test set.
B. Choosing the hypothesis that minimizes the loss on the training set.
C. Choosing the hypothesis with the simplest structure.
D. Minimizing the computational time.

15 Why is minimizing Empirical Risk not always sufficient to ensure good learning?

A. It is computationally too expensive.
B. It ignores the training labels.
C. It can lead to overfitting if the hypothesis space is too complex.
D. It always results in underfitting.

16 What is Inductive Bias?

A. The error introduced by noise in the data.
B. The set of assumptions a learner makes to predict outputs for unseen inputs.
C. The bias term $b$ in the equation $y = wx + b$.
D. The tendency of a model to underfit the data.

17 Which of the following describes Occam's Razor in the context of Inductive Bias?

A. Complex models are always better.
B. Among competing hypotheses that fit the data equally well, the simplest one should be selected.
C. We should always choose the hypothesis with the highest training error.
D. Data should be sliced into smaller chunks for processing.

18 Restricting the hypothesis space to include only Linear Classifiers is an example of:

A. Preference Bias
B. Restriction Bias (Language Bias)
C. Sampling Bias
D. Confirmation Bias

19 In the context of the No Free Lunch Theorem, which statement is true?

A. A single algorithm exists that is superior for all possible problems.
B. Averaged over all possible data generating distributions, every classification algorithm has the same error rate.
C. Deep Learning is universally better than Decision Trees.
D. More data always guarantees a better model.

20 What does PAC stand for in Learning Theory?

A. Perfectly Accurate Classification
B. Probably Approximately Correct
C. Probabilistic Algorithm Complexity
D. Pattern Analysis and Computing

21 In the PAC framework, the parameter $\epsilon$ (epsilon) represents:

A. The probability of failure.
B. The accuracy parameter (maximum allowable error).
C. The number of samples.
D. The complexity of the hypothesis space.

22 In the PAC framework, the parameter $\delta$ (delta) represents:

A. The error rate.
B. The learning rate.
C. The confidence parameter (probability that the error is high).
D. The dimensionality of the data.

23 A concept class $C$ is PAC-learnable if there exists an algorithm $A$ that outputs a hypothesis $h$ such that, with probability at least $1 - \delta$:

A. $R(h) = 0$
B. $R(h) \le \epsilon$
C. $R(h) \le \delta$
D. $R(h) \ge \epsilon$
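
Combining questions 21-23, the usual compact statement (for a sufficiently large sample) is:

    $\Pr\left[R(h) \le \epsilon\right] \ge 1 - \delta$

i.e. the hypothesis is Approximately correct (error at most $\epsilon$) Probably (with confidence at least $1 - \delta$).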

24 In PAC learning, Sample Complexity refers to:

A. The time complexity of the algorithm.
B. The number of training examples required to guarantee a valid hypothesis with high probability.
C. The number of features in the dataset.
D. The complexity of the loss function.

25 For a finite hypothesis space $H$, the number of samples $m$ required for consistent PAC learning is proportional to:

A. $\frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$
B. $|H|$
C. $2^{|H|}$
D. $\frac{1}{|H|}$
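
A quick numeric check of the bound in option A (a sketch; exact constants vary between textbooks):

    import math

    def pac_sample_bound(h_size, eps, delta):
        # m >= (1/eps) * (ln|H| + ln(1/delta)) for a consistent learner.
        return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

    # |H| = 1000, eps = 0.05, delta = 0.01  ->  231 samples suffice.
    print(pac_sample_bound(1000, 0.05, 0.01))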

26 Which of the following characterizes Agnostic PAC Learning?

A. The target function is assumed to be within the hypothesis space $H$.
B. The target function may not belong to $H$, and we seek the hypothesis in $H$ with minimum risk.
C. The error must be exactly zero.
D. There is no noise in the data.

27 Which type of learning is characterized by the absence of labels but the presence of a goal to discover hidden structures?

A. Supervised Learning
B. Unsupervised Learning
C. Reinforcement Learning
D. Semi-Supervised Learning

28 Consider a dataset where inputs are images of animals and labels are 'Cat', 'Dog', or 'Bird'. This is a:

A. Multi-class Classification problem
B. Multi-label Classification problem
C. Regression problem
D. Clustering problem

29 Underfitting is often a result of:

A. The model being too complex.
B. Too much training data.
C. The model being too simple to capture the underlying trend.
D. Training for too many epochs.

30 Which of the following is NOT a component of the Statistical Learning Framework?

A. Input Space
B. Output Space
C. The exact formula of the Target Function
D. Loss Function

31 Which statement best describes the Bias-Variance Tradeoff?

A. Increasing model complexity increases bias and decreases variance.
B. Increasing model complexity decreases bias and increases variance.
C. Ideally, we want high bias and high variance.
D. Bias and Variance are independent of model complexity.

32 In the context of PAC learning, a hypothesis $h$ is consistent with the training data if:

A. It has the lowest generalization error.
B. It correctly classifies all training examples (Empirical error is 0).
C. It is a linear function.
D. It is chosen randomly.

33 Which of the following represents the Zero-One Loss function for binary classification ($y \in \{0, 1\}$)?

A. $L(\hat{y}, y) = (\hat{y} - y)^2$
B. $L(\hat{y}, y) = |\hat{y} - y|$
C. $L(\hat{y}, y) = 1$ if $\hat{y} \ne y$, else $0$
D. $L(\hat{y}, y) = -y \log \hat{y} - (1 - y) \log(1 - \hat{y})$
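
Option C in code, with the observation that averaging it over a sample gives the familiar misclassification rate (a numpy sketch; the arrays are made-up toy data):

    import numpy as np

    def zero_one_loss(y_hat, y):
        # 1 where the prediction is wrong, 0 where it is right.
        return np.where(y_hat != y, 1, 0)

    y_true = np.array([0, 1, 1, 0, 1])
    y_pred = np.array([0, 1, 0, 0, 0])
    losses = zero_one_loss(y_pred, y_true)
    print(losses)         # [0 0 1 0 1]
    print(losses.mean())  # 0.4, the misclassification rate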

34 In Semi-Supervised Learning:

A. All data is labeled.
B. No data is labeled.
C. A small amount of labeled data is used with a large amount of unlabeled data.
D. The agent learns from rewards.

35 Which learning paradigm is most suitable for a robot learning to walk by trial and error?

A. Supervised Learning
B. Reinforcement Learning
C. Unsupervised Learning
D. Transductive Learning

36 What is the primary goal of the Validation Set?

A. To train the model parameters.
B. To tune hyperparameters and evaluate the model during development to prevent overfitting.
C. To report the final accuracy of the model.
D. To increase the size of the training data.
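
A common way to realize the split behind question 36 (a sketch; the 60/20/20 ratios are our choice, not prescribed by the quiz):

    import numpy as np

    rng = np.random.default_rng(0)
    idx = rng.permutation(100)

    # 60% train (fit parameters), 20% validation (tune hyperparameters),
    # 20% test (used once, for the final unbiased estimate).
    train_idx, val_idx, test_idx = idx[:60], idx[60:80], idx[80:]
    print(len(train_idx), len(val_idx), len(test_idx))  # 60 20 20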

37 If a learning algorithm has high Variance, it implies:

A. The algorithm pays very little attention to the training data.
B. The algorithm is very sensitive to specific sets of training data (small changes in data lead to large changes in the model).
C. The algorithm always produces the same model regardless of the data.
D. The algorithm has high systematic error.

38 The assumption that the training data and future test data are drawn from the same distribution is called:

A. The i.i.d. assumption (Independent and Identically Distributed).
B. The linearity assumption.
C. The Markov assumption.
D. The Bayesian assumption.

39 In the definition of PAC learning, the term 'Probably' refers to:

A. The error $\epsilon$.
B. The confidence $1 - \delta$.
C. The hypothesis space size.
D. The loss function.

40 Which of the following is a potential solution to Overfitting?

A. Increasing the number of features.
B. Reducing the size of the training set.
C. Regularization (e.g., adding a penalty for complexity).
D. Making the model more complex.
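
Option C can be made concrete with ridge (L2) regularization, one standard penalty among several; a minimal numpy sketch of the closed form $w = (X^\top X + \lambda I)^{-1} X^\top y$ on synthetic data:

    import numpy as np

    def ridge_fit(X, y, lam):
        # Least squares with an L2 penalty lam * ||w||^2 on the weights.
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 5))
    y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + 0.1 * rng.normal(size=20)
    print(ridge_fit(X, y, lam=0.0))   # ordinary least squares
    print(ridge_fit(X, y, lam=10.0))  # shrunken weights, lower variance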

41 In Linear Regression, the inductive bias typically includes the assumption that:

A. The relationship between input and output is linear.
B. The relationship is a high-degree polynomial.
C. The output is a discrete class label.
D. The data is clustered.

42 In the bias-variance decomposition $\mathbb{E}\left[(y - \hat{f}(x))^2\right] = \mathrm{Bias}[\hat{f}(x)]^2 + \mathrm{Var}[\hat{f}(x)] + \sigma^2$, the term $\sigma^2$ represents:

A. Irreducible error.
B. Bias error.
C. Variance error.
D. Training error.
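
Spelling out the decomposition behind question 42, with $\bar{f}(x) = \mathbb{E}[\hat{f}(x)]$ the prediction averaged over training sets and $\sigma^2$ the noise variance:

    $\mathbb{E}\left[(y - \hat{f}(x))^2\right] = \left(f(x) - \bar{f}(x)\right)^2 + \mathbb{E}\left[(\hat{f}(x) - \bar{f}(x))^2\right] + \sigma^2$

where the three terms are the squared bias, the variance, and the irreducible error.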

43 Which inequality is commonly used in PAC learning derivations to bound the probability of large deviations?

A. Pythagorean Theorem
B. Hoeffding's Inequality
C. Euler's Identity
D. Newton's Second Law
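
For reference, Hoeffding's Inequality for the mean of $m$ i.i.d. random variables bounded in $[0, 1]$, in the form used to bound the gap between empirical and true risk:

    $\Pr\left[\left|\hat{R}_S(h) - R(h)\right| > \epsilon\right] \le 2 e^{-2 m \epsilon^2}$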

44 A hypothesis space $H$ is said to be infinite if:

A. It contains a finite number of hypotheses.
B. It contains continuous parameters (e.g., all possible linear separators in $\mathbb{R}^d$).
C. It is empty.
D. It only contains decision trees of depth 3.

45 The difference between the True Risk and the Empirical Risk is often called:

A. Generalization Gap
B. Training Loss
C. Bayes Error
D. Inductive Bias

46 Which of the following datasets would be most appropriate for a Regression problem?

A. Emails labeled as spam/ham.
B. Photos labeled with names of people.
C. Historical data of house sizes and their selling prices.
D. Handwritten digits 0-9.

47 In an Unsupervised Learning setting, Dimensionality Reduction aims to:

A. Increase the number of features to capture more detail.
B. Reduce the number of random variables under consideration by obtaining a set of principal variables.
C. Cluster data into groups.
D. Label the data automatically.
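
The "principal variables" of question 47 are most commonly obtained with PCA; a minimal numpy sketch via the SVD (the choice of $k = 2$ components is arbitrary):

    import numpy as np

    def pca_project(X, k):
        # Center the features, then project onto the top-k right
        # singular vectors (the principal components).
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Xc @ Vt[:k].T

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 10))
    print(pca_project(X, k=2).shape)  # (50, 2)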

48 Why do we need a Test Set that is completely separate from the Training Set?

A. To make the training faster.
B. To provide an unbiased evaluation of the final model fit.
C. To use for hyperparameter tuning.
D. To calculate the gradient.

49 In the context of Machine Learning scope, Computer Vision typically involves:

A. Analyzing text sentiment.
B. Predicting stock prices.
C. Extracting information from images and videos.
D. Optimizing database queries.

50 The 'Realizability Assumption' in PAC learning states that:

A. The learning algorithm is efficient.
B. There exists a hypothesis $h^* \in H$ such that $R(h^*) = 0$.
C. The data is noiseless.
D. The sample size is infinite.