Unit 1 - Practice Quiz

INT394 50 Questions

1 According to Tom Mitchell's definition of machine learning, a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if:

A. It can execute the tasks in T without any errors.
B. It can generate new tasks based on performance measure P.
C. Its performance at tasks in T, as measured by P, improves with experience E.
D. It minimizes the complexity of the code required for T.

2 Which of the following best distinguishes Machine Learning from traditional programming?

A. Traditional programming deals with numbers, while Machine Learning deals with images.
B. Traditional programming uses data and rules to produce answers; Machine Learning uses data and answers to produce rules.
C. Machine Learning does not require a compiler.
D. Traditional programming is faster than Machine Learning.

3 In the context of Supervised Learning, the dataset consists of:

A. Input vectors and associated target labels.
B. Input vectors only.
C. Unstructured text without annotations.
D. A reward signal only.

4 Which of the following is a classic example of Unsupervised Learning?

A. Customer segmentation (Clustering).
B. House price prediction.
C. Playing Chess.
D. Spam filtering.

5 In Reinforcement Learning, what does the agent maximize to learn the optimal policy?

A. The number of states visited.
B. The cumulative future reward.
C. The accuracy of prediction.
D. The immediate reward.

6 Predicting a continuous output value, such as the temperature tomorrow, is known as:

A. Clustering
B. Classification
C. Dimensionality Reduction
D. Regression

7 Predicting whether an email is 'Spam' or 'Not Spam' is an example of:

A. Regression
B. Binary Classification
C. K-Means Clustering
D. Policy Search

8 Which of the following is a major challenge in Machine Learning where the model learns the training data too well, including the noise, and performs poorly on new data?

A. Regularization
B. Underfitting
C. Convergence
D. Overfitting
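To make the overfitting idea concrete, here is a minimal pure-Python sketch (the data and both models are invented for illustration): a 1-nearest-neighbour "memorizer" fits the training noise perfectly, while a simple least-squares line generalizes better.

```python
import random

random.seed(0)

# Toy 1-D data: y = x + Gaussian noise (invented for illustration).
def sample(n):
    return [(x, x + random.gauss(0, 1.0)) for x in [random.uniform(0, 10) for _ in range(n)]]

train = sample(20)
test = sample(500)

# Complex model: 1-nearest-neighbour, which memorizes every training point.
def nn_predict(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]

# Simple model: fit y = a*x + b by ordinary least squares (closed form).
n = len(train)
sx = sum(x for x, _ in train); sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train); sxy = sum(x * y for x, y in train)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

def mse(predict, data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

train_nn, test_nn = mse(nn_predict, train), mse(nn_predict, test)
train_lin, test_lin = mse(lambda x: a * x + b, train), mse(lambda x: a * x + b, test)
# The memorizer has zero training error but the larger test error.
print(train_nn, test_nn)
print(train_lin, test_lin)
```

The memorizer's training error is exactly zero (each training point is its own nearest neighbour), yet its test error is worse than the line's: it has learned the noise, not the trend.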

9 The Curse of Dimensionality refers to:

A. The computational cost of adding more rows to a dataset.
B. The bias introduced by dimension reduction techniques.
C. The exponential increase in data volume required to generalize accurately as the number of features increases.
D. The difficulty of visualizing data in 2D.

10 In the Statistical Learning Framework, we assume the data is generated by:

A. A random number generator.
B. A deterministic linear function.
C. The learning algorithm itself.
D. An unknown joint probability distribution D over input-output pairs (x, y).

11 The function that measures the penalty for predicting ŷ when the true value is y is called:

A. The Hypothesis
B. The Activation Function
C. The Regularizer
D. The Loss Function

12 What is Generalization Error (or True Risk)?

A. The error on the test set.
B. The expected value of the loss function over the underlying data distribution.
C. The error on the training set.
D. The difference between the predicted value and the average value.

13 Mathematically, the Empirical Risk for a hypothesis h on a dataset of size m is defined as:

A. L_S(h) = Σ_{i=1}^{m} ℓ(h(x_i), y_i)
B. L_S(h) = (1/m) Σ_{i=1}^{m} ℓ(h(x_i), y_i)
C. L_S(h) = max_{1≤i≤m} ℓ(h(x_i), y_i)
D. L_S(h) = E_{(x,y)~D}[ℓ(h(x), y)]
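Since empirical risk is just the average loss over the training sample, it is easy to compute directly. A minimal sketch with a toy threshold hypothesis and a hand-made sample (both invented for illustration), using the zero-one loss:

```python
# Zero-one loss: 1 if the prediction disagrees with the label, else 0.
def zero_one_loss(y_pred, y_true):
    return 0 if y_pred == y_true else 1

# Empirical risk: average loss of hypothesis h over the labeled sample S.
def empirical_risk(h, S):
    return sum(zero_one_loss(h(x), y) for x, y in S) / len(S)

# Toy threshold hypothesis and toy sample (invented for illustration).
h = lambda x: 1 if x > 0 else 0
S = [(-1.0, 0), (2.0, 1), (3.0, 0), (0.5, 1)]
print(empirical_risk(h, S))  # 0.25: one of the four points is misclassified
```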

14 Empirical Risk Minimization (ERM) is the principle of:

A. Choosing the hypothesis with the simplest structure.
B. Minimizing the error on the test set.
C. Minimizing the computational time.
D. Choosing the hypothesis that minimizes the loss on the training set.

15 Why is minimizing Empirical Risk not always sufficient to ensure good learning?

A. It is computationally too expensive.
B. It ignores the training labels.
C. It always results in underfitting.
D. It can lead to overfitting if the hypothesis space is too complex.

16 What is Inductive Bias?

A. The tendency of a model to underfit the data.
B. The bias term b in the linear equation y = wx + b.
C. The error introduced by noise in the data.
D. The set of assumptions a learner makes to predict outputs for unseen inputs.

17 Which of the following describes Occam's Razor in the context of Inductive Bias?

A. Complex models are always better.
B. We should always choose the hypothesis with the highest training error.
C. Data should be sliced into smaller chunks for processing.
D. Among competing hypotheses that fit the data equally well, the simplest one should be selected.

18 Restricting the hypothesis space to include only Linear Classifiers is an example of:

A. Sampling Bias
B. Restriction Bias (Language Bias)
C. Preference Bias
D. Confirmation Bias

19 In the context of the No Free Lunch Theorem, which statement is true?

A. Deep Learning is universally better than Decision Trees.
B. Averaged over all possible data generating distributions, every classification algorithm has the same error rate.
C. A single algorithm exists that is superior for all possible problems.
D. More data always guarantees a better model.

20 What does PAC stand for in Learning Theory?

A. Perfectly Accurate Classification
B. Probably Approximately Correct
C. Probabilistic Algorithm Complexity
D. Pattern Analysis and Computing

21 In the PAC framework, the parameter ε (epsilon) represents:

A. The complexity of the hypothesis space.
B. The accuracy parameter (maximum allowable error).
C. The number of samples.
D. The probability of failure.

22 In the PAC framework, the parameter δ (delta) represents:

A. The error rate.
B. The dimensionality of the data.
C. The confidence parameter (probability that the error is high).
D. The learning rate.

23 A concept class C is PAC-learnable if there exists an algorithm that outputs a hypothesis h such that, with probability at least 1 − δ:

A. R(h) = 0
B. R(h) ≤ ε
C. R(h) ≥ 1 − ε
D. R(h) ≤ δ

24 In PAC learning, Sample Complexity refers to:

A. The complexity of the loss function.
B. The time complexity of the algorithm.
C. The number of training examples required to guarantee a valid hypothesis with high probability.
D. The number of features in the dataset.

25 For a finite hypothesis space H, the number of samples m required for consistent PAC learning is proportional to:

A. (1/ε)(ln |H| + ln(1/δ))
B. |H| · ln(1/ε)
C. ε / ln |H|
D. |H|² / δ
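The finite-class bound m ≥ (1/ε)(ln |H| + ln(1/δ)) can be evaluated directly. A short sketch; the numbers plugged in are purely illustrative:

```python
import math

def pac_sample_bound(H_size, eps, delta):
    """Samples sufficient for a consistent learner over a finite class H to be
    (eps, delta)-PAC: m >= (1/eps) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(H_size) + math.log(1.0 / delta)) / eps)

# Example: 1000 hypotheses, 10% error tolerance, 95% confidence.
print(pac_sample_bound(1000, 0.1, 0.05))  # 100
```

Note the pleasant consequence: the bound grows only logarithmically in |H|, so even very large (finite) hypothesis spaces need modest sample sizes.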

26 Which of the following implies Agnostic PAC Learning?

A. The error must be exactly zero.
B. The target function is assumed to be within the hypothesis space H.
C. The target function may not belong to H, and we seek the hypothesis in H with minimum risk.
D. There is no noise in the data.

27 Which type of learning is characterized by the absence of labels but the presence of a goal to discover hidden structures?

A. Supervised Learning
B. Reinforcement Learning
C. Semi-supervised Learning
D. Unsupervised Learning

28 Consider a dataset where inputs are images of animals and labels are 'Cat', 'Dog', or 'Bird'. This is a:

A. Clustering problem
B. Multi-label Classification problem
C. Regression problem
D. Multi-class Classification problem

29 Underfitting is often a result of:

A. Too much training data.
B. The model being too complex.
C. Training for too many epochs.
D. The model being too simple to capture the underlying trend.

30 Which of the following is NOT a component of the Statistical Learning Framework?

A. Input Space
B. Loss Function
C. Output Space
D. The exact formula of the Target Function

31 Which statement best describes the Bias-Variance Tradeoff?

A. Bias and Variance are independent of model complexity.
B. Ideally, we want high bias and high variance.
C. Increasing model complexity decreases bias and increases variance.
D. Increasing model complexity increases bias and decreases variance.
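The tradeoff in this question rests on the decomposition MSE = Bias² + Variance (for a noiseless target), which holds as an algebraic identity and can be checked numerically. The shrinkage estimator below is invented purely for illustration:

```python
import random

random.seed(1)
theta = 2.0  # true quantity being estimated

# A deliberately biased estimator: shrink the sample mean toward 0.
def estimate(n=10):
    xs = [random.gauss(theta, 1.0) for _ in range(n)]
    return 0.5 * sum(xs) / n

estimates = [estimate() for _ in range(5000)]
m = sum(estimates) / len(estimates)
bias_sq = (m - theta) ** 2                                  # (mean - truth)^2
variance = sum((e - m) ** 2 for e in estimates) / len(estimates)
mse = sum((e - theta) ** 2 for e in estimates) / len(estimates)

# MSE splits exactly into squared bias plus variance.
print(mse, bias_sq + variance)
```

Here the shrinkage buys low variance at the cost of bias; an unshrunk mean would show the opposite profile, which is exactly the tradeoff the question describes.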

32 In the context of PAC learning, a hypothesis is consistent with the training data if:

A. It correctly classifies all training examples (Empirical error is 0).
B. It is a linear function.
C. It has the lowest generalization error.
D. It is chosen randomly.

33 Which of the following represents the Zero-One Loss function for binary classification (y ∈ {0, 1})?

A. ℓ(h(x), y) = h(x) − y
B. ℓ(h(x), y) = 0 if h(x) = y, else 1
C. ℓ(h(x), y) = max(0, 1 − y · h(x))
D. ℓ(h(x), y) = −[y log h(x) + (1 − y) log(1 − h(x))]

34 In Semi-Supervised Learning:

A. All data is labeled.
B. The agent learns from rewards.
C. No data is labeled.
D. A small amount of labeled data is used with a large amount of unlabeled data.

35 Which learning paradigm is most suitable for a robot learning to walk by trial and error?

A. Supervised Learning
B. Unsupervised Learning
C. Transductive Learning
D. Reinforcement Learning

36 What is the primary goal of the Validation Set?

A. To tune hyperparameters and evaluate the model during development to prevent overfitting.
B. To train the model parameters.
C. To report the final accuracy of the model.
D. To increase the size of the training data.

37 If a learning algorithm has high Variance, it implies:

A. The algorithm is very sensitive to specific sets of training data (small changes in data lead to large changes in the model).
B. The algorithm always produces the same model regardless of the data.
C. The algorithm pays very little attention to the training data.
D. The algorithm has high systematic error.

38 The assumption that the training data and future test data are drawn from the same distribution is called:

A. The Bayesian assumption.
B. The i.i.d. assumption (Independent and Identically Distributed).
C. The Markov assumption.
D. The linearity assumption.

39 In the definition of PAC learning, the term 'Probably' refers to:

A. The confidence 1 − δ.
B. The error bound ε.
C. The hypothesis space size.
D. The loss function.

40 Which of the following is a potential solution to Overfitting?

A. Regularization (e.g., adding a penalty for complexity).
B. Reducing the size of the training set.
C. Making the model more complex.
D. Increasing the number of features.

41 In Linear Regression, the inductive bias typically includes the assumption that:

A. The relationship between input and output is linear.
B. The data is clustered.
C. The relationship is a high-degree polynomial.
D. The output is a discrete class label.

42 In the bias-variance decomposition Error = Bias² + Variance + σ², the term σ² represents:

A. Variance error.
B. Irreducible error.
C. Bias error.
D. Training error.

43 Which inequality is commonly used in PAC learning derivations to bound the probability of large deviations?

A. Euler's Identity
B. Hoeffding's Inequality
C. Newton's Second Law
D. Pythagorean Theorem
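Hoeffding's inequality bounds P(|p̂ − p| ≥ ε) ≤ 2·exp(−2mε²) for the mean p̂ of m bounded i.i.d. variables. A quick Monte Carlo sanity check with fair coin flips (all parameters chosen for illustration):

```python
import math
import random

random.seed(0)
m, eps, p = 100, 0.1, 0.5   # flips per experiment, deviation, coin bias
trials = 2000               # number of repeated experiments

# Fraction of experiments where the sample mean deviates from p by >= eps.
def deviates():
    mean = sum(random.random() < p for _ in range(m)) / m
    return abs(mean - p) >= eps

empirical = sum(deviates() for _ in range(trials)) / trials
bound = 2 * math.exp(-2 * m * eps ** 2)  # Hoeffding bound, about 0.271
print(empirical, bound)
```

The observed deviation frequency sits well below the bound, as expected: Hoeffding is a worst-case guarantee over all bounded distributions, so for a specific distribution it is typically loose.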

44 A hypothesis space is said to be infinite if:

A. It contains continuous parameters (e.g., all possible linear separators in the plane).
B. It contains a finite number of hypotheses.
C. It is empty.
D. It only contains decision trees of depth 3.

45 The difference between the True Risk and the Empirical Risk is often called:

A. Generalization Gap
B. Training Loss
C. Inductive Bias
D. Bayes Error

46 Which of the following datasets would be most appropriate for a Regression problem?

A. Emails labeled as spam/ham.
B. Handwritten digits 0-9.
C. Historical data of house sizes and their selling prices.
D. Photos labeled with names of people.

47 In an Unsupervised Learning setting, Dimensionality Reduction aims to:

A. Cluster data into groups.
B. Increase the number of features to capture more detail.
C. Label the data automatically.
D. Reduce the number of random variables under consideration by obtaining a set of principal variables.

48 Why do we need a Test Set that is completely separate from the Training Set?

A. To calculate the gradient.
B. To make the training faster.
C. To provide an unbiased evaluation of the final model fit.
D. To use for hyperparameter tuning.

49 In the context of Machine Learning scope, Computer Vision typically involves:

A. Optimizing database queries.
B. Analyzing text sentiment.
C. Predicting stock prices.
D. Extracting information from images and videos.

50 The 'Realizability Assumption' in PAC learning states that:

A. The sample size is infinite.
B. The data is noiseless.
C. There exists a hypothesis h* in the hypothesis space H such that its true risk is zero, i.e., R(h*) = 0.
D. The learning algorithm is efficient.