Unit 5 - Practice Quiz

CSE275 60 Questions

1 What is the primary goal of hyperparameter optimization in machine learning?

Hyperparameter optimization techniques Easy
A. To find the best set of hyperparameters that maximizes model performance.
B. To learn the model parameters like weights and biases from data.
C. To automatically select the features for the model.
D. To speed up the data collection process.

2 Which of the following is an example of a model hyperparameter?

Hyperparameter optimization techniques Easy
A. The weights learned by a linear regression model.
B. The final prediction of a model for a new data point.
C. Learning rate in a neural network.
D. The number of samples in the training dataset.

3 How does the Grid Search algorithm explore the hyperparameter space?

Grid search vs random search limitations Easy
A. It uses a probabilistic model to predict which hyperparameters will perform best.
B. It samples a fixed number of combinations randomly from the hyperparameter space.
C. It exhaustively checks every possible combination of a predefined set of hyperparameter values.
D. It evolves a population of hyperparameter sets using genetic operators.
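For intuition, here is a minimal sketch of the exhaustive enumeration described by the correct answer. The parameter names and the toy scoring function are illustrative only, not part of the quiz:

```python
import itertools

# Hypothetical search space (names are illustrative).
param_grid = {
    "learning_rate": [0.01, 0.1],
    "max_depth": [3, 5, 7],
}

def grid_search(param_grid, score_fn):
    """Exhaustively score every combination in the grid and return the best."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective standing in for a cross-validation score.
def toy_score(params):
    return -abs(params["learning_rate"] - 0.1) - abs(params["max_depth"] - 5)

best, score = grid_search(param_grid, toy_score)
```

Note that every one of the 2 × 3 = 6 combinations is evaluated, which is exactly why the method scales poorly as hyperparameters are added.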

4 What is the main limitation of Grid Search?

Grid search vs random search limitations Easy
A. It becomes extremely slow and computationally expensive as the number of hyperparameters increases.
B. It always finds a suboptimal solution compared to Random Search.
C. It is unable to handle categorical hyperparameters.
D. It can only be used for simple models like linear regression.

5 What is the primary advantage of Random Search over Grid Search?

Grid search vs random search limitations Easy
A. It explores every single point in the search space.
B. It is often more efficient because it doesn't waste time on unimportant hyperparameters.
C. It is more systematic and easier to reproduce.
D. It guarantees finding the global optimal hyperparameter settings.
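The efficiency argument in option B can be sketched in a few lines. The search space, samplers, and toy objective below are made up for illustration; the point is that the influential hyperparameter gets a fresh value on every trial:

```python
import random

random.seed(0)

def random_search(space, score_fn, n_trials):
    """Sample n_trials independent configurations and keep the best one."""
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: sampler() for name, sampler in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical space: unlike a 5x5 grid, every trial tries a new learning rate.
space = {
    "learning_rate": lambda: 10 ** random.uniform(-4, 0),  # influential
    "dropout": lambda: random.uniform(0.0, 0.5),           # less influential
}

def toy_score(p):
    return -abs(p["learning_rate"] - 0.01)  # only learning_rate matters here

best, score = random_search(space, toy_score, n_trials=25)
```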

6 If you have a limited time budget for hyperparameter tuning, why might Random Search be a better choice than Grid Search?

Grid search vs random search limitations Easy
A. Random Search is more likely to find a better solution within a smaller number of trials.
B. Random Search requires less memory to run.
C. Grid Search cannot be stopped early and must run to completion.
D. Grid Search is not compatible with modern machine learning libraries.

7 Evolutionary algorithms for hyperparameter tuning are inspired by which real-world process?

Evolutionary hyperparameter tuning Easy
A. The movement of particles in physics.
B. Biological evolution and natural selection.
C. The way humans make decisions based on past experience.
D. The physical process of annealing metals.

8 In an evolutionary algorithm, what does the 'fitness function' typically measure?

Evolutionary hyperparameter tuning Easy
A. The performance of a model trained with a specific set of hyperparameters.
B. The complexity of the model.
C. The speed at which the model trains.
D. The number of hyperparameters being tuned.

9 What do the 'crossover' and 'mutation' operators do in an evolutionary algorithm?

Evolutionary hyperparameter tuning Easy
A. They select the best hyperparameter sets to survive to the next generation.
B. They define the initial population of hyperparameter sets.
C. They evaluate the performance of the current hyperparameter sets.
D. They create new hyperparameter sets from existing ones.
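A minimal sketch of the two operators named in the correct answer, acting on hyperparameter dictionaries (the gene names and perturbation scheme are illustrative assumptions):

```python
import random

random.seed(1)

def crossover(parent_a, parent_b):
    """Build a child by inheriting each hyperparameter from a random parent."""
    return {k: random.choice([parent_a[k], parent_b[k]]) for k in parent_a}

def mutate(config, sigma=0.1):
    """Randomly perturb the continuous gene to keep diversity in the population."""
    child = dict(config)
    child["learning_rate"] = max(1e-5, child["learning_rate"] + random.gauss(0, sigma))
    return child

a = {"learning_rate": 0.10, "num_layers": 2}
b = {"learning_rate": 0.01, "num_layers": 4}
child = mutate(crossover(a, b))  # a new configuration built from existing ones
```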

10 What is the core idea of Bayesian Optimization?

Bayesian optimization (conceptual) Easy
A. It uses principles of biological evolution to find the best hyperparameters.
B. It uses information from past evaluations to decide which hyperparameters to try next.
C. It divides the hyperparameter space into an exhaustive grid.
D. It randomly selects hyperparameters from a uniform distribution.

11 In Bayesian Optimization, the function that guides the search for the next point to evaluate is called the:

Bayesian optimization (conceptual) Easy
A. Loss function.
B. Objective function.
C. Acquisition function.
D. Fitness function.
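To make the acquisition-function idea concrete, here is a toy sketch of one common choice, the Upper Confidence Bound (UCB). The surrogate's predicted means and uncertainties below are invented for illustration:

```python
def ucb(mean, std, kappa=2.0):
    """Upper Confidence Bound: predicted mean plus kappa times uncertainty."""
    return mean + kappa * std

# Hypothetical surrogate predictions at three candidate settings:
# (predicted validation score, uncertainty).
candidates = {
    "lr=0.1":  (0.85, 0.01),  # well-explored region, good mean
    "lr=0.3":  (0.80, 0.10),  # uncertain, under-explored region
    "lr=0.01": (0.70, 0.02),
}

# The acquisition function decides which point to evaluate next; here the
# uncertain candidate wins despite a slightly lower predicted mean.
next_point = max(candidates, key=lambda k: ucb(*candidates[k]))
```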

12 Compared to Random Search, a major benefit of Bayesian Optimization is that it typically:

Bayesian optimization (conceptual) Easy
A. Is much simpler to set up and run.
B. Requires no initial data to start.
C. Finds a good solution with fewer model training iterations.
D. Runs faster for each individual iteration.

13 Which of the following is a hyperparameter that would be optimized for a Random Forest ensemble?

Optimization for ensemble learning Easy
A. The number of trees in the forest.
B. The size of the training data.
C. The predictions made by the ensemble.
D. The weights of the features in a single decision tree.

14 In boosting methods like AdaBoost or Gradient Boosting, how are the models in the ensemble optimized?

Optimization for ensemble learning Easy
A. They are trained sequentially, with each new model focusing on the mistakes of the previous ones.
B. They are trained independently and their results are averaged.
C. A single best model is selected from a large pool of trained models.
D. They are all trained at the same time on different subsets of data.

15 In the 'stacking' ensemble method, what is the role of the 'meta-learner'?

Optimization for ensemble learning Easy
A. To learn the best way to combine the predictions from the base models.
B. To preprocess the input data for all base models.
C. To generate diverse training data for the base models.
D. To select the best single model from the ensemble.

16 What is the main purpose of Automated Machine Learning (AutoML)?

Introduction to automated machine learning (AutoML) Easy
A. To replace the need for data.
B. To design new types of neural network architectures.
C. To automate the entire machine learning pipeline from data prep to model deployment.
D. To create better data visualization tools.

17 Which of these tasks is a core component of most AutoML systems?

Introduction to automated machine learning (AutoML) Easy
A. Algorithm selection and hyperparameter tuning.
B. Ethical review of the model's impact.
C. Defining the business problem to be solved.
D. Communicating model results to stakeholders.

18 A primary benefit of using AutoML for a data scientist is that it:

Introduction to automated machine learning (AutoML) Easy
A. Requires no computational resources.
B. Eliminates the need for any human oversight.
C. Always produces a perfect, error-free model.
D. Saves time by automating repetitive and experimental tasks.

19 Why are hyperparameters not learned during the model training process like regular parameters?

Hyperparameter optimization techniques Easy
A. Because they are not numerical values.
B. Because they define the structure of the model or the learning process itself.
C. Because there are too many of them to learn.
D. Because it is computationally impossible to learn them.

20 In evolutionary hyperparameter tuning, what does 'selection' refer to?

Evolutionary hyperparameter tuning Easy
A. Choosing which hyperparameters to tune.
B. Choosing the best-performing hyperparameter sets to create the next generation.
C. Choosing the machine learning model to use.
D. Choosing the dataset for training.

21 A machine learning model has 5 hyperparameters. You decide to test 4 values for each hyperparameter. If you use Grid Search, how many model evaluations will be performed, and what is the primary limitation this illustrates?

Grid search vs random search limitations Medium
A. 625 evaluations ($5^4$); It illustrates the risk of overfitting the validation set.
B. 20 evaluations; It illustrates inefficiency in low-dimensional spaces.
C. 1024 evaluations ($4^5$); It illustrates the "curse of dimensionality".
D. 1024 evaluations ($4^5$); It illustrates the difficulty with non-continuous parameters.
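The arithmetic behind this question is worth checking directly: the grid size is the number of values per hyperparameter raised to the number of hyperparameters, and each added hyperparameter multiplies the cost again.

```python
num_hyperparameters = 5
values_each = 4

# Grid Search evaluates the full Cartesian product of candidate values.
grid_trials = values_each ** num_hyperparameters            # 4^5

# Adding one more hyperparameter multiplies the cost by values_each.
grid_trials_6 = values_each ** (num_hyperparameters + 1)    # 4^6
```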

22 In Bayesian Optimization, what is the primary role of the 'acquisition function' (e.g., Expected Improvement)?

Bayesian optimization (conceptual) Medium
A. To guide the search by balancing exploration (trying new areas) and exploitation (refining known good areas).
B. To calculate the cross-validation score after each trial.
C. To define the final prediction score of the machine learning model.
D. To build a probabilistic surrogate model of the objective function.

23 In an evolutionary algorithm for hyperparameter tuning, the 'crossover' operation is analogous to which of the following actions?

Evolutionary hyperparameter tuning Medium
A. Evaluating the performance (fitness) of a hyperparameter configuration on a validation set.
B. Combining parts of two well-performing hyperparameter configurations to create a new one.
C. Selecting the best performing hyperparameter configurations for the next generation.
D. Randomly changing a single hyperparameter value in a configuration.

24 Imagine you are tuning two hyperparameters for a model: learning rate (very influential) and dropout rate (less influential). With a fixed budget of 25 trials, why is Random Search often more effective than a Grid Search?

Grid search vs random search limitations Medium
A. Random Search is guaranteed to find the global optimum.
B. Random Search explores 25 unique values for each hyperparameter, while Grid Search only explores 5.
C. Grid Search wastes evaluations by testing the same learning rates with different, less important dropout rates.
D. Grid Search can only handle continuous hyperparameters.

25 When creating a weighted average ensemble of several models, the optimal weights are often found by solving a constrained optimization problem. What is the typical objective of this optimization?

Optimization for ensemble learning Medium
A. To minimize the error (e.g., MSE or log-loss) of the weighted average prediction on a validation set.
B. To maximize the variance of the predictions from the ensemble.
C. To ensure all weights are equal, promoting model fairness.
D. To maximize the training time to ensure model convergence.
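A small sketch of the optimization in the correct answer, for two models: search over the blend weight $w$ that minimizes validation MSE of $w \cdot \text{model}_A + (1-w) \cdot \text{model}_B$. The validation targets and predictions are hypothetical:

```python
# Hypothetical validation targets and per-model predictions.
y_val  = [1.0, 2.0, 3.0, 4.0]
pred_a = [1.2, 2.1, 2.8, 4.3]
pred_b = [0.7, 1.6, 3.4, 3.8]

def blend_mse(w):
    """Validation MSE of the weighted-average prediction for weight w."""
    blended = [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
    return sum((p - t) ** 2 for p, t in zip(blended, y_val)) / len(y_val)

# Coarse 1-D grid over the simplex (w, 1 - w); the objective is convex in w.
best_w = min((i / 100 for i in range(101)), key=blend_mse)
```

The optimal blend beats either model alone whenever their errors are not perfectly correlated.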

26 Which of the following tasks is a core component of most AutoML systems, aiming to reduce manual effort in the ML pipeline?

Introduction to automated machine learning (AutoML) Medium
A. Collecting and generating new raw data.
B. Automated feature engineering and model selection.
C. Defining the business problem and success metrics.
D. Final model deployment and ethical review.

27 You are tuning a deep learning model where each evaluation takes 6 hours. You have a budget for approximately 30-40 evaluations. Which hyperparameter optimization technique is most appropriate for this scenario?

Hyperparameter optimization techniques Medium
A. Random Search, because it is simple to implement and parallelize.
B. Grid Search, because it is exhaustive and guarantees finding the best combination.
C. Manual Search, because the long training time allows for human intuition to guide the process.
D. Bayesian Optimization, because it builds a model of the search space to make intelligent choices for the next evaluation.

28 What is the primary role of the 'mutation' operation in evolutionary algorithms for hyperparameter tuning?

Evolutionary hyperparameter tuning Medium
A. To ensure the population converges to a single best solution quickly.
B. To combine two good solutions into a new one.
C. To evaluate the performance of each individual in the population.
D. To maintain diversity in the population and prevent premature convergence to a local optimum.

29 How does the surrogate model (e.g., a Gaussian Process) in Bayesian Optimization contribute to its efficiency compared to Random Search?

Bayesian optimization (conceptual) Medium
A. It replaces the actual model training, making evaluations instantaneous.
B. It approximates the objective function and quantifies uncertainty, allowing for more informed decisions on which hyperparameters to try next.
C. It guarantees that each subsequent evaluation will yield a better result.
D. It provides a deterministic map of the entire search space after one evaluation.

30 In the context of optimizing an ensemble, why is it often beneficial to combine models that are diverse (i.e., they make different errors)?

Optimization for ensemble learning Medium
A. Diverse models are computationally cheaper to train.
B. When one model makes an error, other, different models may correct it, leading to a lower overall ensemble error.
C. Diversity is only important for classification tasks, not regression.
D. Diversity ensures that the ensemble's bias is always lower than any individual model's bias.

31 When is Grid Search a more suitable choice than Random Search?

Grid search vs random search limitations Medium
A. When the computational budget is extremely limited.
B. When dealing with a low-dimensional search space (e.g., 2-3 hyperparameters) and you suspect the optimal values lie on a grid.
C. When the number of hyperparameters is very large (e.g., >10).
D. When the objective function is non-deterministic and noisy.

32 A key trade-off when using an AutoML framework compared to manual modeling is often described as:

Introduction to automated machine learning (AutoML) Medium
A. Data Size vs. Model Complexity: AutoML can only handle small datasets.
B. Speed vs. Accuracy: AutoML is faster but always less accurate than a manually tuned model.
C. Computation Cost vs. Human Effort: AutoML reduces manual work but may require significant computational resources.
D. Performance vs. Interpretability: AutoML models are always black boxes.

33 In evolutionary hyperparameter tuning, what does the 'fitness function' typically represent?

Evolutionary hyperparameter tuning Medium
A. The number of trainable parameters in the model.
B. A performance metric, such as validation accuracy or MSE, for a given set of hyperparameters.
C. The diversity of the current population of hyperparameter sets.
D. The computational time required to train the model.

34 Which of the following hyperparameter types would be most challenging for standard Grid Search to handle effectively?

Hyperparameter optimization techniques Medium
A. An integer hyperparameter with a small range (e.g., number of trees from 10 to 50).
B. A continuous hyperparameter that needs fine-tuning (e.g., learning rate).
C. A categorical hyperparameter with 3 choices (e.g., activation function).
D. A boolean hyperparameter (e.g., use_batchnorm True/False).

35 In boosting algorithms like Gradient Boosting, the optimization process involves sequentially adding new models. How is each new model trained?

Optimization for ensemble learning Medium
A. To predict the target variable directly, same as the first model.
B. On a completely different set of features to promote diversity.
C. On a random bootstrap sample of the original data.
D. To correct the errors (i.e., predict the residuals) of the existing ensemble.
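The residual-fitting idea in the correct answer can be sketched with the simplest possible weak learner, a constant predictor. The targets and learning rate are toy values; real gradient boosting would fit trees to the residuals instead:

```python
# Each boosting stage fits the mean of the current residuals, scaled by a
# learning rate (shrinkage), and adds it to the ensemble's prediction.
y = [3.0, 5.0, 7.0, 9.0]          # toy regression targets
prediction = [0.0] * len(y)       # ensemble starts at zero
learning_rate = 0.5

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

errors = []
for stage in range(10):
    residuals = [t - p for t, p in zip(y, prediction)]
    weak_output = sum(residuals) / len(residuals)   # constant "weak learner"
    prediction = [p + learning_rate * weak_output for p in prediction]
    errors.append(mse(prediction, y))
# Training error shrinks at every stage as residuals are corrected.
```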

36 What is a potential disadvantage of Bayesian Optimization?

Bayesian optimization (conceptual) Medium
A. It cannot handle categorical or conditional hyperparameters.
B. The computational overhead of fitting and optimizing the acquisition function can become significant.
C. It is inherently a sequential process and cannot be parallelized.
D. It is less sample-efficient than Random Search for expensive objective functions.

37 Consider a search space with both continuous (e.g., learning rate) and categorical (e.g., optimizer type) hyperparameters. Which optimization method naturally handles this mixed-type space without requiring significant adaptation?

Hyperparameter optimization techniques Medium
A. Bayesian Optimization with appropriate kernels (e.g., tree-based surrogate models).
B. Newton's method.
C. Standard Grid Search.
D. Gradient-based optimization.

38 You are using an evolutionary algorithm to tune a neural network. The 'population' consists of 50 different network configurations. After evaluating all 50, the 'selection' phase begins. What is the most likely goal of this phase?

Evolutionary hyperparameter tuning Medium
A. To randomly mutate every configuration to create 50 new ones.
B. To average the hyperparameters of all 50 configurations.
C. To choose a subset of high-performing configurations ('parents') to produce the next generation.
D. To choose a single best configuration and discard all others.

39 The search for the best ML pipeline (including preprocessing, model, and hyperparameters) can be framed as a Combined Algorithm Selection and Hyperparameter optimization (CASH) problem. Which technique is conceptually best suited to solve the CASH problem?

Introduction to automated machine learning (AutoML) Medium
A. A single, large neural network that learns the entire pipeline.
B. Linear Regression to predict the best hyperparameters.
C. Grid Search, by creating a massive grid of all possible pipelines.
D. Bayesian Optimization, by modeling the performance of different pipeline configurations.

40 In a stacking ensemble, a 'meta-learner' is trained. What is the primary optimization goal when training this meta-learner?

Optimization for ensemble learning Medium
A. To select the most diverse subset of features for training.
B. To combine the predictions of the base models in a way that minimizes the final ensemble's error.
C. To train faster than any of the individual base models.
D. To find the optimal hyperparameters for the base models.

41 Consider a 10-dimensional hyperparameter space where only 2 dimensions significantly impact model performance. If both Grid Search and Random Search are given an identical, limited evaluation budget (e.g., 243 trials), why is Random Search statistically far more likely to find a near-optimal configuration? A Grid Search with this budget could only test 3 points per dimension ($3^5 = 243$, for 5 dimensions), leaving 5 dimensions completely unexplored.

Grid search vs random search limitations Hard
A. Random Search uses a surrogate model to predict the most promising areas, unlike Grid Search.
B. Grid Search is guaranteed to find the global optimum if the grid is fine enough, making it better with any budget.
C. Random Search is not constrained by a fixed grid, so every trial evaluates a unique combination across all 10 dimensions, maximizing the chance of sampling effective values in the two important dimensions.
D. The total number of evaluations in Random Search is independent of the dimensionality of the search space.

42 A data scientist is using Bayesian Optimization with a Gaussian Process (GP) surrogate model. The true objective function (model validation loss vs. hyperparameters) is discovered to be non-stationary and has multiple sharp, discontinuous regions. What is the most likely consequence of this characteristic on the Bayesian Optimization process?

Bayesian optimization (conceptual) Hard
A. The acquisition function, such as Expected Improvement (EI), will automatically adapt and put more weight on exploration, effectively handling the discontinuities.
B. The Gaussian Process will require a different kernel, like a linear kernel, to model the discontinuities accurately.
C. Bayesian Optimization will perform better than Random Search because its probabilistic model can explicitly represent discontinuities.
D. The GP's smoothness assumption will be violated, leading to inaccurate uncertainty estimates and potentially causing the acquisition function to guide the search towards suboptimal regions.

43 In an evolutionary algorithm for hyperparameter tuning, the population consistently converges to a suboptimal local minimum after just a few generations. Which combination of operator adjustments is most likely to mitigate this premature convergence and promote a more global search?

Evolutionary hyperparameter tuning Hard
A. Increase the mutation rate and decrease the selection pressure (e.g., use tournament selection with a smaller tournament size).
B. Decrease the mutation rate and increase the selection pressure (e.g., use elitism to preserve the best individuals).
C. Increase the crossover rate while completely eliminating mutation.
D. Implement fitness sharing to encourage niching but keep selection pressure high.

44 When constructing a weighted average ensemble of three models (A, B, C) with similar individual accuracies, it's found that models A and B have highly correlated errors, while model C's errors are largely uncorrelated with A and B. During optimization to minimize the ensemble's mean squared error, what will be the likely distribution of the optimal weights ($w_A, w_B, w_C$)?

Optimization for ensemble learning Hard
A. The weight for model C ($w_C$) will be significantly larger than for A and B ($w_A$, $w_B$), which will be down-weighted due to their redundancy.
B. All weights ($w_A, w_B, w_C$) will be approximately equal, as their individual accuracies are similar.
C. The weights for A and B ($w_A$, $w_B$) will be high, and the weight for C ($w_C$) will be near zero, as A and B reinforce each other.
D. The optimization will be unstable and fail to converge due to the high correlation between models A and B.

45 The Combined Algorithm Selection and Hyperparameter optimization (CASH) problem is a core challenge in AutoML. Why is solving the CASH problem fundamentally more complex than performing hyperparameter optimization for a single, pre-determined algorithm?

Introduction to automated machine learning (AutoML) Hard
A. The CASH problem involves a larger number of hyperparameters, but the optimization landscape remains similarly structured.
B. The search space is heterogeneous and conditional: hyperparameters for one algorithm are irrelevant for another, creating a complex, structured search space that simple optimization methods cannot handle.
C. Algorithm selection is a discrete optimization problem, while hyperparameter tuning is continuous, and combining them is mathematically impossible without heuristics.
D. The objective function in CASH is multi-modal, whereas for single-algorithm HPO it is always convex.

46 In Bayesian Optimization, how does the Upper Confidence Bound (UCB) acquisition function balance the exploration-exploitation trade-off, and how does its behavior contrast with Expected Improvement (EI) when uncertainty is very high?

Bayesian optimization (conceptual) Hard
A. EI balances exploration and exploitation using a trade-off parameter, while UCB is a purely exploitative strategy.
B. UCB and EI are mathematically equivalent; they only differ in their implementation details and computational cost.
C. UCB explicitly balances the predicted mean (exploitation) and the uncertainty/standard deviation (exploration) via a tunable parameter. In high-uncertainty regions, UCB becomes more explorative, whereas EI might still favor points with a slightly better-predicted mean.
D. UCB primarily focuses on exploitation by sampling at the highest predicted mean, while EI focuses on exploration.

47 Consider a search space where two hyperparameters, A (learning rate) and B (dropout rate), exhibit a strong, non-linear interaction effect on model performance. Which statement most accurately describes the limitations of Grid and Random Search in optimizing such a space?

Grid search vs random search limitations Hard
A. Grid Search may completely miss the optimal interaction region if it doesn't align with its grid axes, while Random Search has a higher probability of sampling points within that region due to its uniform coverage.
B. Random Search is ineffective here because it does not model relationships between hyperparameters.
C. Grid Search is superior because its systematic approach guarantees it will test the interaction points, while Random Search might miss them by chance.
D. Both methods are equally effective, as they will eventually sample the optimal point if the number of trials is high enough.

48 Multi-fidelity optimization techniques like Hyperband accelerate hyperparameter search by evaluating many configurations on a small budget (e.g., few epochs) and promoting promising ones to higher budgets. What is the central assumption these methods rely on, the violation of which would lead to poor performance?

Hyperparameter optimization techniques Hard
A. The assumption that all hyperparameters are independent and do not have interaction effects.
B. The assumption that the loss function is convex with respect to the hyperparameters.
C. The 'ranking correlation' assumption: the relative performance of hyperparameter configurations on a small budget is a good predictor of their relative performance on a large budget.
D. The assumption that the optimal configuration can be found by evaluating at least 50% of the configurations on the full budget.
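A minimal sketch of successive halving, the routine at the core of Hyperband. The evaluation function is a stand-in for "train for `budget` epochs" and is constructed so that low-budget rankings do predict high-budget rankings, i.e., the assumption in the correct answer holds:

```python
def toy_eval(config, budget):
    # Stand-in for a short training run: validation loss shrinks toward a
    # config-specific floor as the budget grows.
    return config + 1.0 / budget

# Hypothetical "quality floors" for eight candidate configurations.
configs = [0.9, 0.5, 0.3, 0.2, 0.8, 0.6, 0.4, 0.7]
budget = 1
while len(configs) > 1:
    ranked = sorted(configs, key=lambda c: toy_eval(c, budget))
    configs = ranked[: len(configs) // 2]   # promote the better half...
    budget *= 2                             # ...to a doubled budget
best_config = configs[0]
```

If low-budget rankings were misleading (e.g., slow starters that end up best), the good configurations would be eliminated in the first round.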

49 Population-Based Training (PBT) is a hybrid HPO method. How does PBT fundamentally differ from a standard parallelized genetic algorithm (GA) in its approach to exploration and exploitation?

Evolutionary hyperparameter tuning Hard
A. PBT uses gradient-based methods to update hyperparameters, while GAs use mutation and crossover.
B. In PBT, members of the population are trained continuously and exploit information from the rest of the population mid-training to update their hyperparameters, whereas in a standard GA, evaluation is a fixed, terminal process for each generation.
C. PBT maintains a static population throughout the process, while GAs create entirely new generations based on fitness.
D. GAs evolve both hyperparameters and model weights simultaneously, while PBT only evolves hyperparameters.

50 In a stacking ensemble, a meta-learner is trained on the out-of-fold (OOF) predictions from base learners to make the final prediction. What is the primary optimization-related consequence of training the meta-learner on in-sample predictions (i.e., predictions on the same data the base learners were trained on) instead of OOF predictions?

Optimization for ensemble learning Hard
A. The meta-learner's objective function would become non-convex, making it impossible to find an optimal solution.
B. The meta-learner would severely overfit because the base learners' predictions on in-sample data are unrealistically accurate, leading it to trust them too much.
C. It would lead to underfitting, as the meta-learner would not have enough information to learn the relationship between base learner outputs.
D. The optimization process would become significantly faster as it avoids the need for cross-validation.

51 What is a primary theoretical limitation of standard Bayesian Optimization that makes it computationally challenging to apply directly to very high-dimensional hyperparameter spaces (e.g., > 50 dimensions)?

Bayesian optimization (conceptual) Hard
A. The acquisition function becomes impossible to compute in more than 20 dimensions.
B. High-dimensional spaces are always non-convex, which violates the core assumptions of Bayesian Optimization.
C. Bayesian Optimization is inherently a sequential process and cannot be parallelized in high dimensions.
D. The performance of the Gaussian Process surrogate model degrades significantly, as it suffers from the 'curse of dimensionality,' making it difficult to model the objective function and estimate uncertainty accurately.

52 When defining the search space for a hyperparameter like learning rate or regularization strength, it is standard practice to use a log-uniform distribution (e.g., spanning $10^{-5}$ to $10^{-1}$) rather than a uniform distribution. What is the primary optimization-related justification for this?

Hyperparameter optimization techniques Hard
A. This practice prevents the optimizer from sampling a value of exactly zero, which can cause mathematical errors.
B. Log-uniform distributions are computationally cheaper for random sampling algorithms to process.
C. Uniform distributions are only suitable for integer-valued hyperparameters, not continuous ones.
D. The impact of these hyperparameters is often multiplicative, meaning changes in magnitude (e.g., from $0.01$ to $0.1$) are more important than changes in absolute value (e.g., from $0.09$ to $0.1$).
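A short sketch of log-uniform sampling and why it matters (the range is an illustrative assumption): drawing uniformly over a wide range would put roughly 90% of the samples in the top decade, while sampling uniformly in log-space gives each order of magnitude equal probability.

```python
import math
import random

random.seed(0)

def sample_log_uniform(low, high):
    """Uniform in log10-space: every order of magnitude is equally likely."""
    return 10 ** random.uniform(math.log10(low), math.log10(high))

samples = [sample_log_uniform(1e-5, 1e-1) for _ in range(10000)]

# With four decades, roughly a quarter of the samples land in each decade,
# instead of ~90% crowding into [1e-2, 1e-1] as plain uniform sampling would.
share_top_decade = sum(1e-2 <= s <= 1e-1 for s in samples) / len(samples)
```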

53 While Random Search is generally more efficient than Grid Search, in which specific, albeit rare, scenario could Grid Search theoretically outperform Random Search given the same number of function evaluations?

Grid search vs random search limitations Hard
A. A problem where the objective function is highly non-convex with many local minima.
B. Any problem where the hyperparameter search space contains only categorical variables.
C. A low-dimensional problem (e.g., 2D) where the objective function's iso-performance contours are perfectly aligned with the grid axes, and the user has prior knowledge to place the grid optimally.
D. A high-dimensional problem where most hyperparameters are irrelevant.

54 When applying a genetic algorithm to a hyperparameter space with mixed data types (e.g., continuous learning rate, integer number of layers, categorical activation function), what is a primary challenge that a naive implementation of a standard crossover operator (like single-point crossover) would face?

Evolutionary hyperparameter tuning Hard
A. Crossover is only defined for binary representations and cannot be used for continuous or integer values.
B. It can produce invalid offspring. For example, averaging a 'ReLU' and 'Sigmoid' category is nonsensical, and crossing over bit representations of floats can lead to values outside the desired range.
C. It would cause the algorithm to converge much faster than mutation-only approaches, leading to premature convergence.
D. It would systematically decrease the fitness of the population over time due to a loss of genetic diversity.

55 The AdaBoost algorithm is an ensemble method that sequentially adds weak learners. The optimization objective at each step is to train a new learner that focuses on the instances that previous learners misclassified. How is this re-focusing mathematically achieved during the optimization of the subsequent weak learner?

Optimization for ensemble learning Hard
A. By increasing the weights of the misclassified instances in the training set, forcing the new learner to pay more attention to them to minimize the weighted training error.
B. By training each new learner on a bootstrap sample of the original data, with misclassified points having a higher probability of being selected.
C. By using a different loss function for each subsequent learner in the sequence.
D. By removing all correctly classified instances from the training set for the next learner.
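One round of the re-weighting in the correct answer can be written out directly, following the standard AdaBoost.M1 update (the starting weights and the set of mistakes are hypothetical):

```python
import math

weights = [0.25, 0.25, 0.25, 0.25]           # uniform instance weights to start
misclassified = [False, True, False, False]  # hypothetical round-1 mistakes

err = sum(w for w, m in zip(weights, misclassified) if m)  # weighted error
alpha = 0.5 * math.log((1 - err) / err)                    # learner's vote weight

# Up-weight mistakes, down-weight correct instances, then renormalize so the
# next weak learner minimizes error under the new distribution.
weights = [w * math.exp(alpha if m else -alpha)
           for w, m in zip(weights, misclassified)]
total = sum(weights)
weights = [w / total for w in weights]
```

A well-known property of this update: after renormalization, the misclassified instances carry exactly half of the total weight.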

56 You are tasked with tuning a model that has a large conditional hyperparameter space (e.g., an SVM where choosing a kernel activates a different subset of parameters like gamma, degree, or coef0). Which class of HPO algorithms is most naturally suited to handle this structured search space without requiring manual encoding or separate optimization runs?

Hyperparameter optimization techniques Hard
A. Standard genetic algorithms with a flattened representation of all possible hyperparameters.
B. Random Search, by randomly sampling a condition and then its associated parameters.
C. Tree-based model-based optimization methods, such as those using Tree-structured Parzen Estimators (TPE).
D. Grid Search, by defining a separate grid for each possible condition.

57 A modern AutoML system is used for a critical application and produces a model with state-of-the-art predictive accuracy. However, a post-hoc analysis reveals the model is a 'black box' that heavily relies on uninterpretable features, making it impossible to audit for fairness or debug unexpected failures. This scenario highlights which critical limitation of a purely optimization-driven AutoML approach?

Introduction to automated machine learning (AutoML) Hard
A. AutoML systems are incapable of producing models that are interpretable.
B. The dataset was not large enough for the AutoML system to find a simpler, more interpretable model.
C. AutoML systems often optimize for a single metric (e.g., accuracy), potentially at the expense of other crucial non-functional requirements like interpretability, fairness, and robustness.
D. The optimization algorithm used by the AutoML system was flawed and overfitted to the accuracy metric.

58 When using a Gaussian Process (GP) as a surrogate in Bayesian Optimization, the choice of kernel function is crucial. If we have a strong prior belief that the objective function is very smooth and that hyperparameters that are close in Euclidean distance should have similar performance, but we have no knowledge about its periodicity or structure, what would be the most standard and robust kernel choice?

Bayesian optimization (conceptual) Hard
A. A linear kernel, as it is the simplest model and avoids overfitting the surrogate.
B. A periodic kernel, as it can capture cyclical patterns in the hyperparameter space.
C. Matérn kernel (with $\nu = 5/2$), as it provides a good balance between smoothness and flexibility without being infinitely smooth like the RBF kernel.
D. Radial Basis Function (RBF) / Squared Exponential kernel, as it assumes infinite differentiability, which is too strong an assumption without specific knowledge.

59 In the context of evolutionary hyperparameter tuning for deep learning models, what is a primary advantage of evolving a learning rate schedule (e.g., the parameters for a cyclical or decay schedule) rather than evolving a single, fixed learning rate?

Evolutionary hyperparameter tuning Hard
A. It significantly reduces the number of hyperparameters to be optimized, simplifying the search space.
B. It allows the optimization process to find policies that can navigate complex loss landscapes more effectively, such as starting with a high learning rate for exploration and reducing it later for fine-tuning and convergence.
C. It guarantees that the training process will never diverge, regardless of the schedule parameters chosen.
D. Evolving a single fixed learning rate is an NP-hard problem, whereas evolving a schedule is computationally tractable.

60 Let $n$ be the number of trials in a Random Search. The probability of a trial falling into a desired quantile of the search space volume (e.g., the top 5%, so $p = 0.05$) is simply $p$. The probability that at least one of $n$ trials falls in this region is $1 - (1 - p)^n$. What is the most important practical implication of this formula for hyperparameter optimization?

Grid search vs random search limitations Hard
A. It proves that Random Search will always find a better solution than Grid Search.
B. It demonstrates that a larger search space volume requires proportionally more trials to find a good solution.
C. It shows that the probability of success decreases exponentially with the number of dimensions in the search space.
D. The number of trials required to achieve a high probability of success is independent of the number of dimensions, depending only on the desired probability and the size of the optimal region.
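The formula in this question is easy to evaluate numerically, and doing so reproduces the classic rule of thumb: about 60 random trials suffice for a >95% chance of hitting the top-5% region, regardless of dimensionality.

```python
def prob_hit(n, p):
    """P(at least one of n uniform random trials lands in a region of volume fraction p)."""
    return 1 - (1 - p) ** n

p = 0.05  # the "good" region occupies the top 5% of the search volume

# Smallest number of trials giving at least a 95% success probability.
n_needed = next(n for n in range(1, 1000) if prob_hit(n, p) >= 0.95)
```

Nothing in the calculation depends on the number of dimensions, only on $p$, which is exactly the practical implication the correct answer states.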