1. What is the primary goal of hyperparameter optimization in machine learning?
Hyperparameter optimization techniques
Easy
A.To find the best set of hyperparameters that maximizes model performance.
B.To learn the model parameters like weights and biases from data.
C.To automatically select the features for the model.
D.To speed up the data collection process.
Correct Answer: To find the best set of hyperparameters that maximizes model performance.
Explanation:
Hyperparameter optimization is the process of finding the optimal configuration of hyperparameters (e.g., learning rate, tree depth) to achieve the best performance for a machine learning model on a given dataset.
Incorrect! Try again.
2. Which of the following is an example of a model hyperparameter?
Hyperparameter optimization techniques
Easy
A.The weights learned by a linear regression model.
B.The final prediction of a model for a new data point.
C.Learning rate in a neural network.
D.The number of samples in the training dataset.
Correct Answer: Learning rate in a neural network.
Explanation:
A hyperparameter is a configuration that is external to the model and whose value cannot be estimated from data. The learning rate is set before the training process begins, unlike model parameters (like weights) which are learned during training.
Incorrect! Try again.
3. How does the Grid Search algorithm explore the hyperparameter space?
Grid search vs random search limitations
Easy
A.It uses a probabilistic model to predict which hyperparameters will perform best.
B.It samples a fixed number of combinations randomly from the hyperparameter space.
C.It exhaustively checks every possible combination of a predefined set of hyperparameter values.
D.It evolves a population of hyperparameter sets using genetic operators.
Correct Answer: It exhaustively checks every possible combination of a predefined set of hyperparameter values.
Explanation:
Grid Search works by creating a 'grid' of all possible hyperparameter combinations from user-specified lists and evaluating each one to find the best-performing combination.
Incorrect! Try again.
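The exhaustive enumeration described above can be sketched in a few lines of plain Python using `itertools.product`. This is a toy illustration with hypothetical hyperparameter names and a stand-in scoring function, not tied to any particular library:

```python
from itertools import product

# Hypothetical search space: each value list is user-specified in advance.
param_grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "max_depth": [3, 5, 7],
}

def grid_search(param_grid, score_fn):
    """Exhaustively evaluate every combination and return the best one."""
    names = list(param_grid)
    best_score, best_params = float("-inf"), None
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # in practice: train + validate a model
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

# Toy scoring function standing in for model training + validation.
toy_score = lambda p: -abs(p["learning_rate"] - 0.01) - abs(p["max_depth"] - 5)
best, _ = grid_search(param_grid, toy_score)
```

With 3 values for each of 2 hyperparameters, this loop evaluates all 3 × 3 = 9 combinations, which is exactly the exhaustiveness (and the cost) of Grid Search.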
4. What is the main limitation of Grid Search?
Grid search vs random search limitations
Easy
A.It becomes extremely slow and computationally expensive as the number of hyperparameters increases.
B.It always finds a suboptimal solution compared to Random Search.
C.It is unable to handle categorical hyperparameters.
D.It can only be used for simple models like linear regression.
Correct Answer: It becomes extremely slow and computationally expensive as the number of hyperparameters increases.
Explanation:
The number of combinations to test in Grid Search grows exponentially with the number of hyperparameters, a problem known as the 'curse of dimensionality', making it impractical for large search spaces.
Incorrect! Try again.
5. What is the primary advantage of Random Search over Grid Search?
Grid search vs random search limitations
Easy
A.It explores every single point in the search space.
B.It is often more efficient because it doesn't waste time on unimportant hyperparameters.
C.It is more systematic and easier to reproduce.
D.It guarantees finding the global optimal hyperparameter settings.
Correct Answer: It is often more efficient because it doesn't waste time on unimportant hyperparameters.
Explanation:
Random Search is more efficient because it samples points randomly, increasing the chance of hitting good values for the few hyperparameters that truly matter, rather than exhaustively testing all values for unimportant ones.
Incorrect! Try again.
6. If you have a limited time budget for hyperparameter tuning, why might Random Search be a better choice than Grid Search?
Grid search vs random search limitations
Easy
A.Random Search is more likely to find a better solution within a smaller number of trials.
B.Random Search requires less memory to run.
C.Grid Search cannot be stopped early and must run to completion.
D.Grid Search is not compatible with modern machine learning libraries.
Correct Answer: Random Search is more likely to find a better solution within a smaller number of trials.
Explanation:
Within a fixed budget (time or number of iterations), Random Search explores a wider and more diverse range of hyperparameter values, increasing the probability of finding a good combination sooner than the systematic Grid Search.
Incorrect! Try again.
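Random Search, by contrast, draws each trial independently from the full ranges. A minimal sketch (hypothetical hyperparameter names, toy scoring function) in plain Python:

```python
import random

random.seed(0)

# Hypothetical search space: continuous ranges instead of fixed value lists.
def sample_params():
    return {
        "learning_rate": 10 ** random.uniform(-4, -1),  # log-uniform draw
        "max_depth": random.randint(2, 10),
    }

def random_search(n_trials, score_fn):
    """Evaluate n_trials randomly sampled configurations; keep the best."""
    best_score, best_params = float("-inf"), None
    for _ in range(n_trials):
        params = sample_params()
        score = score_fn(params)  # in practice: train + validate a model
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

toy_score = lambda p: -abs(p["learning_rate"] - 0.01)  # only the lr matters
best, _ = random_search(25, toy_score)
```

Note that every trial draws a fresh learning rate, so 25 trials probe 25 distinct values of the influential hyperparameter, and the loop can be stopped after any trial while still returning the best configuration seen so far.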
7. Evolutionary algorithms for hyperparameter tuning are inspired by which real-world process?
Evolutionary hyperparameter tuning
Easy
A.The movement of particles in physics.
B.Biological evolution and natural selection.
C.The way humans make decisions based on past experience.
D.The physical process of annealing metals.
Correct Answer: Biological evolution and natural selection.
Explanation:
These algorithms mimic concepts from biology like population, fitness, selection, crossover (recombination), and mutation to evolve better solutions (hyperparameter sets) over generations.
Incorrect! Try again.
8. In an evolutionary algorithm, what does the 'fitness function' typically measure?
Evolutionary hyperparameter tuning
Easy
A.The performance of a model trained with a specific set of hyperparameters.
B.The complexity of the model.
C.The speed at which the model trains.
D.The number of hyperparameters being tuned.
Correct Answer: The performance of a model trained with a specific set of hyperparameters.
Explanation:
The fitness function evaluates how 'good' a particular solution (a set of hyperparameters) is. In machine learning, this is usually a performance metric like accuracy, F1-score, or mean squared error on a validation set.
Incorrect! Try again.
9. What do the 'crossover' and 'mutation' operators do in an evolutionary algorithm?
Evolutionary hyperparameter tuning
Easy
A.They select the best hyperparameter sets to survive to the next generation.
B.They define the initial population of hyperparameter sets.
C.They evaluate the performance of the current hyperparameter sets.
D.They create new hyperparameter sets from existing ones.
Correct Answer: They create new hyperparameter sets from existing ones.
Explanation:
Crossover combines two 'parent' solutions to create a new 'child' solution, while mutation introduces small, random changes. These processes generate new, potentially better, solutions for the next generation.
Incorrect! Try again.
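Crossover and mutation can be sketched directly on hyperparameter dictionaries. The keys and perturbation rules below are hypothetical choices for illustration only:

```python
import random

random.seed(1)

# An 'individual' is one hyperparameter configuration (hypothetical keys).
def crossover(parent_a, parent_b):
    """Child inherits each hyperparameter from a randomly chosen parent."""
    return {k: random.choice([parent_a[k], parent_b[k]]) for k in parent_a}

def mutate(individual, rate=0.2):
    """With probability `rate`, perturb each hyperparameter slightly."""
    child = dict(individual)
    if random.random() < rate:
        child["learning_rate"] *= random.uniform(0.5, 2.0)
    if random.random() < rate:
        child["n_estimators"] = max(1, child["n_estimators"] + random.randint(-20, 20))
    return child

a = {"learning_rate": 0.01, "n_estimators": 100}
b = {"learning_rate": 0.1, "n_estimators": 300}
child = mutate(crossover(a, b))
```

Crossover recombines values that already proved useful in the parents, while mutation injects small random changes that keep the population from collapsing onto a single configuration.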
10. What is the core idea of Bayesian Optimization?
Bayesian optimization (conceptual)
Easy
A.It uses principles of biological evolution to find the best hyperparameters.
B.It uses information from past evaluations to decide which hyperparameters to try next.
C.It divides the hyperparameter space into an exhaustive grid.
D.It randomly selects hyperparameters from a uniform distribution.
Correct Answer: It uses information from past evaluations to decide which hyperparameters to try next.
Explanation:
Bayesian Optimization builds a probabilistic model (called a surrogate) of the objective function and uses it to intelligently select the most promising hyperparameters to evaluate, making it more sample-efficient than random or grid search.
Incorrect! Try again.
11. In Bayesian Optimization, the function that guides the search for the next point to evaluate is called the:
Bayesian optimization (conceptual)
Easy
A.Loss function.
B.Objective function.
C.Acquisition function.
D.Fitness function.
Correct Answer: Acquisition function.
Explanation:
The acquisition function uses the surrogate model's predictions to determine the utility of evaluating a particular point, balancing exploration (trying new areas) with exploitation (refining known good areas).
Incorrect! Try again.
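One common acquisition function, Expected Improvement, has a closed form when the surrogate's posterior at a point is Gaussian with mean mu and standard deviation sigma. A self-contained sketch (maximization convention; the xi exploration margin and the numeric inputs are illustrative assumptions):

```python
import math

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """EI for maximization under a Gaussian surrogate posterior N(mu, sigma^2)."""
    if sigma == 0.0:
        return 0.0  # no uncertainty: nothing to gain from re-evaluating
    z = (mu - f_best - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - f_best - xi) * cdf + sigma * pdf

# Exploitation: predicted mean already beats the incumbent best score.
ei_exploit = expected_improvement(mu=0.93, sigma=0.01, f_best=0.90)
# Exploration: mean is worse, but the high uncertainty keeps EI positive.
ei_explore = expected_improvement(mu=0.85, sigma=0.10, f_best=0.90)
```

Both terms of the formula mirror the exploration/exploitation trade-off: the first rewards points whose predicted mean exceeds the best score so far, the second rewards points where the surrogate is still uncertain.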
12. Compared to Random Search, a major benefit of Bayesian Optimization is that it typically:
Bayesian optimization (conceptual)
Easy
A.Is much simpler to set up and run.
B.Requires no initial data to start.
C.Finds a good solution with fewer model training iterations.
D.Runs faster for each individual iteration.
Correct Answer: Finds a good solution with fewer model training iterations.
Explanation:
By making informed decisions based on past results, Bayesian Optimization avoids testing unpromising areas of the search space, allowing it to converge on a good solution with fewer expensive model evaluations.
Incorrect! Try again.
13. Which of the following is a hyperparameter that would be optimized for a Random Forest ensemble?
Optimization for ensemble learning
Easy
A.The number of trees in the forest.
B.The size of the training data.
C.The predictions made by the ensemble.
D.The weights of the features in a single decision tree.
Correct Answer: The number of trees in the forest.
Explanation:
Ensemble models like Random Forest have their own hyperparameters, such as the number of estimators (trees) and the maximum depth of each tree, which must be tuned to achieve optimal performance.
Incorrect! Try again.
14. In boosting methods like AdaBoost or Gradient Boosting, how are the models in the ensemble optimized?
Optimization for ensemble learning
Easy
A.They are trained sequentially, with each new model focusing on the mistakes of the previous ones.
B.They are trained independently and their results are averaged.
C.A single best model is selected from a large pool of trained models.
D.They are all trained at the same time on different subsets of data.
Correct Answer: They are trained sequentially, with each new model focusing on the mistakes of the previous ones.
Explanation:
Boosting is an iterative optimization process. Each model is added to the ensemble in a sequence to correct the errors (residuals or misclassifications) made by the ensemble of models that came before it.
Incorrect! Try again.
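The sequential "fit the mistakes" loop can be sketched for squared-error boosting on 1-D toy data, with a threshold stump as the weak learner. The data, learning rate, and round count are arbitrary illustrative choices:

```python
# Tiny gradient-boosting sketch for squared error on 1-D toy data.
# Each round fits a "stump" (threshold -> two constants) to the residuals.

def fit_stump(x, residuals):
    """Find the threshold split that best fits the residuals (least squares)."""
    best = None
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - (lv if xi <= t else rv)) ** 2
                  for xi, r in zip(x, residuals))
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda xi, t=t, lv=lv, rv=rv: lv if xi <= t else rv

def boost(x, y, n_rounds=20, lr=0.5):
    pred = [0.0] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]   # errors so far
        stump = fit_stump(x, residuals)                    # fit the mistakes
        pred = [pi + lr * stump(xi) for xi, pi in zip(x, pred)]
    return pred

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.8, 4.1, 4.0]
pred = boost(x, y)
```

Each new stump is trained on the residuals of the current ensemble rather than on the raw targets, which is exactly the sequential error-correction described in the explanation.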
15. In the 'stacking' ensemble method, what is the role of the 'meta-learner'?
Optimization for ensemble learning
Easy
A.To learn the best way to combine the predictions from the base models.
B.To preprocess the input data for all base models.
C.To generate diverse training data for the base models.
D.To select the best single model from the ensemble.
Correct Answer: To learn the best way to combine the predictions from the base models.
Explanation:
Stacking uses a meta-learner (or blender model) which is trained on the outputs (predictions) of the base-level models to find the optimal combination, effectively optimizing the final prediction.
Incorrect! Try again.
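A deliberately minimal version of this idea: a "meta-learner" consisting of a single blend weight w, fit in closed form to minimize the squared error of w*p1 + (1-w)*p2 on a hold-out validation set. (Real stacking usually trains a full model, e.g. logistic regression, on the base models' out-of-fold predictions; all numbers below are hypothetical.)

```python
def fit_blend_weight(p1, p2, y):
    """Closed-form least-squares weight for blending two prediction vectors."""
    num = sum((yi - b) * (a - b) for a, b, yi in zip(p1, p2, y))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    return min(1.0, max(0.0, num / den))   # clip to a convex combination

# Hypothetical validation-set predictions from two base models.
p1 = [0.1, 0.9, 0.6, 0.8]      # base model 1
p2 = [0.4, 0.7, 0.2, 0.9]      # base model 2
y  = [0.0, 1.0, 0.0, 1.0]      # true labels
w = fit_blend_weight(p1, p2, y)
blended = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
```

Because the weight is chosen on held-out predictions, the blend can only do as well as or better than the best single base model on that validation set, which is the point of learning the combination rather than fixing it in advance.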
16. What is the main purpose of Automated Machine Learning (AutoML)?
Introduction to automated machine learning (AutoML)
Easy
A.To replace the need for data.
B.To design new types of neural network architectures.
C.To automate the entire machine learning pipeline from data prep to model deployment.
D.To create better data visualization tools.
Correct Answer: To automate the entire machine learning pipeline from data prep to model deployment.
Explanation:
AutoML aims to make machine learning more accessible and efficient by automating the time-consuming and complex steps of feature engineering, model selection, hyperparameter tuning, and more.
Incorrect! Try again.
17. Which of these tasks is a core component of most AutoML systems?
Introduction to automated machine learning (AutoML)
Easy
A.Algorithm selection and hyperparameter tuning.
B.Ethical review of the model's impact.
C.Defining the business problem to be solved.
D.Communicating model results to stakeholders.
Correct Answer: Algorithm selection and hyperparameter tuning.
Explanation:
A key function of AutoML is to automatically search through different algorithms (e.g., Logistic Regression, SVM, Random Forest) and their respective hyperparameters to find the best combination for the given dataset.
Incorrect! Try again.
18. A primary benefit of using AutoML for a data scientist is that it:
Introduction to automated machine learning (AutoML)
Easy
A.Requires no computational resources.
B.Eliminates the need for any human oversight.
C.Always produces a perfect, error-free model.
D.Saves time by automating repetitive and experimental tasks.
Correct Answer: Saves time by automating repetitive and experimental tasks.
Explanation:
AutoML can rapidly generate strong baseline models and handle the tedious process of hyperparameter tuning, freeing up data scientists to focus on more complex aspects of the problem like feature engineering and problem formulation.
Incorrect! Try again.
19. Why are hyperparameters not learned during the model training process like regular parameters?
Hyperparameter optimization techniques
Easy
A.Because they are not numerical values.
B.Because they define the structure of the model or the learning process itself.
C.Because there are too many of them to learn.
D.Because it is computationally impossible to learn them.
Correct Answer: Because they define the structure of the model or the learning process itself.
Explanation:
Hyperparameters, such as the number of layers in a neural network or the C parameter in an SVM, are set before training because they control how the learning algorithm will proceed to find the optimal model parameters (like weights).
Incorrect! Try again.
20. In evolutionary hyperparameter tuning, what does 'selection' refer to?
Evolutionary hyperparameter tuning
Easy
A.Choosing which hyperparameters to tune.
B.Choosing the best-performing hyperparameter sets to create the next generation.
C.Choosing the machine learning model to use.
D.Choosing the dataset for training.
Correct Answer: Choosing the best-performing hyperparameter sets to create the next generation.
Explanation:
Selection is the process where the 'fittest' individuals (hyperparameter sets with the best model performance) are chosen from the current population to serve as parents for the next generation, ensuring that good traits are passed on.
Incorrect! Try again.
21. A machine learning model has 5 hyperparameters. You decide to test 4 values for each hyperparameter. If you use Grid Search, how many model evaluations will be performed, and what is the primary limitation this illustrates?
Grid search vs random search limitations
Medium
A.625 evaluations (5^4); It illustrates the risk of overfitting the validation set.
B.20 evaluations; It illustrates inefficiency in low-dimensional spaces.
C.1024 evaluations (4^5); It illustrates the "curse of dimensionality".
D.1024 evaluations (4^5); It illustrates the difficulty with non-continuous parameters.
Correct Answer: 1024 evaluations (4^5); It illustrates the "curse of dimensionality".
Explanation:
Grid Search evaluates every possible combination. With 5 hyperparameters and 4 values each, the total number of evaluations is 4^5 = 1024. This exponential growth in computation as the number of parameters (dimensions) increases is known as the "curse of dimensionality," making Grid Search impractical for high-dimensional search spaces.
Incorrect! Try again.
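The exponential growth in the number of grid points is easy to verify by counting: with 4 candidate values per hyperparameter, each added hyperparameter multiplies the grid size by 4.

```python
# Grid Search cost with 4 candidate values per hyperparameter:
# the grid size is 4^k for k hyperparameters.
values_per_param = 4
sizes = {k: values_per_param ** k for k in range(1, 6)}
```

Going from 1 to 5 hyperparameters takes the number of required model evaluations from 4 to 1024, which is the "curse of dimensionality" the question illustrates.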
22. In Bayesian Optimization, what is the primary role of the 'acquisition function' (e.g., Expected Improvement)?
Bayesian optimization (conceptual)
Medium
A.To guide the search by balancing exploration (trying new areas) and exploitation (refining known good areas).
B.To calculate the cross-validation score after each trial.
C.To define the final prediction score of the machine learning model.
D.To build a probabilistic surrogate model of the objective function.
Correct Answer: To guide the search by balancing exploration (trying new areas) and exploitation (refining known good areas).
Explanation:
The surrogate model (like a Gaussian Process) approximates the true objective function. The acquisition function then uses the predictions and uncertainty from the surrogate model to decide which hyperparameters to try next. It does this by balancing the trade-off between exploring uncertain regions of the search space and exploiting regions already known to yield good results.
Incorrect! Try again.
23. In an evolutionary algorithm for hyperparameter tuning, the 'crossover' operation is analogous to which of the following actions?
Evolutionary hyperparameter tuning
Medium
A.Evaluating the performance (fitness) of a hyperparameter configuration on a validation set.
B.Combining parts of two well-performing hyperparameter configurations to create a new one.
C.Selecting the best performing hyperparameter configurations for the next generation.
D.Randomly changing a single hyperparameter value in a configuration.
Correct Answer: Combining parts of two well-performing hyperparameter configurations to create a new one.
Explanation:
Crossover, inspired by biological reproduction, involves taking two 'parent' solutions (well-performing hyperparameter sets) and combining their features to create one or more 'offspring' (new hyperparameter sets). This allows the algorithm to explore new combinations of previously successful values.
Incorrect! Try again.
24. Imagine you are tuning two hyperparameters for a model: learning rate (very influential) and dropout rate (less influential). With a fixed budget of 25 trials, why is Random Search often more effective than a Grid Search?
Grid search vs random search limitations
Medium
A.Random Search is guaranteed to find the global optimum.
B.Random Search explores 25 unique values for each hyperparameter, while Grid Search only explores 5.
C.Grid Search wastes evaluations by testing the same learning rates with different, less important dropout rates.
D.Grid Search can only handle continuous hyperparameters.
Correct Answer: Random Search explores 25 unique values for each hyperparameter, while Grid Search only explores 5.
Explanation:
In a Grid Search, you only test 5 distinct values for the important learning rate. In a 25-trial Random Search, you test 25 different, unique learning rates. Because performance is dominated by the learning rate, exploring it more widely gives Random Search a higher probability of finding a better configuration within the same budget.
Incorrect! Try again.
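The "25 unique values vs. 5" claim can be demonstrated by counting distinct learning rates under each strategy. The candidate values and ranges below are hypothetical:

```python
import itertools
import random

random.seed(0)

lr_candidates = [0.001, 0.003, 0.01, 0.03, 0.1]   # 5 grid values
dr_candidates = [0.0, 0.1, 0.2, 0.3, 0.4]          # 5 grid values

# Grid Search: 25 trials, but only 5 distinct learning rates are ever tried.
grid_trials = list(itertools.product(lr_candidates, dr_candidates))
grid_unique_lrs = {lr for lr, _ in grid_trials}

# Random Search: 25 trials, each drawing a fresh learning rate.
random_trials = [(10 ** random.uniform(-3, -1), random.uniform(0.0, 0.4))
                 for _ in range(25)]
random_unique_lrs = {lr for lr, _ in random_trials}
```

Both strategies spend 25 evaluations, but the grid repeats each learning rate 5 times (once per dropout value), while every random trial probes a new value of the hyperparameter that actually drives performance.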
25. When creating a weighted average ensemble of several models, the optimal weights are often found by solving a constrained optimization problem. What is the typical objective of this optimization?
Optimization for ensemble learning
Medium
A.To minimize the error (e.g., MSE or log-loss) of the weighted average prediction on a validation set.
B.To maximize the variance of the predictions from the ensemble.
C.To ensure all weights are equal, promoting model fairness.
D.To maximize the training time to ensure model convergence.
Correct Answer: To minimize the error (e.g., MSE or log-loss) of the weighted average prediction on a validation set.
Explanation:
The goal is to find a set of weights that combines the base models' predictions to produce the most accurate final prediction. This is framed as an optimization problem where the objective function is an error metric (like Mean Squared Error) on a hold-out validation set, and the constraints often include that the weights must be non-negative and sum to 1.
Incorrect! Try again.
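A minimal sketch of this constrained optimization: a coarse search over the simplex of non-negative weights summing to 1, minimizing validation MSE. The base-model predictions are hypothetical, and a real implementation would typically use a proper solver (e.g., SLSQP) instead of a grid over the simplex:

```python
from itertools import product

# Hypothetical validation predictions from three base models, plus targets.
preds = [
    [2.0, 3.1, 4.2, 5.0],   # model A
    [1.8, 3.0, 3.9, 5.3],   # model B
    [2.5, 2.7, 4.5, 4.6],   # model C
]
y = [2.0, 3.0, 4.0, 5.0]

def mse(weights):
    blended = [sum(w * p[i] for w, p in zip(weights, preds))
               for i in range(len(y))]
    return sum((b - t) ** 2 for b, t in zip(blended, y)) / len(y)

# Coarse search over the simplex: non-negative weights summing to 1.
step = 0.05
candidates = [(a, b, round(1 - a - b, 2))
              for a, b in product([i * step for i in range(21)], repeat=2)
              if a + b <= 1.0 + 1e-9]
best_weights = min(candidates, key=mse)
```

The objective is the validation-set error of the blend, and the simplex constraint (non-negative weights summing to 1) matches the formulation described in the explanation.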
26. Which of the following tasks is a core component of most AutoML systems, aiming to reduce manual effort in the ML pipeline?
Introduction to automated machine learning (AutoML)
Medium
A.Collecting and generating new raw data.
B.Automated feature engineering and model selection.
C.Defining the business problem and success metrics.
D.Final model deployment and ethical review.
Correct Answer: Automated feature engineering and model selection.
Explanation:
While the entire ML lifecycle is broad, AutoML systems primarily focus on automating the technical, iterative parts of the pipeline. This includes data preprocessing, feature engineering (creating new features from existing ones), model selection (choosing the best algorithm), and hyperparameter optimization.
Incorrect! Try again.
27. You are tuning a deep learning model where each evaluation takes 6 hours. You have a budget for approximately 30-40 evaluations. Which hyperparameter optimization technique is most appropriate for this scenario?
Hyperparameter optimization techniques
Medium
A.Random Search, because it is simple to implement and parallelize.
B.Grid Search, because it is exhaustive and guarantees finding the best combination.
C.Manual Search, because the long training time allows for human intuition to guide the process.
D.Bayesian Optimization, because it builds a model of the search space to make intelligent choices for the next evaluation.
Correct Answer: Bayesian Optimization, because it builds a model of the search space to make intelligent choices for the next evaluation.
Explanation:
With a very expensive objective function (long training time) and a small budget of evaluations, Bayesian Optimization is the most suitable choice. It uses information from past trials to inform future ones, making it significantly more sample-efficient than Random Search or the computationally infeasible Grid Search.
Incorrect! Try again.
28. What is the primary role of the 'mutation' operation in evolutionary algorithms for hyperparameter tuning?
Evolutionary hyperparameter tuning
Medium
A.To ensure the population converges to a single best solution quickly.
B.To combine two good solutions into a new one.
C.To evaluate the performance of each individual in the population.
D.To maintain diversity in the population and prevent premature convergence to a local optimum.
Correct Answer: To maintain diversity in the population and prevent premature convergence to a local optimum.
Explanation:
Mutation introduces random changes into an individual's hyperparameters. This helps maintain genetic diversity within the population of solutions, allowing the search to escape local optima and explore new, potentially better, regions of the search space that might not be reachable through crossover alone.
Incorrect! Try again.
29. How does the surrogate model (e.g., a Gaussian Process) in Bayesian Optimization contribute to its efficiency compared to Random Search?
Bayesian optimization (conceptual)
Medium
A.It replaces the actual model training, making evaluations instantaneous.
B.It approximates the objective function and quantifies uncertainty, allowing for more informed decisions on which hyperparameters to try next.
C.It guarantees that each subsequent evaluation will yield a better result.
D.It provides a deterministic map of the entire search space after one evaluation.
Correct Answer: It approximates the objective function and quantifies uncertainty, allowing for more informed decisions on which hyperparameters to try next.
Explanation:
The surrogate model is a cheap-to-evaluate probabilistic model that learns from past (hyperparameter, score) pairs. It provides a mean prediction and an uncertainty estimate for any point in the search space. The acquisition function uses this information to select points that are either promising (high mean) or highly uncertain, making the search much more efficient than the 'blind' guessing of Random Search.
Incorrect! Try again.
30. In the context of optimizing an ensemble, why is it often beneficial to combine models that are diverse (i.e., they make different errors)?
Optimization for ensemble learning
Medium
A.Diverse models are computationally cheaper to train.
B.When one model makes an error, other, different models may correct it, leading to a lower overall ensemble error.
C.Diversity is only important for classification tasks, not regression.
D.Diversity ensures that the ensemble's bias is always lower than any individual model's bias.
Correct Answer: When one model makes an error, other, different models may correct it, leading to a lower overall ensemble error.
Explanation:
The core principle of ensembling is that the collective decision is better than individual ones. If base models are diverse and their errors are uncorrelated, the chance that a majority of them make the same error on a given input is low. Therefore, the errors of some models are likely to be canceled out by the correct predictions of others, improving overall robustness and accuracy.
Incorrect! Try again.
31. When is Grid Search a more suitable choice than Random Search?
Grid search vs random search limitations
Medium
A.When the computational budget is extremely limited.
B.When dealing with a low-dimensional search space (e.g., 2-3 hyperparameters) and you suspect the optimal values lie on a grid.
C.When the number of hyperparameters is very large (e.g., >10).
D.When the objective function is non-deterministic and noisy.
Correct Answer: When dealing with a low-dimensional search space (e.g., 2-3 hyperparameters) and you suspect the optimal values lie on a grid.
Explanation:
Grid Search's main drawback is the curse of dimensionality. However, in low-dimensional spaces, it is systematic and can be effective. If you have only a few hyperparameters and a strong reason to believe that their interactions are best captured by a grid-like structure, Grid Search can be a reasonable, reproducible choice.
Incorrect! Try again.
32. A key trade-off when using an AutoML framework compared to manual modeling is often described as:
Introduction to automated machine learning (AutoML)
Medium
A.Data Size vs. Model Complexity: AutoML can only handle small datasets.
B.Speed vs. Accuracy: AutoML is faster but always less accurate than a manually tuned model.
C.Computation Cost vs. Human Effort: AutoML reduces manual work but may require significant computational resources.
D.Performance vs. Interpretability: AutoML models are always black boxes.
Correct Answer: Computation Cost vs. Human Effort: AutoML reduces manual work but may require significant computational resources.
Explanation:
AutoML automates the time-consuming tasks of model selection and hyperparameter tuning, saving significant human time and effort. However, to do this, it often explores a vast search space of pipelines, which can be computationally very expensive, requiring more processing power and time than a targeted manual approach.
Incorrect! Try again.
33. In evolutionary hyperparameter tuning, what does the 'fitness function' typically represent?
Evolutionary hyperparameter tuning
Medium
A.The number of trainable parameters in the model.
B.A performance metric, such as validation accuracy or MSE, for a given set of hyperparameters.
C.The diversity of the current population of hyperparameter sets.
D.The computational time required to train the model.
Correct Answer: A performance metric, such as validation accuracy or MSE, for a given set of hyperparameters.
Explanation:
The fitness function evaluates how 'good' an individual (a specific hyperparameter configuration) is. In the context of machine learning, 'goodness' is measured by the model's performance on a validation set. Therefore, the fitness function is typically the result of a metric like accuracy, F1-score, or Mean Squared Error.
Incorrect! Try again.
34. Which of the following hyperparameter types would be most challenging for standard Grid Search to handle effectively?
Hyperparameter optimization techniques
Medium
A.An integer hyperparameter with a small range (e.g., number of trees from 10 to 50).
B.A continuous hyperparameter that needs fine-tuning (e.g., learning rate).
C.A categorical hyperparameter with 3 choices (e.g., activation function).
Correct Answer: A continuous hyperparameter that needs fine-tuning (e.g., learning rate).
Explanation:
Grid Search requires discretizing continuous parameters into a fixed number of steps. If the optimal value of a continuous hyperparameter (like learning rate) lies between two grid points, Grid Search will never find it. This discretization can be a significant limitation when fine-tuning is required.
Incorrect! Try again.
35. In boosting algorithms like Gradient Boosting, the optimization process involves sequentially adding new models. How is each new model trained?
Optimization for ensemble learning
Medium
A.To predict the target variable directly, same as the first model.
B.On a completely different set of features to promote diversity.
C.On a random bootstrap sample of the original data.
D.To correct the errors (i.e., predict the residuals) of the existing ensemble.
Correct Answer: To correct the errors (i.e., predict the residuals) of the existing ensemble.
Explanation:
Boosting is an optimization technique where models are added sequentially to correct the mistakes of their predecessors. Each new weak learner is trained to predict the negative gradient of the loss function with respect to the previous ensemble's prediction, which for squared error loss, simplifies to fitting the residuals (the errors) of the current ensemble.
Incorrect! Try again.
36. What is a potential disadvantage of Bayesian Optimization?
Bayesian optimization (conceptual)
Medium
A.It cannot handle categorical or conditional hyperparameters.
B.The computational overhead of fitting and optimizing the acquisition function can become significant.
C.It is inherently a sequential process and cannot be parallelized.
D.It is less sample-efficient than Random Search for expensive objective functions.
Correct Answer: The computational overhead of fitting and optimizing the acquisition function can become significant.
Explanation:
While each model evaluation is expensive, the 'thinking' time between evaluations in Bayesian Optimization is not zero. For very fast-to-evaluate functions, the time spent fitting the surrogate model and maximizing the acquisition function can become a bottleneck, potentially making it slower than a simple, parallelizable method like Random Search.
Incorrect! Try again.
37. Consider a search space with both continuous (e.g., learning rate) and categorical (e.g., optimizer type) hyperparameters. Which optimization method naturally handles this mixed-type space without requiring significant adaptation?
Hyperparameter optimization techniques
Medium
A.Bayesian Optimization with appropriate kernels (e.g., tree-based surrogate models).
Correct Answer: Bayesian Optimization with appropriate kernels (e.g., tree-based surrogate models).
Explanation:
Gradient-based methods cannot handle categorical variables. Standard Grid Search can handle them but struggles with continuous ones. Advanced Bayesian Optimization frameworks, particularly those using tree-based surrogates (like TPE) or specialized kernels for Gaussian Processes, are designed to naturally and effectively handle complex, mixed-type (continuous, integer, categorical) search spaces.
Incorrect! Try again.
38. You are using an evolutionary algorithm to tune a neural network. The 'population' consists of 50 different network configurations. After evaluating all 50, the 'selection' phase begins. What is the most likely goal of this phase?
Evolutionary hyperparameter tuning
Medium
A.To randomly mutate every configuration to create 50 new ones.
B.To average the hyperparameters of all 50 configurations.
C.To choose a subset of high-performing configurations ('parents') to produce the next generation.
D.To choose a single best configuration and discard all others.
Correct Answer: To choose a subset of high-performing configurations ('parents') to produce the next generation.
Explanation:
The selection phase mimics the principle of 'survival of the fittest'. Its purpose is to identify the most promising individuals (hyperparameter sets with the best fitness scores) from the current population. These selected parents will then proceed to the crossover and mutation stages to create the offspring for the next generation.
Incorrect! Try again.
39. The search for the best ML pipeline (including preprocessing, model, and hyperparameters) can be framed as a Combined Algorithm Selection and Hyperparameter optimization (CASH) problem. Which technique is conceptually best suited to solve the CASH problem?
Introduction to automated machine learning (AutoML)
Medium
A.A single, large neural network that learns the entire pipeline.
B.Linear Regression to predict the best hyperparameters.
C.Grid Search, by creating a massive grid of all possible pipelines.
D.Bayesian Optimization, by modeling the performance of different pipeline configurations.
Correct Answer: Bayesian Optimization, by modeling the performance of different pipeline configurations.
Explanation:
The CASH problem involves a complex, conditional, and large search space. Bayesian Optimization (and related techniques like TPE) is well-suited for this because it can effectively handle conditional parameters (e.g., the hyperparameters for a Random Forest only matter if the Random Forest algorithm is selected) and efficiently navigate the vast space to find high-performing pipelines.
Incorrect! Try again.
40In a stacking ensemble, a 'meta-learner' is trained. What is the primary optimization goal when training this meta-learner?
Optimization for ensemble learning
Medium
A.To select the most diverse subset of features for training.
B.To combine the predictions of the base models in a way that minimizes the final ensemble's error.
C.To train faster than any of the individual base models.
D.To find the optimal hyperparameters for the base models.
Correct Answer: To combine the predictions of the base models in a way that minimizes the final ensemble's error.
Explanation:
The meta-learner is a model that learns how to best combine the outputs of the base models. Its training data consists of the predictions made by the base models (on a validation set) as features, and the true labels as the target. The optimization goal is to train this meta-learner to make the most accurate final prediction, effectively learning the optimal way to blend the base models' outputs.
Incorrect! Try again.
41Consider a 10-dimensional hyperparameter space where only 2 dimensions significantly impact model performance. If both Grid Search and Random Search are given an identical, limited evaluation budget (e.g., 243 trials), why is Random Search statistically far more likely to find a near-optimal configuration? A Grid Search with this budget could only test 3 points per dimension ($3^5 = 243$, covering just 5 dimensions), leaving 5 dimensions completely unexplored.
Grid search vs random search limitations
Hard
A.Random Search uses a surrogate model to predict the most promising areas, unlike Grid Search.
B.Grid Search is guaranteed to find the global optimum if the grid is fine enough, making it better with any budget.
C.Random Search is not constrained by a fixed grid, so every trial evaluates a unique combination across all 10 dimensions, maximizing the chance of sampling effective values in the two important dimensions.
D.The total number of evaluations in Random Search is independent of the dimensionality of the search space.
Correct Answer: Random Search is not constrained by a fixed grid, so every trial evaluates a unique combination across all 10 dimensions, maximizing the chance of sampling effective values in the two important dimensions.
Explanation:
Grid Search's budget is spread thinly across dimensions: for a fixed budget it can only afford a few values per dimension, because the number of grid points grows multiplicatively with every dimension added. Random Search, however, decouples the budget from the number of dimensions. Each of the 243 trials samples independently from all 10 dimensions, vastly increasing the coverage and the probability of hitting a good value for the 2 important dimensions.
Incorrect! Try again.
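To make the coverage argument concrete, here is a minimal Python sketch (the specific axis values are illustrative assumptions, not part of the question):

```python
import itertools
import random

random.seed(0)

# Budget of 243 trials in a 10-D space where only 2 dimensions matter.
budget = 243

# Grid Search: 243 = 3**5, so only 5 of the 10 dimensions can receive a
# 3-value grid; the other 5 are pinned to a single default value.
grid_axes = [[0.0, 0.5, 1.0]] * 5 + [[0.5]] * 5
grid_points = list(itertools.product(*grid_axes))
assert len(grid_points) == budget

# Random Search: every trial draws a fresh value in every dimension.
random_points = [[random.random() for _ in range(10)] for _ in range(budget)]

# Distinct values tried along dimension 0 (suppose it is one of the two
# dimensions that actually matter):
grid_distinct = len({p[0] for p in grid_points})    # only 3 values
rand_distinct = len({p[0] for p in random_points})  # one per trial
print(grid_distinct, rand_distinct)
```

With the same 243 evaluations, the random sampler probes 243 distinct values in each important dimension versus the grid's 3.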
42A data scientist is using Bayesian Optimization with a Gaussian Process (GP) surrogate model. The true objective function (model validation loss vs. hyperparameters) is discovered to be non-stationary and has multiple sharp, discontinuous regions. What is the most likely consequence of this characteristic on the Bayesian Optimization process?
Bayesian optimization (conceptual)
Hard
A.The acquisition function, such as Expected Improvement (EI), will automatically adapt and put more weight on exploration, effectively handling the discontinuities.
B.The Gaussian Process will require a different kernel, like a linear kernel, to model the discontinuities accurately.
C.Bayesian Optimization will perform better than Random Search because its probabilistic model can explicitly represent discontinuities.
D.The GP's smoothness assumption will be violated, leading to inaccurate uncertainty estimates and potentially causing the acquisition function to guide the search towards suboptimal regions.
Correct Answer: The GP's smoothness assumption will be violated, leading to inaccurate uncertainty estimates and potentially causing the acquisition function to guide the search towards suboptimal regions.
Explanation:
Standard Gaussian Processes assume the underlying function is smooth. When this assumption is violated by sharp discontinuities, the GP surrogate becomes a poor approximation of the true objective function. This leads to unreliable predictions of both the mean and variance (uncertainty), which misleads the acquisition function and compromises the entire optimization process.
Incorrect! Try again.
43In an evolutionary algorithm for hyperparameter tuning, the population consistently converges to a suboptimal local minimum after just a few generations. Which combination of operator adjustments is most likely to mitigate this premature convergence and promote a more global search?
Evolutionary hyperparameter tuning
Hard
A.Increase the mutation rate and decrease the selection pressure (e.g., use tournament selection with a smaller tournament size).
B.Decrease the mutation rate and increase the selection pressure (e.g., use elitism to preserve the best individuals).
C.Increase the crossover rate while completely eliminating mutation.
D.Implement fitness sharing to encourage niching but keep selection pressure high.
Correct Answer: Increase the mutation rate and decrease the selection pressure (e.g., use tournament selection with a smaller tournament size).
Explanation:
Premature convergence occurs when the population loses diversity and gets stuck in a local optimum. Increasing the mutation rate introduces new genetic material, fostering exploration. Decreasing selection pressure (e.g., smaller tournaments in tournament selection) allows less-fit individuals a higher chance to survive and reproduce, preserving diversity and preventing the best-so-far solution from dominating the population too quickly.
Incorrect! Try again.
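A minimal sketch of the two adjustments, assuming a toy one-gene population and an illustrative fitness function:

```python
import random

def tournament_select(population, fitness, k):
    """Pick one parent: sample k individuals, return the fittest.
    Smaller k -> weaker selection pressure -> more diversity survives."""
    contestants = random.sample(range(len(population)), k)
    return population[max(contestants, key=lambda i: fitness[i])]

def mutate(config, rate, sigma=0.1):
    """Gaussian mutation: a higher rate injects more new genetic material."""
    return [g + random.gauss(0, sigma) if random.random() < rate else g
            for g in config]

random.seed(1)
pop = [[random.random()] for _ in range(20)]
fit = [1 - abs(c[0] - 0.7) for c in pop]  # toy fitness, peak at 0.7

# Mitigating premature convergence: small tournaments + high mutation rate.
parent = tournament_select(pop, fit, k=2)   # low selection pressure
child = mutate(parent, rate=0.5)            # high mutation rate
```

With k equal to the population size, selection would be greedy (maximum pressure); k=2 gives weaker individuals a real chance to reproduce.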
44When constructing a weighted average ensemble of three models (A, B, C) with similar individual accuracies, it's found that models A and B have highly correlated errors, while model C's errors are largely uncorrelated with A and B. During optimization to minimize the ensemble's mean squared error, what will be the likely distribution of the optimal weights ($w_A, w_B, w_C$)?
Optimization for ensemble learning
Hard
A.The weight for model C ($w_C$) will be significantly larger than for A and B ($w_A, w_B$), which will be down-weighted due to their redundancy.
B.All weights ($w_A, w_B, w_C$) will be approximately equal, as their individual accuracies are similar.
C.The weights for A and B ($w_A, w_B$) will be high, and the weight for C ($w_C$) will be near zero, as A and B reinforce each other.
D.The optimization will be unstable and fail to converge due to the high correlation between models A and B.
Correct Answer: The weight for model C ($w_C$) will be significantly larger than for A and B ($w_A, w_B$), which will be down-weighted due to their redundancy.
Explanation:
The benefit of ensembling comes from combining diverse models whose errors cancel each other out. Because models A and B are highly correlated, they offer redundant information and will likely make the same mistakes. Model C, being diverse, provides unique information that is highly valuable for correcting the ensemble's errors. Therefore, an optimal weighting scheme will place a higher value on the diverse model C.
Incorrect! Try again.
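The effect can be checked numerically; the sketch below assumes synthetic error vectors with the stated correlation structure and uses the closed-form minimum-variance weights $w \propto \Sigma^{-1}\mathbf{1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Toy prediction-error vectors with equal variance: A and B share almost
# all of their error (highly correlated); C's error is independent.
shared = rng.normal(size=n)
eA = shared + 0.1 * rng.normal(size=n)
eB = shared + 0.1 * rng.normal(size=n)
eC = np.sqrt(1.01) * rng.normal(size=n)  # matched variance

# Minimize Var(wA*eA + wB*eB + wC*eC) subject to the weights summing
# to 1: the solution is w proportional to Sigma^-1 @ ones.
Sigma = np.cov(np.column_stack([eA, eB, eC]), rowvar=False)
w = np.linalg.solve(Sigma, np.ones(3))
w /= w.sum()
print(w.round(3))  # roughly [0.25, 0.25, 0.5]: the diverse model dominates
```

The redundant pair splits one half of the weight between them while the uncorrelated model alone earns the other half.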
45The Combined Algorithm Selection and Hyperparameter optimization (CASH) problem is a core challenge in AutoML. Why is solving the CASH problem fundamentally more complex than performing hyperparameter optimization for a single, pre-determined algorithm?
Introduction to automated machine learning (AutoML)
Hard
A.The CASH problem involves a larger number of hyperparameters, but the optimization landscape remains similarly structured.
B.The search space is heterogeneous and conditional: hyperparameters for one algorithm are irrelevant for another, creating a complex, structured search space that simple optimization methods cannot handle.
C.Algorithm selection is a discrete optimization problem, while hyperparameter tuning is continuous, and combining them is mathematically impossible without heuristics.
D.The objective function in CASH is multi-modal, whereas for single-algorithm HPO it is always convex.
Correct Answer: The search space is heterogeneous and conditional: hyperparameters for one algorithm are irrelevant for another, creating a complex, structured search space that simple optimization methods cannot handle.
Explanation:
The main difficulty in CASH arises from the conditional nature of the search space. For example, the hyperparameter n_estimators is relevant for RandomForestClassifier but not for SVC. This creates a large, hierarchical, and non-rectangular search space. Optimizers must be sophisticated enough to navigate this structure, understanding that activating one categorical choice (the algorithm) activates a completely different set of continuous/discrete hyperparameters.
Incorrect! Try again.
46In Bayesian Optimization, how does the Upper Confidence Bound (UCB) acquisition function balance the exploration-exploitation trade-off, and how does its behavior contrast with Expected Improvement (EI) when uncertainty is very high?
Bayesian optimization (conceptual)
Hard
A.EI balances exploration and exploitation using a trade-off parameter, while UCB is a purely exploitative strategy.
B.UCB and EI are mathematically equivalent; they only differ in their implementation details and computational cost.
C.UCB explicitly balances the predicted mean (exploitation) and the uncertainty/standard deviation (exploration) via a tunable parameter. In high-uncertainty regions, UCB becomes more explorative, whereas EI might still favor points with a slightly better-predicted mean.
D.UCB primarily focuses on exploitation by sampling at the highest predicted mean, while EI focuses on exploration.
Correct Answer: UCB explicitly balances the predicted mean (exploitation) and the uncertainty/standard deviation (exploration) via a tunable parameter. In high-uncertainty regions, UCB becomes more explorative, whereas EI might still favor points with a slightly better-predicted mean.
Explanation:
The UCB acquisition function is typically formulated as $\mathrm{UCB}(x) = \mu(x) + \kappa\,\sigma(x)$, where $\mu(x)$ is the predicted mean (exploitation) and $\sigma(x)$ is the standard deviation/uncertainty (exploration). The parameter $\kappa$ controls the trade-off. EI, on the other hand, calculates the expected value of improvement over the current best. While it inherently considers uncertainty, UCB's direct formulation makes it more explicitly and aggressively seek out high-uncertainty regions, making it a more 'optimistic' exploration strategy.
Incorrect! Try again.
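A small numeric illustration of the contrast (toy values; $\kappa = 2$ is an arbitrary choice):

```python
import math

def phi(z):   # standard normal pdf
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):   # standard normal cdf
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def ucb(mu, sigma, kappa=2.0):
    # Exploitation (mu) plus an explicit exploration bonus (kappa * sigma).
    return mu + kappa * sigma

def ei(mu, sigma, best):
    # Expected improvement over the incumbent best (maximization).
    if sigma == 0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * Phi(z) + sigma * phi(z)

best = 1.0
# Point X: slightly better predicted mean, almost no uncertainty.
# Point Y: worse predicted mean, large uncertainty.
ucb_x, ei_x = ucb(1.05, 0.01), ei(1.05, 0.01, best)
ucb_y, ei_y = ucb(0.50, 0.50), ei(0.50, 0.50, best)
```

With these values UCB ranks the uncertain point Y highest, while EI still favors the low-uncertainty point X with the slightly better mean.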
47Consider a search space where two hyperparameters, A (learning rate) and B (dropout rate), exhibit a strong, non-linear interaction effect on model performance. Which statement most accurately describes the limitations of Grid and Random Search in optimizing such a space?
Grid search vs random search limitations
Hard
A.Grid Search may completely miss the optimal interaction region if it doesn't align with its grid axes, while Random Search has a higher probability of sampling points within that region due to its uniform coverage.
B.Random Search is ineffective here because it does not model relationships between hyperparameters.
C.Grid Search is superior because its systematic approach guarantees it will test the interaction points, while Random Search might miss them by chance.
D.Both methods are equally effective, as they will eventually sample the optimal point if the number of trials is high enough.
Correct Answer: Grid Search may completely miss the optimal interaction region if it doesn't align with its grid axes, while Random Search has a higher probability of sampling points within that region due to its uniform coverage.
Explanation:
Grid Search evaluates points on a rigid grid. If the optimal region is a diagonal or curved 'valley' in the search space, the grid points may completely straddle it without ever sampling a point inside it. Random Search samples the entire space uniformly, making it much more likely that some points will fall within this arbitrarily shaped optimal region, even if it doesn't model the interaction explicitly.
Incorrect! Try again.
48Multi-fidelity optimization techniques like Hyperband accelerate hyperparameter search by evaluating many configurations on a small budget (e.g., few epochs) and promoting promising ones to higher budgets. What is the central assumption these methods rely on, the violation of which would lead to poor performance?
Hyperparameter optimization techniques
Hard
A.The assumption that all hyperparameters are independent and do not have interaction effects.
B.The assumption that the loss function is convex with respect to the hyperparameters.
C.The 'ranking correlation' assumption: the relative performance of hyperparameter configurations on a small budget is a good predictor of their relative performance on a large budget.
D.The assumption that the optimal configuration can be found by evaluating at least 50% of the configurations on the full budget.
Correct Answer: The 'ranking correlation' assumption: the relative performance of hyperparameter configurations on a small budget is a good predictor of their relative performance on a large budget.
Explanation:
Hyperband's 'successive halving' strategy is entirely dependent on the idea that configurations that perform poorly with a small budget (e.g., after one epoch) will also perform poorly with the full budget. If this assumption is violated (e.g., a 'slow starter' configuration is excellent eventually but looks bad initially), Hyperband will incorrectly discard it early, preventing it from ever finding the true optimal configuration.
Incorrect! Try again.
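A minimal sketch of successive halving, with a toy train_fn whose early rankings do predict late ones (so the ranking-correlation assumption holds by construction):

```python
def successive_halving(configs, train_fn, min_budget=1, eta=3, rounds=3):
    """Evaluate all configs on a small budget, keep the top 1/eta,
    multiply the budget by eta, and repeat. train_fn(config, budget)
    is assumed to return a validation score (higher is better) after
    training for `budget` units (e.g., epochs)."""
    budget = min_budget
    survivors = list(configs)
    for _ in range(rounds):
        scores = {c: train_fn(c, budget) for c in survivors}
        survivors.sort(key=lambda c: scores[c], reverse=True)
        survivors = survivors[:max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0]

# Toy objective: the score saturates toward the config value itself,
# so rankings at budget 1 already match rankings at large budgets.
best = successive_halving(
    configs=[0.2, 0.5, 0.9, 0.1, 0.7, 0.3, 0.6, 0.4, 0.8],
    train_fn=lambda c, b: c * (1 - 0.5 ** b),
)
print(best)
```

A 'slow starter' (a config that scores poorly at budget 1 but excellently at full budget) would be eliminated in the first round, which is exactly the failure mode the question describes.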
49Population-Based Training (PBT) is a hybrid HPO method. How does PBT fundamentally differ from a standard parallelized genetic algorithm (GA) in its approach to exploration and exploitation?
Evolutionary hyperparameter tuning
Hard
A.PBT uses gradient-based methods to update hyperparameters, while GAs use mutation and crossover.
B.In PBT, members of the population are trained continuously and exploit information from the rest of the population mid-training to update their hyperparameters, whereas in a standard GA, evaluation is a fixed, terminal process for each generation.
C.PBT maintains a static population throughout the process, while GAs create entirely new generations based on fitness.
D.GAs evolve both hyperparameters and model weights simultaneously, while PBT only evolves hyperparameters.
Correct Answer: In PBT, members of the population are trained continuously and exploit information from the rest of the population mid-training to update their hyperparameters, whereas in a standard GA, evaluation is a fixed, terminal process for each generation.
Explanation:
The key innovation of PBT is that it doesn't wait for a full training run to complete before making decisions. The population of models trains in parallel. Periodically, underperforming models adopt the weights and hyperparameters from top-performing models (exploit) and then perturb those hyperparameters (explore). This allows the hyperparameter schedule itself to be optimized online during a single training process, unlike a standard GA where a configuration is fixed, evaluated fully, and then used to create a new generation.
Incorrect! Try again.
50In a stacking ensemble, a meta-learner is trained on the out-of-fold (OOF) predictions from base learners to make the final prediction. What is the primary optimization-related consequence of training the meta-learner on in-sample predictions (i.e., predictions on the same data the base learners were trained on) instead of OOF predictions?
Optimization for ensemble learning
Hard
A.The meta-learner's objective function would become non-convex, making it impossible to find an optimal solution.
B.The meta-learner would severely overfit because the base learners' predictions on in-sample data are unrealistically accurate, leading it to trust them too much.
C.It would lead to underfitting, as the meta-learner would not have enough information to learn the relationship between base learner outputs.
D.The optimization process would become significantly faster as it avoids the need for cross-validation.
Correct Answer: The meta-learner would severely overfit because the base learners' predictions on in-sample data are unrealistically accurate, leading it to trust them too much.
Explanation:
Base learners tend to be overconfident and overly accurate on the data they were trained on. If the meta-learner is trained on these 'leaked' predictions, it learns a mapping from unrealistically good inputs. When faced with real, unseen data where base learners make more errors, the meta-learner's learned function will be inappropriate and perform poorly. Using out-of-fold predictions simulates how the base learners would perform on unseen data, providing a much more robust training set for the meta-learner and preventing this specific type of overfitting.
Incorrect! Try again.
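A minimal sketch of generating OOF predictions, using ordinary least squares as a stand-in base learner on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

def fit_predict(X_tr, y_tr, X_te):
    """Stand-in base learner: ordinary least squares."""
    w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
    return X_te @ w

# Out-of-fold predictions: every sample is predicted by a model that
# never saw it during training, mimicking behavior on unseen data.
k = 5
folds = np.array_split(rng.permutation(len(y)), k)
oof = np.empty_like(y)
for te in folds:
    tr = np.setdiff1d(np.arange(len(y)), te)
    oof[te] = fit_predict(X[tr], y[tr], X[te])

# `oof` (one column per base learner in a real stack) becomes the
# meta-learner's training features; training on in-sample predictions
# instead would leak the base learners' optimistic fit.
```

In a real stack, this loop runs once per base learner and the columns are stacked side by side as the meta-learner's input matrix.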
51What is a primary theoretical limitation of standard Bayesian Optimization that makes it computationally challenging to apply directly to very high-dimensional hyperparameter spaces (e.g., > 50 dimensions)?
Bayesian optimization (conceptual)
Hard
A.The acquisition function becomes impossible to compute in more than 20 dimensions.
B.High-dimensional spaces are always non-convex, which violates the core assumptions of Bayesian Optimization.
C.Bayesian Optimization is inherently a sequential process and cannot be parallelized in high dimensions.
D.The performance of the Gaussian Process surrogate model degrades significantly, as it suffers from the 'curse of dimensionality,' making it difficult to model the objective function and estimate uncertainty accurately.
Correct Answer: The performance of the Gaussian Process surrogate model degrades significantly, as it suffers from the 'curse of dimensionality,' making it difficult to model the objective function and estimate uncertainty accurately.
Explanation:
Gaussian Processes, the most common surrogate model for Bayesian Optimization, struggle in high dimensions. The amount of data required to build a reliable model of the function grows exponentially with the number of dimensions. With a limited evaluation budget, the GP will be a very poor fit for the true objective function, leading to inaccurate predictions and uncertainty estimates, which in turn renders the acquisition function ineffective at guiding the search.
Incorrect! Try again.
52When defining the search space for a hyperparameter like learning rate or regularization strength, it is standard practice to use a log-uniform distribution (e.g., from $10^{-5}$ to $10^{-1}$) rather than a uniform distribution. What is the primary optimization-related justification for this?
Hyperparameter optimization techniques
Hard
A.This practice prevents the optimizer from sampling a value of exactly zero, which can cause mathematical errors.
B.Log-uniform distributions are computationally cheaper for random sampling algorithms to process.
C.Uniform distributions are only suitable for integer-valued hyperparameters, not continuous ones.
D.The impact of these hyperparameters is often multiplicative, meaning changes in order of magnitude (e.g., from $0.0001$ to $0.001$) are more important than changes in absolute value (e.g., from $0.09$ to $0.1$).
Correct Answer: The impact of these hyperparameters is often multiplicative, meaning changes in order of magnitude (e.g., from $0.0001$ to $0.001$) are more important than changes in absolute value (e.g., from $0.09$ to $0.1$).
Explanation:
A uniform search between 0.00001 and 0.1 would waste most of its samples in the [0.01, 0.1] range, while the behavior of the model might change drastically between 0.0001 and 0.001. A log-uniform distribution samples points such that their logarithm is uniformly distributed. This gives equal probability to sampling from each order of magnitude (e.g., the range $[10^{-5}, 10^{-4}]$ gets as many samples as $[10^{-2}, 10^{-1}]$), which is a much more efficient way to explore parameters where scale matters most.
Incorrect! Try again.
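A minimal sketch of log-uniform sampling over $[10^{-5}, 10^{-1}]$, with a check that each decade receives a roughly equal share of samples:

```python
import math
import random

random.seed(0)

def log_uniform(low, high):
    """Sample x so that log10(x) is uniform on [log10(low), log10(high)]."""
    return 10 ** random.uniform(math.log10(low), math.log10(high))

samples = [log_uniform(1e-5, 1e-1) for _ in range(100_000)]

# Bin each sample by its order of magnitude relative to the lower bound:
# decade 0 is [1e-5, 1e-4), ..., decade 3 is [1e-2, 1e-1].
per_decade = [0, 0, 0, 0]
for s in samples:
    per_decade[min(3, int(math.log10(s / 1e-5)))] += 1

# Each decade gets ~25% of the samples; a plain uniform draw on
# [1e-5, 1e-1] would instead put ~90% of samples in [1e-2, 1e-1].
print(per_decade)
```

This is the same trick libraries expose as a log-uniform (or "loguniform") distribution for search-space definitions.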
53While Random Search is generally more efficient than Grid Search, in which specific, albeit rare, scenario could Grid Search theoretically outperform Random Search given the same number of function evaluations?
Grid search vs random search limitations
Hard
A.A problem where the objective function is highly non-convex with many local minima.
B.Any problem where the hyperparameter search space contains only categorical variables.
C.A low-dimensional problem (e.g., 2D) where the objective function's iso-performance contours are perfectly aligned with the grid axes, and the user has prior knowledge to place the grid optimally.
D.A high-dimensional problem where most hyperparameters are irrelevant.
Correct Answer: A low-dimensional problem (e.g., 2D) where the objective function's iso-performance contours are perfectly aligned with the grid axes, and the user has prior knowledge to place the grid optimally.
Explanation:
Grid Search's main weakness is its inefficient allocation of evaluations along grid lines. However, in a hypothetical low-dimensional scenario where the hyperparameters are perfectly independent and the optimal value lies exactly at a grid intersection point, its systematic search could find the optimum faster than Random Search, which might take more samples to land near that specific point. This is an edge case that relies on strong prior knowledge and a well-behaved objective function.
Incorrect! Try again.
54When applying a genetic algorithm to a hyperparameter space with mixed data types (e.g., continuous learning rate, integer number of layers, categorical activation function), what is a primary challenge that a naive implementation of a standard crossover operator (like single-point crossover) would face?
Evolutionary hyperparameter tuning
Hard
A.Crossover is only defined for binary representations and cannot be used for continuous or integer values.
B.It can produce invalid offspring. For example, averaging a 'ReLU' and 'Sigmoid' category is nonsensical, and crossing over bit representations of floats can lead to values outside the desired range.
C.It would cause the algorithm to converge much faster than mutation-only approaches, leading to premature convergence.
D.It would systematically decrease the fitness of the population over time due to a loss of genetic diversity.
Correct Answer: It can produce invalid offspring. For example, averaging a 'ReLU' and 'Sigmoid' category is nonsensical, and crossing over bit representations of floats can lead to values outside the desired range.
Explanation:
Standard crossover operators are often designed for a single data type (e.g., binary strings or continuous vectors). Applying them naively to a mixed-type representation can lead to nonsensical results. A crossover point might fall in the middle of a representation for an integer, or it might try to blend categorical variables in a way that is undefined. Sophisticated GAs for HPO require specialized operators like Simulated Binary Crossover (SBX) for continuous values and uniform crossover for categorical ones to handle this heterogeneity properly.
Incorrect! Try again.
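One common remedy is a type-aware uniform crossover that swaps whole genes rather than cutting through their representations; a minimal sketch with an illustrative (learning rate, layers, activation) genome:

```python
import random

random.seed(0)

def uniform_crossover(parent_a, parent_b):
    """Swap each gene as an atomic unit, so every offspring gene is a
    valid value of its own type (float stays float, category stays a
    legal category) instead of a blended or bit-spliced hybrid."""
    return tuple(a if random.random() < 0.5 else b
                 for a, b in zip(parent_a, parent_b))

# Mixed-type genomes: (learning_rate: float, n_layers: int, activation: str)
p1 = (0.01, 3, "relu")
p2 = (0.1, 5, "sigmoid")
child = uniform_crossover(p1, p2)
```

A naive single-point cut through a flattened bit representation of these genomes could produce, say, a float outside its range or an undefined activation; the gene-level swap cannot.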
55The AdaBoost algorithm is an ensemble method that sequentially adds weak learners. The optimization objective at each step is to train a new learner that focuses on the instances that previous learners misclassified. How is this re-focusing mathematically achieved during the optimization of the subsequent weak learner?
Optimization for ensemble learning
Hard
A.By increasing the weights of the misclassified instances in the training set, forcing the new learner to pay more attention to them to minimize the weighted training error.
B.By training each new learner on a bootstrap sample of the original data, with misclassified points having a higher probability of being selected.
C.By using a different loss function for each subsequent learner in the sequence.
D.By removing all correctly classified instances from the training set for the next learner.
Correct Answer: By increasing the weights of the misclassified instances in the training set, forcing the new learner to pay more attention to them to minimize the weighted training error.
Explanation:
AdaBoost maintains a distribution of weights over the training instances. After each weak learner is trained, these weights are updated. The weights of instances that were misclassified are increased, while the weights of correctly classified instances are decreased. The next weak learner is then trained with the objective of minimizing the error on this re-weighted dataset, effectively forcing it to prioritize the examples that the ensemble is currently getting wrong.
Incorrect! Try again.
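The reweighting step can be written out directly (discrete AdaBoost; a single round on four toy instances):

```python
import math

def adaboost_reweight(weights, correct, error):
    """One discrete-AdaBoost reweighting step.
    weights: current instance weights (sum to 1)
    correct: per-instance flags from the current weak learner
    error:   the learner's weighted error rate this round"""
    alpha = 0.5 * math.log((1 - error) / error)  # learner's vote weight
    new = [w * math.exp(-alpha if ok else alpha)
           for w, ok in zip(weights, correct)]
    total = sum(new)                              # renormalize to sum to 1
    return [w / total for w in new], alpha

# Four instances, uniform weights; the learner gets the last one wrong.
weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]
error = sum(w for w, ok in zip(weights, correct) if not ok)  # 0.25
weights, alpha = adaboost_reweight(weights, correct, error)
# The single misclassified instance now carries half the total weight,
# so the next weak learner is forced to prioritize it.
```

After this update the misclassified mass always equals 0.5, a standard property of the AdaBoost reweighting rule.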
56You are tasked with tuning a model that has a large conditional hyperparameter space (e.g., an SVM where choosing a kernel activates a different subset of parameters like gamma, degree, or coef0). Which class of HPO algorithms is most naturally suited to handle this structured search space without requiring manual encoding or separate optimization runs?
Hyperparameter optimization techniques
Hard
A.Standard genetic algorithms with a flattened representation of all possible hyperparameters.
B.Random Search, by randomly sampling a condition and then its associated parameters.
C.Tree-based model-based optimization methods, such as those using Tree-structured Parzen Estimators (TPE).
D.Grid Search, by defining a separate grid for each possible condition.
Correct Answer: Tree-based model-based optimization methods, such as those using Tree-structured Parzen Estimators (TPE).
Explanation:
Methods like TPE, used in libraries like Hyperopt, are explicitly designed to handle conditional spaces. They model the search space as a tree or a graph. A choice for a categorical parameter (like kernel='rbf') activates a specific branch of the tree containing only the hyperparameters relevant to that choice (C, gamma). This allows the probabilistic surrogate model to learn and make suggestions within the valid, active subspace, which is far more efficient and elegant than trying to adapt methods like Grid Search or standard GAs.
Incorrect! Try again.
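The conditional structure itself is easy to express as a hierarchical sampler, which is essentially the tree that TPE models; the parameter names and ranges below are illustrative:

```python
import random

random.seed(0)

def sample_svm_config():
    """Random draw from a hypothetical conditional SVM search space:
    the kernel choice activates a different subset of hyperparameters."""
    config = {"C": 10 ** random.uniform(-3, 3),
              "kernel": random.choice(["linear", "rbf", "poly"])}
    if config["kernel"] == "rbf":
        config["gamma"] = 10 ** random.uniform(-4, 1)
    elif config["kernel"] == "poly":
        config["degree"] = random.randint(2, 5)
        config["coef0"] = random.uniform(0.0, 1.0)
    return config

configs = [sample_svm_config() for _ in range(200)]
```

A flattened grid over all parameters would instead evaluate meaningless combinations (e.g., a degree for an rbf kernel); tree-structured optimizers only ever reason within the active branch.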
57A modern AutoML system is used for a critical application and produces a model with state-of-the-art predictive accuracy. However, a post-hoc analysis reveals the model is a 'black box' that heavily relies on uninterpretable features, making it impossible to audit for fairness or debug unexpected failures. This scenario highlights which critical limitation of a purely optimization-driven AutoML approach?
Introduction to automated machine learning (AutoML)
Hard
A.AutoML systems are incapable of producing models that are interpretable.
B.The dataset was not large enough for the AutoML system to find a simpler, more interpretable model.
C.AutoML systems often optimize for a single metric (e.g., accuracy), potentially at the expense of other crucial non-functional requirements like interpretability, fairness, and robustness.
D.The optimization algorithm used by the AutoML system was flawed and overfitted to the accuracy metric.
Correct Answer: AutoML systems often optimize for a single metric (e.g., accuracy), potentially at the expense of other crucial non-functional requirements like interpretability, fairness, and robustness.
Explanation:
The core of many AutoML systems is an optimization engine designed to maximize a specific performance metric. Unless explicitly configured with a multi-objective function that includes constraints or objectives for fairness, interpretability, or inference latency, the system has no incentive to produce a model that excels in those areas. This can lead to solutions that are technically accurate but practically unusable or even harmful in real-world, high-stakes applications.
Incorrect! Try again.
58When using a Gaussian Process (GP) as a surrogate in Bayesian Optimization, the choice of kernel function is crucial. If we have a strong prior belief that the objective function is very smooth and that hyperparameters that are close in Euclidean distance should have similar performance, but we have no knowledge about its periodicity or structure, what would be the most standard and robust kernel choice?
Bayesian optimization (conceptual)
Hard
A.A linear kernel, as it is the simplest model and avoids overfitting the surrogate.
B.A periodic kernel, as it can capture cyclical patterns in the hyperparameter space.
C.Matérn kernel (with $\nu = 5/2$), as it provides a good balance between smoothness and flexibility without being infinitely smooth like the RBF kernel.
D.Radial Basis Function (RBF) / Squared Exponential kernel, as it assumes infinite differentiability, which is too strong an assumption without specific knowledge.
Correct Answer: Matérn kernel (with $\nu = 5/2$), as it provides a good balance between smoothness and flexibility without being infinitely smooth like the RBF kernel.
Explanation:
The RBF kernel assumes the function is infinitely differentiable (perfectly smooth), which is often an overly strong assumption for real-world objective functions. The Matérn family of kernels has a parameter $\nu$ that controls the smoothness of the function. Matérn 5/2 is a very common default choice because it assumes the function is twice-differentiable, which is a more realistic and robust assumption of smoothness for many black-box optimization problems compared to the RBF kernel, making the surrogate less prone to mis-modeling the objective.
Incorrect! Try again.
59In the context of evolutionary hyperparameter tuning for deep learning models, what is a primary advantage of evolving a learning rate schedule (e.g., the parameters for a cyclical or decay schedule) rather than evolving a single, fixed learning rate?
Evolutionary hyperparameter tuning
Hard
A.It significantly reduces the number of hyperparameters to be optimized, simplifying the search space.
B.It allows the optimization process to find policies that can navigate complex loss landscapes more effectively, such as starting with a high learning rate for exploration and reducing it later for fine-tuning and convergence.
C.It guarantees that the training process will never diverge, regardless of the schedule parameters chosen.
D.Evolving a single fixed learning rate is an NP-hard problem, whereas evolving a schedule is computationally tractable.
Correct Answer: It allows the optimization process to find policies that can navigate complex loss landscapes more effectively, such as starting with a high learning rate for exploration and reducing it later for fine-tuning and convergence.
Explanation:
Deep learning training is a dynamic process. A single fixed learning rate is often a compromise. By allowing the evolutionary algorithm to optimize the parameters of a learning rate schedule (e.g., initial rate, decay rate, cycle length), it can discover sophisticated training strategies. These strategies can adapt the learning rate over time to match the needs of the optimization at different phases of training, which often leads to better final model performance than any single fixed rate could achieve.
Incorrect! Try again.
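A minimal sketch of the idea: the evolutionary 'genome' becomes the parameters of a schedule function (an exponential decay here, with illustrative values) rather than one fixed rate:

```python
import math

def exp_decay_schedule(initial_lr, decay_rate):
    """Learning-rate schedule parameterized by two evolvable genes:
    the starting rate and how fast it decays per epoch."""
    return lambda epoch: initial_lr * math.exp(-decay_rate * epoch)

# The genome for the evolutionary search is (initial_lr, decay_rate)
# instead of a single fixed learning rate.
lr = exp_decay_schedule(0.1, 0.05)
early = lr(0)    # high early rate for exploration
late = lr(60)    # much lower late rate for fine-tuning
```

Mutation and crossover then operate on the schedule parameters, letting the search discover a training-time policy rather than a single compromise value.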
60Let $n$ be the number of trials in a Random Search. The probability of a single trial falling into a desired quantile of the search space volume (e.g., the top 5%, so $p = 0.05$) is simply $p$. The probability that at least one of $n$ trials falls in this region is $1 - (1 - p)^n$. What is the most important practical implication of this formula for hyperparameter optimization?
Grid search vs random search limitations
Hard
A.It proves that Random Search will always find a better solution than Grid Search.
B.It demonstrates that a larger search space volume requires proportionally more trials to find a good solution.
C.It shows that the probability of success decreases exponentially with the number of dimensions in the search space.
D.The number of trials required to achieve a high probability of success is independent of the number of dimensions, depending only on the desired probability and the size of the optimal region.
Correct Answer: The number of trials required to achieve a high probability of success is independent of the number of dimensions, depending only on the desired probability and the size of the optimal region.
Explanation:
This formula is the core theoretical justification for Random Search's efficiency. Notice that the number of dimensions of the search space does not appear in the equation $1 - (1 - p)^n$. This means that to have a 95% chance of finding a point in the top 5% of the hyperparameter space, you need the same number of trials ($n = \lceil \log(0.05)/\log(0.95) \rceil = 59$) whether you have 2 dimensions or 1000. This is in stark contrast to Grid Search, where the number of trials required grows exponentially with the number of dimensions.
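The required number of trials follows directly by solving $1 - (1-p)^n \ge 0.95$ for $n$; a minimal sketch:

```python
import math

def trials_needed(p_region, p_success):
    """Smallest n with 1 - (1 - p_region)**n >= p_success.
    Note: the search space's dimensionality never enters the formula."""
    return math.ceil(math.log(1 - p_success) / math.log(1 - p_region))

n = trials_needed(0.05, 0.95)
print(n)  # 59 trials for a 95% chance of landing in the top 5%
```

The same 59 trials suffice whether the space has 2 hyperparameters or 1000, whereas an equivalent grid would need exponentially many points.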