1. What is the primary goal of a regression algorithm?
Difference between regression and classification
Easy
A.To reduce the dimensionality of the data.
B.To predict a continuous numerical value.
C.To cluster data into groups.
D.To classify data into discrete categories.
Correct Answer: To predict a continuous numerical value.
Explanation:
Regression models are used for prediction tasks where the output variable is a continuous quantity, such as predicting a price, temperature, or height.
2. Which of the following problems is an example of regression?
Difference between regression and classification
Easy
A.Predicting the price of a house based on its features.
B.Categorizing news articles into topics like 'sports' or 'politics'.
C.Recognizing a handwritten digit as 0 through 9.
D.Identifying if an email is spam or not spam.
Correct Answer: Predicting the price of a house based on its features.
Explanation:
Predicting a house price involves forecasting a continuous numerical value, which is a regression task. The other options are classification tasks, as they involve assigning data to discrete, predefined categories.
3. A regression model with high bias makes strong assumptions about the data and is likely to...?
Bias-variance considerations in regression
Easy
A.Overfit the data.
B.Underfit the data.
C.Have high variance.
D.Perfectly fit the data.
Correct Answer: Underfit the data.
Explanation:
High bias means the model is too simple and cannot capture the underlying patterns in the data, leading to underfitting. This results in high error on both training and test sets.
4. A model that performs extremely well on the training data but poorly on unseen test data is said to have...?
Bias-variance considerations in regression
Easy
A.Low complexity.
B.High bias.
C.High variance.
D.A good bias-variance tradeoff.
Correct Answer: High variance.
Explanation:
High variance is a characteristic of a model that is too complex and has learned the training data too well, including its random noise. This phenomenon is known as overfitting.
5. In Simple Linear Regression, how many independent (predictor) variables are used to predict the dependent (target) variable?
Simple Linear Regression
Easy
A.One or more.
B.Exactly two.
C.Exactly one.
D.Zero.
Correct Answer: Exactly one.
Explanation:
The term 'simple' in Simple Linear Regression refers to the use of a single independent variable to model a linear relationship with a single dependent variable.
6. In the Simple Linear Regression equation y = β₀ + β₁x + ε, what does the term β₁ represent?
Simple Linear Regression
Easy
A.The y-intercept of the regression line.
B.The predicted value of y.
C.The error term.
D.The slope of the regression line.
Correct Answer: The slope of the regression line.
Explanation:
β₁ is the slope, which represents the change in the dependent variable y for a one-unit change in the independent variable x.
7. What is the main difference between Multiple Linear Regression and Simple Linear Regression?
Multiple Linear Regression
Easy
A.Multiple Linear Regression models non-linear relationships, while Simple Linear Regression models linear ones.
B.Simple Linear Regression is always more accurate.
C.Multiple Linear Regression is used for classification, while Simple Linear Regression is for regression.
D.Multiple Linear Regression uses two or more independent variables, while Simple Linear Regression uses only one.
Correct Answer: Multiple Linear Regression uses two or more independent variables, while Simple Linear Regression uses only one.
Explanation:
Multiple Linear Regression extends Simple Linear Regression by allowing the use of multiple predictor variables (x₁, x₂, ..., xₙ) to predict a single outcome variable (y).
8. Which equation represents a Multiple Linear Regression model with two predictor variables, x₁ and x₂?
Multiple Linear Regression
Easy
A.y = β₀ + β₁(x₁ + x₂)
B.y = β₀ + β₁x₁ + β₂x₂ + ε
C.y = β₀ + β₁x₁x₂
D.y = β₁x₁² + β₂x₂²
Correct Answer: y = β₀ + β₁x₁ + β₂x₂ + ε
Explanation:
This is the standard form for a multiple linear regression model, where β₀ is the intercept and β₁ and β₂ are the coefficients for the independent variables x₁ and x₂, respectively.
9. In a multiple regression model for predicting salary, the coefficient for the 'Years of Experience' feature is +5000. What is the correct interpretation of this coefficient?
Interpretation of coefficients
Easy
A.The maximum possible salary is $5000.
B.A person with 0 years of experience will have a salary of $5000.
C.The average salary for all individuals is $5000.
D.Holding all other features constant, for each additional year of experience, the predicted salary increases by $5000.
Correct Answer: Holding all other features constant, for each additional year of experience, the predicted salary increases by $5000.
Explanation:
Each coefficient in multiple regression represents the change in the dependent variable for a one-unit change in the corresponding independent variable, assuming all other variables are held constant.
10. If a regression coefficient for a variable is zero, what does this imply about the model's prediction?
Interpretation of coefficients
Easy
A.There is no linear relationship between that variable and the target variable.
B.The data for that variable contains errors.
C.The variable is the most important predictor.
D.The model is underfitting.
Correct Answer: There is no linear relationship between that variable and the target variable.
Explanation:
A coefficient of zero means that a change in the predictor variable has no effect on the predicted outcome in the model, indicating the absence of a linear relationship as captured by the model.
11. Which of the following are two common types of regularized linear regression models?
Regularized Regression models
Easy
A.Linear and Logistic Regression.
B.Decision Tree and Random Forest Regression.
C.Ridge and Lasso Regression.
D.K-Means and DBSCAN Regression.
Correct Answer: Ridge and Lasso Regression.
Explanation:
Ridge (L2 regularization) and Lasso (L1 regularization) are the two most well-known techniques for adding a penalty term to the loss function of a linear regression model to prevent overfitting.
12. What is the primary motivation for using regularized regression instead of standard linear regression?
Regularized Regression models
Easy
A.To handle categorical variables automatically.
B.To prevent overfitting by penalizing large coefficients.
C.To speed up the model training process significantly.
D.To ensure the model always finds a perfect fit.
Correct Answer: To prevent overfitting by penalizing large coefficients.
Explanation:
Regularization adds a penalty for model complexity (i.e., large coefficient values) to the cost function, which helps to reduce model variance and prevent overfitting the training data.
13. In Ridge or Lasso regression, what happens to the model's coefficients as the regularization hyperparameter (α or λ) is increased?
Effect of regularization on model complexity
Easy
A.The magnitudes of the coefficients are pushed towards zero.
B.The coefficients remain unchanged.
C.The magnitudes of the coefficients grow larger.
D.The y-intercept is pushed to zero.
Correct Answer: The magnitudes of the coefficients are pushed towards zero.
Explanation:
The regularization term penalizes the magnitude of the coefficients. A larger hyperparameter (α or λ) means a stronger penalty, which forces the optimization process to find smaller coefficient values, thus reducing model complexity.
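This shrinkage is easy to see numerically. The sketch below (synthetic data and penalty values of my choosing, not from the quiz) fits Ridge in closed form, β = (XᵀX + λI)⁻¹Xᵀy, and tracks the coefficient norm as the penalty grows:

```python
import numpy as np

# Closed-form Ridge on synthetic data: beta = (X'X + lam*I)^(-1) X'y.
# As the penalty lam grows, the fitted coefficient vector shrinks toward zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([3.0, -2.0, 1.5]) + rng.normal(scale=0.5, size=100)

def ridge_fit(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# L2 norms of the fitted coefficients for increasing penalties.
norms = [float(np.linalg.norm(ridge_fit(X, y, lam))) for lam in (0.0, 10.0, 1000.0)]
print(norms)  # strictly decreasing for this data
```

With λ = 0 this reduces to ordinary least squares; each larger λ produces a smaller coefficient norm.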
14. Which regularization technique has the ability to shrink some coefficients to exactly zero, thus performing automatic feature selection?
Effect of regularization on model complexity
Easy
A.Polynomial Regression.
B.Principal Component Regression.
C.Lasso Regression (L1).
D.Ridge Regression (L2).
Correct Answer: Lasso Regression (L1).
Explanation:
The L1 penalty used in Lasso Regression (sum of the absolute values of coefficients) has the property of shrinking less important feature coefficients all the way to zero, effectively removing them from the model.
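For intuition, in the special case of an orthonormal design the Lasso solution is just a soft-thresholding of the OLS coefficients. This sketch (made-up numbers, not a full Lasso solver) shows small coefficients landing exactly at zero:

```python
import numpy as np

# Orthonormal-design special case: the Lasso estimate is the OLS estimate
# soft-thresholded by the penalty, so entries smaller than lam become 0.
def soft_threshold(beta_ols, lam):
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

beta_ols = np.array([4.0, -0.3, 2.5, 0.1])   # made-up OLS estimates
beta_lasso = soft_threshold(beta_ols, lam=0.5)
print(beta_lasso)  # the -0.3 and 0.1 entries are set exactly to zero
```

In the general case the same thresholding operation appears inside coordinate-descent Lasso solvers.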
15. Why would a data scientist use polynomial feature expansion with a linear regression model?
Polynomial feature expansion
Easy
A.To convert categorical features into numerical ones.
B.To model non-linear relationships between features and the target.
C.To ensure all features are on the same scale.
D.To reduce the number of features in the dataset.
Correct Answer: To model non-linear relationships between features and the target.
Explanation:
By creating polynomial terms (like x², x³), a linear model can fit more complex, non-linear patterns in the data without changing the underlying linear regression algorithm itself.
16. If you start with a single feature x and apply a polynomial feature expansion of degree 2, what new feature will be added to your model (in addition to the original feature and an intercept)?
Polynomial feature expansion
Easy
A.√x
B.2x
C.x²
D.x³
Correct Answer: x²
Explanation:
A polynomial expansion of degree 2 for a feature x creates a new feature corresponding to x raised to the power of 2, which is x². The full set of features available to the model would be [1, x, x²].
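The expansion can be built by hand in one line; this sketch constructs the design matrix [1, x, x²] with NumPy (in practice a transformer such as scikit-learn's PolynomialFeatures does the same thing):

```python
import numpy as np

# Degree-2 polynomial expansion of one feature: columns [1, x, x^2].
x = np.array([1.0, 2.0, 3.0])
X_poly = np.column_stack([np.ones_like(x), x, x ** 2])
print(X_poly)  # the row for x=3 is [1., 3., 9.]
```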
17. Which of the following is a classic example of a tree-based model that can be used for regression?
Tree-Based regression models
Easy
A.Logistic Regression.
B.K-Means Clustering.
C.Decision Tree Regressor.
D.Support Vector Machine (for classification).
Correct Answer: Decision Tree Regressor.
Explanation:
A Decision Tree can be adapted for regression tasks (becoming a Decision Tree Regressor) by predicting a continuous value in its leaf nodes, typically the average of the target values of the training samples in that leaf.
18. In a basic regression tree, what value is typically predicted for a new data point that falls into a specific leaf node?
Tree-Based regression models
Easy
A.The most frequent target value in that leaf.
B.A class label like 'A' or 'B'.
C.The coefficient of the most important feature.
D.The average of the target values of all training samples in that leaf.
Correct Answer: The average of the target values of all training samples in that leaf.
Explanation:
Unlike a classification tree which predicts a class, a regression tree's leaf node predicts a single continuous value, which is usually the mean of the outcomes for the training data that ended up in that leaf.
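A depth-1 regression tree (a "stump") makes this concrete. In the sketch below (made-up data and an assumed split point), each leaf predicts the mean target of the training samples that landed in it:

```python
# Minimal regression-stump sketch: one split, two leaves, each leaf
# predicts the mean of the training targets that fall in it.
train = [(1.0, 10.0), (2.0, 12.0), (8.0, 30.0), (9.0, 34.0)]  # (x, y) pairs
threshold = 5.0  # assumed split point for illustration

left = [y for x, y in train if x <= threshold]
right = [y for x, y in train if x > threshold]

def predict(x_new):
    leaf = left if x_new <= threshold else right
    return sum(leaf) / len(leaf)

print(predict(1.5))  # left leaf -> mean of [10, 12] = 11.0
print(predict(8.5))  # right leaf -> mean of [30, 34] = 32.0
```

A real tree learner chooses the threshold to minimize the squared error within each resulting leaf, and recurses; the leaf prediction rule is the same.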
19. What is the defining characteristic of time-series data?
Time-series Regression models
Easy
A.The data has no target variable.
B.The data is always normally distributed.
C.The data contains many categorical features.
D.The data points are ordered chronologically.
Correct Answer: The data points are ordered chronologically.
Explanation:
Time-series data is a sequence of data points indexed in time order. This temporal dependence is a crucial aspect that must be considered during modeling.
20. In the context of a time-series regression for predicting today's sales (yₜ), what would a 'lag-1' feature typically be?
Time-series Regression models
Easy
A.The average of all past sales.
B.The sales from the same day last year.
C.Tomorrow's sales (yₜ₊₁)
D.Yesterday's sales (yₜ₋₁)
Correct Answer: Yesterday's sales (yₜ₋₁)
Explanation:
A lag variable (or lag feature) is created by using a variable's value from a previous time step as a predictor for the current time step. A 'lag-1' feature is the value from the immediately preceding time step, e.g., using yesterday's sales to predict today's sales.
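Building the lag-1 feature is a one-line shift; this toy example (made-up sales numbers) pairs each day's sales with the previous day's value, dropping the first day, which has no lag:

```python
# Sketch: lag-1 feature from a toy daily-sales series.
# Row t uses yesterday's value sales[t-1] as the predictor for sales[t].
sales = [100, 120, 90, 110, 130]

rows = [(sales[t - 1], sales[t]) for t in range(1, len(sales))]
print(rows)  # [(100, 120), (120, 90), (90, 110), (110, 130)]
```

With pandas, `Series.shift(1)` produces the same lagged column.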
21. A data scientist is building a model to predict the exact amount of rainfall (in millimeters) for the next day. A colleague suggests they use Logistic Regression. Why is this suggestion inappropriate for the problem as stated?
Difference between regression and classification
Medium
A.Because rainfall data often contains outliers, which Logistic Regression cannot handle.
B.Because Logistic Regression is a classification algorithm that predicts a probability or a discrete class, not a continuous value.
C.Because Logistic Regression assumes a linear relationship, which is unlikely for rainfall prediction.
D.Because Logistic Regression is computationally more expensive than linear regression models.
Correct Answer: Because Logistic Regression is a classification algorithm that predicts a probability or a discrete class, not a continuous value.
Explanation:
The core task is to predict a continuous numerical value (rainfall in mm), which is a regression problem. Logistic Regression is designed for classification tasks, where the goal is to predict a discrete outcome (e.g., 'Rain' or 'No Rain'). Its output is a probability that is mapped to a class, not a continuous quantity.
22. A regression model has been trained and evaluated. It shows a very low training error (RMSE of 5.5) but a very high validation error (RMSE of 50.2). Which of the following best describes the model's condition and a suitable remedy?
Bias-variance considerations in regression
Medium
A.High bias (underfitting); simplify the model by removing features.
B.High variance (overfitting); apply L2 regularization or reduce model complexity.
C.High bias (underfitting); increase model complexity by adding polynomial features.
D.High variance (overfitting); gather more training data with different features.
Correct Answer: High variance (overfitting); apply L2 regularization or reduce model complexity.
Explanation:
The large gap between a low training error and a high validation error is a classic sign of high variance, also known as overfitting. The model has learned the training data's noise instead of the underlying pattern. To remedy this, one can introduce regularization (like L2/Ridge) to penalize large coefficients or simplify the model (e.g., use fewer features or a less complex algorithm).
23. You have built a simple linear regression model to predict house prices based on square footage. The model's R-squared (R²) value is 0.65. How should this value be interpreted?
Simple Linear Regression
Medium
A.For every 1 square foot increase, the price increases by 65%.
B.The correlation between house price and square footage is 0.65.
C.65% of the variability in house prices can be explained by the square footage.
D.The model's predictions are correct 65% of the time.
Correct Answer: 65% of the variability in house prices can be explained by the square footage.
Explanation:
R-squared, or the coefficient of determination, measures the proportion of the variance in the dependent variable (house price) that is predictable from the independent variable(s) (square footage). An R² of 0.65 means that 65% of the variance in house prices is accounted for by the model.
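The definition can be checked directly from R² = 1 − SS_res/SS_tot; this toy calculation (made-up numbers) does exactly that:

```python
import numpy as np

# R^2 = 1 - SS_res / SS_tot: the fraction of target variance the
# predictions explain, computed on toy numbers.
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 18.0, 33.0, 38.0])

ss_res = np.sum((y_true - y_pred) ** 2)          # 21.0
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # 500.0
r2 = 1 - ss_res / ss_tot
print(r2)  # 0.958
```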
24. In a multiple linear regression model, you notice that the p-values for two features, 'years_of_experience' and 'age', are very high, suggesting they are not statistically significant. However, when you remove either one, the other's p-value becomes very low (significant). What is the most likely cause of this phenomenon?
Multiple Linear Regression
Medium
A.Heteroscedasticity in the model's residuals.
B.Multicollinearity between 'years_of_experience' and 'age'.
C.The model is suffering from high bias (underfitting).
D.Non-linear relationships between the features and the target.
Correct Answer: Multicollinearity between 'years_of_experience' and 'age'.
Explanation:
Multicollinearity occurs when independent variables in a regression model are highly correlated. 'Age' and 'years_of_experience' are likely to be strongly correlated. When both are in the model, they explain the same variance in the target, making it difficult for the model to estimate their individual effects, resulting in high p-values and unstable coefficients. Removing one allows the other to capture that shared effect, making it appear significant.
25. A multiple linear regression model is built to predict the 'price' of a used car. In the fitted model, the coefficient for 'age' is −200, where 'age' is in years, 'mileage' is in miles, and 'is_luxury' is a binary variable (1 if luxury, 0 otherwise). What is the correct interpretation of the coefficient for 'age'?
Interpretation of coefficients
Medium
A.A car that is one year older is worth $200 less than a brand new car.
B.Holding mileage and luxury status constant, for each additional year of age, the car's price is predicted to decrease by $200.
C.For each additional year of age, the car's price decreases by $200.
D.The coefficient is negative, which indicates an error in the model.
Correct Answer: Holding mileage and luxury status constant, for each additional year of age, the car's price is predicted to decrease by $200.
Explanation:
In multiple regression, the coefficient of a variable represents the change in the predicted outcome for a one-unit change in that variable, assuming all other variables in the model are held constant. This 'ceteris paribus' condition is crucial for correct interpretation.
26. You are working on a regression problem with 100 features, and you suspect that many of them are redundant or irrelevant. You want to build a model that automatically performs feature selection. Which regularization technique would be most suitable for this specific goal?
Regularized Regression models
Medium
A.Ridge Regression (L2)
B.Lasso Regression (L1)
C.Elastic Net Regression with a high L2 ratio
D.Principal Component Regression (PCR)
Correct Answer: Lasso Regression (L1)
Explanation:
Lasso (Least Absolute Shrinkage and Selection Operator) Regression uses an L1 penalty term (λ∑|βⱼ|), which has the property of shrinking some feature coefficients to exactly zero. This effectively removes the feature from the model, thus performing automatic feature selection. Ridge regression (L2) shrinks coefficients towards zero but never sets them exactly to zero.
27. Consider a Ridge regression model. What is the effect of increasing the regularization parameter alpha (or lambda, λ) towards infinity (λ → ∞)?
Effect of regularization on model complexity
Medium
A.The model coefficients will all be forced towards zero, resulting in a model that only predicts the mean of the target variable.
B.The model will become perfectly fit to the training data, resulting in zero training error.
C.The model will be identical to an unregularized Ordinary Least Squares model.
D.The model's coefficients will grow infinitely large, causing numerical instability.
Correct Answer: The model coefficients will all be forced towards zero, resulting in a model that only predicts the mean of the target variable.
Explanation:
The Ridge regression cost function is RSS + λ∑βⱼ². As λ becomes very large, the penalty for having non-zero coefficients dominates the cost function. To minimize the cost, the model must shrink the coefficients (βⱼ) to be as close to zero as possible. In the limit, all coefficients become zero, and the model's prediction is simply the intercept, which is the mean of the target variable in a standardized setting.
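This limit is easy to verify numerically. The sketch below (synthetic data of my choosing; the intercept is left unpenalized by centering, a common convention) shows predictions collapsing to the mean of y for a huge λ:

```python
import numpy as np

# Ridge with an unpenalized intercept: center X and y, solve
# beta = (Xc'Xc + lam*I)^(-1) Xc'yc, recover the intercept afterwards.
# For a huge lam the slopes are ~0 and every prediction is ~mean(y).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = 5.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=50)

Xc, yc = X - X.mean(axis=0), y - y.mean()
lam = 1e9
beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(2), Xc.T @ yc)
intercept = y.mean() - X.mean(axis=0) @ beta

pred = X @ beta + intercept
print(np.max(np.abs(pred - y.mean())))  # tiny: predictions ~ mean(y)
```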
28. A scatter plot of your single feature 'X' against your target 'y' shows a clear U-shaped (parabolic) relationship. You fit a simple linear regression model (ŷ = β₀ + β₁x) and find it has a very high error. What is the most appropriate next step to improve the model?
Polynomial feature expansion
Medium
A.Create a new feature x² and fit the model ŷ = β₀ + β₁x + β₂x².
B.Apply L1 regularization to the existing linear model.
C.Gather more data points for the existing feature 'X'.
D.Transform the target variable 'y' using a logarithm.
Correct Answer: Create a new feature x² and fit the model ŷ = β₀ + β₁x + β₂x².
Explanation:
The U-shaped relationship indicates a non-linear pattern that a simple linear model cannot capture (high bias). By performing a polynomial feature expansion and adding a quadratic term (β₂x²), you allow the model to fit a curve (a parabola) to the data. This increases the model's complexity to better match the underlying relationship.
29. How does a Random Forest Regressor typically improve upon a single, fully-grown Decision Tree Regressor?
Tree-Based regression models
Medium
A.It reduces variance by averaging the predictions of many decorrelated trees.
B.It reduces bias by growing deeper trees than a single decision tree.
C.It is guaranteed to find the globally optimal set of splits.
D.It is much more interpretable than a single decision tree.
Correct Answer: It reduces variance by averaging the predictions of many decorrelated trees.
Explanation:
A single, deep decision tree is prone to overfitting, meaning it has high variance. A Random Forest builds multiple decision trees on different bootstrap samples of the data and considers only a random subset of features for each split. By averaging the predictions of these diverse (decorrelated) trees, it significantly reduces the overall variance of the model, leading to better generalization performance.
30. You are modeling monthly sales data and notice a strong seasonal pattern that repeats every 12 months, as well as an upward trend over time. Which of the following models is explicitly designed to handle both trend and seasonality?
Time-series Regression models
Medium
A.Simple Linear Regression on the raw sales values.
B.A non-seasonal ARIMA(p, d, q) model.
C.A SARIMA model.
D.Ridge Regression.
Correct Answer: A SARIMA model.
Explanation:
SARIMA models are an extension of ARIMA models specifically designed for time series data with a seasonal component. SARIMA includes seasonal parameters (P, D, Q, m) to model the seasonality in addition to the standard ARIMA parameters (p, d, q) for trend and autocorrelation. While a linear regression with time and dummy variables for months can capture some of this, SARIMA is built from the ground up to handle these complex time-series dynamics.
31. A data scientist is trying to decide on the degree for a polynomial regression model. They find that a degree-2 polynomial has a validation RMSE of 15. A degree-10 polynomial has a training RMSE of 2 but a validation RMSE of 40. What does the performance of the degree-10 model indicate?
Bias-variance considerations in regression
Medium
A.The model has very high bias and very high variance.
B.The model has very high bias and very low variance.
C.The model has very low bias but very high variance.
D.The model has achieved the optimal bias-variance tradeoff.
Correct Answer: The model has very low bias but very high variance.
Explanation:
The extremely low training RMSE (2) for the degree-10 polynomial suggests it fits the training data almost perfectly, indicating low bias. However, its performance on the validation set is very poor (RMSE of 40), which is much worse than the simpler degree-2 model. This large gap between training and validation performance is a hallmark of high variance (overfitting).
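The same pattern is easy to reproduce on toy data. In this sketch (made-up quadratic data with noise; degrees and sample sizes are my choices), the degree-10 fit interpolates the 11 training points almost exactly, while its validation error is typically far worse than the degree-2 model's:

```python
import numpy as np

# Bias-variance sketch: the true relation is quadratic, so degree 2 is
# well-specified and degree 10 has enough freedom to chase the noise.
rng = np.random.default_rng(42)
x_train = np.linspace(-1, 1, 11)
y_train = x_train ** 2 + rng.normal(scale=0.2, size=x_train.size)
x_val = np.linspace(-0.95, 0.95, 50)
y_val = x_val ** 2 + rng.normal(scale=0.2, size=x_val.size)

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

results = {}
for degree in (2, 10):
    coefs = np.polyfit(x_train, y_train, degree)
    results[degree] = (rmse(y_train, np.polyval(coefs, x_train)),
                       rmse(y_val, np.polyval(coefs, x_val)))

# Degree 10 drives training RMSE to ~0 (11 points, 11 parameters);
# the gap to its validation RMSE is the overfitting signature.
print(results)
```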
32. When building a multiple linear regression model, you add a new feature that is completely uncorrelated with the target variable. What is the likely effect on the model's R² and Adjusted R²?
Multiple Linear Regression
Medium
A.R² will slightly increase or stay the same, while Adjusted R² will likely decrease.
B.Both R² and Adjusted R² will decrease.
C.Both R² and Adjusted R² will increase significantly.
D.R² will decrease, while Adjusted R² will increase.
Correct Answer: R² will slightly increase or stay the same, while Adjusted R² will likely decrease.
Explanation:
The standard R² metric can only increase or stay the same when a new feature is added, regardless of its utility, because the model can simply assign it a coefficient of zero if it's useless. Adjusted R², however, penalizes the addition of features that do not improve the model more than would be expected by chance. Therefore, adding a useless feature will cause the Adjusted R² to decrease.
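The adjustment is a simple formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The toy numbers below (assumed for illustration, not from the quiz) show a tiny R² gain from an extra feature turning into an adjusted-R² drop:

```python
# Adjusted R^2 penalizes each extra parameter p for a given sample size n.
def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

n = 30
before = adjusted_r2(0.700, n, p=3)  # model with 3 features
after = adjusted_r2(0.701, n, p=4)   # tiny R^2 gain from a 4th, useless feature
print(round(before, 4), round(after, 4))  # adjusted R^2 goes DOWN
```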
33. After fitting a simple linear regression model, you create a residual plot (residuals vs. fitted values). You observe that the points form a distinct funnel shape, widening as the fitted values increase. Which assumption of linear regression is most clearly violated?
Simple Linear Regression
Medium
A.Independence of errors
B.Linearity
C.Normality of residuals
D.Homoscedasticity (constant variance of errors)
Correct Answer: Homoscedasticity (constant variance of errors)
Explanation:
The assumption of homoscedasticity states that the variance of the residuals should be constant across all levels of the independent variable(s). A funnel shape in the residual plot indicates that the variance of the errors is not constant; it increases (or decreases) as the predicted value changes. This violation is called heteroscedasticity.
34. In a regression model predicting employee salary, one of the predictors is 'Department', a categorical feature with three levels: 'Sales', 'HR', and 'Engineering'. 'Sales' is used as the reference category. The fitted model has a coefficient of +15000 for the 'Department_Engineering' dummy variable. What is the correct interpretation?
Interpretation of coefficients
Medium
A.The predicted salary for an employee in Engineering is, on average, $15,000 higher than for a similar employee in HR.
B.The average salary in the Engineering department is $15,000.
C.The predicted salary for an employee in Engineering is, on average, $15,000 higher than for a similar employee in Sales.
D.Moving an employee from Sales to Engineering is predicted to increase their salary by $15,000.
Correct Answer: The predicted salary for an employee in Engineering is, on average, $15,000 higher than for a similar employee in Sales.
Explanation:
When using dummy variables for categorical features, the coefficient of a specific level (e.g., 'Engineering') represents the average difference in the target variable between that level and the reference (or baseline) category ('Sales'), holding all other model variables constant.
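A sketch of the encoding itself (a hypothetical helper, not a specific library API): with 'Sales' as the baseline, only the other two levels get dummy columns, so a Sales row is all zeros and the Engineering coefficient is measured relative to Sales:

```python
# Dummy coding with 'Sales' dropped as the reference level: only the
# non-reference levels get columns, and the baseline row is all zeros.
levels = ["HR", "Engineering"]  # 'Sales' is the dropped baseline

def encode(dept):
    return [1 if dept == lvl else 0 for lvl in levels]

print(encode("Sales"))        # [0, 0] -> baseline
print(encode("Engineering"))  # [0, 1] -> its coefficient is the offset vs Sales
```

Libraries like pandas (`get_dummies(..., drop_first=True)`) produce the same layout.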
35. A Lasso regression model is trained on a dataset. When the regularization strength λ is set to a very small, non-zero value, most coefficients are large. As λ is gradually increased, what is the expected behavior of the model's coefficients?
Effect of regularization on model complexity
Medium
A.Only the smallest coefficients will be set to zero, while the largest ones remain unchanged.
B.The magnitudes of all coefficients will shrink, and some will become exactly zero.
C.The coefficients will be randomly set to zero based on the value of λ.
D.All coefficients will shrink towards zero proportionally but will never reach it.
Correct Answer: The magnitudes of all coefficients will shrink, and some will become exactly zero.
Explanation:
This describes the regularization path of a Lasso model. As the penalty term increases, it puts more pressure on the model to have smaller coefficients. The L1 penalty used by Lasso has the unique effect of forcing some coefficients to become precisely zero once the penalty is strong enough, effectively removing them from the model.
36. You have a dataset with highly correlated features. You decide to use a regularized regression model to prevent overfitting and improve stability. Why might Ridge Regression be a better choice than Lasso Regression in this specific scenario?
Regularized Regression models
Medium
A.Lasso is unable to handle multicollinearity and will fail to converge.
B.Ridge tends to shrink the coefficients of correlated features towards each other, keeping all of them, while Lasso might arbitrarily pick one and eliminate the others.
C.Ridge is computationally faster than Lasso when there are many features.
D.Ridge can perform automatic feature selection, which is useful for correlated features.
Correct Answer: Ridge tends to shrink the coefficients of correlated features towards each other, keeping all of them, while Lasso might arbitrarily pick one and eliminate the others.
Explanation:
When features are highly correlated, Lasso's behavior can be unstable; it often arbitrarily selects one feature from a group of correlated features and sets the coefficients of the others to zero. Ridge, on the other hand, tends to distribute the coefficient weight among the correlated features, shrinking them together. This can lead to a more stable and sometimes more interpretable model when you believe the correlated features are all relevant.
37. What is the primary risk associated with using a very high-degree polynomial (e.g., degree 20) in a polynomial regression model?
Polynomial feature expansion
Medium
A.The model will be unable to capture complex non-linear relationships.
B.The model is very likely to overfit the training data, leading to poor generalization on new data.
C.The model's coefficients will be difficult to interpret due to multicollinearity between polynomial terms.
D.The computational cost will be too high for most modern computers to handle.
Correct Answer: The model is very likely to overfit the training data, leading to poor generalization on new data.
Explanation:
A high-degree polynomial creates a very flexible model with high complexity. While it might achieve a very low error on the training set by weaving through the data points, it is likely capturing noise rather than the true underlying function. This results in high variance (overfitting) and poor performance on unseen data.
38. When a Decision Tree Regressor makes a prediction for a new, unseen data point, how is the prediction value determined?
Tree-Based regression models
Medium
A.The prediction is the target value of the single closest training sample in the feature space.
B.The prediction is determined by a linear regression model fitted on the training samples within the final leaf node.
C.The tree calculates the weighted average of the target values of all training samples, with weights determined by the path taken.
D.The new data point is passed down the tree, and the prediction is the average of the target values of all training samples in the leaf node it reaches.
Correct Answer: The new data point is passed down the tree, and the prediction is the average of the target values of all training samples in the leaf node it reaches.
Explanation:
A decision tree works by recursively partitioning the feature space. For a new data point, it follows the splitting rules from the root node down to a terminal (leaf) node. The prediction for any point that falls into that leaf node is simply the mean of the target variable ('y' values) of all the training data points that ended up in that same leaf during training.
39. An analyst is using an Autoregressive model of order p, AR(p), to forecast a time series. What is the fundamental principle of an AR(p) model?
Time-series Regression models
Medium
A.It predicts the future value as a function of 'p' external predictor variables.
B.It predicts the future value by differencing the series 'p' times to make it stationary.
C.It predicts the future value of the series based on the past 'p' forecast errors (shocks).
D.It predicts the future value of the series as a linear combination of its own 'p' most recent past values.
Correct Answer: It predicts the future value of the series as a linear combination of its own 'p' most recent past values.
Explanation:
An Autoregressive (AR) model is based on the idea that the current value of a time series is dependent on its own previous values. The 'p' in AR(p) specifies that the model uses the 'p' most recent time steps (lags) as predictors in a linear equation to forecast the next value.
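The "past values as predictors" idea can be sketched in a few lines: simulate an AR(1) process yₜ = 0.8·yₜ₋₁ + εₜ (a toy setup of my choosing) and recover the lag coefficient by regressing yₜ on yₜ₋₁:

```python
import numpy as np

# An AR(1) fit is just a linear regression of y_t on y_{t-1}.
rng = np.random.default_rng(7)
y = np.zeros(500)
for t in range(1, y.size):
    y[t] = 0.8 * y[t - 1] + rng.normal(scale=0.5)

# Regress y_t on its lag-1 value; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(y[:-1], y[1:], 1)
print(round(slope, 2))  # close to the true autoregressive coefficient 0.8
```

An AR(p) model extends this to a multiple regression on the p most recent lags.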
40. Which pair of evaluation metrics is most appropriate for a regression task versus a binary classification task, respectively?
Difference between regression and classification
Medium
A.Mean Squared Error (MSE) and Log Loss
B.Root Mean Squared Error (RMSE) and Area Under the ROC Curve (AUC)
C.Accuracy and Mean Absolute Error (MAE)
D.R-squared and Precision
Correct Answer: Root Mean Squared Error (RMSE) and Area Under the ROC Curve (AUC)
Explanation:
Regression models predict continuous values, so their performance is measured by error metrics like RMSE, MAE, or MSE, which quantify the magnitude of the prediction errors. Classification models predict discrete classes, and their performance is often evaluated using metrics like Accuracy, Precision, Recall, F1-score, or AUC, which measure how well the model distinguishes between classes.
Incorrect! Try again.
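Both metric families are easy to compute by hand; a small numpy sketch with made-up values (AUC via the pairwise-ranking formulation, assuming no tied scores):

```python
import numpy as np

# Regression: RMSE measures the magnitude of continuous prediction errors.
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

# Classification: AUC measures how well scores rank positives above negatives.
labels = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])
pos, neg = scores[labels == 1], scores[labels == 0]
auc = np.mean([s_p > s_n for s_p in pos for s_n in neg])
print(rmse, auc)  # sqrt(5/3) and 8/9
```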
41In a multiple linear regression scenario with two highly correlated predictor variables, x1 and x2, both having a true positive relationship with the target y, how would the estimated coefficients β1 and β2 likely behave in a Ridge regression versus a Lasso regression as the regularization strength λ is increased?
Regularized Regression models
Hard
A.Both Ridge and Lasso will shrink both coefficients towards zero at exactly the same rate, maintaining their initial ratio.
B.Both Ridge and Lasso will drive one coefficient to zero and keep the other, as this is the optimal way to handle multicollinearity.
C.Ridge will shrink both coefficients towards each other and then towards zero. Lasso is likely to arbitrarily drive one coefficient to zero while keeping the other.
D.Lasso will shrink both coefficients towards each other and then towards zero. Ridge is likely to arbitrarily drive one coefficient to zero while keeping the other.
Correct Answer: Ridge will shrink both coefficients towards each other and then towards zero. Lasso is likely to arbitrarily drive one coefficient to zero while keeping the other.
Explanation:
Ridge's penalty term (λ Σ βj²) is minimized when the coefficients for correlated variables are close to each other (it 'prefers' to split the predictive power). Thus, it shrinks them together. Lasso's penalty (λ Σ |βj|) is indifferent to how the power is split and performs feature selection. Due to the geometry of the L1 constraint, it will often find a solution where one coefficient is pushed to exactly zero, effectively selecting one variable from the correlated group.
Incorrect! Try again.
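The Ridge half of this behavior can be verified with the closed-form solution; the sketch below duplicates one feature to make the correlation perfect, and Ridge splits the signal evenly between the two copies (Lasso's zeroing behavior needs an iterative solver and is not shown here):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
X = np.column_stack([x, x])           # two perfectly correlated predictors
y = 3.0 * x + rng.normal(scale=0.1, size=n)

lam = 1.0
p = X.shape[1]
# Ridge closed form: (X'X + lam*I)^-1 X'y  (intercept omitted for simplicity)
beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(beta)  # the L2 penalty splits the signal: both coefficients equal, ~1.5
```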
42You are fitting a polynomial regression model to a dataset where the true underlying relationship is a simple linear function but the data has a high level of irreducible error (high variance noise). You fit two models: a degree-1 polynomial (linear) and a degree-10 polynomial. Which statement most accurately describes the bias and variance of the degree-10 model compared to the degree-1 model?
Bias-variance considerations in regression
Hard
A.The degree-10 model will have high bias and high variance because its complexity prevents it from capturing the simple true trend.
B.The degree-10 model will have low variance but high bias, as it over-simplifies the noisy data by fitting a complex curve.
C.The degree-10 model will have low bias and low variance because its flexibility allows it to perfectly model both the trend and the noise.
D.The degree-10 model will have low bias on the training set but very high variance, leading to poor generalization on a test set.
Correct Answer: The degree-10 model will have low bias on the training set but very high variance, leading to poor generalization on a test set.
Explanation:
A high-degree polynomial (degree-10) is a highly flexible model. On the training data, it can fit both the underlying linear trend and the random noise, resulting in low training error and thus low bias with respect to the training set. However, this fitting of noise means the model's parameters will vary wildly with different samples of training data, leading to very high variance. This high variance results in poor performance on unseen test data. The simple degree-1 model would have low variance but slightly higher bias (as it cannot fit the noise perfectly).
Incorrect! Try again.
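A quick numpy experiment on synthetic linear data with noise shows the degree-10 fit chasing noise: its training error is lower, which is exactly the low-training-bias symptom described above:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 30)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)  # linear truth + noise

def train_mse(deg):
    coeffs = np.polyfit(x, y, deg)
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

mse1, mse10 = train_mse(1), train_mse(10)
print(mse1, mse10)  # degree-10 fits training noise: lower training error
```

On a fresh sample from the same process, the ordering would typically reverse — that gap is the variance penalty.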
43A Gradient Boosting Regressor (GBR) and a Random Forest Regressor (RFR) are trained on the same dataset. The GBR is trained with a small learning rate and a large number of estimators, while the RFR is trained with deep, unpruned trees. If both models show signs of overfitting, how does the nature of this overfitting typically differ between the two models?
Tree-Based regression models
Hard
A.The GBR's overfitting is primarily due to reducing bias to an extremely low level at the cost of increased variance, while the RFR's overfitting is due to averaging many low-bias, high-variance models where the variance is not reduced enough.
B.The GBR overfits by sequentially fitting to residuals, which leads to high bias. The RFR overfits by creating trees that are too simple, leading to high variance.
C.The GBR's overfitting is due to high variance in its individual weak learners. The RFR's overfitting is due to a systematic bias introduced by the bagging process.
D.Both models overfit primarily by reducing variance at the cost of bias.
Correct Answer: The GBR's overfitting is primarily due to reducing bias to an extremely low level at the cost of increased variance, while the RFR's overfitting is due to averaging many low-bias, high-variance models where the variance is not reduced enough.
Explanation:
GBR is a sequential, boosting ensemble method that aims to reduce bias by fitting new models to the residuals of previous models. Overfitting in GBR occurs when it starts fitting the noise in the residuals, leading to extremely low bias on the training data but high variance. RFR is a parallel, bagging ensemble method that averages many deep, high-variance, low-bias trees. The averaging reduces variance. Overfitting in RFR occurs when the individual trees are too complex and the averaging process is insufficient to cancel out the variance, often because the trees are too correlated.
Incorrect! Try again.
44A multiple regression model is used to predict house prices: Price = β0 + β1·Size + β2·Age + β3·(Size × Age) + ε. The fitted model yields a statistically significant coefficient β3 = -2.5. How should this coefficient be interpreted?
Interpretation of coefficients
Hard
A.The model is misspecified because an interaction between size and age cannot have a negative effect on price.
B.For every additional year of age, the expected marginal effect of an additional square foot of size on price decreases by $2.5.
C.For every additional square foot of size, the house price is expected to decrease by $2.5, holding age constant.
D.For every additional year of age, the house price is expected to decrease by $2.5, holding size constant.
Correct Answer: For every additional year of age, the expected marginal effect of an additional square foot of size on price decreases by $2.5.
Explanation:
The coefficient β3 represents the interaction effect. The marginal effect of Size on Price is the partial derivative of Price with respect to Size, which is β1 + β3·Age. This shows the effect of size is not constant but depends on age. The interpretation of β3 is how this marginal effect changes for a one-unit change in Age. Specifically, for each one-unit increase in Age, the slope of Price with respect to Size changes by β3 = -2.5.
Incorrect! Try again.
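Numerically, with hypothetical coefficient values (only the interaction coefficient -2.5 comes from the question; the others are invented for the demo), the marginal effect of one extra square foot shrinks by $2.5 per year of age:

```python
import numpy as np

# Hypothetical fitted coefficients; b3 = -2.5 is the Size x Age interaction.
b0, b1, b2, b3 = 50_000.0, 120.0, -300.0, -2.5

def price(size, age):
    return b0 + b1 * size + b2 * age + b3 * size * age

# Marginal effect of one extra square foot, evaluated at two different ages:
effect_age10 = price(1001, 10) - price(1000, 10)   # b1 + b3*10
effect_age11 = price(1001, 11) - price(1000, 11)   # b1 + b3*11
print(effect_age10, effect_age11, effect_age11 - effect_age10)  # drops by 2.5
```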
45You are building a linear regression model to forecast sales (y) using advertising spend (x) as a predictor. After fitting the model y = β0 + β1·x + ε, you perform a Durbin-Watson test on the residuals and get a test statistic of 0.35. What is the most critical implication of this result for your model's statistical inference?
Time-series Regression models
Hard
A.The residuals exhibit strong positive autocorrelation, which violates the independence of errors assumption and leads to underestimated standard errors of the coefficients.
B.The residuals are not normally distributed, which invalidates the t-tests and F-tests for coefficient significance.
C.The model suffers from severe multicollinearity, making the coefficient estimates unreliable.
D.The relationship between sales and advertising spend is non-linear, meaning the model has high bias.
Correct Answer: The residuals exhibit strong positive autocorrelation, which violates the independence of errors assumption and leads to underestimated standard errors of the coefficients.
Explanation:
The Durbin-Watson statistic tests for first-order autocorrelation in residuals. A value near 2 indicates no autocorrelation. Values approaching 0 indicate strong positive autocorrelation. A value of 0.35 is very close to 0, indicating a severe violation of the OLS assumption of independent errors. A major consequence of positive autocorrelation is that the standard errors of the regression coefficients will be underestimated, leading to overly optimistic (inflated) t-statistics and p-values. This makes coefficients appear more statistically significant than they actually are.
Incorrect! Try again.
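The Durbin-Watson statistic itself is a one-line computation; the sketch below compares strongly autocorrelated residuals (rho = 0.9, so DW should land near 2(1 - rho) = 0.2) with independent ones:

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum of squared successive differences / sum of squared residuals."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(3)
# AR(1) residuals with strong positive autocorrelation (rho = 0.9)
e = np.zeros(1000)
for t in range(1, len(e)):
    e[t] = 0.9 * e[t-1] + rng.normal()

dw_ar = durbin_watson(e)
dw_iid = durbin_watson(rng.normal(size=1000))
print(dw_ar)   # well below 2: positive autocorrelation
print(dw_iid)  # near 2: no autocorrelation
```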
46In the context of regularized linear regression (like Ridge or Lasso), how does the concept of "effective degrees of freedom" change as the regularization parameter λ is increased from 0 to infinity?
Effect of regularization on model complexity
Hard
A.It decreases monotonically from p (the number of predictors) towards 0.
B.It remains constant at p regardless of the value of λ.
C.It first increases as the model finds important features and then decreases.
D.It increases monotonically from 0 towards p.
Correct Answer: It decreases monotonically from p (the number of predictors) towards 0.
Explanation:
The effective degrees of freedom of a model measure its complexity. For an unregularized linear model with p predictors, the effective degrees of freedom is p. As the regularization penalty λ increases, the coefficients are constrained and shrunk towards zero. This reduces the model's flexibility to fit the data, thus lowering its complexity. As λ → ∞, all coefficients are forced to zero (for a model without an intercept), and the model becomes a null model with 0 effective degrees of freedom. Therefore, the effective degrees of freedom decrease from p towards 0 as λ increases.
Incorrect! Try again.
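For Ridge specifically, the effective degrees of freedom have a closed form via the singular values d_i of the design matrix: df(λ) = Σ d_i² / (d_i² + λ). A sketch on random data shows the monotone decrease from p to 0:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))        # n=100 observations, p=5 predictors
d = np.linalg.svd(X, compute_uv=False)

def eff_df(lam):
    """Effective degrees of freedom of ridge: trace of the smoother matrix."""
    return np.sum(d**2 / (d**2 + lam))

print(eff_df(0.0))      # = p = 5 (no regularization: ordinary OLS)
print(eff_df(10.0))     # strictly between 0 and 5
print(eff_df(1e9))      # -> 0 as lambda -> infinity
```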
47You are building a multiple linear regression model. You include predictors for a person's weight in kilograms (x1) and their weight in pounds (x2). Assuming no measurement error (x2 = 2.20462·x1), what is the precise mathematical consequence for the Ordinary Least Squares (OLS) estimation process?
Multiple Linear Regression
Hard
A.The model will produce coefficient estimates, but their standard errors will be infinitely large, making them useless.
B.The OLS algorithm in most software will automatically detect the collinearity and drop one of the two variables.
C.The R-squared of the model will be artificially inflated to 1.0, regardless of the target variable.
D.The matrix XᵀX becomes singular and its inverse does not exist, so no unique solution for the coefficient vector β can be found.
Correct Answer: The matrix XᵀX becomes singular and its inverse does not exist, so no unique solution for the coefficient vector β can be found.
Explanation:
This is a case of perfect multicollinearity, where one predictor is a perfect linear combination of another. The OLS solution for the coefficients is given by β̂ = (XᵀX)⁻¹Xᵀy. When perfect multicollinearity exists, the columns of the design matrix X are linearly dependent. This causes the matrix XᵀX to be singular (or non-invertible), meaning its determinant is zero. As a result, its inverse is not defined, and the OLS procedure fails to find a unique set of coefficients. While some software packages implement a practical workaround (like option B), the fundamental mathematical result is the non-invertibility of XᵀX.
Incorrect! Try again.
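This rank deficiency is easy to observe directly (2.20462 is the usual kg-to-pounds conversion factor):

```python
import numpy as np

rng = np.random.default_rng(5)
kg = rng.uniform(50, 100, size=20)
pounds = kg * 2.20462                  # exact linear function of kg
X = np.column_stack([np.ones(20), kg, pounds])

# Linearly dependent columns: X has rank 2, not 3, so X'X is singular.
print(np.linalg.matrix_rank(X))        # 2
print(np.linalg.det(X.T @ X))          # ~0 (singular, up to float error)
```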
48Consider a regression problem where you apply a polynomial feature expansion of degree 5 to a single feature x. You then train a Ridge regression model on these 5 derived features (x, x², x³, x⁴, x⁵). What is the primary effect of a very large regularization parameter λ on the resulting regression function f(x)?
Polynomial feature expansion
Hard
A.The function will still be a complex 5th-degree polynomial but with significantly smaller oscillations.
B.The function will approximate a constant function (the mean of the target).
C.The function will approximate a simple linear function of x (a line, but not necessarily flat).
D.The function will become exactly zero for all x, i.e., f(x) = 0.
Correct Answer: The function will approximate a constant function (the mean of the target).
Explanation:
Ridge regression minimizes the objective function RSS + λ Σ βj². As the regularization parameter λ becomes extremely large, the penalty for non-zero coefficients dominates the RSS term. To minimize this objective, the model must force all slope coefficients (β1, ..., β5) to approach zero. The intercept term, β0, is typically not regularized. The model that minimizes the RSS with all slope coefficients being zero is a model that predicts the mean of the target variable for all inputs. Therefore, the function will approximate a constant function, where f(x) ≈ ȳ.
Incorrect! Try again.
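A sketch with an unpenalized intercept (handled here by centering, a common convention) shows the fitted function collapsing to the target mean once λ is huge:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(-1, 1, size=100)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=100)

# Degree-5 polynomial features, centered; the intercept is handled separately
# (unpenalized), as is standard practice.
P = np.column_stack([x**k for k in range(1, 6)])
Pc = P - P.mean(axis=0)
yc = y - y.mean()

lam = 1e12
beta = np.linalg.solve(Pc.T @ Pc + lam * np.eye(5), Pc.T @ yc)
preds = y.mean() + Pc @ beta
print(np.max(np.abs(preds - y.mean())))   # tiny: f(x) collapses to the mean
```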
49According to the Gauss-Markov theorem, the Ordinary Least Squares (OLS) estimator for the coefficients in a simple linear regression model is the Best Linear Unbiased Estimator (BLUE). What does "Best" in this context specifically refer to?
Simple Linear Regression
Hard
A.It has the minimum sampling variance among all linear unbiased estimators.
B.It is robust to violations of the normality of errors assumption.
C.It provides the highest possible R-squared value for the training data.
D.It is the most computationally efficient estimator to calculate.
Correct Answer: It has the minimum sampling variance among all linear unbiased estimators.
Explanation:
The Gauss-Markov theorem states that under the standard OLS assumptions (linearity, exogeneity, homoscedasticity, and no perfect multicollinearity), the OLS estimator is BLUE. Each word has a specific meaning: "Linear" means it's a linear function of the observed values. "Unbiased" means its expected value is the true population parameter (E[β̂] = β). "Best" specifically means that it has the lowest sampling variance compared to any other linear unbiased estimator. This implies that OLS estimates are the most precise or reliable in this class of estimators.
Incorrect! Try again.
50You are tasked with modeling the number of customer support tickets received per hour. The target variable is a non-negative integer (0, 1, 2, ...). A colleague suggests using a Poisson regression model. How does this model blur the line between typical regression and classification tasks?
Difference between regression and classification
Hard
A.It predicts a continuous rate parameter (λ) for a count distribution, but the ultimate output variable is discrete, sharing characteristics with both regression (predicting a numeric value) and classification (predicting from a set of integer classes).
B.It is purely a classification task because the output is from a discrete, ordered set of integers.
C.It is purely a regression task because it uses a generalized linear model framework to predict an expected value.
D.It is neither regression nor classification; it belongs to a separate category of 'counting models' that have no overlap with either.
Correct Answer: It predicts a continuous rate parameter (λ) for a count distribution, but the ultimate output variable is discrete, sharing characteristics with both regression (predicting a numeric value) and classification (predicting from a set of integer classes).
Explanation:
Standard regression predicts a continuous value (e.g., price, temperature). Standard classification predicts a discrete, categorical label (e.g., cat, dog, bird). A Poisson regression model uses a linear model to predict the logarithm of a continuous, positive real number, the rate parameter λ (the expected number of events). This prediction of a continuous parameter is a regression-like task. However, this parameter then defines a probability distribution over a discrete, countably infinite set of outcomes (the integers 0, 1, 2, ...). This makes the target variable discrete, which is a characteristic of classification. Therefore, it sits in a gray area, using regression techniques to model a discrete (count) outcome.
Incorrect! Try again.
51When tuning an XGBoost regressor, you observe that decreasing the eta (learning rate) parameter significantly improves the model's performance on a validation set, but only if you also substantially increase the n_estimators parameter. Why is this combined adjustment necessary for improved performance?
Tree-Based regression models
Hard
A.A smaller eta forces the model to focus only on the most important features, and more trees are needed to eventually consider all features.
B.A smaller eta makes each tree contribute less to the final prediction, requiring more trees (n_estimators) to reach a good cumulative model. This slower, more gradual learning process is less likely to overfit.
C.A smaller eta increases the variance of each individual tree, which must be compensated for by averaging more trees (n_estimators).
D.The eta and n_estimators parameters are inversely proportional by definition in the XGBoost algorithm to maintain a constant model complexity.
Correct Answer: A smaller eta makes each tree contribute less to the final prediction, requiring more trees (n_estimators) to reach a good cumulative model. This slower, more gradual learning process is less likely to overfit.
Explanation:
The eta (learning rate) scales the contribution of each new tree added to the ensemble. A small eta (e.g., 0.01) means each tree makes a very small correction to the overall model. This prevents the model from making drastic changes based on any single tree, which helps avoid overfitting. However, since each step is small, many more steps (n_estimators) are required for the model to 'travel' to a good minimum in the loss function. This combination allows for a more robust and fine-tuned model that generalizes better by taking smaller, more careful steps.
Incorrect! Try again.
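The step-size/step-count tradeoff (though not the overfitting aspect) can be caricatured with a "boosting" loop whose weak learner is just the mean of the current residuals — a deliberately trivial stand-in for a tree, not a real GBM:

```python
import numpy as np

y = np.array([10.0, 12.0, 8.0, 14.0])

def boost_constant(eta, n_estimators):
    """Toy boosting: each 'tree' is the mean of the current residuals,
    scaled by the learning rate eta. Returns training MSE."""
    pred = np.zeros_like(y)
    for _ in range(n_estimators):
        residual = y - pred
        pred = pred + eta * residual.mean()
    return np.mean((y - pred) ** 2)

# Small eta with few trees underfits; many more trees recover the fit.
print(boost_constant(0.01, 50))    # still far from the target mean
print(boost_constant(0.01, 1000))  # many small steps: fit recovered
print(boost_constant(0.3, 50))     # larger steps converge in fewer trees
```

Each round only removes a fraction eta of the remaining error, so halving eta roughly doubles the number of estimators needed to reach the same training loss.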
52You are working with a dataset that has a very large number of features, many of which are highly correlated with each other in groups. You want to perform feature selection, but also want to avoid arbitrarily discarding correlated features that might be collectively predictive. Which regression model is explicitly designed to handle this "grouping effect" among correlated features?
Regularized Regression models
Hard
A.Ridge Regression
B.Lasso Regression
C.Elastic Net Regression
D.Principal Component Regression (PCR)
Correct Answer: Elastic Net Regression
Explanation:
Elastic Net combines the L1 penalty of Lasso and the L2 penalty of Ridge. The L1 part enables sparse solutions (feature selection). The L2 part encourages correlated features to be selected together, giving them similar coefficient values. This is known as the "grouping effect." Lasso, by itself, tends to arbitrarily select one feature from a correlated group and zero out the others. Ridge will keep all correlated features but won't perform feature selection. PCR handles correlation but by transforming features into uncorrelated components, losing original feature interpretability. Elastic Net is specifically designed for sparse selection within groups of correlated features.
Incorrect! Try again.
53A financial services company is building a regression model to predict the exact dollar amount of a potential loan default. The cost of under-predicting the default amount is extremely high, while the cost of over-predicting is relatively low. Given this asymmetric cost function, what kind of model characteristic should be prioritized during development, even if it harms standard metrics like Mean Squared Error (MSE)?
Bias-variance considerations in regression
Hard
A.A model with higher bias and lower variance, as simpler, more stable models are always preferable in finance.
B.A model that focuses exclusively on minimizing the irreducible error through better data collection.
C.A model that minimizes the Median Absolute Error instead of the Mean Squared Error, as it's more robust.
D.A model with lower bias and potentially higher variance, as its flexibility is needed to capture extreme high-default events.
Correct Answer: A model with lower bias and potentially higher variance, as its flexibility is needed to capture extreme high-default events.
Explanation:
Standard metrics like MSE penalize over- and under-predictions equally. The business problem describes an asymmetric cost. High-cost events (large defaults) are often in the tails of the distribution. A high-bias, low-variance model (e.g., a simple linear model) might be stable but could systematically under-predict these extreme values. A more complex, low-bias, high-variance model (e.g., a complex GBT or a well-tuned neural network) is more flexible and has a better chance of capturing these rare but critical high-default events. The priority is to avoid catastrophic under-prediction, which means accepting a more complex model that can produce large predictions, even at the risk of higher variance.
Incorrect! Try again.
54In a regression model, both the independent variable X (e.g., advertising spend) and the dependent variable Y (e.g., sales) are log-transformed: log(Y) = β0 + β1·log(X) + ε. The fitted coefficient is β1 = 0.8. How is this coefficient correctly interpreted in a practical sense?
Interpretation of coefficients
Hard
A.A 1% increase in X is associated with a 0.8-unit increase in Y.
B.A 1% increase in X is associated with an expected 0.8% increase in Y.
C.A 1-unit increase in X is associated with a 0.8-unit increase in Y.
D.A 1-unit increase in X is associated with a 0.8% increase in Y.
Correct Answer: A 1% increase in X is associated with an expected 0.8% increase in Y.
Explanation:
This is a log-log model, and the coefficient β1 represents the elasticity of Y with respect to X. Mathematically, β1 = (%ΔY)/(%ΔX). This ratio of percentage changes is interpreted as: a 1% change in X is associated with a β1% change in Y. So, a 1% increase in advertising spend is associated with an expected 0.8% increase in sales. While option D is mathematically literal, it is not the standard, practical interpretation used to convey the business impact.
Incorrect! Try again.
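Checking the elasticity interpretation numerically: multiplying X by 1.01 multiplies Y by 1.01 raised to the 0.8 power, which is very nearly a 0.8% increase:

```python
import numpy as np

beta1 = 0.8  # fitted log-log coefficient

# log(Y) = b0 + b1*log(X): multiplying X by 1.01 multiplies Y by 1.01**b1.
multiplier = 1.01 ** beta1
print((multiplier - 1) * 100)  # ~0.799% -- i.e., about a 0.8% increase in Y
```

The approximation is tight only for small percentage changes; for large changes the exact multiplicative form must be used.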
55You are analyzing the coefficient paths of a Lasso regression as the regularization parameter λ varies. The path plot shows the magnitude of each coefficient as a function of λ. What critical information for model selection can be derived from the order in which coefficients become non-zero as λ decreases from a very large value towards zero?
Effect of regularization on model complexity
Hard
A.It determines the sign (positive or negative) of the relationship, which is fixed regardless of λ once the coefficient is non-zero.
B.It shows the optimal value of λ directly, which is the point where the first coefficient becomes non-zero.
C.It indicates which features are most correlated with each other, as their paths will have identical slopes.
D.It provides a data-driven ranking of feature importance, as stronger predictors tend to enter the model (become non-zero) at higher levels of regularization.
Correct Answer: It provides a data-driven ranking of feature importance, as stronger predictors tend to enter the model (become non-zero) at higher levels of regularization.
Explanation:
The Lasso path algorithm (like LARS) effectively builds the model sequentially. As you relax the penalty (decrease λ), the algorithm allows coefficients to become non-zero one by one. The feature whose coefficient 'peels off' from zero first is the one that has the strongest correlation with the current residuals. This sequence of entry into the model provides a powerful heuristic for ranking feature importance. Features that can withstand stronger penalties (higher λ) before being zeroed out are generally considered more important by the model.
Incorrect! Try again.
56In a multiple linear regression analysis, what is the key distinction between a high-leverage point and a high-influence outlier?
Multiple Linear Regression
Hard
A.A high-leverage point is always influential, but a high-influence outlier may not necessarily have high leverage.
B.A high-leverage point has an extreme value for the target variable (y), while a high-influence outlier has extreme values for predictor variables (x).
C.High-leverage and high-influence are synonymous terms for any outlier that significantly affects the regression line's slope or intercept.
D.A high-leverage point has an extreme value for one or more predictor variables (x), while a high-influence outlier is a point that, if removed, would cause a large change in the regression model's coefficients. A point can have high leverage without being influential.
Correct Answer: A high-leverage point has an extreme value for one or more predictor variables (x), while a high-influence outlier is a point that, if removed, would cause a large change in the regression model's coefficients. A point can have high leverage without being influential.
Explanation:
Leverage is a measure determined solely by the predictor variables (x values). A point has high leverage if its x values are far from the center of the other points' x values. Influence is a measure of how much the model's parameters (e.g., coefficients) change when a point is removed. Influence is a function of both leverage and the size of the point's residual. A point with high leverage can be non-influential if it lies close to the regression line determined by the other points (i.e., it has a small residual). However, a high-leverage point with a large residual will almost always be highly influential.
Incorrect! Try again.
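The distinction can be demonstrated by planting a point with an extreme x value that sits on the true line: its hat-matrix leverage is by far the largest, yet deleting it barely moves the fitted slope (all data here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, size=30)
y = 2.0 * x + 1.0 + rng.normal(scale=0.2, size=30)

# Add a high-leverage point that sits ON the true line (tiny residual).
x = np.append(x, 50.0)
y = np.append(y, 2.0 * 50.0 + 1.0)

X = np.column_stack([np.ones_like(x), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix
leverage = np.diag(H)
print(leverage.argmax())                   # 30: the extreme-x point

def slope(xs, ys):
    A = np.column_stack([np.ones_like(xs), xs])
    return np.linalg.lstsq(A, ys, rcond=None)[0][1]

# Removing the high-leverage, on-line point barely changes the slope:
change = abs(slope(x, y) - slope(x[:-1], y[:-1]))
print(change)   # small -> high leverage, but low influence
```

Moving that same point far off the line (a large residual) would make it highly influential, which is the Cook's-distance intuition.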
57You are creating polynomial features of degree 3 from a set of 10 original features. The original features have vastly different scales (e.g., age from 20-60, income from 50,000-200,000). Why is it critically important to scale the original features before applying the polynomial expansion?
Polynomial feature expansion
Hard
A.To prevent features with large scales from numerically dominating the polynomial terms, which can cause instability in model fitting algorithms and render regularization ineffective for small-scale features.
B.To ensure that the resulting design matrix is orthogonal, which simplifies the calculation of the OLS coefficients.
C.Because polynomial expansion is only mathematically defined for features scaled between 0 and 1.
D.To reduce the total number of polynomial features generated, as scaling can merge redundant terms.
Correct Answer: To prevent features with large scales from numerically dominating the polynomial terms, which can cause instability in model fitting algorithms and render regularization ineffective for small-scale features.
Explanation:
Consider income (x1) on a scale of about 10^5 and age (x2) on a scale of about 10^1. The polynomial term x1³ will be on the order of 10^15, while x2³ will be on the order of 10^4. When these terms are fed into a linear model, the huge disparity in scale can lead to numerical precision issues. Furthermore, for regularized models like Ridge or Lasso, the single penalty term is applied to all coefficients. The penalty will be dominated by the coefficients of the large-scale features, effectively ignoring the small-scale features. Scaling first (e.g., using StandardScaler) ensures all original and generated polynomial features are on a comparable scale, leading to stable training and fair regularization.
Incorrect! Try again.
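The numerical side of this argument shows up in the condition number of the expanded design matrix (powers only, interaction terms omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(8)
age = rng.uniform(20, 60, size=200)
income = rng.uniform(50_000, 200_000, size=200)

def poly_design(a, b):
    """Degree-3 power terms for two features (no interactions, for brevity)."""
    return np.column_stack([a, b, a**2, b**2, a**3, b**3])

X_raw = poly_design(age, income)

# Standardize the ORIGINAL features first, then expand.
za = (age - age.mean()) / age.std()
zi = (income - income.mean()) / income.std()
X_scaled = poly_design(za, zi)

print(np.linalg.cond(X_raw))     # astronomically large: numerically fragile
print(np.linalg.cond(X_scaled))  # orders of magnitude smaller
```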
58You are building an autoregressive model to forecast a time series. You notice the series has a clear upward trend and a strong seasonal pattern. What is the most severe consequence of fitting a standard AR(p) model directly to this non-stationary data without any transformation?
Time-series Regression models
Hard
A.The model will perfectly fit the training data but will only be able to forecast the mean of the series for all future time steps.
B.The model will fail to compute because the time-series matrix will be singular due to the deterministic trend.
C.The model's residuals will be perfectly normally distributed, but the coefficient estimates will be biased towards zero due to the trend.
D.The model will likely produce a spurious regression, where variables appear to have a statistically significant relationship that is driven by the common trend, not a true causal link, leading to unreliable forecasts.
Correct Answer: The model will likely produce a spurious regression, where variables appear to have a statistically significant relationship that is driven by the common trend, not a true causal link, leading to unreliable forecasts.
Explanation:
A key assumption for many time-series models, including AR(p), is that the underlying series is stationary (i.e., its statistical properties like mean and variance do not change over time). A series with a trend and seasonality is non-stationary. Fitting a model directly to such data can lead to a spurious regression. The model might find a very high R-squared and seemingly significant coefficients simply because the target (y_t) and the lagged predictors (y_{t-1}, ..., y_{t-p}) are all increasing together due to the common trend. This relationship is not real and will break down for out-of-sample forecasting. The standard practice is to first difference the series (and/or take seasonal differences) to make it stationary before fitting the model.
Incorrect! Try again.
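A sketch of a spurious regression: two independently generated series that share only a deterministic trend produce a near-perfect R², which collapses after differencing:

```python
import numpy as np

rng = np.random.default_rng(9)
t = np.arange(300)

# Two series that share only a deterministic upward trend -- no causal link.
a = 0.5 * t + rng.normal(scale=2.0, size=t.size)
b = 1.3 * t + rng.normal(scale=2.0, size=t.size)

X = np.column_stack([np.ones_like(t), a])
coef, *_ = np.linalg.lstsq(X, b, rcond=None)
r2 = 1 - (b - X @ coef).var() / b.var()
print(r2)        # near 1: a 'great' fit driven entirely by the common trend

# Differencing removes the trend; the relationship largely evaporates.
da, db = np.diff(a), np.diff(b)
Xd = np.column_stack([np.ones_like(da), da])
cd, *_ = np.linalg.lstsq(Xd, db, rcond=None)
r2_diff = 1 - (db - Xd @ cd).var() / db.var()
print(r2_diff)   # near 0: the 'relationship' was the trend
```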
59You have trained a Random Forest Regressor on a dataset where the feature X ranges from 0 to 100. The model has learned the relationship well within this range. What value will the trained model most likely predict for a new data point with X = 200?
Tree-Based regression models
Hard
A.A value close to the overall average of the target variable across the entire training set.
B.A value close to the average prediction for the training instances where X was at its maximum observed value (near 100).
C.The model will return a NaN or an error because the value is outside the training domain.
D.It will extrapolate the learned trend linearly and predict a value significantly higher than any seen in the training data.
Correct Answer: A value close to the average prediction for the training instances where X was at its maximum observed value (near 100).
Explanation:
Tree-based models, including Random Forests, cannot extrapolate beyond the range of the training data. A prediction is made by averaging the values in the terminal leaves where a new data point falls. For a value of X=200, which is outside the training range of [0, 100], the data point will traverse each tree down a path. At any split on X, since 200 is greater than any split point based on X in the training data, it will always follow the same branch as a data point with the maximum X value from training. It will therefore end up in the same terminal leaves as the training points with the largest X values. The final prediction will be the average of the target values in those leaves, effectively capping the prediction at the level seen at the edge of the training data.
Incorrect! Try again.
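The routing argument can be mimicked with a hand-built stand-in for one tree's splits along X (thresholds and leaf means are hypothetical): any x beyond the largest learned threshold lands in the same rightmost leaf as the largest training values:

```python
import numpy as np

# A tiny regression 'tree' stored as sorted split thresholds on X with a
# leaf mean for each resulting interval (hypothetical fitted values).
thresholds = np.array([25.0, 50.0, 75.0])     # learned from X in [0, 100]
leaf_means = np.array([10.0, 30.0, 55.0, 80.0])

def tree_predict(x):
    # Route x through the splits: count how many thresholds it exceeds.
    return leaf_means[np.searchsorted(thresholds, x)]

print(tree_predict(90.0))    # 80.0: rightmost leaf
print(tree_predict(200.0))   # 80.0 again -- same leaf, no extrapolation
```

A Random Forest averages many such trees, so the same capping happens in every tree and therefore in the ensemble.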
60The ability of Lasso regression (L1 penalty) to produce sparse models (i.e., set some coefficients to exactly zero) is often explained by its geometric interpretation. In a two-coefficient case (β1, β2), what is the key geometric property of the Lasso constraint region (|β1| + |β2| ≤ t) that leads to this sparsity?
Regularized Regression models
Hard
A.The constraint region is a circle (β1² + β2² ≤ t²), which allows the RSS contours to touch tangentially at a point where both coefficients are non-zero.
B.The constraint region is an unbounded square, which allows coefficients to be pushed to exactly zero without violating the constraint.
C.The constraint region is a rhombus with sharp corners at the axes. The elliptical contours of the residual sum of squares (RSS) are likely to make their first contact with the constraint region at one of these corners.
D.The constraint region is a non-convex shape, which creates multiple local minima, some of which are on the axes where coefficients are zero.
Correct Answer: The constraint region is a rhombus with sharp corners at the axes. The elliptical contours of the residual sum of squares (RSS) are likely to make their first contact with the constraint region at one of these corners.
Explanation:
The solution to a regularized regression problem is the point where the elliptical contours of the RSS (centered at the OLS solution) first touch the boundary of the constraint region defined by the penalty. For Ridge regression (L2 penalty), the constraint is a circle (in 2D), a smooth shape. For Lasso (L1 penalty), the constraint is a rhombus (a diamond shape in 2D). This shape has sharp corners that lie exactly on the axes (e.g., at β1 = 0 or β2 = 0). It is geometrically probable that the expanding RSS ellipse will hit one of these sharp corners before it touches any of the flat sides. A point on a corner means one of the coefficients is exactly zero, thus inducing sparsity.