1. Which of the following best describes the fundamental difference between Regression and Classification?
A. Regression predicts discrete class labels, while classification predicts continuous values.
B. Regression predicts continuous quantities, while classification predicts discrete class labels.
C. Regression uses unsupervised learning, while classification uses supervised learning.
D. Regression requires categorical features, while classification requires numerical features.
Correct Answer: Regression predicts continuous quantities, while classification predicts discrete class labels.
Explanation: Regression models target a continuous numerical output (e.g., house price), whereas classification models target a categorical or discrete label (e.g., spam or not spam).
2. Which loss function is most commonly used for Simple Linear Regression?
A. Cross-Entropy Loss
B. Hinge Loss
C. Mean Squared Error (MSE)
D. Kullback-Leibler Divergence
Correct Answer: Mean Squared Error (MSE)
Explanation: Linear regression typically minimizes the residual sum of squares, which is equivalent to minimizing the Mean Squared Error (MSE), defined as MSE = (1/n) Σᵢ (yᵢ − ŷᵢ)².
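To make the definition concrete, here is a minimal NumPy sketch with made-up targets and predictions:

```python
import numpy as np

# Toy targets and predictions (values are purely illustrative).
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])

# MSE = (1/n) * sum((y_i - yhat_i)^2)
mse = np.mean((y_true - y_pred) ** 2)
print(mse)
```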
3. In the context of the Bias-Variance trade-off, what does high bias usually indicate?
A. The model is overfitting the training data.
B. The model is too complex and captures random noise.
C. The model is underfitting and fails to capture the underlying trend.
D. The model has high variance across different training sets.
Correct Answer: The model is underfitting and fails to capture the underlying trend.
Explanation: High bias implies that the algorithm makes strong simplifying assumptions (like assuming data is linear when it is quadratic), leading to underfitting.
4. Which evaluation metric is defined as 1 − (SS_res / SS_tot)?
A. Root Mean Squared Error (RMSE)
B. R-squared (R²)
C. Mean Absolute Error (MAE)
D. Adjusted R-squared
Correct Answer: R-squared (R²)
Explanation: The R² score (coefficient of determination) represents the proportion of the variance of the dependent variable that is explained by the independent variable(s) in a regression model.
5. In Simple Linear Regression, represented by y = β₀ + β₁x + ε, what does β₁ represent?
A. The y-intercept when x = 0.
B. The irreducible error term.
C. The change in y for a one-unit increase in x.
D. The variance of the error term.
Correct Answer: The change in y for a one-unit increase in x.
Explanation: β₁ is the slope coefficient, indicating how much the target variable changes when the predictor variable increases by exactly one unit.
6. What happens to the variance of a regression model as model complexity increases?
A. Variance decreases.
B. Variance increases.
C. Variance remains constant.
D. Variance becomes zero.
Correct Answer: Variance increases.
Explanation: As a model becomes more complex (e.g., a higher-degree polynomial), it becomes more sensitive to specific fluctuations in the training data, leading to higher variance.
7. Which of the following is a key assumption of Ordinary Least Squares (OLS) regression?
A. The relationship between X and Y is non-linear.
B. The residuals (errors) have constant variance (Homoscedasticity).
C. The independent variables must be highly correlated with each other.
D. The target variable must be categorical.
Correct Answer: The residuals (errors) have constant variance (Homoscedasticity).
Explanation: Homoscedasticity is the assumption that the variance of the error terms is constant across all levels of the independent variables.
8. In Multiple Linear Regression, what problem arises when independent variables are highly correlated with each other?
A. Homoscedasticity
B. Multicollinearity
C. Underfitting
D. Non-stationarity
Correct Answer: Multicollinearity
Explanation: Multicollinearity occurs when independent variables in a regression model are highly correlated. This makes it difficult to determine the individual effect of each independent variable on the dependent variable.
9. How does Ridge Regression (L2 regularization) modify the OLS cost function?
A. It adds the sum of the absolute values of coefficients: λ Σ |βⱼ|
B. It adds the sum of the squared values of coefficients: λ Σ βⱼ²
C. It multiplies the error by a factor of λ.
D. It adds a constant bias term to the prediction.
Correct Answer: It adds the sum of the squared values of coefficients: λ Σ βⱼ²
Explanation: Ridge regression adds an L2 penalty term equal to the square of the magnitude of the coefficients (λ Σ βⱼ²) to shrink the coefficients.
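A small NumPy sketch of that modified objective, with toy data chosen so the arithmetic is easy to check by hand:

```python
import numpy as np

# A sketch of the Ridge objective: RSS + lambda * sum(beta_j^2).
def ridge_cost(beta, X, y, lam):
    residuals = y - X @ beta
    return residuals @ residuals + lam * (beta @ beta)

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([1.0, 2.0])

print(ridge_cost(beta, X, y, 0.0))  # RSS alone: 0.0, since beta fits exactly
print(ridge_cost(beta, X, y, 0.5))  # adds 0.5 * (1^2 + 2^2) = 2.5
```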
10. What is the primary advantage of Lasso Regression (L1 regularization) over Ridge Regression?
A. It always produces a higher R² score.
B. It is computationally faster to solve.
C. It can force coefficients to exactly zero, performing feature selection.
D. It works better when multicollinearity is not present.
Correct Answer: It can force coefficients to exactly zero, performing feature selection.
Explanation: Due to the geometry of the L1 penalty (a diamond-shaped constraint region), Lasso regression tends to shrink less important coefficients to exactly zero, effectively selecting a subset of features.
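One way to see this zeroing behavior is the soft-thresholding operator used inside Lasso coordinate-descent solvers; a NumPy sketch with invented input values:

```python
import numpy as np

# Soft-thresholding, the L1 proximal step in Lasso coordinate descent:
# any coefficient whose magnitude is below lam is set exactly to zero.
def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

coefs = soft_threshold(np.array([0.3, -0.05, 2.0]), 0.1)
print(coefs)  # the middle coefficient is driven exactly to zero
```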
11. The combination of L1 and L2 regularization is known as:
A. Ridge Regression
B. Lasso Regression
C. Elastic Net
D. Logistic Regression
Correct Answer: Elastic Net
Explanation: Elastic Net is a regularized regression method that linearly combines the L1 and L2 penalties of the Lasso and Ridge methods.
12. If the regularization parameter λ is set to 0 in Ridge or Lasso regression, the model becomes equivalent to:
A. A constant mean predictor.
B. Ordinary Least Squares (OLS) Linear Regression.
C. A Decision Tree Regressor.
D. Support Vector Regression.
Correct Answer: Ordinary Least Squares (OLS) Linear Regression.
Explanation: When λ = 0, the penalty term vanishes, and the objective function reduces to minimizing the Residual Sum of Squares, which is standard OLS.
13. What is the impact of increasing the regularization parameter λ on model bias and variance?
Correct Answer: Bias increases and variance decreases.
Explanation: Increasing λ restricts the model's flexibility. This reduces overfitting (lower variance) but strengthens the model's simplifying assumptions, potentially leading to underfitting (higher bias).
14. Polynomial regression fits a non-linear relationship by:
A. Using a non-linear loss function.
B. Transforming the input features into higher-degree terms and fitting a linear model.
C. Using a tree-based structure.
D. Applying a sigmoid activation function to the output.
Correct Answer: Transforming the input features into higher-degree terms and fitting a linear model.
Explanation: Polynomial regression extends linear regression by adding interaction terms and powers of the original features (e.g., x², x³) as new features, but the parameters are still estimated linearly.
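A short NumPy sketch of this idea: expand x into polynomial columns, then solve an ordinary linear least-squares problem (the quadratic's coefficients are chosen arbitrarily):

```python
import numpy as np

# Noise-free quadratic data: y = 1 + 2x + 3x^2.
x = np.linspace(-1.0, 1.0, 20)
y = 1 + 2 * x + 3 * x ** 2

# Expand x into polynomial columns [1, x, x^2], then solve an ordinary
# *linear* least-squares problem -- the model is linear in its parameters.
X = np.column_stack([np.ones_like(x), x, x ** 2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # recovers [1, 2, 3] up to floating-point error
```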
15. In a regression model y = β₀ + β₁x₁ + β₂x₂ + ε, how is the coefficient β₁ interpreted?
A. The value of y when x₁ is zero.
B. The change in y for a one-unit change in x₁, holding x₂ constant.
C. The correlation between x₁ and y.
D. The change in y for a one-unit change in x₁, regardless of x₂.
Correct Answer: The change in y for a one-unit change in x₁, holding x₂ constant.
Explanation: In multiple regression, a coefficient represents the partial effect of that variable on the response, assuming all other variables in the model remain fixed.
16. Why is 'Adjusted R²' often preferred over standard 'R²' in Multiple Linear Regression?
A. Standard R² cannot be calculated for multiple variables.
B. Standard R² decreases when useful variables are added.
C. Standard R² never decreases when a new variable is added, even if it is irrelevant.
D. Adjusted R² is always between 0 and 1.
Correct Answer: Standard R² never decreases when a new variable is added, even if it is irrelevant.
Explanation: Adjusted R² penalizes the addition of extraneous predictors to the model, whereas standard R² will monotonically increase (or stay the same) with every added feature regardless of its predictive power.
17. Which of the following is a characteristic of Tree-Based Regression models (e.g., Decision Trees)?
A. They assume a linear relationship between features and target.
B. They produce piecewise constant predictions.
Correct Answer: They produce piecewise constant predictions.
Explanation: Decision trees split the feature space into rectangles and predict a constant value (usually the mean of the training samples in that region) for each region, resulting in a piecewise constant output.
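A depth-1 "tree" (a stump) makes the piecewise-constant behavior easy to see; a toy sketch with invented data and an arbitrary split threshold:

```python
import numpy as np

# A depth-1 regression "tree" (a stump): one threshold split, predicting
# the mean of y in each region.
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([1.0, 1.5, 0.5, 5.0, 5.5, 4.5])

threshold = 5.0  # hypothetical split point
left_mean = y[x <= threshold].mean()
right_mean = y[x > threshold].mean()

def predict(v):
    # Piecewise constant: the same value for every point in a region.
    return left_mean if v <= threshold else right_mean

print(predict(2.5), predict(11.0))  # 1.0 5.0
```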
18. In a Decision Tree Regressor, what criterion is typically minimized to determine the best split?
A. Gini Impurity
B. Entropy
C. Variance (or MSE) of the target values in the child nodes.
D. Log-Likelihood
Correct Answer: Variance (or MSE) of the target values in the child nodes.
Explanation: For regression trees, the splitting criterion usually seeks to minimize the variance (or sum of squared errors) within the resulting leaf nodes.
19. What is 'lag' in the context of Time-Series Regression?
A. The time taken to train the model.
B. Using past values of the target variable (yₜ₋₁, yₜ₋₂, …) as features to predict the current value (yₜ).
C. The error between the predicted and actual value.
D. The seasonality period of the data.
Correct Answer: Using past values of the target variable (yₜ₋₁, yₜ₋₂, …) as features to predict the current value (yₜ).
Explanation: Lag features involve shifting the time series so that past observations are used as input features for forecasting current or future observations.
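A tiny pure-Python sketch of lag-feature construction (the series values are invented):

```python
# Build a lag-1 feature from a toy series: use y(t-1) to predict y(t).
series = [10, 12, 13, 15, 16]

# Pair each observation with its previous value; the first point has no lag.
pairs = [(series[t - 1], series[t]) for t in range(1, len(series))]
print(pairs)  # [(10, 12), (12, 13), (13, 15), (15, 16)]
```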
20. Which issue is specific to Time-Series Regression that violates the standard independent and identically distributed (i.i.d.) assumption?
A. Multicollinearity
B. Autocorrelation
C. Heteroscedasticity
D. High Dimensionality
Correct Answer: Autocorrelation
Explanation: In time series, current values are often correlated with past values (autocorrelation), violating the assumption that observations are independent of one another.
21. Consider the equation y = β₀ + β₁x + β₂x² + ε. This is an example of:
A. Multiple Linear Regression
B. Polynomial Regression
C. Logistic Regression
D. Ridge Regression
Correct Answer: Polynomial Regression
Explanation: The inclusion of the x² term makes this a polynomial regression model (specifically, quadratic).
22. When interpreting coefficients, if a variable has a p-value < 0.05, it generally means:
A. The variable has no effect on the target.
B. The variable is statistically significant in predicting the target.
C. The variable is highly correlated with other variables.
D. The variable causes overfitting.
Correct Answer: The variable is statistically significant in predicting the target.
Explanation: A low p-value (typically < 0.05) indicates that we can reject the null hypothesis that the coefficient is zero, suggesting a significant relationship.
23. What is the main risk of using a high-degree polynomial for regression without regularization?
A. High Bias
B. Low Variance
C. Overfitting
D. Underfitting
Correct Answer: Overfitting
Explanation: High-degree polynomials provide excessive flexibility, allowing the curve to pass through nearly every data point (including noise), resulting in severe overfitting.
24. Which method is commonly used to select the optimal regularization parameter λ?
A. Gradient Descent
B. Cross-Validation
C. Maximum Likelihood Estimation
D. Calculating the derivative of the loss function
Correct Answer: Cross-Validation
Explanation: Grid search or randomized search combined with cross-validation is the standard approach to empirically find the λ that generalizes best to unseen data.
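A hand-rolled sketch of selecting λ by k-fold cross-validation, using the closed-form ridge solution on synthetic data (the data, fold scheme, and candidate λ values are all illustrative choices, not a prescribed recipe):

```python
import numpy as np

# Synthetic linear data with noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=100)

def ridge_fit(X_tr, y_tr, lam):
    # Closed-form ridge solution: beta = (X^T X + lam * I)^{-1} X^T y
    p = X_tr.shape[1]
    return np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(p), X_tr.T @ y_tr)

def cv_mse(lam, k=5):
    # Average held-out MSE over k contiguous folds.
    errs = []
    for fold in np.array_split(np.arange(len(y)), k):
        mask = np.ones(len(y), dtype=bool)
        mask[fold] = False
        beta = ridge_fit(X[mask], y[mask], lam)
        errs.append(np.mean((y[fold] - X[fold] @ beta) ** 2))
    return float(np.mean(errs))

candidates = [0.01, 0.1, 1.0, 10.0]
best_lam = min(candidates, key=cv_mse)
print(best_lam)
```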
25. What is the closed-form solution (Normal Equation) for the coefficient vector β in OLS?
Correct Answer: β̂ = (XᵀX)⁻¹Xᵀy
Explanation: The Normal Equation β̂ = (XᵀX)⁻¹Xᵀy provides the analytical solution to the least squares problem by minimizing the cost function with respect to β.
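A direct NumPy transcription of the Normal Equation on noise-free toy data:

```python
import numpy as np

# Toy data generated from y = 2 + 3x (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept column
y = 2 + 3 * x

# beta = (X^T X)^{-1} X^T y
beta = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta)  # approximately [2, 3]
```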
26. Which regression technique is most robust to outliers?
A. Ordinary Least Squares (OLS)
B. L2 Regularized Regression (Ridge)
C. Decision Tree Regression
D. Polynomial Regression (High degree)
Correct Answer: Decision Tree Regression
Explanation: Decision trees are generally more robust to outliers than linear regression models because they split data on thresholds and don't rely on global squared-error minimization, which penalizes outliers heavily.
27. In Time-Series regression, why is random shuffling of data for a Train/Test split inappropriate?
A. It is computationally expensive.
B. It introduces look-ahead bias (data leakage).
C. It increases the variance of the model.
D. It reduces the dimensionality of the data.
Correct Answer: It introduces look-ahead bias (data leakage).
Explanation: Time-series data has a temporal order. Random splitting might allow the model to train on future data to predict past events, which is impossible in real-world forecasting (data leakage).
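The chronological split can be sketched in a few lines (the 80/20 ratio is an arbitrary illustrative choice):

```python
# Chronological train/test split for time-series data: never shuffle.
data = list(range(100))           # observations already in time order
split = int(len(data) * 0.8)      # 80/20 is an arbitrary ratio

train, test = data[:split], data[split:]
print(len(train), len(test))

# Every training observation precedes every test observation,
# so the model never sees the future.
assert max(train) < min(test)
```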
28. If the Residual Plot (residuals vs. predicted values) shows a funnel shape (fanning out), what assumption is violated?
A. Linearity
B. Independence
C. Homoscedasticity
D. Normality
Correct Answer: Homoscedasticity
Explanation: A funnel shape indicates that the variance of the errors changes as the predicted value changes (Heteroscedasticity), violating the assumption of constant variance (Homoscedasticity).
29. Which of the following creates a 'stair-step' approximation of the regression function?
A. Linear Regression
B. Ridge Regression
C. Decision Trees
D. Polynomial Regression
Correct Answer: Decision Trees
Explanation: Decision trees partition the feature space into regions and assign a constant value to each region, resulting in a step-function or stair-step appearance.
30. What is the effect of feature scaling (Standardization) on Ordinary Least Squares (OLS) regression coefficients?
A. It changes the predictions (ŷ).
B. It changes the interpretation and magnitude of coefficients but not the model performance metrics (e.g., R²).
C. It is strictly required for OLS to work mathematically.
D. It prevents overfitting.
Correct Answer: It changes the interpretation and magnitude of coefficients but not the model performance metrics (e.g., R²).
Explanation: For OLS, scaling changes the scale of the coefficients, but the fitted line/plane and the predictions remain the same. For regularized regression (Ridge/Lasso), however, scaling is crucial.
31. Why is feature scaling essential for Regularized Regression (Ridge/Lasso)?
A. Because the penalty term is scale-dependent.
B. Because it uses Gradient Descent.
C. Because the closed-form solution doesn't exist.
D. To remove outliers.
Correct Answer: Because the penalty term is scale-dependent.
Explanation: Regularization penalties (λ Σ |βⱼ| or λ Σ βⱼ²) penalize the magnitude of coefficients. Features measured on smaller scales need larger coefficients to have the same effect, so without scaling the penalty shrinks them disproportionately.
32. The Root Mean Squared Error (RMSE) is in the same unit as:
A. The square of the target variable.
B. The target variable.
C. The variance of the target variable.
D. Dimensionless (no units).
Correct Answer: The target variable.
Explanation: Since MSE involves squaring the errors, taking the square root (RMSE) brings the metric back to the original units of the target variable y.
33. Which technique is specifically designed to handle seasonality in Time-Series Regression?
A. Adding polynomial features.
B. Using seasonal dummy variables or Fourier terms.
C. Increasing the regularization parameter.
D. Using Lasso regression.
Correct Answer: Using seasonal dummy variables or Fourier terms.
Explanation: Seasonality represents periodic patterns. These can be modeled by adding features like dummy variables for months/days or Fourier series components.
34. In the equation y = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂, the term x₁x₂ is called:
A. A quadratic term.
B. An interaction term.
C. A bias term.
D. A regularization term.
Correct Answer: An interaction term.
Explanation: An interaction term (x₁x₂) allows the effect of one independent variable (x₁) on y to depend on the value of another independent variable (x₂).
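A small NumPy sketch: with the product column x₁x₂ added to the design matrix, ordinary least squares recovers an interaction effect on noise-free toy data (the ground-truth coefficients are invented):

```python
import numpy as np

# Ground truth (invented): y = 1 + 2*x1 + 3*x2 + 4*x1*x2, no noise.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([0.0, 1.0, 0.0, 1.0])
y = 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

# Design matrix with an explicit interaction column x1*x2.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 6))  # [1, 2, 3, 4]
```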
35. What does a negative coefficient for a predictor variable imply?
A. There is no relationship.
B. As the predictor increases, the target variable tends to decrease.
C. As the predictor increases, the target variable tends to increase.
D. The predictor is an error term.
Correct Answer: As the predictor increases, the target variable tends to decrease.
Explanation: A negative slope/coefficient indicates an inverse relationship between the independent variable and the dependent variable.
36. The term 'Residual' in regression refers to:
A. The predicted value.
B. The difference between the actual value and the predicted value (yᵢ − ŷᵢ).
C. The mean of the target variable.
D. The intercept of the regression line.
Correct Answer: The difference between the actual value and the predicted value (yᵢ − ŷᵢ).
Explanation: Residuals represent the model's error for a specific data point, calculated as Observed minus Predicted.
37. Which algorithm creates an ensemble of regression trees to improve performance and reduce variance?
A. Simple Linear Regression
B. Random Forest Regression
C. Logistic Regression
D. K-Means Clustering
Correct Answer: Random Forest Regression
Explanation: Random Forest constructs a multitude of decision trees at training time and outputs the average prediction of the individual trees, effectively reducing variance.
38. Stationarity in time-series data implies:
A. The mean and variance change over time.
B. The statistical properties (mean, variance) are constant over time.
C. The data is strictly linear.
D. There are no missing values.
Correct Answer: The statistical properties (mean, variance) are constant over time.
Explanation: Many time-series models (like ARIMA, though regression can handle some non-stationarity via features) assume stationarity, meaning the data-generating process doesn't drift or expand in variance over time.
39. Which feature expansion method can approximate any continuous function given a high enough degree?
A. One-hot encoding
B. Polynomial feature expansion
C. Standard scaling
D. Interaction-only expansion
Correct Answer: Polynomial feature expansion
Explanation: According to the Stone-Weierstrass theorem, polynomials can approximate any continuous function on a closed interval to any degree of accuracy, provided the degree is high enough.
40. If your regression model has a training MSE of 10 and a testing MSE of 50, the model is likely:
A. Underfitting
B. Overfitting
C. Ideally fit
D. Biased
Correct Answer: Overfitting
Explanation: A large gap between training performance (good/low error) and testing performance (bad/high error) is the hallmark of overfitting.
41. Lasso regression solves a convex optimization problem. The constraint region for Lasso is shaped like a:
A. Circle/Sphere
B. Diamond/Polyhedron
C. Hyperbola
D. Parabola
Correct Answer: Diamond/Polyhedron
Explanation: The L1 norm constraint (Σ |βⱼ| ≤ t) forms a diamond in 2D (a polyhedron in higher dimensions), which has corners where coefficients can become exactly zero.
42. The Ridge regression constraint region is shaped like a:
A. Circle/Sphere
B. Diamond
C. Square
D. Triangle
Correct Answer: Circle/Sphere
Explanation: The L2 norm constraint (Σ βⱼ² ≤ t) forms a circle in 2D (a hypersphere in higher dimensions), which rarely touches the axes, thus rarely setting coefficients to exactly zero.
43. In the context of Decision Tree Regression, what is 'Pruning'?
A. Adding more features to the dataset.
B. Removing branches from the tree to reduce complexity and overfitting.
C. Increasing the depth of the tree.
D. Changing the loss function.
Correct Answer: Removing branches from the tree to reduce complexity and overfitting.
Explanation: Pruning reduces the size of a decision tree by removing sections that provide little predictive power, thereby reducing overfitting.
44. Which of the following is NOT a metric for Regression?
A. Mean Squared Error
B. Accuracy
C. Mean Absolute Error
D. R-squared
Correct Answer: Accuracy
Explanation: Accuracy (percentage of correct predictions) is a metric for Classification. In regression, we measure the distance of the error (MSE, MAE) rather than 'correct/incorrect' hits.
45. When performing Polynomial Regression, if the degree is set too low (e.g., degree 1 for curved data), the model suffers from:
A. High Variance
B. High Bias
C. Multicollinearity
D. Overfitting
Correct Answer: High Bias
Explanation: Setting the degree too low restricts the model to a linear fit, which cannot capture the curvature of the data, resulting in high bias (underfitting).
46. What is the key difference between Gradient Boosting Regressors and Random Forests?
A. Random Forests are sequential; Gradient Boosting is parallel.
B. Random Forests build trees independent of each other; Gradient Boosting builds trees sequentially to correct previous errors.
C. Random Forests cannot handle regression.
D. Gradient Boosting increases variance.
Correct Answer: Random Forests build trees independent of each other; Gradient Boosting builds trees sequentially to correct previous errors.
Explanation: Gradient Boosting relies on boosting (sequential correction of residuals), while Random Forests rely on bagging (independent parallel trees averaged together).
47. Which of the following transformations helps in handling non-linear relationships in a Linear Regression framework?
A. Log transformation of the target or features.
B. Min-Max Scaling.
C. Standardization.
D. Ridge Regularization.
Correct Answer: Log transformation of the target or features.
Explanation: Applying non-linear transformations like Log, Square Root, or Exponential to inputs or outputs can linearize the relationship between variables, allowing linear regression to fit the data better.
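A quick NumPy sketch of log-linearization: exponential data becomes a straight line after taking logs, so a plain linear fit recovers the parameters (the growth curve here is invented):

```python
import numpy as np

# Invented exponential data: y = 2 * exp(0.5x).
# After logs: log(y) = log(2) + 0.5x, which is linear in x.
x = np.linspace(0.0, 5.0, 10)
y = 2.0 * np.exp(0.5 * x)

# A plain linear fit on (x, log y) recovers both parameters.
slope, intercept = np.polyfit(x, np.log(y), 1)
print(round(float(slope), 6), round(float(np.exp(intercept)), 6))  # 0.5 2.0
```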
48. In Elastic Net, if the mixing parameter α = 1 (where the penalty is λ[α Σ |βⱼ| + (1 − α) Σ βⱼ²]), the model becomes:
A. Ridge Regression
B. Lasso Regression
C. OLS
D. None of the above
Correct Answer: Lasso Regression
Explanation: When the mixing ratio assigns 100% of the weight to the L1 penalty, Elastic Net effectively becomes Lasso Regression.
49. What does the intercept term β₀ represent geometrically in Simple Linear Regression?
A. The slope of the line.
B. The point where the regression line crosses the Y-axis.
C. The point where the regression line crosses the X-axis.
D. The mean of the X values.
Correct Answer: The point where the regression line crosses the Y-axis.
Explanation: The intercept β₀ is the value of y when x = 0, which corresponds to the intersection with the vertical (Y) axis.
50. Which technique decomposes a time series into Trend, Seasonality, and Residual components?
A. Polynomial Expansion
B. Seasonal Decomposition
C. Regularization
D. Gradient Descent
Correct Answer: Seasonal Decomposition
Explanation: Seasonal decomposition separates a time series into a trend component (long-term progression), a seasonal component (periodic pattern), and a residual (noise) component.
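A naive additive decomposition can be sketched in NumPy: fit a linear trend, average the detrended values at each phase for the seasonal component, and keep the remainder as residual (the toy series and its period are invented):

```python
import numpy as np

# Invented series with period 4: linear trend + repeating pattern, no noise.
period = 4
t = np.arange(24)
seasonal = np.array([1.0, -1.0, -1.0, 1.0])
series = 0.5 * t + np.tile(seasonal, 6)

# 1) Trend: fit a straight line through the series.
slope, intercept = np.polyfit(t, series, 1)
trend = slope * t + intercept

# 2) Seasonality: average the detrended values at each phase of the cycle.
detrended = series - trend
est_seasonal = detrended.reshape(-1, period).mean(axis=0)

# 3) Residual: whatever trend and seasonality do not explain.
residual = detrended - np.tile(est_seasonal, 6)
print(np.round(est_seasonal, 2))
```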