1. In the context of supervised learning, what distinguishes a regression problem from a classification problem?
A. The target variable is continuous.
B. The target variable is categorical.
C. The input features must be categorical.
D. Regression requires unsupervised data.
Correct Answer: The target variable is continuous.
Explanation: Regression models predict a continuous target variable (e.g., house prices, temperature), whereas classification models predict categorical class labels.
2. Which visualization tool is most commonly used in Exploratory Data Analysis (EDA) to visualize the linear relationship between a single feature and the target variable?
A. Histogram
B. Scatter plot
C. Pie chart
D. Box plot
Correct Answer: Scatter plot
Explanation: A scatter plot maps one variable on the x-axis and the other on the y-axis, making it ideal for visualizing the correlation and relationship between two continuous variables.
3. When analyzing the relationship between multiple variables in a dataset, which matrix helps quantify the linear correlation between every pair of features?
A. Confusion matrix
B. Hessian matrix
C. Correlation matrix
D. Covariance matrix
Correct Answer: Correlation matrix
Explanation: A correlation matrix shows the correlation coefficients between variables, measuring the strength and direction of linear relationships. It is often visualized as a heatmap.
4. In a simple linear regression model $\hat{y} = w_0 + w_1 x$, what does $w_0$ represent?
A. The slope of the line
B. The y-intercept
C. The residual error
D. The learning rate
Correct Answer: The y-intercept
Explanation: $w_0$ (often denoted as $b$ or $\beta_0$) is the bias term or y-intercept, representing the predicted value of $y$ when $x$ is 0.
5. Which Scikit-Learn class is used to perform Ordinary Least Squares (OLS) linear regression?
Correct Answer: sklearn.linear_model.LinearRegression
Explanation: sklearn.linear_model.LinearRegression fits a linear model with coefficients chosen to minimize the residual sum of squares between the observed targets and the predicted targets; a minimal example follows.
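For reference, a minimal sketch of the LinearRegression API (the synthetic data and seed are invented purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative synthetic data: y = 3x + 2 plus Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + 2 + rng.normal(scale=0.5, size=100)

model = LinearRegression()
model.fit(X, y)
print(model.coef_)       # estimated slope, close to 3
print(model.intercept_)  # estimated bias term, close to 2
```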
6. What is the objective function that Ordinary Least Squares (OLS) minimizes?
A. Sum of Squared Errors (SSE)
B. Mean Absolute Error (MAE)
C. Hinge Loss
D. Cross-Entropy Loss
Correct Answer: Sum of Squared Errors (SSE)
Explanation: OLS minimizes the Sum of Squared Errors (SSE), also known as the Residual Sum of Squares (RSS), defined as $\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
7. What is the primary motivation for using the RANSAC (RANdom SAmple Consensus) algorithm in regression?
A. To increase the speed of training.
B. To fit a model in the presence of a significant number of outliers.
C. To perform feature selection.
D. To handle missing values automatically.
Correct Answer: To fit a model in the presence of a significant number of outliers.
Explanation: RANSAC fits a model to a subset of the data (inliers) and ignores data points that deviate significantly (outliers), making it robust to noisy data.
8. In the RANSAC algorithm, what does the residual_threshold parameter define?
A. The maximum number of iterations.
B. The minimum number of samples required to fit the model.
C. The maximum residual for a data sample to be classified as an inlier.
D. The learning rate of the estimator.
Correct Answer: The maximum residual for a data sample to be classified as an inlier.
Explanation: The residual_threshold sets the cutoff; data points with prediction errors (residuals) smaller than this threshold are considered inliers, as in the sketch below.
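A minimal RANSAC sketch (the data and threshold are illustrative; the estimator parameter name follows recent scikit-learn versions, where it replaced base_estimator):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

# Illustrative data: a clean linear trend with some gross outliers injected
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X.ravel() + 1 + rng.normal(scale=0.3, size=100)
y[:15] += 25  # outliers

ransac = RANSACRegressor(
    estimator=LinearRegression(),
    residual_threshold=2.0,  # max residual for a sample to count as an inlier
    random_state=1,
)
ransac.fit(X, y)
print(ransac.inlier_mask_.sum(), "samples classified as inliers")
print(ransac.estimator_.coef_, ransac.estimator_.intercept_)
```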
9. Which metric is calculated as $1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$?
A. Mean Squared Error
B. Explained Variance Score
C. Coefficient of Determination
D. Mean Absolute Error
Correct Answer: Coefficient of Determination
Explanation: This is the formula for the Coefficient of Determination (the $R^2$ score), which represents the proportion of variance in the dependent variable explained by the independent variables; a worked check appears below.
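A quick numerical check of the formula against scikit-learn's r2_score (the toy values are invented):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 7.2, 8.9])

# R^2 computed from the definition above
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
print(1 - ss_res / ss_tot)

# Matches the library implementation
print(r2_score(y_true, y_pred))
```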
10. If an $R^2$ score is 1.0, what does this indicate about the regression model?
A. The model is underfitting.
B. The model explains none of the variability of the response data.
C. The model perfectly fits the data.
D. The model is a constant line.
Correct Answer: The model perfectly fits the data.
Explanation: An $R^2$ score of 1.0 means the model's predictions exactly match the observed values ($\hat{y}_i = y_i$ for every sample, so the residual sum of squares is 0).
11. Why is Mean Squared Error (MSE) often preferred over Mean Absolute Error (MAE) for optimization?
A. MSE is robust to outliers.
B. MSE is differentiable everywhere, making gradient-based optimization easier.
C. MSE is always smaller than MAE.
D. MSE has the same unit as the target variable.
Correct Answer: MSE is differentiable everywhere, making gradient-based optimization easier.
Explanation: The squaring function in MSE is smooth and convex, making it differentiable everywhere, which simplifies derivative calculation for Gradient Descent. MAE is not differentiable at a residual of 0.
12. To perform Polynomial Regression using a linear model in Scikit-Learn, which transformer must be applied first?
A. StandardScaler
B. PolynomialFeatures
C. OneHotEncoder
D. SimpleImputer
Correct Answer: PolynomialFeatures
Explanation: PolynomialFeatures generates a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree, as in the sketch below.
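A minimal sketch of polynomial regression via a pipeline (the degree and the synthetic data are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Illustrative quadratic data
rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X.ravel() ** 2 - X.ravel() + rng.normal(scale=0.2, size=200)

# Degree-2 features let the *linear* model fit a quadratic curve
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X, y)
print(poly_model.predict([[1.5]]))
```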
13. What is the main risk associated with using a high-degree polynomial in regression?
A. Underfitting
B. High bias
C. Overfitting
D. Convergence failure
Correct Answer: Overfitting
Explanation: High-degree polynomials create highly complex models that can capture noise in the training data, leading to overfitting and poor generalization to new data.
14. Which of the following techniques helps reduce overfitting in regression models by adding a penalty term to the loss function?
A. Normalization
B. Regularization
C. Standardization
D. Augmentation
Correct Answer: Regularization
Explanation: Regularization (like Ridge or Lasso) adds a penalty term based on the magnitude of the coefficients to the loss function to constrain model complexity.
15. Ridge regression minimizes the sum of squared residuals plus a penalty term based on:
A. The sum of absolute values of coefficients ($L_1$ norm).
B. The sum of squared values of coefficients ($L_2$ norm).
C. The number of non-zero coefficients.
D. The maximum coefficient value.
Correct Answer: The sum of squared values of coefficients ($L_2$ norm).
Explanation: Ridge regression adds the penalty term $\lambda \sum_j w_j^2$ (the squared $L_2$ norm) to the objective function.
16. Which property makes Lasso regression useful for feature selection?
A. It shrinks coefficients uniformly.
B. It forces some coefficients to become exactly zero.
C. It works best when $p > n$.
D. It increases the magnitude of coefficients.
Correct Answer: It forces some coefficients to become exactly zero.
Explanation: Due to the geometry of the $L_1$ penalty, Lasso regression tends to produce sparse solutions where irrelevant feature coefficients are driven exactly to zero.
17. In Scikit-Learn, which regression model combines both $L_1$ and $L_2$ regularization penalties?
A. Ridge
B. Lasso
C. ElasticNet
D. BayesianRidge
Correct Answer: ElasticNet
Explanation: ElasticNet is a linear regression model that trains with both $L_1$ and $L_2$ priors as regularizers. It is useful when multiple features are correlated; the sketch below compares all three penalties.
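A side-by-side sketch of the three penalties (the alpha values and synthetic data are illustrative):

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# Illustrative data where only the first two of five features matter
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.1, size=100)

for model in (Ridge(alpha=1.0), Lasso(alpha=0.1),
              ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 3))
# Lasso (and, less aggressively, ElasticNet) drives the irrelevant
# coefficients to exactly zero; Ridge only shrinks them.
```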
18. In regularized regression, what is the role of the hyperparameter $\alpha$ (or $\lambda$)?
A. It controls the learning rate.
B. It determines the degree of the polynomial.
C. It controls the strength of the regularization penalty.
D. It sets the intercept to zero.
Correct Answer: It controls the strength of the regularization penalty.
Explanation: A higher $\alpha$ increases the penalty, shrinking coefficients more (reducing variance but increasing bias). If $\alpha = 0$, the model reduces to standard OLS.
19. Why is feature scaling (e.g., Standardization) important before applying Ridge or Lasso regression?
A. To convert categorical data to numeric.
B. Because the penalty term is sensitive to the scale of the coefficients.
C. To remove missing values.
D. To ensure the target variable is normally distributed.
Correct Answer: Because the penalty term is sensitive to the scale of the coefficients.
Explanation: Regularization penalizes coefficient magnitude. If features are on different scales, the penalty unevenly affects features with smaller ranges versus larger ranges, biasing the model; scaling inside a pipeline (below) avoids this.
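One common pattern is to bundle the scaler and the regularized model into a pipeline so the scaling is always fitted on the training data (a sketch; the alpha and dataset are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# Standardizing first puts all coefficients on a comparable footing,
# so the penalty does not favor features with large raw ranges.
model = make_pipeline(StandardScaler(), Lasso(alpha=1.0))
model.fit(X, y)
print(model.named_steps["lasso"].coef_)
```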
20. Support Vector Regression (SVR) tries to fit as many data points as possible within a margin of width:
A. $C$
B. $\epsilon$ (epsilon)
C. $\gamma$ (gamma)
D. Zero
Correct Answer: $\epsilon$ (epsilon)
Explanation: SVR fits an $\epsilon$-insensitive tube: errors within distance $\epsilon$ of the predicted line are ignored, and the goal is to fit the tube around the data.
21. In Support Vector Regression, what is the role of the kernel function?
A. To calculate the error metric.
B. To map the input data into a higher-dimensional feature space to handle non-linearity.
C. To select the best features.
D. To normalize the target variable.
Correct Answer: To map the input data into a higher-dimensional feature space to handle non-linearity.
Explanation: Kernels (like RBF or Polynomial) allow SVR to find linear relationships in high-dimensional spaces, which correspond to non-linear relationships in the original input space.
22. Which parameter in SVR controls the trade-off between the smoothness of the decision function and the tolerance for training errors?
A. Kernel
B. Gamma
C. C
D. Degree
Correct Answer: C
Explanation: $C$ is the regularization parameter. A high $C$ tries to fit every training sample closely (low bias, high variance), while a low $C$ encourages a smoother decision surface; the sketch below shows typical settings.
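A minimal SVR sketch tying the three hyperparameters together (the values and synthetic data are illustrative):

```python
import numpy as np
from sklearn.svm import SVR

# Illustrative non-linear data
rng = np.random.default_rng(4)
X = rng.uniform(0, 6, size=(150, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=150)

# RBF kernel handles the non-linearity; epsilon sets the tube width,
# C trades smoothness against tolerance for training errors.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma="scale")
svr.fit(X, y)
print(svr.predict([[1.0], [3.0]]))
```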
23. What is the primary criterion used by Decision Tree Regressors to split a node?
A. Gini Impurity
B. Information Gain
C. MSE (variance reduction)
D. Log-Loss
Correct Answer: MSE (variance reduction)
Explanation: For regression trees, splits are chosen to minimize the Mean Squared Error (MSE), i.e., the variance, within the resulting child nodes.
24. One major advantage of Decision Tree Regression is:
A. It never overfits.
B. It does not require feature scaling or normalization.
Correct Answer: It does not require feature scaling or normalization.
Explanation: Decision trees split data based on thresholds of individual features, so the absolute scale or distribution of features does not affect the structure of the tree.
25. What is a characteristic behavior of a Decision Tree Regressor when predicting values outside the range of the training data?
A. It extrapolates linearly.
B. It predicts the average of the closest training samples (constant prediction).
C. It returns a null value.
D. It automatically creates a polynomial fit.
Correct Answer: It predicts the average of the closest training samples (constant prediction).
Explanation: Decision trees are piecewise constant models. They cannot extrapolate trends; for inputs outside the training range, they predict the value associated with the nearest leaf node, as demonstrated below.
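A small sketch of the no-extrapolation behavior (the data and depth are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative linear trend on [0, 10]
rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(200, 1))
y = 2 * X.ravel() + rng.normal(scale=0.5, size=200)

tree = DecisionTreeRegressor(max_depth=4).fit(X, y)

# Inside the training range the tree tracks the trend, but beyond it the
# prediction stays flat at the value of the nearest leaf:
print(tree.predict([[9.0], [20.0], [100.0]]))  # the last two are identical
```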
26. Random Forest Regression improves upon a single Decision Tree by utilizing which technique?
A. Gradient Boosting
B. Bagging (Bootstrap Aggregating)
C. Pruning
D. Kernel trick
Correct Answer: Bagging (Bootstrap Aggregating)
Explanation: Random Forests build multiple trees on bootstrap samples of the data and average their predictions to reduce variance and overfitting.
27. In a Random Forest Regressor, how is the final prediction determined?
A. Weighted sum of the features.
B. Majority vote of the trees.
C. Average of the predictions of all individual trees.
D. The prediction of the tree with the highest accuracy.
Correct Answer: Average of the predictions of all individual trees.
Explanation: For regression tasks, the Random Forest averages the continuous output values of all the trees in the ensemble.
28. Which parameter in RandomForestRegressor determines the number of trees in the forest?
A. max_depth
B. n_estimators
C. min_samples_split
D. bootstrap
Correct Answer: n_estimators
Explanation: n_estimators specifies the number of decision trees to be built in the forest.
29. Random Forests introduce randomness in two ways: bootstrap sampling and:
A. Random initialization of weights.
B. Selecting a random subset of features at each split.
C. Randomly shuffling the target labels.
D. Randomly pruning the trees.
Correct Answer: Selecting a random subset of features at each split.
Explanation: At each node split, a Random Forest only considers a random subset of features (controlled by max_features) to decorrelate the trees.
30. What is the 'Out-of-Bag' (OOB) score in Random Forests?
A. The accuracy on the test set.
B. A validation score calculated using the samples not included in the bootstrap sample for each tree.
C. The error rate of the worst tree.
D. The training error of the full ensemble.
Correct Answer: A validation score calculated using the samples not included in the bootstrap sample for each tree.
Explanation: About one-third of the data is left out (out-of-bag) when training each tree. These samples can be used to estimate the generalization error without a separate validation set, as in the sketch below.
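A sketch combining n_estimators, max_features, and the OOB score (the values and dataset are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

forest = RandomForestRegressor(
    n_estimators=200,   # number of trees in the forest
    max_features=0.5,   # random feature subset considered at each split
    oob_score=True,     # estimate generalization R^2 from out-of-bag samples
    random_state=0,
)
forest.fit(X, y)
print(forest.oob_score_)  # OOB R^2, no separate validation set needed
```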
31. Which Scikit-Learn function splits a dataset into training and testing sets?
Correct Answer: train_test_split
Explanation: train_test_split is the standard utility for splitting arrays or matrices into random train and test subsets; usage is sketched below.
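Typical usage (the split ratio and seed are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=3, random_state=0)

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
```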
32. In the ElasticNet objective $\min_w \frac{1}{2n}\lVert Xw - y\rVert_2^2 + \alpha\rho\lVert w\rVert_1 + \frac{\alpha(1-\rho)}{2}\lVert w\rVert_2^2$, what does $\rho$ (or l1_ratio in scikit-learn) control?
A. The overall regularization strength.
B. The mix between Ridge and Lasso regularization.
C. The degree of the polynomial.
D. The tolerance for stopping criteria.
Correct Answer: The mix between Ridge and Lasso regularization.
Explanation: The l1_ratio ($\rho$) controls the balance. If $\rho = 1$, it is Lasso; if $\rho = 0$, it is Ridge; values in between mix both penalties.
33. A residual plot shows the residuals on the y-axis and the predicted values on the x-axis. What pattern indicates a good regression model?
A. A clear U-shaped curve.
B. A linear trend.
C. Points randomly scattered around the horizontal axis (zero).
D. A funnel shape (heteroscedasticity).
Correct Answer: Points randomly scattered around the horizontal axis (zero).
Explanation: If residuals are randomly scattered around zero with no discernible pattern, the model has captured the underlying trend and the errors are random noise; a plotting sketch follows.
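A plotting sketch, assuming matplotlib is available (the data is synthetic):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
model = LinearRegression().fit(X, y)
y_pred = model.predict(X)
residuals = y - y_pred

# A healthy model shows a structureless cloud around the zero line
plt.scatter(y_pred, residuals, s=10)
plt.axhline(0, color="red", linewidth=1)
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.show()
```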
34. When using SGDRegressor from Scikit-Learn, which hyperparameter defines the update-rule schedule (how the learning rate changes over time)?
A. penalty
B. learning_rate
C. alpha
D. loss
Correct Answer: learning_rate
Explanation: The learning_rate parameter (options like 'constant', 'optimal', 'invscaling', 'adaptive') determines how the step size changes during training, as in the sketch below.
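A sketch of one schedule choice (the hyperparameter values are illustrative; SGD is scale-sensitive, hence the scaler in front):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)

# 'invscaling' decays the step size as eta = eta0 / t**power_t
sgd = make_pipeline(
    StandardScaler(),
    SGDRegressor(learning_rate="invscaling", eta0=0.01, max_iter=1000,
                 random_state=0),
)
sgd.fit(X, y)
print(sgd.predict(X[:3]))
```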
35. Which of the following is an intrinsic weakness of Linear Regression?
A. It is computationally expensive.
B. It cannot model non-linear relationships without feature engineering.
C. It is difficult to interpret.
D. It requires categorical features.
Correct Answer: It cannot model non-linear relationships without feature engineering.
Explanation: Standard Linear Regression assumes a linear relationship. To model curves, one must manually transform features (e.g., polynomial features) or use a different model.
36. In the context of regression metrics, what does Median Absolute Error provide that Mean Absolute Error does not?
A. Differentiability.
B. Robustness to outliers.
C. Percentage error calculation.
D. Squared penalization.
Correct Answer: Robustness to outliers.
Explanation: By taking the median of the absolute errors, this metric ignores the influence of extreme outliers, whereas the mean is pulled toward outliers.
37. What is multicollinearity?
A. When the target variable is categorical.
B. When independent features are highly correlated with each other.
C. When the model has too many polynomial features.
D. When the training data is too small.
Correct Answer: When independent features are highly correlated with each other.
Explanation: Multicollinearity occurs when features are (nearly) linearly dependent. This can make coefficient estimates unstable and difficult to interpret in linear models.
38. How does DecisionTreeRegressor handle missing values in Scikit-Learn (standard implementation)?
A. It handles them natively.
B. It ignores the rows with missing values.
C. It requires imputation (filling missing values) before training.
D. It treats them as a separate category.
Correct Answer: It requires imputation (filling missing values) before training.
Explanation: Scikit-Learn's standard CART implementation does not support missing values natively; an imputer (e.g., SimpleImputer) is required.
39. Which plot is typically used to inspect whether the residuals follow a normal distribution?
A. Box plot
B. Q-Q (Quantile-Quantile) plot
C. Scatter plot
D. Bar chart
Correct Answer: Q-Q (Quantile-Quantile) plot
Explanation: A Q-Q plot compares the quantiles of the residuals against the quantiles of a theoretical normal distribution. A straight line indicates normality; a plotting sketch follows.
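A sketch using SciPy's probplot, assuming SciPy and matplotlib are available (the data is synthetic):

```python
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

# Points hugging the diagonal reference line indicate normal residuals
stats.probplot(residuals, dist="norm", plot=plt)
plt.show()
```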
40. In Polynomial Regression, if you increase the degree of the polynomial significantly, the model becomes:
A. More biased.
B. Less flexible.
C. More complex with higher variance.
D. Linear.
Correct Answer: More complex with higher variance.
Explanation: Higher degrees allow the model to wiggle and fit training points precisely, increasing complexity and variance (and the risk of overfitting).
41. What is the result of applying fit_transform on the training data and then transform on the test data during scaling?
A. Data leakage.
B. Incorrect scaling.
C. Correct application of preprocessing parameters learned from training to test data.
D. Overfitting.
Correct Answer: Correct application of preprocessing parameters learned from training to test data.
Explanation: You learn the parameters (mean, std) from the training set (fit) and apply them to both sets (transform), ensuring the test set is treated as unseen data scaled to the same reference; see the sketch below.
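The idiomatic pattern (a sketch with synthetic data):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std on train only
X_test_scaled = scaler.transform(X_test)        # reuse them: no leakage
```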
42. Which Scikit-Learn attribute holds the estimated coefficients for a Linear Regression model after fitting?
A. model.weights_
B. model.coef_
C. model.intercept_
D. model.params_
Correct Answer: model.coef_
Explanation: coef_ is the attribute containing the weights (coefficients) for the features; intercept_ holds the bias term.
43. What is the analytical solution for the optimal weights of Linear Regression called?
A. Gradient Descent
B. The Normal Equation
C. Backpropagation
D. Coordinate Descent
Correct Answer: The Normal Equation
Explanation: The Normal Equation is the closed-form solution $\hat{w} = (X^\top X)^{-1} X^\top y$, illustrated below.
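A NumPy sketch of the Normal Equation, checked against LinearRegression (the data is synthetic):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 2))
y = 4 * X[:, 0] - 2 * X[:, 1] + 5 + rng.normal(scale=0.1, size=100)

# w = (X^T X)^{-1} X^T y, with a column of ones prepended for the bias
Xb = np.c_[np.ones(len(X)), X]
w = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y
print(w)  # [intercept, coef_1, coef_2]

# Agrees with scikit-learn's solution
lr = LinearRegression().fit(X, y)
print(lr.intercept_, lr.coef_)
```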
44. When using Support Vector Regression with an RBF kernel, what happens if the parameter $\gamma$ (gamma) is very large?
A. The model behaves like a linear regression.
B. The influence of each training example reaches very far.
C. The influence of each training example is limited to a close radius, leading to overfitting.
D. The model becomes a flat line.
Correct Answer: The influence of each training example is limited to a close radius, leading to overfitting.
Explanation: A high gamma means the Gaussian curve is narrow; the model captures complex details around individual points, often causing overfitting.
45. Why might one use Adjusted $R^2$ instead of standard $R^2$?
A. To calculate error in absolute terms.
B. To account for the number of predictors, penalizing the addition of useless features.
C. To ensure the score is always positive.
D. To handle categorical variables.
Correct Answer: To account for the number of predictors, penalizing the addition of useless features.
Explanation: Standard $R^2$ never decreases when features are added. Adjusted $R^2$ decreases if a new feature doesn't improve the model more than chance would expect; the formula is applied below.
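scikit-learn ships no built-in Adjusted $R^2$ helper, so the formula $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-p-1}$ is applied directly (a sketch with synthetic data):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
model = LinearRegression().fit(X, y)

r2 = r2_score(y, model.predict(X))
n, p = X.shape  # number of samples, number of predictors
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(r2, adjusted_r2)
```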
46. In the context of the Bias-Variance tradeoff, a simple linear model with few features typically has:
A. High Bias and Low Variance
B. Low Bias and High Variance
C. Low Bias and Low Variance
D. High Bias and High Variance
Correct Answer: High Bias and Low Variance
Explanation: Simple models may fail to capture complex patterns (high bias/underfitting) but are stable and don't change much across different training sets (low variance).
47. Which of the following creates a pipeline in Scikit-Learn that scales data and then fits a regressor?
Correct Answer: make_pipeline(StandardScaler(), LinearRegression())
Explanation: make_pipeline (or the Pipeline class) chains transformers with a final estimator, so the scaler runs before the regressor during both fitting and prediction; a sketch follows.
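A sketch of the pattern (StandardScaler and LinearRegression stand in for any transformer/regressor pair; the dataset is synthetic):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=4, noise=5.0, random_state=0)

# make_pipeline chains the scaler and the regressor; fit() runs both in order
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X, y)
print(pipe.score(X, y))  # R^2 on the training data
```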