1. In the context of supervised learning, what distinguishes a regression problem from a classification problem?
A.The target variable is continuous.
B.The input features must be categorical.
C.Regression requires unsupervised data.
D.The target variable is categorical.
Correct Answer: The target variable is continuous.
Explanation:
Regression models predict a continuous target variable (e.g., house prices, temperature), whereas classification models predict categorical class labels.
2. Which visualization tool is most commonly used in Exploratory Data Analysis (EDA) to visualize the linear relationship between a single feature and the target variable?
A.Scatter plot
B.Pie chart
C.Histogram
D.Box plot
Correct Answer: Scatter plot
Explanation:
A scatter plot maps one variable on the x-axis and the other on the y-axis, making it ideal for visualizing the correlation and relationship between two continuous variables.
3. When analyzing the relationship between multiple variables in a dataset, which matrix helps quantify the linear correlation between every pair of features?
A.Covariance matrix
B.Confusion matrix
C.Correlation matrix
D.Hessian matrix
Correct Answer: Correlation matrix
Explanation:
A correlation matrix shows the correlation coefficients between variables, measuring the strength and direction of linear relationships. It is often visualized as a heatmap.
4. In a simple linear regression model $y = w_0 + w_1 x$, what does $w_0$ represent?
A.The residual error
B.The slope of the line
C.The learning rate
D.The y-intercept
Correct Answer: The y-intercept
Explanation:
$w_0$ (often denoted as $b$ or $\beta_0$) is the bias term or y-intercept, representing the predicted value of $y$ when $x$ is 0.
5. Which Scikit-Learn class is used to perform Ordinary Least Squares (OLS) linear regression?
Correct Answer: sklearn.linear_model.LinearRegression
Explanation:
sklearn.linear_model.LinearRegression fits a linear model with coefficients chosen to minimize the residual sum of squares between the observed targets and the predicted targets.
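As a minimal sketch of the class in use, the toy data below is hypothetical and noise-free, generated from y = 2 + 3x, so the fit should recover that slope and intercept:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data generated from y = 2 + 3x.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 2.0 + 3.0 * X.ravel()

model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]        # weight learned for the single feature
intercept = model.intercept_  # bias term (y-intercept)
print(slope, intercept)       # approximately 3 and 2
```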
6. What is the objective function that Ordinary Least Squares (OLS) minimizes?
A.Sum of Squared Errors (SSE)
B.Hinge Loss
C.Cross-Entropy Loss
D.Mean Absolute Error (MAE)
Correct Answer: Sum of Squared Errors (SSE)
Explanation:
OLS minimizes the Sum of Squared Errors (SSE), also known as the Residual Sum of Squares (RSS), defined as $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
7. What is the primary motivation for using the RANSAC (RANdom SAmple Consensus) algorithm in regression?
A.To fit a model in the presence of a significant number of outliers.
B.To increase the speed of training.
C.To handle missing values automatically.
D.To perform feature selection.
Correct Answer: To fit a model in the presence of a significant number of outliers.
Explanation:
RANSAC fits a model to a subset of the data (inliers) and ignores data points that deviate significantly (outliers), making it robust to noisy data.
8. In the RANSAC algorithm, what does the 'residual_threshold' parameter define?
A.The maximum number of iterations.
B.The learning rate of the estimator.
C.The minimum number of samples required to fit the model.
D.The maximum residual for a data sample to be classified as an inlier.
Correct Answer: The maximum residual for a data sample to be classified as an inlier.
Explanation:
The residual_threshold sets the limit; data points with prediction errors (residuals) smaller than this threshold are considered inliers.
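A small sketch of residual_threshold in practice, assuming scikit-learn's RANSACRegressor wrapped around LinearRegression. The data is synthetic and hypothetical: 45 points lie exactly on y = 1 + 2x and 5 points are shifted far off the line, so the expected split into inliers and outliers is known in advance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

# Hypothetical data: 50 points on y = 1 + 2x, five of them corrupted.
X = np.arange(50, dtype=float).reshape(-1, 1)
y = 1.0 + 2.0 * X.ravel()
outlier_idx = [5, 15, 25, 35, 45]
y[outlier_idx] += 100.0

# Points whose absolute residual exceeds residual_threshold are treated as outliers.
ransac = RANSACRegressor(
    LinearRegression(),
    residual_threshold=2.0,
    random_state=0,
)
ransac.fit(X, y)

inlier_mask = ransac.inlier_mask_       # True for samples classified as inliers
slope = ransac.estimator_.coef_[0]      # final model refit on the inliers only
print(int(inlier_mask.sum()), slope)
```

The threshold is in the units of the target, so a sensible value depends on how noisy the inliers are expected to be.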
9. Which metric is calculated as $1 - \frac{SS_{res}}{SS_{tot}}$?
A.Coefficient of Determination
B.Mean Squared Error
C.Explained Variance Score
D.Mean Absolute Error
Correct Answer: Coefficient of Determination
Explanation:
This is the formula for the Coefficient of Determination ($R^2$ score), where $SS_{res} = \sum_i (y_i - \hat{y}_i)^2$ is the residual sum of squares and $SS_{tot} = \sum_i (y_i - \bar{y})^2$ is the total sum of squares. It represents the proportion of variance in the dependent variable explained by the independent variables.
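To make the definition concrete, here is a short sketch that computes the score by hand, assuming the usual definition of one minus the residual sum of squares divided by the total sum of squares. The arrays are made-up illustration values.

```python
import numpy as np

# Hypothetical observed targets and model predictions.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 9.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares: 0.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares: 20.0
r2 = 1.0 - ss_res / ss_tot                      # approximately 0.975
print(r2)
```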
10. If an $R^2$ score is 1.0, what does this indicate about the regression model?
A.The model is underfitting.
B.The model is a constant line.
C.The model perfectly fits the data.
D.The model explains none of the variability of the response data.
Correct Answer: The model perfectly fits the data.
Explanation:
An $R^2$ score of 1.0 means the model's predictions exactly match the observed values ($\hat{y}_i = y_i$ for every sample, so the residual sum of squares is 0).
11. Why is Mean Squared Error (MSE) often preferred over Mean Absolute Error (MAE) for optimization?
A.MSE is always smaller than MAE.
B.MSE has the same unit as the target variable.
C.MSE is differentiable everywhere, making gradient-based optimization easier.
D.MSE is robust to outliers.
Correct Answer: MSE is differentiable everywhere, making gradient-based optimization easier.
Explanation:
The squaring function in MSE is smooth and convex, so MSE is differentiable everywhere, which simplifies derivative calculation for Gradient Descent. MAE is not differentiable wherever a residual equals 0.
12. To perform Polynomial Regression using a linear model in Scikit-Learn, which transformer must be applied first?
A.SimpleImputer
B.OneHotEncoder
C.StandardScaler
D.PolynomialFeatures
Correct Answer: PolynomialFeatures
Explanation:
PolynomialFeatures generates a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree.
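A minimal sketch of the two-step approach, assuming a single input feature and hypothetical noise-free data from y = 1 + 2x + 3x^2. The features are expanded first, then an ordinary linear model is fit on the expanded matrix.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data from y = 1 + 2x + 3x^2.
X = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0]])
y = 1.0 + 2.0 * X.ravel() + 3.0 * X.ravel() ** 2

# include_bias=False leaves the constant term to LinearRegression's intercept.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)  # columns: x, x^2

model = LinearRegression().fit(X_poly, y)
print(X_poly.shape, model.coef_, model.intercept_)
```

The model stays linear in its coefficients; only the inputs are non-linear functions of x.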
13. What is the main risk associated with using a high-degree polynomial in regression?
A.Underfitting
B.Convergence failure
C.High bias
D.Overfitting
Correct Answer: Overfitting
Explanation:
High-degree polynomials create highly complex models that can capture noise in the training data, leading to overfitting and poor generalization to new data.
14. Which of the following techniques helps reduce overfitting in regression models by adding a penalty term to the loss function?
A.Normalization
B.Augmentation
C.Regularization
D.Standardization
Correct Answer: Regularization
Explanation:
Regularization (like Ridge or Lasso) adds a penalty term based on the magnitude of the coefficients to the loss function to constrain the model complexity.
15. Ridge regression minimizes the sum of squared residuals plus a penalty term based on:
A.The maximum coefficient value.
B.The number of non-zero coefficients.
C.The sum of absolute values of coefficients ($L_1$ norm).
D.The sum of squared values of coefficients (squared $L_2$ norm).
Correct Answer: The sum of squared values of coefficients (squared $L_2$ norm).
Explanation:
Ridge regression adds the penalty term $\alpha \sum_{j} w_j^2$ (the squared $L_2$ norm of the coefficients, scaled by $\alpha$) to the objective function.
16. Which property makes Lasso regression useful for feature selection?
A.It forces some coefficients to become exactly zero.
B.It works best when .
C.It increases the magnitude of coefficients.
D.It shrinks coefficients uniformly.
Correct Answer: It forces some coefficients to become exactly zero.
Explanation:
Due to the geometry of the $L_1$ penalty, Lasso regression tends to produce sparse solutions where irrelevant feature coefficients are driven exactly to zero.
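The sparsity effect can be seen in a small sketch on hypothetical synthetic data where only the first of five features influences the target. The alpha values are arbitrary choices for illustration, and Ridge is included only for contrast.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Hypothetical data: only feature 0 matters, features 1-4 are irrelevant noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

print("Lasso:", lasso.coef_)  # irrelevant coefficients are exactly 0.0
print("Ridge:", ridge.coef_)  # irrelevant coefficients are small but non-zero
```

The surviving Lasso coefficient is also shrunk below 3, which is the bias paid for the sparsity.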
17. In Scikit-Learn, which regression model combines both $L_1$ and $L_2$ regularization penalties?
A.Lasso
B.BayesianRidge
C.ElasticNet
D.Ridge
Correct Answer: ElasticNet
Explanation:
ElasticNet is a linear regression model trained with both $L_1$ and $L_2$ penalties as regularizers. It is useful when there are multiple features that are correlated with one another.
18. In regularized regression, what is the role of the hyperparameter $\alpha$ (or $\lambda$)?
A.It controls the learning rate.
B.It sets the intercept to zero.
C.It determines the degree of the polynomial.
D.It controls the strength of the regularization penalty.
Correct Answer: It controls the strength of the regularization penalty.
Explanation:
A higher $\alpha$ increases the penalty, shrinking coefficients more (reducing variance but increasing bias). If $\alpha = 0$, it becomes standard OLS.
19. Why is feature scaling (e.g., Standardization) important before applying Ridge or Lasso regression?
A.To ensure the target variable is normally distributed.
B.To convert categorical data to numeric.
C.Because the penalty term is sensitive to the scale of the coefficients.
D.To remove missing values.
Correct Answer: Because the penalty term is sensitive to the scale of the coefficients.
Explanation:
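Regularization penalizes coefficient magnitude, and a coefficient's magnitude depends on the scale of its feature: a feature with a small numeric range needs a larger coefficient to have the same effect on the prediction, so it is penalized more heavily than a feature with a large range. Standardizing puts all features on a comparable scale so the penalty treats them evenly.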
20. Support Vector Regression (SVR) tries to fit as many data points as possible within a margin of width:
A.
B.$\epsilon$ (epsilon)
C.
D.Zero
Correct Answer: $\epsilon$ (epsilon)
Explanation:
SVR uses an $\epsilon$-insensitive tube. Errors within distance $\epsilon$ of the predicted line are ignored. The goal is to fit the tube around the data.
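The idea of ignoring small errors can be written as a loss function. The sketch below is a plain NumPy illustration of the epsilon-insensitive loss, not scikit-learn's internal code; the function name and sample values are hypothetical.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Per-sample loss: zero inside the tube, linear in the excess outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

# Hypothetical targets and predictions.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 1.0])

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)  # first error (0.05) is inside the tube, so its loss is 0
```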
21. In Support Vector Regression, what is the role of the kernel function?
A.To calculate the error metric.
B.To select the best features.
C.To map the input data into a higher-dimensional feature space to handle non-linearity.
D.To normalize the target variable.
Correct Answer: To map the input data into a higher-dimensional feature space to handle non-linearity.
Explanation:
Kernels (like RBF or Polynomial) allow SVR to find linear relationships in high-dimensional spaces, which correspond to non-linear relationships in the original input space.
22. Which parameter in SVR controls the trade-off between the smoothness of the decision function and the tolerance for training errors?
A.Degree
B.Kernel
C.Gamma
D.C
Correct Answer: C
Explanation:
$C$ is the regularization parameter. A high $C$ penalizes training errors heavily, so the model tries to fit all training examples closely (low bias, high variance), while a low $C$ tolerates more error and encourages a smoother function.
23. What is the primary criterion used by Decision Tree Regressors to split a node?
A.MSE (Variance reduction)
B.Log-Loss
C.Gini Impurity
D.Information Gain
Correct Answer: MSE (Variance reduction)
Explanation:
For regression trees, splits are chosen to minimize the Mean Squared Error (MSE) or variance within the resulting child nodes.
24. One major advantage of Decision Tree Regression is:
C.It does not require feature scaling or normalization.
D.It always extrapolates well.
Correct Answer: It does not require feature scaling or normalization.
Explanation:
Decision trees split data based on thresholds of individual features, so the absolute scale or distribution of features does not affect the structure of the tree.
25. What is a characteristic behavior of a Decision Tree Regressor when predicting values outside the range of the training data?
A.It automatically creates a polynomial fit.
B.It extrapolates linearly.
C.It returns a null value.
D.It predicts the average of the closest training samples (constant prediction).
Correct Answer: It predicts the average of the closest training samples (constant prediction).
Explanation:
Decision trees are piecewise constant models. They cannot extrapolate trends; for inputs outside the training range, they predict the value associated with the nearest leaf node.
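The constant prediction outside the training range is easy to check with a short sketch. The data is hypothetical: ten points on y = 2x for x from 0 to 9, fit with a fully grown tree.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical training data on the line y = 2x, for x = 0..9.
X_train = np.arange(10, dtype=float).reshape(-1, 1)
y_train = 2.0 * X_train.ravel()

tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# Inputs below, at the edge of, and far above the training range.
X_new = np.array([[-5.0], [9.0], [20.0], [1000.0]])
preds = tree.predict(X_new)
print(preds)  # values beyond the range reuse the edge leaves: 0, 18, 18, 18
```

A linear model fit on the same data would predict 40 at x = 20, which is the trend the tree cannot follow.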
26. Random Forest Regression improves upon a single Decision Tree by utilizing which technique?
A.Pruning
B.Gradient Boosting
C.Kernel trick
D.Bagging (Bootstrap Aggregating)
Correct Answer: Bagging (Bootstrap Aggregating)
Explanation:
Random Forests build multiple trees on bootstrap samples of the data and average their predictions to reduce variance and overfitting.
27. In a Random Forest Regressor, how is the final prediction determined?
A.Majority vote of the trees.
B.The prediction of the tree with the highest accuracy.
C.Weighted sum of the features.
D.Average of the predictions of all individual trees.
Correct Answer: Average of the predictions of all individual trees.
Explanation:
For regression tasks, the Random Forest averages the continuous output values of all the trees in the ensemble.
28. Which parameter in RandomForestRegressor determines the number of trees in the forest?
A.min_samples_split
B.max_depth
C.bootstrap
D.n_estimators
Correct Answer: n_estimators
Explanation:
n_estimators specifies the number of decision trees to be generated in the forest.
29. Random Forests introduce randomness in two ways: bootstrap sampling and:
A.Randomly shuffling the target labels.
B.Selecting a random subset of features at each split.
C.Random initialization of weights.
D.Randomly pruning the trees.
Correct Answer: Selecting a random subset of features at each split.
Explanation:
At each node split, Random Forest only considers a random subset of features (controlled by max_features) to decorrelate the trees.
30. What is the 'Out-of-Bag' (OOB) score in Random Forests?
A.The accuracy on the test set.
B.The error rate of the worst tree.
C.A validation score calculated using the samples not included in the bootstrap sample for each tree.
D.The training error of the full ensemble.
Correct Answer: A validation score calculated using the samples not included in the bootstrap sample for each tree.
Explanation:
On average about 37% of the training samples (roughly 1/e, often rounded to one third) are left out of the bootstrap sample, i.e. are out-of-bag, for any given tree. These samples can be used to estimate the generalization error without a separate validation set.
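Both claims can be checked with a rough sketch on hypothetical synthetic data: the first part measures how many samples a single bootstrap draw leaves out (close to 1/e, about 0.37), and the second reads the OOB score from RandomForestRegressor with oob_score=True.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Part 1: fraction of samples that one bootstrap sample never draws.
n = 10_000
bootstrap_idx = rng.integers(0, n, size=n)  # n draws with replacement
oob_fraction = 1.0 - np.unique(bootstrap_idx).size / n

# Part 2: OOB score (R^2 on out-of-bag predictions) for a noisy linear target.
X = rng.uniform(0.0, 10.0, size=(300, 1))
y = 3.0 * X.ravel() + rng.normal(scale=0.5, size=300)

forest = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=0)
forest.fit(X, y)
oob_r2 = forest.oob_score_

print(oob_fraction, oob_r2)
```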
31. Which Scikit-Learn function splits a dataset into training and testing sets?
Correct Answer: train_test_split
Explanation:
train_test_split (in sklearn.model_selection) is the standard utility to split arrays or matrices into random train and test subsets.
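A minimal usage sketch on hypothetical arrays, with an arbitrary 75/25 split and a fixed random_state for reproducibility:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 20 samples, 2 features, targets 0..19.
X = np.arange(40).reshape(20, 2)
y = np.arange(20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(X_train.shape, X_test.shape)  # 15 training rows, 5 test rows
```

Rows are shuffled before splitting by default, and X and y are split with the same indices so features and targets stay aligned.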
32. In the equation for ElasticNet: $\frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \rho \lVert w \rVert_1 + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2$, what does $\rho$ (or l1_ratio in scikit-learn) control?
A.The degree of the polynomial.
B.The tolerance for stopping criteria.
C.The overall regularization strength.
D.The mix between Ridge and Lasso regularization.
Correct Answer: The mix between Ridge and Lasso regularization.
Explanation:
The l1_ratio ($\rho$) controls the balance. If $\rho = 1$, it is Lasso; if $\rho = 0$, it is Ridge; values in between mix both penalties.
33. A residual plot shows the residuals on the y-axis and the predicted values on the x-axis. What pattern indicates a good regression model?
A.Points randomly scattered around the horizontal axis (zero).
B.A clear U-shape curve.
C.A funnel shape (heteroscedasticity).
D.A linear trend.
Correct Answer: Points randomly scattered around the horizontal axis (zero).
Explanation:
If residuals are randomly scattered around zero with no discernable pattern, it indicates that the model has captured the underlying trend and the errors are random noise.
34. When using SGDRegressor from Scikit-Learn, which hyperparameter defines the update rule schedule (how the learning rate changes over time)?
A.learning_rate
B.loss
C.penalty
D.alpha
Correct Answer: learning_rate
Explanation:
The learning_rate parameter (options like 'constant', 'optimal', 'invscaling', 'adaptive') determines how the step size changes during training.
35. Which of the following is an intrinsic weakness of Linear Regression?
A.It is difficult to interpret.
B.It requires categorical features.
C.It is computationally expensive.
D.It cannot model non-linear relationships without feature engineering.
Correct Answer: It cannot model non-linear relationships without feature engineering.
Explanation:
Standard Linear Regression assumes a linear relationship. To model curves, one must manually transform features (e.g., polynomial features) or use a different model.
36. In the context of regression metrics, what does Median Absolute Error provide that Mean Absolute Error does not?
A.Differentiability.
B.Robustness to outliers.
C.Percentage error calculation.
D.Squared penalization.
Correct Answer: Robustness to outliers.
Explanation:
By taking the median of the absolute errors, this metric ignores the influence of extreme outliers, whereas the Mean is pulled towards outliers.
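A quick numeric sketch of the difference, using made-up absolute errors in which one prediction is badly off:

```python
import numpy as np

# Hypothetical absolute errors: four small ones and a single extreme outlier.
abs_errors = np.array([1.0, 1.0, 1.0, 1.0, 100.0])

mean_abs_error = abs_errors.mean()        # pulled up to 20.8 by the outlier
median_abs_error = np.median(abs_errors)  # stays at 1.0
print(mean_abs_error, median_abs_error)
```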
37. What is Multicollinearity?
A.When the target variable is categorical.
B.When the training data is too small.
C.When independent features are highly correlated with each other.
D.When the model has too many polynomial features.
Correct Answer: When independent features are highly correlated with each other.
Explanation:
Multicollinearity occurs when features are linearly dependent, or nearly so, on one another. This can make coefficient estimates unstable and difficult to interpret in linear models.
38. How does DecisionTreeRegressor handle missing values in Scikit-Learn (standard implementation)?
A.It handles them natively.
B.It ignores the rows with missing values.
C.It requires imputation (filling missing values) before training.
D.It treats them as a separate category.
Correct Answer: It requires imputation (filling missing values) before training.
Explanation:
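In Scikit-Learn releases before version 1.3, the CART implementation does not support missing values natively, so an imputer (e.g., SimpleImputer) is required. Newer releases can handle NaN values in DecisionTreeRegressor for some configurations, so check the documentation for the installed version.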
39. Which plot is typically used to inspect if the residuals follow a normal distribution?
A.Box plot
B.Bar chart
C.Scatter plot
D.Q-Q (Quantile-Quantile) plot
Correct Answer: Q-Q (Quantile-Quantile) plot
Explanation:
A Q-Q plot compares the quantiles of the residuals against the quantiles of a theoretical normal distribution. A straight line indicates normality.
40. In Polynomial Regression, if you increase the degree of the polynomial significantly, the model becomes:
A.Less flexible.
B.More biased.
C.More complex with higher variance.
D.Linear.
Correct Answer: More complex with higher variance.
Explanation:
Higher degrees allow the model to wiggle and fit training points precisely, increasing complexity and variance (risk of overfitting).
41. What is the result of applying fit_transform on the training data and then transform on the test data during scaling?
A.Incorrect scaling.
B.Overfitting.
C.Data leakage.
D.Correct application of preprocessing parameters learnt from training to test data.
Correct Answer: Correct application of preprocessing parameters learnt from training to test data.
Explanation:
You learn the parameters (mean, std) from the training set (fit) and apply them to both (transform) to ensure the test set represents unseen data scaled to the same reference.
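The pattern looks like this in a minimal sketch with StandardScaler. The tiny arrays are hypothetical and chosen so the learned mean is easy to verify by hand.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: three training samples and one test sample.
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean and std from training data only
X_test_scaled = scaler.transform(X_test)        # reuses the training mean and std

print(scaler.mean_, scaler.scale_, X_test_scaled)
```

Calling fit_transform on the test set instead would recompute the statistics from test data, which is the leakage this pattern avoids.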
42. Which Scikit-Learn attribute holds the estimated coefficients for a Linear Regression model after fitting?
A.model.weights_
B.model.params_
C.model.coef_
D.model.intercept_
Correct Answer: model.coef_
Explanation:
coef_ is the attribute containing the weights (coefficients) for the features. intercept_ holds the bias term.
43. What is the analytical solution to find the optimal weights for Linear Regression called?
A.Gradient Descent
B.Coordinate Descent
C.Backpropagation
D.The Normal Equation
Correct Answer: The Normal Equation
Explanation:
The Normal Equation is a closed-form solution: $\hat{w} = (X^T X)^{-1} X^T y$.
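A short NumPy sketch of the closed-form solution, assuming the standard form in which the weights equal the inverse of X-transpose-X times X-transpose-y. The data is hypothetical and noise-free (y = 1 + 2x), and a column of ones is added so the first weight acts as the intercept.

```python
import numpy as np

# Hypothetical data from y = 1 + 2x.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Prepend a column of ones so the intercept is learned as a weight.
X_b = np.column_stack([np.ones(len(X)), X])

# Solve (X^T X) w = X^T y instead of forming the inverse explicitly.
w = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)
print(w)  # approximately [1, 2]: intercept, then slope
```

Using np.linalg.solve is a design choice for numerical stability; it yields the same weights as the explicit inverse when X^T X is invertible.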
44. When using Support Vector Regression with an RBF kernel, what happens if the parameter $\gamma$ (gamma) is very large?
A.The model behaves like a linear regression.
B.The influence of each training example reaches very far.
C.The model becomes a flat line.
D.The influence of each training example is limited to a close radius, leading to overfitting.
Correct Answer: The influence of each training example is limited to a close radius, leading to overfitting.
Explanation:
High gamma means the Gaussian curve is narrow; the model captures complex details around individual points, often causing overfitting.
45. Why might one use Adjusted $R^2$ instead of standard $R^2$?
A.To account for the number of predictors, penalizing the addition of useless features.
B.To calculate error in absolute terms.
C.To handle categorical variables.
D.To ensure the score is always positive.
Correct Answer: To account for the number of predictors, penalizing the addition of useless features.
Explanation:
Standard $R^2$ never decreases on the training data when features are added. Adjusted $R^2$ decreases if a new feature improves the model by less than would be expected by chance.
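The penalty can be sketched with the commonly used formula, one minus (1 - R^2)(n - 1)/(n - p - 1), where n is the number of samples and p the number of predictors. The function name and the numbers below are hypothetical.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 for a model with n_features predictors fit on n_samples rows."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Same plain R^2 of 0.9 on 11 samples, with 2 and then 3 predictors.
adj_two = adjusted_r2(0.9, n_samples=11, n_features=2)    # about 0.875
adj_three = adjusted_r2(0.9, n_samples=11, n_features=3)  # about 0.857
print(adj_two, adj_three)
```

Adding a third predictor that leaves the plain score unchanged lowers the adjusted score, which is the penalty the explanation describes.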
46. In the context of Bias-Variance tradeoff, a simple linear model with few features typically has:
A.Low Bias and Low Variance
B.High Bias and High Variance
C.High Bias and Low Variance
D.Low Bias and High Variance
Correct Answer: High Bias and Low Variance
Explanation:
Simple models may fail to capture complex patterns (High Bias/Underfitting) but are stable and don't change much with different training sets (Low Variance).
47. Which of the following creates a pipeline in Scikit-Learn that scales data then fits a regressor?