1. In the context of supervised learning, what distinguishes a regression problem from a classification problem?
A. The target variable is continuous.
B. The target variable is categorical.
C. The input features must be categorical.
D. Regression requires unsupervised data.
Correct Answer: The target variable is continuous.
Explanation: Regression models predict a continuous target variable (e.g., house prices, temperature), whereas classification models predict categorical class labels.
2. Which visualization tool is most commonly used in Exploratory Data Analysis (EDA) to visualize the linear relationship between a single feature and the target variable?
A. Histogram
B. Scatter plot
C. Pie chart
D. Box plot
Correct Answer: Scatter plot
Explanation: A scatter plot maps one variable on the x-axis and the other on the y-axis, making it ideal for visualizing the correlation and relationship between two continuous variables.
3. When analyzing the relationship between multiple variables in a dataset, which matrix helps quantify the linear correlation between every pair of features?
A. Confusion matrix
B. Hessian matrix
C. Correlation matrix
D. Covariance matrix
Correct Answer: Correlation matrix
Explanation: A correlation matrix shows the correlation coefficients between variables, measuring the strength and direction of linear relationships. It is often visualized as a heatmap.
4. In a simple linear regression model $\hat{y} = w_0 + w_1 x$, what does $w_0$ represent?
A. The slope of the line
B. The y-intercept
C. The residual error
D. The learning rate
Correct Answer: The y-intercept
Explanation: $w_0$ (often denoted as $b$ or $\beta_0$) is the bias term or y-intercept, representing the predicted value of $y$ when $x$ is 0.
5. Which Scikit-Learn class is used to perform Ordinary Least Squares (OLS) linear regression?
Correct Answer: sklearn.linear_model.LinearRegression
Explanation: sklearn.linear_model.LinearRegression fits a linear model with coefficients chosen to minimize the residual sum of squares between the observed targets and the predicted targets; a minimal example follows.
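For reference, a minimal sketch of the LinearRegression API (the synthetic data and seed are invented purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative synthetic data: y = 3x + 2 plus Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + 2 + rng.normal(scale=0.5, size=100)

model = LinearRegression()
model.fit(X, y)
print(model.coef_)       # estimated slope, close to 3
print(model.intercept_)  # estimated bias term, close to 2
```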
6. What is the objective function that Ordinary Least Squares (OLS) minimizes?
A. Sum of Squared Errors (SSE)
B. Mean Absolute Error (MAE)
C. Hinge Loss
D. Cross-Entropy Loss
Correct Answer: Sum of Squared Errors (SSE)
Explanation: OLS minimizes the Sum of Squared Errors (SSE), also known as the Residual Sum of Squares (RSS), defined as $\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
7. What is the primary motivation for using the RANSAC (RANdom SAmple Consensus) algorithm in regression?
A. To increase the speed of training.
B. To fit a model in the presence of a significant number of outliers.
C. To perform feature selection.
D. To handle missing values automatically.
Correct Answer: To fit a model in the presence of a significant number of outliers.
Explanation: RANSAC fits a model to a subset of the data (inliers) and ignores data points that deviate significantly (outliers), making it robust to noisy data.
8. In the RANSAC algorithm, what does the residual_threshold parameter define?
A. The maximum number of iterations.
B. The minimum number of samples required to fit the model.
C. The maximum residual for a data sample to be classified as an inlier.
D. The learning rate of the estimator.
Correct Answer: The maximum residual for a data sample to be classified as an inlier.
Explanation: The residual_threshold sets the cutoff; data points with prediction errors (residuals) smaller than this threshold are considered inliers, as in the sketch below.
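A minimal RANSAC sketch (the data and threshold are illustrative; the estimator parameter name follows recent scikit-learn versions, where it replaced base_estimator):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

# Illustrative data: a clean linear trend with some gross outliers injected
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X.ravel() + 1 + rng.normal(scale=0.3, size=100)
y[:15] += 25  # outliers

ransac = RANSACRegressor(
    estimator=LinearRegression(),
    residual_threshold=2.0,  # max residual for a sample to count as an inlier
    random_state=1,
)
ransac.fit(X, y)
print(ransac.inlier_mask_.sum(), "samples classified as inliers")
print(ransac.estimator_.coef_, ransac.estimator_.intercept_)
```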
9. Which metric is calculated as $1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$?
A. Mean Squared Error
B. Explained Variance Score
C. Coefficient of Determination
D. Mean Absolute Error
Correct Answer: Coefficient of Determination
Explanation: This is the formula for the Coefficient of Determination (the $R^2$ score), which represents the proportion of variance in the dependent variable explained by the independent variables; a worked check appears below.
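A quick numerical check of the formula against scikit-learn's r2_score (the toy values are invented):

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.1, 7.2, 8.9])

# R^2 computed from the definition above
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
print(1 - ss_res / ss_tot)

# Matches the library implementation
print(r2_score(y_true, y_pred))
```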
10. If an $R^2$ score is 1.0, what does this indicate about the regression model?
A. The model is underfitting.
B. The model explains none of the variability of the response data.
C. The model perfectly fits the data.
D. The model is a constant line.
Correct Answer: The model perfectly fits the data.
Explanation: An $R^2$ score of 1.0 means the model's predictions exactly match the observed values ($\hat{y}_i = y_i$ for every sample, so the residual sum of squares is 0).
11. Why is Mean Squared Error (MSE) often preferred over Mean Absolute Error (MAE) for optimization?
A. MSE is robust to outliers.
B. MSE is differentiable everywhere, making gradient-based optimization easier.
C. MSE is always smaller than MAE.
D. MSE has the same unit as the target variable.
Correct Answer: MSE is differentiable everywhere, making gradient-based optimization easier.
Explanation: The squaring function in MSE is smooth and convex, making it differentiable everywhere, which simplifies derivative calculation for Gradient Descent. MAE is not differentiable at a residual of 0.
12. To perform Polynomial Regression using a linear model in Scikit-Learn, which transformer must be applied first?
A. StandardScaler
B. PolynomialFeatures
C. OneHotEncoder
D. SimpleImputer
Correct Answer: PolynomialFeatures
Explanation: PolynomialFeatures generates a new feature matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree, as in the sketch below.
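A minimal sketch of polynomial regression via a pipeline (the degree and the synthetic data are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Illustrative quadratic data
rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X.ravel() ** 2 - X.ravel() + rng.normal(scale=0.2, size=200)

# Degree-2 features let the *linear* model fit a quadratic curve
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X, y)
print(poly_model.predict([[1.5]]))
```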
13. What is the main risk associated with using a high-degree polynomial in regression?
A. Underfitting
B. High bias
C. Overfitting
D. Convergence failure
Correct Answer: Overfitting
Explanation: High-degree polynomials create highly complex models that can capture noise in the training data, leading to overfitting and poor generalization to new data.
14. Which of the following techniques helps reduce overfitting in regression models by adding a penalty term to the loss function?
A. Normalization
B. Regularization
C. Standardization
D. Augmentation
Correct Answer: Regularization
Explanation: Regularization (like Ridge or Lasso) adds a penalty term based on the magnitude of the coefficients to the loss function to constrain model complexity.
15. Ridge regression minimizes the sum of squared residuals plus a penalty term based on:
A. The sum of absolute values of coefficients ($L_1$ norm).
B. The sum of squared values of coefficients ($L_2$ norm).
C. The number of non-zero coefficients.
D. The maximum coefficient value.
Correct Answer: The sum of squared values of coefficients ($L_2$ norm).
Explanation: Ridge regression adds the penalty term $\lambda \sum_j w_j^2$ (the squared $L_2$ norm) to the objective function.
16. Which property makes Lasso regression useful for feature selection?
A. It shrinks coefficients uniformly.
B. It forces some coefficients to become exactly zero.
C. It works best when $p > n$.
D. It increases the magnitude of coefficients.
Correct Answer: It forces some coefficients to become exactly zero.
Explanation: Due to the geometry of the $L_1$ penalty, Lasso regression tends to produce sparse solutions where irrelevant feature coefficients are driven exactly to zero.
17. In Scikit-Learn, which regression model combines both $L_1$ and $L_2$ regularization penalties?
A. Ridge
B. Lasso
C. ElasticNet
D. BayesianRidge
Correct Answer: ElasticNet
Explanation: ElasticNet is a linear regression model that trains with both $L_1$ and $L_2$ priors as regularizers. It is useful when multiple features are correlated; the sketch below compares all three penalties.
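A side-by-side sketch of the three penalties (the alpha values and synthetic data are illustrative):

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# Illustrative data where only the first two of five features matter
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.1, size=100)

for model in (Ridge(alpha=1.0), Lasso(alpha=0.1),
              ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 3))
# Lasso (and, less aggressively, ElasticNet) drives the irrelevant
# coefficients to exactly zero; Ridge only shrinks them.
```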
18. In regularized regression, what is the role of the hyperparameter $\alpha$ (or $\lambda$)?
A. It controls the learning rate.
B. It determines the degree of the polynomial.
C. It controls the strength of the regularization penalty.
D. It sets the intercept to zero.
Correct Answer: It controls the strength of the regularization penalty.
Explanation: A higher $\alpha$ increases the penalty, shrinking coefficients more (reducing variance but increasing bias). If $\alpha = 0$, the model reduces to standard OLS.
19. Why is feature scaling (e.g., Standardization) important before applying Ridge or Lasso regression?
A. To convert categorical data to numeric.
B. Because the penalty term is sensitive to the scale of the coefficients.
C. To remove missing values.
D. To ensure the target variable is normally distributed.
Correct Answer: Because the penalty term is sensitive to the scale of the coefficients.
Explanation: Regularization penalizes coefficient magnitude. If features are on different scales, the penalty unevenly affects features with smaller ranges versus larger ranges, biasing the model; scaling inside a pipeline (below) avoids this.
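One common pattern is to bundle the scaler and the regularized model into a pipeline so the scaling is always fitted on the training data (a sketch; the alpha and dataset are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# Standardizing first puts all coefficients on a comparable footing,
# so the penalty does not favor features with large raw ranges.
model = make_pipeline(StandardScaler(), Lasso(alpha=1.0))
model.fit(X, y)
print(model.named_steps["lasso"].coef_)
```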
20. Support Vector Regression (SVR) tries to fit as many data points as possible within a margin of width:
A. $C$
B. $\epsilon$ (epsilon)
C. $\gamma$ (gamma)
D. Zero
Correct Answer: $\epsilon$ (epsilon)
Explanation: SVR fits an $\epsilon$-insensitive tube: errors within distance $\epsilon$ of the predicted line are ignored, and the goal is to fit the tube around the data.
21. In Support Vector Regression, what is the role of the kernel function?
A. To calculate the error metric.
B. To map the input data into a higher-dimensional feature space to handle non-linearity.
C. To select the best features.
D. To normalize the target variable.
Correct Answer: To map the input data into a higher-dimensional feature space to handle non-linearity.
Explanation: Kernels (like RBF or Polynomial) allow SVR to find linear relationships in high-dimensional spaces, which correspond to non-linear relationships in the original input space.
22. Which parameter in SVR controls the trade-off between the smoothness of the decision function and the tolerance for training errors?
A. Kernel
B. Gamma
C. C
D. Degree
Correct Answer: C
Explanation: $C$ is the regularization parameter. A high $C$ tries to fit every training sample closely (low bias, high variance), while a low $C$ encourages a smoother decision surface; the sketch below shows typical settings.
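A minimal SVR sketch tying the three hyperparameters together (the values and synthetic data are illustrative):

```python
import numpy as np
from sklearn.svm import SVR

# Illustrative non-linear data
rng = np.random.default_rng(4)
X = rng.uniform(0, 6, size=(150, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=150)

# RBF kernel handles the non-linearity; epsilon sets the tube width,
# C trades smoothness against tolerance for training errors.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma="scale")
svr.fit(X, y)
print(svr.predict([[1.0], [3.0]]))
```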
23. What is the primary criterion used by Decision Tree Regressors to split a node?
A. Gini Impurity
B. Information Gain
C. MSE (variance reduction)
D. Log-Loss
Correct Answer: MSE (variance reduction)
Explanation: For regression trees, splits are chosen to minimize the Mean Squared Error (MSE), i.e., the variance, within the resulting child nodes.
24. One major advantage of Decision Tree Regression is:
A. It never overfits.
B. It does not require feature scaling or normalization.
Correct Answer: It does not require feature scaling or normalization.
Explanation: Decision trees split data based on thresholds of individual features, so the absolute scale or distribution of features does not affect the structure of the tree.
25. What is a characteristic behavior of a Decision Tree Regressor when predicting values outside the range of the training data?
A. It extrapolates linearly.
B. It predicts the average of the closest training samples (constant prediction).
C. It returns a null value.
D. It automatically creates a polynomial fit.
Correct Answer: It predicts the average of the closest training samples (constant prediction).
Explanation: Decision trees are piecewise constant models. They cannot extrapolate trends; for inputs outside the training range, they predict the value associated with the nearest leaf node, as demonstrated below.
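A small sketch of the no-extrapolation behavior (the data and depth are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative linear trend on [0, 10]
rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(200, 1))
y = 2 * X.ravel() + rng.normal(scale=0.5, size=200)

tree = DecisionTreeRegressor(max_depth=4).fit(X, y)

# Inside the training range the tree tracks the trend, but beyond it the
# prediction stays flat at the value of the nearest leaf:
print(tree.predict([[9.0], [20.0], [100.0]]))  # the last two are identical
```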
26. Random Forest Regression improves upon a single Decision Tree by utilizing which technique?
A. Gradient Boosting
B. Bagging (Bootstrap Aggregating)
C. Pruning
D. Kernel trick
Correct Answer: Bagging (Bootstrap Aggregating)
Explanation: Random Forests build multiple trees on bootstrap samples of the data and average their predictions to reduce variance and overfitting.
27. In a Random Forest Regressor, how is the final prediction determined?
A. Weighted sum of the features.
B. Majority vote of the trees.
C. Average of the predictions of all individual trees.
D. The prediction of the tree with the highest accuracy.
Correct Answer: Average of the predictions of all individual trees.
Explanation: For regression tasks, the Random Forest averages the continuous output values of all the trees in the ensemble.
28. Which parameter in RandomForestRegressor determines the number of trees in the forest?
A. max_depth
B. n_estimators
C. min_samples_split
D. bootstrap
Correct Answer: n_estimators
Explanation: n_estimators specifies the number of decision trees to be built in the forest.
29. Random Forests introduce randomness in two ways: bootstrap sampling and:
A. Random initialization of weights.
B. Selecting a random subset of features at each split.
C. Randomly shuffling the target labels.
D. Randomly pruning the trees.
Correct Answer: Selecting a random subset of features at each split.
Explanation: At each node split, a Random Forest only considers a random subset of features (controlled by max_features) to decorrelate the trees.
30. What is the 'Out-of-Bag' (OOB) score in Random Forests?
A. The accuracy on the test set.
B. A validation score calculated using the samples not included in the bootstrap sample for each tree.
C. The error rate of the worst tree.
D. The training error of the full ensemble.
Correct Answer: A validation score calculated using the samples not included in the bootstrap sample for each tree.
Explanation: About one-third of the data is left out (out-of-bag) when training each tree. These samples can be used to estimate the generalization error without a separate validation set, as in the sketch below.
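A sketch combining n_estimators, max_features, and the OOB score (the values and dataset are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

forest = RandomForestRegressor(
    n_estimators=200,   # number of trees in the forest
    max_features=0.5,   # random feature subset considered at each split
    oob_score=True,     # estimate generalization R^2 from out-of-bag samples
    random_state=0,
)
forest.fit(X, y)
print(forest.oob_score_)  # OOB R^2, no separate validation set needed
```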
31. Which Scikit-Learn function splits a dataset into training and testing sets?
Correct Answer: train_test_split
Explanation: train_test_split is the standard utility for splitting arrays or matrices into random train and test subsets; usage is sketched below.
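Typical usage (the split ratio and seed are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=3, random_state=0)

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
```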
32. In the ElasticNet objective $\min_w \frac{1}{2n}\lVert Xw - y\rVert_2^2 + \alpha\rho\lVert w\rVert_1 + \frac{\alpha(1-\rho)}{2}\lVert w\rVert_2^2$, what does $\rho$ (or l1_ratio in scikit-learn) control?
A. The overall regularization strength.
B. The mix between Ridge and Lasso regularization.
C. The degree of the polynomial.
D. The tolerance for stopping criteria.
Correct Answer: The mix between Ridge and Lasso regularization.
Explanation: The l1_ratio ($\rho$) controls the balance. If $\rho = 1$, it is Lasso; if $\rho = 0$, it is Ridge; values in between mix both penalties.
33. A residual plot shows the residuals on the y-axis and the predicted values on the x-axis. What pattern indicates a good regression model?
A. A clear U-shaped curve.
B. A linear trend.
C. Points randomly scattered around the horizontal axis (zero).
D. A funnel shape (heteroscedasticity).
Correct Answer: Points randomly scattered around the horizontal axis (zero).
Explanation: If residuals are randomly scattered around zero with no discernible pattern, the model has captured the underlying trend and the errors are random noise; a plotting sketch follows.
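A plotting sketch, assuming matplotlib is available (the data is synthetic):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
model = LinearRegression().fit(X, y)
y_pred = model.predict(X)
residuals = y - y_pred

# A healthy model shows a structureless cloud around the zero line
plt.scatter(y_pred, residuals, s=10)
plt.axhline(0, color="red", linewidth=1)
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.show()
```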
34. When using SGDRegressor from Scikit-Learn, which hyperparameter defines the update-rule schedule (how the learning rate changes over time)?
A. penalty
B. learning_rate
C. alpha
D. loss
Correct Answer: learning_rate
Explanation: The learning_rate parameter (options like 'constant', 'optimal', 'invscaling', 'adaptive') determines how the step size changes during training, as in the sketch below.
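A sketch of one schedule choice (the hyperparameter values are illustrative; SGD is scale-sensitive, hence the scaler in front):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)

# 'invscaling' decays the step size as eta = eta0 / t**power_t
sgd = make_pipeline(
    StandardScaler(),
    SGDRegressor(learning_rate="invscaling", eta0=0.01, max_iter=1000,
                 random_state=0),
)
sgd.fit(X, y)
print(sgd.predict(X[:3]))
```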
35. Which of the following is an intrinsic weakness of Linear Regression?
A. It is computationally expensive.
B. It cannot model non-linear relationships without feature engineering.
C. It is difficult to interpret.
D. It requires categorical features.
Correct Answer: It cannot model non-linear relationships without feature engineering.
Explanation: Standard Linear Regression assumes a linear relationship. To model curves, one must manually transform features (e.g., polynomial features) or use a different model.
36. In the context of regression metrics, what does Median Absolute Error provide that Mean Absolute Error does not?
A. Differentiability.
B. Robustness to outliers.
C. Percentage error calculation.
D. Squared penalization.
Correct Answer: Robustness to outliers.
Explanation: By taking the median of the absolute errors, this metric ignores the influence of extreme outliers, whereas the mean is pulled toward outliers.
37. What is multicollinearity?
A. When the target variable is categorical.
B. When independent features are highly correlated with each other.
C. When the model has too many polynomial features.
D. When the training data is too small.
Correct Answer: When independent features are highly correlated with each other.
Explanation: Multicollinearity occurs when features are (nearly) linearly dependent. This can make coefficient estimates unstable and difficult to interpret in linear models.
38. How does DecisionTreeRegressor handle missing values in Scikit-Learn (standard implementation)?
A. It handles them natively.
B. It ignores the rows with missing values.
C. It requires imputation (filling missing values) before training.
D. It treats them as a separate category.
Correct Answer: It requires imputation (filling missing values) before training.
Explanation: Scikit-Learn's standard CART implementation does not support missing values natively; an imputer (e.g., SimpleImputer) is required.
39. Which plot is typically used to inspect whether the residuals follow a normal distribution?
A. Box plot
B. Q-Q (Quantile-Quantile) plot
C. Scatter plot
D. Bar chart
Correct Answer: Q-Q (Quantile-Quantile) plot
Explanation: A Q-Q plot compares the quantiles of the residuals against the quantiles of a theoretical normal distribution. A straight line indicates normality; a plotting sketch follows.
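A sketch using SciPy's probplot, assuming SciPy and matplotlib are available (the data is synthetic):

```python
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

# Points hugging the diagonal reference line indicate normal residuals
stats.probplot(residuals, dist="norm", plot=plt)
plt.show()
```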
40. In Polynomial Regression, if you increase the degree of the polynomial significantly, the model becomes:
A. More biased.
B. Less flexible.
C. More complex with higher variance.
D. Linear.
Correct Answer: More complex with higher variance.
Explanation: Higher degrees allow the model to wiggle and fit training points precisely, increasing complexity and variance (and the risk of overfitting).
41. What is the result of applying fit_transform on the training data and then transform on the test data during scaling?
A. Data leakage.
B. Incorrect scaling.
C. Correct application of preprocessing parameters learned from training to test data.
D. Overfitting.
Correct Answer: Correct application of preprocessing parameters learned from training to test data.
Explanation: You learn the parameters (mean, std) from the training set (fit) and apply them to both sets (transform), ensuring the test set is treated as unseen data scaled to the same reference; see the sketch below.
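The idiomatic pattern (a sketch with synthetic data):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std on train only
X_test_scaled = scaler.transform(X_test)        # reuse them: no leakage
```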
42. Which Scikit-Learn attribute holds the estimated coefficients for a Linear Regression model after fitting?
A. model.weights_
B. model.coef_
C. model.intercept_
D. model.params_
Correct Answer: model.coef_
Explanation: coef_ is the attribute containing the weights (coefficients) for the features; intercept_ holds the bias term.
43. What is the analytical solution for the optimal weights of Linear Regression called?
A. Gradient Descent
B. The Normal Equation
C. Backpropagation
D. Coordinate Descent
Correct Answer: The Normal Equation
Explanation: The Normal Equation is the closed-form solution $\hat{w} = (X^\top X)^{-1} X^\top y$, illustrated below.
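A NumPy sketch of the Normal Equation, checked against LinearRegression (the data is synthetic):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 2))
y = 4 * X[:, 0] - 2 * X[:, 1] + 5 + rng.normal(scale=0.1, size=100)

# w = (X^T X)^{-1} X^T y, with a column of ones prepended for the bias
Xb = np.c_[np.ones(len(X)), X]
w = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y
print(w)  # [intercept, coef_1, coef_2]

# Agrees with scikit-learn's solution
lr = LinearRegression().fit(X, y)
print(lr.intercept_, lr.coef_)
```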
44. When using Support Vector Regression with an RBF kernel, what happens if the parameter $\gamma$ (gamma) is very large?
A. The model behaves like a linear regression.
B. The influence of each training example reaches very far.
C. The influence of each training example is limited to a close radius, leading to overfitting.
D. The model becomes a flat line.
Correct Answer: The influence of each training example is limited to a close radius, leading to overfitting.
Explanation: A high gamma means the Gaussian curve is narrow; the model captures complex details around individual points, often causing overfitting.
45. Why might one use Adjusted $R^2$ instead of standard $R^2$?
A. To calculate error in absolute terms.
B. To account for the number of predictors, penalizing the addition of useless features.
C. To ensure the score is always positive.
D. To handle categorical variables.
Correct Answer: To account for the number of predictors, penalizing the addition of useless features.
Explanation: Standard $R^2$ never decreases when features are added. Adjusted $R^2$ decreases if a new feature doesn't improve the model more than chance would expect; the formula is applied below.
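scikit-learn ships no built-in Adjusted $R^2$ helper, so the formula $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-p-1}$ is applied directly (a sketch with synthetic data):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
model = LinearRegression().fit(X, y)

r2 = r2_score(y, model.predict(X))
n, p = X.shape  # number of samples, number of predictors
adjusted_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(r2, adjusted_r2)
```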
46. In the context of the Bias-Variance tradeoff, a simple linear model with few features typically has:
A. High Bias and Low Variance
B. Low Bias and High Variance
C. Low Bias and Low Variance
D. High Bias and High Variance
Correct Answer: High Bias and Low Variance
Explanation: Simple models may fail to capture complex patterns (high bias/underfitting) but are stable and don't change much across different training sets (low variance).
47. Which of the following creates a pipeline in Scikit-Learn that scales data and then fits a regressor?
Correct Answer: make_pipeline(StandardScaler(), LinearRegression())
Explanation: make_pipeline (or the Pipeline class) chains transformers with a final estimator, so the scaler runs before the regressor during both fitting and prediction; a sketch follows.
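A sketch of the pattern (StandardScaler and LinearRegression stand in for any transformer/regressor pair; the dataset is synthetic):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=4, noise=5.0, random_state=0)

# make_pipeline chains the scaler and the regressor; fit() runs both in order
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X, y)
print(pipe.score(X, y))  # R^2 on the training data
```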