Unit 4 - Practice Quiz

INT395 50 Questions

1 In the context of supervised learning, what distinguishes a regression problem from a classification problem?

A. The target variable is continuous.
B. The input features must be categorical.
C. Regression requires unsupervised data.
D. The target variable is categorical.

2 Which visualization tool is most commonly used in Exploratory Data Analysis (EDA) to visualize the linear relationship between a single feature and the target variable?

A. Scatter plot
B. Pie chart
C. Histogram
D. Box plot

3 When analyzing the relationship between multiple variables in a dataset, which matrix helps quantify the linear correlation between every pair of features?

A. Covariance matrix
B. Confusion matrix
C. Correlation matrix
D. Hessian matrix

4 In a simple linear regression model y = β₀ + β₁x, what does β₁ represent?

A. The residual error
B. The slope of the line
C. The learning rate
D. The y-intercept

5 Which Scikit-Learn class is used to perform Ordinary Least Squares (OLS) linear regression?

A. sklearn.linear_model.SGDRegressor
B. sklearn.linear_model.Lasso
C. sklearn.linear_model.LinearRegression
D. sklearn.linear_model.Ridge
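
As a study aid, fitting an OLS model with this class takes only a few lines; a minimal sketch with illustrative toy data, assuming NumPy and scikit-learn are installed:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data lying exactly on the line y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# LinearRegression solves the OLS problem in closed form.
model = LinearRegression().fit(X, y)
slope, intercept = model.coef_[0], model.intercept_
```

On noiseless data like this, OLS recovers the slope (2.0) and intercept (1.0) exactly.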

6 What is the objective function that Ordinary Least Squares (OLS) minimizes?

A. Sum of Squared Errors (SSE)
B. Hinge Loss
C. Cross-Entropy Loss
D. Mean Absolute Error (MAE)

7 What is the primary motivation for using the RANSAC (RANdom SAmple Consensus) algorithm in regression?

A. To fit a model in the presence of a significant number of outliers.
B. To increase the speed of training.
C. To handle missing values automatically.
D. To perform feature selection.

8 In the RANSAC algorithm, what does the 'residual_threshold' parameter define?

A. The maximum number of iterations.
B. The learning rate of the estimator.
C. The minimum number of samples required to fit the model.
D. The maximum residual for a data sample to be classified as an inlier.
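
To see residual_threshold in action, a minimal sketch (the toy data and threshold value are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import RANSACRegressor

# Points on y = 3x + 2, with the first three corrupted into gross outliers.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0
y[:3] += 100.0

# Samples whose residual exceeds residual_threshold are treated as outliers.
ransac = RANSACRegressor(residual_threshold=5.0, random_state=0)
ransac.fit(X, y)
n_inliers = int(ransac.inlier_mask_.sum())
```

The three corrupted points fall far outside the 5.0 threshold, so the consensus model is fit on the remaining 17 inliers and recovers the true slope.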

9 Which metric is calculated as (1/n) Σᵢ (yᵢ − ŷᵢ)²?

A. Coefficient of Determination
B. Mean Squared Error
C. Explained Variance Score
D. Mean Absolute Error
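
As an aside, scikit-learn exposes these error metrics directly; a minimal sketch with made-up values:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

# MSE: mean of squared residuals = (1 + 0 + 4) / 3
mse = mean_squared_error(y_true, y_pred)
# MAE: mean of absolute residuals = (1 + 0 + 2) / 3
mae = mean_absolute_error(y_true, y_pred)
```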

10 If an R² score is 1.0, what does this indicate about the regression model?

A. The model is underfitting.
B. The model is a constant line.
C. The model perfectly fits the data.
D. The model explains none of the variability of the response data.

11 Why is Mean Squared Error (MSE) often preferred over Mean Absolute Error (MAE) for optimization?

A. MSE is always smaller than MAE.
B. MSE has the same unit as the target variable.
C. MSE is differentiable everywhere, making gradient-based optimization easier.
D. MSE is robust to outliers.

12 To perform Polynomial Regression using a linear model in Scikit-Learn, which transformer must be applied first?

A. SimpleImputer
B. OneHotEncoder
C. StandardScaler
D. PolynomialFeatures
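
For reference, the transformer expands each input into polynomial terms so a linear model can fit a curve; a minimal sketch on an illustrative quadratic target:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = X.ravel() ** 2  # quadratic target y = x^2

# Expand each x into [1, x, x^2]; the model stays linear in these features.
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

model = LinearRegression().fit(X_poly, y)
pred = model.predict(poly.transform(np.array([[5.0]])))[0]
```

Since the target is exactly quadratic, the degree-2 fit predicts 25.0 at x = 5.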

13 What is the main risk associated with using a high-degree polynomial in regression?

A. Underfitting
B. Convergence failure
C. High bias
D. Overfitting

14 Which of the following techniques helps reduce overfitting in regression models by adding a penalty term to the loss function?

A. Normalization
B. Augmentation
C. Regularization
D. Standardization

15 Ridge regression minimizes the sum of squared residuals plus a penalty term based on:

A. The maximum coefficient value.
B. The number of non-zero coefficients.
C. The sum of absolute values of coefficients (L1 norm).
D. The sum of squared values of coefficients (L2 norm).

16 Which property makes Lasso regression useful for feature selection?

A. It forces some coefficients to become exactly zero.
B. It works best when there are more samples than features.
C. It increases the magnitude of coefficients.
D. It shrinks coefficients uniformly.
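
To see this property, a minimal sketch on synthetic data where only two of five features carry signal (the data and the choice of alpha are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
# Only the first two features matter; the other three are pure noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# The L1 penalty drives the noise features' coefficients to exactly zero.
lasso = Lasso(alpha=0.5).fit(X, y)
n_zeroed = int(np.sum(lasso.coef_ == 0.0))
```

The three noise coefficients come out exactly 0.0, which is how Lasso performs implicit feature selection.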

17 In Scikit-Learn, which regression model combines both L1 and L2 regularization penalties?

A. Lasso
B. BayesianRidge
C. ElasticNet
D. Ridge

18 In regularized regression, what is the role of the hyperparameter λ (or α)?

A. It controls the learning rate.
B. It sets the intercept to zero.
C. It determines the degree of the polynomial.
D. It controls the strength of the regularization penalty.

19 Why is feature scaling (e.g., Standardization) important before applying Ridge or Lasso regression?

A. To ensure the target variable is normally distributed.
B. To convert categorical data to numeric.
C. Because the penalty term is sensitive to the scale of the coefficients.
D. To remove missing values.

20 Support Vector Regression (SVR) tries to fit as many data points as possible within a margin of width:

A. C
B. ε (epsilon)
C. γ (gamma)
D. Zero

21 In Support Vector Regression, what is the role of the kernel function?

A. To calculate the error metric.
B. To select the best features.
C. To map the input data into a higher-dimensional feature space to handle non-linearity.
D. To normalize the target variable.

22 Which parameter in SVR controls the trade-off between the smoothness of the decision function and the tolerance for training errors?

A. Degree
B. Kernel
C. Gamma
D. C

23 What is the primary criterion used by Decision Tree Regressors to split a node?

A. MSE (Variance reduction)
B. Log-Loss
C. Gini Impurity
D. Information Gain

24 One major advantage of Decision Tree Regression is:

A. It never overfits.
B. It creates smooth, continuous prediction curves.
C. It does not require feature scaling or normalization.
D. It always extrapolates well.

25 What is a characteristic behavior of a Decision Tree Regressor when predicting values outside the range of the training data?

A. It automatically creates a polynomial fit.
B. It extrapolates linearly.
C. It returns a null value.
D. It predicts the average of the closest training samples (constant prediction).

26 Random Forest Regression improves upon a single Decision Tree by utilizing which technique?

A. Pruning
B. Gradient Boosting
C. Kernel trick
D. Bagging (Bootstrap Aggregating)

27 In a Random Forest Regressor, how is the final prediction determined?

A. Majority vote of the trees.
B. The prediction of the tree with the highest accuracy.
C. Weighted sum of the features.
D. Average of the predictions of all individual trees.
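
This averaging can be checked directly against the fitted trees; a minimal sketch with illustrative synthetic data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(100, 1))
y = 2.0 * X.ravel() + rng.normal(scale=0.5, size=100)

# n_estimators sets the number of trees in the forest.
forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# The forest's prediction equals the mean of its trees' predictions.
x_new = np.array([[5.0]])
tree_preds = np.array([tree.predict(x_new)[0] for tree in forest.estimators_])
forest_pred = forest.predict(x_new)[0]
```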

28 Which parameter in RandomForestRegressor determines the number of trees in the forest?

A. min_samples_split
B. max_depth
C. bootstrap
D. n_estimators

29 Random Forests introduce randomness in two ways: bootstrap sampling and:

A. Randomly shuffling the target labels.
B. Selecting a random subset of features at each split.
C. Random initialization of weights.
D. Randomly pruning the trees.

30 What is the 'Out-of-Bag' (OOB) score in Random Forests?

A. The accuracy on the test set.
B. The error rate of the worst tree.
C. A validation score calculated using the samples not included in the bootstrap sample for each tree.
D. The training error of the full ensemble.

31 Which Scikit-Learn function splits a dataset into training and testing sets?

A. sklearn.model_selection.train_test_split
B. sklearn.preprocessing.train_test
C. sklearn.metrics.split_data
D. sklearn.model_selection.cross_val_score
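
A minimal sketch of the split (the 30% test fraction and toy shapes are illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 30% of the rows for testing; random_state makes it repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
```

With 10 rows and test_size=0.3, the split yields 7 training and 3 test rows.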

32 In the ElasticNet objective, minimize ‖y − Xw‖² / (2n) + α·ρ·‖w‖₁ + α·(1−ρ)·‖w‖₂² / 2, what does ρ (or l1_ratio in scikit-learn) control?

A. The degree of the polynomial.
B. The tolerance for stopping criteria.
C. The overall regularization strength.
D. The mix between Ridge and Lasso regularization.

33 A residual plot shows the residuals on the y-axis and the predicted values on the x-axis. What pattern indicates a good regression model?

A. Points randomly scattered around the horizontal axis (zero).
B. A clear U-shape curve.
C. A funnel shape (heteroscedasticity).
D. A linear trend.

34 When using SGDRegressor from Scikit-Learn, which hyperparameter defines the update rule schedule (how the learning rate changes over time)?

A. learning_rate
B. loss
C. penalty
D. alpha

35 Which of the following is an intrinsic weakness of Linear Regression?

A. It is difficult to interpret.
B. It requires categorical features.
C. It is computationally expensive.
D. It cannot model non-linear relationships without feature engineering.

36 In the context of regression metrics, what does Median Absolute Error provide that Mean Absolute Error does not?

A. Differentiability.
B. Robustness to outliers.
C. Percentage error calculation.
D. Squared penalization.

37 What is Multicollinearity?

A. When the target variable is categorical.
B. When the training data is too small.
C. When independent features are highly correlated with each other.
D. When the model has too many polynomial features.

38 How does DecisionTreeRegressor handle missing values in Scikit-Learn (standard implementation)?

A. It handles them natively.
B. It ignores the rows with missing values.
C. It requires imputation (filling missing values) before training.
D. It treats them as a separate category.

39 Which plot is typically used to inspect if the residuals follow a normal distribution?

A. Box plot
B. Bar chart
C. Scatter plot
D. Q-Q (Quantile-Quantile) plot

40 In Polynomial Regression, if you increase the degree of the polynomial significantly, the model becomes:

A. Less flexible.
B. More biased.
C. More complex with higher variance.
D. Linear.

41 What is the result of applying fit_transform on the training data and then transform on the test data during scaling?

A. Incorrect scaling.
B. Overfitting.
C. Data leakage.
D. Correct application of preprocessing parameters learned from the training data to the test data.
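
A minimal sketch of this pattern with StandardScaler (the toy values are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test = np.array([[2.5]])

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)  # learn mean and std from training data
X_test_s = scaler.transform(X_test)        # reuse those training parameters
```

The training mean is 2.5, so a test value of 2.5 maps to exactly 0 under the training parameters; no statistics leak from the test set.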

42 Which Scikit-Learn attribute holds the estimated coefficients for a Linear Regression model after fitting?

A. model.weights_
B. model.params_
C. model.coef_
D. model.intercept_

43 What is the analytical solution to find the optimal weights for Linear Regression called?

A. Gradient Descent
B. Coordinate Descent
C. Backpropagation
D. The Normal Equation

44 When using Support Vector Regression with an RBF kernel, what happens if the parameter γ (gamma) is very large?

A. The model behaves like a linear regression.
B. The influence of each training example reaches very far.
C. The model becomes a flat line.
D. The influence of each training example is limited to a small radius, leading to overfitting.

45 Why might one use Adjusted R² instead of standard R²?

A. To account for the number of predictors, penalizing the addition of useless features.
B. To calculate error in absolute terms.
C. To handle categorical variables.
D. To ensure the score is always positive.

46 In the context of Bias-Variance tradeoff, a simple linear model with few features typically has:

A. Low Bias and Low Variance
B. High Bias and High Variance
C. High Bias and Low Variance
D. Low Bias and High Variance

47 Which of the following creates a pipeline in Scikit-Learn that scales data then fits a regressor?

A. sklearn.model_selection.cross_val_score(LinearRegression())
B. sklearn.pipeline.make_pipeline(StandardScaler(), LinearRegression())
C. sklearn.compose.ColumnTransformer()
D. sklearn.linear_model.LinearRegression(normalize=True)
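
For reference, option B chains the two steps into one estimator; a minimal sketch with illustrative toy data:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# One fit call scales X, then fits the regressor on the scaled features.
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X, y)
r2 = pipe.score(X, y)  # R^2 on the (noiseless) training data
```

Because the data lie exactly on a line, the pipeline scores R² = 1.0 here.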

48 What is the interpretation of the slope coefficient β₁ in the model y = β₀ + β₁x?

A. The correlation between x and y.
B. The change in y for a one-unit increase in x.
C. The percentage change in y.
D. The value of y when x = 0.

49 Which regression algorithm constructs a model based on the principle of 'recursive binary splitting'?

A. Ridge Regression
B. Decision Tree Regression
C. Support Vector Regression
D. Linear Regression

50 When interpreting a heatmap of a correlation matrix, a value of -0.9 between two features indicates:

A. No linear relationship.
B. A strong negative linear relationship.
C. A strong positive linear relationship.
D. A weak negative linear relationship.