Unit 4 - Practice Quiz

CSE274 50 Questions

1 Which of the following best describes the fundamental difference between Regression and Classification?

A. Regression predicts discrete class labels, while classification predicts continuous values.
B. Regression predicts continuous quantities, while classification predicts discrete class labels.
C. Regression uses unsupervised learning, while classification uses supervised learning.
D. Regression requires categorical features, while classification requires numerical features.

2 Which loss function is most commonly used for Simple Linear Regression?

A. Cross-Entropy Loss
B. Hinge Loss
C. Mean Squared Error (MSE)
D. Kullback-Leibler Divergence
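
Study note: MSE is just the average of the squared residuals. A minimal numpy sketch with made-up numbers:

```python
import numpy as np

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.0, 8.0])

# MSE: mean of squared residuals -> (0.25 + 0.0 + 1.0) / 3
mse = np.mean((y_true - y_pred) ** 2)
print(round(mse, 4))  # 0.4167
```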

3 In the context of the Bias-Variance trade-off, what does high bias usually indicate?

A. The model is overfitting the training data.
B. The model is too complex and captures random noise.
C. The model is underfitting and fails to capture the underlying trend.
D. The model has high variance across different training sets.

4 Which evaluation metric is defined as √((1/n) Σᵢ (yᵢ − ŷᵢ)²)?

A. Root Mean Squared Error (RMSE)
B. R-squared (R²)
C. Mean Absolute Error (MAE)
D. Adjusted R-squared

5 In Simple Linear Regression represented by y = β₀ + β₁x + ε, what does β₁ represent?

A. The y-intercept when x = 0.
B. The irreducible error term.
C. The change in y for a one-unit increase in x.
D. The variance of the error term.

6 What happens to the variance of a regression model as model complexity increases?

A. Variance decreases.
B. Variance increases.
C. Variance remains constant.
D. Variance becomes zero.

7 Which of the following is a key assumption of Ordinary Least Squares (OLS) regression?

A. The relationship between X and Y is non-linear.
B. The residuals (errors) have constant variance (Homoscedasticity).
C. The independent variables must be highly correlated with each other.
D. The target variable must be categorical.

8 In Multiple Linear Regression, what problem arises when independent variables are highly correlated with each other?

A. Homoscedasticity
B. Multicollinearity
C. Underfitting
D. Non-stationarity

9 How does Ridge Regression (L2 regularization) modify the OLS cost function?

A. It adds the sum of the absolute values of coefficients: λ Σ |βⱼ|
B. It adds the sum of the squared values of coefficients: λ Σ βⱼ²
C. It multiplies the error by a factor of λ.
D. It adds a constant bias term to the prediction.
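
Study note: ridge has the closed-form solution β = (XᵀX + λI)⁻¹Xᵀy, which makes the shrinkage easy to see. A minimal numpy sketch on synthetic data (the true coefficients and λ values below are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

def ridge(X, y, lam):
    # Closed-form ridge solution: (X'X + lam*I)^(-1) X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

b_ols = ridge(X, y, 0.0)    # lam = 0 reduces to ordinary least squares
b_l2 = ridge(X, y, 50.0)    # a larger lam shrinks the coefficient vector
print(np.linalg.norm(b_l2) < np.linalg.norm(b_ols))  # True
```

Using np.linalg.solve rather than an explicit matrix inverse is the numerically preferred way to evaluate this formula.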

10 What is the primary advantage of Lasso Regression (L1 regularization) over Ridge Regression?

A. It always produces a higher R² score.
B. It is computationally faster to solve.
C. It can force coefficients to exactly zero, performing feature selection.
D. It works better when multicollinearity is not present.
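
Study note: one way to see why L1 can zero out coefficients: under an orthonormal design, each lasso coefficient is the soft-thresholded OLS coefficient, and anything inside the threshold becomes exactly zero. The toy numbers below are made up:

```python
import numpy as np

def soft_threshold(b, lam):
    # Shrink each coefficient toward zero; anything within lam of zero
    # is clipped to exactly 0.0 -- this is what performs feature selection
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

b_ols = np.array([2.5, -0.3, 0.05, -1.2])
b_lasso = soft_threshold(b_ols, 0.5)
print(b_lasso)  # the two small coefficients become exactly zero
```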

11 The combination of L1 and L2 regularization is known as:

A. Ridge Regression
B. Lasso Regression
C. Elastic Net
D. Logistic Regression

12 If the regularization parameter λ is set to 0 in Ridge or Lasso regression, the model becomes equivalent to:

A. A constant mean predictor.
B. Ordinary Least Squares (OLS) Linear Regression.
C. A Decision Tree Regressor.
D. Support Vector Regression.

13 What is the impact of increasing the regularization parameter λ on model bias and variance?

A. Bias increases, Variance decreases.
B. Bias decreases, Variance increases.
C. Both Bias and Variance increase.
D. Both Bias and Variance decrease.

14 Polynomial regression fits a non-linear curve to the data by:

A. Using a non-linear loss function.
B. Transforming the input features into higher-degree terms and fitting a linear model.
C. Using a tree-based structure.
D. Applying a sigmoid activation function to the output.
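
Study note: a minimal sketch of polynomial regression as "feature expansion + linear fit", using made-up quadratic data:

```python
import numpy as np

x = np.linspace(-2, 2, 50)
y = 1 + 2 * x + 3 * x ** 2          # hypothetical curved data

# Degree-2 expansion: [1, x, x^2], then ordinary least squares.
# The model is still *linear* in the expanded features.
X_poly = np.column_stack([np.ones_like(x), x, x ** 2])
coef, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(np.round(coef, 6))  # recovers the coefficients [1, 2, 3]
```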

15 In a regression model y = β₀ + β₁x₁ + β₂x₂, how is the coefficient β₁ interpreted?

A. The value of y when x₁ is zero.
B. The change in y for a one-unit change in x₁, holding x₂ constant.
C. The correlation between x₁ and y.
D. The change in y for a one-unit change in x₁, regardless of x₂.

16 Why is 'Adjusted R²' often preferred over standard 'R²' in Multiple Linear Regression?

A. Standard R² cannot be calculated for multiple variables.
B. Standard R² decreases when useful variables are added.
C. Standard R² never decreases when a new variable is added, even if it is irrelevant.
D. Adjusted R² is always between 0 and 1.
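
Study note: Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), which penalizes each extra predictor. A numpy sketch with made-up values:

```python
import numpy as np

def r2_scores(y, y_hat, p):
    # p = number of predictors (excluding the intercept)
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)          # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)     # total sum of squares
    r2 = 1 - ss_res / ss_tot
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adj

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8, 5.0])
r2, adj = r2_scores(y, y_hat, p=2)
print(round(r2, 4), round(adj, 4))  # adjusted R² is slightly lower
```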

17 Which of the following is a characteristic of Tree-Based Regression models (e.g., Decision Trees)?

A. They assume a linear relationship between features and target.
B. They require feature scaling (normalization/standardization).
C. They produce piecewise constant predictions.
D. They cannot handle categorical variables.

18 In a Decision Tree Regressor, what criterion is typically minimized to determine the best split?

A. Gini Impurity
B. Entropy
C. Variance (or MSE) of the target values in the child nodes.
D. Log-Likelihood
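
Study note: a numpy sketch of the split search: try candidate thresholds and keep the one that minimizes the weighted variance (equivalently MSE) of the two children. The toy data is made up:

```python
import numpy as np

def best_split(x, y):
    # Try each midpoint between consecutive sorted x values and keep the
    # split with the lowest weighted variance of the two child nodes
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_t, best_score = None, np.inf
    for i in range(1, len(x)):
        left, right = y[:i], y[i:]
        score = len(left) * np.var(left) + len(right) * np.var(right)
        if score < best_score:
            best_t, best_score = (x[i - 1] + x[i]) / 2, score
    return best_t

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 5.2, 4.8, 20.0, 19.5, 20.5])
print(best_split(x, y))  # 6.5 -- between the two flat regions
```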

19 What is 'lag' in the context of Time-Series Regression?

A. The time taken to train the model.
B. Using past values of the target variable (e.g., yₜ₋₁) as features to predict the current value (yₜ).
C. The error between the predicted and actual value.
D. The seasonality period of the data.
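
Study note: a sketch of building lag-1 and lag-2 features from a made-up series:

```python
import numpy as np

# Hypothetical series; each row of X holds the previous two observations
series = np.array([10.0, 12.0, 11.0, 13.0, 14.0, 13.5])

target = series[2:]    # y_t      (t = 2 .. 5)
lag1 = series[1:-1]    # y_{t-1}
lag2 = series[:-2]     # y_{t-2}

X = np.column_stack([lag1, lag2])
print(X[0], target[0])  # [12. 10.] 11.0
```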

20 Which issue is specific to Time-Series Regression that violates the standard independent and identically distributed (i.i.d) assumption?

A. Multicollinearity
B. Autocorrelation
C. Heteroscedasticity
D. High Dimensionality

21 Consider the equation y = β₀ + β₁x + β₂x². This is an example of:

A. Multiple Linear Regression
B. Polynomial Regression
C. Logistic Regression
D. Ridge Regression

22 When interpreting coefficients, if a variable has a P-value < 0.05, it generally means:

A. The variable has no effect on the target.
B. The variable is statistically significant in predicting the target.
C. The variable is highly correlated with other variables.
D. The variable causes overfitting.

23 What is the main risk of using a high-degree polynomial for regression without regularization?

A. High Bias
B. Low Variance
C. Overfitting
D. Underfitting

24 Which method is commonly used to select the optimal regularization parameter λ?

A. Gradient Descent
B. Cross-Validation
C. Maximum Likelihood Estimation
D. Calculating the derivative of the loss function

25 What is the closed-form solution (Normal Equation) for the coefficient vector β in OLS?

A. β = (XᵀX)⁻¹Xᵀy
B. β = Xᵀ(XXᵀ)⁻¹y
C. β = (XᵀX)Xᵀy
D. β = X⁻¹y
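
Study note: the Normal Equation β = (XᵀX)⁻¹Xᵀy in numpy, on synthetic data with made-up coefficients (in practice np.linalg.solve or lstsq is numerically preferred over an explicit inverse):

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])   # intercept + 2 features
y = X @ np.array([1.0, 2.0, -3.0]) + rng.normal(scale=0.01, size=50)

# Normal Equation: beta = (X'X)^(-1) X'y
beta = np.linalg.inv(X.T @ X) @ X.T @ y
print(np.round(beta, 2))  # close to [1, 2, -3]
```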

26 Which regression technique is most robust to outliers?

A. Ordinary Least Squares (OLS)
B. L2 Regularized Regression (Ridge)
C. Decision Tree Regression
D. Polynomial Regression (High degree)

27 In Time-Series regression, why is random shuffling of data for Train/Test split inappropriate?

A. It is computationally expensive.
B. It introduces look-ahead bias (data leakage).
C. It increases the variance of the model.
D. It reduces the dimensionality of the data.
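
Study note: a chronological split keeps the test set strictly in the future of the training set; shuffling would leak future values into training. A minimal sketch:

```python
# Ordered observations stand in for a time series
series = list(range(10))

# Train on the first 80%, test on the most recent 20% -- never shuffle
split = int(len(series) * 0.8)
train, test = series[:split], series[split:]
print(train, test)  # first 8 points for training, last 2 for testing
```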

28 If the Residual Plot (residuals vs. predicted values) shows a funnel shape (fanning out), what assumption is violated?

A. Linearity
B. Independence
C. Homoscedasticity
D. Normality

29 Which of the following creates a 'stair-step' approximation of the regression function?

A. Linear Regression
B. Ridge Regression
C. Decision Trees
D. Polynomial Regression

30 What is the effect of feature scaling (Standardization) on Ordinary Least Squares (OLS) regression coefficients?

A. It changes the predictions (ŷ).
B. It changes the interpretation and magnitude of coefficients but not the model performance metrics (R², RMSE).
C. It is strictly required for OLS to work mathematically.
D. It prevents overfitting.

31 Why is feature scaling essential for Regularized Regression (Ridge/Lasso)?

A. Because the penalty term is scale-dependent.
B. Because it uses Gradient Descent.
C. Because the closed-form solution doesn't exist.
D. To remove outliers.

32 The Root Mean Squared Error (RMSE) is in the same unit as:

A. The square of the target variable.
B. The target variable.
C. The variance of the target variable.
D. Dimensionless (no units).

33 Which technique is specifically designed to handle seasonality in Time-Series Regression?

A. Adding polynomial features.
B. Using seasonal dummy variables or Fourier terms.
C. Increasing the regularization parameter.
D. Using Lasso regression.

34 In the equation y = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂, the term β₃x₁x₂ is called:

A. A quadratic term.
B. An interaction term.
C. A bias term.
D. A regularization term.

35 What does a negative coefficient for a predictor variable imply?

A. There is no relationship.
B. As the predictor increases, the target variable tends to decrease.
C. As the predictor increases, the target variable tends to increase.
D. The predictor is an error term.

36 The term 'Residual' in regression refers to:

A. The predicted value.
B. The difference between the actual value and the predicted value (y − ŷ).
C. The mean of the target variable.
D. The intercept of the regression line.

37 Which algorithm creates an ensemble of regression trees to improve performance and reduce variance?

A. Simple Linear Regression
B. Random Forest Regression
C. Logistic Regression
D. K-Means Clustering

38 Stationarity in time-series data implies:

A. The mean and variance change over time.
B. The statistical properties (mean, variance) are constant over time.
C. The data is strictly linear.
D. There are no missing values.

39 Which feature expansion method can approximate any continuous function given a high enough degree?

A. One-hot encoding
B. Polynomial feature expansion
C. Standard scaling
D. Interaction only expansion

40 If your regression model has a training MSE of 10 and a testing MSE of 50, the model is likely:

A. Underfitting
B. Overfitting
C. Ideally fit
D. Biased

41 Lasso regression solves a convex optimization problem. The constraint region for Lasso is shaped like a:

A. Circle/Sphere
B. Diamond/Polyhedron
C. Hyperbola
D. Parabola

42 Ridge regression constraint region is shaped like a:

A. Circle/Sphere
B. Diamond
C. Square
D. Triangle

43 In the context of Decision Tree Regression, what is 'Pruning'?

A. Adding more features to the dataset.
B. Removing branches from the tree to reduce complexity and overfitting.
C. Increasing the depth of the tree.
D. Changing the loss function.

44 Which of the following is NOT a metric for Regression?

A. Mean Squared Error
B. Accuracy
C. Mean Absolute Error
D. R-squared

45 When performing Polynomial Regression, if the degree is set too low (e.g., for curved data), the model suffers from:

A. High Variance
B. High Bias
C. Multicollinearity
D. Overfitting

46 What is the key difference between Gradient Boosting Regressors and Random Forests?

A. Random Forests are sequential; Gradient Boosting is parallel.
B. Random Forests build trees independent of each other; Gradient Boosting builds trees sequentially to correct previous errors.
C. Random Forests cannot handle regression.
D. Gradient Boosting increases variance.

47 Which of the following transformations helps in handling non-linear relationships in a Linear Regression framework?

A. Log transformation of the target or features.
B. Min-Max Scaling.
C. Standardization.
D. Ridge Regularization.

48 In Elastic Net, if the mixing parameter α = 1 (where the penalty is λ[α·Σ|βⱼ| + (1 − α)·Σβⱼ²]), the model becomes:

A. Ridge Regression
B. Lasso Regression
C. OLS
D. None of the above

49 What does the intercept term β₀ represent geometrically in Simple Linear Regression?

A. The slope of the line.
B. The point where the regression line crosses the Y-axis.
C. The point where the regression line crosses the X-axis.
D. The mean of the X values.

50 Which technique decomposes a time series into Trend, Seasonality, and Residual components?

A. Polynomial Expansion
B. Seasonal Decomposition
C. Regularization
D. Gradient Descent
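
Study note: a naive additive decomposition sketch (not the full moving-average/STL procedure), assuming a linear trend and a known period of 4 with a made-up seasonal pattern:

```python
import numpy as np

period = 4
t = np.arange(40)
pattern = np.array([1.0, -1.0, -1.0, 1.0])   # hypothetical seasonal shape
series = 0.5 * t + pattern[t % period]       # linear trend + seasonality

# Trend: fit and subtract a straight line (valid here: the trend is linear)
slope, intercept = np.polyfit(t, series, 1)
detrended = series - (slope * t + intercept)

# Seasonality: average detrended values at each position in the cycle
seasonal = np.array([detrended[t % period == k].mean() for k in range(period)])
residual = detrended - seasonal[t % period]
print(np.round(seasonal, 2))  # recovers the seasonal pattern
```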