Unit 4 - Practice Quiz

INT394 50 Questions

1 What is the fundamental difference between the target variables in classification and regression problems?

A. Classification predicts discrete class labels, while regression predicts continuous numerical values.
B. Classification predicts continuous values, while regression predicts discrete categories.
C. Both predict continuous values, but regression uses a different loss function.
D. Classification requires unsupervised learning, while regression requires supervised learning.

2 Which of the following scenarios is a regression problem?

A. Grouping customers into segments based on purchasing behavior.
B. Recognizing handwritten digits (0-9).
C. Predicting whether an email is spam or ham.
D. Predicting the price of a house based on its square footage.

3 In Simple Linear Regression, the relationship between the independent variable and the dependent variable is modeled as:

A.
B.
C.
D.
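The model referenced in this question is usually written y = β₀ + β₁x + ε. As a minimal sketch (illustrative helper name, not part of the quiz), the two coefficients can be estimated in closed form by ordinary least squares:

```python
def fit_simple_linear(xs, ys):
    """Fit y = b0 + b1*x by ordinary least squares (closed form, 1-D)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx  # Intercept: line passes through the mean point.
    return b0, b1

# Perfectly linear data generated by y = 2 + 3x:
b0, b1 = fit_simple_linear([0, 1, 2, 3], [2, 5, 8, 11])
print(b0, b1)  # -> 2.0 3.0
```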

4 Which statement regarding Polynomial Regression is true?

A. It strictly requires non-parametric methods.
B. It cannot be solved using Ordinary Least Squares (OLS).
C. It is a form of linear regression because it is linear in the parameters (coefficients).
D. It is considered a non-linear regression because the curve is non-linear.
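The point behind option C is that the non-linearity lives entirely in the feature expansion, so the model stays linear in its coefficients. A minimal sketch of that expansion (illustrative helper name):

```python
def polynomial_features(x, degree):
    """Expand a scalar x into [1, x, x^2, ..., x^degree].

    Running OLS on these expanded features is still linear regression,
    because the prediction is a linear combination of the coefficients.
    """
    return [x ** d for d in range(degree + 1)]

print(polynomial_features(2.0, 3))  # -> [1.0, 2.0, 4.0, 8.0]
```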

5 What happens if the degree of the polynomial in polynomial regression is chosen to be too high?

A. The model will underfit the data (High Bias).
B. The computational cost decreases significantly.
C. The model will generalize better to unseen data.
D. The model will overfit the data (High Variance).

6 Which loss function is most commonly used for Ordinary Least Squares (OLS) regression?

A. Cross-Entropy Loss
B. Hinge Loss
C. Kullback-Leibler Divergence
D. Mean Squared Error (MSE)

7 The Mean Squared Error (MSE) is calculated as:

A.
B.
C.
D.

8 Which loss function is more robust to outliers in a regression problem?

A. Root Mean Squared Error (RMSE)
B. Mean Squared Error (MSE)
C. Mean Absolute Error (MAE)
D. L2 Norm
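The contrast behind questions 6–8 can be seen numerically: squaring amplifies an outlier's contribution to the loss, while the absolute error grows only linearly. A minimal sketch (illustrative helper names):

```python
def mse(y_true, y_pred):
    """Mean Squared Error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1.0, 1.0, 1.0, 10.0]  # last point is an outlier
y_pred = [1.0, 1.0, 1.0, 1.0]
print(mse(y_true, y_pred))  # -> 20.25 (the squared residual 81 dominates)
print(mae(y_true, y_pred))  # -> 2.25  (the outlier contributes only 9)
```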

9 In the context of regression regularization, Lasso Regression adds which penalty term to the loss function?

A. No penalty term
B. L1 penalty (absolute magnitude of coefficients: λ Σ |β_j|)
C. A combination of L1 and L2 penalties
D. L2 penalty (squared magnitude of coefficients: λ Σ β_j²)
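The two penalty terms contrasted in this question are straightforward to compute; a minimal sketch (illustrative helper names, with λ as the regularization strength):

```python
def l1_penalty(coefs, lam):
    """Lasso penalty: lam * sum of |beta_j|."""
    return lam * sum(abs(b) for b in coefs)

def l2_penalty(coefs, lam):
    """Ridge penalty: lam * sum of beta_j squared."""
    return lam * sum(b ** 2 for b in coefs)

coefs = [3.0, -4.0]
print(l1_penalty(coefs, 0.5))  # -> 3.5   (0.5 * (3 + 4))
print(l2_penalty(coefs, 0.5))  # -> 12.5  (0.5 * (9 + 16))
```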

10 What is a defining characteristic of Non-Parametric Regression?

A. It assumes a fixed mathematical form (e.g., a line) with a finite set of parameters.
B. The number of parameters grows with the size of the training data.
C. It requires the data to be normally distributed.
D. It only works for classification problems.

11 In K-Nearest Neighbors (KNN) regression, how is the prediction for a new data point made?

A. By taking the majority vote of the class labels of neighbors.
B. By solving a linear equation of the form y = β₀ + β₁x.
C. By taking the average (or weighted average) of the target values of the 'K' closest training neighbors.
D. By calculating the probability using Bayes' theorem.
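Option C — averaging the targets of the K closest training points — can be sketched in a few lines for 1-D inputs (illustrative helper name):

```python
def knn_regress(train_x, train_y, query, k):
    """Predict by averaging the targets of the k nearest neighbours (1-D)."""
    # Indices of training points sorted by distance to the query.
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k

xs = [0.0, 1.0, 2.0, 10.0]
ys = [0.0, 2.0, 4.0, 20.0]
print(knn_regress(xs, ys, 1.2, k=2))  # -> 3.0 (average of targets 2.0 and 4.0)
```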

12 Which of the following is true regarding the choice of 'k' in KNN regression?

A. A very large 'k' leads to overfitting (high variance).
B. A very small 'k' (e.g., k=1) leads to high variance (overfitting).
C. A very small 'k' (e.g., k=1) leads to high bias (underfitting).
D. The value of 'k' does not affect the model performance.

13 What is the primary difference between Supervised Learning (Classification/Regression) and Unsupervised Learning (Clustering)?

A. Supervised learning is faster than unsupervised learning.
B. Supervised learning groups data, while unsupervised learning predicts values.
C. Supervised learning requires labeled data (input-output pairs), while unsupervised learning uses unlabeled data.
D. Unsupervised learning always yields better accuracy.

14 The Euclidean distance between two points (x₁, y₁) and (x₂, y₂) is given by:

A.
B.
C.
D.

15 Which distance measure corresponds to the L1 norm and is calculated as the sum of absolute differences?

A. Chebyshev Distance
B. Cosine Distance
C. Euclidean Distance
D. Manhattan Distance

16 Cosine Similarity is particularly useful for:

A. Time series forecasting.
B. Geometric clustering of low-dimensional data.
C. Calculating distance on a grid.
D. Measuring the similarity between text documents (represented as vectors) irrespective of magnitude.
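The magnitude-invariance in option D follows from the definition: cosine similarity is the dot product divided by the product of the vector magnitudes, so scaling a vector leaves the score unchanged. A minimal sketch (illustrative helper name):

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Parallel vectors of different magnitude are maximally similar:
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # ~ 1.0
```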

17 The Minkowski distance, defined as D(x, y) = (Σᵢ |xᵢ − yᵢ|^p)^(1/p), is a generalization of both the Euclidean and Manhattan distances. If p = 1, it becomes:

A. Euclidean Distance
B. Mahalanobis Distance
C. Manhattan Distance
D. Chebyshev Distance
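The special cases behind questions 14–17 can be checked directly: setting p = 1 in the Minkowski distance gives Manhattan, and p = 2 gives Euclidean. A minimal sketch (illustrative helper name):

```python
def minkowski(a, b, p):
    """Minkowski distance: (sum of |a_i - b_i|^p) ** (1/p)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

a, b = (0.0, 0.0), (3.0, 4.0)
print(minkowski(a, b, 1))  # p=1, Manhattan: -> 7.0
print(minkowski(a, b, 2))  # p=2, Euclidean: -> 5.0
```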

18 Which of the following is a Partition-based clustering algorithm?

A. BIRCH
B. Agglomerative Clustering
C. DBSCAN
D. K-Means

19 What is the objective function that the K-Means algorithm tries to minimize?

A. Within-Cluster Sum of Squares (WCSS)
B. The number of clusters
C. Between-Cluster Sum of Squares
D. Silhouette Coefficient

20 Which of the following is a step in the K-Means algorithm?

A. Merging the two closest clusters.
B. Selecting the 'k' nearest neighbors for voting.
C. Assigning points to the nearest cluster centroid.
D. Drawing a separating hyperplane.
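The assignment step in option C alternates with a centroid-update step until convergence. A minimal 1-D sketch of both steps (illustrative helper names, not a full K-Means implementation):

```python
def assign_points(points, centroids):
    """Assignment step: each point goes to its nearest centroid (1-D)."""
    return [min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            for p in points]

def update_centroids(points, labels, k):
    """Update step: each centroid becomes the mean of its assigned points."""
    return [sum(p for p, l in zip(points, labels) if l == j)
            / max(1, sum(1 for l in labels if l == j))
            for j in range(k)]

points = [1.0, 2.0, 9.0, 10.0]
labels = assign_points(points, [0.0, 5.0])
print(labels)                               # -> [0, 0, 1, 1]
print(update_centroids(points, labels, 2))  # -> [1.5, 9.5]
```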

21 A major limitation of the standard K-Means algorithm is:

A. It is computationally very expensive for small datasets.
B. It always finds the global optimum.
C. It works well with non-convex cluster shapes.
D. It requires the number of clusters to be specified in advance.

22 How does K-Medoids differ from K-Means?

A. K-Medoids uses actual data points as centers (medoids) and is more robust to outliers.
B. K-Medoids uses the mean of the points as the center.
C. K-Medoids uses Euclidean distance exclusively.
D. K-Medoids is faster than K-Means.

23 Hierarchical clustering can be divided into two main types:

A. Centroid-based and Density-based
B. Agglomerative (Bottom-Up) and Divisive (Top-Down)
C. Supervised and Unsupervised
D. Linear and Non-linear

24 In Agglomerative Hierarchical Clustering, what does 'Single Linkage' measure?

A. The maximum distance between points in two clusters.
B. The distance between the centroids of two clusters.
C. The average distance between all pairs of points in two clusters.
D. The minimum distance between the closest pair of points in two clusters.

25 What is a Dendrogram?

A. A scatter plot of the clusters.
B. A diagram representing the tree structure of hierarchical clustering.
C. A method to calculate the derivative of a function.
D. A plot showing the loss function over iterations.

26 In hierarchical clustering, 'Complete Linkage' uses which distance metric to merge clusters?

A. Maximum distance between points (farthest neighbors).
B. Average distance between points.
C. Distance between centroids.
D. Minimum distance between points (nearest neighbors).
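The contrast between single and complete linkage (questions 24 and 26) is just min versus max over all cross-cluster point pairs. A minimal 1-D sketch (illustrative helper names):

```python
def single_linkage(c1, c2):
    """Minimum distance between any pair of points across the two clusters."""
    return min(abs(a - b) for a in c1 for b in c2)

def complete_linkage(c1, c2):
    """Maximum distance between any pair of points across the two clusters."""
    return max(abs(a - b) for a in c1 for b in c2)

c1, c2 = [1.0, 2.0], [5.0, 9.0]
print(single_linkage(c1, c2))    # -> 3.0 (closest pair: 2 and 5)
print(complete_linkage(c1, c2))  # -> 8.0 (farthest pair: 1 and 9)
```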

27 Which clustering method does NOT require specifying the number of clusters upfront?

A. K-Means
B. Gaussian Mixture Models
C. K-Medoids
D. Hierarchical Clustering

28 What is the Elbow Method used for?

A. To prevent overfitting in regression.
B. To visualize high-dimensional data.
C. To determine the optimal number of clusters (k) in K-Means.
D. To calculate the distance between clusters.

29 The Silhouette Score ranges between:

A. -infinity and +infinity
B. -1 and 1
C. 0 and 100
D. 0 and 1

30 A Silhouette Score close to +1 implies:

A. The clustering algorithm failed.
B. The point is well matched to its own cluster and far from neighboring clusters.
C. The point is assigned to the wrong cluster.
D. The point is on or very close to the decision boundary between two neighboring clusters.
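The per-point silhouette the quiz assumes is commonly defined as s = (b − a) / max(a, b), where a is the mean distance to points in the same cluster and b is the mean distance to the nearest other cluster; this makes the range and the meaning of +1 in questions 29–30 concrete. A minimal sketch (illustrative helper name):

```python
def silhouette_point(a, b):
    """Silhouette of one point: a = mean intra-cluster distance,
    b = mean distance to the nearest other cluster."""
    return (b - a) / max(a, b)

print(silhouette_point(a=0.5, b=5.0))  # -> 0.9  (close to +1: well clustered)
print(silhouette_point(a=5.0, b=0.5))  # -> -0.9 (likely mis-assigned)
```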

31 Which metric is used for cluster validation when ground truth labels are available?

A. Silhouette Score
B. Davies-Bouldin Index
C. Rand Index (or Adjusted Rand Index)
D. Elbow Method

32 In the context of Ridge Regression, as the penalty parameter λ approaches infinity, the regression coefficients tend towards:

A. The OLS estimates
B. 1
C. Zero
D. Infinity

33 Which regression technique fits a local regression model to a subset of the data surrounding the query point?

A. Logistic Regression
B. Linear Regression
C. LOESS (Locally Estimated Scatterplot Smoothing)
D. Ridge Regression

34 Jaccard Similarity is defined as:

A.
B.
C.
D.
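The definition this question is after is Jaccard similarity = |A ∩ B| / |A ∪ B| for two sets; a minimal sketch (illustrative helper name):

```python
def jaccard(a, b):
    """Jaccard similarity: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

print(jaccard({1, 2, 3}, {2, 3, 4}))  # -> 0.5 (2 shared out of 4 total)
```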

35 K-Means++ is an algorithm used for:

A. Calculating the final centroids.
B. Initializing the cluster centers to improve convergence speed and quality.
C. Determining the value of K automatically.
D. Post-processing the clusters.

36 Which of the following data shapes is K-Means least likely to handle correctly?

A. Clusters with similar variances.
B. Compact, well-separated blobs.
C. Concentric circles (e.g., a donut shape).
D. Spherical clusters of equal size.

37 The Dunn Index is an internal cluster validation metric where a higher value indicates:

A. Loose and overlapping clusters.
B. Compact and well-separated clusters.
C. Poor clustering performance.
D. High computational complexity.

38 Which statement regarding the bias-variance trade-off in regression is correct?

A. Complex non-linear models usually have low bias and high variance.
B. Variance refers to the error on the training set.
C. We want to maximize both bias and variance.
D. Simple linear models usually have low bias and high variance.

39 What is Ward's Method in hierarchical clustering?

A. A divisive method that splits based on density.
B. A method equivalent to single linkage.
C. A method that uses random linkage.
D. An agglomerative method that minimizes the increase in total within-cluster variance when merging.

40 Hamming distance is primarily used for:

A. Geospatial coordinates.
B. Image pixel intensity.
C. Categorical data or strings of equal length.
D. Continuous numerical data.
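Hamming distance simply counts the positions at which two equal-length sequences differ, which is why it suits categorical data and strings. A minimal sketch (illustrative helper name):

```python
def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s, t))

print(hamming("karolin", "kathrin"))  # -> 3 (differs at positions 2, 3, 4)
```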

41 In kernel regression (e.g., Nadaraya-Watson), the 'bandwidth' parameter controls:

A. The smoothness of the fit (width of the kernel window).
B. The number of clusters.
C. The learning rate of the gradient descent.
D. The number of iterations.
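The bandwidth's role in option A can be seen in a minimal Nadaraya-Watson sketch with a Gaussian kernel (illustrative helper name; the kernel choice is an assumption, since the quiz names only the estimator):

```python
import math

def nadaraya_watson(xs, ys, query, bandwidth):
    """Kernel-weighted average of the targets; bandwidth sets the window width."""
    weights = [math.exp(-((x - query) / bandwidth) ** 2 / 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 4.0]
# A small bandwidth tracks the nearest points; a very large one
# smooths the prediction toward the global mean of the targets.
print(nadaraya_watson(xs, ys, 1.0, bandwidth=0.1))
print(nadaraya_watson(xs, ys, 1.0, bandwidth=100.0))
```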

42 Which of the following is NOT a metric for calculating the distance between two clusters in hierarchical clustering?

A. Single Linkage
B. Complete Linkage
C. Gradient Descent
D. Average Linkage

43 For a dataset with n points, what is the time complexity of one iteration of K-Means with k clusters and d dimensions?

A.
B.
C.
D.

44 What is the main advantage of Hierarchical Clustering over K-Means?

A. It is computationally faster for large datasets.
B. It scales linearly with the number of data points.
C. It handles missing values natively.
D. It provides a taxonomy/hierarchy of clusters and doesn't require pre-specifying k.

45 If a regression model has an R² (Coefficient of Determination) score of 1.0, it means:

A. The model is underfitting.
B. The model perfectly fits the data.
C. The model is a constant line.
D. The model explains none of the variability of the response data.
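R² compares the model's residual sum of squares to the total variance of the targets: R² = 1 − SS_res / SS_tot, so a perfect fit leaves no residual and scores exactly 1.0. A minimal sketch (illustrative helper name):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

print(r_squared([1, 2, 3], [1, 2, 3]))  # perfect fit -> 1.0
```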

46 Which of these is a 'lazy learning' algorithm often used for regression?

A. Linear Regression
B. K-Nearest Neighbors (KNN)
C. K-Means
D. Ridge Regression

47 In the context of clustering, what is 'inter-cluster distance'?

A. The distance from a point to the origin.
B. The distance between different clusters.
C. The distance between points within the same cluster.
D. The sum of squared errors.

48 When using the Manhattan distance, the set of points at a constant distance from the origin forms a:

A. Sphere
B. Hyperbola
C. Square (rotated 45 degrees)
D. Circle

49 Which statement regarding outlier sensitivity is correct?

A. K-Means is less sensitive to outliers than K-Medoids.
B. Median-based methods are more sensitive to outliers than Mean-based methods.
C. K-Means is sensitive to outliers because the mean is influenced by extreme values.
D. Least Squares Regression is robust to outliers.

50 What is the 'Kernel Trick' in the context of non-linear regression (e.g., Support Vector Regression)?

A. Mapping data to a higher-dimensional space to make it linearly separable/fittable without explicitly calculating coordinates.
B. A method to reduce dimensionality.
C. Using a GPU kernel for faster processing.
D. Ignoring non-linear data points.