Unit 6 - Practice Quiz

CSE274 50 Questions

1 Which of the following best describes the primary goal of unsupervised learning?

A. To predict a continuous target variable based on input features
B. To discover hidden patterns or structures in unlabeled data
C. To classify data points into predefined categories using labeled training data
D. To optimize a reward signal in an interactive environment

2 What is the fundamental difference between supervised and unsupervised learning regarding input data?

A. Supervised learning uses unlabeled data, while unsupervised learning uses labeled data.
B. Supervised learning requires data with target labels (y), whereas unsupervised learning uses only input features (X).
C. Supervised learning is only for regression, while unsupervised learning is only for classification.
D. There is no difference; the terms refer to the algorithm's complexity.

3 Given two points (x₁, y₁) and (x₂, y₂), which formula represents the Euclidean distance?

A. |x₂ − x₁| + |y₂ − y₁|
B. √((x₂ − x₁)² + (y₂ − y₁)²)
C. (x₂ − x₁)² + (y₂ − y₁)²
D. max(|x₂ − x₁|, |y₂ − y₁|)
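As a study aid (not part of the quiz itself), the Euclidean formula in option B can be checked numerically; the points below are illustrative, not taken from the quiz.

```python
import math

# Euclidean distance between two 2-D points:
# sqrt((x2 - x1)^2 + (y2 - y1)^2)
def euclidean(p, q):
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

print(euclidean((0, 0), (3, 4)))  # classic 3-4-5 triangle -> 5.0
```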

4 Which distance metric is also known as the L1 norm or 'Taxicab' geometry?

A. Euclidean Distance
B. Mahalanobis Distance
C. Manhattan Distance
D. Cosine Distance

5 In which scenario is Cosine Distance (or Similarity) generally preferred over Euclidean Distance?

A. When calculating physical distances between locations on a map
B. When analyzing high-dimensional sparse data like text documents where magnitude matters less than orientation
C. When features are continuous variables with low dimensionality
D. When the data follows a strict Gaussian distribution
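The contrast behind option B can be seen with a quick numeric sketch (illustrative word-count vectors, not quiz data): cosine distance ignores magnitude, while Euclidean distance does not.

```python
import numpy as np

# Cosine distance measures orientation, not magnitude: a document and a
# doubled copy of it point in exactly the same direction.
def cosine_distance(u, v):
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

doc = np.array([3.0, 0.0, 1.0])
print(cosine_distance(doc, 2 * doc))   # ~0: same direction
print(np.linalg.norm(doc - 2 * doc))   # Euclidean still sees a large gap
```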

6 The K-Means clustering algorithm attempts to minimize which of the following objective functions?

A. The sum of absolute differences between points
B. The Within-Cluster Sum of Squares (WCSS) or Inertia
C. The distance between the farthest points in different clusters
D. The silhouette coefficient
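Option B's objective can be computed by hand on a tiny illustrative dataset: WCSS (Inertia) is just the sum of squared distances from each point to its assigned centroid.

```python
import numpy as np

# Two obvious clusters; labels are the cluster assignments.
points = np.array([[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [8.0, 9.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([points[labels == k].mean(axis=0) for k in (0, 1)])

# WCSS / Inertia: sum of squared point-to-centroid distances.
wcss = sum(((points[i] - centroids[labels[i]]) ** 2).sum()
           for i in range(len(points)))
print(wcss)  # each cluster contributes 2 * 0.5^2 = 0.5, so total = 1.0
```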

7 Which of the following is a key limitation of the K-Means algorithm?

A. It requires the number of clusters to be specified in advance.
B. It can only handle text data.
C. It is computationally more expensive than hierarchical clustering for small datasets.
D. It always finds the global optimum regardless of initialization.

8 In the Elbow Method for choosing the optimal K, what feature of the plot indicates the best K?

A. The point where the curve reaches zero
B. The point where the curve peaks (maximum value)
C. The point of inflection where the rate of decrease in Inertia slows down significantly
D. The point where the curve becomes a straight vertical line

9 What is the primary difference between K-Means and K-Medoids?

A. K-Means uses actual data points as centers; K-Medoids uses the mean.
B. K-Means uses the mean of points as the centroid; K-Medoids restricts the center to be an actual data point.
C. K-Means works better with outliers than K-Medoids.
D. K-Medoids uses Euclidean distance exclusively, while K-Means uses Manhattan.

10 Why is K-Medoids generally considered more robust to outliers than K-Means?

A. Because it minimizes squared errors which penalize outliers heavily.
B. Because it uses medoids (actual points) and often Manhattan distance, reducing the influence of extreme values compared to means and squared errors.
C. Because it automatically removes outliers during processing.
D. Because it requires a density parameter.

11 Which of the following describes Agglomerative Hierarchical Clustering?

A. A top-down approach where all points start in one cluster and are recursively split.
B. A bottom-up approach where each point starts as its own cluster and pairs are merged iteratively.
C. A density-based approach relying on epsilon neighborhoods.
D. A centroid-based approach requiring a fixed K.

12 In hierarchical clustering, what does the Single Linkage criterion measure to determine the distance between two clusters?

A. The distance between the centroids of the two clusters.
B. The maximum distance between any single point in one cluster and any point in the other.
C. The minimum distance between the closest pair of points, one from each cluster.
D. The average distance between all pairs of points in the two clusters.

13 Which linkage criterion in hierarchical clustering tends to produce compact, spherical clusters by minimizing the increase in variance when merging?

A. Single Linkage
B. Complete Linkage
C. Average Linkage
D. Ward's Method

14 What is a Dendrogram?

A. A 3D scatter plot of clusters.
B. A tree-like diagram that records the sequences of merges or splits in hierarchical clustering.
C. A graph showing the Elbow method.
D. A density map for DBSCAN.

15 What are the two main hyperparameters required for the DBSCAN algorithm?

A. Number of clusters (K) and random seed
B. Epsilon (ε) and Minimum Points (MinPts)
C. Learning rate and batch size
D. Number of iterations and tolerance

16 In DBSCAN, a point is classified as a Core Point if:

A. It is reachable from another point but has fewer than MinPts neighbors.
B. It has at least MinPts neighbors within radius ε.
C. It is the centroid of the cluster.
D. It is the farthest point from the center.
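The core-point rule in option B is easy to check directly; the data, ε, and MinPts below are illustrative, and whether a point counts as its own neighbor varies by convention (it is counted here, as scikit-learn does).

```python
import numpy as np

points = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [5.0, 5.0]])
eps, min_pts = 1.0, 3

def is_core(i):
    # Core point: at least min_pts neighbors (self included) within eps.
    dists = np.linalg.norm(points - points[i], axis=1)
    return (dists <= eps).sum() >= min_pts

print([is_core(i) for i in range(len(points))])  # last point is isolated
```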

17 One significant advantage of DBSCAN over K-Means is:

A. It is faster for high-dimensional data.
B. It can discover clusters of arbitrary shapes and identify outliers.
C. It works without any hyperparameters.
D. It always assigns every point to a cluster.

18 In the context of Anomaly Detection, what does an 'anomaly' or 'outlier' typically represent?

A. A data point that perfectly fits the trend.
B. A data point that deviates significantly from the majority of the data.
C. A missing value in the dataset.
D. The cluster center.

19 Which of the following is a density-based assumption often used in Anomaly Detection?

A. Anomalies lie in high-density regions.
B. Normal data lies in low-density regions.
C. Anomalies lie in low-density regions compared to normal data.
D. Density is irrelevant for anomaly detection.

20 The Silhouette Score ranges between:

A. 0 and 1
B. -1 and 1
C. 0 and ∞
D. -100 and 100

21 If a data point has a Silhouette Coefficient close to -1, what does this indicate?

A. The point is well-clustered.
B. The point is on the decision boundary.
C. The point is likely assigned to the wrong cluster.
D. The point is the centroid.
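The interpretation in questions 20 and 21 follows from the formula s = (b − a) / max(a, b); a small illustrative calculation for a single well-placed point shows s landing near +1.

```python
import numpy as np

# Silhouette coefficient for one point x:
#   a = mean distance to the other points in its own cluster
#   b = mean distance to the points in the nearest other cluster
own = np.array([[0.0], [1.0]])      # cluster containing x = 0
other = np.array([[10.0], [11.0]])  # nearest neighboring cluster

x = own[0]
a = np.mean([np.linalg.norm(x - p) for p in own[1:]])  # 1.0
b = np.mean([np.linalg.norm(x - p) for p in other])    # 10.5
s = (b - a) / max(a, b)
print(round(s, 3))  # close to +1: well-clustered
```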

22 The Davies–Bouldin Index evaluates clustering algorithms based on:

A. Only the compactness of clusters.
B. Only the separation between clusters.
C. The ratio of within-cluster scatter to between-cluster separation.
D. The absolute number of clusters created.

23 When comparing two clustering models using the Davies–Bouldin Index, which model is preferred?

A. The model with the higher index value.
B. The model with the lower index value.
C. The model with the value closest to 1.
D. The index cannot be used to compare models.

24 Calculate the Manhattan distance between points (1, 1) and (4, 4).

A. 3
B. 6
C. 3√2
D. 9
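Manhattan-distance answers like this one can be sanity-checked in a couple of lines; the points below are illustrative.

```python
def manhattan(p, q):
    # L1 / 'Taxicab' distance: sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(p, q))

print(manhattan((1, 1), (4, 4)))  # |1-4| + |1-4| = 6
```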

25 What is the Curse of Dimensionality regarding distance metrics in unsupervised learning?

A. As dimensions increase, distance calculation becomes faster.
B. As dimensions increase, all points tend to become equidistant, making distance metrics less meaningful.
C. High dimensions make K-Means converge instantly.
D. Dimensions effectively reduce to 2D automatically.

26 Which clustering algorithm would be most appropriate for a dataset containing two concentric circles (a ring inside another ring)?

A. K-Means
B. K-Medoids
C. DBSCAN
D. Gaussian Mixture Models (standard)

27 In Hierarchical Clustering, the Complete Linkage criterion is susceptible to:

A. Chaining effect
B. Sensitivity to outliers
C. Producing very large clusters
D. Ignoring the number of points

28 What is the definition of Inertia in the context of clustering evaluation?

A. Sum of squared distances of samples to their closest cluster center.
B. Sum of distances between cluster centers.
C. Ratio of intra-cluster distance to inter-cluster distance.
D. The density of the densest cluster.

29 Why is Inertia not a normalized metric?

A. It ranges from 0 to 1.
B. It decreases as the number of clusters increases, potentially reaching zero if K = N.
C. It is unaffected by the scale of the data.
D. It increases as the model gets better.
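Option B's behavior can be demonstrated on a tiny illustrative 1-D dataset: Inertia shrinks as K grows and collapses to zero when every point is its own centroid.

```python
import numpy as np

points = np.array([[0.0], [2.0], [10.0]])

def inertia(points, centroids):
    # Assign each point to its nearest centroid, sum the squared distances.
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

# K = 1: one centroid at the global mean (4.0) -> 16 + 4 + 36 = 56
print(inertia(points, points.mean(axis=0, keepdims=True)))
# K = N: every point is its own centroid, so Inertia hits 0
print(inertia(points, points))
```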

30 Which of the following requires Feature Scaling (Normalization/Standardization) the most?

A. Decision Trees
B. Random Forests
C. K-Means Clustering
D. Naive Bayes

31 In the K-Means algorithm, the 'assignment' step involves:

A. Moving the centroids to the average of the points.
B. Assigning each point to the nearest centroid.
C. Randomly picking points.
D. Calculating the silhouette score.

32 What is a 'Noise point' in DBSCAN?

A. A point with more than MinPts neighbors.
B. A point that is reachable from a core point but has fewer than MinPts neighbors itself.
C. A point that is neither a core point nor reachable from a core point.
D. The initial starting point of the algorithm.

33 The Minkowski Distance is a generalization of which distances?

A. Only Euclidean
B. Only Manhattan
C. Euclidean (p = 2) and Manhattan (p = 1)
D. Cosine and Jaccard
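The generalization in option C is straightforward to verify: setting the Minkowski power to 1 recovers Manhattan distance, and 2 recovers Euclidean (illustrative points below).

```python
def minkowski(p, q, power):
    # Minkowski distance: (sum_i |p_i - q_i|^power)^(1/power)
    return sum(abs(a - b) ** power for a, b in zip(p, q)) ** (1 / power)

u, v = (0, 0), (3, 4)
print(minkowski(u, v, 1))  # Manhattan: 7.0
print(minkowski(u, v, 2))  # Euclidean: 5.0
```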

34 Which of these is a divisive hierarchical clustering method?

A. DIANA (Divisive Analysis)
B. AGNES (Agglomerative Nesting)
C. K-Means
D. DBSCAN

35 For a silhouette score s = (b − a) / max(a, b), what does b represent?

A. The distance to the nearest cluster centroid.
B. The mean distance between point and all other points in the same cluster.
C. The mean distance between point and all points in the nearest neighboring cluster.
D. The total variance of the dataset.

36 What is the main advantage of K-Means++ over standard random initialization in K-Means?

A. It avoids the need to select .
B. It spreads out the initial centroids to speed up convergence and avoid poor local optima.
C. It allows the algorithm to find non-convex clusters.
D. It uses Manhattan distance instead of Euclidean.

37 If the Dunn Index is used for evaluation, a higher value indicates:

A. Better clustering (compact and well-separated).
B. Worse clustering (overlapping and dispersed).
C. More clusters.
D. Fewer clusters.

38 Which algorithm is an example of Soft Clustering (where points belong to clusters with probabilities)?

A. K-Means
B. DBSCAN
C. Gaussian Mixture Models (GMM)
D. Single Linkage Hierarchical Clustering

39 In hierarchical clustering, if we want to stop merging when clusters are too far apart, we can:

A. Increase the learning rate.
B. Cut the dendrogram at a specific height (distance threshold).
C. Set K = 1.
D. Use the Elbow method.

40 Which of the following is an application of Anomaly Detection?

A. Customer segmentation for marketing.
B. Credit card fraud detection.
C. Document classification.
D. Predicting house prices.

41 Calculate the squared Euclidean distance between (2, 1) and (2, 4).

A. 1
B. √3
C. 3
D. 9
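A quick check for squared-Euclidean answers (illustrative points); note there is no square root, which is why this quantity penalizes large deviations so heavily in the K-Means objective.

```python
def squared_euclidean(p, q):
    # sum of squared coordinate differences (no square root)
    return sum((a - b) ** 2 for a, b in zip(p, q))

print(squared_euclidean((2, 1), (2, 4)))  # 0 + 3^2 = 9
```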

42 How does the Average Linkage criterion calculate distance between Cluster A and Cluster B?

A. Distance between centroids.
B. Average of all pairwise distances between points in A and points in B.
C. Average of the minimum and maximum distances.
D. Distance between the two most central points.

43 Which of the following distance metrics satisfies the Triangle Inequality?

A. Euclidean Distance
B. Squared Euclidean Distance
C. Kullback-Leibler Divergence
D. Cosine Similarity

44 In the context of K-Means, if equals the number of data points , what is the value of Inertia?

A. Infinity
B. 1
C. 0
D. N

45 Which of the following suggests that a dataset has no cluster tendency (i.e., data is uniformly distributed)?

A. A Hopkins statistic close to 0.5.
B. A Hopkins statistic close to 0 or 1 (depending on the definition; values toward 1 usually indicate a cluster tendency).
C. A high Silhouette score.
D. A distinct Elbow in the Inertia plot.

46 In DBSCAN, what is a Border Point?

A. A point with fewer than MinPts neighbors of its own, but which lies within the ε-radius of a Core Point.
B. A point with more than MinPts neighbors.
C. A noise point.
D. The geometric center of the cluster.

47 Which distance metric is scale-invariant and accounts for the correlations between variables?

A. Euclidean Distance
B. Mahalanobis Distance
C. Manhattan Distance
D. Chebyshev Distance
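Option B's defining formula, d(x, y) = √((x − y)ᵀ S⁻¹ (x − y)) with S the covariance matrix, can be sketched with illustrative values: an identity covariance reduces it to plain Euclidean distance, while larger per-feature variances shrink that feature's contribution.

```python
import numpy as np

def mahalanobis(x, y, cov):
    # d(x, y) = sqrt((x - y)^T cov^-1 (x - y))
    d = x - y
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

x, y = np.array([3.0, 4.0]), np.array([0.0, 0.0])
print(mahalanobis(x, y, np.eye(2)))             # identity cov -> Euclidean 5.0
# Higher variance on a feature down-weights it, which is what makes the
# metric scale-invariant:
print(mahalanobis(x, y, np.diag([9.0, 16.0])))  # sqrt(9/9 + 16/16) ~ 1.414
```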

48 When interpreting a Silhouette plot, if most bars are short and some have negative scores, the clustering is:

A. Excellent
B. Appropriate
C. Poor/Suboptimal
D. Overfitted

49 Which unsupervised learning technique is best suited for reducing the number of variables while retaining variance, often used before clustering?

A. Principal Component Analysis (PCA)
B. Linear Regression
C. Random Forest
D. DBSCAN

50 If you are using K-Means to cluster customers based on 'Annual Income' (range 10k-100k) and 'Age' (range 20-70), what happens if you do not normalize?

A. Age will dominate the clustering.
B. Annual Income will dominate the clustering.
C. Both will contribute equally.
D. The algorithm will fail to run.
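Why income dominates (option B) can be seen directly: with illustrative customers described as [income, age] vectors, the squared-Euclidean distance is almost entirely the income term unless the features are standardized first.

```python
import numpy as np

# Two illustrative customers: [annual income, age]
a = np.array([20_000.0, 25.0])
b = np.array([80_000.0, 65.0])

diff2 = (a - b) ** 2
print(diff2)                   # income term dwarfs the age term
print(diff2[0] / diff2.sum())  # income's share of the squared distance

# Standardizing each feature (e.g. z-scores) puts both on a comparable
# scale, which is why K-Means usually needs normalization first.
```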