Unit 3 - Practice Quiz

INT423

1 What is the range of the Silhouette Coefficient?

A. 0 to 1
B. -1 to 1
C. 0 to infinity
D. -infinity to 1

2 In the context of the Silhouette Score, what does a value near 0 indicate?

A. The sample is far from other clusters
B. The sample is assigned to the wrong cluster
C. The clusters are overlapping
D. The clustering is perfect

3 Which of the following clustering metrics is an 'Internal' evaluation metric (does not require ground truth labels)?

A. Adjusted Rand Index
B. Davies-Bouldin Index
C. Homogeneity Score
D. V-measure

4 For the Davies-Bouldin Index, which of the following represents a better clustering result?

A. A higher value
B. A value close to 1
C. A lower value
D. A value close to -1

5 How is the Dunn Index calculated?

A. Ratio of maximum inter-cluster distance to minimum intra-cluster distance
B. Ratio of minimum inter-cluster distance to maximum intra-cluster diameter
C. Average distance between all points
D. Sum of squared errors

6 The Adjusted Rand Index (ARI) corrects the Rand Index for:

A. Number of clusters
B. Chance
C. Outliers
D. Data dimensionality

7 What is the maximum possible value for the Adjusted Rand Index (ARI)?

A. 0
B. 1
C. 100
D. Infinity

8 Which metric is calculated as the harmonic mean of Homogeneity and Completeness?

A. F-measure
B. V-measure
C. Silhouette Score
D. Adjusted Mutual Information

9 A clustering result satisfies 'Homogeneity' if:

A. All members of a given class are assigned to the same cluster
B. Each cluster contains only members of a single class
C. The clusters are spherical
D. The number of clusters equals the number of classes

10 A clustering result satisfies 'Completeness' if:

A. All members of a given class are assigned to the same cluster
B. Each cluster contains only members of a single class
C. The clusters are well separated
D. The entropy of the clusters is zero

11 Normalized Mutual Information (NMI) normalizes the Mutual Information (MI) score so that the results fall between:

A. -1 and 1
B. 0 and 1
C. 0 and infinity
D. -infinity and infinity

12 Which metric is the geometric mean of the pairwise precision and recall?

A. V-measure
B. Fowlkes-Mallows Index
C. Dunn Index
D. Adjusted Mutual Information

13 What is the primary advantage of Adjusted Mutual Information (AMI) over Normalized Mutual Information (NMI)?

A. It is faster to compute
B. It accounts for chance, especially in small samples or large cluster numbers
C. It does not require ground truth
D. It works better with non-convex clusters

14 In the Silhouette Score formula s = (b - a) / max(a, b), what does 'a' represent?

A. The mean distance between a sample and all other points in the same cluster
B. The mean distance between a sample and all points in the nearest neighboring cluster
C. The total number of clusters
D. The variance of the entire dataset

15 Which of the following metrics requires the knowledge of ground truth labels?

A. Silhouette Score
B. Dunn Index
C. Adjusted Rand Index
D. Calinski-Harabasz Index

16 If the Adjusted Rand Index (ARI) is 0.0, what does this imply?

A. Perfect clustering
B. Inverse clustering
C. Random labeling
D. Clustering with 100% error

17 Which index is most sensitive to noise and outliers because it relies on minimum inter-cluster distances and maximum diameters?

A. Silhouette Score
B. Dunn Index
C. V-measure
D. Fowlkes-Mallows Index

18 The Fowlkes-Mallows Index (FMI) ranges from:

A. -1 to 1
B. 0 to 1
C. 0 to 10
D. -infinity to 0

19 If a clustering algorithm produces a Homogeneity score of 1.0 but a Completeness score of 0.5, what does this likely mean?

A. Clusters are pure but classes are split into multiple clusters
B. Classes are mixed but clusters are large
C. The algorithm failed completely
D. There is only one cluster

20 In the Davies-Bouldin Index calculation, the term R_ij represents:

A. The ratio of the sum of cluster dispersions to the distance between cluster centroids
B. The product of cluster sizes
C. The absolute difference in cluster densities
D. The distance to the nearest neighbor

21 When is the Mutual Information (MI) between two clusterings equal to 0?

A. When the clusterings are identical
B. When the clusterings are perfectly correlated
C. When the two clusterings are independent
D. When the number of clusters is equal

22 Which of the following statements about V-measure is FALSE?

A. It is symmetric.
B. It requires ground truth labels.
C. It is equivalent to Normalized Mutual Information (arithmetic version).
D. It ranges from -1 to 1.

23 Which metric is generally preferred when you want to compare clustering solutions with different numbers of clusters on the same dataset, to avoid favoring solutions with more clusters?

A. Adjusted Mutual Information (AMI)
B. Raw Mutual Information (MI)
C. Sum of Squared Errors
D. Purity

24 A Silhouette Score of -1 implies that:

A. The sample is in the wrong cluster
B. The sample is in the correct cluster
C. The sample is an outlier
D. The sample is a centroid

25 Which metric is defined using concepts of entropy and conditional entropy?

A. Adjusted Rand Index
B. Silhouette Score
C. Normalized Mutual Information
D. Dunn Index

26 The Adjusted Rand Index (ARI) is symmetric. This means:

A. ARI(A, B) = ARI(B, A)
B. ARI(A, B) = -ARI(B, A)
C. ARI(A, B) = 1 / ARI(B, A)
D. ARI values are always positive

27 In the calculation of Fowlkes-Mallows Index, 'TP' (True Positive) refers to:

A. Pairs of points that are in the same cluster and same class
B. Pairs of points that are in different clusters and different classes
C. Points correctly classified as outliers
D. Centroids correctly identified

28 Which internal metric assumes that clusters are convex and isotropic (spherical)?

A. DBSCAN
B. Silhouette Score
C. Adjusted Rand Index
D. Entropy

29 What is the primary disadvantage of the Davies-Bouldin Index?

A. It requires ground truth
B. It is computationally expensive for small datasets
C. It is limited to spherical clusters
D. It is always negative

30 If the V-measure is used with a Beta value greater than 1, it places more weight on:

A. Homogeneity
B. Completeness
C. Recall
D. Precision

31 Which of the following is NOT a pair-counting based metric?

A. Rand Index
B. Adjusted Rand Index
C. Fowlkes-Mallows Index
D. Normalized Mutual Information

32 For a perfect clustering where predicted clusters exactly match the ground truth classes, the Normalized Mutual Information (NMI) score is:

A. 0.0
B. 0.5
C. 1.0
D. Variable depending on dataset size

33 When computing the Silhouette Score for an entire dataset, one typically takes:

A. The maximum score of any point
B. The minimum score of any point
C. The average of the scores for all samples
D. The median of the scores

34 The Rand Index (RI) is the percentage of:

A. Correct classifications
B. Pairs of data points for which the two clusterings agree
C. Clusters that are pure
D. Information shared

35 Which metric would be most appropriate if the ground truth labels are not available?

A. Adjusted Rand Index
B. Fowlkes-Mallows Index
C. Silhouette Score
D. V-measure

36 A higher Dunn Index indicates:

A. High intra-cluster distance and low inter-cluster distance
B. Low intra-cluster distance and high inter-cluster distance
C. Random clustering
D. Overlapping clusters

37 Which metric is sensitive to the permutation of cluster labels?

A. None of the standard clustering metrics
B. Accuracy (if used naively)
C. Adjusted Rand Index
D. Silhouette Score

38 In the context of Homogeneity and Completeness, if the ground truth consists of a single class, and the clustering algorithm finds 5 clusters:

A. Homogeneity is 1, Completeness is 0
B. Homogeneity is 0, Completeness is 1
C. Homogeneity is 1, Completeness is < 1
D. Both are 1

39 The Adjusted Mutual Information (AMI) is preferred over NMI when:

A. The clusters are very large
B. The number of clusters is small
C. The cluster sizes are unbalanced and small samples are used
D. Computation time is critical

40 Which component of the Silhouette formula corresponds to 'separation'?

A. a (intra-cluster distance)
B. b (nearest-cluster distance)
C. max(a, b)
D. b - a

41 What is the theoretical minimum of the Adjusted Rand Index?

A. 0
B. -1
C. -0.5
D. It depends on the number of samples

42 Which of the following is a drawback of External Validation metrics like ARI and NMI?

A. They are computationally expensive
B. They require a labeled dataset
C. They cannot handle outliers
D. They are not normalized

43 In the Fowlkes-Mallows Index formula FMI = sqrt(PPV * TPR), what is PPV?

A. Precision
B. Recall
C. Accuracy
D. Entropy

44 Which metric essentially measures the similarity between the two partitionings of the data?

A. Silhouette Score
B. Davies-Bouldin Index
C. Adjusted Rand Index
D. Dunn Index

45 If you calculate the Silhouette Score for a dataset with only one cluster, the result is typically defined as:

A. 0
B. 1
C. -1
D. Undefined or Error

46 The V-measure is to Homogeneity and Completeness as the F1-Score is to:

A. Accuracy and Error
B. Precision and Recall
C. Sensitivity and Specificity
D. TPR and FPR

47 Which clustering metric uses the 'max-min' logic (maximize the minimum distance between clusters)?

A. Dunn Index
B. Davies-Bouldin Index
C. Entropy
D. F-measure

48 Why is the Rand Index (unadjusted) often considered optimistic?

A. It ignores False Positives
B. It does not correct for the agreement that occurs by chance
C. It ranges from 0 to infinity
D. It favors small clusters

49 A Completeness score of 1.0 implies:

A. All points in a cluster belong to the same class
B. All points of a specific class are assigned to the same cluster
C. The number of clusters equals the number of classes
D. The clusters are perfectly spherical

50 Which of the following metrics calculates the average similarity between each cluster and its most similar one?

A. Silhouette Score
B. Davies-Bouldin Index
C. Dunn Index
D. Calinski-Harabasz Index
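
Optional self-check exercise

Several questions above concern pair-counting metrics (Rand Index, Fowlkes-Mallows Index). The following is a minimal sketch for experimenting with them after attempting the quiz, written in plain Python with no external libraries. The function names (pair_counts, rand_index, fowlkes_mallows) and the toy labels are illustrative assumptions, not part of the course material, and the brute-force loop over all pairs is only suitable for small inputs.

```python
from itertools import combinations
from math import sqrt


def pair_counts(labels_true, labels_pred):
    """Classify every pair of points by class/cluster agreement.

    Returns (tp, fp, fn, tn) where, for a pair of points:
      tp: same class and same cluster
      fp: different class but same cluster
      fn: same class but different cluster
      tn: different class and different cluster
    """
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_class = labels_true[i] == labels_true[j]
        same_cluster = labels_pred[i] == labels_pred[j]
        if same_class and same_cluster:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_class:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def rand_index(labels_true, labels_pred):
    """Fraction of pairs on which the two labelings agree (unadjusted)."""
    tp, fp, fn, tn = pair_counts(labels_true, labels_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def fowlkes_mallows(labels_true, labels_pred):
    """Geometric mean of pairwise precision and pairwise recall."""
    tp, fp, fn, _ = pair_counts(labels_true, labels_pred)
    if tp == 0:
        return 0.0  # no co-clustered same-class pairs; avoids 0/0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return sqrt(precision * recall)


# Toy labels (hypothetical): two true classes, three predicted clusters.
truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 2, 2]

print("pair counts:", pair_counts(truth, pred))
print("Rand Index:", rand_index(truth, pred))
print("Fowlkes-Mallows:", fowlkes_mallows(truth, pred))
```

Things to try: rename the predicted cluster ids (for example, swap 0 and 2) and confirm the scores do not change, or swap the two arguments and compare the results.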