Unit 3 - Practice Quiz

INT423 50 Questions

1 What is the range of the Silhouette Coefficient?

A. 0 to 1
B. 0 to infinity
C. -infinity to 1
D. -1 to 1

2 In the context of the Silhouette Score, what does a value near 0 indicate?

A. The sample is far from other clusters
B. The sample is assigned to the wrong cluster
C. The clusters are overlapping
D. The clustering is perfect

3 Which of the following clustering metrics is an 'Internal' evaluation metric (does not require ground truth labels)?

A. Homogeneity Score
B. V-measure
C. Adjusted Rand Index
D. Davies-Bouldin Index

4 For the Davies-Bouldin Index, which of the following represents a better clustering result?

A. A value close to -1
B. A lower value
C. A value close to 1
D. A higher value

5 How is the Dunn Index calculated?

A. Ratio of maximum inter-cluster distance to minimum intra-cluster distance
B. Average distance between all points
C. Sum of squared errors
D. Ratio of minimum inter-cluster distance to maximum intra-cluster diameter
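
For reference, the Dunn Index can be sketched in a few lines of Python. This is a minimal illustration, assuming Euclidean distance, the closest-pair definition of inter-cluster distance, and the farthest-pair definition of cluster diameter (other variants exist). The function name `dunn_index` and the toy points are made up for this example.

```python
import math
from itertools import combinations

def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())

    # Closest pair of points that belong to different clusters.
    min_separation = min(
        math.dist(p, q)
        for g1, g2 in combinations(groups, 2)
        for p in g1
        for q in g2
    )
    # Farthest pair of points that belong to the same cluster.
    max_diameter = max(
        (math.dist(p, q) for g in groups for p, q in combinations(g, 2)),
        default=0.0,
    )
    # Assumption: treat all-singleton clusterings as infinitely compact.
    return math.inf if max_diameter == 0 else min_separation / max_diameter

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
dunn_good = dunn_index(points, [0, 0, 1, 1])  # tight, well-separated clusters
dunn_bad = dunn_index(points, [0, 1, 0, 1])   # stretched, interleaved clusters
```

On these toy points the compact grouping scores far higher than the interleaved one. Because both the numerator and the denominator are single extreme distances, one stray point can change the value sharply.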

6 The Adjusted Rand Index (ARI) corrects the Rand Index for:

A. Outliers
B. Number of clusters
C. Chance
D. Data dimensionality

7 What is the maximum possible value for the Adjusted Rand Index (ARI)?

A. 1
B. Infinity
C. 0
D. 100

8 Which metric is calculated as the harmonic mean of Homogeneity and Completeness?

A. V-measure
B. Silhouette Score
C. F-measure
D. Adjusted Mutual Information

9 A clustering result satisfies 'Homogeneity' if:

A. The number of clusters equals the number of classes
B. The clusters are spherical
C. Each cluster contains only members of a single class
D. All members of a given class are assigned to the same cluster

10 A clustering result satisfies 'Completeness' if:

A. Each cluster contains only members of a single class
B. The clusters are well separated
C. The entropy of the clusters is zero
D. All members of a given class are assigned to the same cluster
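
Homogeneity and Completeness are both built from conditional entropy, and a small worked example may help. The sketch below assumes the usual definitions h = 1 - H(C|K)/H(C) and c = 1 - H(K|C)/H(K), with the weighted combination v = (1 + beta) * h * c / (beta * h + c). The function names and label lists are illustrative only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def conditional_entropy(target, given):
    """H(target | given), computed from joint label counts."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (_, g), c in joint.items()
    )

def homogeneity_completeness_v(classes, clusters, beta=1.0):
    h_classes, h_clusters = entropy(classes), entropy(clusters)
    # Convention: a score is 1.0 when its normalising entropy is zero.
    homogeneity = (
        1.0 if h_classes == 0
        else 1.0 - conditional_entropy(classes, clusters) / h_classes
    )
    completeness = (
        1.0 if h_clusters == 0
        else 1.0 - conditional_entropy(clusters, classes) / h_clusters
    )
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)
    return homogeneity, completeness, v

# Two classes, each split evenly across two pure clusters.
classes = [0, 0, 0, 0, 1, 1, 1, 1]
clusters = [0, 0, 1, 1, 2, 2, 3, 3]
h, c, v = homogeneity_completeness_v(classes, clusters)
```

Here every cluster is pure but each class is spread over two clusters, so the two scores differ. Raising `beta` above 1 pulls the combined score toward the completeness value.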

11 Normalized Mutual Information (NMI) is a normalization of the Mutual Information (MI) score to scale the results between:

A. 0 and 1
B. -1 and 1
C. -infinity and infinity
D. 0 and infinity

12 Which metric is the geometric mean of the pairwise precision and recall?

A. V-measure
B. Adjusted Mutual Information
C. Dunn Index
D. Fowlkes-Mallows Index

13 What is the primary advantage of Adjusted Mutual Information (AMI) over Normalized Mutual Information (NMI)?

A. It works better with non-convex clusters
B. It accounts for chance, especially in small samples or large cluster numbers
C. It does not require ground truth
D. It is faster to compute

14 In the Silhouette Score formula $s = \frac{b - a}{\max(a, b)}$, what does 'a' represent?

A. The mean distance between a sample and all other points in the same cluster
B. The total number of clusters
C. The variance of the entire dataset
D. The mean distance between a sample and all points in the nearest neighboring cluster
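
For reference, the per-sample silhouette value s = (b - a) / max(a, b) can be computed directly from pairwise distances. The sketch below is a minimal illustration, assuming Euclidean distance and a score of 0 for singleton clusters (a common convention). `silhouette_samples` here is a hand-written helper for this example, not a library call.

```python
import math

def silhouette_samples(points, labels):
    """Return s(i) = (b - a) / max(a, b) for every sample."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the sample's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return scores

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
scores = silhouette_samples(points, labels)
mean_score = sum(scores) / len(scores)  # dataset-level score: mean over samples
```

With two tight, distant clusters every sample scores close to 1. Swapping the labels so that each sample sits nearer to the other cluster drives the values negative.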

15 Which of the following metrics requires the knowledge of ground truth labels?

A. Dunn Index
B. Adjusted Rand Index
C. Calinski-Harabasz Index
D. Silhouette Score

16 If the Adjusted Rand Index (ARI) is 0.0, what does this imply?

A. Perfect clustering
B. Clustering with 100% error
C. Inverse clustering
D. Random labeling

17 Which index is most sensitive to noise and outliers because it relies on minimum inter-cluster distances and maximum diameters?

A. Fowlkes-Mallows Index
B. Silhouette Score
C. V-measure
D. Dunn Index

18 The Fowlkes-Mallows Index (FMI) ranges from:

A. 0 to 1
B. -1 to 1
C. 0 to 10
D. -infinity to 0

19 If a clustering algorithm produces a Homogeneity score of 1.0 but a Completeness score of 0.5, what does this likely mean?

A. The algorithm failed completely
B. Classes are mixed but clusters are large
C. Clusters are pure but classes are split into multiple clusters
D. There is only one cluster

20 In the Davies-Bouldin Index calculation, the term $R_{ij} = \frac{s_i + s_j}{d_{ij}}$ represents:

A. The distance to the nearest neighbor
B. The product of cluster sizes
C. The ratio of the sum of cluster dispersions to the distance between cluster centroids
D. The absolute difference in cluster densities
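
For reference, the Davies-Bouldin Index can be sketched as follows. This is a minimal illustration, assuming Euclidean distance, dispersion measured as the mean distance of a cluster's points to its centroid, and at least two clusters with distinct centroids. The function name and toy points are made up for this example.

```python
import math

def davies_bouldin(points, labels):
    """Average, over clusters, of the worst (s_i + s_j) / d_ij ratio."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    centroids, dispersions = [], []
    for members in clusters.values():
        centroid = tuple(sum(coord) / len(members) for coord in zip(*members))
        centroids.append(centroid)
        # s_i: mean distance from the cluster's points to its centroid.
        dispersions.append(
            sum(math.dist(p, centroid) for p in members) / len(members)
        )

    k = len(centroids)
    total = 0.0
    for i in range(k):
        # Similarity of cluster i to its most similar other cluster.
        total += max(
            (dispersions[i] + dispersions[j])
            / math.dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
    return total / k

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
db_compact = davies_bouldin(points, [0, 0, 1, 1])
db_stretched = davies_bouldin(points, [0, 1, 0, 1])
```

On these points the compact grouping yields a much smaller value than the stretched one, since its clusters have little dispersion relative to the distance between their centroids.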

21 When is the Mutual Information (MI) between two clusterings equal to 0?

A. When the clusterings are perfectly correlated
B. When the clusterings are identical
C. When the two clusterings are independent
D. When the number of clusters is equal
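
Mutual Information between two label assignments can be computed from their joint counts. The sketch below is a minimal illustration using natural logarithms. `mutual_information` is a hand-written helper, and the label lists are toy data chosen for this example.

```python
import math
from collections import Counter

def mutual_information(labels_a, labels_b):
    """MI = sum over cells of p(a, b) * log(p(a, b) / (p(a) * p(b)))."""
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    return sum(
        (c / n) * math.log(n * c / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )

first = [0, 0, 1, 1]
second = [0, 1, 0, 1]  # knowing `first` says nothing about `second`

mi_unrelated = mutual_information(first, second)
mi_identical = mutual_information(first, first)
```

For the identical pair the result equals the entropy of the labeling (log 2 here), which is the quantity that normalized variants divide by to rescale the score.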

22 Which of the following statements about V-measure is FALSE?

A. It requires ground truth labels.
B. It is equivalent to Normalized Mutual Information (arithmetic version).
C. It ranges from -1 to 1.
D. It is symmetric.

23 Which metric is generally preferred when you want to compare clustering solutions with different numbers of clusters on the same dataset, to avoid favoring solutions with more clusters?

A. Raw Mutual Information (MI)
B. Purity
C. Adjusted Mutual Information (AMI)
D. Sum of Squared Errors

24 A Silhouette Score of -1 implies that:

A. The sample is an outlier
B. The sample is in the correct cluster
C. The sample is a centroid
D. The sample is in the wrong cluster

25 Which metric is defined using concepts of entropy and conditional entropy?

A. Normalized Mutual Information
B. Silhouette Score
C. Dunn Index
D. Adjusted Rand Index

26 The Adjusted Rand Index (ARI) is symmetric. This means:

A. ARI(A, B) = 1 / ARI(B, A)
B. ARI(A, B) = -ARI(B, A)
C. ARI values are always positive
D. ARI(A, B) = ARI(B, A)

27 In the calculation of Fowlkes-Mallows Index, 'TP' (True Positive) refers to:

A. Pairs of points that are in the same cluster and same class
B. Pairs of points that are in different clusters and different classes
C. Centroids correctly identified
D. Points correctly classified as outliers

28 Which internal metric assumes that clusters are convex and isotropic (spherical)?

A. Adjusted Rand Index
B. Entropy
C. Silhouette Score
D. DBSCAN

29 What is the primary disadvantage of the Davies-Bouldin Index?

A. It is always negative
B. It is computationally expensive for small datasets
C. It requires ground truth
D. It is limited to spherical clusters

30 If the V-measure is used with a Beta value greater than 1, it places more weight on:

A. Recall
B. Completeness
C. Homogeneity
D. Precision

31 Which of the following is NOT a pair-counting based metric?

A. Rand Index
B. Adjusted Rand Index
C. Normalized Mutual Information
D. Fowlkes-Mallows Index

32 For a perfect clustering where predicted clusters exactly match the ground truth classes, the Normalized Mutual Information (NMI) score is:

A. Variable depending on dataset size
B. 0.0
C. 0.5
D. 1.0

33 When computing the Silhouette Score for an entire dataset, one typically takes:

A. The minimum score of any point
B. The average of the scores for all samples
C. The maximum score of any point
D. The median of the scores

34 The Rand Index (RI) is the percentage of:

A. Clusters that are pure
B. Pairs of data points for which the two clusterings agree
C. Information shared
D. Correct classifications
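
The pair-counting view of the Rand Index, and of its adjusted form, can be sketched in plain Python. This is a minimal illustration, assuming the standard contingency-table formula for the Adjusted Rand Index and inputs for which the denominator is non-zero. Function names and label lists are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import comb

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

def adjusted_rand_index(labels_a, labels_b):
    """(index - expected index) / (max index - expected index)."""
    n = len(labels_a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_cells - expected) / (max_index - expected)

truth = [0, 0, 0, 1, 1, 1]
renamed = [1, 1, 1, 0, 0, 0]      # same partition, different label names
alternating = [0, 1, 0, 1, 0, 1]  # unrelated to the true grouping

ri_renamed = rand_index(truth, renamed)
ari_renamed = adjusted_rand_index(truth, renamed)
ri_alt = rand_index(truth, alternating)
ari_alt = adjusted_rand_index(truth, alternating)
```

Both scores are unaffected by renaming the labels. For the alternating labeling the unadjusted index still reports 7 agreeing pairs out of 15, while the adjusted value falls slightly below zero.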

35 Which metric would be most appropriate if the ground truth labels are not available?

A. V-measure
B. Silhouette Score
C. Adjusted Rand Index
D. Fowlkes-Mallows Index

36 A higher Dunn Index indicates:

A. Low intra-cluster distance and high inter-cluster distance
B. High intra-cluster distance and low inter-cluster distance
C. Overlapping clusters
D. Random clustering

37 Which metric is sensitive to the permutation of cluster labels?

A. None of the standard clustering metrics
B. Adjusted Rand Index
C. Accuracy (if used naively)
D. Silhouette Score

38 In the context of Homogeneity and Completeness, if the ground truth consists of a single class, and the clustering algorithm finds 5 clusters:

A. Homogeneity is 1, Completeness is < 1
B. Both are 1
C. Homogeneity is 1, Completeness is 0
D. Homogeneity is 0, Completeness is 1

39 The Adjusted Mutual Information (AMI) is preferred over NMI when:

A. The clusters are very large
B. The number of clusters is small
C. Computation time is critical
D. The cluster sizes are unbalanced and small samples are used

40 Which component of the Silhouette formula corresponds to 'separation'?

A. b (nearest-cluster distance)
B. a (intra-cluster distance)
C. max(a, b)
D. b - a

41 What is the theoretical minimum of the Adjusted Rand Index?

A. 0
B. -0.5
C. -1
D. It depends on the number of samples

42 Which of the following is a drawback of External Validation metrics like ARI and NMI?

A. They cannot handle outliers
B. They require a labeled dataset
C. They are not normalized
D. They are computationally expensive

43 In the Fowlkes-Mallows Index formula $FMI = \sqrt{PPV \times TPR}$, what is PPV?

A. Recall
B. Entropy
C. Precision
D. Accuracy
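
For reference, the Fowlkes-Mallows Index can be computed by counting point pairs. The sketch below is a minimal illustration, assuming FMI = sqrt(PPV * TPR) with PPV = TP / (TP + FP) and TPR = TP / (TP + FN), and returning 0 when no pair is a true positive. The function name and label lists are made up for this example.

```python
import math
from itertools import combinations

def fowlkes_mallows(classes, clusters):
    """Geometric mean of pairwise PPV and TPR."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(classes)), 2):
        same_class = classes[i] == classes[j]
        same_cluster = clusters[i] == clusters[j]
        if same_class and same_cluster:
            tp += 1  # pair grouped together in both labelings
        elif same_cluster:
            fp += 1  # grouped together by the clustering only
        elif same_class:
            fn += 1  # grouped together by the ground truth only
    if tp == 0:
        return 0.0
    ppv = tp / (tp + fp)
    tpr = tp / (tp + fn)
    return math.sqrt(ppv * tpr)

classes = [0, 0, 0, 1, 1, 1]
clusters = [0, 0, 1, 1, 2, 2]
fmi = fowlkes_mallows(classes, clusters)  # TP = 2, FP = 1, FN = 4
```

Pairs that are separated in both labelings (true negatives) never enter the formula, which is one way this index differs from the Rand Index.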

44 Which metric essentially measures the similarity between the two partitionings of the data?

A. Adjusted Rand Index
B. Dunn Index
C. Silhouette Score
D. Davies-Bouldin Index

45 If you calculate the Silhouette Score for a dataset with only one cluster, the result is typically defined as:

A. Undefined or Error
B. -1
C. 0
D. 1

46 The V-measure is to Homogeneity and Completeness as the F1-Score is to:

A. Precision and Recall
B. Sensitivity and Specificity
C. Accuracy and Error
D. TPR and FPR

47 Which clustering metric uses the 'max-min' logic (maximize the minimum distance between clusters)?

A. Entropy
B. Davies-Bouldin Index
C. Dunn Index
D. F-measure

48 Why is the Rand Index (unadjusted) often considered optimistic?

A. It does not correct for the agreement that occurs by chance
B. It ignores False Positives
C. It favors small clusters
D. It ranges from 0 to infinity

49 Completeness score of 1.0 implies:

A. All points of a specific class are assigned to the same cluster
B. The number of clusters equals the number of classes
C. The clusters are perfectly spherical
D. All points in a cluster belong to the same class

50 Which of the following metrics calculates the average similarity between each cluster and its most similar one?

A. Calinski-Harabasz Index
B. Dunn Index
C. Silhouette Score
D. Davies-Bouldin Index