Unit 3 - Practice Quiz

GEO295

1 What is the range of values for the Silhouette Score?

A. 0 to 1
B. -1 to 1
C. 0 to infinity
D. -infinity to infinity

2 Which of the following metrics does NOT require ground truth labels (true class labels) to evaluate clustering performance?

A. Adjusted Rand Index
B. Normalized Mutual Information
C. Silhouette Score
D. Fowlkes-Mallows Index

3 In the context of the Davies-Bouldin Index, a lower value indicates:

A. Better clustering
B. Worse clustering
C. High computational cost
D. Overfitting

4 The Dunn Index is defined as the ratio of:

A. Maximum inter-cluster distance to minimum intra-cluster distance
B. Minimum inter-cluster distance to maximum intra-cluster distance
C. Mean intra-cluster distance to mean inter-cluster distance
D. Variance of clusters to bias of clusters

5 Which clustering metric corrects the Rand Index for chance?

A. Fowlkes-Mallows Index
B. Adjusted Rand Index (ARI)
C. Silhouette Score
D. Completeness Score

6 What does a Homogeneity score of 1.0 imply?

A. All clusters contain data points from only a single class
B. All data points of a specific class are assigned to the same cluster
C. The number of clusters equals the number of samples
D. The clusters overlap significantly

7 If all data points belonging to a given class are elements of the same cluster, which metric is maximized?

A. Homogeneity
B. Completeness
C. Silhouette Score
D. Dunn Index

8 The V-measure is the harmonic mean of which two metrics?

A. Precision and Recall
B. Homogeneity and Completeness
C. Silhouette and Dunn Index
D. ARI and NMI

9 Which metric is calculated as the geometric mean of pairwise precision and pairwise recall?

A. Fowlkes-Mallows Index
B. Adjusted Mutual Information
C. V-measure
D. Davies-Bouldin Index

10 Normalized Mutual Information (NMI) is often preferred over Mutual Information (MI) because:

A. NMI does not require ground truth
B. MI is computationally too expensive
C. NMI scales the result between 0 and 1, making it comparable across datasets
D. MI yields negative values

11 What is the primary advantage of Adjusted Mutual Information (AMI) over Normalized Mutual Information (NMI)?

A. AMI is faster to compute
B. AMI accounts for chance (randomness) in cluster assignment
C. AMI works without ground truth
D. AMI can handle negative values

12 In the Silhouette Score formula s = (b - a) / max(a, b), what does 'a' represent?

A. The distance to the nearest cluster centroid
B. The mean intra-cluster distance (average distance to other points in the same cluster)
C. The mean nearest-cluster distance
D. The maximum diameter of the cluster
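
For review, the formula in question 12 can be traced by hand on a toy dataset. The following is a minimal sketch rather than a reference implementation: the function name `silhouette_sample` and the four sample points are invented for illustration, and it assumes Euclidean distance and at least two clusters.

```python
from math import dist

def silhouette_sample(index, points, labels):
    """Silhouette value s = (b - a) / max(a, b) for a single sample."""
    own = labels[index]
    # Collect distances from the chosen sample, grouped by the other point's cluster.
    by_cluster = {}
    for j, (p, lab) in enumerate(zip(points, labels)):
        if j == index:
            continue
        by_cluster.setdefault(lab, []).append(dist(points[index], p))
    if own not in by_cluster:
        return 0.0  # common convention: a singleton cluster scores 0
    # a: mean distance to the other points in the sample's own cluster
    a = sum(by_cluster[own]) / len(by_cluster[own])
    # b: smallest mean distance to the points of any other cluster
    b = min(sum(d) / len(d) for lab, d in by_cluster.items() if lab != own)
    return (b - a) / max(a, b)

# Hypothetical toy data: two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
print(silhouette_sample(0, points, labels))            # well-assigned sample
print(silhouette_sample(0, points, [0, 1, 1, 0]))      # same sample, poor labeling
```

The second call relabels the points so that the first sample shares a cluster with a distant point, which pushes its value below zero.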

13 According to the Silhouette Score, which of the following values indicates that samples have been assigned to the wrong clusters?

A. Values near +1
B. Values near 0
C. Values near -1
D. Values exactly 0.5

14 The Adjusted Rand Index (ARI) yields a score of approximately 0 when:

A. The clustering is perfect
B. The clustering is identical to the ground truth
C. The clustering is independent/random compared to the ground truth
D. The number of clusters is equal to the number of samples

15 Which metric is most sensitive to noise and outliers because it uses maximum diameters and minimum separations?

A. Silhouette Score
B. Dunn Index
C. V-measure
D. Adjusted Rand Index

16 If a clustering algorithm produces 100 clusters for a dataset of 100 samples (each sample is its own cluster), which metric will trivially reach its maximum of 1.0, potentially giving a misleading impression of quality?

A. Completeness
B. Homogeneity
C. Silhouette Score
D. Dunn Index

17 Conversely, if all samples are assigned to a single cluster, which metric will trivially reach its maximum of 1.0?

A. Homogeneity
B. Completeness
C. Silhouette Score
D. Davies-Bouldin Index

18 What is the beta parameter used for in the V-measure calculation?

A. To adjust for chance
B. To weight the importance of Homogeneity versus Completeness
C. To define the number of clusters
D. To normalize the dataset
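
For review, the role of beta can be seen by computing the V-measure directly from label entropies. This is a minimal sketch under stated assumptions: the helper names (`entropy`, `conditional_entropy`, `v_measure`) and the toy labels are illustrative, and the weighted form v = (1 + beta) * h * c / (beta * h + c) is the commonly used definition, with beta > 1 weighting completeness more strongly.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy of a label sequence (natural log)."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def conditional_entropy(target, given):
    """H(target | given), computed from the joint label counts."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (_, g), c in joint.items())

def v_measure(classes, clusters, beta=1.0):
    """Return (homogeneity, completeness, v) for two labelings."""
    h_c, h_k = entropy(classes), entropy(clusters)
    homogeneity = 1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - conditional_entropy(clusters, classes) / h_k
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)
    return homogeneity, completeness, v

# Hypothetical labels: class 1 is split across two clusters.
classes = [0, 0, 1, 1]
clusters = [0, 0, 1, 2]
print(v_measure(classes, clusters))            # beta = 1: plain harmonic mean
print(v_measure(classes, clusters, beta=2.0))  # completeness weighted more
```

With these labels every cluster is pure but one class is split, so raising beta lowers the combined score.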

19 When calculating the Fowlkes-Mallows Index, 'TP' (True Positive) refers to:

A. Pairs of points that are in the same cluster in both the true labels and predicted labels
B. Pairs of points that are in different clusters in both labels
C. Points correctly classified as noise
D. Clusters that are perfectly pure
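
For review, the pair-counting view behind the Fowlkes-Mallows Index can be sketched in a few lines. This is an illustrative sketch, not a library implementation: the function name `fowlkes_mallows` and the example labels are assumptions, and it enumerates all pairs, which is only practical for small inputs.

```python
from itertools import combinations
from math import sqrt

def fowlkes_mallows(true_labels, pred_labels):
    """FMI = TP / sqrt((TP + FP) * (TP + FN)), counted over pairs of points."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1   # pair grouped together in both labelings
        elif same_pred:
            fp += 1   # grouped together only in the predicted labels
        elif same_true:
            fn += 1   # grouped together only in the true labels
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))

# Hypothetical labels for six points.
true_labels = [0, 0, 0, 1, 1, 1]
pred_labels = [0, 0, 1, 1, 2, 2]
print(fowlkes_mallows(true_labels, pred_labels))
```

The return expression equals the geometric mean of pairwise precision TP / (TP + FP) and pairwise recall TP / (TP + FN).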

20 Which of the following is an 'Internal' clustering validity index?

A. Adjusted Rand Index
B. Davies-Bouldin Index
C. V-measure
D. Normalized Mutual Information

21 What is a major limitation of the Silhouette Score when dealing with non-convex clusters (e.g., ring shapes)?

A. It is computationally cheap
B. It tends to give lower scores to density-based clusters that are not spherical
C. It cannot handle negative values
D. It requires ground truth

22 Which component of the Silhouette Score represents the 'separation' of the cluster?

A. a (intra-cluster distance)
B. b (nearest-cluster distance)
C. max(a, b)
D. b - a

23 In the context of NMI, Entropy is used to measure:

A. The distance between centroids
B. The uncertainty associated with the class or cluster distribution
C. The number of outliers
D. The geometric shape of the cluster

24 Which metric is symmetric (i.e., Metric(A, B) = Metric(B, A))?

A. Homogeneity
B. Completeness
C. Adjusted Rand Index
D. Silhouette Score

25 A Davies-Bouldin Index of 0 indicates:

A. The worst possible clustering
B. Random clustering
C. Ideally separated and compact clusters
D. Infinite variance

26 Which of the following is required to calculate the Adjusted Mutual Information (AMI)?

A. Centroids of the clusters
B. Euclidean distance matrix
C. Ground truth labels and predicted labels
D. Only the predicted labels

27 The Fowlkes-Mallows Index is bounded between:

A. -1 and 1
B. 0 and 1
C. -infinity to +infinity
D. 0 and infinity

28 Why might the Dunn Index be computationally expensive for large datasets?

A. It requires calculating eigenvalues
B. It requires calculating pairwise distances between all points to find min/max distances
C. It requires ground truth labels
D. It involves complex integrals

29 When interpreting Homogeneity (H) and Completeness (C), if H is high and C is low, what does this usually suggest?

A. The algorithm over-segmented the classes (many small clusters for one class)
B. The algorithm merged distinct classes into one cluster
C. The clustering is perfect
D. The data is random noise

30 In the formula for NMI, the Mutual Information I(U, V) is normalized by:

A. The number of samples
B. The arithmetic or geometric mean of the entropies of U and V
C. The variance of U and V
D. The maximum distance in the dataset

31 What happens to the Adjusted Rand Index (ARI) if the class labels are permuted (renamed)?

A. The score changes drastically
B. The score becomes negative
C. The score remains the same
D. The score becomes zero
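
For review, the effect of renaming labels on the ARI can be checked with a small contingency-count sketch. The function name `adjusted_rand_index` and the example labelings are assumptions made for illustration, and the code assumes at least two samples.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(true_labels, pred_labels):
    """ARI from contingency counts: (index - expected) / (max_index - expected)."""
    n = len(true_labels)
    # Pairs that fall in the same cell of the contingency table.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    # Pairs that share a true class, and pairs that share a predicted cluster.
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one single cluster
    return (sum_cells - expected) / (max_index - expected)

# Hypothetical labelings; the second prediction only renames the clusters.
true_labels = [0, 0, 1, 1]
pred_labels = [0, 0, 1, 2]
renamed = ["b", "b", "c", "a"]
print(adjusted_rand_index(true_labels, pred_labels))
print(adjusted_rand_index(true_labels, renamed))
```

Because only the counts of co-occurring labels enter the calculation, the label names themselves never matter.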

32 Which of the following is an external metric, and is therefore not directly affected by the 'curse of dimensionality' because it compares label assignments rather than distances?

A. Silhouette Score
B. Dunn Index
C. V-measure
D. Davies-Bouldin Index

33 For a dataset with 'k' ground truth classes, if a clustering algorithm produces 'k' clusters and ARI is 1.0, this means:

A. The clusters are random
B. The clusters perfectly match the ground truth (up to permutation)
C. The clusters are disjoint but incorrect
D. There are outlier points

34 Which metric would be most appropriate to select the optimal number of clusters 'k' in K-Means clustering when true labels are unknown?

A. Adjusted Rand Index
B. Silhouette Score
C. Homogeneity
D. NMI

35 In the calculation of the Davies-Bouldin Index, 'scatter' refers to:

A. The distance between cluster centroids
B. The average distance of points in a cluster to their centroid
C. The total number of points
D. The entropy of the cluster

36 The Rand Index (RI) calculates the percentage of:

A. Correctly classified centroids
B. Pairs of data points on which the two labelings agree (placed together in both, or apart in both)
C. Clusters that have zero entropy
D. Points with positive silhouette scores

37 Does the V-measure prefer a specific number of clusters?

A. No, it is independent of cluster count
B. Yes, it favors a large number of clusters if not adjusted
C. Yes, it favors a single cluster
D. It only works for k=2

38 Which metric is based on the idea that good clusters should be highly similar internally and highly dissimilar externally?

A. Silhouette Score
B. ARI
C. NMI
D. FMI

39 If two different clustering algorithms produce the exact same partition of data, the AMI score between them will be:

A. 0
B. 0.5
C. 1
D. Infinity

40 Which of the following metrics is NOT bounded above by 1 (i.e., can be greater than 1)?

A. Davies-Bouldin Index
B. Silhouette Score
C. V-measure
D. Adjusted Rand Index

41 Homogeneity is analogous to which classification metric when applied to clusters?

A. Recall
B. Precision
C. Accuracy
D. F1 Score

42 Completeness is analogous to which classification metric when applied to clusters?

A. Recall
B. Precision
C. Accuracy
D. Specificity

43 Which metric assumes that the best clustering has the minimum sum of similarities between each cluster and its most similar one?

A. Davies-Bouldin Index
B. Dunn Index
C. Silhouette Score
D. Calinski-Harabasz Index

44 When using the Silhouette Score, a value of 0 implies:

A. The sample is on or very close to the decision boundary between two neighboring clusters
B. The sample is far away from all clusters
C. The clustering is perfect
D. The sample is an outlier

45 Why is 'Adjusted' Mutual Information preferred over 'Normalized' Mutual Information in many comparative studies?

A. It corrects for the bias toward clusterings with many clusters (high k)
B. It is easier to calculate
C. It is always positive
D. It does not use logarithms

46 The Fowlkes-Mallows index is generally higher when:

A. The clustering and ground truth are highly correlated
B. The number of clusters is very large
C. The number of clusters is 1
D. The dataset is very small

47 In the Dunn Index, the 'diameter' of a cluster usually refers to:

A. The maximum distance between any two points in the cluster
B. The average distance to the centroid
C. The radius of the cluster
D. The distance to the nearest neighbor
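
For review, one common formulation of the Dunn Index can be sketched on a toy dataset. This is an illustrative sketch only: the function name `dunn_index` and the sample points are assumptions, separation is taken as the smallest point-to-point distance between two clusters (other variants exist), and the code assumes at least two clusters with at least one cluster containing two distinct points.

```python
from itertools import combinations
from math import dist

def dunn_index(points, labels):
    """Smallest between-cluster separation divided by largest cluster diameter."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    # Diameter: largest distance between two points of the same cluster.
    max_diameter = max(
        max((dist(p, q) for p, q in combinations(members, 2)), default=0.0)
        for members in clusters.values()
    )
    # Separation: smallest distance between points of two different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters.values(), 2)
        for p in a
        for q in b
    )
    return min_separation / max_diameter

# Hypothetical toy data: two compact, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
print(dunn_index(points, labels))
```

Both terms are extremes over pairwise distances, so a single stray point can change the result substantially, and the all-pairs scan grows quadratically with the number of samples.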

48 Which metric is strictly an Information Theoretic measure?

A. Silhouette Score
B. Davies-Bouldin Index
C. Normalized Mutual Information (NMI)
D. Dunn Index

49 If a dataset has highly imbalanced classes, which pair of metrics gives a good view of cluster purity and class coverage?

A. Homogeneity and Completeness
B. Silhouette and DBI
C. Dunn and Variance
D. Mean and Median

50 A negative value for the Adjusted Rand Index (ARI) implies:

A. The clustering is worse than random assignment
B. The clustering is random
C. The clustering is perfect
D. ARI cannot be negative