Unit 3 - Practice Quiz

GEO295 - 50 Questions

1 What is the range of values for the Silhouette Score?

A. 0 to 1
B. 0 to infinity
C. -1 to 1
D. -infinity to infinity

2 Which of the following metrics does NOT require ground truth labels (true class labels) to evaluate clustering performance?

A. Silhouette Score
B. Normalized Mutual Information
C. Adjusted Rand Index
D. Fowlkes-Mallows Index

3 In the context of the Davies-Bouldin Index, a lower value indicates:

A. Worse clustering
B. Overfitting
C. Better clustering
D. High computational cost

4 The Dunn Index is defined as the ratio of:

A. Mean intra-cluster distance to mean inter-cluster distance
B. Maximum inter-cluster distance to minimum intra-cluster distance
C. Minimum inter-cluster distance to maximum intra-cluster distance
D. Variance of clusters to bias of clusters

5 Which clustering metric corrects the Rand Index for chance?

A. Completeness Score
B. Fowlkes-Mallows Index
C. Adjusted Rand Index (ARI)
D. Silhouette Score

6 What does a Homogeneity score of 1.0 imply?

A. The clusters overlap significantly
B. The number of clusters equals the number of samples
C. All clusters contain data points from only a single class
D. All data points of a specific class are assigned to the same cluster

7 If all data points belonging to a given class are elements of the same cluster, which metric is maximized?

A. Homogeneity
B. Completeness
C. Silhouette Score
D. Dunn Index

8 The V-measure is the harmonic mean of which two metrics?

A. Precision and Recall
B. Silhouette and Dunn Index
C. ARI and NMI
D. Homogeneity and Completeness

9 Which metric is calculated as the geometric mean of pairwise precision and pairwise recall?

A. V-measure
B. Davies-Bouldin Index
C. Adjusted Mutual Information
D. Fowlkes-Mallows Index

10 Normalized Mutual Information (NMI) is often preferred over Mutual Information (MI) because:

A. NMI does not require ground truth
B. NMI scales the result between 0 and 1, making it comparable across datasets
C. MI yields negative values
D. MI is computationally too expensive

11 What is the primary advantage of Adjusted Mutual Information (AMI) over Normalized Mutual Information (NMI)?

A. AMI works without ground truth
B. AMI can handle negative values
C. AMI accounts for chance (randomness) in cluster assignment
D. AMI is faster to compute

12 In the Silhouette Score formula s = (b - a) / max(a, b), what does 'a' represent?

A. The maximum diameter of the cluster
B. The mean intra-cluster distance (average distance to other points in the same cluster)
C. The distance to the nearest cluster centroid
D. The mean nearest-cluster distance
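
Note: the formula s = (b - a) / max(a, b) can be tried out on a toy dataset. The sketch below is a minimal illustration using only the Python standard library; the function name `silhouette_sample`, the four sample points, and the choice to return 0.0 for degenerate cases are assumptions made for this example, not part of the quiz material.

```python
from math import dist

def silhouette_sample(i, points, labels):
    """Silhouette value s = (b - a) / max(a, b) for the point at index i."""
    own = labels[i]
    # Collect distances from point i to every other point, grouped by cluster.
    by_cluster = {}
    for j, (p, lab) in enumerate(zip(points, labels)):
        if j != i:
            by_cluster.setdefault(lab, []).append(dist(points[i], p))
    same = by_cluster.pop(own, [])
    if not same or not by_cluster:
        # Assumed convention: singleton cluster or only one cluster gives 0.
        return 0.0
    a = sum(same) / len(same)  # mean distance within the point's own cluster
    b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
    return (b - a) / max(a, b)

# Hypothetical data: two tight pairs of points that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good_scores = [silhouette_sample(i, points, [0, 0, 1, 1]) for i in range(4)]
bad_scores = [silhouette_sample(i, points, [0, 1, 0, 1]) for i in range(4)]
print(good_scores)
print(bad_scores)
```

With the first labeling each point sits with its close neighbor, so the values are positive; the second labeling pairs each point with a distant one, so the values turn negative.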

13 According to the Silhouette Score, which of the following values indicates that samples have been assigned to the wrong clusters?

A. Values near -1
B. Values exactly 0.5
C. Values near +1
D. Values near 0

14 The Adjusted Rand Index (ARI) yields a score of approximately 0 when:

A. The number of clusters is equal to the number of samples
B. The clustering is perfect
C. The clustering is independent/random compared to the ground truth
D. The clustering is identical to the ground truth

15 Which metric is most sensitive to noise and outliers because it uses maximum diameters and minimum separations?

A. Silhouette Score
B. V-measure
C. Adjusted Rand Index
D. Dunn Index

16 If a clustering algorithm produces 100 clusters for a dataset of 100 samples (each sample is its own cluster), which metric will trivially reach its maximum of 1.0, potentially giving a misleading impression of quality?

A. Completeness
B. Homogeneity
C. Dunn Index
D. Silhouette Score

17 Conversely, if all samples are assigned to a single cluster, which metric will trivially reach its maximum of 1.0?

A. Homogeneity
B. Davies-Bouldin Index
C. Silhouette Score
D. Completeness

18 What is the beta parameter used for in the V-measure calculation?

A. To weight the importance of Homogeneity versus Completeness
B. To adjust for chance
C. To normalize the dataset
D. To define the number of clusters
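
Note: a small sketch of how a beta-weighted V-measure could be computed from label entropies. It assumes the common weighted form v = (1 + beta) * h * c / (beta * h + c), where h is homogeneity and c is completeness; the function names and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def conditional_entropy(target, given):
    """H(target | given), computed from joint label counts."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum(c / n * log(c / given_counts[g]) for (_, g), c in joint.items())

def v_measure(classes, clusters, beta=1.0):
    """Return (homogeneity, completeness, v) for the assumed weighted form."""
    h_classes, h_clusters = entropy(classes), entropy(clusters)
    hom = 1.0 if h_classes == 0 else 1.0 - conditional_entropy(classes, clusters) / h_classes
    com = 1.0 if h_clusters == 0 else 1.0 - conditional_entropy(clusters, classes) / h_clusters
    if hom + com == 0:
        return hom, com, 0.0
    v = (1 + beta) * hom * com / (beta * hom + com)
    return hom, com, v

# Hypothetical labels: three classes, but two of them share one cluster.
classes = [0, 0, 1, 1, 2, 2]
clusters = [0, 0, 1, 1, 1, 1]
hom, com, v1 = v_measure(classes, clusters, beta=1.0)
_, _, v2 = v_measure(classes, clusters, beta=2.0)
print(hom, com, v1, v2)
```

In this toy case the two scores differ, so changing beta shifts the combined value toward whichever score receives more weight.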

19 When calculating the Fowlkes-Mallows Index, 'TP' (True Positive) refers to:

A. Pairs of points that are in the same cluster in both the true labels and predicted labels
B. Points correctly classified as noise
C. Pairs of points that are in different clusters in both labels
D. Clusters that are perfectly pure
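
Note: the pair-counting view behind the Fowlkes-Mallows Index can be sketched as follows. Every pair of samples is compared under both labelings, and the index is taken as TP / sqrt((TP + FP) * (TP + FN)). The function names and the example labels are assumptions for illustration only.

```python
from itertools import combinations
from math import sqrt

def pair_counts(true_labels, pred_labels):
    """Count sample pairs by whether each labeling puts them together."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1  # together in both labelings
        elif same_pred:
            fp += 1  # together only in the predicted labeling
        elif same_true:
            fn += 1  # together only in the true labeling
        else:
            tn += 1  # apart in both labelings
    return tp, fp, fn, tn

def fowlkes_mallows(true_labels, pred_labels):
    """Geometric mean of pairwise precision and pairwise recall."""
    tp, fp, fn, _ = pair_counts(true_labels, pred_labels)
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))

# Hypothetical labels for six samples.
true_labels = [0, 0, 0, 1, 1, 1]
pred_labels = [0, 0, 1, 1, 2, 2]
counts = pair_counts(true_labels, pred_labels)
fmi = fowlkes_mallows(true_labels, pred_labels)
print(counts, fmi)
```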

20 Which of the following is an 'Internal' clustering validity index?

A. Davies-Bouldin Index
B. Normalized Mutual Information
C. Adjusted Rand Index
D. V-measure

21 What is a major limitation of the Silhouette Score when dealing with non-convex clusters (e.g., ring shapes)?

A. It requires ground truth
B. It cannot handle negative values
C. It is computationally cheap
D. It tends to give lower scores to density-based clusters that are not spherical

22 Which component of the Silhouette Score represents the 'separation' of the cluster?

A. a (intra-cluster distance)
B. b - a
C. max(a, b)
D. b (nearest-cluster distance)

23 In the context of NMI, Entropy is used to measure:

A. The geometric shape of the cluster
B. The uncertainty associated with the class or cluster distribution
C. The distance between centroids
D. The number of outliers

24 Which metric is symmetric (i.e., Metric(A, B) = Metric(B, A))?

A. Homogeneity
B. Silhouette Score
C. Adjusted Rand Index
D. Completeness

25 A Davies-Bouldin Index of 0 indicates:

A. Random clustering
B. Infinite variance
C. Ideally separated and compact clusters
D. The worst possible clustering

26 Which of the following is required to calculate the Adjusted Mutual Information (AMI)?

A. Ground truth labels and predicted labels
B. Euclidean distance matrix
C. Only the predicted labels
D. Centroids of the clusters

27 The Fowlkes-Mallows Index is bounded between:

A. 0 and infinity
B. -1 and 1
C. -infinity to +infinity
D. 0 and 1

28 Why might the Dunn Index be computationally expensive for large datasets?

A. It involves complex integrals
B. It requires calculating eigenvalues
C. It requires calculating pairwise distances between all points to find min/max distances
D. It requires ground truth labels

29 When interpreting Homogeneity (H) and Completeness (C), if H is high and C is low, what does this usually suggest?

A. The algorithm over-segmented the classes (many small clusters for one class)
B. The clustering is perfect
C. The algorithm merged distinct classes into one cluster
D. The data is random noise

30 In the formula for NMI, the Mutual Information I(U, V) is normalized by:

A. The variance of U and V
B. The maximum distance in the dataset
C. The arithmetic or geometric mean of the entropies of U and V
D. The number of samples
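
Note: a minimal sketch of Normalized Mutual Information with two possible normalizers, assuming natural logarithms. The function names, the `average` parameter, and the handling of zero-entropy labelings are choices made for this example rather than a reference implementation.

```python
from collections import Counter
from math import log, sqrt

def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def mutual_information(u, v):
    """I(U, V) computed from the joint and marginal label counts."""
    n = len(u)
    count_u, count_v = Counter(u), Counter(v)
    joint = Counter(zip(u, v))
    return sum(c / n * log(c * n / (count_u[a] * count_v[b]))
               for (a, b), c in joint.items())

def nmi(u, v, average="arithmetic"):
    """Mutual information divided by a mean of the two entropies."""
    h_u, h_v = entropy(u), entropy(v)
    norm = (h_u + h_v) / 2 if average == "arithmetic" else sqrt(h_u * h_v)
    if norm == 0:
        # Assumed convention: two trivial labelings count as identical.
        return 1.0 if h_u == 0 and h_v == 0 else 0.0
    return mutual_information(u, v) / norm

# Hypothetical labelings of six samples.
u = [0, 0, 0, 1, 1, 1]
v = [0, 0, 1, 1, 2, 2]
nmi_arith = nmi(u, v, "arithmetic")
nmi_geom = nmi(u, v, "geometric")
print(nmi_arith, nmi_geom)
```

Because the geometric mean never exceeds the arithmetic mean, the geometric variant is never smaller than the arithmetic one for the same pair of labelings.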

31 What happens to the Adjusted Rand Index (ARI) if the class labels are permuted (renamed)?

A. The score changes drastically
B. The score remains the same
C. The score becomes negative
D. The score becomes zero

32 Which of the following is an external metric whose calculation does not rely on distances and is therefore not directly affected by the 'curse of dimensionality' (even though the clustering that produced the labels might be)?

A. V-measure
B. Silhouette Score
C. Dunn Index
D. Davies-Bouldin Index

33 For a dataset with 'k' ground truth classes, if a clustering algorithm produces 'k' clusters and ARI is 1.0, this means:

A. The clusters are random
B. The clusters are disjoint but incorrect
C. The clusters perfectly match the ground truth (up to permutation)
D. There are outlier points

34 Which metric would be most appropriate to select the optimal number of clusters 'k' in K-Means clustering when true labels are unknown?

A. Homogeneity
B. Silhouette Score
C. Adjusted Rand Index
D. NMI

35 In the calculation of the Davies-Bouldin Index, 'scatter' refers to:

A. The average distance of points in a cluster to their centroid
B. The distance between cluster centroids
C. The entropy of the cluster
D. The total number of points
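
Note: one common formulation of the Davies-Bouldin Index can be sketched with the standard library alone. For each cluster it takes the largest ratio (scatter_i + scatter_j) / centroid distance over the other clusters, then averages those ratios. The function name and sample points are hypothetical, and the sketch assumes at least two clusters with distinct centroids.

```python
from math import dist

def davies_bouldin(points, labels):
    """Average, over clusters, of the worst (largest) similarity ratio."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    # Centroid: coordinate-wise mean of the points in each cluster.
    centroids = {lab: tuple(sum(coord) / len(pts) for coord in zip(*pts))
                 for lab, pts in clusters.items()}
    # Scatter: mean distance from each point to its cluster centroid.
    scatter = {lab: sum(dist(p, centroids[lab]) for p in pts) / len(pts)
               for lab, pts in clusters.items()}
    worst = [
        max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters if j != i)
        for i in clusters
    ]
    return sum(worst) / len(worst)

# Hypothetical data: two tight pairs of points that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
dbi_good = davies_bouldin(points, [0, 0, 1, 1])
dbi_bad = davies_bouldin(points, [0, 1, 0, 1])
print(dbi_good, dbi_bad)
```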

36 The Rand Index (RI) calculates the percentage of:

A. Clusters that have zero entropy
B. Pairwise decisions on which the two labelings agree (the pair is placed together in both, or apart in both)
C. Points with positive silhouette scores
D. Correctly classified centroids
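
Note: the Rand Index and its chance-corrected variant can both be derived from pair counts. The sketch below uses the usual contingency-table form of the Adjusted Rand Index; the function name, the example labels, and the value returned in the degenerate case are assumptions for illustration. It expects at least two samples.

```python
from collections import Counter
from math import comb

def rand_and_adjusted_rand(true_labels, pred_labels):
    """Return (Rand Index, Adjusted Rand Index) from pair counts."""
    n_pairs = comb(len(true_labels), 2)
    # Pairs placed together by both labelings, by the true one, by the predicted one.
    together_both = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    together_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    together_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    # Pairs kept apart by both labelings (inclusion-exclusion).
    apart_both = n_pairs - together_true - together_pred + together_both
    rand = (together_both + apart_both) / n_pairs
    expected = together_true * together_pred / n_pairs  # agreement expected by chance
    max_index = (together_true + together_pred) / 2
    if max_index == expected:
        ari = 1.0  # assumed convention for degenerate labelings
    else:
        ari = (together_both - expected) / (max_index - expected)
    return rand, ari

# Hypothetical labels; `renamed` is `pred_labels` with its cluster names permuted.
true_labels = [0, 0, 0, 1, 1, 1]
pred_labels = [0, 0, 1, 1, 2, 2]
renamed = [2, 2, 0, 0, 1, 1]
ri, ari = rand_and_adjusted_rand(true_labels, pred_labels)
ri_renamed, ari_renamed = rand_and_adjusted_rand(true_labels, renamed)
print(ri, ari, ri_renamed, ari_renamed)
```

Since only co-membership of pairs enters the counts, renaming the cluster labels leaves both values unchanged.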

37 Does the V-measure prefer a specific number of clusters?

A. It only works for k=2
B. Yes, it favors a large number of clusters if not adjusted
C. No, it is independent of cluster count
D. Yes, it favors a single cluster

38 Which metric is based on the idea that good clusters should be highly similar internally and highly dissimilar externally?

A. ARI
B. FMI
C. Silhouette Score
D. NMI

39 If two different clustering algorithms produce the exact same partition of data, the AMI score between them will be:

A. 1
B. 0.5
C. Infinity
D. 0

40 Which of the following metrics is NOT bounded above by 1 (i.e., can take values greater than 1)?

A. Adjusted Rand Index
B. Silhouette Score
C. Davies-Bouldin Index
D. V-measure

41 Homogeneity is most analogous to which classification metric when applied to clusters?

A. Precision
B. F1 Score
C. Recall
D. Accuracy

42 Completeness is most analogous to which classification metric when applied to clusters?

A. Specificity
B. Accuracy
C. Recall
D. Precision

43 Which metric assumes that the best clustering minimizes the average similarity between each cluster and its most similar one?

A. Calinski-Harabasz Index
B. Silhouette Score
C. Davies-Bouldin Index
D. Dunn Index

44 When using the Silhouette Score, a value of 0 implies:

A. The sample is an outlier
B. The sample is on or very close to the decision boundary between two neighboring clusters
C. The sample is far away from all clusters
D. The clustering is perfect

45 Why is 'Adjusted' Mutual Information preferred over 'Normalized' Mutual Information in many comparative studies?

A. It does not use logarithms
B. It is always positive
C. It is easier to calculate
D. It corrects for the bias toward partitions with many clusters (high k)

46 The Fowlkes-Mallows index is generally higher when:

A. The dataset is very small
B. The number of clusters is 1
C. The clustering and ground truth are highly correlated
D. The number of clusters is very large

47 In the Dunn Index, the 'diameter' of a cluster usually refers to:

A. The maximum distance between any two points in the cluster
B. The radius of the cluster
C. The distance to the nearest neighbor
D. The average distance to the centroid
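
Note: a brute-force sketch of the Dunn Index as minimum inter-cluster separation divided by maximum cluster diameter, with both taken over pairwise point distances. The function name and the sample points are hypothetical; the sketch assumes at least two clusters and at least one cluster containing two distinct points.

```python
from itertools import combinations
from math import dist

def dunn_index(points, labels):
    """Minimum inter-cluster distance divided by maximum cluster diameter."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    # Diameter: largest distance between two points of the same cluster.
    max_diameter = max(
        (dist(p, q) for pts in clusters.values() for p, q in combinations(pts, 2)),
        default=0.0,
    )
    # Separation: smallest distance between points of two different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(list(clusters.values()), 2)
        for p in a
        for q in b
    )
    return min_separation / max_diameter

# Hypothetical data: two tight pairs, then the same data plus one stray point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
dunn_clean = dunn_index(points, [0, 0, 1, 1])
dunn_outlier = dunn_index(points + [(2.5, 0.5)], [0, 0, 1, 1, 0])
print(dunn_clean, dunn_outlier)
```

A single stray point both widens one cluster's diameter and shrinks the gap between clusters, so the value drops sharply. The nested loops over all point pairs also show why the cost grows quadratically with the number of samples.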

48 Which metric is strictly an Information Theoretic measure?

A. Silhouette Score
B. Davies-Bouldin Index
C. Normalized Mutual Information (NMI)
D. Dunn Index

49 If a dataset has highly imbalanced classes, which pair of metrics gives a good view of cluster purity and class coverage?

A. Homogeneity and Completeness
B. Dunn and Variance
C. Mean and Median
D. Silhouette and DBI

50 A negative value for the Adjusted Rand Index (ARI) implies:

A. The clustering is worse than random assignment
B. ARI cannot be negative
C. The clustering is random
D. The clustering is perfect