Unit 3 - Practice Quiz

GEO295 50 Questions

1 What is the range of values for the Silhouette Score?

A. 0 to infinity
B. 0 to 1
C. -infinity to infinity
D. -1 to 1

2 Which of the following metrics does NOT require ground truth labels (true class labels) to evaluate clustering performance?

A. Normalized Mutual Information
B. Adjusted Rand Index
C. Silhouette Score
D. Fowlkes-Mallows Index

3 In the context of the Davies-Bouldin Index, a lower value indicates:

A. High computational cost
B. Better clustering
C. Overfitting
D. Worse clustering

4 The Dunn Index is defined as the ratio of:

A. Maximum inter-cluster distance to minimum intra-cluster distance
B. Minimum inter-cluster distance to maximum intra-cluster distance
C. Mean intra-cluster distance to mean inter-cluster distance
D. Variance of clusters to bias of clusters

5 Which clustering metric corrects the Rand Index for chance?

A. Silhouette Score
B. Fowlkes-Mallows Index
C. Adjusted Rand Index (ARI)
D. Completeness Score

6 What does a Homogeneity score of 1.0 imply?

A. The clusters overlap significantly
B. The number of clusters equals the number of samples
C. All clusters contain data points from only a single class
D. All data points of a specific class are assigned to the same cluster

7 If all data points belonging to a given class are elements of the same cluster, which metric is maximized?

A. Completeness
B. Homogeneity
C. Silhouette Score
D. Dunn Index

8 The V-measure is the harmonic mean of which two metrics?

A. ARI and NMI
B. Homogeneity and Completeness
C. Precision and Recall
D. Silhouette and Dunn Index

9 Which metric is calculated as the geometric mean of pairwise precision and pairwise recall?

A. Davies-Bouldin Index
B. Fowlkes-Mallows Index
C. Adjusted Mutual Information
D. V-measure

10 Normalized Mutual Information (NMI) is often preferred over Mutual Information (MI) because:

A. NMI does not require ground truth
B. MI yields negative values
C. MI is computationally too expensive
D. NMI scales the result between 0 and 1, making it comparable across datasets

11 What is the primary advantage of Adjusted Mutual Information (AMI) over Normalized Mutual Information (NMI)?

A. AMI accounts for chance (randomness) in cluster assignment
B. AMI is faster to compute
C. AMI works without ground truth
D. AMI can handle negative values

12 In the Silhouette Score formula s = (b - a) / max(a, b), what does 'a' represent?

A. The maximum diameter of the cluster
B. The mean intra-cluster distance (average distance to other points in the same cluster)
C. The mean nearest-cluster distance
D. The distance to the nearest cluster centroid
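
A minimal sketch of the per-sample formula s = (b - a) / max(a, b) quoted in this question, assuming 1-D points and absolute difference as the distance. The function name and the toy data are illustrative assumptions, not part of the quiz.

```python
def silhouette_sample(point, own_cluster, other_clusters):
    """Silhouette value s = (b - a) / max(a, b) for one 1-D point.

    own_cluster: the other members of the point's cluster (point excluded).
    other_clusters: list of clusters (lists of points) the point is not in.
    """
    # a: mean distance to the other points in the same cluster
    a = sum(abs(point - q) for q in own_cluster) / len(own_cluster)
    # b: smallest mean distance to the points of any other cluster
    b = min(sum(abs(point - q) for q in c) / len(c) for c in other_clusters)
    return (b - a) / max(a, b)


# A point sitting inside its own cluster, far from the other one.
s_good = silhouette_sample(1.0, [0.0, 2.0], [[10.0, 11.0, 12.0]])  # 0.9

# The point 10.0 labelled with the cluster around 1.0, although it lies
# among the points of the other cluster.
s_bad = silhouette_sample(10.0, [0.0, 2.0], [[9.0, 11.0, 12.0]])  # about -0.85
```

The dataset-level Silhouette Score is the mean of these per-sample values.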

13 According to the Silhouette Score, which values indicate that samples have been assigned to the wrong clusters?

A. Values exactly 0.5
B. Values near 0
C. Values near -1
D. Values near +1

14 The Adjusted Rand Index (ARI) yields a score of approximately 0 when:

A. The clustering is perfect
B. The clustering is identical to the ground truth
C. The clustering is independent/random compared to the ground truth
D. The number of clusters is equal to the number of samples

15 Which metric is most sensitive to noise and outliers because it uses maximum diameters and minimum separations?

A. V-measure
B. Dunn Index
C. Silhouette Score
D. Adjusted Rand Index

16 If a clustering algorithm produces 100 clusters for a dataset of 100 samples (each sample is its own cluster), which metric will naturally maximize to 1.0, potentially giving a misleading impression of quality?

A. Homogeneity
B. Silhouette Score
C. Dunn Index
D. Completeness

17 Conversely, if all samples are assigned to a single cluster, which metric will maximize to 1.0?

A. Davies-Bouldin Index
B. Homogeneity
C. Silhouette Score
D. Completeness

18 What is the beta parameter used for in the V-measure calculation?

A. To define the number of clusters
B. To weight the importance of Homogeneity versus Completeness
C. To adjust for chance
D. To normalize the dataset
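
A rough sketch of how Homogeneity, Completeness and the beta-weighted V-measure can be computed from entropies. It assumes the weighted form v = (1 + beta) * h * c / (beta * h + c), which is the form used by scikit-learn's v_measure_score. The helper names are illustrative.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy of a labelling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(labels, given):
    """H(labels | given)."""
    n = len(labels)
    joint = Counter(zip(given, labels))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def homogeneity_completeness_v(classes, clusters, beta=1.0):
    h_c, h_k = entropy(classes), entropy(clusters)
    # Homogeneity: how little class uncertainty remains inside each cluster.
    hom = 1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c
    # Completeness: how little cluster uncertainty remains inside each class.
    com = 1.0 if h_k == 0 else 1.0 - conditional_entropy(clusters, classes) / h_k
    if hom + com == 0:
        return hom, com, 0.0
    v = (1 + beta) * hom * com / (beta * hom + com)
    return hom, com, v


classes = [0, 0, 1, 1]
# Every sample in its own cluster, then every sample in one cluster.
singletons = homogeneity_completeness_v(classes, [0, 1, 2, 3])
one_cluster = homogeneity_completeness_v(classes, [0, 0, 0, 0])
```

In this form, beta = 1 gives the plain harmonic mean; beta greater than 1 weights Completeness more strongly, and beta less than 1 weights Homogeneity more strongly.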

19 When calculating the Fowlkes-Mallows Index, 'TP' (True Positive) refers to:

A. Clusters that are perfectly pure
B. Points correctly classified as noise
C. Pairs of points that are in different clusters in both labels
D. Pairs of points that are in the same cluster in both the true labels and predicted labels
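
The pair-counting terms behind the Fowlkes-Mallows Index (and the Rand Index) can be sketched as below. This assumes the usual convention that a "positive" pair is one placed in the same cluster; the function names and toy labels are illustrative.

```python
from itertools import combinations
from math import sqrt


def pair_counts(true_labels, pred_labels):
    """Count sample pairs by agreement between two labellings."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1  # together in both labellings
        elif same_pred:
            fp += 1  # together only in the prediction
        elif same_true:
            fn += 1  # together only in the ground truth
        else:
            tn += 1  # apart in both labellings
    return tp, fp, fn, tn


def fowlkes_mallows(true_labels, pred_labels):
    tp, fp, fn, _ = pair_counts(true_labels, pred_labels)
    if tp == 0:
        return 0.0
    # Geometric mean of pairwise precision tp/(tp+fp) and recall tp/(tp+fn).
    return tp / sqrt((tp + fp) * (tp + fn))


def rand_index(true_labels, pred_labels):
    tp, fp, fn, tn = pair_counts(true_labels, pred_labels)
    return (tp + tn) / (tp + fp + fn + tn)


truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 2, 2]
counts = pair_counts(truth, pred)   # (2, 1, 4, 8)
fmi = fowlkes_mallows(truth, pred)  # 2 / sqrt(18), about 0.47
ri = rand_index(truth, pred)        # 10 / 15
```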

20 Which of the following is an 'Internal' clustering validity index?

A. Davies-Bouldin Index
B. Normalized Mutual Information
C. V-measure
D. Adjusted Rand Index

21 What is a major limitation of the Silhouette Score when dealing with non-convex clusters (e.g., ring shapes)?

A. It tends to give lower scores to density-based clusters that are not spherical
B. It cannot handle negative values
C. It is computationally cheap
D. It requires ground truth

22 Which component of the Silhouette Score represents the 'separation' of the cluster?

A. max(a, b)
B. a (intra-cluster distance)
C. b - a
D. b (nearest-cluster distance)

23 In the context of NMI, Entropy is used to measure:

A. The distance between centroids
B. The uncertainty associated with the class or cluster distribution
C. The geometric shape of the cluster
D. The number of outliers

24 Which metric is symmetric (i.e., Metric(A, B) = Metric(B, A))?

A. Adjusted Rand Index
B. Homogeneity
C. Silhouette Score
D. Completeness

25 A Davies-Bouldin Index of 0 indicates:

A. Ideally separated and compact clusters
B. Random clustering
C. The worst possible clustering
D. Infinite variance

26 Which of the following is required to calculate the Adjusted Mutual Information (AMI)?

A. Ground truth labels and predicted labels
B. Euclidean distance matrix
C. Only the predicted labels
D. Centroids of the clusters

27 The Fowlkes-Mallows Index is bounded between:

A. -1 and 1
B. 0 and infinity
C. 0 and 1
D. -infinity to +infinity

28 Why might the Dunn Index be computationally expensive for large datasets?

A. It requires calculating pairwise distances between all points to find min/max distances
B. It involves complex integrals
C. It requires calculating eigenvalues
D. It requires ground truth labels

29 When interpreting Homogeneity (H) and Completeness (C), if H is high and C is low, what does this usually suggest?

A. The data is random noise
B. The algorithm merged distinct classes into one cluster
C. The algorithm over-segmented the classes (many small clusters for one class)
D. The clustering is perfect

30 In the formula for NMI, the Mutual Information I(U, V) is normalized by:

A. The arithmetic or geometric mean of the entropies of U and V
B. The variance of U and V
C. The number of samples
D. The maximum distance in the dataset
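
A small sketch of NMI with two common normalizers, assuming natural logarithms and labelings U and V that each contain more than one group. The `average` parameter name and the handling of the degenerate zero-entropy case are assumptions made for illustration; conventions differ between libraries.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    n = len(u)
    joint = Counter(zip(u, v))
    count_u, count_v = Counter(u), Counter(v)
    return sum(
        (c / n) * log(c * n / (count_u[a] * count_v[b]))
        for (a, b), c in joint.items()
    )


def nmi(u, v, average="arithmetic"):
    h_u, h_v = entropy(u), entropy(v)
    if average == "arithmetic":
        norm = (h_u + h_v) / 2
    elif average == "geometric":
        norm = sqrt(h_u * h_v)
    else:
        raise ValueError("average must be 'arithmetic' or 'geometric'")
    if norm == 0:
        return 0.0  # assumption: degenerate single-group labelling scores 0
    return mutual_information(u, v) / norm


u = [0, 0, 1, 1]
nmi_renamed = nmi(u, [1, 1, 0, 0])                 # same partition, labels renamed
nmi_independent = nmi(u, [0, 1, 0, 1])             # partitions share no information
nmi_arith = nmi(u, [0, 1, 2, 3])                   # about 0.667
nmi_geom = nmi(u, [0, 1, 2, 3], "geometric")       # about 0.707
```

The last two values show that the choice of normalizer changes the score, so the averaging method should be reported alongside any NMI value.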

31 What happens to the Adjusted Rand Index (ARI) if the class labels are permuted (renamed)?

A. The score becomes negative
B. The score remains the same
C. The score becomes zero
D. The score changes drastically

32 Which metric, being an external measure computed from label assignments rather than from distances, is least affected by the 'curse of dimensionality'?

A. V-measure
B. Silhouette Score
C. Dunn Index
D. Davies-Bouldin Index

33 For a dataset with 'k' ground truth classes, if a clustering algorithm produces 'k' clusters and ARI is 1.0, this means:

A. The clusters are random
B. The clusters are disjoint but incorrect
C. The clusters perfectly match the ground truth (up to permutation)
D. There are outlier points

34 Which metric would be most appropriate to select the optimal number of clusters 'k' in K-Means clustering when true labels are unknown?

A. NMI
B. Homogeneity
C. Silhouette Score
D. Adjusted Rand Index

35 In the calculation of the Davies-Bouldin Index, 'scatter' refers to:

A. The entropy of the cluster
B. The average distance of points in a cluster to their centroid
C. The distance between cluster centroids
D. The total number of points
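
A hedged sketch of the Davies-Bouldin Index, assuming Euclidean distance, scatter measured as the mean distance of a cluster's points to its centroid, and at least two clusters with distinct centroids. Function names and the toy clusters are illustrative.

```python
from math import dist


def centroid(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def davies_bouldin(clusters):
    centroids = [centroid(c) for c in clusters]
    # Scatter: mean distance of each cluster's points to its own centroid.
    scatters = [
        sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, centroids)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Similarity of cluster i to its most similar other cluster.
        total += max(
            (scatters[i] + scatters[j]) / dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
    return total / k


far_apart = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
close_together = [[(0, 0), (0, 2)], [(3, 0), (3, 2)]]
db_far = davies_bouldin(far_apart)         # (1 + 1) / 10 = 0.2
db_close = davies_bouldin(close_together)  # (1 + 1) / 3
```

Moving the same two clusters closer together raises the index, since the scatter stays fixed while the centroid distance shrinks.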

36 The Rand Index (RI) calculates the percentage of:

A. Points with positive silhouette scores
B. Clusters that have zero entropy
C. Pairs of data points on which the two labelings agree (placed together in both, or apart in both)
D. Correctly classified centroids

37 Does the V-measure prefer a specific number of clusters?

A. Yes, it favors a large number of clusters if not adjusted
B. No, it is independent of cluster count
C. Yes, it favors a single cluster
D. It only works for k=2

38 Which metric is based on the idea that good clusters should be highly similar internally and highly dissimilar externally?

A. Silhouette Score
B. ARI
C. FMI
D. NMI

39 If two different clustering algorithms produce the exact same partition of data, the AMI score between them will be:

A. 0
B. Infinity
C. 0.5
D. 1

40 Which of the following metrics is NOT bounded above by 1 (i.e., can take values greater than 1)?

A. Silhouette Score
B. V-measure
C. Adjusted Rand Index
D. Davies-Bouldin Index

41 Homogeneity is analogous to which classification metric when applied to clusters?

A. Recall
B. F1 Score
C. Precision
D. Accuracy

42 Completeness is analogous to which classification metric when applied to clusters?

A. Accuracy
B. Specificity
C. Precision
D. Recall

43 Which metric assumes that the best clustering minimizes the average similarity between each cluster and its most similar one?

A. Dunn Index
B. Davies-Bouldin Index
C. Calinski-Harabasz Index
D. Silhouette Score

44 When using the Silhouette Score, a value of 0 implies:

A. The sample is on or very close to the decision boundary between two neighboring clusters
B. The clustering is perfect
C. The sample is an outlier
D. The sample is far away from all clusters

45 Why is 'Adjusted' Mutual Information preferred over 'Normalized' Mutual Information in many comparative studies?

A. It is always positive
B. It corrects for the bias toward partitions with many clusters (high k)
C. It is easier to calculate
D. It does not use logarithms

46 The Fowlkes-Mallows index is generally higher when:

A. The dataset is very small
B. The number of clusters is very large
C. The clustering and ground truth are highly correlated
D. The number of clusters is 1

47 In the Dunn Index, the 'diameter' of a cluster usually refers to:

A. The maximum distance between any two points in the cluster
B. The radius of the cluster
C. The average distance to the centroid
D. The distance to the nearest neighbor
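
One common variant of the Dunn Index can be sketched as follows, assuming Euclidean distance, diameter taken as the largest pairwise distance within a cluster, and separation taken as the smallest distance between points of different clusters. Other variants exist; the function name and data are illustrative, and at least one cluster must contain two distinct points.

```python
from itertools import combinations
from math import dist


def dunn_index(clusters):
    # Largest within-cluster pairwise distance over all clusters.
    max_diameter = max(
        max((dist(p, q) for p, q in combinations(c, 2)), default=0.0)
        for c in clusters
    )
    # Smallest distance between two points that lie in different clusters.
    min_separation = min(
        dist(p, q)
        for c1, c2 in combinations(clusters, 2)
        for p in c1
        for q in c2
    )
    return min_separation / max_diameter


clean = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
# The same clusters with one stray point added to the first cluster.
noisy = [[(0, 0), (0, 1), (2.5, 0)], [(5, 0), (5, 1)]]
dunn_clean = dunn_index(clean)  # 5 / 1 = 5.0
dunn_noisy = dunn_index(noisy)  # about 0.93
```

A single stray point both inflates the maximum diameter and shrinks the minimum separation, and every pairwise distance has to be examined to find those extremes.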

48 Which metric is strictly an Information Theoretic measure?

A. Dunn Index
B. Silhouette Score
C. Normalized Mutual Information (NMI)
D. Davies-Bouldin Index

49 If a dataset has highly imbalanced classes, which pair of metrics gives a good view of cluster purity and class coverage?

A. Silhouette and DBI
B. Homogeneity and Completeness
C. Dunn and Variance
D. Mean and Median

50 A negative value for the Adjusted Rand Index (ARI) implies:

A. The clustering is worse than random assignment
B. The clustering is perfect
C. The clustering is random
D. ARI cannot be negative
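
A compact sketch of the Adjusted Rand Index from the contingency table, assuming the standard pair-counting form (index - expected index) / (max index - expected index). The function name and the treatment of the degenerate case are illustrative assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    n = len(true_labels)
    contingency = Counter(zip(true_labels, pred_labels))
    # Pairs placed together in both labellings.
    sum_cells = sum(comb(c, 2) for c in contingency.values())
    # Pairs placed together in each labelling separately.
    sum_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # assumption: degenerate labellings treated as identical
    return (sum_cells - expected) / (max_index - expected)


truth = [0, 0, 1, 1]
ari_renamed = adjusted_rand_index(truth, [1, 1, 0, 0])   # 1.0
ari_opposed = adjusted_rand_index(truth, [0, 1, 0, 1])   # -0.5
```

The first call uses the same partition with the label names swapped; the second splits every true class across both clusters, which yields less agreement than chance would.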