1. What is the range of values for the Silhouette Score?
A. 0 to 1
B. -1 to 1
C. 0 to infinity
D. -infinity to infinity
Correct Answer: -1 to 1
Explanation: The Silhouette Score ranges from -1 to 1, where values near 1 indicate well-separated clusters, values near 0 indicate overlapping clusters, and values near -1 indicate samples assigned to the wrong cluster.
2. Which of the following metrics does NOT require ground truth labels (true class labels) to evaluate clustering performance?
A. Adjusted Rand Index
B. Normalized Mutual Information
C. Silhouette Score
D. Fowlkes-Mallows Index
Correct Answer: Silhouette Score
Explanation: The Silhouette Score is an internal evaluation metric. It is computed from the distances between a point and the other points in its own cluster and in the nearest neighboring cluster, so it needs no ground truth.
3. In the context of the Davies-Bouldin Index, a lower value indicates:
A. Better clustering
B. Worse clustering
C. High computational cost
D. Overfitting
Correct Answer: Better clustering
Explanation: The Davies-Bouldin Index measures the average similarity between clusters. A lower score means clusters are compact and well-separated, indicating better clustering.
4. The Dunn Index is defined as the ratio of:
A. Maximum inter-cluster distance to minimum intra-cluster distance
B. Minimum inter-cluster distance to maximum intra-cluster distance
C. Mean intra-cluster distance to mean inter-cluster distance
D. Variance of clusters to bias of clusters
Correct Answer: Minimum inter-cluster distance to maximum intra-cluster distance
Explanation: The Dunn Index aims to identify dense and well-separated clusters. It is calculated as the minimum distance between points in different clusters divided by the maximum diameter of a cluster.
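The ratio described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the function name `dunn_index` and the toy points are assumptions, and it presumes at least two clusters and at least one cluster with two or more points.

```python
from itertools import combinations
from math import dist

def dunn_index(clusters):
    """clusters: list of clusters, each a list of coordinate tuples (hypothetical layout)."""
    # Smallest distance between two points that lie in different clusters.
    min_separation = min(
        dist(p, q)
        for c1, c2 in combinations(clusters, 2)
        for p in c1
        for q in c2
    )
    # Largest diameter: the farthest pair of points inside any single cluster.
    max_diameter = max(
        dist(p, q)
        for c in clusters
        for p, q in combinations(c, 2)
    )
    return min_separation / max_diameter

toy_clusters = [[(0, 0), (0, 1)], [(5, 0), (5, 2)]]
print(dunn_index(toy_clusters))  # 5 / 2 = 2.5
```

Because both the numerator and the denominator are extreme values, moving a single point can change the result sharply.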
5. Which clustering metric corrects the Rand Index for chance?
A. Fowlkes-Mallows Index
B. Adjusted Rand Index (ARI)
C. Silhouette Score
D. Completeness Score
Correct Answer: Adjusted Rand Index (ARI)
Explanation: The Adjusted Rand Index (ARI) is a variation of the Rand Index that adjusts for the probability that a random assignment of labels would result in a high score, yielding a score near 0 for random labeling.
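As a rough sketch of the chance correction, the ARI can be computed from pair counts in the contingency table between the two labelings. The function name `adjusted_rand_index` and the toy labels are assumptions for illustration; returning 1.0 for two trivial, identical partitions is a convention chosen here.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    n = len(labels_true)
    # Count samples per (class, cluster) cell, per class, and per cluster.
    cell_counts = Counter(zip(labels_true, labels_pred))
    class_counts = Counter(labels_true)
    cluster_counts = Counter(labels_pred)

    # Number of sample pairs that share a cell, a class, or a cluster.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_classes = sum(comb(c, 2) for c in class_counts.values())
    sum_clusters = sum(comb(c, 2) for c in cluster_counts.values())

    # Pair agreement expected under random labeling, and its maximum.
    expected = sum_classes * sum_clusters / comb(n, 2)
    maximum = (sum_classes + sum_clusters) / 2
    if maximum == expected:
        return 1.0  # both partitions are trivial and identical
    return (sum_cells - expected) / (maximum - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, renamed labels: 1.0
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # systematic disagreement: about -0.5
```

The second call shows that the score can fall below 0 when agreement is lower than expected by chance.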
6. What does a Homogeneity score of 1.0 imply?
A. All clusters contain data points from only a single class
B. All data points of a specific class are assigned to the same cluster
C. The number of clusters equals the number of samples
D. The clusters overlap significantly
Correct Answer: All clusters contain data points from only a single class
Explanation: Homogeneity assesses whether each cluster contains only members of a single class. A score of 1.0 means perfect homogeneity.
7. If all data points belonging to a given class are elements of the same cluster, which metric is maximized?
A. Homogeneity
B. Completeness
C. Silhouette Score
D. Dunn Index
Correct Answer: Completeness
Explanation: Completeness measures whether all members of a given class are assigned to the same cluster. If this condition is met, the Completeness score is 1.0.
8. The V-measure is the harmonic mean of which two metrics?
A. Precision and Recall
B. Homogeneity and Completeness
C. Silhouette and Dunn Index
D. ARI and NMI
Correct Answer: Homogeneity and Completeness
Explanation: The V-measure is calculated as the harmonic mean of Homogeneity and Completeness, similar to how the F1-score is the harmonic mean of Precision and Recall.
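A small sketch may make the relationship concrete. It uses the usual entropy-based definitions (homogeneity = 1 - H(class | cluster) / H(class), completeness = 1 - H(cluster | class) / H(cluster)); the helper names and toy labels are assumptions made for this example.

```python
from collections import Counter
from math import log

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def conditional_entropy(targets, given):
    """H(targets | given), computed from joint label counts."""
    n = len(targets)
    joint = Counter(zip(targets, given))
    given_counts = Counter(given)
    return -sum(
        (c / n) * log(c / given_counts[g])
        for (_, g), c in joint.items()
    )

def homogeneity_completeness_v(labels_true, labels_pred):
    h_class = entropy(labels_true)
    h_cluster = entropy(labels_pred)
    # A labeling with zero entropy is treated as trivially perfect on that side.
    homogeneity = 1.0 if h_class == 0 else 1 - conditional_entropy(labels_true, labels_pred) / h_class
    completeness = 1.0 if h_cluster == 0 else 1 - conditional_entropy(labels_pred, labels_true) / h_cluster
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v

# Every sample in its own cluster: homogeneity 1.0, completeness about 0.5.
print(homogeneity_completeness_v([0, 0, 1, 1], [0, 1, 2, 3]))
# All samples in one cluster: completeness 1.0, homogeneity about 0.
print(homogeneity_completeness_v([0, 0, 1, 1], [0, 0, 0, 0]))
```

The two calls reproduce the degenerate cases in which one of the two components is maximal while the clustering itself is uninformative.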
9. Which metric is calculated as the geometric mean of pairwise precision and pairwise recall?
A. Fowlkes-Mallows Index
B. Adjusted Mutual Information
C. V-measure
D. Davies-Bouldin Index
Correct Answer: Fowlkes-Mallows Index
Explanation: The Fowlkes-Mallows Index (FMI) is defined as the geometric mean of the pairwise precision and pairwise recall.
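The pair-counting definition can be illustrated with a brute-force sketch over all sample pairs. The function name `fowlkes_mallows` and the toy labels are assumptions; returning 0.0 when there are no true-positive pairs is a convention chosen here to avoid dividing by zero.

```python
from itertools import combinations
from math import sqrt

def fowlkes_mallows(labels_true, labels_pred):
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_class = labels_true[i] == labels_true[j]
        same_cluster = labels_pred[i] == labels_pred[j]
        if same_class and same_cluster:
            tp += 1  # pair grouped together in both labelings
        elif same_cluster:
            fp += 1  # grouped together only in the prediction
        elif same_class:
            fn += 1  # grouped together only in the ground truth
    if tp == 0:
        return 0.0
    pairwise_precision = tp / (tp + fp)
    pairwise_recall = tp / (tp + fn)
    return sqrt(pairwise_precision * pairwise_recall)

# TP = 2, FP = 1, FN = 4, so the result is sqrt((2/3) * (2/6)), about 0.471.
print(fowlkes_mallows([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))
```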
10. Normalized Mutual Information (NMI) is often preferred over Mutual Information (MI) because:
A. NMI does not require ground truth
B. MI is computationally too expensive
C. NMI scales the result between 0 and 1, making it comparable across datasets
D. MI yields negative values
Correct Answer: NMI scales the result between 0 and 1, making it comparable across datasets
Explanation: Raw Mutual Information has no fixed upper bound: its maximum depends on the entropies of the two labelings, so values are hard to compare across datasets. Normalizing it (NMI) rescales the score to lie between 0 and 1, facilitating comparison between different clustering results.
11. What is the primary advantage of Adjusted Mutual Information (AMI) over Normalized Mutual Information (NMI)?
A. AMI is faster to compute
B. AMI accounts for chance (randomness) in cluster assignment
C. AMI works without ground truth
D. AMI can handle negative values
Correct Answer: AMI accounts for chance (randomness) in cluster assignment
Explanation: AMI adjusts the Mutual Information score to account for the fact that the MI is generally higher for two clusterings with a larger number of clusters, normalizing against expected MI by chance.
12. In the Silhouette Score formula s = (b - a) / max(a, b), what does 'a' represent?
A. The distance to the nearest cluster centroid
B. The mean intra-cluster distance (average distance to other points in the same cluster)
C. The mean nearest-cluster distance
D. The maximum diameter of the cluster
Correct Answer: The mean intra-cluster distance (average distance to other points in the same cluster)
Explanation: In the Silhouette formula, 'a' represents the mean distance between a sample and all other points in the same cluster (cohesion).
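The formula s = (b - a) / max(a, b) can be traced per sample with a short standard-library sketch. The function name `silhouette_samples_sketch` and the toy data are assumptions; the code presumes at least two clusters, uses Euclidean distance, and assigns 0 to a sample that is alone in its cluster (a common convention).

```python
from math import dist

def silhouette_samples_sketch(points, labels):
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, grouped by that point's cluster.
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own:
            scores.append(0.0)  # singleton cluster: score set to 0 by convention
            continue
        a = sum(own) / len(own)                                 # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())   # separation
        scores.append((b - a) / max(a, b))
    return scores

points = [(0,), (1,), (10,), (11,)]
print(silhouette_samples_sketch(points, [0, 0, 1, 1]))  # all scores close to 0.9
print(silhouette_samples_sketch(points, [0, 1, 0, 1]))  # poor labeling: negative scores
```

In the second call each point sits nearer to the other cluster than to its own, so b < a and the scores turn negative.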
13. Which of the following indicates a clustering result where samples have been assigned to the wrong clusters according to the Silhouette Score?
A. Values near +1
B. Values near 0
C. Values near -1
D. Values exactly 0.5
Correct Answer: Values near -1
Explanation: A Silhouette Score near -1 implies that data points are, on average, closer to a neighboring cluster than to their own cluster.
14. The Adjusted Rand Index (ARI) yields a score of approximately 0 when:
A. The clustering is perfect
B. The clustering is identical to the ground truth
C. The clustering is independent/random compared to the ground truth
D. The number of clusters is equal to the number of samples
Correct Answer: The clustering is independent/random compared to the ground truth
Explanation: ARI is adjusted such that a random labeling (independent of the true labels) will result in a score close to 0.0.
15. Which metric is most sensitive to noise and outliers because it uses maximum diameters and minimum separations?
A. Silhouette Score
B. Dunn Index
C. V-measure
D. Adjusted Rand Index
Correct Answer: Dunn Index
Explanation: Since the Dunn Index relies on the minimum inter-cluster distance and maximum intra-cluster diameter, a single outlier can drastically alter these values, making it sensitive to noise.
16. If a clustering algorithm produces 100 clusters for a dataset of 100 samples (each sample is its own cluster), which metric will naturally maximize to 1.0, potentially giving a misleading impression of quality?
A. Completeness
B. Homogeneity
C. Silhouette Score
D. Dunn Index
Correct Answer: Homogeneity
Explanation: If every sample is its own cluster, each cluster trivially contains members of only one class (that of its single sample). Therefore, Homogeneity is exactly 1.0, even though the clustering is useless.
17. Conversely, if all samples are assigned to a single cluster, which metric will maximize to 1.0?
A. Homogeneity
B. Completeness
C. Silhouette Score
D. Davies-Bouldin Index
Correct Answer: Completeness
Explanation: If there is only one cluster, all members of any specific class are technically in that same cluster. Thus, Completeness is 1.0.
18. What is the beta parameter used for in the V-measure calculation?
A. To adjust for chance
B. To weight the importance of Homogeneity versus Completeness
C. To define the number of clusters
D. To normalize the dataset
Correct Answer: To weight the importance of Homogeneity versus Completeness
Explanation: The V-measure is a weighted harmonic mean. The beta parameter controls the weight: beta > 1 weights completeness more, while beta < 1 weights homogeneity more.
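The effect of beta is easy to check numerically. The sketch below assumes the commonly used weighted form v = (1 + beta) * h * c / (beta * h + c), which matches the direction of weighting described above; the function name `v_measure` and the input values are illustrative only.

```python
def v_measure(homogeneity, completeness, beta=1.0):
    # Weighted harmonic mean of homogeneity (h) and completeness (c).
    denominator = beta * homogeneity + completeness
    if denominator == 0:
        return 0.0
    return (1 + beta) * homogeneity * completeness / denominator

h, c = 1.0, 0.5
print(v_measure(h, c))            # beta = 1: plain harmonic mean, about 0.667
print(v_measure(h, c, beta=2.0))  # pulled toward completeness (0.5): about 0.6
print(v_measure(h, c, beta=0.5))  # pulled toward homogeneity (1.0): about 0.75
```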
19. When calculating the Fowlkes-Mallows Index, 'TP' (True Positive) refers to:
A. Pairs of points that are in the same cluster in both the true labels and predicted labels
B. Pairs of points that are in different clusters in both labels
C. Points correctly classified as noise
D. Clusters that are perfectly pure
Correct Answer: Pairs of points that are in the same cluster in both the true labels and predicted labels
Explanation: In pair-counting metrics like FMI, TP represents the number of pairs of points that belong to the same cluster in the ground truth and are also assigned to the same cluster in the prediction.
20. Which of the following is an 'Internal' clustering validity index?
A. Adjusted Rand Index
B. Davies-Bouldin Index
C. V-measure
D. Normalized Mutual Information
Correct Answer: Davies-Bouldin Index
Explanation: Internal indices evaluate the clustering structure using only the data itself (without ground truth). Davies-Bouldin is internal; the others listed require external ground truth.
21. What is a major limitation of the Silhouette Score when dealing with non-convex clusters (e.g., ring shapes)?
A. It is computationally cheap
B. It tends to give lower scores to density-based clusters that are not spherical
C. It cannot handle negative values
D. It requires ground truth
Correct Answer: It tends to give lower scores to density-based clusters that are not spherical
Explanation: The Silhouette Score is built on pairwise distances (commonly Euclidean) and implicitly favors compact, convex clusters. It often penalizes correct clusterings of complex shapes such as rings or moons.
22. Which component of the Silhouette Score represents the 'separation' of the cluster?
A. a (intra-cluster distance)
B. b (nearest-cluster distance)
C. max(a, b)
D. b - a
Correct Answer: b (nearest-cluster distance)
Explanation: 'b' is the mean distance between a sample and all points in the nearest neighboring cluster, representing how well separated the sample is from the next closest cluster.
23. In the context of NMI, Entropy is used to measure:
A. The distance between centroids
B. The uncertainty associated with the class or cluster distribution
C. The number of outliers
D. The geometric shape of the cluster
Correct Answer: The uncertainty associated with the class or cluster distribution
Explanation: NMI is based on Information Theory. Entropy measures the amount of information or uncertainty in the distribution of the labels.
24. Which metric is symmetric (i.e., Metric(A, B) = Metric(B, A))?
A. Homogeneity
B. Completeness
C. Adjusted Rand Index
D. Silhouette Score
Correct Answer: Adjusted Rand Index
Explanation: The Adjusted Rand Index compares two labelings and is symmetric; swapping the predicted labels and ground truth labels yields the same score. Homogeneity and Completeness are not symmetric.
25. A Davies-Bouldin Index of 0 indicates:
A. The worst possible clustering
B. Random clustering
C. Ideally separated and compact clusters
D. Infinite variance
Correct Answer: Ideally separated and compact clusters
Explanation: The minimum score for DBI is 0, which represents the best possible clustering (perfectly compact and separated).
26. Which of the following is required to calculate the Adjusted Mutual Information (AMI)?
A. Centroids of the clusters
B. Euclidean distance matrix
C. Ground truth labels and predicted labels
D. Only the predicted labels
Correct Answer: Ground truth labels and predicted labels
Explanation: AMI is an external metric that compares the mutual information between the true labels and the predicted labels.
27. The Fowlkes-Mallows Index is bounded between:
A. -1 and 1
B. 0 and 1
C. -infinity to +infinity
D. 0 and infinity
Correct Answer: 0 and 1
Explanation: FMI is a geometric mean of probabilities/ratios (Precision and Recall), so it ranges from 0 to 1.
28. Why might the Dunn Index be computationally expensive for large datasets?
A. It requires calculating eigenvalues
B. It requires calculating pairwise distances between all points to find min/max distances
C. It requires ground truth labels
D. It involves complex integrals
Correct Answer: It requires calculating pairwise distances between all points to find min/max distances
Explanation: To find the diameter of clusters and the distance between them, the Dunn Index often requires computing the distance between every pair of points, which is O(N^2).
29. When interpreting Homogeneity (H) and Completeness (C), if H is high and C is low, what does this usually suggest?
A. The algorithm over-segmented the classes (many small clusters for one class)
B. The algorithm merged distinct classes into one cluster
C. The clustering is perfect
D. The data is random noise
Correct Answer: The algorithm over-segmented the classes (many small clusters for one class)
Explanation: High Homogeneity means clusters are pure (one class), but low Completeness means classes are split across multiple clusters. This is over-segmentation.
30. In the formula for NMI, the Mutual Information I(U, V) is normalized by:
A. The number of samples
B. The arithmetic or geometric mean of the entropies of U and V
C. The variance of U and V
D. The maximum distance in the dataset
Correct Answer: The arithmetic or geometric mean of the entropies of U and V
Explanation: To bound the Mutual Information between 0 and 1, it is divided by a generalized mean (arithmetic, geometric, min, or max) of the entropies of the two label assignments.
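The normalization step can be sketched as follows, with the choice of mean exposed as a parameter. The function names, the `average` argument, and the handling of zero-entropy labelings are assumptions made for this illustration rather than any particular library's API.

```python
from collections import Counter
from math import log, sqrt

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(u, v):
    n = len(u)
    joint = Counter(zip(u, v))
    count_u, count_v = Counter(u), Counter(v)
    # Sum over observed label pairs of p(a, b) * log(p(a, b) / (p(a) * p(b))).
    return sum(
        (c / n) * log(c * n / (count_u[a] * count_v[b]))
        for (a, b), c in joint.items()
    )

def normalized_mutual_info(u, v, average="arithmetic"):
    h_u, h_v = entropy(u), entropy(v)
    if h_u == 0 and h_v == 0:
        return 1.0  # both labelings are a single group (convention chosen here)
    normalizers = {
        "arithmetic": (h_u + h_v) / 2,
        "geometric": sqrt(h_u * h_v),
        "min": min(h_u, h_v),
        "max": max(h_u, h_v),
    }
    normalizer = normalizers[average]
    if normalizer == 0:
        return 0.0  # one labeling carries no information
    return mutual_information(u, v) / normalizer

print(normalized_mutual_info([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition: about 1.0
print(normalized_mutual_info([0, 0, 1, 1], [0, 1, 0, 1]))  # independent labelings: 0.0
```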
31. What happens to the Adjusted Rand Index (ARI) if the class labels are permuted (renamed)?
A. The score changes drastically
B. The score becomes negative
C. The score remains the same
D. The score becomes zero
Correct Answer: The score remains the same
Explanation: ARI measures the agreement between partitions, not the specific integer values of the labels. Permuting label names does not change the grouping structure.
32. Which external metric suffers less from the 'curse of dimensionality' in its calculation logic (though distances themselves might suffer)?
A. Silhouette Score
B. Dunn Index
C. V-measure
D. Davies-Bouldin Index
Correct Answer: V-measure
Explanation: V-measure depends on counting matching labels (contingency table), not on calculating Euclidean distances in high-dimensional space, unlike Silhouette, Dunn, or DBI.
33. For a dataset with 'k' ground truth classes, if a clustering algorithm produces 'k' clusters and ARI is 1.0, this means:
A. The clusters are random
B. The clusters perfectly match the ground truth (up to permutation)
C. The clusters are disjoint but incorrect
D. There are outlier points
Correct Answer: The clusters perfectly match the ground truth (up to permutation)
Explanation: An ARI of 1.0 indicates that the two clusterings (predicted and true) are identical partitions of the data.
34. Which metric would be most appropriate to select the optimal number of clusters 'k' in K-Means clustering when true labels are unknown?
A. Adjusted Rand Index
B. Silhouette Score
C. Homogeneity
D. NMI
Correct Answer: Silhouette Score
Explanation: Without true labels (internal validation), the Silhouette Score is commonly used to find the 'k' that maximizes cluster separation and cohesion.
35. In the calculation of the Davies-Bouldin Index, 'scatter' refers to:
A. The distance between cluster centroids
B. The average distance of points in a cluster to their centroid
C. The total number of points
D. The entropy of the cluster
Correct Answer: The average distance of points in a cluster to their centroid
Explanation: Intra-cluster scatter measures the compactness of a cluster, typically defined as the average distance of points to the cluster centroid.
36. The Rand Index (RI) calculates the percentage of:
A. Correctly classified centroids
B. Decisions where pairs of data points are correctly agreed upon (together or apart)
C. Clusters that have zero entropy
D. Points with positive silhouette scores
Correct Answer: Decisions where pairs of data points are correctly agreed upon (together or apart)
Explanation: RI measures the fraction of pairs of samples that are either in the same cluster in both partitions or in different clusters in both partitions.
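The pairwise agreement count can be written out directly. This is a brute-force sketch over all pairs, with the function name `rand_index` and the toy labels chosen only for illustration; it presumes at least two samples.

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_in_truth = labels_true[i] == labels_true[j]
        same_in_prediction = labels_pred[i] == labels_pred[j]
        # A pair agrees when both partitions keep it together or both keep it apart.
        if same_in_truth == same_in_prediction:
            agreements += 1
        total_pairs += 1
    return agreements / total_pairs

print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # 5 of 6 pairs agree
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # renamed labels, same partition: 1.0
```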
37. Does the V-measure prefer a specific number of clusters?
A. No, it is independent of cluster count
B. Yes, it favors a large number of clusters if not adjusted
C. Yes, it favors a single cluster
D. It only works for k=2
Correct Answer: Yes, it favors a large number of clusters if not adjusted
Explanation: Like raw Homogeneity, V-measure can be artificially inflated if the number of clusters is very large (approaching the number of samples), unless compared against a baseline.
38. Which metric is based on the idea that good clusters should be highly similar internally and highly dissimilar externally?
A. Silhouette Score
B. ARI
C. NMI
D. FMI
Correct Answer: Silhouette Score
Explanation: The Silhouette definition is built directly on internal similarity (cohesion) versus external dissimilarity (separation).
39. If two different clustering algorithms produce the exact same partition of data, the AMI score between them will be:
A. 0
B. 0.5
C. 1
D. Infinity
Correct Answer: 1
Explanation: Identical partitions imply maximum mutual information, normalized to 1.
40. Which of the following metrics is NOT bounded above by 1 (i.e., it can be greater than 1)?
A. Davies-Bouldin Index
B. Silhouette Score
C. V-measure
D. Adjusted Rand Index
Correct Answer: Davies-Bouldin Index
Explanation: The Davies-Bouldin Index is a ratio of distances. While 0 is the minimum, there is no theoretical upper bound (it can be > 1).
41. Homogeneity is most closely analogous to which classification metric when applied to clusters?
A. Recall
B. Precision
C. Accuracy
D. F1 Score
Correct Answer: Precision
Explanation: Homogeneity checks if a cluster contains only a specific class. This is analogous to Precision (of the cluster predicting the class).
42. Completeness is most closely analogous to which classification metric when applied to clusters?
A. Recall
B. Precision
C. Accuracy
D. Specificity
Correct Answer: Recall
Explanation: Completeness checks if all members of a class are in the same cluster. This is analogous to Recall (finding all instances of the class).
43. Which metric assumes that the best clustering has the minimum sum of similarities between each cluster and its most similar one?
A. Davies-Bouldin Index
B. Dunn Index
C. Silhouette Score
D. Calinski-Harabasz Index
Correct Answer: Davies-Bouldin Index
Explanation: DBI is calculated as the average similarity of each cluster with its most similar cluster, where similarity is the ratio of within-cluster distances to between-cluster distances. We want to minimize this.
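A compact sketch of that calculation follows, taking scatter as the average distance of a cluster's points to its centroid and similarity as (scatter_i + scatter_j) / distance between centroids. The function names and toy clusters are assumptions; the code presumes at least two clusters with distinct centroids.

```python
from math import dist

def centroid(points):
    # Coordinate-wise mean of the points in one cluster.
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def davies_bouldin(clusters):
    """clusters: list of clusters, each a list of coordinate tuples (hypothetical layout)."""
    centroids = [centroid(c) for c in clusters]
    # Scatter: average distance from each point to its own cluster centroid.
    scatters = [
        sum(dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, centroids)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Similarity of cluster i to its most similar (worst-case) neighbor.
        total += max(
            (scatters[i] + scatters[j]) / dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
    return total / k

far_apart = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
close_together = [[(0, 0), (0, 2)], [(4, 0), (4, 2)]]
print(davies_bouldin(far_apart))       # (1 + 1) / 10 = 0.2
print(davies_bouldin(close_together))  # (1 + 1) / 4 = 0.5
```

The same clusters moved closer together yield a higher index, consistent with lower values indicating better separation.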
44. When using the Silhouette Score, a value of 0 implies:
A. The sample is on or very close to the decision boundary between two neighboring clusters
B. The sample is far away from all clusters
C. The clustering is perfect
D. The sample is an outlier
Correct Answer: The sample is on or very close to the decision boundary between two neighboring clusters
Explanation: A score of 0 occurs when the intra-cluster distance (a) is equal to the nearest-cluster distance (b), implying the point lies on the boundary.
45. Why is 'Adjusted' Mutual Information preferred over 'Normalized' Mutual Information in many comparative studies?
A. It corrects for the bias toward clusterings with many clusters (high k)
B. It is easier to calculate
C. It is always positive
D. It does not use logarithms
Correct Answer: It corrects for the bias toward clusterings with many clusters (high k)
Explanation: NMI tends to increase as the number of clusters increases, even for random partitions. AMI corrects for this expected chance agreement.
46. The Fowlkes-Mallows index is generally higher when:
A. The clustering and ground truth are highly correlated
B. The number of clusters is very large
C. The number of clusters is 1
D. The dataset is very small
Correct Answer: The clustering and ground truth are highly correlated
Explanation: Since FMI is the geometric mean of precision and recall regarding pair assignments, high correlation between prediction and truth yields a high index.
47. In the Dunn Index, the 'diameter' of a cluster usually refers to:
A. The maximum distance between any two points in the cluster
B. The average distance to the centroid
C. The radius of the cluster
D. The distance to the nearest neighbor
Correct Answer: The maximum distance between any two points in the cluster
Explanation: The standard definition of cluster diameter in the Dunn Index is the maximum distance between any two points within that cluster.
48. Which metric is strictly an Information Theoretic measure?
A. Silhouette Score
B. Davies-Bouldin Index
C. Normalized Mutual Information (NMI)
D. Dunn Index
Correct Answer: Normalized Mutual Information (NMI)
Explanation: NMI is derived from Shannon Entropy and Mutual Information, which are core concepts of Information Theory.
49. If a dataset has highly imbalanced classes, which pair of metrics gives a good view of cluster purity and class coverage?
A. Homogeneity and Completeness
B. Silhouette and DBI
C. Dunn and Variance
D. Mean and Median
Correct Answer: Homogeneity and Completeness
Explanation: These two metrics decouple the requirement of 'pure' clusters (Homogeneity) from 'whole' classes (Completeness), providing insight even if class sizes vary.
50. A negative value for the Adjusted Rand Index (ARI) implies:
A. The clustering is worse than random assignment
B. The clustering is random
C. The clustering is perfect
D. ARI cannot be negative
Correct Answer: The clustering is worse than random assignment
Explanation: While ARI is centered at 0 for random labeling, it can technically be negative if the agreement is less than what is expected by chance (systematic disagreement).