Explanation:The Silhouette Coefficient ranges from -1 to 1, where 1 indicates that the sample is far away from neighboring clusters, 0 indicates the sample is on or very close to the decision boundary, and -1 indicates incorrect clustering.
2. In the context of the Silhouette Score, what does a value near 0 indicate?
A.The sample is far from other clusters
B.The sample is assigned to the wrong cluster
C.The clusters are overlapping
D.The clustering is perfect
Correct Answer: The clusters are overlapping
Explanation:A Silhouette Score near 0 indicates that the sample is on or very close to the decision boundary between two neighboring clusters, implying overlapping.
3. Which of the following clustering metrics is an 'Internal' evaluation metric (does not require ground truth labels)?
A.Adjusted Rand Index
B.Davies-Bouldin Index
C.Homogeneity Score
D.V-measure
Correct Answer: Davies-Bouldin Index
Explanation:The Davies-Bouldin Index is an internal metric that evaluates clustering based on the data's inherent properties (scatter and separation) without needing external ground truth labels.
4. For the Davies-Bouldin Index, which of the following represents a better clustering result?
A.A higher value
B.A value close to 1
C.A lower value
D.A value close to -1
Correct Answer: A lower value
Explanation:The Davies-Bouldin Index measures the average similarity between clusters. Lower values indicate better clustering, where clusters are compact and well-separated.
5. How is the Dunn Index calculated?
A.Ratio of maximum inter-cluster distance to minimum intra-cluster distance
B.Ratio of minimum inter-cluster distance to maximum intra-cluster diameter
C.Average distance between all points
D.Sum of squared errors
Correct Answer: Ratio of minimum inter-cluster distance to maximum intra-cluster diameter
Explanation:The Dunn Index is defined as the ratio of the smallest distance between observations not in the same cluster to the largest intra-cluster diameter. Higher values are desired.
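As a rough illustration of this ratio, the following Python sketch computes the Dunn Index for one-dimensional points. The function name `dunn_index` and the toy data are assumptions made for this example; they are not taken from any library.

```python
from itertools import combinations

def dunn_index(points, labels):
    """Dunn Index for 1-D points: min inter-cluster distance / max cluster diameter."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    # Largest distance between two points of the same cluster (the diameter).
    max_diameter = max(
        max((abs(p - q) for p, q in combinations(members, 2)), default=0.0)
        for members in clusters.values()
    )
    # Smallest distance between two points that lie in different clusters.
    min_separation = min(
        abs(p - q)
        for c1, c2 in combinations(list(clusters.values()), 2)
        for p in c1
        for q in c2
    )
    # Assumes at least two clusters and at least one cluster with two distinct points.
    return min_separation / max_diameter

compact = dunn_index([0.0, 1.0, 10.0, 11.0], [0, 0, 1, 1])
with_outlier = dunn_index([0.0, 1.0, 5.0, 10.0, 11.0], [0, 0, 0, 1, 1])
print(compact, with_outlier)
```

In this toy data a single stray point at 5.0 both widens the first cluster and shrinks the gap to the second, so the index drops sharply.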
6. The Adjusted Rand Index (ARI) corrects the Rand Index for:
A.Number of clusters
B.Chance
C.Outliers
D.Data dimensionality
Correct Answer: Chance
Explanation: The ARI adjusts the Rand Index by subtracting the pairwise agreement expected between clusterings drawn from a random model, i.e. it corrects for chance.
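The chance correction can be sketched with plain pair counting. The code below is a minimal illustration, assuming the usual contingency-table form of the ARI; the function name `rand_and_adjusted_rand` and the example labelings are made up for this sketch.

```python
from collections import Counter
from math import comb

def rand_and_adjusted_rand(labels_a, labels_b):
    """Return (Rand Index, Adjusted Rand Index) computed by pair counting."""
    total_pairs = comb(len(labels_a), 2)
    # Pairs placed together by both labelings, by labels_a, and by labels_b.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    # Pairs separated by both labelings (inclusion-exclusion).
    apart_both = total_pairs - together_a - together_b + together_both
    rand = (together_both + apart_both) / total_pairs
    # Value of the "together" count expected by chance, and its upper bound.
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    # Assumes maximum != expected (fails if both labelings have a single cluster).
    adjusted = (together_both - expected) / (maximum - expected)
    return rand, adjusted

# Same partition with swapped label names: both indices report a perfect match.
match = rand_and_adjusted_rand([0, 0, 1, 1], [1, 1, 0, 0])
# A labeling that cuts across the first one: RI stays positive, ARI goes negative.
discordant = rand_and_adjusted_rand([0, 0, 1, 1], [0, 1, 0, 1])
print(match, discordant)
```

The second call shows why the unadjusted index looks optimistic: a third of the pairs still "agree" even though the two labelings are maximally at odds for this data.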
7. What is the maximum possible value for the Adjusted Rand Index (ARI)?
A.0
B.1
C.100
D.Infinity
Correct Answer: 1
Explanation:The ARI is bounded above by 1, which indicates a perfect match between the clustering and the ground truth. Values near 0 indicate random labeling.
8. Which metric is calculated as the harmonic mean of Homogeneity and Completeness?
A.F-measure
B.V-measure
C.Silhouette Score
D.Adjusted Mutual Information
Correct Answer: V-measure
Explanation:The V-measure is the harmonic mean between homogeneity and completeness, similar to how the F1-score is the harmonic mean of precision and recall.
9. A clustering result satisfies 'Homogeneity' if:
A.All members of a given class are assigned to the same cluster
B.Each cluster contains only members of a single class
C.The clusters are spherical
D.The number of clusters equals the number of classes
Correct Answer: Each cluster contains only members of a single class
Explanation: A clustering result satisfies homogeneity if every one of its clusters contains only data points that are members of a single class.
10. A clustering result satisfies 'Completeness' if:
A.All members of a given class are assigned to the same cluster
B.Each cluster contains only members of a single class
C.The clusters are well separated
D.The entropy of the clusters is zero
Correct Answer: All members of a given class are assigned to the same cluster
Explanation:Completeness is satisfied if all the data points that are members of a given class are elements of the same cluster.
11. Normalized Mutual Information (NMI) is a normalization of the Mutual Information (MI) score to scale the results between:
A.-1 and 1
B.0 and 1
C.0 and infinity
D.-infinity and infinity
Correct Answer: 0 and 1
Explanation:NMI normalizes Mutual Information to the range [0, 1], where 1 indicates perfect correlation between the clustering and the ground truth labels.
12. Which metric is the geometric mean of the pairwise precision and recall?
A.V-measure
B.Fowlkes-Mallows Index
C.Dunn Index
D.Adjusted Mutual Information
Correct Answer: Fowlkes-Mallows Index
Explanation:The Fowlkes-Mallows Index is defined as the geometric mean of the pairwise precision and pairwise recall.
13. What is the primary advantage of Adjusted Mutual Information (AMI) over Normalized Mutual Information (NMI)?
A.It is faster to compute
B.It accounts for chance, especially in small samples or large cluster numbers
C.It does not require ground truth
D.It works better with non-convex clusters
Correct Answer: It accounts for chance, especially in small samples or large cluster numbers
Explanation:AMI corrects the Mutual Information for agreement solely due to chance, which NMI does not strictly do. This is important when the number of clusters is large relative to the sample size.
14. In the Silhouette Score formula $s = \frac{b - a}{\max(a, b)}$, what does 'a' represent?
A.The mean distance between a sample and all other points in the same cluster
B.The mean distance between a sample and all points in the nearest neighboring cluster
C.The total number of clusters
D.The variance of the entire dataset
Correct Answer: The mean distance between a sample and all other points in the same cluster
Explanation:'a' is the mean intra-cluster distance (the average distance between the sample and all other points in the same cluster).
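To make the roles of 'a' and 'b' concrete, here is a small Python sketch of the per-sample coefficient $s = (b - a) / \max(a, b)$ on one-dimensional data. The helper name `silhouette_sample`, the toy points, and the convention of scoring a singleton cluster as 0 are assumptions of this example.

```python
from statistics import mean

def silhouette_sample(i, points, labels):
    """Silhouette coefficient s = (b - a) / max(a, b) for the 1-D sample at index i."""
    own = labels[i]
    # a: mean distance from the sample to the other points in its own cluster.
    same = [abs(points[i] - p)
            for j, (p, l) in enumerate(zip(points, labels))
            if l == own and j != i]
    if not same:
        return 0.0  # assumed convention for a cluster with a single member
    a = mean(same)
    # b: smallest mean distance from the sample to the points of any other cluster.
    b = min(
        mean([abs(points[i] - p) for p, l in zip(points, labels) if l == other])
        for other in set(labels)
        if other != own
    )
    return (b - a) / max(a, b)

points = [0.0, 1.0, 10.0, 11.0]
labels = [0, 0, 1, 1]
scores = [silhouette_sample(i, points, labels) for i in range(len(points))]
overall = mean(scores)  # dataset-level score: the mean over all samples

# Put the point at 10.0 into the far cluster: its coefficient turns negative.
misassigned = silhouette_sample(2, points, [0, 0, 0, 1])
print(scores, overall, misassigned)
```

The sketch needs at least two clusters, since 'b' is undefined otherwise.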
15. Which of the following metrics requires knowledge of ground truth labels?
A.Silhouette Score
B.Dunn Index
C.Adjusted Rand Index
D.Calinski-Harabasz Index
Correct Answer: Adjusted Rand Index
Explanation:The Adjusted Rand Index is an external validity index, meaning it compares the clustering result against a known ground truth classification.
16. If the Adjusted Rand Index (ARI) is 0.0, what does this imply?
A.Perfect clustering
B.Inverse clustering
C.Random labeling
D.Clustering with 100% error
Correct Answer: Random labeling
Explanation:An ARI of 0 indicates that the clustering performance is equivalent to random assignment.
17. Which index is most sensitive to noise and outliers because it relies on minimum inter-cluster distances and maximum diameters?
A.Silhouette Score
B.Dunn Index
C.V-measure
D.Fowlkes-Mallows Index
Correct Answer: Dunn Index
Explanation:The Dunn Index uses the minimum distance between clusters and maximum diameter. A single outlier can drastically change the diameter or the inter-cluster distance, making it sensitive to noise.
18. The Fowlkes-Mallows Index (FMI) ranges from:
A.-1 to 1
B.0 to 1
C.0 to 10
D.-infinity to 0
Correct Answer: 0 to 1
Explanation: FMI is the geometric mean of pairwise precision and recall, each of which lies between 0 and 1, so FMI also ranges from 0 to 1. A higher value indicates greater similarity between the clusters and the ground truth.
19. If a clustering algorithm produces a Homogeneity score of 1.0 but a Completeness score of 0.5, what does this likely mean?
A.Clusters are pure but classes are split into multiple clusters
B.Classes are mixed but clusters are large
C.The algorithm failed completely
D.There is only one cluster
Correct Answer: Clusters are pure but classes are split into multiple clusters
Explanation:Homogeneity = 1 means each cluster contains only one class. Low Completeness means members of a specific class are distributed across multiple clusters.
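A small entropy-based sketch can reproduce exactly this situation. It assumes the standard definitions, homogeneity = 1 - H(C|K)/H(C) and completeness = 1 - H(K|C)/H(K), with a score of 1 when the denominator entropy is zero; the function names and the toy labels are invented for this example.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def conditional_entropy(target, given):
    """H(target | given), estimated from joint label counts."""
    n = len(target)
    given_counts = Counter(given)
    return -sum(
        count / n * log(count / given_counts[g])
        for (t, g), count in Counter(zip(target, given)).items()
    )

def homogeneity_completeness_v(classes, clusters, beta=1.0):
    h_c, h_k = entropy(classes), entropy(clusters)
    # Homogeneity: how little class uncertainty remains once the cluster is known.
    homogeneity = 1.0 if h_c == 0 else 1.0 - conditional_entropy(classes, clusters) / h_c
    # Completeness: how little cluster uncertainty remains once the class is known.
    completeness = 1.0 if h_k == 0 else 1.0 - conditional_entropy(clusters, classes) / h_k
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)
    return homogeneity, completeness, v

# Two classes, each split cleanly into two pure clusters.
classes = [0, 0, 0, 0, 1, 1, 1, 1]
clusters = [0, 0, 1, 1, 2, 2, 3, 3]
h, c, v = homogeneity_completeness_v(classes, clusters)
print(h, c, v)
```

Every cluster is pure, so homogeneity is 1, while each class is spread evenly over two clusters, which halves completeness.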
20. In the Davies-Bouldin Index calculation, the term $R_{ij}$ represents:
A.The ratio of the sum of cluster dispersions to the distance between cluster centroids
B.The product of cluster sizes
C.The absolute difference in cluster densities
D.The distance to the nearest neighbor
Correct Answer: The ratio of the sum of cluster dispersions to the distance between cluster centroids
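Explanation: $R_{ij}$ in Davies-Bouldin is a measure of similarity between clusters $i$ and $j$, calculated as $R_{ij} = \frac{s_i + s_j}{d_{ij}}$, where $s_i$ is the dispersion of cluster $i$ and $d_{ij}$ is the distance between the centroids of clusters $i$ and $j$.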
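The ratio can be sketched in a few lines of Python. This is a minimal illustration on one-dimensional data, assuming dispersion is the mean distance of a cluster's points to its centroid and that the index averages each cluster's largest similarity ratio; the names `davies_bouldin` and `similarity` are hypothetical.

```python
from statistics import mean

def davies_bouldin(points, labels):
    """Davies-Bouldin Index for 1-D points (lower is better)."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    centroids = {l: mean(members) for l, members in clusters.items()}
    # Dispersion of a cluster: mean distance of its points to the centroid.
    dispersion = {
        l: mean([abs(p - centroids[l]) for p in members])
        for l, members in clusters.items()
    }

    def similarity(i, j):
        # Sum of the two dispersions divided by the centroid distance.
        return (dispersion[i] + dispersion[j]) / abs(centroids[i] - centroids[j])

    # For each cluster keep its most similar neighbour, then average.
    return mean([max(similarity(i, j) for j in clusters if j != i) for i in clusters])

tight = davies_bouldin([0.0, 1.0, 10.0, 11.0], [0, 0, 1, 1])
loose = davies_bouldin([0.0, 4.0, 6.0, 10.0], [0, 0, 1, 1])
print(tight, loose)
```

The compact, well-separated layout gives the smaller value, in line with lower being better for this index.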
21. When is the Mutual Information (MI) between two clusterings equal to 0?
A.When the clusterings are identical
B.When the clusterings are perfectly correlated
C.When the two clusterings are independent
D.When the number of clusters is equal
Correct Answer: When the two clusterings are independent
Explanation:Mutual Information quantifies the information shared between two variables. If the clusterings are independent, sharing no information, MI is 0.
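The independence case is easy to check numerically. The sketch below estimates MI from label counts and normalizes it by the arithmetic mean of the two entropies; the function names, the toy labelings, and the choice to return 1.0 when both entropies are zero are assumptions of this example.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def mutual_information(u, v):
    """MI between two labelings, estimated from their joint counts."""
    n = len(u)
    count_u, count_v = Counter(u), Counter(v)
    return sum(
        c / n * log(c * n / (count_u[a] * count_v[b]))
        for (a, b), c in Counter(zip(u, v)).items()
    )

def normalized_mutual_information(u, v):
    """NMI with arithmetic-mean normalization."""
    mean_entropy = (entropy(u) + entropy(v)) / 2
    if mean_entropy == 0:
        return 1.0  # assumed convention: both labelings have a single cluster
    return mutual_information(u, v) / mean_entropy

# Knowing one label says nothing about the other: MI is 0.
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
# Same partition under different label names: NMI is 1.
identical = normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0])
print(independent, identical)
```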
22. Which of the following statements about V-measure is FALSE?
A.It is symmetric.
B.It requires ground truth labels.
C.It is equivalent to Normalized Mutual Information (arithmetic version).
D.It ranges from -1 to 1.
Correct Answer: It ranges from -1 to 1.
Explanation:The V-measure ranges from 0 to 1. It does not go to -1.
23. Which metric is generally preferred when you want to compare clustering solutions with different numbers of clusters on the same dataset, to avoid favoring solutions with more clusters?
A.Adjusted Mutual Information (AMI)
B.Raw Mutual Information (MI)
C.Sum of Squared Errors
D.Purity
Correct Answer: Adjusted Mutual Information (AMI)
Explanation:Unadjusted metrics like MI tend to increase as the number of clusters increases. AMI adjusts for chance, allowing fair comparison across different numbers of clusters.
24. A Silhouette Score of -1 implies that:
A.The sample is in the wrong cluster
B.The sample is in the correct cluster
C.The sample is an outlier
D.The sample is a centroid
Correct Answer: The sample is in the wrong cluster
Explanation:A score of -1 occurs when the average distance to points in the neighboring cluster is much smaller than the distance to points in the assigned cluster, meaning it is misclassified.
25. Which metric is defined using concepts of entropy and conditional entropy?
A.Adjusted Rand Index
B.Silhouette Score
C.Normalized Mutual Information
D.Dunn Index
Correct Answer: Normalized Mutual Information
Explanation:NMI is based on Information Theory, specifically using the entropy of the cluster assignments and the class labels.
26. The Adjusted Rand Index (ARI) is symmetric. This means:
A.ARI(A, B) = ARI(B, A)
B.ARI(A, B) = -ARI(B, A)
C.ARI(A, B) = 1 / ARI(B, A)
D.ARI values are always positive
Correct Answer: ARI(A, B) = ARI(B, A)
Explanation:Symmetry means swapping the predicted labels and true labels does not change the score.
27. In the calculation of the Fowlkes-Mallows Index, 'TP' (True Positive) refers to:
A.Pairs of points that are in the same cluster and same class
B.Pairs of points that are in different clusters and different classes
C.Points correctly classified as outliers
D.Centroids correctly identified
Correct Answer: Pairs of points that are in the same cluster and same class
Explanation:In pair-counting metrics, TP refers to pairs of points that belong to the same cluster in the predicted set and the same class in the ground truth.
28. Which internal metric assumes that clusters are convex and isotropic (spherical)?
A.DBSCAN
B.Silhouette Score
C.Adjusted Rand Index
D.Entropy
Correct Answer: Silhouette Score
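Explanation: The Silhouette Score makes no formal shape assumption, but because it is built on distances between samples it generally yields higher scores for convex, well-separated clusters and may rate complex, non-convex shapes such as rings poorly even when they are clustered correctly. (DBSCAN is a clustering algorithm, not a metric.)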
29. What is the primary disadvantage of the Davies-Bouldin Index?
A.It requires ground truth
B.It is computationally expensive for small datasets
C.It is limited to spherical clusters
D.It is always negative
Correct Answer: It is limited to spherical clusters
Explanation:Because Davies-Bouldin uses centroid distances and dispersions, it assumes a spherical distribution of points. It may not evaluate density-based clusters (like half-moons) correctly.
30. If the V-measure is used with a Beta value greater than 1, it places more weight on:
A.Homogeneity
B.Completeness
C.Recall
D.Precision
Correct Answer: Completeness
Explanation: The weighted V-measure uses Beta ($\beta$) to weight the two components. $\beta > 1$ weights Completeness more heavily; $\beta < 1$ weights Homogeneity more heavily.
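The effect of Beta can be seen with a few numbers. The sketch below assumes the weighted form $v = \frac{(1 + \beta) \cdot h \cdot c}{\beta \cdot h + c}$, which is the convention scikit-learn documents for `v_measure_score`; the helper name `weighted_v_measure` and the input scores are made up for illustration.

```python
def weighted_v_measure(homogeneity, completeness, beta=1.0):
    """Weighted harmonic mean of homogeneity (h) and completeness (c)."""
    if homogeneity + completeness == 0:
        return 0.0
    return (1 + beta) * homogeneity * completeness / (beta * homogeneity + completeness)

h, c = 1.0, 0.5  # pure clusters, but classes split across clusters
balanced = weighted_v_measure(h, c)                 # beta = 1: plain harmonic mean
favour_completeness = weighted_v_measure(h, c, beta=2.0)
favour_homogeneity = weighted_v_measure(h, c, beta=0.5)
print(favour_completeness, balanced, favour_homogeneity)
```

With completeness as the weaker score, raising Beta pulls the result down toward it, and lowering Beta pulls the result up toward homogeneity.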
31. Which of the following is NOT a pair-counting based metric?
A.Rand Index
B.Adjusted Rand Index
C.Fowlkes-Mallows Index
D.Normalized Mutual Information
Correct Answer: Normalized Mutual Information
Explanation:NMI is an information-theoretic metric based on entropy, whereas RI, ARI, and FMI are based on counting pairs of points (TP, FP, etc.).
32. For a perfect clustering where predicted clusters exactly match the ground truth classes, the Normalized Mutual Information (NMI) score is:
A.0.0
B.0.5
C.1.0
D.Variable depending on dataset size
Correct Answer: 1.0
Explanation:NMI is normalized such that a perfect match yields a score of 1.0.
33. When computing the Silhouette Score for an entire dataset, one typically takes:
A.The maximum score of any point
B.The minimum score of any point
C.The average of the scores for all samples
D.The median of the scores
Correct Answer: The average of the scores for all samples
Explanation:The overall Silhouette Score for a clustering configuration is the mean of the silhouette coefficients for each sample.
34. The Rand Index (RI) is the percentage of:
A.Correct classifications
B.Pairs of data points for which the two clusterings agree
C.Clusters that are pure
D.Information shared
Correct Answer: Pairs of data points for which the two clusterings agree
Explanation:RI measures the fraction of pairs of elements that are either in the same cluster in both partitions or in different clusters in both partitions (agreement).
35. Which metric would be most appropriate if the ground truth labels are not available?
A.Adjusted Rand Index
B.Fowlkes-Mallows Index
C.Silhouette Score
D.V-measure
Correct Answer: Silhouette Score
Explanation:Silhouette Score is an internal metric and does not require ground truth labels.
36. A higher Dunn Index indicates:
A.High intra-cluster distance and low inter-cluster distance
B.Low intra-cluster distance and high inter-cluster distance
C.Random clustering
D.Overlapping clusters
Correct Answer: Low intra-cluster distance and high inter-cluster distance
Explanation: The Dunn Index rewards large separation (inter-cluster distance) and small cluster diameters (intra-cluster distance). Therefore, compact and well-separated clusters yield a higher index.
37. Which metric is sensitive to the permutation of cluster labels?
A.None of the standard clustering metrics
B.Accuracy (if used naively)
C.Adjusted Rand Index
D.Silhouette Score
Correct Answer: Accuracy (if used naively)
Explanation:Standard clustering metrics like ARI, NMI, and V-measure are permutation invariant. Simple accuracy is not, because cluster '0' might correspond to class '1'.
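The difference shows up on a four-point example. In this sketch the plain Rand Index stands in for the permutation-invariant family (ARI, NMI and V-measure behave the same way on this input); both function names are hypothetical helpers written for the illustration.

```python
from itertools import combinations

def naive_accuracy(truth, pred):
    """Fraction of positions where the label values are literally equal."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)

def rand_index(truth, pred):
    """Fraction of point pairs on which the two labelings agree."""
    pairs = list(combinations(range(len(truth)), 2))
    agree = sum((truth[i] == truth[j]) == (pred[i] == pred[j]) for i, j in pairs)
    return agree / len(pairs)

truth = [0, 0, 1, 1]
pred = [1, 1, 0, 0]  # identical grouping, but the cluster ids are swapped

accuracy = naive_accuracy(truth, pred)
agreement = rand_index(truth, pred)
print(accuracy, agreement)
```

Naive accuracy scores the swapped labeling as entirely wrong, while the pair-based index sees a perfect match because it only asks whether two points share a cluster.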
38. In the context of Homogeneity and Completeness, if the ground truth consists of a single class, and the clustering algorithm finds 5 clusters:
A.Homogeneity is 1, Completeness is 0
B.Homogeneity is 0, Completeness is 1
C.Homogeneity is 1, Completeness is < 1
D.Both are 1
Correct Answer: Homogeneity is 1, Completeness is < 1
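Explanation: Homogeneity is 1 because every cluster contains only members of the single class (mixed clusters are impossible when there is only one class). Completeness is below 1 because that class is split across 5 clusters; under the entropy-based definition used by scikit-learn it is in fact 0 in this case, since knowing the class says nothing about which cluster a point falls in.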
39. The Adjusted Mutual Information (AMI) is preferred over NMI when:
A.The clusters are very large
B.The number of clusters is small
C.The cluster sizes are unbalanced and small samples are used
D.Computation time is critical
Correct Answer: The cluster sizes are unbalanced and small samples are used
Explanation:AMI corrects for the bias of MI/NMI towards solutions with more clusters or in scenarios with small sample sizes where random partitions share information by chance.
40. Which component of the Silhouette formula corresponds to 'separation'?
A.a (intra-cluster distance)
B.b (nearest-cluster distance)
C.max(a, b)
D.b - a
Correct Answer: b (nearest-cluster distance)
Explanation:'b' measures the distance to the nearest cluster that the point is not a part of, representing separation.
41. What is the theoretical minimum of the Adjusted Rand Index?
A.0
B.-1
C.-0.5
D.It depends on the number of samples
Correct Answer: -1
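Explanation: While ARI is usually positive, its range is commonly quoted as [-1, 1]. Negative values indicate agreement less than that expected by chance (anti-correlation). The value -1 is a loose bound: scikit-learn's documentation states that the score is bounded below by -0.5 for especially discordant clusterings.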
42. Which of the following is a drawback of External Validation metrics like ARI and NMI?
A.They are computationally expensive
B.They require a labeled dataset
C.They cannot handle outliers
D.They are not normalized
Correct Answer: They require a labeled dataset
Explanation: The main practical limitation is that they require ground truth labels, which are often unavailable in real-world unsupervised learning tasks.
43. In the Fowlkes-Mallows Index formula $FMI = \sqrt{PPV \times TPR}$, what is PPV?
A.Precision
B.Recall
C.Accuracy
D.Entropy
Correct Answer: Precision
Explanation:PPV stands for Positive Predictive Value, which is synonymous with Precision in this context.
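A pair-counting sketch shows where PPV enters the calculation. It assumes the usual pairwise definitions, with TP as the pairs that share both a class and a cluster, PPV = TP / (TP + FP) and TPR = TP / (TP + FN); the function name `fowlkes_mallows` and the labels are invented for this example.

```python
from collections import Counter
from math import comb, sqrt

def fowlkes_mallows(classes, clusters):
    """Geometric mean of pairwise precision (PPV) and pairwise recall (TPR)."""
    # TP: pairs of points in the same class and the same cluster.
    tp = sum(comb(c, 2) for c in Counter(zip(classes, clusters)).values())
    # TP + FP: all pairs that share a cluster.
    same_cluster = sum(comb(c, 2) for c in Counter(clusters).values())
    # TP + FN: all pairs that share a class.
    same_class = sum(comb(c, 2) for c in Counter(classes).values())
    if tp == 0:
        return 0.0
    ppv = tp / same_cluster   # pairwise precision
    tpr = tp / same_class     # pairwise recall
    return sqrt(ppv * tpr)

classes = [0, 0, 0, 1, 1, 1]
clusters = [0, 0, 1, 1, 1, 1]  # one class-0 point ends up in the other cluster
fmi = fowlkes_mallows(classes, clusters)
perfect = fowlkes_mallows([0, 0, 1, 1], [1, 1, 0, 0])
print(fmi, perfect)
```

Here TP = 4, with 7 same-cluster pairs and 6 same-class pairs, so PPV = 4/7 and TPR = 4/6.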
44. Which metric essentially measures the similarity between two partitionings of the data?
A.Silhouette Score
B.Davies-Bouldin Index
C.Adjusted Rand Index
D.Dunn Index
Correct Answer: Adjusted Rand Index
Explanation:ARI measures the similarity between the clustering assignment and the ground truth assignment.
45. If you calculate the Silhouette Score for a dataset with only one cluster, the result is typically defined as:
A.0
B.1
C.-1
D.Undefined or Error
Correct Answer: Undefined or Error
Explanation: The Silhouette Score requires at least two clusters to calculate the nearest-cluster distance 'b'. In scikit-learn, `silhouette_score` raises a `ValueError` unless the number of distinct labels is between 2 and n_samples - 1.
46. The V-measure is to Homogeneity and Completeness as the F1-Score is to:
A.Accuracy and Error
B.Precision and Recall
C.Sensitivity and Specificity
D.TPR and FPR
Correct Answer: Precision and Recall
Explanation:V-measure is the harmonic mean of Homogeneity and Completeness, exactly analogous to F1 being the harmonic mean of Precision and Recall.
47. Which clustering metric uses the 'max-min' logic (maximize the minimum distance between clusters)?
A.Dunn Index
B.Davies-Bouldin Index
C.Entropy
D.F-measure
Correct Answer: Dunn Index
Explanation:The Dunn Index maximizes the ratio involving the minimum inter-cluster distance.
48. Why is the Rand Index (unadjusted) often considered optimistic?
A.It ignores False Positives
B.It does not correct for the agreement that occurs by chance
C.It ranges from 0 to infinity
D.It favors small clusters
Correct Answer: It does not correct for the agreement that occurs by chance
Explanation:The standard Rand Index will yield a positive non-zero value even for random labelings, especially as the number of clusters increases, making it 'optimistic' compared to ARI.
49. A Completeness score of 1.0 implies:
A.All points in a cluster belong to the same class
B.All points of a specific class are assigned to the same cluster
C.The number of clusters equals the number of classes
D.The clusters are perfectly spherical
Correct Answer: All points of a specific class are assigned to the same cluster
Explanation: This is the definition of Completeness. No class has its members split across different clusters.
50. Which of the following metrics calculates the average similarity between each cluster and its most similar one?
A.Silhouette Score
B.Davies-Bouldin Index
C.Dunn Index
D.Calinski-Harabasz Index
Correct Answer: Davies-Bouldin Index
Explanation:The Davies-Bouldin index averages the worst-case similarity (max ratio of dispersion to separation) for each cluster with its neighbors.