Unit 3 - Notes
INT423
Unit 3: Clustering Metrics
Clustering evaluation is critical in determining the quality of the clusters generated by an algorithm. Unlike supervised learning, clustering often lacks "ground truth" labels. Therefore, metrics are divided into two primary categories:
- Internal Evaluation: Used when true labels are unknown (Unsupervised). Evaluates based on cluster cohesion and separation.
- External Evaluation: Used when true class labels are known (Supervised). Evaluates how well the clustering matches the pre-existing structure.
Part 1: Internal Evaluation Metrics
Used when ground truth labels are not available.
1. Silhouette Score
The Silhouette Score measures how similar an object is to its own cluster (cohesion) compared to other clusters (separation).
- Mechanism: For each point $i$:
  - Calculate $a(i)$: the average distance between $i$ and all other points in the same cluster (intra-cluster distance).
  - Calculate $b(i)$: the average distance between $i$ and all points in the nearest neighboring cluster (inter-cluster distance).
  - The Silhouette Coefficient is:
    $$s(i) = \frac{b(i) - a(i)}{\max(a(i),\, b(i))}$$
- Interpretation:
  - Range: $[-1, 1]$
  - +1: The point is perfectly clustered (far from neighboring clusters, close to its own).
  - 0: The point is on or very close to the decision boundary between two clusters.
  - -1: The point has likely been assigned to the wrong cluster.
- Pros: Easy to interpret; visualizes cluster density and separation.
- Cons: Computationally expensive ($O(n^2)$); tends to score higher for convex (spherical) clusters and lower for density-based structures (like DBSCAN results).
Python Implementation:
```python
from sklearn.metrics import silhouette_score

# X is the dataset; labels are the predicted cluster assignments
score = silhouette_score(X, labels)
```
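Beyond scoring a single clustering, the average silhouette is commonly used to pick the number of clusters $k$. A minimal sketch (the synthetic `make_blobs` data and candidate range are illustrative assumptions, not part of these notes):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with 4 well-separated blobs (assumed for illustration).
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# Score each candidate k; the highest average silhouette suggests the best k.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On clean blob data the winning $k$ tends to match the true number of centers; on real data, the silhouette curve is usually flatter and the peak less decisive.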
2. Davies-Bouldin Index (DBI)
The Davies-Bouldin Index represents the average "similarity" between clusters, where similarity is a measure that compares the distance between clusters with the size of the clusters themselves.
- Mechanism:
  - $S_i$: Average distance between each point of cluster $i$ and the centroid of cluster $i$ (cluster diameter/dispersion).
  - $M_{ij}$: Distance between the centroids of clusters $i$ and $j$.
  - $R_{ij}$: Similarity ratio $\dfrac{S_i + S_j}{M_{ij}}$.
  - The index is the average of the maximum $R_{ij}$ for each cluster:
    $$DBI = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} R_{ij}$$
- Interpretation:
  - Range: $[0, \infty)$
  - Lower Score: Better clustering. A score of 0 indicates perfectly separated clusters with zero scatter.
  - Logic: We want low intra-cluster scatter ($S_i$) and high inter-cluster distance ($M_{ij}$).
- Pros: Simpler computation than the Silhouette Score.
- Cons: Limits the analysis to centroid-based distances; generally assumes spherical clusters.
Python Implementation:
```python
from sklearn.metrics import davies_bouldin_score

score = davies_bouldin_score(X, labels)
```
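A quick sanity check of the "lower is better" interpretation, on hand-made points (the data and labelings are illustrative, not from the notes):

```python
import numpy as np
from sklearn.metrics import davies_bouldin_score

# Two tight, well-separated groups of points.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]])

good = [0, 0, 0, 1, 1, 1]  # labels that match the geometry
bad = [0, 1, 0, 1, 0, 1]   # labels that ignore the geometry

print(davies_bouldin_score(X, good))  # low: compact clusters, far apart
print(davies_bouldin_score(X, bad))   # high: dispersed, overlapping clusters
```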
3. Dunn Index
The Dunn Index aims to identify dense and well-separated clusters. It is defined as the ratio of the minimal inter-cluster distance to the maximal intra-cluster distance.
- Formula:
  $$D = \frac{\min_{i \neq j} d(C_i, C_j)}{\max_{k} \operatorname{diam}(C_k)}$$
  - Numerator: Minimum distance between any two different clusters (separation).
  - Denominator: Maximum diameter of any cluster (compactness).
- Interpretation:
  - Range: $[0, \infty)$
  - Higher Score: Better clustering. It implies clusters are compact (small denominator) and far apart (large numerator).
- Pros: Does not rely on centroids/means.
- Cons: Highly sensitive to noise (outliers can artificially increase the diameter or decrease the separation); computationally expensive.
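scikit-learn does not ship a Dunn Index function, so here is a minimal NumPy/SciPy sketch following the definition above (the function name `dunn_index` and the toy data are ours):

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

def dunn_index(X, labels):
    """Min inter-cluster distance divided by max intra-cluster diameter."""
    X, labels = np.asarray(X), np.asarray(labels)
    clusters = [X[labels == c] for c in np.unique(labels)]
    # Denominator: largest pairwise distance within any single cluster.
    max_diameter = max(pdist(c).max() for c in clusters if len(c) > 1)
    # Numerator: smallest distance between points of two different clusters.
    min_separation = min(
        cdist(clusters[i], clusters[j]).min()
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    return min_separation / max_diameter

X = np.array([[0, 0], [0, 1], [10, 10], [10, 11]])
print(dunn_index(X, [0, 0, 1, 1]))  # large: tight clusters, far apart
```

Note how the single max/min structure makes the index fragile: one outlier in any cluster changes the denominator for the whole score.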
Part 2: External Evaluation Metrics
Used when ground truth labels are available.
4. Adjusted Rand Index (ARI)
The Rand Index (RI) computes a similarity measure between two clusterings by considering all pairs of samples and counting pairs that are assigned in the same or different clusters in the predicted and true clusterings. The ARI "adjusts" the RI for chance.
- Components:
  - $a$ (TP): Number of pairs of elements that are in the same set in the True labels and in the same set in the Predicted labels.
  - $b$ (TN): Pairs in different sets in True and in different sets in Predicted.
  - RI: $RI = \dfrac{a + b}{\binom{n}{2}}$
  - ARI: $ARI = \dfrac{RI - E[RI]}{\max(RI) - E[RI]}$, adjusted so that a random clustering gets a score of 0.
- Interpretation:
  - Range: $[-1, 1]$
  - +1: Perfect labeling.
  - 0: Random labeling (independent of the number of clusters).
  - Negative: Worse than random labeling.
- Pros: Interpretable; works well when the number of clusters is large; symmetric.
Python Implementation:
```python
from sklearn.metrics import adjusted_rand_score

score = adjusted_rand_score(labels_true, labels_pred)
```
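To see the chance adjustment in action, compare ARI with the unadjusted Rand Index on two independent random labelings (illustrative; `rand_score` is available in scikit-learn >= 0.24):

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, rand_score

rng = np.random.default_rng(0)
labels_true = rng.integers(0, 3, size=1000)
labels_pred = rng.integers(0, 3, size=1000)  # independent of labels_true

ari = adjusted_rand_score(labels_true, labels_pred)
ri = rand_score(labels_true, labels_pred)
print(round(ari, 3), round(ri, 3))  # ARI near 0; raw RI well above 0
```

The raw RI stays high for random labels because most pairs land in "different clusters" in both labelings by chance; the adjustment subtracts exactly that baseline.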
5. Normalized Mutual Information (NMI)
Based on the concepts of Entropy ($H$) and Mutual Information ($MI$). MI measures the reduction in uncertainty about one clustering given knowledge of the other.
- Formula:
  $$NMI(U, V) = \frac{MI(U, V)}{\operatorname{mean}(H(U), H(V))}$$
  Where $MI$ is Mutual Information and $H$ is Entropy (scikit-learn's default normalizer is the arithmetic mean of the two entropies).
- Interpretation:
  - Range: $[0, 1]$
  - 1: Perfect correlation between true labels and predicted clusters.
  - 0: No mutual information (independent labelings).
- Pros: Independent of the absolute values of the labels (only the structure matters).
- Cons: Not adjusted for chance (unlike AMI), meaning it tends to increase as the number of clusters increases, even for random data.
Python Implementation:
```python
from sklearn.metrics import normalized_mutual_info_score

score = normalized_mutual_info_score(labels_true, labels_pred)
```
6. Homogeneity, 7. Completeness, and 8. V-measure
These three metrics are derived from conditional entropy analysis and are often analyzed together.
6. Homogeneity
- Definition: A clustering result satisfies homogeneity if all of its clusters contain only data points which are members of a single class.
- Concept: "Each cluster should be pure."
- Formula: $h = 1 - \dfrac{H(C \mid K)}{H(C)}$
7. Completeness
- Definition: A clustering result satisfies completeness if all the data points that are members of a given class are assigned to the same cluster.
- Concept: "All members of class A should be in cluster 1."
- Formula: $c = 1 - \dfrac{H(K \mid C)}{H(K)}$
8. V-measure
- Definition: The harmonic mean of Homogeneity and Completeness. It is analogous to the F1-score in classification, but for clustering.
- Formula:
  $$V = \frac{2 \cdot h \cdot c}{h + c}$$
  (Note: Weighted variants exist, denoted $V_\beta$, to prioritize homogeneity or completeness.)
- Interpretation: All three metrics range over $[0, 1]$; 1 is perfect.
- Pros: Intuitive breakdown of error types (is the cluster mixed, or is the class split?).
- Cons: Like NMI, V-measure is not adjusted for chance.
Python Implementation:
```python
from sklearn.metrics import homogeneity_completeness_v_measure

h, c, v = homogeneity_completeness_v_measure(labels_true, labels_pred)
```
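A small example of the two failure modes, over-splitting versus over-merging, using hand-made labels (illustrative, not from the notes):

```python
from sklearn.metrics import homogeneity_completeness_v_measure

labels_true = [0, 0, 0, 0, 1, 1, 1, 1]

# Over-splitting: class 0 is spread across clusters 0 and 1. Every
# cluster is still pure (h = 1), but completeness suffers (c < 1).
over_split = [0, 0, 1, 1, 2, 2, 2, 2]
h, c, v = homogeneity_completeness_v_measure(labels_true, over_split)

# Over-merging: everything lands in one cluster. Each class stays
# together (c = 1), but the single cluster is mixed (h = 0).
merged = [0, 0, 0, 0, 0, 0, 0, 0]
h2, c2, v2 = homogeneity_completeness_v_measure(labels_true, merged)

print(h, c)    # 1.0, below 1.0
print(h2, c2)  # 0.0, 1.0
```

This asymmetry is exactly why the two scores are reported together: either one alone can be gamed by a degenerate clustering.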
9. Fowlkes-Mallows Index (FMI)
The Fowlkes-Mallows score is defined as the geometric mean of the pairwise precision and recall.
- Mechanism:
  - TP (True Positive): Pairs of points that belong to the same cluster in both the True and Predicted labels.
  - FP (False Positive): Pairs in the same Predicted cluster but different True clusters.
  - FN (False Negative): Pairs in different Predicted clusters but the same True cluster.
- Formula:
  $$FMI = \frac{TP}{\sqrt{(TP + FP)(TP + FN)}}$$
- Interpretation:
  - Range: $[0, 1]$
  - 1: Perfect agreement.
  - 0: Purely random assignment.
- Pros: Very effective when class sizes are unbalanced.
Python Implementation:
```python
from sklearn.metrics import fowlkes_mallows_score

score = fowlkes_mallows_score(labels_true, labels_pred)
```
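The pair counts above can be computed by brute force and checked against scikit-learn (the two labelings are illustrative):

```python
from itertools import combinations
from sklearn.metrics import fowlkes_mallows_score

labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 1, 1]

# Classify every pair of points by whether they share a cluster
# in each labeling.
tp = fp = fn = 0
for i, j in combinations(range(len(labels_true)), 2):
    same_true = labels_true[i] == labels_true[j]
    same_pred = labels_pred[i] == labels_pred[j]
    if same_pred and same_true:
        tp += 1
    elif same_pred:
        fp += 1
    elif same_true:
        fn += 1

fmi_manual = tp / ((tp + fp) * (tp + fn)) ** 0.5
print(fmi_manual, fowlkes_mallows_score(labels_true, labels_pred))
```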
10. Adjusted Mutual Information (AMI)
AMI is an adjustment of the Mutual Information (MI) score to account for chance. It is similar to ARI but based on information theory rather than pair counting.
- Why Adjustment is Needed: Standard NMI tends to favor clusterings with a higher number of clusters. If you have as many clusters as data points, NMI can reach 1.0 (depending on the normalizer) even though the clustering is useless. AMI corrects this bias.
- Formula:
  $$AMI(U, V) = \frac{MI(U, V) - E[MI(U, V)]}{\operatorname{mean}(H(U), H(V)) - E[MI(U, V)]}$$
  Where $E[MI]$ is the expected mutual information under a random permutation model.
Interpretation:
- Range: (Though typically 0 to 1).
- 1: Perfect match.
- 0: Random clustering.
- Pros: Best suited for comparing clustering algorithms with different numbers of clusters; robust against cluster size imbalance.
- Cons: Computationally more intensive than NMI.
Python Implementation:
```python
from sklearn.metrics import adjusted_mutual_info_score

score = adjusted_mutual_info_score(labels_true, labels_pred)
```
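The singleton-cluster pathology described above, made concrete (illustrative; we pass `average_method="min"` so that NMI hits exactly 1.0 here, while scikit-learn's default arithmetic mean would still be substantially inflated):

```python
from sklearn.metrics import (adjusted_mutual_info_score,
                             normalized_mutual_info_score)

labels_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
singletons = list(range(9))  # every point in its own cluster: useless

nmi = normalized_mutual_info_score(labels_true, singletons,
                                   average_method="min")
ami = adjusted_mutual_info_score(labels_true, singletons)
print(nmi, ami)  # NMI = 1.0 (misleading); AMI near 0
```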
Summary Comparison Table
| Metric | Type | Ground Truth Req? | Range | Best for... |
|---|---|---|---|---|
| Silhouette | Internal | No | -1 to 1 | Determining optimal $k$; spherical clusters. |
| Davies-Bouldin | Internal | No | 0 to $\infty$ | Fast evaluation; centroid-based clusters. |
| Dunn | Internal | No | 0 to $\infty$ | Detecting compact, well-separated clusters. |
| ARI | External | Yes | -1 to 1 | General purpose; comparing algorithms. |
| NMI | External | Yes | 0 to 1 | Text clustering; entropy-based analysis. |
| V-Measure | External | Yes | 0 to 1 | Analyzing homogeneity vs completeness balance. |
| Fowlkes-Mallows | External | Yes | 0 to 1 | Unbalanced class sizes. |
| AMI | External | Yes | 0 to 1 | Comparing solutions with different $k$. |