1 $What is the primary goal of clustering in unsupervised learning?$

Clustering fundamentals and assumptions Easy

A.

To predict a continuous target variable

B.

To classify data into pre-labeled categories

C.

To group similar data points together

D.

To reduce the dimensionality of the data

2 $Which of the following is a core assumption of partition-based clustering algorithms like k-Means?$

Clustering fundamentals and assumptions Easy

A.

Each cluster can be represented by a central point or centroid.

B.

The data must have a nested, hierarchical tree structure.

C.

The algorithm requires a completely labeled training dataset.

D.

Clusters must overlap completely to be valid.

3 $In hard clustering, how are data points assigned to clusters?$

Hard vs. Soft clustering Easy

A.

Each data point belongs to all clusters equally.

B.

Each data point belongs to exactly one cluster.

C.

Data points are not assigned to any clusters.

D.

Each data point has a probability of belonging to multiple clusters.

4 $Which of the following is an example of a soft clustering algorithm?$

Hard vs. Soft clustering Easy

A.

k-Medoids

B.

DBSCAN

C.

Standard k-Means

D.

Fuzzy C-Means

5 $What does the k-Means objective function aim to minimize?$

k-Means algorithm: Objective function Easy

A.

The distance between the different cluster centroids

B.

Within-Cluster Sum of Squares (WCSS)

C.

The total number of clusters

D.

Between-Cluster Sum of Squares (BCSS)

6 $In the k-Means algorithm, how is a cluster's centroid updated during the iteration process?$

k-Means algorithm: Objective function Easy

A.

By calculating the mean of all data points currently assigned to that cluster

B.

By selecting the point farthest away from the current centroid

C.

By finding the median of the entire dataset

D.

By picking a random data point from the cluster
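
For reference alongside Questions 5 and 6, the two alternating steps of k-Means can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the function names (`assign`, `update`, `wcss`, `lloyd`), the random choice of initial centroids, the rule for empty clusters, and the toy data are all invented for this example.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Assignment step: index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]

def update(points, labels, centroids):
    """Update step: each centroid becomes the mean of its assigned points."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new_centroids.append(old)  # assumption: an empty cluster keeps its centroid
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centroids

def wcss(points, labels, centroids):
    """Within-Cluster Sum of Squares for the current assignment."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))

def lloyd(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # plain random initialization
    labels = assign(points, centroids)
    history = []
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        history.append(wcss(points, new_labels, centroids))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels, history

# Toy data (illustrative): two small, well-separated groups
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels, history = lloyd(points, k=2)
```

The `history` list records the WCSS after each pass, which makes it easy to check that the value never increases from one iteration to the next.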

7 $What is a major disadvantage of using basic random initialization in k-Means?$

Initialization strategies (Random, k-Means++) Easy

A.

It can converge to poor, sub-optimal local minima.

B.

It always places centroids outside the bounds of the dataset.

C.

It guarantees finding the global minimum every time.

D.

It requires calculating the distance between all pairs of points first.

8 $How does the k-Means++ initialization strategy improve upon standard random initialization?$

Initialization strategies (Random, k-Means++) Easy

A.

It selects initial centroids that are probabilistically farther away from each other.

B.

It selects all centroids at random from the very center of the dataset.

C.

It starts with $k = 1$ and increments $k$ until the WCSS reaches zero.

D.

It assigns initial centroids based on user-provided class labels.

9 $When does the standard k-Means algorithm typically stop iterating?$

Convergence and limitations Easy

A.

When the cluster assignments of the data points no longer change

B.

When the centroids reach the origin point

C.

After exactly 10 iterations

D.

When all data points merge into a single large cluster

10 $Which of the following is a known limitation of the k-Means algorithm?$

Convergence and limitations Easy

A.

It performs extremely slowly on very small datasets.

B.

It can only handle categorical string data.

C.

It requires a labeled training dataset to function.

D.

It struggles to correctly identify non-spherical or complexly shaped clusters.

11 $What is a "medoid" in the context of the k-Medoids algorithm?$

k-Medoids (PAM) vs. k-Means Easy

A.

An actual data point from the dataset that acts as the center of the cluster

B.

The calculated mathematical average of all points in the cluster

C.

The boundary line dividing two adjacent clusters

D.

A randomly generated coordinate located outside the dataset

12 $Why might k-Medoids be preferred over k-Means in some applications?$

k-Medoids (PAM) vs. k-Means Easy

A.

k-Medoids calculates the arithmetic mean instead of using data points.

B.

k-Medoids is always computationally faster than k-Means.

C.

k-Medoids can automatically determine the optimal number of clusters.

D.

k-Medoids is more robust to outliers and noise.

13 $Why is it important to standardize or scale data features before applying k-Means?$

Data standardization and scaling impact Easy

A.

Because standardizing data automatically sets the correct number of clusters $k$.

B.

Because unscaled data causes the algorithm to encounter divide-by-zero errors.

C.

Because k-Means relies on distance metrics (like Euclidean) which are highly sensitive to the scale of features.

D.

Because k-Means can only accept numerical values strictly between 0 and 1.

14 $What is the primary advantage of using MiniBatch k-Means over standard k-Means?$

MiniBatch k-Means for large-scale datasets Easy

A.

It is completely immune to the effects of outliers.

B.

It significantly reduces computation time for very large datasets.

C.

It guarantees a lower WCSS than standard k-Means.

D.

It does not require the user to specify the number of clusters $k$.

15 $How does MiniBatch k-Means achieve faster execution times?$

MiniBatch k-Means for large-scale datasets Easy

A.

By completely ignoring the distance calculations between points

B.

By running standard k-Means strictly on a single CPU core

C.

By only processing the first 100 rows of any dataset

D.

By updating centroids using small, random subsets of data at each iteration

16 $In cluster validation, what exactly does Inertia (WCSS) measure?$

Cluster Validation: Inertia (WCSS) Easy

A.

The amount of time the algorithm takes to converge

B.

The total number of data points inside the largest cluster

C.

The sum of squared distances from each data point to its assigned cluster centroid

D.

The squared distance between the centroids of different clusters

17 $What happens to the Inertia (WCSS) metric as the number of clusters increases towards the total number of data points?$

Cluster Validation: Inertia (WCSS) Easy

A.

It becomes a negative number.

B.

It increases towards infinity.

C.

It decreases towards zero.

D.

It remains completely constant.

18 $What is the possible range of values for a Silhouette Coefficient?$

Silhouette Coefficient Easy

A.

to

B.

$0$ to

C.

to

D.

$0$ to $100$

19 $When evaluating cluster quality using the Davies-Bouldin Index, what does a lower score indicate?$

Davies–Bouldin Index Easy

A.

Better clustering with well-separated and dense clusters

B.

That the algorithm failed to converge

C.

Worse clustering with highly overlapping clusters

D.

That the optimal number of clusters is zero

20 $What is a common pitfall of relying purely on the Elbow method to choose the number of clusters k?$

Elbow method pitfalls Easy

A.

It requires a fully labeled dataset to plot the curve.

B.

It always universally suggests choosing .

C.

The "elbow" point is often visually ambiguous and not clearly defined.

D.

It can only be used for hierarchical clustering, not partition-based.

21 $Which of the following best describes an underlying geometric assumption made by the standard k-Means algorithm?$

Clustering fundamentals and assumptions Medium

A.

Clusters are connected components defined strictly by a dense region of points.

B.

Clusters are convex, isotropic (spherical), and have roughly similar variance.

C.

Clusters are arbitrary in shape and can be nested within one another.

D.

Clusters have exactly the same number of data points.

22 $In the context of clustering algorithms, what is the primary distinction between Hard and Soft clustering?$

Hard vs. Soft clustering Medium

A.

Hard clustering requires pre-defining the number of clusters, while soft clustering determines it automatically.

B.

Hard clustering assigns each data point exclusively to one cluster, while soft clustering assigns a probability or membership weight of a point belonging to each cluster.

C.

Hard clustering algorithms are robust to outliers, while soft clustering algorithms are highly sensitive to them.

D.

Hard clustering uses distance metrics like Euclidean distance, whereas soft clustering only uses probabilistic distributions.

23 $The objective function of k-Means minimizes the Within-Cluster Sum of Squares (WCSS). How does the algorithm guarantee convergence?$

k-Means algorithm: Objective function Medium

A.

By utilizing a learning rate that decays to zero over time, ensuring stable centroids.

B.

Because both the assignment step and the centroid update step are guaranteed to monotonically decrease or maintain the WCSS.

C.

By strictly evaluating all possible partitions and selecting the global minimum.

D.

Because the WCSS is a strictly convex function with respect to the data points, guaranteeing a single global minimum.

24 $How does the k-Means++ initialization strategy select the next centroid after the first one is randomly chosen?$

Initialization strategies (Random, k-Means++) Medium

A.

By calculating the global mean of the dataset and selecting the point furthest from the mean.

B.

By choosing the point with the maximum Euclidean distance from the nearest existing centroid.

C.

By randomly sampling points with a uniform probability distribution.

D.

By selecting a point with a probability proportional to its squared distance from the nearest existing centroid.
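
As a companion to Question 24, the seeding rule can be sketched in a few lines of standard-library Python. The function name `kmeans_pp_init`, the `seed` parameter, and the sample points are assumptions for illustration; the sketch assumes the data contains at least `k` distinct points.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centroids using squared-distance-weighted sampling."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: uniform at random
    while len(centroids) < k:
        # Squared distance from every point to its nearest already-chosen centroid
        d2 = [min(squared_distance(p, c) for c in centroids) for p in points]
        # Next centroid: sampled with probability proportional to that squared distance
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

# Illustrative data: three loose groups on a plane
points = [(0.0, 0.0), (0.5, 0.2), (5.0, 5.0), (5.3, 4.8), (10.0, 0.0), (9.7, 0.4)]
initial = kmeans_pp_init(points, k=3)
```

An already-chosen point has weight zero, so it is not drawn again; distant points are favored but not guaranteed, which is what separates this rule from a deterministic farthest-point choice.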

25 $Which of the following datasets would standard k-Means most likely fail to cluster correctly?$

Convergence and limitations Medium

A.

A dataset containing two concentric circular clusters (a smaller circle inside a larger ring).

B.

A dataset of two distinct blobs with roughly equal variance.

C.

A dataset where clusters are perfectly linearly separable.

D.

A dataset consisting of three distant, equally sized spherical clusters.

26 $Why is k-Medoids (Partitioning Around Medoids) generally considered more robust to noise and outliers than k-Means?$

k-Medoids (PAM) vs. k-Means Medium

A.

k-Medoids automatically detects and removes outliers before clustering begins.

B.

k-Medoids restricts the cluster centers to be actual data points, preventing an outlier from easily dragging the center into empty space.

C.

k-Medoids uses a probabilistic assignment which down-weights the influence of outliers.

D.

k-Medoids minimizes the maximum variance within clusters rather than the sum of squared distances.

27 $If a dataset has two features, one measured in millimeters (ranging from 0 to 1000) and one measured in kilometers (ranging from 0 to 0.001), what is the likely outcome if k-Means is applied without standardizing the data?$

Data standardization and scaling impact Medium

A.

The algorithm will fail to converge because of the varying scales.

B.

The kilometer feature will dominate because kilometers are a physically larger unit.

C.

k-Means naturally adjusts for variance, so the results will be identical to standardized data.

D.

The millimeter feature will disproportionately dominate the distance calculations, making the kilometer feature almost irrelevant.
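
To make the scale effect in Question 27 concrete, the sketch below compares how much of a squared Euclidean distance comes from the first feature before and after z-score standardization. The helper names (`standardize`, `first_feature_share`) and the five sample values are assumptions chosen to match the ranges in the question.

```python
import statistics

def standardize(column):
    """Z-score one feature: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]

def first_feature_share(p, q):
    """Fraction of the squared Euclidean distance contributed by the first feature."""
    contributions = [(a - b) ** 2 for a, b in zip(p, q)]
    return contributions[0] / sum(contributions)

# Illustrative values: the same lengths recorded in two different units
millimeters = [0.0, 250.0, 500.0, 750.0, 1000.0]
kilometers = [0.0, 0.00025, 0.0005, 0.00075, 0.001]

raw = list(zip(millimeters, kilometers))
scaled = list(zip(standardize(millimeters), standardize(kilometers)))

raw_share = first_feature_share(raw[0], raw[-1])           # close to 1: millimeters dominate
scaled_share = first_feature_share(scaled[0], scaled[-1])  # close to 0.5: both features count
```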

28 $When comparing MiniBatch k-Means to standard k-Means, which of the following trade-offs is generally true?$

MiniBatch k-Means for large-scale datasets Medium

A.

MiniBatch k-Means offers significantly faster computation times at the cost of a slightly worse (higher) Inertia.

B.

MiniBatch k-Means is slower per iteration but converges in far fewer iterations.

C.

MiniBatch k-Means converges to the exact same global optimum as standard k-Means but requires more memory.

D.

MiniBatch k-Means produces a lower Inertia than standard k-Means but struggles with high-dimensional data.

29 $Why is Inertia (Within-Cluster Sum of Squares) alone an insufficient metric for determining the optimal number of clusters k?$

Cluster Validation: Inertia (WCSS) Medium

A.

Because Inertia always decreases or stays the same as $k$ increases, reaching zero when $k$ equals the number of data points.

B.

Because Inertia can only be computed for soft clustering algorithms.

C.

Because Inertia increases exponentially as $k$ increases, leading to a computational bottleneck.

D.

Because Inertia is completely insensitive to the scale of the data.

30 $The Silhouette Coefficient for a point i is defined as s(i) = (b(i) - a(i)) / max(a(i), b(i)). What do a(i) and b(i) represent?$

Cluster Validation: Silhouette Coefficient Medium

A.

$a(i)$ is the variance of the assigned cluster, and $b(i)$ is the variance of the closest neighboring cluster.

B.

$a(i)$ is the distance to the nearest cluster centroid, and $b(i)$ is the distance to the farthest cluster centroid.

C.

$a(i)$ is the maximum distance to any point in the same cluster, and $b(i)$ is the minimum distance to a point in another cluster.

D.

$a(i)$ is the mean intra-cluster distance, and $b(i)$ is the mean nearest-cluster distance for the point.
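
The per-point Silhouette Coefficient from Question 30 can be computed directly from its standard definition. The sketch below is a plain-Python illustration, not the scikit-learn implementation; the function name `silhouette`, the one-dimensional sample points, and the choice to return 0 for a single-point cluster are assumptions for this example. It also assumes at least two clusters and that the two mean distances are not both zero.

```python
import math

def silhouette(i, points, labels):
    """Silhouette Coefficient of point i under the given hard cluster labels."""
    own = labels[i]
    same = [
        math.dist(points[i], p)
        for j, p in enumerate(points)
        if labels[j] == own and j != i
    ]
    if not same:
        return 0.0  # assumed convention for a cluster with a single point
    a = sum(same) / len(same)  # mean distance to the other points in the same cluster
    b = min(                   # smallest mean distance to the points of any other cluster
        sum(math.dist(points[i], p) for j, p in enumerate(points) if labels[j] == other)
        / labels.count(other)
        for other in set(labels)
        if other != own
    )
    return (b - a) / max(a, b)

# Illustrative one-dimensional data with two clear groups
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
score = silhouette(0, points, labels)  # a = 1, b = 10.5, so (10.5 - 1) / 10.5
```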

31 $When evaluating clusters using the Davies-Bouldin (DB) Index, which of the following indicates a better clustering partition?$

Davies–Bouldin Index Medium

A.

A lower DB Index, as it signifies low intra-cluster distances and high inter-cluster separation.

B.

A higher DB Index, as it indicates maximum inter-cluster separation.

C.

A DB Index exactly equal to 1, indicating perfectly spherical clusters.

D.

A DB Index close to the total number of clusters $k$.

32 $A data scientist plots the Inertia against the number of clusters to use the Elbow method. However, the curve descends smoothly without a distinct 'elbow' or bend. What is the most reasonable conclusion?$

Elbow method pitfalls Medium

A.

The dataset contains perfectly spherical clusters that are well separated.

B.

The data requires standardizing, as the lack of an elbow indicates scale disparity.

C.

The k-Means algorithm failed to converge at any value of $k$.

D.

The data lacks distinct, well-separated cluster structures, or clusters heavily overlap.

33 $Which problem associated with purely random initialization does k-Means++ explicitly aim to solve?$

Initialization strategies (Random, k-Means++) Medium

A.

Random initialization might place initial centroids entirely outside the bounding box of the dataset.

B.

Random initialization scales poorly with the number of features, leading to complexity.

C.

Random initialization can lead to centroids being initialized in the same cluster, causing poor local optima.

D.

Random initialization causes the algorithm to compute distances using Manhattan metric rather than Euclidean.

34 $What is the primary computational disadvantage of the standard Partitioning Around Medoids (PAM) algorithm compared to k-Means?$

k-Medoids (PAM) vs. k-Means Medium

A.

PAM must invert a covariance matrix at each step.

B.

PAM has a higher time complexity per iteration, typically $O(k(n-k)^2)$ for $n$ points and $k$ clusters, making it inefficient for large datasets.

C.

PAM requires the exact number of clusters to be re-evaluated continuously during iterations.

D.

PAM relies on gradient descent, requiring extensive hyperparameter tuning for learning rates.

35 $If a data point has a Silhouette Coefficient of roughly 0, what does this indicate about its placement?$

Cluster Validation: Silhouette Coefficient Medium

A.

The point has been assigned to a cluster by mistake and drastically increases Inertia.

B.

The point is located near the core (center) of its assigned cluster.

C.

The point is located exactly on or near the decision boundary between two neighboring clusters.

D.

The point is an extreme outlier and belongs to no cluster.

36 $Is standard Lloyd's algorithm (k-Means) guaranteed to find the absolute lowest possible WCSS (global optimum)?$

Convergence and limitations Medium

A.

No, it rarely converges and often loops infinitely between two identical states.

B.

No, it is only guaranteed to converge to a local optimum, which is why multiple random restarts are used.

C.

Yes, because WCSS is a strictly convex function globally.

D.

Yes, provided that k-Means++ initialization is used.

37 $During the update step in MiniBatch k-Means, how are the cluster centroids updated?$

MiniBatch k-Means for large-scale datasets Medium

A.

By calculating the exact overall mean of the entire dataset at the end of each batch.

B.

By moving the centroid directly to the single data point in the batch that minimizes the distance.

C.

By completely replacing the old centroid with the mean of the points in the current batch.

D.

By taking a convex combination (using a learning rate) of the old centroid and the mean of the newly assigned points in the batch.

38 $Which of the following scenarios best justifies the use of Soft Clustering over Hard Clustering?$

Hard vs. Soft clustering Medium

A.

When boundaries between clusters are ambiguous and documents/points may exhibit traits of multiple clusters simultaneously.

B.

When the dataset is highly sparse and contains only binary categorical variables.

C.

When the number of clusters is completely unknown and must be derived automatically.

D.

When computing resources are strictly limited and the algorithm must run in time.

39 $If you multiply all features of a dataset by a scalar (where), how will the WCSS (Inertia) of the optimal k-Means clustering change compared to the original data?$

k-Means algorithm: Objective function Medium

A.

It will be divided by .

B.

It will remain unchanged because k-Means is scale-invariant.

C.

It will be multiplied by .

D.

It will be multiplied by .

40 $Which of the following is a common limitation of relying solely on the Elbow Method for determining k?$

Elbow method pitfalls Medium

A.

It requires computing a distance matrix of size $n \times n$, which is infeasible for large data.

B.

It can only be applied when using the k-Medoids algorithm, not k-Means.

C.

It always points to regardless of the dataset's underlying structure.

D.

The identification of the 'elbow' is often subjective, and different evaluators might choose different values of $k$.

41 $Which of the following best describes the structural limitation imposed on clusters by the implicit assumptions of standard Euclidean k-Means?$

Clustering fundamentals and assumptions Hard

A.

It assumes clusters are generated from anisotropic distributions with identical covariance matrices.

B.

It assumes clusters have uniform density throughout the feature space, making it robust to variations in cluster volume.

C.

It forces clusters to take a convex, isotropic spatial form, essentially modeling the data as identically sized hyper-spheres.

D.

It requires the underlying data to be linearly separable in a projected subspace, independently of the variance.

42 $Consider a dataset where Cluster A has 10,000 points and Cluster B has 100 points. Both are distinct and spherical. If standard k-Means (with k = 2) is applied, what is the most likely pathological outcome?$

Clustering fundamentals and assumptions Hard

A.

The centroid of Cluster B will be pulled aggressively toward Cluster A due to the gravitational pull of its higher density.

B.

The algorithm will fail to converge because the variance ratio violates the strict homoscedasticity assumption.

C.

The algorithm may split Cluster A into two clusters and absorb Cluster B into one of them, to minimize overall intra-cluster variance.

D.

The algorithm will perfectly separate Cluster A and Cluster B because they are both spherical.

43 $Gaussian Mixture Models (GMMs) perform soft clustering via the Expectation-Maximization (EM) algorithm. Under what specific mathematical condition does the EM algorithm for GMMs strictly reduce to the hard k-Means algorithm?$

Hard vs. Soft clustering Hard

A.

When the covariance matrices of all components are restricted to be $\epsilon I$, and we take the limit as $\epsilon \to 0$.

B.

When all covariance matrices are set to zero ($\Sigma_k = 0$).

C.

When the posterior probabilities are modeled as a uniform distribution across all clusters.

D.

When the mixing coefficients are fixed to $1/k$ and the covariance matrices are allowed to vary independently.

44 $The objective function of k-Means minimizes the Within-Cluster Sum of Squares (WCSS). Let TSS be the Total Sum of Squares and BCSS be the Between-Cluster Sum of Squares. Which identity proves that minimizing WCSS is mathematically equivalent to maximizing the separation between cluster centroids?$

k-Means algorithm: Objective function Hard

A.

B.

C.

D.

45 $Suppose we modify the standard k-Means objective function by adding a penalty term: J = WCSS + λ Σ_j ||μ_j||_1, where μ_j is the j-th centroid and λ > 0. What is the primary effect of the L1 penalty on the cluster centroids?$

k-Means algorithm: Objective function Hard

A.

It shrinks the cluster centroids exactly to the global mean of the dataset, regardless of $\lambda$.

B.

It makes the objective function strictly convex, guaranteeing a global optimum.

C.

It introduces sparsity in the centroid coordinates, effectively performing feature selection for the cluster centers.

D.

It forces the cluster centroids to be mutually orthogonal.

46 $In the k-Means++ initialization strategy, the next centroid is chosen from the remaining data points with a probability proportional to D(x)^2, where D(x) is the distance to the nearest existing centroid. If this probability were instead proportional to D(x) (not squared), what would be the most significant consequence?$

Initialization strategies (Random, k-Means++) Hard

A.

The initialization would have an increased likelihood of selecting outlier points as centroids.

B.

The algorithm would guarantee competitive bounds instead of bounds.

C.

The algorithm would provide weaker suppression of points near already-chosen centroids, increasing the risk of suboptimal local minima.

D.

The algorithm would degenerate into completely random initialization.

47 $Which of the following is an established theoretical guarantee provided by the k-Means++ initialization algorithm?$

Initialization strategies (Random, k-Means++) Hard

A.

It guarantees that the subsequent Lloyd's algorithm will converge in exactly one iteration.

B.

It ensures that no two initial centroids will ever share the same Voronoi cell boundaries.

C.

It yields an expected initial WCSS that is within an $O(\log k)$ factor of the optimal global WCSS.

D.

It guarantees finding the absolute global minimum of the k-Means objective function in iterations.

48 $Standard k-Means (Lloyd's algorithm) uses an iterative two-step process. Which of the following statements rigorously explains why k-Means is guaranteed to converge in a finite number of steps?$

Convergence and limitations Hard

A.

The state space is continuous, and the objective function is strongly convex, requiring the gradients to vanish at a unique global minimum.

B.

The distance metric satisfies the triangle inequality, which forces the centroids to move by monotonically decreasing amounts.

C.

The algorithm projects the data into a lower-dimensional simplex where the number of extreme points is strictly bounded by .

D.

The WCSS strictly decreases or stays constant at each step, and there are a finite number (at most $k^n$) of possible cluster assignments.

49 $In extremely high-dimensional spaces, standard k-Means often produces poorly defined clusters. Aside from the increased computational cost, what is the primary geometric reason for this failure?$

Convergence and limitations Hard

A.

The covariance matrices of the clusters become singular, preventing the calculation of the centroid.

B.

The optimization landscape becomes perfectly flat, meaning the gradient of the WCSS is zero everywhere.

C.

The L2 norm becomes non-subadditive in high dimensions, breaking the underlying metric space properties.

D.

The ratio of the variance of distances to the mean distance between points converges to zero, making all points seem equidistant.
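
The distance-concentration effect behind Question 49 can be observed numerically. The sketch below draws random point pairs from a unit hypercube and reports the standard deviation of their distances divided by the mean distance; this ratio is one common way to quantify the effect and is a choice made for this example, as are the function name `relative_spread`, the sample size, and the seed.

```python
import math
import random
import statistics

def relative_spread(dim, n_pairs=300, seed=0):
    """Standard deviation of pairwise distances divided by their mean."""
    rng = random.Random(seed)
    distances = []
    for _ in range(n_pairs):
        p = [rng.random() for _ in range(dim)]
        q = [rng.random() for _ in range(dim)]
        distances.append(math.dist(p, q))
    return statistics.pstdev(distances) / statistics.fmean(distances)

low_dim = relative_spread(2)      # distances vary a lot relative to their mean
high_dim = relative_spread(1000)  # distances are nearly all the same
```

As the dimension grows, the ratio shrinks, so the nearest and farthest centroids of a point become hard to tell apart.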

50 $Consider the Partitioning Around Medoids (PAM) algorithm. During the swap phase, what is the worst-case time complexity per iteration for a dataset of n points and k clusters, assuming a pre-computed distance matrix?$

k-Medoids (PAM) vs. k-Means Hard

A.

B.

C.

D.

51 $Which of the following describes the key theoretical advantage of k-Medoids over k-Means regarding the breakdown point when handling adversarial outliers?$

k-Medoids (PAM) vs. k-Means Hard

A.

k-Medoids is restricted to using actual data points as centers, preventing an extreme outlier from shifting a center to an arbitrary location in empty space.

B.

k-Medoids optimizes the L1 norm instead of the L2 norm, completely neutralizing the influence of outliers on the objective function.

C.

k-Medoids automatically drops clusters if their intra-cluster distance exceeds a predefined breakdown threshold.

D.

k-Medoids achieves a breakdown point of exactly 0.5 because it uses the median absolute deviation.

52 $A dataset has two features: (variance = 1000) and (variance = 1). If k-Means is applied without standardization, the cluster boundaries will predominantly be perpendicular to which axis, and why?$

Data standardization and scaling impact Hard

A.

Perpendicular to, because the centroids will align themselves along the axis of maximum variance.

B.

Perpendicular to, because the distance calculations are overwhelmingly dominated by the variance in .

C.

Diagonal to both, because k-Means intrinsically performs PCA rotation before assigning clusters.

D.

Perpendicular to, because lower variance features are more heavily weighted in the Euclidean distance.

53 $Suppose you apply Z-score standardization to a dataset heavily corrupted by extreme outliers, followed by k-Means clustering. What is the most likely detrimental effect on the resulting clusters?$

Data standardization and scaling impact Hard

A.

The outliers will cause the standard deviation of the features to be artificially large, heavily compressing the inliers into a dense clump and rendering k-Means unable to separate the underlying natural clusters.

B.

The outliers will force the covariance matrix to become singular, crashing the k-Means update step.

C.

Z-score standardization forces k-Means to converge to a single cluster due to the scaling of variances to 1.

D.

The standardization shifts the outliers to the mean, making them indistinguishable from normal points.

54 $In MiniBatch k-Means, centroid updates are performed using a stochastic gradient descent-like approach. Suppose a running count is kept of the points assigned to each centroid so far. How is the learning rate dynamically adjusted for a newly arriving point assigned to a given centroid?$

MiniBatch k-Means for large-scale datasets Hard

A.

The learning rate decays inversely proportional to the number of points in the current batch.

B.

The learning rate increases exponentially to prioritize newly arriving data in non-stationary streams.

C.

The learning rate decays as the reciprocal of the centroid's running assignment count, making the update a true moving average of all points assigned to that centroid.

D.

The learning rate is constant, governed by a hyperparameter.
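
The count-based learning rate discussed in Question 54 can be illustrated with a single-point update. This is a sketch of the idea only, not the scikit-learn `MiniBatchKMeans` code; the function name `minibatch_update` and the sample points are assumptions for this example.

```python
def minibatch_update(centroid, count, x):
    """Move one centroid toward a newly assigned point x.

    count is the number of points this centroid has absorbed so far.
    """
    count += 1
    eta = 1.0 / count  # per-centroid learning rate shrinks as the count grows
    new_centroid = tuple((1.0 - eta) * c + eta * xi for c, xi in zip(centroid, x))
    return new_centroid, count

# Illustrative stream of points that all land on the same centroid
stream = [(0.0, 0.0), (2.0, 0.0), (4.0, 6.0)]
centroid, count = stream[0], 1
for x in stream[1:]:
    centroid, count = minibatch_update(centroid, count, x)
```

With this schedule the centroid after each update equals the plain average of every point it has received, here (2.0, 2.0) after three points.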

55 $What is the primary theoretical trade-off regarding the final objective function (WCSS) when using MiniBatch k-Means instead of standard Lloyd's algorithm on a stationary dataset?$

MiniBatch k-Means for large-scale datasets Hard

A.

MiniBatch k-Means generally converges to a slightly worse (higher) WCSS due to the stochastic noise in gradient estimates, though the degradation is empirically bounded.

B.

MiniBatch k-Means produces an asymptotically identical WCSS, but requires exponentially more iterations.

C.

MiniBatch k-Means guarantees a lower WCSS because stochasticity helps escape local minima.

D.

MiniBatch k-Means introduces a systematic bias that strictly forces the WCSS to be exactly twice that of standard k-Means.

56 $Why is Inertia (WCSS) fundamentally unsuitable as a standalone metric for determining the absolute true number of clusters (k) without using heuristics like the Elbow method?$

Cluster Validation: Inertia (WCSS) Hard

A.

Inertia relies on absolute distance, which is undefined for non-Euclidean spaces.

B.

Inertia cannot be computed if the clusters contain a varying number of data points.

C.

Inertia strictly decreases monotonically as $k$ increases, reaching zero when $k$ equals the number of data points.

D.

Inertia scales quadratically with the number of dimensions, making it invalid for .

57 $The Silhouette Coefficient for a point i is defined as s(i) = (b(i) - a(i)) / max(a(i), b(i)). In the edge case where a cluster contains exactly one data point, standard implementations (like scikit-learn) typically handle s(i) in what way to avoid mathematical inconsistency?$

Silhouette Coefficient Hard

A.

It is set to $-1$, punishing the algorithm for creating an isolated cluster.

B.

It is set to $1$, because the intra-cluster distance is $0$.

C.

It triggers an automatic merge with the nearest cluster to satisfy .

D.

It is set to $0$, reflecting neither a well-clustered nor badly-clustered point.

58 $A data scientist observes that an entire cluster in their k-Means model yields predominantly negative Silhouette Coefficients. What does this geometrically imply about that specific cluster?$

Silhouette Coefficient Hard

A.

The intra-cluster distance is negative due to a violation of the metric space triangle inequality.

B.

The points in the cluster are, on average, closer to the centroid of a different cluster than they are to other points in their assigned cluster.

C.

The cluster's centroid exactly overlaps with the centroid of an adjacent cluster.

D.

The cluster is perfectly spherical and dense, but too far away from the global mean.

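59 $The Davies-Bouldin (DB) Index is defined as DB = (1/k) Σ_{i=1..k} max_{j≠i} (σ_i + σ_j) / d(c_i, c_j), where σ_i is the average distance from the points of cluster i to its centroid c_i. If a clustering algorithm heavily minimizes the DB Index, what implicit bias does it have regarding cluster structure?$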

Davies–Bouldin Index Hard

A.

It rewards clusters with high density regardless of their geometric proximity to neighboring clusters.

B.

It penalizes models that create clusters of equal variance, strictly favoring hierarchical layouts.

C.

It strongly favors clusters that are both compact (low intra-cluster dispersion) and far apart from each other.

D.

It is biased towards clusters that are widely separated but allows them to be highly elongated and overlapping.
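
For Question 59, the Davies-Bouldin Index can be computed from its standard definition: for each cluster, take the worst ratio of combined dispersion to centroid distance against any other cluster, then average those worst ratios. The sketch below is a plain-Python illustration; the function names (`centroid`, `davies_bouldin`) and the one-dimensional sample clusters are assumptions, and it expects at least two clusters with distinct centroids.

```python
import math

def centroid(points):
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def davies_bouldin(clusters):
    """clusters: list of clusters, each a list of points (tuples)."""
    centers = [centroid(c) for c in clusters]
    # Dispersion: mean distance from each cluster's points to its own centroid
    scatter = [
        sum(math.dist(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    worst_ratios = []
    for i in range(k):
        worst_ratios.append(max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        ))
    return sum(worst_ratios) / k

# Illustrative clusters: same dispersion, different separation
far_apart = [[(0.0,), (2.0,)], [(10.0,), (12.0,)]]
close_together = [[(0.0,), (2.0,)], [(3.0,), (5.0,)]]

db_far = davies_bouldin(far_apart)         # (1 + 1) / 10 for both clusters
db_close = davies_bouldin(close_together)  # (1 + 1) / 3 for both clusters
```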

60 $Suppose n points are drawn from a uniform distribution over a d-dimensional hypercube (meaning no true underlying clusters exist). If you plot the Inertia (WCSS) versus k to apply the Elbow Method, what will the curve look like, and what pitfall does this demonstrate?$

Elbow method pitfalls Hard

A.

The curve will decay smoothly without a clear elbow, potentially leading practitioners to force a subjective, arbitrary choice of $k$ on non-clustered data.

B.

The curve will exhibit a sharp, distinct elbow at, fooling the practitioner into believing there are natural clusters.

C.

The curve will be completely flat (slope of 0), wrongly suggesting is optimal.

D.

The curve will oscillate chaotically, indicating that the uniform distribution violates the algorithm's convergence properties.

Unit 2 - Practice Quiz