1. Which of the following best describes the primary goal of unsupervised learning?
A. To predict a continuous target variable based on input features
B. To discover hidden patterns or structures in unlabeled data
C. To classify data points into predefined categories using labeled training data
D. To optimize a reward signal in an interactive environment
Correct Answer: To discover hidden patterns or structures in unlabeled data
Explanation: Unsupervised learning deals with unlabeled data, where the algorithm tries to find structure, clusters, or representations within the input without explicit guidance or target labels.
2. What is the fundamental difference between supervised and unsupervised learning regarding input data?
A. Supervised learning requires data with target labels (y), whereas unsupervised learning uses only input features (X).
B. Supervised learning is only for regression, while unsupervised learning is only for classification.
C. There is no difference; the terms refer to the algorithm's complexity.
Correct Answer: Supervised learning requires data with target labels (y), whereas unsupervised learning uses only input features (X).
Explanation: In supervised learning, the model learns a mapping from inputs to known outputs (labels). In unsupervised learning, the algorithm works only with input features to find intrinsic structures.
3. Given two points (x₁, y₁) and (x₂, y₂), which formula represents the Euclidean distance?
A. |x₂ − x₁| + |y₂ − y₁|
B. √((x₂ − x₁)² + (y₂ − y₁)²)
C. (x₂ − x₁)² + (y₂ − y₁)²
D. max(|x₂ − x₁|, |y₂ − y₁|)
Correct Answer: √((x₂ − x₁)² + (y₂ − y₁)²)
Explanation: Euclidean distance is the straight-line distance between two points, derived from the Pythagorean theorem, calculated as the square root of the sum of squared differences.
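The two distance formulas contrasted above can be sketched in a few lines of Python (a minimal illustration; the function names are my own):

```python
import math

def euclidean(p, q):
    # Straight-line distance: square root of the sum of squared differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # 'Taxicab' distance: sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((0, 0), (3, 4)))  # 5.0 (the classic 3-4-5 triangle)
print(manhattan((0, 0), (3, 4)))  # 7
```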
4. Which distance metric is also known as the L1 norm or 'Taxicab' geometry?
A. Euclidean Distance
B. Mahalanobis Distance
C. Manhattan Distance
D. Cosine Distance
Correct Answer: Manhattan Distance
Explanation: Manhattan distance (or City Block distance) calculates the distance between two points by summing the absolute differences of their coordinates, resembling a path along a grid.
5. In which scenario is Cosine Distance (or Similarity) generally preferred over Euclidean Distance?
A. When calculating physical distances between locations on a map
B. When analyzing high-dimensional sparse data like text documents where magnitude matters less than orientation
C. When features are continuous variables with low dimensionality
D. When the data follows a strict Gaussian distribution
Correct Answer: When analyzing high-dimensional sparse data like text documents where magnitude matters less than orientation
Explanation: Cosine distance measures the cosine of the angle between vectors. It is highly effective for text clustering (e.g., TF-IDF vectors) because it measures similarity in direction (content) rather than magnitude (length of document).
6. The K-Means clustering algorithm attempts to minimize which of the following objective functions?
A. The sum of absolute differences between points
B. The Within-Cluster Sum of Squares (WCSS) or Inertia
C. The distance between the farthest points in different clusters
D. The silhouette coefficient
Correct Answer: The Within-Cluster Sum of Squares (WCSS) or Inertia
Explanation: K-Means aims to minimize Inertia (WCSS), which is the sum of squared distances between each data point and its assigned cluster centroid: WCSS = Σᵢ Σ_{x ∈ Cᵢ} ||x − μᵢ||².
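The WCSS objective is simple to compute directly for a given assignment. A sketch in plain Python, with an invented toy dataset:

```python
def inertia(points, centroids, labels):
    # Sum of squared distances from each point to its assigned centroid
    total = 0.0
    for (x, y), k in zip(points, labels):
        cx, cy = centroids[k]
        total += (x - cx) ** 2 + (y - cy) ** 2
    return total

pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
cents = [(0, 1), (10, 1)]   # one centroid per cluster
lbls = [0, 0, 1, 1]         # cluster index of each point
print(inertia(pts, cents, lbls))  # 4.0 (each point is 1 unit from its centroid)
```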
7. Which of the following is a key limitation of the K-Means algorithm?
A. It requires the number of clusters to be specified in advance.
B. It can only handle text data.
C. It is computationally more expensive than hierarchical clustering for small datasets.
D. It always finds the global optimum regardless of initialization.
Correct Answer: It requires the number of clusters to be specified in advance.
Explanation: A major drawback of K-Means is that the user must define the hyperparameter K (number of clusters) before running the algorithm. Additionally, it is sensitive to initialization and assumes spherical clusters.
8. In the Elbow Method for choosing the optimal K, what feature of the plot indicates the best K?
A. The point where the curve reaches zero
B. The point where the curve peaks (maximum value)
C. The point of inflection where the rate of decrease in Inertia slows down significantly
D. The point where the curve becomes a straight vertical line
Correct Answer: The point of inflection where the rate of decrease in Inertia slows down significantly
Explanation: The 'elbow' corresponds to the K value where adding more clusters no longer provides a significant reduction in WCSS (Inertia), suggesting a balance between compactness and computational cost.
9. What is the primary difference between K-Means and K-Medoids?
A. K-Means uses actual data points as centers; K-Medoids uses the mean.
B. K-Means uses the mean of points as the centroid; K-Medoids restricts the center to be an actual data point.
C. K-Means works better with outliers than K-Medoids.
D. K-Medoids uses Euclidean distance exclusively, while K-Means uses Manhattan.
Correct Answer: K-Means uses the mean of points as the centroid; K-Medoids restricts the center to be an actual data point.
Explanation: In K-Medoids (e.g., PAM algorithm), the center of a cluster is a representative object (medoid) from the dataset itself, whereas K-Means calculates the mathematical average (centroid), which may not be a real data point.
10. Why is K-Medoids generally considered more robust to outliers than K-Means?
A. Because it minimizes squared errors, which penalize outliers heavily.
B. Because it uses medoids (actual points) and often Manhattan distance, reducing the influence of extreme values compared to means and squared errors.
C. Because it automatically removes outliers during processing.
D. Because it requires a density parameter.
Correct Answer: Because it uses medoids (actual points) and often Manhattan distance, reducing the influence of extreme values compared to means and squared errors.
Explanation: K-Means minimizes squared Euclidean distance, so distinct outliers can pull the centroid significantly. K-Medoids minimizes a sum of dissimilarities (often absolute differences) to an actual point, making it less sensitive to extreme outliers.
11. Which of the following describes Agglomerative Hierarchical Clustering?
A. A top-down approach where all points start in one cluster and are recursively split.
B. A bottom-up approach where each point starts as its own cluster and pairs are merged iteratively.
C. A density-based approach relying on epsilon neighborhoods.
D. A centroid-based approach requiring a fixed K.
Correct Answer: A bottom-up approach where each point starts as its own cluster and pairs are merged iteratively.
Explanation: Agglomerative clustering initializes every data point as a singleton cluster and iteratively merges the two closest clusters until a single cluster (or a stopping criterion) remains.
12. In hierarchical clustering, what does the Single Linkage criterion measure to determine the distance between two clusters?
A. The distance between the centroids of the two clusters.
B. The maximum distance between any single point in one cluster and any point in the other.
C. The minimum distance between the closest pair of points, one from each cluster.
D. The average distance between all pairs of points in the two clusters.
Correct Answer: The minimum distance between the closest pair of points, one from each cluster.
Explanation: Single Linkage uses the shortest distance between a point in cluster A and a point in cluster B. This can lead to the 'chaining' phenomenon.
13. Which linkage criterion in hierarchical clustering tends to produce compact, spherical clusters by minimizing the increase in variance when merging?
A. Single Linkage
B. Complete Linkage
C. Average Linkage
D. Ward's Method
Correct Answer: Ward's Method
Explanation: Ward's method minimizes the total within-cluster variance. At each step, it merges the pair of clusters that leads to the minimum increase in total within-cluster sum of squares.
14. What is a Dendrogram?
A. A 3D scatter plot of clusters.
B. A tree-like diagram that records the sequences of merges or splits in hierarchical clustering.
C. A graph showing the Elbow method.
D. A density map for DBSCAN.
Correct Answer: A tree-like diagram that records the sequences of merges or splits in hierarchical clustering.
Explanation: A dendrogram visualizes the arrangement of the clusters produced by hierarchical clustering. The height of the branches typically represents the distance at which clusters are merged.
15. What are the two main hyperparameters required for the DBSCAN algorithm?
A. Number of clusters (K) and random seed
B. Epsilon (ε) and Minimum Points (MinPts)
C. Learning rate and batch size
D. Number of iterations and tolerance
Correct Answer: Epsilon (ε) and Minimum Points (MinPts)
Explanation: DBSCAN (Density-Based Spatial Clustering of Applications with Noise) requires ε (the radius of the neighborhood) and MinPts (the minimum number of points required in that neighborhood to form a core point).
16. In DBSCAN, a point is classified as a Core Point if:
A. It is reachable from another point but has fewer than MinPts neighbors.
B. It has at least MinPts neighbors within radius ε.
C. It is the centroid of the cluster.
D. It is the farthest point from the center.
Correct Answer: It has at least MinPts neighbors within radius ε.
Explanation: A core point implies a dense region. By definition in DBSCAN, a point is a core point if its ε-neighborhood contains at least MinPts points (including itself).
17. One significant advantage of DBSCAN over K-Means is:
A. It is faster for high-dimensional data.
B. It can discover clusters of arbitrary shapes and identify outliers.
C. It works without any hyperparameters.
D. It always assigns every point to a cluster.
Correct Answer: It can discover clusters of arbitrary shapes and identify outliers.
Explanation: K-Means assumes convex/spherical clusters. DBSCAN relies on density, allowing it to find 'moons', 'rings', or irregular shapes, and it explicitly labels points in low-density regions as noise/outliers.
18. In the context of Anomaly Detection, what does an 'anomaly' or 'outlier' typically represent?
A. A data point that perfectly fits the trend.
B. A data point that deviates significantly from the majority of the data.
C. A missing value in the dataset.
D. The cluster center.
Correct Answer: A data point that deviates significantly from the majority of the data.
Explanation: Anomalies are observations that arouse suspicion by differing significantly from the majority of the data, often indicating rare events like fraud, defects, or errors.
19. Which of the following is a density-based assumption often used in Anomaly Detection?
A. Anomalies lie in high-density regions.
B. Normal data lies in low-density regions.
C. Anomalies lie in low-density regions compared to normal data.
D. Density is irrelevant for anomaly detection.
Correct Answer: Anomalies lie in low-density regions compared to normal data.
Explanation: Most density-based anomaly detection methods assume that normal data points occur in dense neighborhoods, whereas anomalies are isolated in sparse (low-density) regions.
20. The Silhouette Score ranges between:
A. 0 and 1
B. -1 and 1
C. 0 and ∞
D. -100 and 100
Correct Answer: -1 and 1
Explanation: The Silhouette Score is bounded between -1 and 1. A score near +1 indicates good clustering, 0 indicates overlapping clusters, and negative values indicate points assigned to the wrong cluster.
21. If a data point has a Silhouette Coefficient close to -1, what does this indicate?
A. The point is well-clustered.
B. The point is on the decision boundary.
C. The point is likely assigned to the wrong cluster.
D. The point is the centroid.
Correct Answer: The point is likely assigned to the wrong cluster.
Explanation: A negative silhouette coefficient implies that the average distance to points in the neighboring cluster is smaller than the average distance to points in its own cluster, suggesting misclassification.
22. The Davies–Bouldin Index evaluates clustering algorithms based on:
A. Only the compactness of clusters.
B. Only the separation between clusters.
C. The ratio of within-cluster scatter to between-cluster separation.
D. The absolute number of clusters created.
Correct Answer: The ratio of within-cluster scatter to between-cluster separation.
Explanation: The Davies–Bouldin Index measures the average 'similarity' between clusters, where similarity is a ratio of within-cluster dispersion to between-cluster separation. Lower values indicate better clustering.
23. When comparing two clustering models using the Davies–Bouldin Index, which model is preferred?
A. The model with the higher index value.
B. The model with the lower index value.
C. The model with the value closest to 1.
D. The index cannot be used to compare models.
Correct Answer: The model with the lower index value.
Explanation: A lower Davies–Bouldin Index signifies that clusters are compact (low intra-cluster variance) and far apart (high inter-cluster distance), which is the desired outcome.
24. Calculate the Manhattan distance between points (1, 2) and (4, 5).
A. 3
B. 6
C. 3√2
D. 9
Correct Answer: 6
Explanation: Manhattan distance = |1 − 4| + |2 − 5| = 3 + 3 = 6.
25. What is the Curse of Dimensionality regarding distance metrics in unsupervised learning?
A. As dimensions increase, all points tend to become equidistant, making distance metrics less meaningful.
B. High dimensions make K-Means converge instantly.
C. Dimensions effectively reduce to 2D automatically.
Correct Answer: As dimensions increase, all points tend to become equidistant, making distance metrics less meaningful.
Explanation: In high-dimensional spaces, the volume of the space increases so fast that the available data becomes sparse. The ratio of the distance to the nearest neighbor vs. the farthest neighbor approaches 1, making distance-based clustering difficult.
26. Which clustering algorithm would be most appropriate for a dataset containing two concentric circles (a ring inside another ring)?
A. K-Means
B. K-Medoids
C. DBSCAN
D. Gaussian Mixture Models (standard)
Correct Answer: DBSCAN
Explanation: Concentric circles are non-convex shapes. K-Means and GMMs typically look for convex/spherical shapes and would fail to separate the rings correctly. DBSCAN, being density-based, can trace the shape of the rings.
27. In Hierarchical Clustering, the Complete Linkage criterion is susceptible to:
A. Chaining effect
B. Sensitivity to outliers
C. Producing very large clusters
D. Ignoring the number of points
Correct Answer: Sensitivity to outliers
Explanation: Complete linkage uses the maximum distance between points in two clusters. If an outlier is far away, it can artificially inflate the distance between clusters, delaying their merger.
28. What is the definition of Inertia in the context of clustering evaluation?
A. Sum of squared distances of samples to their closest cluster center.
B. Sum of distances between cluster centers.
C. Ratio of intra-cluster distance to inter-cluster distance.
D. The density of the densest cluster.
Correct Answer: Sum of squared distances of samples to their closest cluster center.
Explanation: Inertia measures how coherent the clusters are. It is the sum of squared errors (SSE) from each point to its assigned centroid.
29. Why is Inertia not a normalized metric?
A. It ranges from 0 to 1.
B. It decreases as the number of clusters increases, potentially reaching zero if K = N.
C. It is unaffected by the scale of the data.
D. It increases as the model gets better.
Correct Answer: It decreases as the number of clusters increases, potentially reaching zero if K = N.
Explanation: Inertia is not normalized; lower is better, but it naturally decreases as you add more clusters. If every point is its own cluster, inertia is 0. Thus, manual inspection (e.g., the Elbow method) is needed.
30. Which of the following requires Feature Scaling (Normalization/Standardization) the most?
A. Decision Trees
B. Random Forests
C. K-Means Clustering
D. Naive Bayes
Correct Answer: K-Means Clustering
Explanation: K-Means relies heavily on Euclidean distance. If one feature has a range of 0-1000 and another 0-1, the distance will be dominated by the first feature. Scaling ensures all features contribute equally.
31. In the K-Means algorithm, the 'assignment' step involves:
A. Moving the centroids to the average of the points.
B. Assigning each point to the nearest centroid.
C. Randomly picking points.
D. Calculating the silhouette score.
Correct Answer: Assigning each point to the nearest centroid.
Explanation: The K-Means loop consists of two steps: 1. Assignment (points are assigned to the closest centroid) and 2. Update (centroids are recalculated as the mean of assigned points).
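One iteration of this two-step loop can be sketched in plain Python (an illustrative toy version, not an optimized implementation; names are my own):

```python
def assign(points, centroids):
    # Assignment step: index of the nearest centroid for each point
    def d2(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [min(range(len(centroids)), key=lambda k: d2(p, centroids[k]))
            for p in points]

def update(points, labels, k):
    # Update step: each centroid moves to the mean of its assigned points
    new = []
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        new.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
    return new

pts = [(0, 0), (1, 0), (9, 0), (10, 0)]
cents = [(0, 0), (10, 0)]
lbls = assign(pts, cents)     # [0, 0, 1, 1]
cents = update(pts, lbls, 2)  # [(0.5, 0.0), (9.5, 0.0)]
print(lbls, cents)
```

Real implementations repeat these two steps until the assignments stop changing (convergence).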
32. What is a 'Noise point' in DBSCAN?
A. A point with more than MinPts neighbors.
B. A point that is reachable from a core point but has few neighbors itself.
C. A point that is neither a core point nor reachable from a core point.
D. The initial starting point of the algorithm.
Correct Answer: A point that is neither a core point nor reachable from a core point.
Explanation: Noise points (outliers) are points that do not satisfy the core point condition and are not within the ε-neighborhood of any core point.
33. The Minkowski Distance is a generalization of which distances?
A. Only Euclidean
B. Only Manhattan
C. Euclidean (p = 2) and Manhattan (p = 1)
D. Cosine and Jaccard
Correct Answer: Euclidean (p = 2) and Manhattan (p = 1)
Explanation: The Minkowski distance of order p is d(x, y) = (Σᵢ |xᵢ − yᵢ|^p)^(1/p). When p = 1, it is the Manhattan distance. When p = 2, it is the Euclidean distance.
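The special cases fall out directly from the general formula, as a quick sketch shows (illustrative function name):

```python
def minkowski(p, q, r):
    # Minkowski distance of order r: (sum of |p_i - q_i|^r) ^ (1/r)
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)

a, b = (0, 0), (3, 4)
print(minkowski(a, b, 1))  # 7.0 -> Manhattan (|3| + |4|)
print(minkowski(a, b, 2))  # 5.0 -> Euclidean (sqrt(9 + 16))
```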
34. Which of these is a divisive hierarchical clustering method?
A. DIANA (Divisive Analysis)
B. AGNES (Agglomerative Nesting)
C. K-Means
D. DBSCAN
Correct Answer: DIANA (Divisive Analysis)
Explanation: DIANA (Divisive Analysis) works top-down. It starts with one giant cluster containing all data and recursively splits the most heterogeneous cluster until singleton clusters remain.
35. For a silhouette score s(i) = (b(i) − a(i)) / max(a(i), b(i)), what does a(i) represent?
A. The distance to the nearest cluster centroid.
B. The mean distance between point i and all other points in the same cluster.
C. The mean distance between point i and all points in the nearest neighboring cluster.
D. The total variance of the dataset.
Correct Answer: The mean distance between point i and all other points in the same cluster.
Explanation: a(i) is the mean intra-cluster distance (cohesion). b(i) is the mean nearest-cluster distance (separation). The score compares these two.
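The per-point coefficient can be computed straight from the definitions of a(i) and b(i). A sketch with invented data (two tight, well-separated clusters, so the score should be close to +1):

```python
def silhouette_point(i, points, labels):
    # s(i) = (b - a) / max(a, b); a = cohesion, b = separation
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
    # a: mean distance to the other points of i's own cluster
    same = [j for j in range(len(points)) if labels[j] == labels[i] and j != i]
    a = sum(dist(points[i], points[j]) for j in same) / len(same)
    # b: smallest mean distance to the points of any other cluster
    b = min(
        sum(dist(points[i], points[j])
            for j in range(len(points)) if labels[j] == c) / labels.count(c)
        for c in set(labels) if c != labels[i]
    )
    return (b - a) / max(a, b)

pts = [(0, 0), (0, 1), (5, 0), (5, 1)]
lbl = [0, 0, 1, 1]
print(silhouette_point(0, pts, lbl))  # ≈ 0.80: well clustered, near +1
```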
36. What is the main advantage of K-Means++ over standard random initialization in K-Means?
A. It avoids the need to select K.
B. It spreads out the initial centroids to speed up convergence and avoid poor local optima.
C. It allows the algorithm to find non-convex clusters.
D. It uses Manhattan distance instead of Euclidean.
Correct Answer: It spreads out the initial centroids to speed up convergence and avoid poor local optima.
Explanation: K-Means++ picks each new initial centroid with probability proportional to its squared distance from the existing centroids, ensuring the initial centers are well-separated, which improves results and convergence speed.
37. If the Dunn Index is used for evaluation, a higher value indicates:
A. Better clustering (compact and well-separated).
B. Worse clustering (overlapping and dispersed).
C. More clusters.
D. Fewer clusters.
Correct Answer: Better clustering (compact and well-separated).
Explanation: The Dunn Index is the ratio of the minimum inter-cluster distance to the maximum intra-cluster diameter. We want the numerator high (separation) and the denominator low (compactness), so a higher index is better.
38. Which algorithm is an example of Soft Clustering (where points belong to clusters with probabilities)?
A. K-Means
B. DBSCAN
C. Gaussian Mixture Models (GMM)
D. Single Linkage Hierarchical Clustering
Correct Answer: Gaussian Mixture Models (GMM)
Explanation: While K-Means performs hard assignment (a point belongs to Cluster A or B), GMM assigns a probability of belonging to each cluster (e.g., 70% Cluster A, 30% Cluster B).
39. In hierarchical clustering, if we want to stop merging when clusters are too far apart, we can:
A. Increase the learning rate.
B. Cut the dendrogram at a specific height (distance threshold).
C. Set K = 1.
D. Use the Elbow method.
Correct Answer: Cut the dendrogram at a specific height (distance threshold).
Explanation: A horizontal cut across the dendrogram at a specific vertical height determines the number of clusters. Clusters linked above this height are not merged.
40. Which of the following is an application of Anomaly Detection?
A. Customer segmentation for marketing.
B. Credit card fraud detection.
C. Document classification.
D. Predicting house prices.
Correct Answer: Credit card fraud detection.
Explanation: Fraud detection relies on identifying transactions that deviate significantly from typical user behavior patterns, which is the core definition of anomaly detection.
41. Calculate the squared Euclidean distance between (1, 2, 3) and (2, 3, 4).
A. 1
B. √3
C. 3
D. 9
Correct Answer: 3
Explanation: Squared Euclidean distance = (2 − 1)² + (3 − 2)² + (4 − 3)² = 1 + 1 + 1 = 3.
42. How does the Average Linkage criterion calculate the distance between Cluster A and Cluster B?
A. Distance between centroids.
B. Average of all pairwise distances between points in A and points in B.
C. Average of the minimum and maximum distances.
D. Distance between the two most central points.
Correct Answer: Average of all pairwise distances between points in A and points in B.
Explanation: Average linkage considers all pairs of points (one from each cluster) and computes the average of these distances. It is a compromise between Single and Complete linkage.
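The three linkage criteria compared across questions 12, 27, and 42 differ only in how they aggregate the cross-cluster pairwise distances. A minimal sketch (1-D points and my own function names, for readability):

```python
def single_link(A, B, dist):
    # Minimum distance over all cross-cluster pairs (prone to chaining)
    return min(dist(a, b) for a in A for b in B)

def complete_link(A, B, dist):
    # Maximum distance over all cross-cluster pairs (sensitive to outliers)
    return max(dist(a, b) for a in A for b in B)

def average_link(A, B, dist):
    # Mean of all pairwise cross-cluster distances (a compromise)
    return sum(dist(a, b) for a in A for b in B) / (len(A) * len(B))

d = lambda a, b: abs(a - b)
A, B = [0, 1], [4, 6]
# Pairwise distances: 4, 6, 3, 5
print(single_link(A, B, d), complete_link(A, B, d), average_link(A, B, d))  # 3 6 4.5
```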
43. Which of the following distance metrics satisfies the Triangle Inequality?
A. Euclidean Distance
B. Squared Euclidean Distance
C. Kullback-Leibler Divergence
D. Cosine Similarity
Correct Answer: Euclidean Distance
Explanation: Euclidean distance is a true metric satisfying non-negativity, identity, symmetry, and the triangle inequality (d(x, z) ≤ d(x, y) + d(y, z)). Squared Euclidean and Cosine Similarity do not satisfy the triangle inequality, and KL Divergence is not even symmetric.
44. In the context of K-Means, if K equals the number of data points N, what is the value of Inertia?
A. Infinity
B. 1
C. 0
D. N
Correct Answer: 0
Explanation: If every point is its own cluster centroid, the distance from every point to its centroid is 0. Therefore, the sum of squared distances (Inertia) is 0.
45. Which of the following suggests that a dataset has no cluster tendency (i.e., data is uniformly distributed)?
A. A Hopkins statistic close to 0.5.
B. A Hopkins statistic close to 0 or 1 (depending on definition; usually towards 1 implies clustering).
C. A high Silhouette score.
D. A distinct Elbow in the Inertia plot.
Correct Answer: A Hopkins statistic close to 0.5.
Explanation: The Hopkins statistic measures cluster tendency. Values near 0 or 1 (depending on implementation specifics; often near 1 means highly clustered) indicate structure. A value of 0.5 indicates the data is uniformly distributed (random).
46. In DBSCAN, what is a Border Point?
A. A point with fewer than MinPts neighbors that lies within the ε radius of a Core Point.
B. A point with more than MinPts neighbors.
C. A noise point.
D. The geometric center of the cluster.
Correct Answer: A point with fewer than MinPts neighbors that lies within the ε radius of a Core Point.
Explanation: A border point is not dense enough to be a core point itself (fewer than MinPts points in its ε-neighborhood) but falls within the ε-neighborhood of a core point, so it is included in the cluster.
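The core/border/noise definitions from questions 16, 32, and 46 can be applied directly, without running full DBSCAN. A sketch on an invented toy dataset (three dense points, one fringe point, one isolated point):

```python
def classify(points, eps, min_pts):
    # Label each point 'core', 'border', or 'noise' per the DBSCAN definitions
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
    # eps-neighborhood of each point (includes the point itself)
    nbrs = [[q for q in points if dist(p, q) <= eps] for p in points]
    core = [len(n) >= min_pts for n in nbrs]
    labels = []
    for i in range(len(points)):
        if core[i]:
            labels.append('core')       # dense enough on its own
        elif any(core[points.index(q)] for q in nbrs[i]):
            labels.append('border')     # not dense, but near a core point
        else:
            labels.append('noise')      # neither core nor reachable from one
    return labels

pts = [(0, 0), (0, 1), (1, 0), (2, 0), (10, 10)]
print(classify(pts, eps=1.5, min_pts=3))
# ['core', 'core', 'core', 'border', 'noise']
```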
47. Which distance metric is scale-invariant and accounts for the correlations between variables?
A. Euclidean Distance
B. Mahalanobis Distance
C. Manhattan Distance
D. Chebyshev Distance
Correct Answer: Mahalanobis Distance
Explanation: Mahalanobis distance measures distance relative to the centroid and the covariance matrix of the data. It effectively transforms the data to standardized uncorrelated variables before calculating Euclidean distance.
48. When interpreting a Silhouette plot, if most bars are short and some have negative scores, the clustering is:
A. Excellent
B. Appropriate
C. Poor/Suboptimal
D. Overfitted
Correct Answer: Poor/Suboptimal
Explanation: Short bars indicate low cohesion/separation, and negative scores indicate misclassification. This suggests the configuration (number of clusters or algorithm) is poor.
49. Which unsupervised learning technique is best suited for reducing the number of variables while retaining variance, often used before clustering?
A. Principal Component Analysis (PCA)
B. Linear Regression
C. Random Forest
D. DBSCAN
Correct Answer: Principal Component Analysis (PCA)
Explanation: PCA is a dimensionality reduction technique. It is often used before clustering to reduce noise and the curse of dimensionality by projecting data onto principal components.
50. If you are using K-Means to cluster customers based on 'Annual Income' (range 10k-100k) and 'Age' (range 20-70), what happens if you do not normalize?
A. Age will dominate the clustering.
B. Annual Income will dominate the clustering.
C. Both will contribute equally.
D. The algorithm will fail to run.
Correct Answer: Annual Income will dominate the clustering.
Explanation: Since Euclidean distance sums squared differences, the feature with the larger magnitude (Income ~100,000) will yield much larger distances than Age (~70), causing the algorithm to cluster primarily based on Income.
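This dominance effect is easy to demonstrate with invented customers; min-max scaling into [0, 1] is one common fix (a sketch, with illustrative names and made-up feature ranges matching the question):

```python
def euclid(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

# (income, age): c1 and c2 differ hugely in age, barely in income;
# c3 has nearly the same age as c1 but a very different income.
c1, c2 = (50_000, 20), (50_500, 70)
c3 = (90_000, 21)

print(euclid(c1, c2))  # ≈ 502: the 500 income gap dwarfs the 50-year age gap
print(euclid(c1, c3))  # ≈ 40000: income swamps everything

def minmax(v, lo, hi):
    # Rescale a value into [0, 1] given the feature's range
    return (v - lo) / (hi - lo)

scale = lambda c: (minmax(c[0], 10_000, 100_000), minmax(c[1], 20, 70))
# After scaling, c1 is far closer to c3 (similar age, moderate income gap)
# than to c2 (maximal age gap) -- the opposite of the raw-distance result.
print(euclid(scale(c1), scale(c2)), euclid(scale(c1), scale(c3)))
```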