1. Which of the following best describes the primary goal of unsupervised learning?
A. To predict a continuous target variable based on input features
B. To discover hidden patterns or structures in unlabeled data
C. To classify data points into predefined categories using labeled training data
D. To optimize a reward signal in an interactive environment
Correct Answer: To discover hidden patterns or structures in unlabeled data
Explanation: Unsupervised learning deals with unlabeled data, where the algorithm tries to find structure, clusters, or representations within the input without explicit guidance or target labels.
2. What is the fundamental difference between supervised and unsupervised learning regarding input data?
A. Supervised learning requires data with target labels (y), whereas unsupervised learning uses only input features (X).
B. Supervised learning is only for regression, while unsupervised learning is only for classification.
C. There is no difference; the terms refer to the algorithm's complexity.
Correct Answer: Supervised learning requires data with target labels (y), whereas unsupervised learning uses only input features (X).
Explanation: In supervised learning, the model learns a mapping from inputs to known outputs (labels). In unsupervised learning, the algorithm works only with input features to find intrinsic structures.
3. Given two points (x₁, y₁) and (x₂, y₂), which formula represents the Euclidean distance?
A. |x₂ − x₁| + |y₂ − y₁|
B. √((x₂ − x₁)² + (y₂ − y₁)²)
C. (x₂ − x₁)² + (y₂ − y₁)²
D. max(|x₂ − x₁|, |y₂ − y₁|)
Correct Answer: √((x₂ − x₁)² + (y₂ − y₁)²)
Explanation: Euclidean distance is the straight-line distance between two points, derived from the Pythagorean theorem, calculated as the square root of the sum of squared differences.
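The two distance formulas contrasted above can be sketched in a few lines of Python (a minimal illustration; the function names are my own):

```python
import math

def euclidean(p, q):
    # Straight-line distance: square root of the sum of squared differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # 'Taxicab' distance: sum of absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((0, 0), (3, 4)))  # 5.0 (the classic 3-4-5 triangle)
print(manhattan((0, 0), (3, 4)))  # 7
```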
4. Which distance metric is also known as the L1 norm or 'Taxicab' geometry?
A. Euclidean Distance
B. Mahalanobis Distance
C. Manhattan Distance
D. Cosine Distance
Correct Answer: Manhattan Distance
Explanation: Manhattan distance (or City Block distance) calculates the distance between two points by summing the absolute differences of their coordinates, resembling a path along a grid.
5. In which scenario is Cosine Distance (or Similarity) generally preferred over Euclidean Distance?
A. When calculating physical distances between locations on a map
B. When analyzing high-dimensional sparse data like text documents where magnitude matters less than orientation
C. When features are continuous variables with low dimensionality
D. When the data follows a strict Gaussian distribution
Correct Answer: When analyzing high-dimensional sparse data like text documents where magnitude matters less than orientation
Explanation: Cosine distance measures the cosine of the angle between vectors. It is highly effective for text clustering (e.g., TF-IDF vectors) because it measures similarity in direction (content) rather than magnitude (length of document).
6. The K-Means clustering algorithm attempts to minimize which of the following objective functions?
A. The sum of absolute differences between points
B. The Within-Cluster Sum of Squares (WCSS) or Inertia
C. The distance between the farthest points in different clusters
D. The silhouette coefficient
Correct Answer: The Within-Cluster Sum of Squares (WCSS) or Inertia
Explanation: K-Means aims to minimize Inertia (WCSS), which is the sum of squared distances between each data point and its assigned cluster centroid: WCSS = Σᵢ Σ_{x ∈ Cᵢ} ||x − μᵢ||².
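The WCSS objective is simple to compute directly for a given assignment. A sketch in plain Python, with an invented toy dataset:

```python
def inertia(points, centroids, labels):
    # Sum of squared distances from each point to its assigned centroid
    total = 0.0
    for (x, y), k in zip(points, labels):
        cx, cy = centroids[k]
        total += (x - cx) ** 2 + (y - cy) ** 2
    return total

pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
cents = [(0, 1), (10, 1)]   # one centroid per cluster
lbls = [0, 0, 1, 1]         # cluster index of each point
print(inertia(pts, cents, lbls))  # 4.0 (each point is 1 unit from its centroid)
```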
7. Which of the following is a key limitation of the K-Means algorithm?
A. It requires the number of clusters to be specified in advance.
B. It can only handle text data.
C. It is computationally more expensive than hierarchical clustering for small datasets.
D. It always finds the global optimum regardless of initialization.
Correct Answer: It requires the number of clusters to be specified in advance.
Explanation: A major drawback of K-Means is that the user must define the hyperparameter K (number of clusters) before running the algorithm. Additionally, it is sensitive to initialization and assumes spherical clusters.
8. In the Elbow Method for choosing the optimal K, what feature of the plot indicates the best K?
A. The point where the curve reaches zero
B. The point where the curve peaks (maximum value)
C. The point of inflection where the rate of decrease in Inertia slows down significantly
D. The point where the curve becomes a straight vertical line
Correct Answer: The point of inflection where the rate of decrease in Inertia slows down significantly
Explanation: The 'elbow' corresponds to the K value where adding more clusters no longer provides a significant reduction in WCSS (Inertia), suggesting a balance between compactness and computational cost.
9. What is the primary difference between K-Means and K-Medoids?
A. K-Means uses actual data points as centers; K-Medoids uses the mean.
B. K-Means uses the mean of points as the centroid; K-Medoids restricts the center to be an actual data point.
C. K-Means works better with outliers than K-Medoids.
D. K-Medoids uses Euclidean distance exclusively, while K-Means uses Manhattan.
Correct Answer: K-Means uses the mean of points as the centroid; K-Medoids restricts the center to be an actual data point.
Explanation: In K-Medoids (e.g., PAM algorithm), the center of a cluster is a representative object (medoid) from the dataset itself, whereas K-Means calculates the mathematical average (centroid), which may not be a real data point.
10. Why is K-Medoids generally considered more robust to outliers than K-Means?
A. Because it minimizes squared errors, which penalize outliers heavily.
B. Because it uses medoids (actual points) and often Manhattan distance, reducing the influence of extreme values compared to means and squared errors.
C. Because it automatically removes outliers during processing.
D. Because it requires a density parameter.
Correct Answer: Because it uses medoids (actual points) and often Manhattan distance, reducing the influence of extreme values compared to means and squared errors.
Explanation: K-Means minimizes squared Euclidean distance, so distinct outliers can pull the centroid significantly. K-Medoids minimizes a sum of dissimilarities (often absolute differences) to an actual point, making it less sensitive to extreme outliers.
11. Which of the following describes Agglomerative Hierarchical Clustering?
A. A top-down approach where all points start in one cluster and are recursively split.
B. A bottom-up approach where each point starts as its own cluster and pairs are merged iteratively.
C. A density-based approach relying on epsilon neighborhoods.
D. A centroid-based approach requiring a fixed K.
Correct Answer: A bottom-up approach where each point starts as its own cluster and pairs are merged iteratively.
Explanation: Agglomerative clustering initializes every data point as a singleton cluster and iteratively merges the two closest clusters until a single cluster (or a stopping criterion) remains.
12. In hierarchical clustering, what does the Single Linkage criterion measure to determine the distance between two clusters?
A. The distance between the centroids of the two clusters.
B. The maximum distance between any single point in one cluster and any point in the other.
C. The minimum distance between the closest pair of points, one from each cluster.
D. The average distance between all pairs of points in the two clusters.
Correct Answer: The minimum distance between the closest pair of points, one from each cluster.
Explanation: Single Linkage uses the shortest distance between a point in cluster A and a point in cluster B. This can lead to the 'chaining' phenomenon.
13. Which linkage criterion in hierarchical clustering tends to produce compact, spherical clusters by minimizing the increase in variance when merging?
A. Single Linkage
B. Complete Linkage
C. Average Linkage
D. Ward's Method
Correct Answer: Ward's Method
Explanation: Ward's method minimizes the total within-cluster variance. At each step, it merges the pair of clusters that leads to the minimum increase in total within-cluster sum of squares.
14. What is a Dendrogram?
A. A 3D scatter plot of clusters.
B. A tree-like diagram that records the sequences of merges or splits in hierarchical clustering.
C. A graph showing the Elbow method.
D. A density map for DBSCAN.
Correct Answer: A tree-like diagram that records the sequences of merges or splits in hierarchical clustering.
Explanation: A dendrogram visualizes the arrangement of the clusters produced by hierarchical clustering. The height of the branches typically represents the distance at which clusters are merged.
15. What are the two main hyperparameters required for the DBSCAN algorithm?
A. Number of clusters (K) and random seed
B. Epsilon (ε) and Minimum Points (MinPts)
C. Learning rate and batch size
D. Number of iterations and tolerance
Correct Answer: Epsilon (ε) and Minimum Points (MinPts)
Explanation: DBSCAN (Density-Based Spatial Clustering of Applications with Noise) requires ε (the radius of the neighborhood) and MinPts (the minimum number of points required in that neighborhood to form a core point).
16. In DBSCAN, a point is classified as a Core Point if:
A. It is reachable from another point but has fewer than MinPts neighbors.
B. It has at least MinPts neighbors within radius ε.
C. It is the centroid of the cluster.
D. It is the farthest point from the center.
Correct Answer: It has at least MinPts neighbors within radius ε.
Explanation: A core point implies a dense region. By definition in DBSCAN, a point is a core point if its ε-neighborhood contains at least MinPts points (including itself).
17. One significant advantage of DBSCAN over K-Means is:
A. It is faster for high-dimensional data.
B. It can discover clusters of arbitrary shapes and identify outliers.
C. It works without any hyperparameters.
D. It always assigns every point to a cluster.
Correct Answer: It can discover clusters of arbitrary shapes and identify outliers.
Explanation: K-Means assumes convex/spherical clusters. DBSCAN relies on density, allowing it to find 'moons', 'rings', or irregular shapes, and it explicitly labels points in low-density regions as noise/outliers.
18. In the context of Anomaly Detection, what does an 'anomaly' or 'outlier' typically represent?
A. A data point that perfectly fits the trend.
B. A data point that deviates significantly from the majority of the data.
C. A missing value in the dataset.
D. The cluster center.
Correct Answer: A data point that deviates significantly from the majority of the data.
Explanation: Anomalies are observations that arouse suspicion by differing significantly from the majority of the data, often indicating rare events like fraud, defects, or errors.
19. Which of the following is a density-based assumption often used in Anomaly Detection?
A. Anomalies lie in high-density regions.
B. Normal data lies in low-density regions.
C. Anomalies lie in low-density regions compared to normal data.
D. Density is irrelevant for anomaly detection.
Correct Answer: Anomalies lie in low-density regions compared to normal data.
Explanation: Most density-based anomaly detection methods assume that normal data points occur in dense neighborhoods, whereas anomalies are isolated in sparse (low-density) regions.
20. The Silhouette Score ranges between:
A. 0 and 1
B. -1 and 1
C. 0 and ∞
D. -100 and 100
Correct Answer: -1 and 1
Explanation: The Silhouette Score is bounded between -1 and 1. A score near +1 indicates good clustering, 0 indicates overlapping clusters, and negative values indicate points assigned to the wrong cluster.
21. If a data point has a Silhouette Coefficient close to -1, what does this indicate?
A. The point is well-clustered.
B. The point is on the decision boundary.
C. The point is likely assigned to the wrong cluster.
D. The point is the centroid.
Correct Answer: The point is likely assigned to the wrong cluster.
Explanation: A negative silhouette coefficient implies that the average distance to points in the neighboring cluster is smaller than the average distance to points in its own cluster, suggesting misclassification.
22. The Davies–Bouldin Index evaluates clustering algorithms based on:
A. Only the compactness of clusters.
B. Only the separation between clusters.
C. The ratio of within-cluster scatter to between-cluster separation.
D. The absolute number of clusters created.
Correct Answer: The ratio of within-cluster scatter to between-cluster separation.
Explanation: The Davies–Bouldin Index measures the average 'similarity' between clusters, where similarity is a ratio of within-cluster dispersion to between-cluster separation. Lower values indicate better clustering.
23. When comparing two clustering models using the Davies–Bouldin Index, which model is preferred?
A. The model with the higher index value.
B. The model with the lower index value.
C. The model with the value closest to 1.
D. The index cannot be used to compare models.
Correct Answer: The model with the lower index value.
Explanation: A lower Davies–Bouldin Index signifies that clusters are compact (low intra-cluster variance) and far apart (high inter-cluster distance), which is the desired outcome.
24. Calculate the Manhattan distance between points (1, 2) and (4, 5).
A. 3
B. 6
C. 3√2
D. 9
Correct Answer: 6
Explanation: Manhattan distance = |1 − 4| + |2 − 5| = 3 + 3 = 6.
25. What is the Curse of Dimensionality regarding distance metrics in unsupervised learning?
A. As dimensions increase, all points tend to become equidistant, making distance metrics less meaningful.
B. High dimensions make K-Means converge instantly.
C. Dimensions effectively reduce to 2D automatically.
Correct Answer: As dimensions increase, all points tend to become equidistant, making distance metrics less meaningful.
Explanation: In high-dimensional spaces, the volume of the space increases so fast that the available data becomes sparse. The ratio of the distance to the nearest neighbor vs. the farthest neighbor approaches 1, making distance-based clustering difficult.
26. Which clustering algorithm would be most appropriate for a dataset containing two concentric circles (a ring inside another ring)?
A. K-Means
B. K-Medoids
C. DBSCAN
D. Gaussian Mixture Models (standard)
Correct Answer: DBSCAN
Explanation: Concentric circles are non-convex shapes. K-Means and GMMs typically look for convex/spherical shapes and would fail to separate the rings correctly. DBSCAN, being density-based, can trace the shape of the rings.
27. In Hierarchical Clustering, the Complete Linkage criterion is susceptible to:
A. Chaining effect
B. Sensitivity to outliers
C. Producing very large clusters
D. Ignoring the number of points
Correct Answer: Sensitivity to outliers
Explanation: Complete linkage uses the maximum distance between points in two clusters. If an outlier is far away, it can artificially inflate the distance between clusters, delaying their merger.
28. What is the definition of Inertia in the context of clustering evaluation?
A. Sum of squared distances of samples to their closest cluster center.
B. Sum of distances between cluster centers.
C. Ratio of intra-cluster distance to inter-cluster distance.
D. The density of the densest cluster.
Correct Answer: Sum of squared distances of samples to their closest cluster center.
Explanation: Inertia measures how coherent the clusters are. It is the sum of squared errors (SSE) from each point to its assigned centroid.
29. Why is Inertia not a normalized metric?
A. It ranges from 0 to 1.
B. It decreases as the number of clusters increases, potentially reaching zero if K = N.
C. It is unaffected by the scale of the data.
D. It increases as the model gets better.
Correct Answer: It decreases as the number of clusters increases, potentially reaching zero if K = N.
Explanation: Inertia is not normalized; lower is better, but it naturally decreases as you add more clusters. If every point is its own cluster, inertia is 0. Thus, manual inspection (e.g., the Elbow method) is needed.
30. Which of the following requires Feature Scaling (Normalization/Standardization) the most?
A. Decision Trees
B. Random Forests
C. K-Means Clustering
D. Naive Bayes
Correct Answer: K-Means Clustering
Explanation: K-Means relies heavily on Euclidean distance. If one feature has a range of 0-1000 and another 0-1, the distance will be dominated by the first feature. Scaling ensures all features contribute equally.
31. In the K-Means algorithm, the 'assignment' step involves:
A. Moving the centroids to the average of the points.
B. Assigning each point to the nearest centroid.
C. Randomly picking points.
D. Calculating the silhouette score.
Correct Answer: Assigning each point to the nearest centroid.
Explanation: The K-Means loop consists of two steps: 1. Assignment (points are assigned to the closest centroid) and 2. Update (centroids are recalculated as the mean of assigned points).
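One iteration of this two-step loop can be sketched in plain Python (an illustrative toy version, not an optimized implementation; names are my own):

```python
def assign(points, centroids):
    # Assignment step: index of the nearest centroid for each point
    def d2(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [min(range(len(centroids)), key=lambda k: d2(p, centroids[k]))
            for p in points]

def update(points, labels, k):
    # Update step: each centroid moves to the mean of its assigned points
    new = []
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        new.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
    return new

pts = [(0, 0), (1, 0), (9, 0), (10, 0)]
cents = [(0, 0), (10, 0)]
lbls = assign(pts, cents)     # [0, 0, 1, 1]
cents = update(pts, lbls, 2)  # [(0.5, 0.0), (9.5, 0.0)]
print(lbls, cents)
```

Real implementations repeat these two steps until the assignments stop changing (convergence).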
32. What is a 'Noise point' in DBSCAN?
A. A point with more than MinPts neighbors.
B. A point that is reachable from a core point but has few neighbors itself.
C. A point that is neither a core point nor reachable from a core point.
D. The initial starting point of the algorithm.
Correct Answer: A point that is neither a core point nor reachable from a core point.
Explanation: Noise points (outliers) are points that do not satisfy the core point condition and are not within the ε-neighborhood of any core point.
33. The Minkowski Distance is a generalization of which distances?
A. Only Euclidean
B. Only Manhattan
C. Euclidean (p = 2) and Manhattan (p = 1)
D. Cosine and Jaccard
Correct Answer: Euclidean (p = 2) and Manhattan (p = 1)
Explanation: The Minkowski distance of order p is d(x, y) = (Σᵢ |xᵢ − yᵢ|^p)^(1/p). When p = 1, it is the Manhattan distance. When p = 2, it is the Euclidean distance.
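The special cases fall out directly from the general formula, as a quick sketch shows (illustrative function name):

```python
def minkowski(p, q, r):
    # Minkowski distance of order r: (sum of |p_i - q_i|^r) ^ (1/r)
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)

a, b = (0, 0), (3, 4)
print(minkowski(a, b, 1))  # 7.0 -> Manhattan (|3| + |4|)
print(minkowski(a, b, 2))  # 5.0 -> Euclidean (sqrt(9 + 16))
```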
34. Which of these is a divisive hierarchical clustering method?
A. DIANA (Divisive Analysis)
B. AGNES (Agglomerative Nesting)
C. K-Means
D. DBSCAN
Correct Answer: DIANA (Divisive Analysis)
Explanation: DIANA (Divisive Analysis) works top-down. It starts with one giant cluster containing all data and recursively splits the most heterogeneous cluster until singleton clusters remain.
35. For a silhouette score s(i) = (b(i) − a(i)) / max(a(i), b(i)), what does a(i) represent?
A. The distance to the nearest cluster centroid.
B. The mean distance between point i and all other points in the same cluster.
C. The mean distance between point i and all points in the nearest neighboring cluster.
D. The total variance of the dataset.
Correct Answer: The mean distance between point i and all other points in the same cluster.
Explanation: a(i) is the mean intra-cluster distance (cohesion). b(i) is the mean nearest-cluster distance (separation). The score compares these two.
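The per-point coefficient can be computed straight from the definitions of a(i) and b(i). A sketch with invented data (two tight, well-separated clusters, so the score should be close to +1):

```python
def silhouette_point(i, points, labels):
    # s(i) = (b - a) / max(a, b); a = cohesion, b = separation
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
    # a: mean distance to the other points of i's own cluster
    same = [j for j in range(len(points)) if labels[j] == labels[i] and j != i]
    a = sum(dist(points[i], points[j]) for j in same) / len(same)
    # b: smallest mean distance to the points of any other cluster
    b = min(
        sum(dist(points[i], points[j])
            for j in range(len(points)) if labels[j] == c) / labels.count(c)
        for c in set(labels) if c != labels[i]
    )
    return (b - a) / max(a, b)

pts = [(0, 0), (0, 1), (5, 0), (5, 1)]
lbl = [0, 0, 1, 1]
print(silhouette_point(0, pts, lbl))  # ≈ 0.80: well clustered, near +1
```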
36. What is the main advantage of K-Means++ over standard random initialization in K-Means?
A. It avoids the need to select K.
B. It spreads out the initial centroids to speed up convergence and avoid poor local optima.
C. It allows the algorithm to find non-convex clusters.
D. It uses Manhattan distance instead of Euclidean.
Correct Answer: It spreads out the initial centroids to speed up convergence and avoid poor local optima.
Explanation: K-Means++ picks each new initial centroid with probability proportional to its squared distance from the existing centroids, ensuring the initial centers are well-separated, which improves results and convergence speed.
37. If the Dunn Index is used for evaluation, a higher value indicates:
A. Better clustering (compact and well-separated).
B. Worse clustering (overlapping and dispersed).
C. More clusters.
D. Fewer clusters.
Correct Answer: Better clustering (compact and well-separated).
Explanation: The Dunn Index is the ratio of the minimum inter-cluster distance to the maximum intra-cluster diameter. We want the numerator high (separation) and the denominator low (compactness), so a higher index is better.
38. Which algorithm is an example of Soft Clustering (where points belong to clusters with probabilities)?
A. K-Means
B. DBSCAN
C. Gaussian Mixture Models (GMM)
D. Single Linkage Hierarchical Clustering
Correct Answer: Gaussian Mixture Models (GMM)
Explanation: While K-Means performs hard assignment (a point belongs to Cluster A or B), GMM assigns a probability of belonging to each cluster (e.g., 70% Cluster A, 30% Cluster B).
39. In hierarchical clustering, if we want to stop merging when clusters are too far apart, we can:
A. Increase the learning rate.
B. Cut the dendrogram at a specific height (distance threshold).
C. Set K = 1.
D. Use the Elbow method.
Correct Answer: Cut the dendrogram at a specific height (distance threshold).
Explanation: A horizontal cut across the dendrogram at a specific vertical height determines the number of clusters. Clusters linked above this height are not merged.
40. Which of the following is an application of Anomaly Detection?
A. Customer segmentation for marketing.
B. Credit card fraud detection.
C. Document classification.
D. Predicting house prices.
Correct Answer: Credit card fraud detection.
Explanation: Fraud detection relies on identifying transactions that deviate significantly from typical user behavior patterns, which is the core definition of anomaly detection.
41. Calculate the squared Euclidean distance between (1, 2, 3) and (2, 3, 4).
A. 1
B. √3
C. 3
D. 9
Correct Answer: 3
Explanation: Squared Euclidean distance = (2 − 1)² + (3 − 2)² + (4 − 3)² = 1 + 1 + 1 = 3.
42. How does the Average Linkage criterion calculate the distance between Cluster A and Cluster B?
A. Distance between centroids.
B. Average of all pairwise distances between points in A and points in B.
C. Average of the minimum and maximum distances.
D. Distance between the two most central points.
Correct Answer: Average of all pairwise distances between points in A and points in B.
Explanation: Average linkage considers all pairs of points (one from each cluster) and computes the average of these distances. It is a compromise between Single and Complete linkage.
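The three linkage criteria compared across questions 12, 27, and 42 differ only in how they aggregate the cross-cluster pairwise distances. A minimal sketch (1-D points and my own function names, for readability):

```python
def single_link(A, B, dist):
    # Minimum distance over all cross-cluster pairs (prone to chaining)
    return min(dist(a, b) for a in A for b in B)

def complete_link(A, B, dist):
    # Maximum distance over all cross-cluster pairs (sensitive to outliers)
    return max(dist(a, b) for a in A for b in B)

def average_link(A, B, dist):
    # Mean of all pairwise cross-cluster distances (a compromise)
    return sum(dist(a, b) for a in A for b in B) / (len(A) * len(B))

d = lambda a, b: abs(a - b)
A, B = [0, 1], [4, 6]
# Pairwise distances: 4, 6, 3, 5
print(single_link(A, B, d), complete_link(A, B, d), average_link(A, B, d))  # 3 6 4.5
```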
43. Which of the following distance metrics satisfies the Triangle Inequality?
A. Euclidean Distance
B. Squared Euclidean Distance
C. Kullback-Leibler Divergence
D. Cosine Similarity
Correct Answer: Euclidean Distance
Explanation: Euclidean distance is a true metric satisfying non-negativity, identity, symmetry, and the triangle inequality (d(x, z) ≤ d(x, y) + d(y, z)). Squared Euclidean and Cosine Similarity do not satisfy the triangle inequality, and KL Divergence is not even symmetric.
44. In the context of K-Means, if K equals the number of data points N, what is the value of Inertia?
A. Infinity
B. 1
C. 0
D. N
Correct Answer: 0
Explanation: If every point is its own cluster centroid, the distance from every point to its centroid is 0. Therefore, the sum of squared distances (Inertia) is 0.
45. Which of the following suggests that a dataset has no cluster tendency (i.e., data is uniformly distributed)?
A. A Hopkins statistic close to 0.5.
B. A Hopkins statistic close to 0 or 1 (depending on definition; usually towards 1 implies clustering).
C. A high Silhouette score.
D. A distinct Elbow in the Inertia plot.
Correct Answer: A Hopkins statistic close to 0.5.
Explanation: The Hopkins statistic measures cluster tendency. Values near 0 or 1 (depending on implementation specifics; often near 1 means highly clustered) indicate structure. A value of 0.5 indicates the data is uniformly distributed (random).
46. In DBSCAN, what is a Border Point?
A. A point with fewer than MinPts neighbors that lies within the ε radius of a Core Point.
B. A point with more than MinPts neighbors.
C. A noise point.
D. The geometric center of the cluster.
Correct Answer: A point with fewer than MinPts neighbors that lies within the ε radius of a Core Point.
Explanation: A border point is not dense enough to be a core point itself (fewer than MinPts points in its ε-neighborhood) but falls within the ε-neighborhood of a core point, so it is included in the cluster.
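The core/border/noise definitions from questions 16, 32, and 46 can be applied directly, without running full DBSCAN. A sketch on an invented toy dataset (three dense points, one fringe point, one isolated point):

```python
def classify(points, eps, min_pts):
    # Label each point 'core', 'border', or 'noise' per the DBSCAN definitions
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
    # eps-neighborhood of each point (includes the point itself)
    nbrs = [[q for q in points if dist(p, q) <= eps] for p in points]
    core = [len(n) >= min_pts for n in nbrs]
    labels = []
    for i in range(len(points)):
        if core[i]:
            labels.append('core')       # dense enough on its own
        elif any(core[points.index(q)] for q in nbrs[i]):
            labels.append('border')     # not dense, but near a core point
        else:
            labels.append('noise')      # neither core nor reachable from one
    return labels

pts = [(0, 0), (0, 1), (1, 0), (2, 0), (10, 10)]
print(classify(pts, eps=1.5, min_pts=3))
# ['core', 'core', 'core', 'border', 'noise']
```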
47. Which distance metric is scale-invariant and accounts for the correlations between variables?
A. Euclidean Distance
B. Mahalanobis Distance
C. Manhattan Distance
D. Chebyshev Distance
Correct Answer: Mahalanobis Distance
Explanation: Mahalanobis distance measures distance relative to the centroid and the covariance matrix of the data. It effectively transforms the data to standardized uncorrelated variables before calculating Euclidean distance.
48. When interpreting a Silhouette plot, if most bars are short and some have negative scores, the clustering is:
A. Excellent
B. Appropriate
C. Poor/Suboptimal
D. Overfitted
Correct Answer: Poor/Suboptimal
Explanation: Short bars indicate low cohesion/separation, and negative scores indicate misclassification. This suggests the configuration (number of clusters or algorithm) is poor.
49. Which unsupervised learning technique is best suited for reducing the number of variables while retaining variance, often used before clustering?
A. Principal Component Analysis (PCA)
B. Linear Regression
C. Random Forest
D. DBSCAN
Correct Answer: Principal Component Analysis (PCA)
Explanation: PCA is a dimensionality reduction technique. It is often used before clustering to reduce noise and the curse of dimensionality by projecting data onto principal components.
50. If you are using K-Means to cluster customers based on 'Annual Income' (range 10k-100k) and 'Age' (range 20-70), what happens if you do not normalize?
A. Age will dominate the clustering.
B. Annual Income will dominate the clustering.
C. Both will contribute equally.
D. The algorithm will fail to run.
Correct Answer: Annual Income will dominate the clustering.
Explanation: Since Euclidean distance sums squared differences, the feature with the larger magnitude (Income ~100,000) will yield much larger distances than Age (~70), causing the algorithm to cluster primarily based on Income.
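This dominance effect is easy to demonstrate with invented customers; min-max scaling into [0, 1] is one common fix (a sketch, with illustrative names and made-up feature ranges matching the question):

```python
def euclid(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

# (income, age): c1 and c2 differ hugely in age, barely in income;
# c3 has nearly the same age as c1 but a very different income.
c1, c2 = (50_000, 20), (50_500, 70)
c3 = (90_000, 21)

print(euclid(c1, c2))  # ≈ 502: the 500 income gap dwarfs the 50-year age gap
print(euclid(c1, c3))  # ≈ 40000: income swamps everything

def minmax(v, lo, hi):
    # Rescale a value into [0, 1] given the feature's range
    return (v - lo) / (hi - lo)

scale = lambda c: (minmax(c[0], 10_000, 100_000), minmax(c[1], 20, 70))
# After scaling, c1 is far closer to c3 (similar age, moderate income gap)
# than to c2 (maximal age gap) -- the opposite of the raw-distance result.
print(euclid(scale(c1), scale(c2)), euclid(scale(c1), scale(c3)))
```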