1. What is the primary characteristic of Unsupervised Learning?
A. The algorithm trains on labeled data
B. The algorithm trains on data without labels
C. The algorithm uses a feedback loop for rewards
D. The algorithm predicts a continuous numerical value
Correct Answer: The algorithm trains on data without labels
Explanation: Unsupervised learning deals with input data that does not have corresponding output labels, aiming to find hidden structures.
2. Which of the following is a primary goal of clustering?
A. To predict future values based on past trends
B. To group similar data points together
C. To classify images into predefined categories
D. To reduce the noise in a signal
Correct Answer: To group similar data points together
Explanation: Clustering aims to partition data such that points within a group are similar to each other and different from points in other groups.
3. In the context of K-Means, what does 'K' represent?
A. The number of data points
B. The number of iterations
C. The number of clusters
D. The dimension of the features
Correct Answer: The number of clusters
Explanation: K represents the pre-defined number of clusters the algorithm attempts to identify in the dataset.
4. What kind of problem is K-Means designed to solve?
A. Regression
B. Classification
C. Clustering
D. Reinforcement Learning
Correct Answer: Clustering
Explanation: K-Means is a popular unsupervised algorithm used for clustering analysis.
5. What is a 'centroid' in the K-Means algorithm?
A. An outlier in the dataset
B. The geometric center of a cluster
C. The boundary line between clusters
D. The data point furthest from the center
Correct Answer: The geometric center of a cluster
Explanation: A centroid represents the mean position of all the data points belonging to a specific cluster.
6. Which distance metric is most commonly used in standard K-Means?
A. Manhattan distance
B. Euclidean distance
C. Cosine similarity
D. Hamming distance
Correct Answer: Euclidean distance
Explanation: Standard K-Means typically minimizes the within-cluster sum of squared Euclidean distances.
7. What is the first step of the K-Means algorithm?
A. Assign points to the nearest cluster
B. Update the centroids
C. Initialize cluster centroids
D. Calculate the total error
Correct Answer: Initialize cluster centroids
Explanation: The algorithm begins by selecting K initial centroids, either randomly or using specific heuristics.
8. During the assignment step of K-Means, how is a data point assigned to a cluster?
A. To the cluster with the most points
B. To the cluster with the closest centroid
C. Randomly
D. To the cluster with the highest variance
Correct Answer: To the cluster with the closest centroid
Explanation: Each data point is assigned to the cluster whose centroid is nearest to it based on the distance metric.
9. What happens during the update step of the K-Means algorithm?
A. New data points are added
B. Centroids are moved to the mean of their assigned points
C. The number of clusters (K) is increased
D. Points are reassigned to different clusters
Correct Answer: Centroids are moved to the mean of their assigned points
Explanation: After points are assigned, the new position of the centroid is calculated as the average (mean) of all points currently in that cluster.
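The assignment and update steps described in questions 8 and 9 can be sketched as one iteration in plain Python. This is a minimal illustration, not a reference implementation: the function names `assign` and `update`, the toy points, and the starting centroids are all assumptions made for the example.

```python
import math

def assign(points, centroids):
    """Assignment step: index of the nearest centroid (Euclidean) for each point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            # Empty cluster: keeping the old centroid is one simple choice (an assumption).
            new_centroids.append(old)
        else:
            new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
    return new_centroids

# Hypothetical toy data: two groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(1.0, 1.0), (9.0, 1.0)]

labels = assign(points, centroids)            # which centroid each point is closest to
centroids = update(points, labels, centroids)  # centroids move to the cluster means
print(labels, centroids)
```

Repeating these two calls until the labels stop changing gives the full algorithm.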
10. When does the K-Means algorithm stop iterating?
A. When the centroids do not change significantly
B. After exactly 10 iterations
C. When the training error is zero
D. When K is equal to N
Correct Answer: When the centroids do not change significantly
Explanation: Convergence is reached when centroids stabilize (don't move) or point assignments stop changing.
11. What is the optimization objective (cost function) of K-Means?
A. Maximize Inter-cluster distance
B. Minimize Within-Cluster Sum of Squares (WCSS)
C. Maximize the Silhouette score
D. Minimize the number of clusters
Correct Answer: Minimize Within-Cluster Sum of Squares (WCSS)
Explanation: K-Means tries to minimize the sum of squared distances between data points and their respective cluster centroids (Inertia).
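As a rough sketch of the objective in question 11, WCSS can be computed directly from points, centroids, and cluster labels. The helper name `wcss` and the toy data below are assumptions for illustration only.

```python
import math

def wcss(points, centroids, labels):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Hypothetical toy clustering: every point sits at distance 1 from its centroid.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]

total = wcss(points, centroids, labels)
print(total)
```

Passing every point as its own centroid (K equal to N) makes every distance zero, so the same function returns 0 in that case.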
12. The objective function of K-Means is non-convex. What does this imply?
A. It always finds the global minimum
B. It may get stuck in a local minimum
C. It cannot be optimized
D. It requires labeled data
Correct Answer: It may get stuck in a local minimum
Explanation: Because the function is non-convex, the final result depends on the initialization, and it is not guaranteed to find the absolute best clustering.
13. Which of the following is a disadvantage of the K-Means algorithm?
A. It is computationally very expensive for small datasets
B. It is sensitive to outliers
C. It works only on labeled data
D. It cannot handle numerical data
Correct Answer: It is sensitive to outliers
Explanation: Outliers can significantly shift the mean (centroid), affecting the assignment of other points and distorting the clusters.
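A small numeric illustration of the outlier effect from question 13, using a one-dimensional cluster. The values are made up for the example.

```python
from statistics import mean

# Hypothetical 1-D cluster and the same cluster with one extreme value added.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
with_outlier = values + [100.0]

centroid_before = mean(values)        # centroid of the clean cluster
centroid_after = mean(with_outlier)   # a single outlier drags the mean far away

print(centroid_before, round(centroid_after, 2))
```

The centroid moves from 3.0 to roughly 19.17, well outside the range where five of the six points lie.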
14. If you set K equal to the number of data points (N), what will the WCSS be?
A. Infinity
B. Zero
C. Maximum possible value
D. Undefined
Correct Answer: Zero
Explanation: If every point is its own cluster, the distance between the point and its centroid is zero, resulting in a total WCSS of zero.
15. What is the 'Elbow Method' used for?
A. Initializing centroids
B. Speeding up convergence
C. Determining the optimal number of clusters (K)
D. Handling outliers
Correct Answer: Determining the optimal number of clusters (K)
Explanation: The Elbow Method plots WCSS against K to find the point where adding more clusters yields diminishing returns.
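The Elbow Method is usually read off a plot by eye, but the idea of diminishing returns can be approximated numerically. The sketch below uses one possible heuristic (the largest second difference of the WCSS curve); this heuristic, the function name `elbow_k`, and the WCSS values are assumptions, not part of the method as described above.

```python
def elbow_k(wcss_by_k):
    """wcss_by_k[i] is the WCSS obtained with K = i + 1.

    Returns the K where the drop in WCSS slows down the most
    (largest second difference), or None if fewer than 3 values are given.
    """
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(wcss_by_k) - 1):
        drop_before = wcss_by_k[i - 1] - wcss_by_k[i]
        drop_after = wcss_by_k[i] - wcss_by_k[i + 1]
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k

# Hypothetical WCSS values for K = 1..6.
wcss_by_k = [1000.0, 600.0, 150.0, 120.0, 100.0, 90.0]
print(elbow_k(wcss_by_k))
```

On these made-up values the bend is sharpest at K = 3. Real curves are often smoother, in which case a heuristic like this is only a starting point.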
16. In the Elbow Method plot, what is typically on the Y-axis?
A. Number of clusters (K)
B. Accuracy
C. Inertia or WCSS
D. Time taken
Correct Answer: Inertia or WCSS
Explanation: The Y-axis represents the cost (Within-Cluster Sum of Squares), while the X-axis represents the number of clusters.
17. What is the 'Random Initialization Trap' in K-Means?
A. Choosing K randomly leads to errors
B. Randomly picking centroids can lead to poor local optima
C. Random data points cannot be clustered
D. The algorithm fails if data is random
Correct Answer: Randomly picking centroids can lead to poor local optima
Explanation: Poor random choices for initial centroids can result in sub-optimal clustering or slower convergence.
18. What is K-Means++?
A. A version of K-Means for supervised learning
B. A method to choose the optimal K
C. A smarter initialization technique for K-Means
D. A post-processing step for K-Means
Correct Answer: A smarter initialization technique for K-Means
Explanation: K-Means++ chooses initial centroids that tend to be far apart from each other, which typically improves convergence speed and result quality.
19. How does K-Means++ select the first centroid?
A. It calculates the global mean
B. It chooses the point furthest from the origin
C. It picks one data point uniformly at random
D. It picks the point with the highest variance
Correct Answer: It picks one data point uniformly at random
Explanation: The first centroid is chosen uniformly at random; each subsequent centroid is sampled with probability proportional to the squared distance from the nearest centroid already chosen.
20. What is the difference between Hard Clustering and Soft Clustering?
A. Hard clustering is faster; Soft is slower
B. Hard clustering allows overlapping; Soft does not
C. Hard clustering assigns a point to one cluster; Soft assigns probabilities
D. Hard clustering uses K-Means; Soft uses Decision Trees
Correct Answer: Hard clustering assigns a point to one cluster; Soft assigns probabilities
Explanation: In hard clustering, a point belongs to exactly one cluster. In soft clustering, a point has a degree of membership to all clusters.
21. Standard K-Means is an example of which type of clustering?
A. Soft Clustering
B. Hard Clustering
C. Hierarchical Clustering
D. Density-based Clustering
Correct Answer: Hard Clustering
Explanation: Standard K-Means assigns each point to the specific cluster with the nearest centroid, implying binary membership.
22. Which algorithm is a well-known example of Soft Clustering?
A. K-Means
B. Fuzzy C-Means
C. DBSCAN
D. Agglomerative Clustering
Correct Answer: Fuzzy C-Means
Explanation: Fuzzy C-Means allows data points to belong to multiple clusters with varying degrees of membership.
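To make the idea of degrees of membership concrete, here is a minimal sketch of the usual Fuzzy C-Means membership formula for a single point, given fixed centroids. The function name, the fuzzifier default `m=2.0`, and the toy coordinates are assumptions for this example; a full Fuzzy C-Means run would also update the centroids.

```python
import math

def fuzzy_memberships(point, centroids, m=2.0):
    """Membership of `point` in each cluster: u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1))."""
    dists = [math.dist(point, c) for c in centroids]
    if any(d == 0.0 for d in dists):
        # Point coincides with one or more centroids: split full membership among them.
        zeros = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [z / sum(zeros) for z in zeros]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]

# Hypothetical point at distance 1 from the first centroid and 2 from the second.
u = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
hard_label = max(range(len(u)), key=u.__getitem__)  # hard clustering keeps only the argmax

print([round(x, 3) for x in u], hard_label)
```

The memberships sum to 1 for the point, and taking the largest one recovers a hard assignment.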
23. If a data point has a membership vector [0.7, 0.2, 0.1] for 3 clusters, this is an example of:
A. Hard Clustering
B. Soft Clustering
C. Regression
D. Outlier Detection
Correct Answer: Soft Clustering
Explanation: The vector indicates probabilities or weights of belonging to different clusters, characteristic of soft clustering.
24. What shape of clusters does K-Means typically assume?
A. Arbitrary shapes
B. Spherical or convex
C. Elongated shapes
D. Spirals
Correct Answer: Spherical or convex
Explanation: Because it relies on Euclidean distance and means, K-Means works best on spherical, convex clusters.
25. Why is feature scaling (standardization/normalization) important in K-Means?
A. It is not important
B. To ensure the algorithm runs faster only
C. To prevent features with larger ranges from dominating the distance metric
D. To convert categorical data to numerical
Correct Answer: To prevent features with larger ranges from dominating the distance metric
Explanation: Since K-Means uses distance, a feature with a range of 0-1000 will overpower a feature with a range of 0-1 if not scaled.
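A minimal sketch of standardization (z-scoring each feature) before clustering, showing how the large-range feature dominates the raw distance. The helper name `standardize` and the three rows are assumptions for illustration.

```python
import math
from statistics import mean, pstdev

def standardize(rows):
    """Rescale every column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [tuple((v - mu) / s for v, mu, s in zip(row, mus, sigmas)) for row in rows]

# Hypothetical rows: first feature spans 0-1000, second spans 0-1.
rows = [(0.0, 0.1), (1000.0, 0.9), (500.0, 0.5)]
scaled = standardize(rows)

raw_distance = math.dist(rows[0], rows[1])         # almost entirely due to feature 1
scaled_distance = math.dist(scaled[0], scaled[1])  # both features contribute equally
print(round(raw_distance, 2), round(scaled_distance, 2))
```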
26. What is the computational complexity of one iteration of K-Means?
A. O(N^2)
B. O(K N d)
C. O(e^N)
D. O(N * log N)
Correct Answer: O(K N d)
Explanation: Here K is the number of clusters, N is the number of data points, and d is the number of dimensions. Each of the N points is compared against K centroids in d dimensions, so the cost is linear in N.
27. In the Elbow method, the 'elbow' point represents:
A. The point of maximum error
B. The point where adding another cluster does not significantly reduce WCSS
C. The point where WCSS becomes zero
D. The point where K equals 1
Correct Answer: The point where adding another cluster does not significantly reduce WCSS
Explanation: It indicates the optimal trade-off between minimizing error and minimizing model complexity (number of clusters).
28. Which of the following implies that K-Means has converged?
A. The assignment of points to clusters remains unchanged
B. WCSS increases
C. The number of clusters decreases
D. The data becomes labeled
Correct Answer: The assignment of points to clusters remains unchanged
Explanation: If point assignments don't change, centroids won't change, and the algorithm has reached a stable state.
29. What is 'Inertia' in the context of Scikit-Learn's K-Means implementation?
A. The time taken to run
B. The sum of squared distances of samples to their closest cluster center
C. The distance between cluster centers
D. The number of iterations
Correct Answer: The sum of squared distances of samples to their closest cluster center
Explanation: Inertia is the specific term used in Scikit-Learn for WCSS (Within-Cluster Sum of Squares).
30. Which strategy is used to mitigate the local optima problem in K-Means?
A. Decrease the learning rate
B. Run the algorithm multiple times with different initializations
C. Increase the number of clusters
D. Use Manhattan distance
Correct Answer: Run the algorithm multiple times with different initializations
Explanation: Running the algorithm multiple times (the n_init parameter in Scikit-Learn) and keeping the result with the lowest WCSS reduces the risk of ending in a poor local optimum.
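The restart strategy can be sketched in plain Python as follows. This is an illustrative stand-in for what a library does with a setting like n_init, not Scikit-Learn's actual code; the names `kmeans` and `best_of`, the seed, and the toy data are assumptions.

```python
import math
import random

def kmeans(points, k, rng, max_iter=100):
    """One K-Means run from a random initialization; returns (wcss, centroids, labels)."""
    centroids = rng.sample(points, k)  # random initial centroids drawn from the data
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:       # assignments unchanged: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                # leave the centroid in place if its cluster is empty
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    wcss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return wcss, centroids, labels

def best_of(points, k, n_init=10, seed=0):
    """Run K-Means n_init times and keep the run with the lowest WCSS."""
    rng = random.Random(seed)
    return min((kmeans(points, k, rng) for _ in range(n_init)), key=lambda run: run[0])

# Hypothetical data: two small, well-separated blobs.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
wcss, centroids, labels = best_of(points, k=2)
print(round(wcss, 4), labels)
```

On data this clean every run lands on the same answer; the benefit of restarts shows up on harder datasets where individual runs disagree.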
31. Can K-Means handle categorical data directly?
A. Yes, it works natively
B. No, it requires numerical data
C. Only if the data is ordinal
D. Yes, using Hamming distance
Correct Answer: No, it requires numerical data
Explanation: Standard K-Means relies on means and Euclidean distance, which are undefined for categorical data (though K-Modes exists for that).
32. In K-Means++, how is the probability of selecting the next centroid determined?
A. Inversely proportional to the distance from existing centroids
B. Proportional to the squared distance from the nearest existing centroid
C. Randomly with uniform distribution
D. Based on the density of the points
Correct Answer: Proportional to the squared distance from the nearest existing centroid
Explanation: This ensures that new centroids are likely to be far away from existing ones, spreading them out.
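The seeding rule from questions 19 and 32 can be sketched as below. The function name `kmeans_pp_init`, the seed, and the sample points are assumptions; the sketch also assumes the data contains at least k distinct points.

```python
import math
import random

def kmeans_pp_init(points, k, seed=0):
    """K-Means++ style seeding: first centroid uniform at random, the rest D^2-weighted."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: uniform over the data points
    while len(centroids) < k:
        # Weight of each point = squared distance to its nearest chosen centroid.
        # Already-chosen points get weight 0, so they cannot be picked again.
        weights = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        # Note: random.choices raises ValueError if all weights are zero
        # (fewer than k distinct points), which this sketch does not handle.
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids

# Hypothetical data: three loose groups.
points = [(0.0, 0.0), (0.5, 0.0), (10.0, 0.0), (10.5, 0.0), (0.0, 10.0), (0.0, 10.5)]
init = kmeans_pp_init(points, k=3)
print(init)
```

Because far-away points carry much larger weights, the chosen centroids usually land in different groups, though the sampling is still random.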
33. What is a 'Voronoi Diagram' in relation to K-Means?
A. A plot of the cost function
B. A visualization where regions are defined by the closest centroid
C. A method to initialize K
D. A type of soft clustering
Correct Answer: A visualization where regions are defined by the closest centroid
Explanation: The partitions created by K-Means can be visualized as Voronoi cells, separating the space based on distance to centroids.
34. If the clusters in the data are of very different densities and sizes, K-Means will:
A. Perform perfectly
B. Likely fail to identify the correct clusters
C. Automatically adjust the metric
D. Merge the clusters
Correct Answer: Likely fail to identify the correct clusters
Explanation: K-Means assumes clusters are roughly spherical and of similar size/density; it struggles with varying densities.
35. Which step ensures K-Means is an unsupervised algorithm?
A. Calculating the mean
B. Not using target labels for training
C. Iterating until convergence
D. Minimizing WCSS
Correct Answer: Not using target labels for training
Explanation: The defining feature is that it structures the data based on intrinsic properties rather than external labels.
36. In the equation for WCSS, what is being squared?
A. The number of clusters
B. The distance between a point and its assigned centroid
C. The distance between two centroids
D. The number of iterations
Correct Answer: The distance between a point and its assigned centroid
Explanation: WCSS sums the squared Euclidean distances between points and their cluster centers.
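For reference, the WCSS equation mentioned in question 36 is commonly written as follows. The symbols are notation assumed here, since the quiz does not define any: C_k is the set of points in cluster k, mu_k is its centroid, and x_i is a data point.

```latex
\mathrm{WCSS} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2
```

The quantity being squared is the Euclidean distance between each point and the centroid of the cluster it is assigned to.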
37. Why is it often difficult to pick the optimal K using the Elbow method?
A. The plot is always a straight line
B. The 'elbow' might not be sharp or clear
C. It requires labeled data
D. It takes too long to compute
Correct Answer: The 'elbow' might not be sharp or clear
Explanation: Sometimes the curve is smooth, making the choice of the 'elbow' point subjective.
38. What is the primary role of the 'Coordinate Descent' concept in K-Means?
A. It is the method used to optimize the objective function
B. It is used for initialization
C. It is used to visualize data
D. It calculates the distance
Correct Answer: It is the method used to optimize the objective function
Explanation: K-Means optimizes the cost function by alternating between two steps (assignment and update), effectively performing coordinate descent.
39. If you perform K-Means on a dataset with 2 distinct well-separated blobs but set K=4, what happens?
A. The algorithm crashes
B. It finds 2 clusters and ignores the other 2
C. It splits the natural blobs into smaller clusters
D. It merges the blobs
Correct Answer: It splits the natural blobs into smaller clusters
Explanation: The algorithm is forced to find 4 clusters, so it will partition the natural blobs to satisfy the requirement.
40. In Soft Clustering, the sum of membership weights for a single data point across all clusters usually equals:
A.
B. 1
C. 100
D. K
Correct Answer: 1
Explanation: The weights represent probabilities or proportions, so they must sum to 1 for a given data point.
41. Which of the following is NOT an application of K-Means?
A. Customer Segmentation
B. Image Compression (Color Quantization)
C. Spam Classification (Supervised)
D. Document Clustering
Correct Answer: Spam Classification (Supervised)
Explanation: Spam classification is typically a supervised learning task (e.g., Naive Bayes, SVM), not clustering.
42. Does K-Means guarantee finding the global optimum for the WCSS?
A. Yes, always
B. No, it depends on initialization
C. Yes, if K is small
D. Only if using Manhattan distance
Correct Answer: No, it depends on initialization
Explanation: K-Means converges to a local optimum, which is why multiple initializations are often used.
43. The computational cost of the distance calculation step for one point against K centroids is proportional to:
A. K
B. N
C. N^2
D. 1
Correct Answer: K
Explanation: For one point, you must calculate the distance to each of the K centroids.
44. Which component constitutes the 'model' after training K-Means?
A. The original dataset
B. The coordinates of the final centroids
C. The list of outliers
D. The Elbow plot
Correct Answer: The coordinates of the final centroids
Explanation: The centroids define the clusters; new data can be assigned to clusters based on these centroid locations.
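Since the fitted model is just the centroid coordinates, assigning a new point needs nothing else. A minimal sketch, where the function name `predict` and the centroid values are hypothetical:

```python
import math

def predict(centroids, point):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

# Hypothetical centroids from an earlier training run; the training data is not needed.
centroids = [(0.0, 1.0), (10.0, 1.0)]

first = predict(centroids, (2.0, 3.0))
second = predict(centroids, (8.0, -1.0))
print(first, second)
```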
45. What is the relationship between Within-Cluster variance and Between-Cluster variance in a good clustering?
A. High within-cluster, Low between-cluster
B. Low within-cluster, High between-cluster
C. High within-cluster, High between-cluster
D. Low within-cluster, Low between-cluster
Correct Answer: Low within-cluster, High between-cluster
Explanation: Good clusters have points tight together (low internal variance) and far apart from other clusters (high external variance).
46. Lloyd's Algorithm is another name for:
A. K-Means Algorithm
B. Hierarchical Clustering
C. DBSCAN
D. KNN
Correct Answer: K-Means Algorithm
Explanation: Standard K-Means is frequently referred to as Lloyd's algorithm.
47. In the context of image segmentation, what does a pixel represent in K-Means?
A. A cluster
B. A centroid
C. A data point
D. A label
Correct Answer: A data point
Explanation: Each pixel (often represented by RGB values) is treated as a data point to be clustered based on color similarity.
48. Why might one choose a K value slightly different from the Elbow point?
A. To increase computational cost
B. Based on business requirements or downstream tasks
C. Because the Elbow method is always wrong
D. To maximize WCSS
Correct Answer: Based on business requirements or downstream tasks
Explanation: Domain knowledge (e.g., needing exactly 3 t-shirt sizes: S, M, L) often overrides the purely mathematical suggestion of the Elbow method.
49. If K=1, the centroid location will be:
A. The origin (0,0)
B. The mean of the entire dataset
C. A random data point
D. Undefined
Correct Answer: The mean of the entire dataset
Explanation: With one cluster, the centroid is the point that minimizes the sum of squared distances to all points, which is the arithmetic mean of the whole dataset.
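A short derivation of why the single centroid is the dataset mean, using notation assumed for this sketch (x_i are the N data points and mu is the centroid):

```latex
J(\mu) = \sum_{i=1}^{N} \lVert x_i - \mu \rVert^2, \qquad
\nabla_\mu J = -2 \sum_{i=1}^{N} (x_i - \mu) = 0
\;\Longrightarrow\;
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
```

Setting the gradient of the squared-distance cost to zero gives the arithmetic mean, and the same argument applies to each cluster separately when K is larger than 1.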
50. What happens if a cluster becomes empty during K-Means iterations?
A. The algorithm stops
B. The empty cluster is usually re-initialized or removed
C. The K value increases
D. It is ignored and WCSS becomes 0
Correct Answer: The empty cluster is usually re-initialized or removed
Explanation: Implementations typically handle this by resetting the empty cluster's centroid to a random data point or to the point farthest from its currently assigned centroid.
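One of the strategies mentioned in question 50 (moving the empty cluster's centroid to the point farthest from its assigned centroid) might look like the sketch below. The function name, the toy data, and the choice to relabel the moved point are assumptions; real libraries differ in the details.

```python
import math

def reseed_empty_clusters(points, labels, centroids):
    """Give each empty cluster the point that is farthest from its assigned centroid."""
    labels, centroids = list(labels), list(centroids)
    for j in range(len(centroids)):
        if j in labels:
            continue  # cluster j still has members
        far = max(range(len(points)),
                  key=lambda i: math.dist(points[i], centroids[labels[i]]))
        centroids[j] = points[far]  # restart the empty cluster at that point
        labels[far] = j             # and hand the point over to it
    return labels, centroids

# Hypothetical state: every point was assigned to cluster 0, so cluster 1 is empty.
points = [(0.0, 0.0), (1.0, 0.0), (8.0, 0.0)]
labels = [0, 0, 0]
centroids = [(3.0, 0.0), (50.0, 0.0)]

new_labels, new_centroids = reseed_empty_clusters(points, labels, centroids)
print(new_labels, new_centroids)
```

After this repair the usual assignment and update steps continue as normal.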