Unit 1 - Practice Quiz

INT423 (50 Questions)

1 What is the primary characteristic of Unsupervised Learning?

A. The algorithm uses a feedback loop for rewards
B. The algorithm trains on data without labels
C. The algorithm predicts a continuous numerical value
D. The algorithm trains on labeled data

2 Which of the following is a primary goal of clustering?

A. To reduce the noise in a signal
B. To group similar data points together
C. To classify images into predefined categories
D. To predict future values based on past trends

3 In the context of K-Means, what does 'K' represent?

A. The dimension of the features
B. The number of clusters
C. The number of iterations
D. The number of data points

4 What kind of problem is K-Means designed to solve?

A. Clustering
B. Classification
C. Reinforcement Learning
D. Regression

5 What is a 'centroid' in the K-Means algorithm?

A. An outlier in the dataset
B. The boundary line between clusters
C. The data point furthest from the center
D. The geometric center of a cluster

6 Which distance metric is most commonly used in standard K-Means?

A. Manhattan distance
B. Hamming distance
C. Euclidean distance
D. Cosine similarity

7 What is the first step of the K-Means algorithm?

A. Initialize cluster centroids
B. Assign points to the nearest cluster
C. Calculate the total error
D. Update the centroids

8 During the assignment step of K-Means, how is a data point assigned to a cluster?

A. Randomly
B. To the cluster with the highest variance
C. To the cluster with the closest centroid
D. To the cluster with the most points

9 What happens during the update step of the K-Means algorithm?

A. Centroids are moved to the mean of their assigned points
B. The number of clusters (K) is increased
C. Points are reassigned to different clusters
D. New data points are added
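
Questions 7 through 9 walk through the three phases of the algorithm: initialize, assign, update. As a minimal sketch (illustrative NumPy code on toy blob data, not a library implementation), one run of Lloyd's algorithm looks like:

```python
import numpy as np

# Toy data: three well-separated blobs (all names here are illustrative).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in [(0, 0), (4, 4), (0, 4)]])
K = 3
centroids = X[rng.choice(len(X), K, replace=False)]    # step 1: initialize

for _ in range(100):
    # step 2 (assignment): each point joins the cluster with the closest centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # step 3 (update): move each centroid to the mean of its assigned points
    new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    if np.allclose(new_centroids, centroids):          # stop: centroids settled
        break
    centroids = new_centroids
```

The loop exits when the centroids stop moving, which is the convergence condition asked about in question 10.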

10 When does the K-Means algorithm stop iterating?

A. When the training error is zero
B. When the centroids do not change significantly
C. When K is equal to N
D. After exactly 10 iterations

11 What is the optimization objective (cost function) of K-Means?

A. Maximize Inter-cluster distance
B. Minimize Within-Cluster Sum of Squares (WCSS)
C. Maximize the Silhouette score
D. Minimize the number of clusters
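
The WCSS objective from question 11 can be computed directly. A toy sketch (hand-picked data and hypothetical names) with two pre-assigned clusters:

```python
import numpy as np

# Each squared term is the distance from a point to the centroid of its
# OWN cluster; the sum over all points is the WCSS that K-Means minimizes.
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.9]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])

wcss = sum(((X[labels == k] - centroids[k]) ** 2).sum() for k in range(2))
```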

12 The objective function of K-Means is non-convex. What does this imply?

A. It requires labeled data
B. It may get stuck in a local minimum
C. It cannot be optimized
D. It always finds the global minimum

13 Which of the following is a disadvantage of the K-Means algorithm?

A. It is computationally very expensive for small datasets
B. It works only on labeled data
C. It is sensitive to outliers
D. It cannot handle numerical data

14 If you set K equal to the number of data points (N), what will the WCSS be?

A. Zero
B. Maximum possible value
C. Infinity
D. Undefined

15 What is the 'Elbow Method' used for?

A. Handling outliers
B. Determining the optimal number of clusters (K)
C. Speeding up convergence
D. Initializing centroids

16 In the Elbow Method plot, what is typically on the Y-axis?

A. Inertia or WCSS
B. Time taken
C. Number of clusters (K)
D. Accuracy
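
A hedged sketch of the Elbow method from questions 15 and 16, assuming scikit-learn is available; the synthetic blobs and variable names are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

# Fit K-Means for a range of K values and record the inertia (WCSS);
# a plot would put K on the X-axis and inertia on the Y-axis.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in [(0, 0), (4, 4), (0, 4)]])

inertias = []
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)
```

Inertia always shrinks as K grows, so the point to look for is where the curve bends and extra clusters stop paying off (here, around K = 3 for three blobs).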

17 What is the 'Random Initialization Trap' in K-Means?

A. Choosing K randomly leads to errors
B. Randomly picking centroids can lead to poor local optima
C. The algorithm fails if data is random
D. Random data points cannot be clustered

18 What is K-Means++?

A. A post-processing step for K-Means
B. A method to choose the optimal K
C. A version of K-Means for supervised learning
D. A smarter initialization technique for K-Means

19 How does K-Means++ select the first centroid?

A. It chooses the point furthest from the origin
B. It picks the point with the highest variance
C. It picks one data point uniformly at random
D. It calculates the global mean

20 What is the difference between Hard Clustering and Soft Clustering?

A. Hard clustering allows overlapping; Soft does not
B. Hard clustering assigns a point to one cluster; Soft assigns probabilities
C. Hard clustering is faster; Soft is slower
D. Hard clustering uses K-Means; Soft uses Decision Trees

21 Standard K-Means is an example of which type of clustering?

A. Soft Clustering
B. Density-based Clustering
C. Hard Clustering
D. Hierarchical Clustering

22 Which algorithm is a well-known example of Soft Clustering?

A. Fuzzy C-Means
B. Agglomerative Clustering
C. K-Means
D. DBSCAN

23 If a data point has a membership vector [0.7, 0.2, 0.1] for 3 clusters, this is an example of:

A. Soft Clustering
B. Outlier Detection
C. Hard Clustering
D. Regression

24 What shape of clusters does K-Means typically assume?

A. Elongated shapes
B. Spherical or convex
C. Arbitrary shapes
D. Spirals

25 Why is feature scaling (standardization/normalization) important in K-Means?

A. It is not important
B. To convert categorical data to numerical
C. To prevent features with larger ranges from dominating the distance metric
D. Only to make the algorithm run faster
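
To illustrate the scaling issue from question 25, a sketch using scikit-learn's StandardScaler on toy two-feature data (the values and feature meanings are made up):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# A feature measured in large units (e.g. income in dollars) swamps a
# small-range feature (e.g. age) in Euclidean distance; standardizing
# rescales every column to mean 0 and standard deviation 1.
X = np.array([[30_000.0, 25.0],
              [60_000.0, 45.0],
              [32_000.0, 52.0]])

X_scaled = StandardScaler().fit_transform(X)
```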

26 What is the computational complexity of one iteration of K-Means?

A. O(K N d)
B. O(e^N)
C. O(N^2)
D. O(N * log N)

27 In the Elbow method, the 'elbow' point represents:

A. The point where adding another cluster does not significantly reduce WCSS
B. The point of maximum error
C. The point where K equals 1
D. The point where WCSS becomes zero

28 Which of the following implies that K-Means has converged?

A. The number of clusters decreases
B. WCSS increases
C. The data becomes labeled
D. The assignment of points to clusters remains unchanged

29 What is 'Inertia' in the context of scikit-learn's K-Means implementation?

A. The time taken to run
B. The sum of squared distances of samples to their closest cluster center
C. The number of iterations
D. The distance between cluster centers
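
A small check of scikit-learn's inertia_ attribute on hand-picked points (the data is contrived so the optimum is easy to verify by eye):

```python
import numpy as np
from sklearn.cluster import KMeans

# Two pairs of points, 10 units apart. With this layout the fitted
# centres land midway between each pair, so every point sits 1 unit
# from its centre and the inertia (WCSS) should come out as 4 * 1^2 = 4.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
```

Running with n_init > 1 restarts, as here, is also the standard mitigation for the local-optima problem raised in question 30: the best of several random initializations is kept.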

30 Which strategy is used to mitigate the local optima problem in K-Means?

A. Use Manhattan distance
B. Decrease the learning rate
C. Increase the number of clusters
D. Run the algorithm multiple times with different initializations

31 Can K-Means handle categorical data directly?

A. Yes, using Hamming distance
B. Yes, it works natively
C. Only if the data is ordinal
D. No, it requires numerical data

32 In K-Means++, how is the probability of selecting the next centroid determined?

A. Proportional to the squared distance from the nearest existing centroid
B. Randomly with uniform distribution
C. Based on the density of the points
D. Inversely proportional to the distance from existing centroids
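
The D² weighting from question 32 can be sketched in a few lines of NumPy (an illustrative seeding routine, not scikit-learn's actual implementation):

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """K-Means++ style seeding: first centroid uniform at random, each
    later centroid sampled with probability proportional to the SQUARED
    distance to the nearest centroid chosen so far."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.linalg.norm(X[:, None] - np.array(centroids)[None], axis=2)
        d2 = (d ** 2).min(axis=1)            # squared distance to nearest seed
        probs = d2 / d2.sum()                # D^2 weighting
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids)

rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0], [11.0, 10.0]])
seeds = kmeans_pp_init(X, 2, rng)
```

Because far-away points get large weights, the seeds tend to spread across the data, which is what makes this initialization "smarter" than uniform sampling.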

33 What is a 'Voronoi Diagram' in relation to K-Means?

A. A type of soft clustering
B. A method to initialize K
C. A visualization where regions are defined by the closest centroid
D. A plot of the cost function

34 If the clusters in the data are of very different densities and sizes, K-Means will:

A. Merge the clusters
B. Likely fail to identify the correct clusters
C. Perform perfectly
D. Automatically adjust the metric

35 Which step ensures K-Means is an unsupervised algorithm?

A. Minimizing WCSS
B. Iterating until convergence
C. Not using target labels for training
D. Calculating the mean

36 In the equation for WCSS, what is being squared?

A. The distance between two centroids
B. The number of iterations
C. The distance between a point and its assigned centroid
D. The number of clusters

37 Why is it often difficult to pick the optimal K using the Elbow method?

A. The plot is always a straight line
B. The 'elbow' might not be sharp or clear
C. It takes too long to compute
D. It requires labeled data

38 What is the primary role of the 'Coordinate Descent' concept in K-Means?

A. It is the method used to optimize the objective function
B. It calculates the distance
C. It is used to visualize data
D. It is used for initialization

39 If you perform K-Means on a dataset with 2 distinct well-separated blobs but set K=4, what happens?

A. The algorithm crashes
B. It splits the natural blobs into smaller clusters
C. It finds 2 clusters and ignores the other 2
D. It merges the blobs

40 In Soft Clustering, the sum of membership weights for a single data point across all clusters usually equals:

A. K
B. 1
C. 100
D. 0
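
As a sketch of how soft memberships arise, here is inverse-squared-distance weighting in the style of Fuzzy C-Means with fuzzifier m = 2 (the point and centroids are made up):

```python
import numpy as np

# One point and three candidate cluster centres.
point = np.array([1.0, 1.0])
centroids = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])

# Membership is proportional to 1/d^2, then normalized so the weights
# for this point sum to 1 across all clusters.
d = np.linalg.norm(centroids - point, axis=1)
u = 1.0 / d ** 2
u = u / u.sum()
```

The closest centroid gets the largest weight and the weights sum to 1, matching the membership vectors in questions 23 and 40.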

41 Which of the following is NOT an application of K-Means?

A. Document Clustering
B. Customer Segmentation
C. Image Compression (Color Quantization)
D. Spam Classification (Supervised)

42 Does K-Means guarantee finding the global optimum for the WCSS?

A. No, it depends on initialization
B. Yes, if K is small
C. Yes, always
D. Only if using Manhattan distance

43 The computational cost of the distance calculation step for one point against K centroids is proportional to:

A. 1
B. N^2
C. N
D. K

44 Which component constitutes the 'model' after training K-Means?

A. The Elbow plot
B. The coordinates of the final centroids
C. The list of outliers
D. The original dataset

45 What is the relationship between Within-Cluster variance and Between-Cluster variance in a good clustering?

A. Low within-cluster, High between-cluster
B. Low within-cluster, Low between-cluster
C. High within-cluster, High between-cluster
D. High within-cluster, Low between-cluster

46 Lloyd's Algorithm is another name for:

A. K-Means Algorithm
B. KNN
C. Hierarchical Clustering
D. DBSCAN

47 In the context of image segmentation, what does a pixel represent in K-Means?

A. A centroid
B. A data point
C. A cluster
D. A label
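
A sketch of color quantization with K-Means, assuming scikit-learn; the "image" here is random noise standing in for real pixel data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Every pixel is a 3-dimensional data point (R, G, B). Clustering the
# pixels and replacing each one with its cluster's centroid colour
# leaves an image that uses at most K distinct colours.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3)).astype(float)

pixels = img.reshape(-1, 3)                       # (1024, 3): one row per pixel
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(pixels)
quantized = km.cluster_centers_[km.labels_].reshape(img.shape)
```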

48 Why might one choose a K value slightly different from the Elbow point?

A. To increase computational cost
B. Based on business requirements or downstream tasks
C. Because the Elbow method is always wrong
D. To maximize WCSS

49 If K=1, the centroid location will be:

A. The origin (0,0)
B. A random data point
C. Undefined
D. The mean of the entire dataset
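
A quick check of the K = 1 case: with a single cluster, the update step averages every point, so the centroid is simply the dataset mean.

```python
import numpy as np

# With K = 1, every point is assigned to the one cluster, and the
# update step places the lone centroid at the mean of all the data.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]])
centroid = X.mean(axis=0)
```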

50 What happens if a cluster becomes empty during K-Means iterations?

A. The empty cluster is usually re-initialized or removed
B. It is ignored and WCSS becomes 0
C. The algorithm stops
D. The K value increases