Unit 1 - Practice Quiz

INT423 50 Questions

1 What is the primary characteristic of Unsupervised Learning?

A. The algorithm trains on data without labels
B. The algorithm trains on labeled data
C. The algorithm predicts a continuous numerical value
D. The algorithm uses a feedback loop for rewards

2 Which of the following is a primary goal of clustering?

A. To group similar data points together
B. To reduce the noise in a signal
C. To predict future values based on past trends
D. To classify images into predefined categories

3 In the context of K-Means, what does 'K' represent?

A. The number of clusters
B. The number of data points
C. The number of iterations
D. The dimension of the features

4 What kind of problem is K-Means designed to solve?

A. Classification
B. Clustering
C. Reinforcement Learning
D. Regression

5 What is a 'centroid' in the K-Means algorithm?

A. An outlier in the dataset
B. The geometric center of a cluster
C. The data point furthest from the center
D. The boundary line between clusters

6 Which distance metric is most commonly used in standard K-Means?

A. Hamming distance
B. Cosine similarity
C. Manhattan distance
D. Euclidean distance

7 What is the first step of the K-Means algorithm?

A. Assign points to the nearest cluster
B. Initialize cluster centroids
C. Calculate the total error
D. Update the centroids

8 During the assignment step of K-Means, how is a data point assigned to a cluster?

A. To the cluster with the closest centroid
B. To the cluster with the most points
C. Randomly
D. To the cluster with the highest variance

9 What happens during the update step of the K-Means algorithm?

A. Centroids are moved to the mean of their assigned points
B. The number of clusters (K) is increased
C. Points are reassigned to different clusters
D. New data points are added
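
Note on questions 8 and 9: the assignment and update steps can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation. The function names, the toy points, and the starting centroids are made up for the example, and an empty cluster simply keeps its old centroid, which is a simplifying assumption.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_step(points, centroids):
    """One K-Means iteration: assign every point, then move every centroid."""
    # Assignment step: index of the closest centroid for each point.
    labels = [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]
    # Update step: each centroid becomes the mean of the points assigned to it.
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Simplification (assumption): an empty cluster keeps its old centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return labels, new_centroids


# Toy data: two pairs of points, far apart along the first axis.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(1.0, 1.0), (9.0, 1.0)]
labels, new_centroids = kmeans_step(points, centroids)
print(labels)         # [0, 0, 1, 1]
print(new_centroids)  # [(0.0, 1.0), (10.0, 1.0)]
```

Calling kmeans_step repeatedly until the labels stop changing gives the full loop that question 10 asks about.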

10 When does the K-Means algorithm stop iterating?

A. When K is equal to N
B. When the centroids do not change significantly
C. When the training error is zero
D. After exactly 10 iterations

11 What is the optimization objective (cost function) of K-Means?

A. Minimize the number of clusters
B. Minimize Within-Cluster Sum of Squares (WCSS)
C. Maximize Inter-cluster distance
D. Maximize the Silhouette score

12 The objective function of K-Means is non-convex. What does this imply?

A. It cannot be optimized
B. It always finds the global minimum
C. It requires labeled data
D. It may get stuck in a local minimum

13 Which of the following is a disadvantage of the K-Means algorithm?

A. It is sensitive to outliers
B. It works only on labeled data
C. It is computationally very expensive for small datasets
D. It cannot handle numerical data

14 If you set K equal to the number of data points (N), what will the WCSS be?

A. Maximum possible value
B. Zero
C. Infinity
D. Undefined
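
Note on questions 11 and 14: WCSS is the sum, over all points, of the squared distance from each point to the centroid of its assigned cluster. The sketch below computes it for a hand-made assignment. The function name and the three toy points are assumptions chosen for illustration.

```python
def wcss(points, labels, centroids):
    """Within-Cluster Sum of Squares for a given assignment of points to centroids."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[label]))
        for p, label in zip(points, labels)
    )


points = [(1.0, 1.0), (3.0, 1.0), (8.0, 8.0)]

# K = 2: the first two points share a centroid at their mean, the third stands alone.
wcss_k2 = wcss(points, [0, 0, 1], [(2.0, 1.0), (8.0, 8.0)])

# K = N: every point is its own centroid.
wcss_kn = wcss(points, [0, 1, 2], points)

print(wcss_k2)  # 2.0
print(wcss_kn)  # 0.0
```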

15 What is the 'Elbow Method' used for?

A. Speeding up convergence
B. Determining the optimal number of clusters (K)
C. Handling outliers
D. Initializing centroids

16 In the Elbow Method plot, what is typically on the Y-axis?

A. Number of clusters (K)
B. Inertia or WCSS
C. Time taken
D. Accuracy

17 What is the 'Random Initialization Trap' in K-Means?

A. The algorithm fails if data is random
B. Random data points cannot be clustered
C. Randomly picking centroids can lead to poor local optima
D. Choosing K randomly leads to errors

18 What is K-Means++?

A. A method to choose the optimal K
B. A version of K-Means for supervised learning
C. A post-processing step for K-Means
D. A smarter initialization technique for K-Means

19 How does K-Means++ select the first centroid?

A. It picks the point with the highest variance
B. It calculates the global mean
C. It picks one data point uniformly at random
D. It chooses the point furthest from the origin

20 What is the difference between Hard Clustering and Soft Clustering?

A. Hard clustering allows overlapping; Soft does not
B. Hard clustering is faster; Soft is slower
C. Hard clustering assigns a point to one cluster; Soft assigns probabilities
D. Hard clustering uses K-Means; Soft uses Decision Trees

21 Standard K-Means is an example of which type of clustering?

A. Hard Clustering
B. Soft Clustering
C. Density-based Clustering
D. Hierarchical Clustering

22 Which algorithm is a well-known example of Soft Clustering?

A. DBSCAN
B. Fuzzy C-Means
C. K-Means
D. Agglomerative Clustering

23 If a data point has a membership vector [0.7, 0.2, 0.1] for 3 clusters, this is an example of:

A. Outlier Detection
B. Regression
C. Hard Clustering
D. Soft Clustering

24 What shape of clusters does K-Means typically assume?

A. Spirals
B. Elongated shapes
C. Spherical or convex
D. Arbitrary shapes

25 Why is feature scaling (standardization/normalization) important in K-Means?

A. To convert categorical data to numerical
B. To prevent features with larger ranges from dominating the distance metric
C. It is not important
D. Only to make the algorithm run faster
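
Note on question 25: the effect of feature scaling on a Euclidean distance can be seen with a small worked example. The sketch below standardizes two hypothetical features (age and income, both invented for this illustration) and compares how much each contributes to the squared distance between two customers before and after scaling. It assumes neither column is constant, since a zero standard deviation would cause a division by zero.

```python
import statistics


def standardize(column):
    """Rescale a list of numbers to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # assumes the column is not constant (std > 0)
    return [(x - mean) / std for x in column]


# Hypothetical customer data: age in years, income in currency units.
ages = [20.0, 30.0, 40.0]
incomes = [20000.0, 50000.0, 80000.0]

# Contribution of each feature to the raw squared distance between customers 0 and 1.
raw_age_part = (ages[0] - ages[1]) ** 2           # 100.0
raw_income_part = (incomes[0] - incomes[1]) ** 2  # 900000000.0

# The same contributions after standardizing each feature.
scaled_ages = standardize(ages)
scaled_incomes = standardize(incomes)
scaled_age_part = (scaled_ages[0] - scaled_ages[1]) ** 2
scaled_income_part = (scaled_incomes[0] - scaled_incomes[1]) ** 2

print(raw_age_part, raw_income_part)
print(scaled_age_part, scaled_income_part)
```

Before scaling, income accounts for almost all of the distance. After scaling, both features contribute equally in this example.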

26 What is the computational complexity of one iteration of K-Means, where N is the number of data points, K the number of clusters, and d the number of features?

A. O(e^N)
B. O(N^2)
C. O(K N d)
D. O(N * log N)

27 In the Elbow method, the 'elbow' point represents:

A. The point where adding another cluster does not significantly reduce WCSS
B. The point where WCSS becomes zero
C. The point of maximum error
D. The point where K equals 1

28 Which of the following implies that K-Means has converged?

A. The assignment of points to clusters remains unchanged
B. The number of clusters decreases
C. WCSS increases
D. The data becomes labeled

29 What is 'Inertia' in the context of Scikit-Learn's K-Means implementation?

A. The sum of squared distances of samples to their closest cluster center
B. The distance between cluster centers
C. The time taken to run
D. The number of iterations

30 Which strategy is used to mitigate the local optima problem in K-Means?

A. Use Manhattan distance
B. Increase the number of clusters
C. Decrease the learning rate
D. Run the algorithm multiple times with different initializations

31 Can K-Means handle categorical data directly?

A. Only if the data is ordinal
B. No, it requires numerical data
C. Yes, it works natively
D. Yes, using Hamming distance

32 In K-Means++, how is the probability of selecting the next centroid determined?

A. Based on the density of the points
B. Inversely proportional to the distance from existing centroids
C. Randomly with uniform distribution
D. Proportional to the squared distance from the nearest existing centroid
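
Note on questions 19 and 32: K-Means++ seeding can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions, not the code of any particular library. The function name and toy points are made up, and the sketch assumes the data contains at least k distinct points so that the sampling weights never sum to zero.

```python
import random


def kmeans_pp_init(points, k, rng):
    """Pick k initial centroids from `points` using K-Means++ style seeding."""
    # First centroid: one data point chosen uniformly at random.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest already-chosen centroid.
        d2 = [
            min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
            for p in points
        ]
        # Next centroid: sampled with probability proportional to that squared distance.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0), (20.0, 0.0)]
seeds = kmeans_pp_init(points, k=3, rng=random.Random(0))
print(seeds)
```

Points that are already centroids have weight zero, so the weighting spreads the seeds out across the data.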

33 What is a 'Voronoi Diagram' in relation to K-Means?

A. A visualization where regions are defined by the closest centroid
B. A plot of the cost function
C. A method to initialize K
D. A type of soft clustering

34 If the clusters in the data are of very different densities and sizes, K-Means will:

A. Likely fail to identify the correct clusters
B. Automatically adjust the metric
C. Perform perfectly
D. Merge the clusters

35 Which property makes K-Means an unsupervised algorithm?

A. Iterating until convergence
B. Not using target labels for training
C. Minimizing WCSS
D. Calculating the mean

36 In the equation for WCSS, what is being squared?

A. The number of clusters
B. The number of iterations
C. The distance between two centroids
D. The distance between a point and its assigned centroid

37 Why is it often difficult to pick the optimal K using the Elbow method?

A. The 'elbow' might not be sharp or clear
B. It takes too long to compute
C. The plot is always a straight line
D. It requires labeled data

38 What is the primary role of the 'Coordinate Descent' concept in K-Means?

A. It is used for initialization
B. It calculates the distance
C. It is used to visualize data
D. It is the method used to optimize the objective function

39 If you perform K-Means on a dataset with 2 distinct well-separated blobs but set K=4, what happens?

A. It merges the blobs
B. It finds 2 clusters and ignores the other 2
C. It splits the natural blobs into smaller clusters
D. The algorithm crashes

40 In Soft Clustering, the sum of membership weights for a single data point across all clusters usually equals:

A. 1
B. K
C. 0
D. 100
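
Note on questions 22, 23, and 40: the membership weights of a soft clustering can be illustrated with the Fuzzy C-Means membership rule, in which a point's weight for a cluster is proportional to its distance to that centroid raised to the power -2/(m - 1) and the weights are then normalized. The sketch below assumes a fuzzifier of m = 2 and uses an invented point and centroids. The small floor on the distance is an added safeguard against division by zero, not part of the rule itself.

```python
import math


def fuzzy_memberships(point, centroids, m=2.0):
    """Fuzzy C-Means style membership weights of one point for each centroid."""
    # Floor the distances so a point sitting exactly on a centroid does not divide by zero.
    dists = [max(math.dist(point, c), 1e-12) for c in centroids]
    inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
    total = sum(inv)
    # Normalize so the weights for this point form a membership vector.
    return [w / total for w in inv]


# The point lies at distances 1, 2, and 4 from the three centroids.
weights = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0), (1.0, 4.0)])
print([round(w, 3) for w in weights])
print(sum(weights))
```

Closer centroids receive larger weights, and a hard assignment can be recovered by taking the cluster with the largest weight.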

41 Which of the following is NOT an application of K-Means?

A. Image Compression (Color Quantization)
B. Document Clustering
C. Customer Segmentation
D. Spam Classification (Supervised)

42 Does K-Means guarantee finding the global optimum for the WCSS?

A. Only if using Manhattan distance
B. Yes, always
C. Yes, if K is small
D. No, it depends on initialization

43 For a fixed number of features, the computational cost of calculating the distances from one point to all K centroids is proportional to:

A. N^2
B. 1
C. K
D. N

44 Which component constitutes the 'model' after training K-Means?

A. The original dataset
B. The list of outliers
C. The Elbow plot
D. The coordinates of the final centroids

45 What is the relationship between Within-Cluster variance and Between-Cluster variance in a good clustering?

A. High within-cluster, Low between-cluster
B. Low within-cluster, High between-cluster
C. Low within-cluster, Low between-cluster
D. High within-cluster, High between-cluster

46 Lloyd's Algorithm is another name for:

A. Hierarchical Clustering
B. K-Means Algorithm
C. KNN
D. DBSCAN

47 In the context of image segmentation, what does a pixel represent in K-Means?

A. A centroid
B. A data point
C. A label
D. A cluster

48 Why might one choose a K value slightly different from the Elbow point?

A. To maximize WCSS
B. To increase computational cost
C. Based on business requirements or downstream tasks
D. Because the Elbow method is always wrong

49 If K=1, the centroid location will be:

A. Undefined
B. The mean of the entire dataset
C. The origin (0,0)
D. A random data point

50 What happens if a cluster becomes empty during K-Means iterations?

A. The algorithm stops
B. It is ignored and WCSS becomes 0
C. The empty cluster is usually re-initialized or removed
D. The K value increases