Unit 2 - Subjective Questions

INT396 • Practice Questions with Detailed Answers

1

Explain the fundamental assumptions made by partition-based clustering algorithms, such as k-Means, regarding the structure, shape, and size of the clusters.

2

Distinguish between Hard and Soft clustering. Provide an example of an algorithm for each approach.

3

Define the objective function of the standard k-Means algorithm. Explain its components and what the algorithm attempts to achieve with this function.
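
As a starting point for an answer, the objective is usually written as the within-cluster sum of squared Euclidean distances. The notation below is an assumption, since the question fixes no symbols: $x_i$ are data points, $C_j$ is the set of points assigned to cluster $j$, and $\mu_j$ is the centroid of that cluster.

```latex
J(C_1,\dots,C_k;\ \mu_1,\dots,\mu_k)
  \;=\; \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^{2},
\qquad
\mu_j \;=\; \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i .
```

The outer sum runs over the $k$ clusters and the inner sum over the members of each cluster. The algorithm looks for assignments and centroids that make $J$ as small as possible.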

4

Outline the iterative steps of Lloyd's algorithm for k-Means clustering.
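
The alternating assignment and update steps can be sketched in plain Python. This is a minimal illustration and not a reference implementation: the function names, the random choice of initial centroids, and the stopping rule (stop when no label changes) are assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, max_iter=100, seed=0):
    """Sketch of Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialization (assumed here): pick k distinct data points at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Convergence check: stop once no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = lloyd_kmeans(points, k=2)
```

On this toy data the two well-separated groups end up in different clusters. Keeping the old centroid for an empty cluster is one simple choice among several.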

5

Describe the k-Means++ initialization strategy and explain how it improves upon the random initialization method.
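
The seeding rule can be illustrated with a short sketch: the first center is drawn uniformly at random, and each later center is drawn with probability proportional to its squared distance from the nearest center already chosen. The function name and the use of the standard-library `random` module are assumptions for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, seed=0):
    """Sketch of k-Means++ seeding; returns k initial centers."""
    rng = random.Random(seed)
    # First center: uniform random choice among the data points.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # D(x)^2: squared distance from each point to its nearest chosen center.
        d2 = [min(squared_distance(p, c) for c in centers) for p in points]
        # Sample the next center with probability proportional to D(x)^2.
        # Assumes the points are not all identical, so some weight is positive.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (9.0, 9.0), (10.0, 10.0)]
initial_centers = kmeans_pp_init(points, k=3)
```

Points that are already centers have weight zero, so the sampling favors points far from the current centers and tends to spread the initial centers across the data.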

6

Prove or logically explain why the standard k-Means algorithm is guaranteed to converge.

7

Discuss the primary limitations of the k-Means clustering algorithm, particularly focusing on outliers and cluster shapes.

8

Compare and contrast k-Means and k-Medoids (PAM) clustering algorithms. Under what circumstances would you prefer k-Medoids?

9

Explain the Partitioning Around Medoids (PAM) algorithm step-by-step.

10

Explain the impact of data standardization and scaling on partition-based clustering algorithms like k-Means.

11

Describe the MiniBatch k-Means algorithm and discuss its advantages for large-scale datasets compared to standard k-Means.

12

Define Inertia (Within-Cluster Sum of Squares) and explain its role in evaluating cluster quality. Why can't Inertia be used to definitively determine the absolute quality of clusters?

13

Explain the Silhouette Coefficient. How is it calculated, and how should its values be interpreted in the context of cluster validation?
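
The per-point calculation can be sketched as follows, using the common definition s = (b - a) / max(a, b), where a is the mean distance to the other members of the point's own cluster and b is the smallest mean distance to any other cluster. The function name, the Euclidean metric, and the convention of scoring singleton clusters as 0 are assumptions for the example. It requires at least two clusters.

```python
import math


def silhouette_scores(points, labels):
    """Sketch: per-point silhouette values for a labelled dataset."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so divide by len(own) - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_scores(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = silhouette_scores(points, [0, 1, 0, 1])   # each cluster straddles both groups
```

Values close to +1 indicate a point that sits well inside its cluster, values near 0 indicate a point on a boundary, and negative values suggest the point may belong to a neighboring cluster. The mean over all points gives the overall coefficient.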

14

Describe the Davies–Bouldin Index for cluster validation. What constitutes a "good" value for this index?

15

Explain the Elbow Method for determining the optimal number of clusters (k). Discuss its common pitfalls.

16

Explain how soft clustering relaxes the constraints of hard clustering, specifically formulating the mathematical properties of the membership matrix.

17

Detail how the cluster centers are updated in the MiniBatch k-Means algorithm as opposed to the batch update in standard k-Means.
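
One commonly described form of the mini-batch update keeps a per-center count and moves a center toward each assigned sample with a learning rate of 1 / count, so that each center is a running mean of the samples it has received. The sketch below assumes that form; the function name and the data layout (lists of floats) are illustrative, and library implementations may differ in detail.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def minibatch_update(centroids, counts, batch):
    """Sketch: apply one mini-batch to centroids and counts, in place."""
    # Assign every batch point using the centroids as they stand
    # at the start of the batch.
    assignments = [
        min(range(len(centroids)), key=lambda j: squared_distance(x, centroids[j]))
        for x in batch
    ]
    # Per-sample update: the step size shrinks as a center sees more points.
    for x, j in zip(batch, assignments):
        counts[j] += 1
        eta = 1.0 / counts[j]  # per-center learning rate
        centroids[j] = [(1.0 - eta) * c + eta * xi for c, xi in zip(centroids[j], x)]
    return centroids, counts


centroids = [[0.0, 0.0], [10.0, 10.0]]
counts = [0, 0]
minibatch_update(centroids, counts, [[1.0, 1.0], [3.0, 3.0], [9.0, 11.0]])
```

Standard k-Means, by contrast, recomputes each centroid as the mean of all points assigned to it after a full pass over the dataset. The incremental rule touches only the sampled batch at each step.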

18

Provide a hypothetical scenario demonstrating how failing to scale features with vastly different variances can distort k-Means clusters.

19

Formulate the optimization problem for k-Means clustering and explain conceptually why finding the absolute global minimum is considered an NP-hard problem.

20

Compare Inertia, Silhouette Coefficient, and Davies-Bouldin Index. Are any of these metrics robust to non-globular clusters, or do they all suffer from similar geometric assumptions?