Unit 6 - Subjective Questions
CSE274 • Practice Questions with Detailed Answers
Explain the fundamental role of Unsupervised Learning and distinguish it from Supervised Learning.
Role of Unsupervised Learning:
Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses. Its primary roles include:
- Clustering: Grouping similar data points together (e.g., customer segmentation).
- Dimensionality Reduction: Compressing data while maintaining structure (e.g., PCA).
- Anomaly Detection: Identifying rare events or outliers.
Differences:
| Feature | Supervised Learning | Unsupervised Learning |
|---|---|---|
| Input Data | Labeled data (Input-Output pairs). | Unlabeled data (Input only). |
| Goal | Predict outcomes or classify data. | Find hidden patterns or structures. |
| Feedback | Direct feedback (error minimization). | No external feedback/ground truth. |
| Algorithms | Linear Regression, SVM, Decision Trees. | K-Means, Apriori, PCA. |
Mathematically define Euclidean Distance and Manhattan Distance. In which scenarios is Manhattan distance preferred over Euclidean?
1. Euclidean Distance:
It is the straight-line distance between two points in a space. For two points $x = (x_1, x_2, \dots, x_n)$ and $y = (y_1, y_2, \dots, y_n)$ in an $n$-dimensional space:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
2. Manhattan Distance:
It is the sum of absolute differences between the coordinates of the points. It represents the path a taxicab takes in a grid-like street system:
$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$
Preference for Manhattan Distance:
- High Dimensionality: In very high-dimensional spaces, the difference between the furthest and closest points using Euclidean distance becomes negligible (Curse of Dimensionality). Manhattan distance is often more robust in these cases.
- Grid Structures: When movement is restricted to grid-like paths (e.g., city blocks, VLSI layout).
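As a quick numeric illustration of the two definitions (a minimal NumPy sketch; the points are made up for the example):

```python
import numpy as np

# Two points in 3-dimensional space
p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 6.0, 3.0])

# Euclidean distance: square root of the sum of squared differences
euclidean = np.sqrt(np.sum((p - q) ** 2))  # sqrt(9 + 16 + 0) = 5.0

# Manhattan distance: sum of absolute differences
manhattan = np.sum(np.abs(p - q))          # 3 + 4 + 0 = 7.0
```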
What is Cosine Distance? Derive the formula for Cosine Similarity and explain why it is commonly used in text mining.
Cosine Distance:
Cosine distance is defined as $1 - \text{cosine similarity}$, where cosine similarity is the cosine of the angle between two non-zero vectors. It depends on the orientation of the vectors, not their magnitude.
Cosine Similarity Formula:
Given two vectors $A$ and $B$, the similarity is defined as:
$$\cos\theta = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$$
Usage in Text Mining:
- Magnitude Independence: In text data, document length varies significantly. One document might contain the word "science" 5 times, and another 50 times, but they discuss the same topic. Euclidean distance would treat them as far apart due to magnitude, whereas Cosine similarity focuses on the direction (content overlap), making them appear similar.
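A small sketch of this magnitude independence, using hypothetical term-count vectors (the counts are invented for the example):

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Term-count vectors for two documents on the same topic;
# the second is ten times longer but proportionally identical.
doc_a = np.array([5.0, 2.0, 1.0])
doc_b = np.array([50.0, 20.0, 10.0])

sim = cosine_similarity(doc_a, doc_b)  # 1.0: identical direction
dist = 1.0 - sim                       # cosine distance: 0.0
```

Euclidean distance between these two vectors would be large, even though they describe the same content.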
Describe the K-Means Clustering algorithm step-by-step. What is the objective function it tries to minimize?
K-Means Algorithm Steps:
- Initialization: Choose K random points as initial cluster centroids.
- Assignment: Assign every data point to the nearest centroid (usually based on Euclidean distance).
- Update: Recalculate the centroid of each cluster by taking the mean of all data points assigned to that cluster.
- Repeat: Repeat steps 2 and 3 until convergence is reached (i.e., centroids do not change, or the change is below a threshold).
Objective Function (Inertia/SSE):
K-Means tries to minimize the Within-Cluster Sum of Squares (WCSS):
$$\text{WCSS} = \sum_{k=1}^{K} \sum_{x_i \in C_k} \|x_i - \mu_k\|^2$$
Where:
- $K$ is the number of clusters.
- $x_i$ is a data point in cluster $C_k$.
- $\mu_k$ is the centroid of cluster $C_k$.
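The steps and objective above can be sketched as a minimal NumPy implementation (an illustrative sketch only — production code such as scikit-learn's `KMeans` adds smarter initialization like k-means++ and multiple restarts):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-Means: random init, assign to nearest centroid, update means."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k random data points as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        # Step 4: stop when the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    wcss = ((X - centroids[labels]) ** 2).sum()  # the WCSS objective
    return labels, centroids, wcss
```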
Explain the 'Elbow Method' for determining the optimal number of clusters (K) in K-Means.
The Elbow Method:
The Elbow Method is a heuristic used to determine the optimal number of clusters in a dataset.
Process:
- Run the K-Means algorithm for a range of K values (e.g., from 1 to 10).
- For each K, calculate the Within-Cluster Sum of Squares (WCSS) or Inertia.
- Plot K (x-axis) versus WCSS (y-axis).
Interpretation:
- As K increases, WCSS decreases (trivially, if K equals the number of data points, WCSS is 0).
- The goal is to find the point where adding another cluster does not significantly reduce the WCSS.
- This point typically forms an "elbow" or bend in the graph.
- The K value at the elbow is chosen as the optimal number of clusters.
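A sketch of the procedure using scikit-learn's `KMeans` on synthetic blobs (the data and parameter values are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 well-separated groups
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# Run K-Means for K = 1..10 and record the WCSS (inertia)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
            for k in range(1, 11)]

# Plotting K against inertias shows a sharp bend (the "elbow") at K = 3:
# the drop from K=2 to K=3 is large, while K=3 to K=4 gains little.
```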
Compare K-Means and K-Medoids (PAM). Why is K-Medoids considered more robust to outliers?
Comparison:
| Feature | K-Means | K-Medoids (PAM) |
|---|---|---|
| Center Representation | Mean of points in the cluster. | Actual data point (Medoid) from the cluster. |
| Complexity | Faster, roughly O(nKt) for n points, K clusters, t iterations. | Slower, about O(K(n−K)²) per iteration; expensive for large datasets. |
| Robustness | Sensitive to outliers. | Robust to outliers. |
Robustness to Outliers:
K-Means uses the mean to update centroids. A single outlier with an extreme value can significantly shift the mean, dragging the cluster centroid toward the outlier. K-Medoids uses an actual data point (medoid) that minimizes the sum of dissimilarities. Since it uses the most central item rather than an average value, extreme outliers do not affect the position of the representative point as drastically.
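A tiny 1-D example of this effect (values chosen purely for illustration):

```python
import numpy as np

# A tight cluster plus one extreme outlier
points = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Centroid (mean): dragged far away from the bulk of the data
centroid = points.mean()  # (1+2+3+4+100)/5 = 22.0

# Medoid: the actual data point minimizing total absolute dissimilarity
dissim = np.abs(points[:, None] - points[None, :]).sum(axis=1)
medoid = points[dissim.argmin()]  # 3.0, still inside the cluster
```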
Discuss Hierarchical Clustering. Distinguish between Agglomerative and Divisive approaches.
Hierarchical Clustering:
A method of cluster analysis which seeks to build a hierarchy of clusters. It does not require specifying K beforehand and produces a tree-based representation called a Dendrogram.
Agglomerative (Bottom-Up):
- Treat each data point as a single cluster (N clusters).
- Merge the two closest clusters based on a linkage criterion.
- Repeat until only one single cluster remains containing all data points.
Divisive (Top-Down):
- Start with all data points in one single cluster.
- Split the cluster recursively into smaller clusters.
- Repeat until each data point forms its own cluster.
Agglomerative is more common due to lower computational complexity compared to Divisive.
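The agglomerative approach can be sketched with SciPy (illustrative data; `method='ward'` is one of several linkage choices):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Four points forming two obvious pairs
X = np.array([[0.0, 0.0], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9]])

# Agglomerative clustering: start from singletons, merge closest clusters.
# Z records the (n-1) merges row by row, with the merge distance.
Z = linkage(X, method='ward')

# Cut the tree into 2 flat clusters
labels = fcluster(Z, t=2, criterion='maxclust')
```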
Explain the different Linkage Criteria used in Hierarchical Clustering: Single, Complete, Average, and Ward's Linkage.
Linkage criteria determine how the distance between two clusters is calculated:
- Single Linkage (Min distance):
  - Distance between the two closest points in different clusters.
  - $d(A, B) = \min \{ d(a, b) : a \in A, b \in B \}$.
  - Effect: Can result in "chaining" (long, thin clusters).
- Complete Linkage (Max distance):
  - Distance between the two farthest points in different clusters.
  - $d(A, B) = \max \{ d(a, b) : a \in A, b \in B \}$.
  - Effect: Tends to yield compact, spherical clusters.
- Average Linkage:
  - Average distance between all pairs of points in the two clusters.
  - $d(A, B) = \frac{1}{|A| \, |B|} \sum_{a \in A} \sum_{b \in B} d(a, b)$.
  - Effect: Robust to noise.
- Ward's Linkage:
  - Minimizes the increase in variance (or SSE) when merging clusters.
  - Effect: Produces clusters of similar sizes, similar to the K-Means objective.
Describe the DBSCAN algorithm. Define the concepts of Core Points, Border Points, and Noise Points.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups points that are closely packed together (points with many nearby neighbors).
Parameters:
- ε (Epsilon): The radius of the neighborhood.
- MinPts: Minimum number of points required to form a dense region.
Point Definitions:
- Core Point: A point is a Core Point if there are at least MinPts points within its ε-neighborhood (including itself).
- Border Point: A point that is reachable from a Core Point (within ε distance) but has fewer than MinPts points in its own neighborhood.
- Noise Point (Outlier): A point that is neither a Core Point nor a Border Point. It does not belong to any cluster.
Algorithm:
It starts with an arbitrary point. If it's a core point, a cluster is formed. It recursively adds connected core and border points. If it's noise, the algorithm moves to the next unvisited point.
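A minimal sketch with scikit-learn's `DBSCAN` (the data and the eps/min_samples values are illustrative):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups plus one isolated point
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [20.0, 20.0]])  # far-away outlier

# eps = neighborhood radius, min_samples = MinPts (including the point itself)
db = DBSCAN(eps=0.5, min_samples=3).fit(X)

# db.labels_ assigns a cluster index to each point; noise gets -1
```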
What are the primary advantages of Density-Based Clustering (DBSCAN) over Partitioning methods like K-Means?
Advantages of DBSCAN over K-Means:
- Arbitrary Shapes: K-Means assumes spherical clusters. DBSCAN can find clusters of arbitrary shapes (e.g., nested circles, moons) because it relies on density rather than distance from a center.
- No need to specify K: Unlike K-Means, DBSCAN does not require the number of clusters (K) to be specified beforehand.
- Robust to Outliers: DBSCAN explicitly handles noise. Points that do not satisfy the density criteria are labeled as noise/outliers, whereas K-Means forces every point into a cluster, potentially distorting the centroids.
- Parameter sensitivity: While DBSCAN requires ε and MinPts, it is not affected by random centroid initialization the way K-Means is.
Explain the concept of Anomaly Detection in the context of Unsupervised Learning.
Anomaly Detection:
Anomaly detection (or outlier detection) is the identification of rare items, events, or observations which raise suspicions by differing significantly from the majority of the data.
In Unsupervised Context:
Since we do not have labels indicating which samples are "normal" and which are "anomalies," unsupervised algorithms look for deviations in the data structure.
Approaches:
- Density-Based: Points in low-density regions are considered anomalies (e.g., noise points in DBSCAN).
- Distance-Based: Points far away from cluster centroids (e.g., in K-Means).
- Isolation Forest: Explicitly isolates anomalies rather than profiling normal data points, based on the fact that anomalies are few and different, making them easier to isolate in a tree structure.
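A sketch of the Isolation Forest approach with scikit-learn (synthetic data; the contamination value is an illustrative assumption about the expected anomaly fraction):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # bulk of the data
outliers = np.array([[8.0, 8.0], [-9.0, 7.5]])           # far-away anomalies
X = np.vstack([normal, outliers])

# Isolation Forest: anomalies are isolated in fewer random splits
iso = IsolationForest(contamination=0.01, random_state=0).fit(X)
pred = iso.predict(X)   # +1 = normal, -1 = anomaly
```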
Define Inertia as an evaluation metric for clustering. What are its limitations?
Inertia (Within-Cluster Sum of Squares):
Inertia measures how well a dataset was clustered by K-Means. It is calculated as the sum of squared distances of samples to their closest cluster center.
- Goal: Lower inertia indicates that points are closer to their centroids (tighter clusters).
Limitations:
- Depends on K: Inertia decreases as the number of clusters (K) increases. It cannot be used as a standalone metric to pick K without the Elbow method.
- Assumption of Convex Shapes: It assumes clusters are convex (spherical) and isotropic. It fails to evaluate irregular shapes (e.g., elongated clusters) correctly.
- Scale Dependent: It is not a normalized metric; its value depends on the scale of the data (requires feature scaling beforehand).
Explain the Silhouette Score. How is it calculated and how do you interpret its value?
Silhouette Score:
A metric used to calculate the goodness of a clustering technique. It measures how similar an object is to its own cluster (cohesion) compared to other clusters (separation).
Calculation:
For a point $i$:
- Calculate $a(i)$: The average distance between $i$ and all other points in the same cluster (cohesion).
- Calculate $b(i)$: The average distance between $i$ and all points in the nearest neighboring cluster (separation).
- Silhouette coefficient:
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}$$
Interpretation:
The score ranges from -1 to +1.
- +1: The sample is far away from the neighboring clusters (Good clustering).
- 0: The sample is on or very close to the decision boundary between two neighboring clusters.
- -1: The sample has likely been assigned to the wrong cluster.
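A sketch comparing silhouette scores for a good versus an over-split choice of K (synthetic blobs; parameter values are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three tight, well-separated groups
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=42)

labels_good = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
labels_bad = KMeans(n_clusters=10, n_init=10, random_state=42).fit_predict(X)

score_good = silhouette_score(X, labels_good)  # close to +1: clean separation
score_bad = silhouette_score(X, labels_bad)    # lower: clusters are over-split
```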
What is the Davies–Bouldin Index? How does it differ from the Silhouette Score in terms of optimization?
Davies–Bouldin Index (DBI):
The DBI is defined as the average similarity measure of each cluster with its most similar cluster. Similarity is the ratio of within-cluster distances to between-cluster distances.
Formula Logic:
$$DBI = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \frac{\sigma_i + \sigma_j}{d(c_i, c_j)}$$
Where $\sigma_i$ is the average distance of points in cluster $i$ to its centroid $c_i$, and $d(c_i, c_j)$ is the distance between centroids $c_i$ and $c_j$.
Interpretation & Difference:
- Optimization: A lower DBI indicates better clustering (clusters are compact and far apart). In contrast, a higher Silhouette score indicates better clustering.
- Computation: DBI is generally simpler and faster to compute than Silhouette Score but is restricted to Euclidean distance concepts usually.
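A sketch showing the opposite optimization directions of the two metrics on the same clustering (synthetic data; parameter values are illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

dbi = davies_bouldin_score(X, labels)   # lower is better (compact, far apart)
sil = silhouette_score(X, labels)       # higher is better (toward +1)
```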
Discuss the 'Curse of Dimensionality' and its impact on choosing distance metrics for clustering.
Curse of Dimensionality:
This refers to various phenomena that arise when analyzing data in high-dimensional spaces (many features). As the number of dimensions increases, the volume of the space increases exponentially, and the data becomes sparse.
Impact on Distance Metrics:
- Distance Concentration: In high dimensions, the relative difference between the distances to the nearest and farthest points becomes negligible. All points tend to become nearly equidistant from each other.
- Metric Selection:
- Euclidean Distance loses meaning in high dimensions as the squared error amplifies noise.
- Cosine Similarity is often preferred (e.g., in text clustering) because it focuses on direction rather than geometric distance.
- Dimensionality Reduction (like PCA) is often required before clustering to make distance metrics effective again.
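A small experiment illustrating distance concentration (random uniform data; the dimensions and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def concentration_ratio(dim, n=500):
    """Relative spread (max - min) / min of distances from n random points
    in the unit hypercube to the origin."""
    X = rng.random((n, dim))
    d = np.linalg.norm(X, axis=1)
    return (d.max() - d.min()) / d.min()

low = concentration_ratio(2)      # 2-D: distances vary widely
high = concentration_ratio(1000)  # 1000-D: distances concentrate tightly
```

As the dimension grows, the ratio shrinks toward zero: nearest and farthest points become almost indistinguishable by Euclidean distance.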
Compare Partitioning (K-Means), Hierarchical, and Density-based (DBSCAN) clustering methods.
| Feature | Partitioning (K-Means) | Hierarchical | Density-Based (DBSCAN) |
|---|---|---|---|
| Shape of Clusters | Spherical/Convex. | Any shape (depends on linkage). | Arbitrary shapes. |
| Parameters | Number of clusters (K). | Number of clusters or cutoff distance. | Radius (ε) and MinPts. |
| Outliers | Sensitive (forces assignment). | Sensitive (often merges with clusters). | Robust (identifies as noise). |
| Complexity | Linear, O(nKt). Efficient for large data. | Quadratic O(n²) to cubic O(n³). Slow. | O(n log n) with a spatial index, O(n²) otherwise. |
| Determinism | Non-deterministic (random init). | Deterministic. | Deterministic (mostly). |
What is a Dendrogram? How is it used to determine the number of clusters in Hierarchical Clustering?
Dendrogram:
A dendrogram is a tree-like diagram that records the sequences of merges (or splits) in hierarchical clustering. The y-axis usually represents the Euclidean distance (or dissimilarity) at which clusters merge, and the x-axis represents the data points.
Determining Clusters:
- Visual Inspection: Look for the longest vertical line in the dendrogram that is not crossed by any horizontal line (which represents a merge).
- Cut-off: This long vertical distance indicates a large jump in dissimilarity, implying that the clusters being merged are quite different.
- Horizontal Cut: Drawing a horizontal line across this vertical span allows you to count the number of vertical lines it intersects. This count represents the optimal number of clusters.
Why is scaling or normalization of features critical before applying distance-based clustering algorithms?
Importance of Scaling:
Distance-based algorithms (like K-Means, KNN, Hierarchical) calculate the distance between points to determine similarity (e.g., Euclidean distance).
The Problem:
If features have different units or scales (e.g., "Income" in thousands vs. "Age" in tens), the feature with the larger numerical range will dominate the distance calculation.
Example:
- Feature A (0 to 100,000)
- Feature B (0 to 100)
- A small change in A contributes massively to the squared difference compared to B.
Solution:
Normalization (Min-Max scaling) or Standardization (Z-score scaling) ensures all features contribute equally to the distance metric, preventing bias toward variables with larger magnitudes.
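A sketch of the problem and the fix with scikit-learn's `StandardScaler` (the Income/Age numbers are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Columns: Income (tens of thousands), Age (tens)
X = np.array([[50_000.0, 25.0],
              [52_000.0, 60.0],
              [90_000.0, 30.0]])

# Unscaled: Income dominates; the 35-year Age gap barely registers
d_unscaled = np.linalg.norm(X[0] - X[1])  # ~2000, driven almost entirely by Income

# Standardization: each feature rescaled to mean 0 and unit variance
X_scaled = StandardScaler().fit_transform(X)
d_scaled = np.linalg.norm(X_scaled[0] - X_scaled[1])  # both features now contribute
```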
Explain the concept of 'Centroid' in K-Means versus 'Medoid' in K-Medoids.
Centroid (K-Means):
- A centroid is a geometric center of a cluster.
- It is an artificial point calculated as the arithmetic mean of all data points in that cluster.
- It may not correspond to any actual observation in the dataset.
- Formula: $\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i$
Medoid (K-Medoids):
- A medoid is the most centrally located actual data point within a cluster.
- It is the point in the cluster whose average dissimilarity to all other points in the cluster is minimal.
- Since it is a real data point, it makes the interpretation of the cluster representative easier and is more robust to noise.
Describe the main challenges associated with Unsupervised Learning compared to Supervised Learning.
Challenges in Unsupervised Learning:
- No Ground Truth: Unlike supervised learning, there are no labels to verify the results. We cannot calculate simple accuracy or error rates.
- Subjective Evaluation: Evaluation often relies on internal metrics (Silhouette, Inertia) or domain expert validation, which can be subjective. What looks like a good cluster to the algorithm might not make business sense.
- Curse of Dimensionality: Unsupervised methods rely heavily on distance metrics, which degrade in high-dimensional spaces.
- Parameter Selection: Many algorithms require hyperparameter tuning (e.g., K in K-Means, ε in DBSCAN) without a clear, objective way to tune them perfectly.
- Interpretability: The patterns found (e.g., a specific cluster of customers) may be difficult to explain or label semantically.