1

Explain the concept of the 'Curse of Dimensionality' and how it impacts the performance of machine learning algorithms.

2

What is Dimensionality Reduction? Discuss the primary reasons why it is a crucial step in unsupervised learning and data preprocessing.

3

Differentiate between Feature Selection and Feature Extraction in the context of Dimensionality Reduction.

4

Explain the geometric intuition behind Principal Component Analysis (PCA).

5

Outline the step-by-step mathematical algorithm for performing Principal Component Analysis (PCA).

6

Describe the role of eigenvalues and eigenvectors in Principal Component Analysis (PCA).

7

What is Manifold Learning? Explain the concept using the classic 'Swiss Roll' dataset.

8

Differentiate between Euclidean distance and Geodesic distance in the context of Manifold Learning.

9

Why do linear dimensionality reduction techniques like PCA often fail on complex real-world data structures?

10

Explain the conceptual framework of t-SNE (t-Distributed Stochastic Neighbor Embedding). What is its primary use case?

11

What is the 'crowding problem' in Stochastic Neighbor Embedding (SNE), and how does t-SNE solve it?

12

Describe the role of Kullback-Leibler (KL) divergence in the t-SNE algorithm.

13

Compare and contrast t-SNE and PCA for dimensionality reduction.

Comparison of PCA and t-SNE:

Feature	PCA (Principal Component Analysis)	t-SNE (t-Distributed Stochastic Neighbor Embedding)
Linearity	Linear technique. Relies on eigendecomposition of the covariance matrix (or SVD of the centered data).	Non-linear technique. Relies on matching probabilistic neighbor similarities between the high- and low-dimensional spaces.
Goal	Maximizes the variance retained by the projection, which mainly preserves large pairwise distances.	Preserves local neighborhood structure. Focuses on small pairwise distances.
Computational Cost	Fast and computationally inexpensive. Scales well to large datasets.	Slow and computationally expensive. The exact algorithm compares all pairs of points, O(n²) in the number of samples; Barnes-Hut approximations reduce this to about O(n log n), but it remains costly on very large datasets.
Determinism	Deterministic with an exact solver. Gives the same output every time, up to the sign of each component.	Stochastic (randomized). Gives different outputs on different runs unless a random seed is fixed.
Use Case	Feature extraction, noise reduction, and general dimensionality reduction for downstream ML tasks.	Primarily used for 2D/3D data visualization of complex, non-linear data.
Distance Preservation	Global structure is preserved approximately, to the extent that the retained components capture the variance.	Only local neighborhoods are preserved; global distances and apparent cluster sizes are often distorted.
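
The PCA column of the comparison can be illustrated with a minimal sketch, assuming 2-D input so that the covariance eigenproblem has a closed form. The function name `pca_first_component` and the synthetic data are illustrative assumptions, not part of the question set, and pure Python is used only to keep the example self-contained. It shows the linear, variance-maximizing, and deterministic behavior listed for PCA; it does not implement t-SNE.

```python
import math
import random


def pca_first_component(points):
    """Return (unit direction, variance) of the first principal component.

    Hypothetical helper for 2-D points only: the 2x2 covariance matrix
    is diagonalized in closed form instead of calling a linear-algebra
    library.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n

    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix (variance along PC1).
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    top_eigenvalue = half_trace + root

    # Orientation of the major axis, i.e. the matching eigenvector.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(angle), math.sin(angle))
    return direction, top_eigenvalue


# Synthetic data (an assumption for illustration): points near y = 2x.
rng = random.Random(0)
points = [(i / 10, 2 * (i / 10) + rng.gauss(0, 0.1)) for i in range(-50, 51)]

direction, variance = pca_first_component(points)

# Linear projection of each centered point onto the first component.
mean_x = sum(p[0] for p in points) / len(points)
mean_y = sum(p[1] for p in points) / len(points)
scores = [
    (p[0] - mean_x) * direction[0] + (p[1] - mean_y) * direction[1]
    for p in points
]
```

Calling `pca_first_component(points)` again on the same data returns an identical result, which is the determinism the table attributes to PCA; the variance of `scores` equals the returned eigenvalue, which is the sense in which the projection maximizes variance.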

14

Explain the conceptual intuition behind UMAP (Uniform Manifold Approximation and Projection).

15

What are the key advantages of UMAP over t-SNE?

16

What is an Autoencoder? Explain its basic architecture including the Encoder, Bottleneck, and Decoder.

17

Explain the concept of 'reconstruction loss' in an Autoencoder. How does it guide the learning process?

18

How does a linear Autoencoder relate to Principal Component Analysis (PCA)? What is the benefit of adding non-linear activation functions?

19

In Autoencoders, what happens if the bottleneck layer is completely removed or if it has a higher dimensionality than the input layer? Explain the concept of 'Overcomplete Autoencoders'.

20

Discuss the significance of the hyperparameter 'Perplexity' in the t-SNE algorithm.

Unit 4 - Subjective Questions