Unit 2 - Practice Quiz

INT423

1 What is the fundamental principle behind the Isolation Forest algorithm for anomaly detection?

A. It groups similar data points into high-density clusters.
B. It isolates anomalies by randomly selecting a feature and a split value.
C. It projects data onto a lower-dimensional hyperplane to find outliers.
D. It calculates the distance of each point to the k-nearest neighbors.
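
For reference, the random feature-and-split isolation idea can be sketched with scikit-learn's IsolationForest (scikit-learn and NumPy assumed; the toy data below is illustrative):

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.RandomState(42)
    X_normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))   # dense "normal" cloud
    X_outliers = rng.uniform(low=-6, high=6, size=(5, 2))      # scattered anomalies
    X = np.vstack([X_normal, X_outliers])

    # Each tree repeatedly picks a random feature and a random split value;
    # points that end up isolated quickly are scored as anomalies.
    clf = IsolationForest(n_estimators=100, random_state=42).fit(X)
    labels = clf.predict(X)   # +1 for inliers, -1 for anomalies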

2 In an Isolation Forest, how are anomalies distinguished from normal observations based on tree structure?

A. Anomalies have longer path lengths from the root.
B. Anomalies have shorter path lengths from the root.
C. Anomalies end up in the largest leaf nodes.
D. Anomalies are always found at the root node.

3 Which of the following is a primary advantage of Isolation Forest over distance-based anomaly detection methods?

A. It requires labeled data for training.
B. It is computationally expensive for high-dimensional data.
C. It has linear time complexity and handles high-dimensional data well.
D. It calculates the density of every point precisely.

4 When deciding between Anomaly Detection and Supervised Learning, which scenario favors Anomaly Detection?

A. When the dataset is balanced with equal positive and negative examples.
B. When the number of positive examples (anomalies) is very small compared to negative examples.
C. When you have a massive amount of labeled data for all classes.
D. When the anomalies look exactly like the normal data.

5 In the context of supervised learning vs. anomaly detection, what is a 'skewed class' problem?

A. When the data has too many features.
B. When one class has significantly more samples than the other.
C. When the data is not normalized.
D. When the decision boundary is non-linear.

6 Which metric is generally NOT suitable for evaluating a model trained on a highly skewed dataset (anomaly detection scenario)?

A. Precision
B. Recall
C. F1-Score
D. Accuracy
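
A quick illustration of how a metric can mislead on a 99:1 skewed dataset, sketched with scikit-learn's metrics (a recent scikit-learn with the zero_division keyword assumed; the labels are made up):

    import numpy as np
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

    # 990 normal (0) points, 10 anomalies (1); the "model" predicts all-normal.
    y_true = np.array([0] * 990 + [1] * 10)
    y_pred = np.zeros(1000, dtype=int)

    print(accuracy_score(y_true, y_pred))                    # 0.99 -- looks great
    print(recall_score(y_true, y_pred, zero_division=0))     # 0.0 -- misses every anomaly
    print(precision_score(y_true, y_pred, zero_division=0))  # 0.0
    print(f1_score(y_true, y_pred, zero_division=0))         # 0.0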

7 How can Principal Component Analysis (PCA) be used for anomaly detection?

A. By increasing the number of dimensions to separate points.
B. By identifying points with a high reconstruction error.
C. By clustering points based on the first principal component only.
D. By labeling the data using eigenvectors.
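
One common reconstruction-error recipe, sketched with scikit-learn's PCA (the 99th-percentile threshold is an arbitrary illustrative choice):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.RandomState(0)
    X = rng.normal(size=(300, 5))

    pca = PCA(n_components=2).fit(X)
    X_rec = pca.inverse_transform(pca.transform(X))   # project, then map back

    # Points the low-dimensional model cannot reconstruct well get flagged.
    errors = np.linalg.norm(X - X_rec, axis=1)
    anomalies = errors > np.percentile(errors, 99)    # illustrative threshold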

8 Why is feature scaling (e.g., Mean Normalization) critical before applying PCA for anomaly detection?

A. PCA is a tree-based algorithm and requires scaling.
B. PCA seeks to maximize variance, so features with larger scales will dominate.
C. PCA only works with categorical data.
D. It ensures the reconstruction error is always zero.
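
A sketch of scaling before PCA using scikit-learn's StandardScaler in a pipeline (the extreme feature scales here are contrived to make the point):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.RandomState(0)
    X = np.column_stack([rng.normal(scale=1000.0, size=100),   # large-scale feature
                         rng.normal(scale=0.01, size=100)])    # small-scale feature

    # StandardScaler removes the mean and rescales to unit variance so that the
    # large-scale feature does not dominate the variance PCA tries to maximize.
    model = make_pipeline(StandardScaler(), PCA(n_components=1)).fit(X)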

9 In Hierarchical Clustering, what is the visual representation of the cluster hierarchy called?

A. Histogram
B. Scatter Plot
C. Dendrogram
D. Heatmap
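
A dendrogram can be drawn with SciPy's hierarchy module (sketch; SciPy and Matplotlib assumed):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram

    rng = np.random.RandomState(0)
    X = rng.normal(size=(20, 2))

    Z = linkage(X, method='ward')   # linkage matrix encoding the merge history
    dendrogram(Z)                   # the tree diagram of the cluster hierarchy
    plt.show()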

10 Which of the following is NOT a requirement for Hierarchical Clustering?

A. A distance metric.
B. A linkage criterion.
C. Specifying the number of clusters (k) beforehand.
D. A dataset of points.

11 Agglomerative Clustering is often described as which type of strategy?

A. Top-down
B. Bottom-up
C. Divide and conquer
D. Density-based

12 What is the first step in Agglomerative Clustering?

A. Assign all points to a single cluster.
B. Calculate the centroid of the entire dataset.
C. Treat each data point as an individual cluster.
D. Randomly pick k centroids.

13 In Agglomerative Clustering, 'Single Linkage' defines the distance between two clusters as:

A. The distance between their centroids.
B. The maximum distance between any single point in one cluster and any single point in the other.
C. The minimum distance between any single point in one cluster and any single point in the other.
D. The average distance between all pairs of points.

14 What is a known disadvantage of using Single Linkage in Agglomerative Clustering?

A. It forces clusters to be spherical.
B. It is sensitive to the order of data.
C. It suffers from the 'chaining' effect.
D. It is computationally too fast.

15 Which linkage method in Agglomerative Clustering minimizes the variance of the clusters being merged?

A. Single Linkage
B. Complete Linkage
C. Average Linkage
D. Ward's Method
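
In scikit-learn's AgglomerativeClustering, the linkage criterion is a single parameter, so the four methods above can be compared directly (sketch on toy data):

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    rng = np.random.RandomState(0)
    X = rng.normal(size=(50, 2))

    # 'single' = minimum pairwise distance, 'complete' = maximum, 'average' = mean,
    # 'ward' = the merge that minimizes the increase in within-cluster variance.
    for method in ('single', 'complete', 'average', 'ward'):
        labels = AgglomerativeClustering(n_clusters=3, linkage=method).fit_predict(X)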

16 DBSCAN stands for:

A. Density-Based Spatial Clustering of Applications with Noise
B. Distance-Based Spatial Clustering of Algorithms with Noise
C. Density-Based Statistical Clustering of Applications with Networks
D. Dual-Based Spatial Clustering of Applications with Nodes

17 What are the two main hyperparameters required for DBSCAN?

A. Number of clusters (k) and iterations.
B. Epsilon (eps) and Minimum Points (MinPts).
C. Learning rate and batch size.
D. Tree depth and number of estimators.
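
scikit-learn exposes these two hyperparameters as eps and min_samples (sketch; the parameter values are illustrative, and the fitted model records which points were core points):

    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.RandomState(0)
    X = np.vstack([rng.normal(loc=0.0, size=(100, 2)),
                   rng.normal(loc=6.0, size=(100, 2))])

    db = DBSCAN(eps=0.5, min_samples=5).fit(X)
    labels = db.labels_                          # cluster ids; noise is labeled -1
    core_mask = np.zeros(len(X), dtype=bool)
    core_mask[db.core_sample_indices_] = True    # points meeting the MinPts-within-eps rule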

18 In DBSCAN, a point is classified as a 'Core Point' if:

A. It is the centroid of the data.
B. It has at least 'MinPts' neighbors within radius 'eps'.
C. It is reachable from a core point but has fewer than 'MinPts' neighbors.
D. It is far away from all other points.

19 How does DBSCAN classify a point that is within the 'eps' radius of a core point but has fewer than 'MinPts' neighbors itself?

A. Core Point
B. Noise Point
C. Border Point
D. Centroid

20 Which of the following is a major advantage of DBSCAN over K-Means?

A. It is faster for all dataset sizes.
B. It can discover clusters of arbitrary shapes.
C. It works well with varying densities.
D. It does not require any parameters.

21 What happens to 'Noise' points in DBSCAN?

A. They are assigned to the nearest cluster.
B. They are treated as a separate cluster containing outliers.
C. They are deleted from the dataset before clustering.
D. They are assigned to the largest cluster.

22 In Isolation Forest, the 'anomaly score' is derived from:

A. The Euclidean distance to the nearest neighbor.
B. The number of points in the epsilon radius.
C. The average path length of the point across the ensemble of trees.
D. The variance of the cluster it belongs to.
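
In scikit-learn, this path-length-based score is exposed through score_samples (sketch; note the library negates the score, so lower means more anomalous):

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.RandomState(0)
    X = rng.normal(size=(200, 2))

    clf = IsolationForest(n_estimators=100, random_state=0).fit(X)
    # score_samples returns the negated anomaly score: lower values correspond to
    # shorter average path lengths across the trees, i.e. easier-to-isolate points.
    scores = clf.score_samples(X)
    suspects = np.argsort(scores)[:5]            # the five most anomalous points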

23 Which of the following is an example of 'Novelty Detection' rather than 'Outlier Detection'?

A. Detecting credit card fraud in historical transaction data.
B. Cleaning a dataset by removing errors.
C. Training on only 'normal' images of dogs to detect a cat image during testing.
D. Detecting a malfunction in a machine during a live run, using examples of past failures.

24 When choosing features for anomaly detection, what is a desirable property?

A. Features should be highly correlated with the anomaly label (if available).
B. Features should take on unusually large or small values for anomalies compared to normal instances.
C. Features should be categorical only.
D. Features should have zero variance.

25 What is the 'curse of dimensionality' in the context of distance-based clustering?

A. Distance metrics become less meaningful as dimensions increase, making all points appear equidistant.
B. The algorithm runs faster as dimensions increase.
C. High dimensions make visualization easier.
D. It refers to the difficulty of collecting data.

26 In hierarchical clustering, what does 'cutting the tree' determine?

A. The linkage criteria used.
B. The number of clusters in the final solution.
C. The distance metric used.
D. The root of the tree.
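
Cutting the tree can be done with SciPy's fcluster (sketch; the cut height of 5.0 is arbitrary):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.RandomState(0)
    X = rng.normal(size=(30, 2))
    Z = linkage(X, method='average')

    # Cutting at a fixed height determines how many clusters remain...
    labels_by_height = fcluster(Z, t=5.0, criterion='distance')
    # ...or ask directly for a target number of clusters.
    labels_by_count = fcluster(Z, t=3, criterion='maxclust')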

27 Which of the following algorithms is essentially an ensemble of random decision trees?

A. K-Means
B. DBSCAN
C. Isolation Forest
D. Agglomerative Clustering

28 Complete Linkage in Agglomerative Clustering is calculated based on:

A. The maximum distance between points in two clusters.
B. The minimum distance between points in two clusters.
C. The average distance between all points.
D. The distance between centroids.

29 If your dataset has clusters with significantly different densities, which algorithm might struggle?

A. Gaussian Mixture Models
B. DBSCAN
C. Decision Tree
D. Isolation Forest

30 What is the primary goal of PCA when used as a preprocessing step for clustering?

A. To increase the number of features.
B. To label the data.
C. To reduce noise and computational complexity by dimensionality reduction.
D. To ensure all clusters are the same size.

31 In an Isolation Forest, what is the maximum possible path length for a tree trained on n samples?

A. log2(n)
B. n - 1
C. n
D. 2n

32 Which supervised learning algorithm is most similar to the concept of Hierarchical Clustering?

A. Linear Regression
B. Decision Trees
C. Support Vector Machines
D. Neural Networks

33 Why is 'Divisive' hierarchical clustering less common than 'Agglomerative'?

A. It is less accurate.
B. It is computationally more expensive (2^(n-1) - 1 possible ways to make the first split).
C. It cannot produce a dendrogram.
D. It requires labeled data.

34 In the context of Anomaly Detection, what is a False Negative?

A. A normal point flagged as an anomaly.
B. An anomaly classified as normal.
C. A normal point classified as normal.
D. An anomaly classified as an anomaly.

35 Which of the following scenarios is BEST suited for Supervised Learning rather than Anomaly Detection?

A. Manufacturing quality control with 1 defective part per 10,000.
B. Email spam detection with thousands of examples for both spam and ham.
C. Intrusion detection with unknown attack patterns.
D. Detecting new stars in astronomy images.

36 What does the 'MinPts' parameter in DBSCAN represent?

A. The minimum distance between clusters.
B. The minimum number of points required to form a dense region.
C. The minimum number of clusters to find.
D. The minimum number of iterations to run.

37 Which PCA component captures the most variance in the data?

A. The last principal component.
B. The first principal component.
C. The second principal component.
D. All components capture equal variance.

38 How does Agglomerative Clustering handle outliers?

A. It deletes them automatically.
B. They are usually merged into clusters very late in the process.
C. It assigns them to a 'noise' bucket immediately.
D. It cannot run if outliers are present.

39 In Isolation Forest, subsampling (using a small subset of data to build each tree) helps to:

A. Increase the training time.
B. Reduce the ability to detect anomalies.
C. Minimize the effects of swamping and masking.
D. Increase memory usage.
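
In scikit-learn this subsampling is controlled by the max_samples parameter (sketch; 256 is the per-tree subsample size suggested in the original Isolation Forest paper):

    import numpy as np
    from sklearn.ensemble import IsolationForest

    X = np.random.RandomState(0).normal(size=(10000, 3))

    # Each tree sees only a small random subsample (here 256 points), which keeps
    # dense clumps of anomalies from masking one another and from swamping
    # nearby normal points.
    clf = IsolationForest(n_estimators=100, max_samples=256, random_state=0).fit(X)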

40 Which of the following is true regarding the shape of clusters found by K-Means vs DBSCAN?

A. K-Means finds arbitrary shapes; DBSCAN finds spherical shapes.
B. Both find arbitrary shapes.
C. K-Means tends to find spherical shapes; DBSCAN finds arbitrary shapes.
D. Both are limited to spherical shapes.

41 When using PCA for anomaly detection, if a point has a very low projection on the principal components but a high reconstruction error, it implies:

A. The point is normal.
B. The point lies on the principal hyperplane.
C. The point lies far from the subspace defined by the principal components (Anomaly).
D. The point is the mean of the data.

42 In hierarchical clustering, what is the time complexity of the standard agglomerative algorithm on n points (naive implementation)?

A. O(n)
B. O(n log n)
C. O(n^2)
D. O(n^3)

43 For a dataset with varying cluster sizes and significant noise, which algorithm is generally most robust?

A. K-Means
B. DBSCAN
C. Single Linkage Agglomerative Clustering
D. Linear Regression

44 What is 'masking' in the context of anomaly detection?

A. When an anomaly is hidden because it is too similar to normal data.
B. When the presence of a cluster of anomalies makes it difficult to detect individual anomalies.
C. When features are removed from the dataset.
D. When the algorithm runs out of memory.

45 Which of the following is NOT a distance metric commonly used in Hierarchical Clustering?

A. Euclidean Distance
B. Manhattan Distance
C. Cosine Similarity
D. Gini Impurity

46 If 'epsilon' is chosen to be very small in DBSCAN, what is the likely outcome?

A. All points will be in one cluster.
B. Most points will be classified as noise/outliers.
C. The algorithm will crash.
D. It will act exactly like K-Means.

47 If 'epsilon' is chosen to be very large in DBSCAN, what is the likely outcome?

A. All points will likely be merged into a single cluster.
B. Every point will be a noise point.
C. The clusters will be very small.
D. It creates a hierarchical tree.
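
One way to see both epsilon extremes at once is to sweep eps and count clusters versus noise (sketch on toy data):

    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.RandomState(0)
    X = rng.normal(size=(200, 2))

    for eps in (0.05, 0.3, 5.0):
        labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        n_noise = int(np.sum(labels == -1))
        # Tiny eps -> mostly noise; huge eps -> one all-encompassing cluster.
        print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")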

48 Why is 'Average Linkage' often preferred over Single and Complete Linkage?

A. It is the fastest method.
B. It balances the extremes of chaining (Single) and sensitivity to outliers (Complete).
C. It does not require a distance matrix.
D. It always produces k=2 clusters.

49 In the context of fraud detection, why might one use Supervised Learning over Anomaly Detection?

A. If there are absolutely no examples of fraud available.
B. If the fraud patterns change every day completely.
C. If the company has a large, historically labeled database of verified fraud cases.
D. If the dataset is small.

50 The 'root' of a dendrogram in hierarchical clustering represents:

A. The first data point in the set.
B. A single cluster containing all data points.
C. The cluster with the highest variance.
D. The noise points.