Unit 2 - Practice Quiz

INT423 50 Questions

1 What is the fundamental principle behind the Isolation Forest algorithm for anomaly detection?

A. It isolates anomalies by randomly selecting a feature and a split value.
B. It projects data onto a lower-dimensional hyperplane to find outliers.
C. It groups similar data points into high-density clusters.
D. It calculates the distance of each point to the k-nearest neighbors.

2 In an Isolation Forest, how are anomalies distinguished from normal observations based on tree structure?

A. Anomalies end up in the largest leaf nodes.
B. Anomalies have longer path lengths from the root.
C. Anomalies are always found at the root node.
D. Anomalies have shorter path lengths from the root.
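
Study note for questions 1 and 2: the sketch below builds random isolation paths by hand so the path lengths of an obvious outlier and a central point can be compared. It is a simplified illustration and not the reference Isolation Forest implementation: it omits subsampling and the normalised anomaly score, and the names `path_length` and `average_path_length` and the toy data are assumptions made for this example.

```python
import random

def path_length(point, data, rng, depth=0, max_depth=50):
    """Number of random splits needed before `point` is alone in its partition."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    feature = rng.randrange(len(point))          # pick a random feature
    lo = min(row[feature] for row in data)
    hi = max(row[feature] for row in data)
    if lo == hi:                                 # nothing left to split on
        return depth
    split = rng.uniform(lo, hi)                  # pick a random split value
    # Follow only the side of the split that contains `point`.
    if point[feature] < split:
        same_side = [row for row in data if row[feature] < split]
    else:
        same_side = [row for row in data if row[feature] >= split]
    return path_length(point, same_side, rng, depth + 1, max_depth)

def average_path_length(point, data, n_trees=100, seed=0):
    """Average the path length over many independently randomised paths."""
    rng = random.Random(seed)
    return sum(path_length(point, data, rng) for _ in range(n_trees)) / n_trees

# Hypothetical toy data: one tight Gaussian cluster plus a single far-away point.
data_rng = random.Random(42)
normal_points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(60)]
outlier = (8.0, 8.0)
data = normal_points + [outlier]

# Use the most central normal point as the comparison inlier.
inlier = min(normal_points, key=lambda p: p[0] ** 2 + p[1] ** 2)

outlier_depth = average_path_length(outlier, data)
inlier_depth = average_path_length(inlier, data)
print("outlier:", outlier_depth, "inlier:", inlier_depth)
```

Comparing the two printed averages shows which kind of point is separated from the rest in fewer random splits.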

3 Which of the following is a primary advantage of Isolation Forest over distance-based anomaly detection methods?

A. It has linear time complexity and handles high-dimensional data well.
B. It requires labeled data for training.
C. It calculates the density of every point precisely.
D. It is computationally expensive for high-dimensional data.

4 When deciding between Anomaly Detection and Supervised Learning, which scenario favors Anomaly Detection?

A. When the anomalies look exactly like the normal data.
B. When you have a massive amount of labeled data for all classes.
C. When the dataset is balanced with equal positive and negative examples.
D. When the number of positive examples (anomalies) is very small compared to negative examples.

5 In the context of supervised learning vs. anomaly detection, what is a 'skewed class' problem?

A. When the decision boundary is non-linear.
B. When one class has significantly more samples than the other.
C. When the data is not normalized.
D. When the data has too many features.

6 Which metric is generally NOT suitable for evaluating a model trained on a highly skewed dataset (anomaly detection scenario)?

A. Recall
B. Precision
C. F1-Score
D. Accuracy
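
Study note for questions 5 and 6: a minimal sketch of how the four listed metrics behave on a skewed dataset. The dataset (10 anomalies among 1,000 samples) and the "always predict normal" baseline are hypothetical and chosen only for illustration.

```python
def metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 with label 1 as the positive (anomaly) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical skewed dataset: 10 anomalies (1) and 990 normal samples (0).
y_true = [1] * 10 + [0] * 990
always_normal = [0] * 1000          # a "model" that never flags anything

scores = metrics(y_true, always_normal)
print(scores)
```

The baseline detects no anomalies at all, so it is worth checking which of the printed numbers still looks good.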

7 How can Principal Component Analysis (PCA) be used for anomaly detection?

A. By increasing the number of dimensions to separate points.
B. By labeling the data using eigenvectors.
C. By clustering points based on the first principal component only.
D. By identifying points with a high reconstruction error.
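
Study note for questions 7, 8 and 41: a minimal sketch of reconstruction error with a single principal component in two dimensions. It assumes the component is fitted on normal data only, centres the data by its mean, and uses the closed-form angle of the major axis of a 2x2 covariance matrix. The function names and the toy points are hypothetical.

```python
from math import atan2, cos, sin

def fit_first_component(points):
    """Return the mean and the unit vector of the first principal component (2-D only)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the direction of maximum variance for a 2x2 covariance matrix.
    angle = 0.5 * atan2(2 * sxy, sxx - syy)
    return (mx, my), (cos(angle), sin(angle))

def reconstruction_error(point, mean, direction):
    """Squared distance between a point and its projection onto the principal axis."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]       # mean normalisation
    score = dx * direction[0] + dy * direction[1]         # 1-D projection
    rx, ry = score * direction[0], score * direction[1]   # reconstruction
    return (dx - rx) ** 2 + (dy - ry) ** 2

# Hypothetical "normal" data lying close to the line y = x.
train = [(float(i), i + (0.1 if i % 2 else -0.1)) for i in range(10)]
mean, direction = fit_first_component(train)

normal_error = reconstruction_error((3.0, 3.1), mean, direction)
anomaly_error = reconstruction_error((2.0, 7.0), mean, direction)
print(normal_error, anomaly_error)
```

A point that follows the dominant direction of the training data is reconstructed almost perfectly, while a point far from that axis is not.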

8 Why is feature scaling (e.g., Mean Normalization) critical before applying PCA for anomaly detection?

A. PCA is a tree-based algorithm and requires scaling.
B. It ensures the reconstruction error is always zero.
C. PCA seeks to maximize variance, so features with larger scales will dominate.
D. PCA only works with categorical data.

9 In Hierarchical Clustering, what is the visual representation of the cluster hierarchy called?

A. Scatter Plot
B. Dendrogram
C. Heatmap
D. Histogram

10 Which of the following is NOT a requirement for Hierarchical Clustering?

A. A distance metric.
B. Specifying the number of clusters (k) beforehand.
C. A dataset of points.
D. A linkage criterion.

11 Agglomerative Clustering is often referred to as a strategy of which type?

A. Bottom-up
B. Density-based
C. Divide and conquer
D. Top-down

12 What is the first step in Agglomerative Clustering?

A. Calculate the centroid of the entire dataset.
B. Randomly pick k centroids.
C. Treat each data point as an individual cluster.
D. Assign all points to a single cluster.

13 In Agglomerative Clustering, 'Single Linkage' defines the distance between two clusters as:

A. The distance between their centroids.
B. The average distance between all pairs of points.
C. The minimum distance between any single point in one cluster and any single point in the other.
D. The maximum distance between any single point in one cluster and any single point in the other.
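
Study note for questions 13, 28 and 48: the three pairwise linkage criteria can be compared on a tiny example. This is a sketch using Euclidean distance; the two clusters `left` and `right` are made up for illustration.

```python
from itertools import product
from math import dist

def pairwise_distances(a, b):
    """All Euclidean distances between a point in cluster `a` and a point in cluster `b`."""
    return [dist(p, q) for p, q in product(a, b)]

def single_linkage(a, b):
    return min(pairwise_distances(a, b))      # closest pair

def complete_linkage(a, b):
    return max(pairwise_distances(a, b))      # farthest pair

def average_linkage(a, b):
    d = pairwise_distances(a, b)
    return sum(d) / len(d)                    # mean over all pairs

# Hypothetical clusters on a line.
left = [(0.0, 0.0), (1.0, 0.0)]
right = [(3.0, 0.0), (5.0, 0.0)]

print(single_linkage(left, right))    # 2.0
print(complete_linkage(left, right))  # 5.0
print(average_linkage(left, right))   # 3.5
```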

14 What is a known disadvantage of using Single Linkage in Agglomerative Clustering?

A. It is computationally too fast.
B. It forces clusters to be spherical.
C. It is sensitive to the order of data.
D. It suffers from the 'chaining' effect.

15 Which linkage method in Agglomerative Clustering minimizes the variance of the clusters being merged?

A. Single Linkage
B. Average Linkage
C. Complete Linkage
D. Ward's Method

16 DBSCAN stands for:

A. Density-Based Spatial Clustering of Applications with Noise
B. Density-Based Statistical Clustering of Applications with Networks
C. Dual-Based Spatial Clustering of Applications with Nodes
D. Distance-Based Spatial Clustering of Algorithms with Noise

17 What are the two main hyperparameters required for DBSCAN?

A. Tree depth and number of estimators.
B. Epsilon (eps) and Minimum Points (MinPts).
C. Number of clusters (k) and iterations.
D. Learning rate and batch size.

18 In DBSCAN, a point is classified as a 'Core Point' if:

A. It is reachable from a core point but has fewer than 'MinPts' neighbors.
B. It has at least 'MinPts' neighbors within radius 'eps'.
C. It is far away from all other points.
D. It is the centroid of the data.

19 How does DBSCAN classify a point that is within the 'eps' radius of a core point but has fewer than 'MinPts' neighbors itself?

A. Core Point
B. Centroid
C. Border Point
D. Noise Point
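
Study note for questions 18, 19 and 21: a minimal sketch of how DBSCAN's three point types follow from `eps` and `MinPts`. It only labels points and does not build clusters. It assumes a point counts as its own neighbour, which is one common convention (implementations differ on this), and the function name and sample points are hypothetical.

```python
from math import dist

def classify_points(points, eps, min_pts):
    """Label every point as 'core', 'border' or 'noise'."""
    # Indices of all points within eps of each point (the point itself included).
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")   # not dense itself, but within eps of a core point
        else:
            labels.append("noise")    # neither dense nor near a core point
    return labels

# Hypothetical data: a dense square, one point on its fringe, one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 1), (8, 8)]
labels = classify_points(points, eps=1.5, min_pts=4)
print(labels)
# ['core', 'core', 'core', 'core', 'border', 'noise']
```

Rerunning with a much smaller or much larger `eps` is a quick way to explore questions 46 and 47.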

20 Which of the following is a major advantage of DBSCAN over K-Means?

A. It works well with varying densities.
B. It can discover clusters of arbitrary shapes.
C. It does not require any parameters.
D. It is faster for all dataset sizes.

21 What happens to 'Noise' points in DBSCAN?

A. They are assigned to the largest cluster.
B. They are deleted from the dataset before clustering.
C. They are treated as a separate cluster containing outliers.
D. They are assigned to the nearest cluster.

22 In Isolation Forest, the 'anomaly score' is derived from:

A. The number of points in the epsilon radius.
B. The Euclidean distance to the nearest neighbor.
C. The average path length of the point across the ensemble of trees.
D. The variance of the cluster it belongs to.

23 Which of the following is an example of 'Novelty Detection' rather than 'Outlier Detection'?

A. Finding a malfunction in a machine during a live run based on past failures.
B. Training on only 'normal' images of dogs to detect a cat image during testing.
C. Cleaning a dataset by removing errors.
D. Detecting credit card fraud in historical transaction data.

24 When choosing features for anomaly detection, what is a desirable property?

A. Features should be highly correlated with the anomaly label (if available).
B. Features should take on unusually large or small values for anomalies compared to normal instances.
C. Features should have zero variance.
D. Features should be categorical only.

25 What is the 'curse of dimensionality' in the context of distance-based clustering?

A. High dimensions make visualization easier.
B. The algorithm runs faster as dimensions increase.
C. Distance metrics become less meaningful as dimensions increase, making all points appear equidistant.
D. It refers to the difficulty of collecting data.
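
Study note for question 25: a small experiment, assuming uniformly random points in a unit hypercube, that compares the nearest and farthest neighbour distances of one query point as the dimension grows. The function name, the sample size and the chosen dimensions are arbitrary.

```python
import random
from math import dist

def nearest_to_farthest_ratio(dim, n_points=200, seed=0):
    """Ratio of the nearest to the farthest neighbour distance for one query point."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = points[0]
    distances = [dist(query, p) for p in points[1:]]
    return min(distances) / max(distances)

low_dim_ratio = nearest_to_farthest_ratio(2)
high_dim_ratio = nearest_to_farthest_ratio(500)
print(low_dim_ratio, high_dim_ratio)
```

A ratio close to 1 means the nearest and farthest neighbours are almost equally far away, which leaves a distance-based method little contrast to work with.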

26 In hierarchical clustering, what does 'cutting the tree' determine?

A. The root of the tree.
B. The linkage criteria used.
C. The distance metric used.
D. The number of clusters in the final solution.
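
Study note for questions 12 and 26: a sketch of naive agglomerative clustering on one-dimensional data, followed by a "cut" at a chosen height. It assumes single linkage, whose merge heights never decrease, so the number of clusters at a cut equals the number of points minus the number of merges at or below that height. The helper names and the data are hypothetical.

```python
def agglomerative_merge_heights(values):
    """Naive single-linkage merging of 1-D values; returns merge distances in order."""
    clusters = [[v] for v in values]          # start: every point is its own cluster
    heights = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest single-linkage distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
        heights.append(d)
    return heights

def clusters_after_cut(heights, n_points, cut_height):
    """Number of clusters left when the dendrogram is cut at `cut_height`."""
    merges_kept = sum(1 for h in heights if h <= cut_height)
    return n_points - merges_kept

values = [1.0, 1.2, 1.5, 5.0, 5.3, 9.0]
heights = agglomerative_merge_heights(values)

print(clusters_after_cut(heights, len(values), cut_height=1.0))   # 3
print(clusters_after_cut(heights, len(values), cut_height=3.6))   # 2
print(clusters_after_cut(heights, len(values), cut_height=10.0))  # 1
```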

27 Which of the following algorithms is essentially an ensemble of random decision trees?

A. DBSCAN
B. K-Means
C. Agglomerative Clustering
D. Isolation Forest

28 Complete Linkage in Agglomerative Clustering is calculated based on:

A. The distance between centroids.
B. The average distance between all points.
C. The minimum distance between points in two clusters.
D. The maximum distance between points in two clusters.

29 If your dataset has clusters with significantly different densities, which algorithm might struggle?

A. Gaussian Mixture Models
B. Isolation Forest
C. Decision Tree
D. DBSCAN

30 What is the primary goal of PCA when used as a preprocessing step for clustering?

A. To label the data.
B. To ensure all clusters are the same size.
C. To increase the number of features.
D. To reduce noise and computational complexity by dimensionality reduction.

31 In an Isolation Forest, what is the maximum possible path length for a tree trained on n samples?

A.
B.
C.
D.

32 Which supervised learning algorithm is most similar to the concept of Hierarchical Clustering?

A. Decision Trees
B. Support Vector Machines
C. Linear Regression
D. Neural Networks

33 Why is 'Divisive' hierarchical clustering less common than 'Agglomerative'?

A. It is less accurate.
B. It requires labeled data.
C. It cannot produce a dendrogram.
D. It is computationally more expensive (O(2^n) split possibilities).

34 In the context of Anomaly Detection, what is a False Negative?

A. A normal point classified as normal.
B. An anomaly classified as normal.
C. An anomaly classified as an anomaly.
D. A normal point flagged as an anomaly.

35 Which of the following scenarios is BEST suited for Supervised Learning rather than Anomaly Detection?

A. Manufacturing quality control with 1 defective part per 10,000.
B. Intrusion detection with unknown attack patterns.
C. Detecting new stars in astronomy images.
D. Email spam detection with thousands of examples for both spam and ham.

36 What does the 'MinPts' parameter in DBSCAN represent?

A. The minimum number of points required to form a dense region.
B. The minimum number of clusters to find.
C. The minimum distance between clusters.
D. The minimum number of iterations to run.

37 Which PCA component captures the most variance in the data?

A. The last principal component.
B. The second principal component.
C. The first principal component.
D. All components capture equal variance.

38 How does Agglomerative Clustering handle outliers?

A. It cannot run if outliers are present.
B. It assigns them to a 'noise' bucket immediately.
C. They are usually merged into clusters very late in the process.
D. It deletes them automatically.

39 In Isolation Forest, subsampling (using a small subset of data to build each tree) helps to:

A. Minimize the effects of swamping and masking.
B. Increase memory usage.
C. Reduce the ability to detect anomalies.
D. Increase the training time.

40 Which of the following is true regarding the shape of clusters found by K-Means vs DBSCAN?

A. K-Means tends to find spherical shapes; DBSCAN finds arbitrary shapes.
B. Both are limited to spherical shapes.
C. Both find arbitrary shapes.
D. K-Means finds arbitrary shapes; DBSCAN finds spherical shapes.

41 When using PCA for anomaly detection, if a point has a very low projection on the principal components but a high reconstruction error, it implies:

A. The point lies far from the subspace defined by the principal components (Anomaly).
B. The point is the mean of the data.
C. The point lies on the principal hyperplane.
D. The point is normal.

42 In hierarchical clustering, what is the time complexity of the standard agglomerative algorithm (naive implementation)?

A.
B.
C.
D.

43 For a dataset with varying cluster sizes and significant noise, which algorithm is generally most robust?

A. DBSCAN
B. K-Means
C. Linear Regression
D. Single Linkage Agglomerative Clustering

44 What is 'masking' in the context of anomaly detection?

A. When the presence of a cluster of anomalies makes it difficult to detect individual anomalies.
B. When an anomaly is hidden because it is too similar to normal data.
C. When features are removed from the dataset.
D. When the algorithm runs out of memory.

45 Which of the following is NOT a distance metric commonly used in Hierarchical Clustering?

A. Cosine Similarity
B. Manhattan Distance
C. Euclidean Distance
D. Gini Impurity

46 If 'epsilon' is chosen to be very small in DBSCAN, what is the likely outcome?

A. It will act exactly like K-Means.
B. The algorithm will crash.
C. Most points will be classified as noise/outliers.
D. All points will be in one cluster.

47 If 'epsilon' is chosen to be very large in DBSCAN, what is the likely outcome?

A. The clusters will be very small.
B. All points will likely be merged into a single cluster.
C. It creates a hierarchical tree.
D. Every point will be a noise point.

48 Why is 'Average Linkage' often preferred over Single and Complete Linkage?

A. It balances the extremes of chaining (Single) and sensitivity to outliers (Complete).
B. It always produces k=2 clusters.
C. It does not require a distance matrix.
D. It is the fastest method.

49 In the context of fraud detection, why might one use Supervised Learning over Anomaly Detection?

A. If the dataset is small.
B. If there are absolutely no examples of fraud available.
C. If the fraud patterns change every day completely.
D. If the company has a large, historically labeled database of verified fraud cases.

50 The 'root' of a dendrogram in hierarchical clustering represents:

A. The noise points.
B. A single cluster containing all data points.
C. The first data point in the set.
D. The cluster with the highest variance.