Unit 2 - Practice Quiz

INT423 50 Questions

1 What is the fundamental principle behind the Isolation Forest algorithm for anomaly detection?

A. It isolates anomalies by randomly selecting a feature and a split value.
B. It groups similar data points into high-density clusters.
C. It calculates the distance of each point to the k-nearest neighbors.
D. It projects data onto a lower-dimensional hyperplane to find outliers.
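The isolation idea behind option A can be sketched in a few lines of pure Python (the data and function name below are illustrative, not from any library): repeatedly pick a random split value and keep only the side containing the target point, counting splits until the point is alone.

```python
import random

def isolation_path_length(values, target, rng):
    """Count random splits until `target` sits alone in its partition (toy 1-D isolation tree)."""
    depth = 0
    while len(values) > 1:
        split = rng.uniform(min(values), max(values))
        values = [v for v in values if (v < split) == (target < split)]
        depth += 1
    return depth

rng = random.Random(0)
data = [4.8, 4.9, 5.0, 5.1, 5.2, 50.0]   # 50.0 is the obvious outlier
outlier_depth = sum(isolation_path_length(data, 50.0, rng) for _ in range(200)) / 200
normal_depth = sum(isolation_path_length(data, 5.0, rng) for _ in range(200)) / 200
print(outlier_depth, normal_depth)   # the outlier needs fewer splits on average
```

Averaging the depth over many random trees is exactly what motivates questions 2 and 22 below: anomalies isolate in shorter average paths.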

2 In an Isolation Forest, how are anomalies distinguished from normal observations based on tree structure?

A. Anomalies end up in the largest leaf nodes.
B. Anomalies have longer path lengths from the root.
C. Anomalies are always found at the root node.
D. Anomalies have shorter path lengths from the root.

3 Which of the following is a primary advantage of Isolation Forest over distance-based anomaly detection methods?

A. It requires labeled data for training.
B. It is computationally expensive for high-dimensional data.
C. It has linear time complexity and handles high-dimensional data well.
D. It calculates the density of every point precisely.

4 When deciding between Anomaly Detection and Supervised Learning, which scenario favors Anomaly Detection?

A. When the dataset is balanced with equal positive and negative examples.
B. When the anomalies look exactly like the normal data.
C. When you have a massive amount of labeled data for all classes.
D. When the number of positive examples (anomalies) is very small compared to negative examples.

5 In the context of supervised learning vs. anomaly detection, what is a 'skewed class' problem?

A. When one class has significantly more samples than the other.
B. When the data is not normalized.
C. When the decision boundary is non-linear.
D. When the data has too many features.

6 Which metric is generally NOT suitable for evaluating a model trained on a highly skewed dataset (anomaly detection scenario)?

A. Accuracy
B. F1-Score
C. Recall
D. Precision
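Why accuracy fails on skewed data can be checked directly (the 9,990/10 split below is a made-up example): a model that predicts "normal" for everything scores near-perfect accuracy while catching zero anomalies.

```python
# 10,000 samples: 9,990 normal (0), 10 anomalies (1); model always predicts "normal"
y_true = [0] * 9990 + [1] * 10
y_pred = [0] * 10000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)   # fraction of real anomalies caught

print(accuracy)  # 0.999 -- looks excellent
print(recall)    # 0.0   -- catches nothing
```

Precision, recall, and F1 expose this failure mode; plain accuracy hides it.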

7 How can Principal Component Analysis (PCA) be used for anomaly detection?

A. By labeling the data using eigenvectors.
B. By clustering points based on the first principal component only.
C. By identifying points with a high reconstruction error.
D. By increasing the number of dimensions to separate points.
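The reconstruction-error idea in option C can be sketched with NumPy (the synthetic near-linear data is illustrative): project onto the first principal component, reconstruct, and flag the point with the largest residual.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.normal(0, 1, 100)
X = np.column_stack([t, t + rng.normal(0, 0.05, 100)])  # normal points hug the line y = x
X = np.vstack([X, [0.0, 5.0]])                          # injected anomaly, far off that line

Xc = X - X.mean(axis=0)
w, V = np.linalg.eigh(np.cov(Xc.T))      # eigendecomposition of the covariance matrix
pc1 = V[:, np.argmax(w)]                 # first principal component

reconstruction = Xc @ np.outer(pc1, pc1) # project onto PC1 and map back
errors = np.linalg.norm(Xc - reconstruction, axis=1)
print(np.argmax(errors))                 # index 100 -> the injected anomaly
```

Normal points reconstruct almost perfectly from one component; the anomaly does not.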

8 Why is feature scaling (e.g., Mean Normalization) critical before applying PCA for anomaly detection?

A. PCA is a tree-based algorithm and requires scaling.
B. It ensures the reconstruction error is always zero.
C. PCA seeks to maximize variance, so features with larger scales will dominate.
D. PCA only works with categorical data.

9 In Hierarchical Clustering, what is the visual representation of the cluster hierarchy called?

A. Scatter Plot
B. Histogram
C. Heatmap
D. Dendrogram

10 Which of the following is NOT a requirement for Hierarchical Clustering?

A. A dataset of points.
B. A distance metric.
C. Specifying the number of clusters (k) beforehand.
D. A linkage criterion.

11 Agglomerative Clustering is often referred to as a strategy of which type?

A. Divide and conquer
B. Top-down
C. Density-based
D. Bottom-up

12 What is the first step in Agglomerative Clustering?

A. Treat each data point as an individual cluster.
B. Calculate the centroid of the entire dataset.
C. Randomly pick k centroids.
D. Assign all points to a single cluster.
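The bottom-up procedure in option A can be sketched in pure Python (toy 1-D data, single linkage; names are illustrative): start with every point as its own cluster, then repeatedly merge the closest pair until the desired number of clusters remains.

```python
def agglomerate(points, k):
    """Bottom-up single-linkage clustering of 1-D points, stopped at k clusters."""
    clusters = [[p] for p in points]   # step 1: every point is its own cluster
    while len(clusters) > k:           # stopping at k = "cutting the tree" at that level
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]
    return clusters

clusters3 = agglomerate([1.0, 1.2, 5.0, 5.1, 9.0], k=3)
print(clusters3)   # [[1.0, 1.2], [5.0, 5.1], [9.0]]
```

This naive all-pairs search is also what gives the standard algorithm its poor time complexity (see question 42).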

13 In Agglomerative Clustering, 'Single Linkage' defines the distance between two clusters as:

A. The average distance between all pairs of points.
B. The distance between their centroids.
C. The minimum distance between any single point in one cluster and any single point in the other.
D. The maximum distance between any single point in one cluster and any single point in the other.

14 What is a known disadvantage of using Single Linkage in Agglomerative Clustering?

A. It forces clusters to be spherical.
B. It is sensitive to the order of data.
C. It is computationally too fast.
D. It suffers from the 'chaining' effect.

15 Which linkage method in Agglomerative Clustering minimizes the variance of the clusters being merged?

A. Average Linkage
B. Ward's Method
C. Complete Linkage
D. Single Linkage
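The three pairwise-distance linkages from questions 13–15 differ only in how they aggregate the same pairwise distances; a minimal sketch on two toy 1-D clusters (Ward's method, which minimizes merged variance rather than using a pairwise distance, is omitted):

```python
from itertools import product

def linkages(A, B):
    """Single, complete, and average linkage distances between 1-D clusters A and B."""
    d = [abs(a - b) for a, b in product(A, B)]
    return min(d), max(d), sum(d) / len(d)

single, complete, average = linkages([1.0, 2.0], [4.0, 8.0])
print(single, complete, average)   # 2.0 7.0 4.5
```

Single linkage takes the minimum (hence its chaining tendency), complete the maximum (hence its outlier sensitivity), and average sits between the two.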

16 DBSCAN stands for:

A. Distance-Based Spatial Clustering of Algorithms with Noise
B. Density-Based Statistical Clustering of Applications with Networks
C. Density-Based Spatial Clustering of Applications with Noise
D. Dual-Based Spatial Clustering of Applications with Nodes

17 What are the two main hyperparameters required for DBSCAN?

A. Number of clusters (k) and iterations.
B. Epsilon (eps) and Minimum Points (MinPts).
C. Learning rate and batch size.
D. Tree depth and number of estimators.

18 In DBSCAN, a point is classified as a 'Core Point' if:

A. It is reachable from a core point but has fewer than 'MinPts' neighbors.
B. It is far away from all other points.
C. It has at least 'MinPts' neighbors within radius 'eps'.
D. It is the centroid of the data.

19 How does DBSCAN classify a point that is within the 'eps' radius of a core point but has fewer than 'MinPts' neighbors itself?

A. Border Point
B. Core Point
C. Noise Point
D. Centroid
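The core/border/noise definitions in questions 18–19 can be checked with a tiny pure-Python classifier (toy 1-D data; a real DBSCAN would also connect clusters, which is skipped here):

```python
def classify(points, eps, min_pts):
    """Label 1-D points as core / border / noise using DBSCAN's definitions."""
    neighbors = {p: [q for q in points if abs(p - q) <= eps] for p in points}
    cores = {p for p in points if len(neighbors[p]) >= min_pts}  # count includes p itself
    labels = {}
    for p in points:
        if p in cores:
            labels[p] = "core"
        elif any(q in cores for q in neighbors[p]):
            labels[p] = "border"   # not dense itself, but within eps of a core point
        else:
            labels[p] = "noise"
    return labels

labels = classify([1.0, 1.1, 1.2, 1.3, 1.6, 9.0], eps=0.35, min_pts=3)
print(labels)   # 1.6 is a border point, 9.0 is noise
```

Shrinking `eps` here pushes more points into "noise", while growing it turns everything into one dense region (questions 46–47).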

20 Which of the following is a major advantage of DBSCAN over K-Means?

A. It does not require any parameters.
B. It can discover clusters of arbitrary shapes.
C. It is faster for all dataset sizes.
D. It works well with varying densities.

21 What happens to 'Noise' points in DBSCAN?

A. They are deleted from the dataset before clustering.
B. They are treated as a separate cluster containing outliers.
C. They are assigned to the largest cluster.
D. They are assigned to the nearest cluster.

22 In Isolation Forest, the 'anomaly score' is derived from:

A. The Euclidean distance to the nearest neighbor.
B. The average path length of the point across the ensemble of trees.
C. The variance of the cluster it belongs to.
D. The number of points in the epsilon radius.

23 Which of the following is an example of 'Novelty Detection' rather than 'Outlier Detection'?

A. Training on only 'normal' images of dogs to detect a cat image during testing.
B. Cleaning a dataset by removing errors.
C. Detecting credit card fraud in historical transaction data.
D. Finding a malfunction in a machine during a live run based on past failures.

24 When choosing features for anomaly detection, what is a desirable property?

A. Features should be categorical only.
B. Features should be highly correlated with the anomaly label (if available).
C. Features should take on unusually large or small values for anomalies compared to normal instances.
D. Features should have zero variance.

25 What is the 'curse of dimensionality' in the context of distance-based clustering?

A. It refers to the difficulty of collecting data.
B. High dimensions make visualization easier.
C. The algorithm runs faster as dimensions increase.
D. Distance metrics become less meaningful as dimensions increase, making all points appear equidistant.

26 In hierarchical clustering, what does 'cutting the tree' determine?

A. The linkage criteria used.
B. The distance metric used.
C. The root of the tree.
D. The number of clusters in the final solution.

27 Which clustering algorithm is essentially an ensemble of random decision trees?

A. DBSCAN
B. K-Means
C. Isolation Forest
D. Agglomerative Clustering

28 Complete Linkage in Agglomerative Clustering is calculated based on:

A. The maximum distance between points in two clusters.
B. The distance between centroids.
C. The minimum distance between points in two clusters.
D. The average distance between all points.

29 If your dataset has clusters with significantly different densities, which algorithm might struggle?

A. Gaussian Mixture Models
B. Decision Tree
C. DBSCAN
D. Isolation Forest

30 What is the primary goal of PCA when used as a preprocessing step for clustering?

A. To reduce noise and computational complexity by dimensionality reduction.
B. To ensure all clusters are the same size.
C. To increase the number of features.
D. To label the data.

31 In an Isolation Forest, what is the maximum possible path length for a tree trained on n samples?

A. n - 1
B. log2(n)
C. n
D. n / 2

32 Which supervised learning algorithm is most similar to the concept of Hierarchical Clustering?

A. Neural Networks
B. Support Vector Machines
C. Decision Trees
D. Linear Regression

33 Why is 'Divisive' hierarchical clustering less common than 'Agglomerative'?

A. It requires labeled data.
B. It is computationally more expensive (2^(n-1) - 1 split possibilities).
C. It cannot produce a dendrogram.
D. It is less accurate.

34 In the context of Anomaly Detection, what is a False Negative?

A. A normal point flagged as an anomaly.
B. An anomaly classified as an anomaly.
C. A normal point classified as normal.
D. An anomaly classified as normal.

35 Which of the following scenarios is BEST suited for Supervised Learning rather than Anomaly Detection?

A. Intrusion detection with unknown attack patterns.
B. Detecting new stars in astronomy images.
C. Email spam detection with thousands of examples for both spam and ham.
D. Manufacturing quality control with 1 defective part per 10,000.

36 What does the 'MinPts' parameter in DBSCAN represent?

A. The minimum distance between clusters.
B. The minimum number of clusters to find.
C. The minimum number of points required to form a dense region.
D. The minimum number of iterations to run.

37 Which PCA component captures the most variance in the data?

A. The second principal component.
B. All components capture equal variance.
C. The last principal component.
D. The first principal component.
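The variance ordering behind option D falls straight out of the eigenvalues of the covariance matrix; a quick NumPy check on synthetic correlated data (the data is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
base = rng.normal(0, 3, 500)               # dominant direction of variation
X = np.column_stack([base,
                     0.5 * base + rng.normal(0, 1, 500),
                     rng.normal(0, 0.5, 500)])

# eigenvalues of the covariance matrix = variances along the principal components
eigvals = np.linalg.eigvalsh(np.cov(X.T))[::-1]   # sorted descending
print(eigvals)   # the first PC's variance is the largest, by construction of PCA
```

Each successive component captures the most remaining variance, so the eigenvalues are non-increasing.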

38 How does Agglomerative Clustering handle outliers?

A. It assigns them to a 'noise' bucket immediately.
B. It deletes them automatically.
C. It cannot run if outliers are present.
D. They are usually merged into clusters very late in the process.

39 In Isolation Forest, subsampling (using a small subset of data to build each tree) helps to:

A. Increase the training time.
B. Reduce the ability to detect anomalies.
C. Increase memory usage.
D. Minimize the effects of swamping and masking.

40 Which of the following is true regarding the shape of clusters found by K-Means vs DBSCAN?

A. Both are limited to spherical shapes.
B. K-Means tends to find spherical shapes; DBSCAN finds arbitrary shapes.
C. Both find arbitrary shapes.
D. K-Means finds arbitrary shapes; DBSCAN finds spherical shapes.

41 When using PCA for anomaly detection, if a point has a very low projection on the principal components but a high reconstruction error, it implies:

A. The point lies on the principal hyperplane.
B. The point is normal.
C. The point lies far from the subspace defined by the principal components (Anomaly).
D. The point is the mean of the data.

42 In hierarchical clustering, what is the time complexity of the standard agglomerative algorithm (naive implementation)?

A. O(n)
B. O(n log n)
C. O(n^3)
D. O(n^2)

43 For a dataset with varying cluster sizes and significant noise, which algorithm is generally most robust?

A. DBSCAN
B. Single Linkage Agglomerative Clustering
C. K-Means
D. Linear Regression

44 What is 'masking' in the context of anomaly detection?

A. When the presence of a cluster of anomalies makes it difficult to detect individual anomalies.
B. When the algorithm runs out of memory.
C. When an anomaly is hidden because it is too similar to normal data.
D. When features are removed from the dataset.

45 Which of the following is NOT a distance metric commonly used in Hierarchical Clustering?

A. Euclidean Distance
B. Manhattan Distance
C. Gini Impurity
D. Cosine Similarity

46 If 'epsilon' is chosen to be very small in DBSCAN, what is the likely outcome?

A. All points will be in one cluster.
B. The algorithm will crash.
C. Most points will be classified as noise/outliers.
D. It will act exactly like K-Means.

47 If 'epsilon' is chosen to be very large in DBSCAN, what is the likely outcome?

A. It creates a hierarchical tree.
B. Every point will be a noise point.
C. The clusters will be very small.
D. All points will likely be merged into a single cluster.

48 Why is 'Average Linkage' often preferred over Single and Complete Linkage?

A. It always produces k=2 clusters.
B. It does not require a distance matrix.
C. It balances the extremes of chaining (Single) and sensitivity to outliers (Complete).
D. It is the fastest method.

49 In the context of fraud detection, why might one use Supervised Learning over Anomaly Detection?

A. If the fraud patterns change every day completely.
B. If the dataset is small.
C. If the company has a large, historically labeled database of verified fraud cases.
D. If there are absolutely no examples of fraud available.

50 The 'root' of a dendrogram in hierarchical clustering represents:

A. A single cluster containing all data points.
B. The noise points.
C. The cluster with the highest variance.
D. The first data point in the set.