1. What is the fundamental principle behind the Isolation Forest algorithm for anomaly detection?
A. It isolates anomalies by randomly selecting a feature and a split value.
B. It projects data onto a lower-dimensional hyperplane to find outliers.
C. It groups similar data points into high-density clusters.
D. It calculates the distance of each point to its k nearest neighbors.
Correct Answer: It isolates anomalies by randomly selecting a feature and a split value.
Explanation:
Isolation Forest works on the principle that anomalies are 'few and different,' making them easier to isolate with fewer random partitions than normal points.
2. In an Isolation Forest, how are anomalies distinguished from normal observations based on tree structure?
A. Anomalies end up in the largest leaf nodes.
B. Anomalies have longer path lengths from the root.
C. Anomalies are always found at the root node.
D. Anomalies have shorter path lengths from the root.
Correct Answer: Anomalies have shorter path lengths from the root.
Explanation:
Because anomalies are distinct and rare, they require fewer random splits to be isolated, resulting in shorter path lengths in the trees.
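The path-length idea can be illustrated with a minimal one-dimensional sketch. This is not the full Isolation Forest algorithm (it has no subsampling, feature selection, or height limit); the helper names `isolation_depth` and `average_depth` and the toy data are assumptions made purely for illustration.

```python
import random

def isolation_depth(values, target, rng):
    """Count the random splits needed before `target` is alone in its partition."""
    points = list(values)
    depth = 0
    while len(points) > 1:
        split = rng.uniform(min(points), max(points))
        # Keep only the side of the split that still contains the target.
        if target < split:
            points = [p for p in points if p < split]
        else:
            points = [p for p in points if p >= split]
        depth += 1
    return depth

def average_depth(values, target, n_trees=200, seed=0):
    """Average the isolation depth over many independent random 'trees'."""
    rng = random.Random(seed)
    total = sum(isolation_depth(values, target, rng) for _ in range(n_trees))
    return total / n_trees

# Toy data: 100 tightly packed "normal" values and one distant anomaly.
data = [float(i) for i in range(100)] + [1000.0]
outlier_depth = average_depth(data, 1000.0)
inlier_depth = average_depth(data, 50.0)
print(f"anomaly: {outlier_depth:.2f} splits, normal point: {inlier_depth:.2f} splits")
```

On data like this, the distant value is usually cut off by the very first split, while a value in the middle of the dense group needs many more splits.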
3. Which of the following is a primary advantage of Isolation Forest over distance-based anomaly detection methods?
A. It has linear time complexity and handles high-dimensional data well.
B. It requires labeled data for training.
C. It calculates the density of every point precisely.
D. It is computationally expensive for high-dimensional data.
Correct Answer: It has linear time complexity and handles high-dimensional data well.
Explanation:
Isolation Forest does not rely on expensive distance calculations between all points, making it efficient (linear time complexity) and effective in high dimensions.
4. When deciding between Anomaly Detection and Supervised Learning, which scenario favors Anomaly Detection?
A. When the anomalies look exactly like the normal data.
B. When you have a massive amount of labeled data for all classes.
C. When the dataset is balanced with equal positive and negative examples.
D. When the number of positive examples (anomalies) is very small compared to negative examples.
Correct Answer: When the number of positive examples (anomalies) is very small compared to negative examples.
Explanation:
Anomaly detection is preferred when the 'positive' class (anomalies) is extremely rare or when the nature of future anomalies is unknown.
5. In the context of supervised learning vs. anomaly detection, what is a 'skewed class' problem?
A. When the decision boundary is non-linear.
B. When one class has significantly more samples than the other.
C. When the data is not normalized.
D. When the data has too many features.
Correct Answer: When one class has significantly more samples than the other.
Explanation:
A skewed class problem occurs when the distribution of classes is highly imbalanced, such as having 99.9% normal transactions and 0.1% fraudulent ones.
6. Which metric is generally NOT suitable for evaluating a model trained on a highly skewed dataset (anomaly detection scenario)?
A. Recall
B. Precision
C. F1-Score
D. Accuracy
Correct Answer: Accuracy
Explanation:
In highly skewed datasets, a model that predicts 'normal' for every instance can achieve high accuracy (e.g., 99%) while failing to detect any anomalies.
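A small worked example makes the point concrete. The class counts below (990 normal, 10 anomalous) are hypothetical, chosen to match the 99% figure in the explanation.

```python
# Hypothetical skewed dataset: 990 normal instances (0) and 10 anomalies (1).
y_true = [0] * 990 + [1] * 10
# A useless "model" that predicts 'normal' for every instance.
y_pred = [0] * len(y_true)

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies caught
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normals passed through

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn) if (tp + fn) else 0.0
precision = tp / (tp + fp) if (tp + fp) else 0.0
f1 = (2 * precision * recall / (precision + recall)
      if (precision + recall) else 0.0)

print(f"accuracy={accuracy:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Accuracy looks excellent even though every anomaly is missed, which is why recall, precision, and F1 are the more informative metrics here.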
7. How can Principal Component Analysis (PCA) be used for anomaly detection?
A. By increasing the number of dimensions to separate points.
B. By labeling the data using eigenvectors.
C. By clustering points based on the first principal component only.
D. By identifying points with a high reconstruction error.
Correct Answer: By identifying points with a high reconstruction error.
Explanation:
PCA captures the normal variance of data. Anomalies often cannot be well-reconstructed using the principal components, leading to a high reconstruction error.
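The reconstruction-error idea can be sketched with NumPy alone. The toy data (points on the line y = 2x plus one planted off-line point) and the choice of keeping a single component are assumptions for illustration; a real pipeline would also need a rule for choosing the error threshold.

```python
import numpy as np

# Toy data: 50 points on the line y = 2x plus one point that breaks the pattern.
x = np.linspace(-1.0, 1.0, 50)
X = np.column_stack([x, 2.0 * x])
X = np.vstack([X, [0.0, 3.0]])  # index 50 is the planted anomaly

# Center the data and take the top principal direction from the SVD.
mean = X.mean(axis=0)
Xc = X - mean
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
components = vt[:1]  # keep k = 1 component, shape (1, 2)

# Project onto the component, map back, and measure what was lost.
X_reconstructed = Xc @ components.T @ components + mean
errors = np.linalg.norm(X - X_reconstructed, axis=1)

suspect = int(np.argmax(errors))
print("largest reconstruction error at index", suspect)
```

The planted point does not follow the correlation between the two features, so it cannot be rebuilt from the single retained component and its error stands out.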
8. Why is feature scaling (e.g., Mean Normalization) critical before applying PCA for anomaly detection?
A. PCA is a tree-based algorithm and requires scaling.
B. It ensures the reconstruction error is always zero.
C. PCA seeks to maximize variance, so features with larger scales will dominate.
D. PCA only works with categorical data.
Correct Answer: PCA seeks to maximize variance, so features with larger scales will dominate.
Explanation:
PCA projects data in the direction of maximum variance. If features are not scaled, variables with large absolute values will dominate the principal components purely due to their scale.
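The effect of scale is easy to see on a tiny example. The two features below (an income in dollars and a ratio between 0 and 1) are hypothetical, and `first_component` is a helper written for this sketch rather than a library function.

```python
import numpy as np

def first_component(X):
    """Return the first principal direction of X (rows are samples)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]

# Hypothetical features on very different scales.
income = np.array([1000.0, 2000.0, 3000.0, 4000.0, 5000.0])
ratio = np.array([0.10, 0.40, 0.35, 0.80, 0.90])
X = np.column_stack([income, ratio])

pc_raw = first_component(X)

# Standardize: zero mean and unit variance per feature.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
pc_scaled = first_component(X_scaled)

print("raw loadings:   ", np.abs(pc_raw))
print("scaled loadings:", np.abs(pc_scaled))
```

Without scaling, the first component points almost entirely along the income axis; after standardization both features contribute equally.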
9. In Hierarchical Clustering, what is the visual representation of the cluster hierarchy called?
A. Scatter Plot
B. Dendrogram
C. Heatmap
D. Histogram
Correct Answer: Dendrogram
Explanation:
A dendrogram is a tree-like diagram that records the sequences of merges or splits in hierarchical clustering.
10. Which of the following is NOT a requirement for Hierarchical Clustering?
A. A distance metric.
B. Specifying the number of clusters (k) beforehand.
C. A dataset of points.
D. A linkage criterion.
Correct Answer: Specifying the number of clusters (k) beforehand.
Explanation:
Unlike K-Means, hierarchical clustering does not require the number of clusters to be specified in advance; the number of clusters is determined by cutting the dendrogram.
11. Agglomerative Clustering is often referred to as a strategy of which type?
A. Bottom-up
B. Density-based
C. Divide and conquer
D. Top-down
Correct Answer: Bottom-up
Explanation:
Agglomerative clustering starts with every point as its own cluster and iteratively merges the closest pairs, hence a 'bottom-up' approach.
12. What is the first step in Agglomerative Clustering?
A. Calculate the centroid of the entire dataset.
B. Randomly pick k centroids.
C. Treat each data point as an individual cluster.
D. Assign all points to a single cluster.
Correct Answer: Treat each data point as an individual cluster.
Explanation:
The algorithm initializes by treating every single data point as a distinct cluster (N clusters for N points).
13. In Agglomerative Clustering, 'Single Linkage' defines the distance between two clusters as:
A. The distance between their centroids.
B. The average distance between all pairs of points.
C. The minimum distance between any single point in one cluster and any single point in the other.
D. The maximum distance between any single point in one cluster and any single point in the other.
Correct Answer: The minimum distance between any single point in one cluster and any single point in the other.
Explanation:
Single linkage uses the shortest distance between a point in cluster A and a point in cluster B (nearest neighbor approach).
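The linkage definitions used in this quiz translate almost directly into code. The sketch below computes single linkage alongside complete and average linkage for comparison; the function names and the two toy clusters are assumptions for illustration.

```python
from itertools import product
from math import dist

def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance for every (point in A, point in B) pair."""
    return [dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    return min(pairwise_distances(cluster_a, cluster_b))  # nearest pair

def complete_linkage(cluster_a, cluster_b):
    return max(pairwise_distances(cluster_a, cluster_b))  # farthest pair

def average_linkage(cluster_a, cluster_b):
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)                # mean over all pairs

cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(3.0, 0.0), (5.0, 0.0)]
print(single_linkage(cluster_a, cluster_b))    # 2.0
print(complete_linkage(cluster_a, cluster_b))  # 5.0
print(average_linkage(cluster_a, cluster_b))   # 3.5
```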
14. What is a known disadvantage of using Single Linkage in Agglomerative Clustering?
A. It is computationally too fast.
B. It forces clusters to be spherical.
C. It is sensitive to the order of data.
D. It suffers from the 'chaining' effect.
Correct Answer: It suffers from the 'chaining' effect.
Explanation:
Single linkage tends to merge clusters via long, thin chains of points, which can merge distinct groups if a chain of noise points connects them.
15. Which linkage method in Agglomerative Clustering minimizes the variance of the clusters being merged?
A. Single Linkage
B. Average Linkage
C. Complete Linkage
D. Ward's Method
Correct Answer: Ward's Method
Explanation:
Ward's method merges the two clusters that result in the minimum increase in total within-cluster variance (Sum of Squared Errors).
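Ward's criterion can be checked numerically on one-dimensional toy clusters. The sketch below computes the increase in Sum of Squared Errors directly and via the standard closed form (cluster sizes times squared centroid distance); the function names and data are assumptions for illustration.

```python
from itertools import combinations

def centroid(points):
    return sum(points) / len(points)

def sse(points):
    """Sum of squared deviations from the cluster centroid."""
    c = centroid(points)
    return sum((p - c) ** 2 for p in points)

def ward_cost(a, b):
    """Increase in total within-cluster SSE caused by merging a and b."""
    return sse(a + b) - sse(a) - sse(b)

def ward_cost_closed_form(a, b):
    """Equivalent formula: |a||b| / (|a| + |b|) * (centroid gap)^2."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * (centroid(a) - centroid(b)) ** 2

clusters = [[0.0, 2.0], [10.0], [3.0]]

# Ward's method merges the pair with the smallest SSE increase.
best_pair = min(
    combinations(range(len(clusters)), 2),
    key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
)
print(best_pair, ward_cost(clusters[0], clusters[1]))
```

Here the clusters at indices 0 and 2 are merged first, because joining them raises the total SSE far less than any merge involving the distant point at 10.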
16. DBSCAN stands for:
A. Density-Based Spatial Clustering of Applications with Noise
B. Density-Based Statistical Clustering of Applications with Networks
C. Dual-Based Spatial Clustering of Applications with Nodes
D. Distance-Based Spatial Clustering of Algorithms with Noise
Correct Answer: Density-Based Spatial Clustering of Applications with Noise
Explanation:
DBSCAN is an acronym for Density-Based Spatial Clustering of Applications with Noise.
17. What are the two main hyperparameters required for DBSCAN?
A. Tree depth and number of estimators.
B. Epsilon (eps) and Minimum Points (MinPts).
C. Number of clusters (k) and iterations.
D. Learning rate and batch size.
Correct Answer: Epsilon (eps) and Minimum Points (MinPts).
Explanation:
DBSCAN requires 'eps' (the radius of the neighborhood) and 'MinPts' (the minimum number of points required to form a dense region).
18. In DBSCAN, a point is classified as a 'Core Point' if:
A. It is reachable from a core point but has fewer than 'MinPts' neighbors.
B. It has at least 'MinPts' neighbors within radius 'eps'.
C. It is far away from all other points.
D. It is the centroid of the data.
Correct Answer: It has at least 'MinPts' neighbors within radius 'eps'.
Explanation:
By definition, a core point is a point with a dense neighborhood: at least MinPts points lie within the eps radius around it.
19. How does DBSCAN classify a point that is within the 'eps' radius of a core point but has fewer than 'MinPts' neighbors itself?
A. Core Point
B. Centroid
C. Border Point
D. Noise Point
Correct Answer: Border Point
Explanation:
Border points are part of a cluster (reachable from a core point) but are not dense enough to be core points themselves.
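The three DBSCAN point types can be sketched in a few lines of plain Python. This only labels points as core, border, or noise; it does not build the clusters themselves. The function name `classify_points` and the toy points are assumptions, and the sketch counts a point as its own neighbor, which is the convention scikit-learn uses for `min_samples`.

```python
from math import dist

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' using DBSCAN's definitions.

    Assumption: a point counts as its own neighbor.
    """
    neighbors = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(n) >= min_pts for n in neighbors]

    labels = []
    for i, n in enumerate(neighbors):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in n):
            labels.append("border")  # near a core point, but not dense itself
        else:
            labels.append("noise")   # not within eps of any core point
    return labels

points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0), (2.25, 0.0), (10.0, 0.0)]
labels = classify_points(points, eps=1.0, min_pts=4)
print(labels)
# ['border', 'core', 'core', 'core', 'border', 'noise']
```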
20. Which of the following is a major advantage of DBSCAN over K-Means?
A. It works well with varying densities.
B. It can discover clusters of arbitrary shapes.
C. It does not require any parameters.
D. It is faster for all dataset sizes.
Correct Answer: It can discover clusters of arbitrary shapes.
Explanation:
Unlike K-Means, which assumes spherical clusters, DBSCAN forms clusters based on density connectivity, allowing it to find non-convex shapes like crescents or rings.
21. What happens to 'Noise' points in DBSCAN?
A. They are assigned to the largest cluster.
B. They are deleted from the dataset before clustering.
C. They are treated as a separate cluster containing outliers.
D. They are assigned to the nearest cluster.
Correct Answer: They are treated as a separate cluster containing outliers.
Explanation:
DBSCAN explicitly identifies noise points (points not reachable from any core point) instead of forcing them into a regular cluster. They are kept apart as a separate group of outliers, so the algorithm effectively performs outlier detection.
22. In Isolation Forest, the 'anomaly score' is derived from:
A. The number of points in the epsilon radius.
B. The Euclidean distance to the nearest neighbor.
C. The average path length of the point across the ensemble of trees.
D. The variance of the cluster it belongs to.
Correct Answer: The average path length of the point across the ensemble of trees.
Explanation:
The anomaly score is a function of the average path length; shorter average paths indicate a higher likelihood of being an anomaly.
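One common form of this function comes from the original Isolation Forest paper (Liu, Ting, and Zhou), which normalizes the average path length by the expected path length of an unsuccessful binary-search-tree lookup. The sketch below follows that formula; the function names are chosen for this example, and the subsample size of 256 is just a typical default, not something stated in this quiz.

```python
import math

EULER_GAMMA = 0.5772156649

def average_path_length(n):
    """c(n): expected path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length, n):
    """s(x, n) = 2 ** (-E[h(x)] / c(n)); values near 1 suggest an anomaly."""
    return 2.0 ** (-mean_path_length / average_path_length(n))

n = 256  # assumed subsample size
short_path_score = anomaly_score(2.0, n)
typical_score = anomaly_score(average_path_length(n), n)
long_path_score = anomaly_score(20.0, n)
print(short_path_score, typical_score, long_path_score)
```

A point whose average path equals c(n) scores exactly 0.5; shorter paths push the score toward 1 and longer paths push it toward 0.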
23. Which of the following is an example of 'Novelty Detection' rather than 'Outlier Detection'?
A. Finding a malfunction in a machine during a live run based on past failures.
B. Training on only 'normal' images of dogs to detect a cat image during testing.
C. Cleaning a dataset by removing errors.
D. Detecting credit card fraud in historical transaction data.
Correct Answer: Training on only 'normal' images of dogs to detect a cat image during testing.
Explanation:
Novelty detection involves training on a clean dataset (only normal data) and identifying new observations that differ from this training data.
24. When choosing features for anomaly detection, what is a desirable property?
A. Features should be highly correlated with the anomaly label (if available).
B. Features should take on unusually large or small values for anomalies compared to normal instances.
C. Features should have zero variance.
D. Features should be categorical only.
Correct Answer: Features should take on unusually large or small values for anomalies compared to normal instances.
Explanation:
Good features for anomaly detection allow the algorithm to distinguish normal behavior from abnormal behavior, often manifested as extreme values.
25. What is the 'curse of dimensionality' in the context of distance-based clustering?
A. High dimensions make visualization easier.
B. The algorithm runs faster as dimensions increase.
C. Distance metrics become less meaningful as dimensions increase, making all points appear equidistant.
D. It refers to the difficulty of collecting data.
Correct Answer: Distance metrics become less meaningful as dimensions increase, making all points appear equidistant.
Explanation:
In high-dimensional spaces, the volume increases so much that data becomes sparse, and the contrast between the nearest and farthest neighbors diminishes.
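The shrinking contrast can be observed empirically. The sketch below draws uniform random points and compares the nearest and farthest distance from one query point; the `relative_contrast` helper, the sample size, and the uniform distribution are all assumptions made for illustration.

```python
import numpy as np

def relative_contrast(n_dims, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one query point to the rest."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n_points, n_dims))
    distances = np.linalg.norm(X[1:] - X[0], axis=1)
    return (distances.max() - distances.min()) / distances.min()

# The contrast drops sharply as the dimensionality grows.
for n_dims in (2, 10, 100, 1000):
    print(n_dims, round(float(relative_contrast(n_dims)), 3))
```

In two dimensions the farthest point is many times farther away than the nearest one; in a thousand dimensions the two distances differ by only a small fraction, so "nearest" carries little information.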
26. In hierarchical clustering, what does 'cutting the tree' determine?
A. The root of the tree.
B. The linkage criteria used.
C. The distance metric used.
D. The number of clusters in the final solution.
Correct Answer: The number of clusters in the final solution.
Explanation:
Cutting the dendrogram at a specific height determines how many vertical lines are intersected, which corresponds to the number of clusters.
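Counting the intersected lines is equivalent to counting how many merges have already happened below the cut. A minimal sketch, assuming the merge heights never decrease (true for single, complete, average, and Ward linkage); the function name and the example heights are hypothetical.

```python
def clusters_after_cut(merge_heights, cut_height):
    """Number of clusters left when the dendrogram is cut at `cut_height`.

    `merge_heights` holds the height of each of the n - 1 merges for n points.
    Every merge at or below the cut has already happened, and each merge
    reduces the cluster count by one.
    """
    n_points = len(merge_heights) + 1
    merges_done = sum(1 for h in merge_heights if h <= cut_height)
    return n_points - merges_done

merge_heights = [0.5, 0.7, 2.0, 5.0]  # hypothetical dendrogram for 5 points
print(clusters_after_cut(merge_heights, 0.1))  # 5: nothing merged yet
print(clusters_after_cut(merge_heights, 1.0))  # 3: two merges below the cut
print(clusters_after_cut(merge_heights, 6.0))  # 1: everything merged
```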
27. Which algorithm is essentially an ensemble of random decision trees?
A. DBSCAN
B. K-Means
C. Agglomerative Clustering
D. Isolation Forest
Correct Answer: Isolation Forest
Explanation:
Isolation Forest builds an ensemble of 'Isolation Trees' (iTrees) to isolate points.
28. Complete Linkage in Agglomerative Clustering is calculated based on:
A. The distance between centroids.
B. The average distance between all points.
C. The minimum distance between points in two clusters.
D. The maximum distance between points in two clusters.
Correct Answer: The maximum distance between points in two clusters.
Explanation:
Complete linkage considers the farthest distance between pairs of points in two clusters, tending to produce compact, spherical clusters.
29. If your dataset has clusters with significantly different densities, which algorithm might struggle?
A. Gaussian Mixture Models
B. Isolation Forest
C. Decision Tree
D. DBSCAN
Correct Answer: DBSCAN
Explanation:
Standard DBSCAN uses a global 'epsilon' and 'MinPts'. If clusters have varying densities, a single density threshold cannot capture all clusters effectively.
30. What is the primary goal of PCA when used as a preprocessing step for clustering?
A. To label the data.
B. To ensure all clusters are the same size.
C. To increase the number of features.
D. To reduce noise and computational complexity by dimensionality reduction.
Correct Answer: To reduce noise and computational complexity by dimensionality reduction.
Explanation:
PCA reduces dimensions by keeping components with high variance, thereby filtering out noise and making clustering algorithms more efficient.
31. In an Isolation Forest, what is the maximum possible path length for a tree trained on $n$ samples?
A.
B.
C.
D.
Correct Answer: $n - 1$
Explanation:
In the worst-case scenario (a completely unbalanced tree), a path could be linear, equal to $n - 1$ edges, though the average depth is usually logarithmic.
32. Which supervised learning algorithm is most similar to the concept of Hierarchical Clustering?
A. Decision Trees
B. Support Vector Machines
C. Linear Regression
D. Neural Networks
Correct Answer: Decision Trees
Explanation:
Both Hierarchical Clustering and Decision Trees organize the data recursively (by splitting it, or in the agglomerative case by merging it), resulting in a tree-like structure.
33. Why is 'Divisive' hierarchical clustering less common than 'Agglomerative'?
A. It is less accurate.
B. It requires labeled data.
C. It cannot produce a dendrogram.
D. It is computationally more expensive ($O(2^n)$ split possibilities).
Correct Answer: It is computationally more expensive ($O(2^n)$ split possibilities).
Explanation:
Divisive (top-down) clustering must decide how to split a cluster at each step. A cluster of $n$ points can be divided into two non-empty groups in $2^{n-1} - 1$ ways, so finding the optimal split is far more computationally intensive than merging the closest pair in Agglomerative clustering.
34. In the context of Anomaly Detection, what is a False Negative?
A. A normal point classified as normal.
B. An anomaly classified as normal.
C. An anomaly classified as an anomaly.
D. A normal point flagged as an anomaly.
Correct Answer: An anomaly classified as normal.
Explanation:
A False Negative in anomaly detection means the algorithm failed to detect an anomaly (it tested negative for the condition, but was actually positive).
35. Which of the following scenarios is BEST suited for Supervised Learning rather than Anomaly Detection?
A. Manufacturing quality control with 1 defective part per 10,000.
B. Intrusion detection with unknown attack patterns.
C. Detecting new stars in astronomy images.
D. Email spam detection with thousands of examples for both spam and ham.
Correct Answer: Email spam detection with thousands of examples for both spam and ham.
Explanation:
Since there are ample examples of both classes (spam and non-spam), the model can learn the characteristics of both efficiently, making it a supervised problem.
36. What does the 'MinPts' parameter in DBSCAN represent?
A. The minimum number of points required to form a dense region.
B. The minimum number of clusters to find.
C. The minimum distance between clusters.
D. The minimum number of iterations to run.
Correct Answer: The minimum number of points required to form a dense region.
Explanation:
MinPts defines the threshold for a region to be considered 'dense' enough to be a core part of a cluster.
37. Which PCA component captures the most variance in the data?
A. The last principal component.
B. The second principal component.
C. The first principal component.
D. All components capture equal variance.
Correct Answer: The first principal component.
Explanation:
By definition, the first principal component is the direction in the feature space along which the data varies the most.
38. How does Agglomerative Clustering handle outliers?
A. It cannot run if outliers are present.
B. It assigns them to a 'noise' bucket immediately.
C. They are usually merged into clusters very late in the process.
D. It deletes them automatically.
Correct Answer: They are usually merged into clusters very late in the process.
Explanation:
Since outliers are far from other points, they are merged only when the distance threshold becomes very large, appearing near the top of the dendrogram.
39. In Isolation Forest, subsampling (using a small subset of data to build each tree) helps to:
A. Minimize the effects of swamping and masking.
B. Increase memory usage.
C. Reduce the ability to detect anomalies.
D. Increase the training time.
Correct Answer: Minimize the effects of swamping and masking.
Explanation:
Subsampling improves performance by reducing swamping (normal points that lie close to anomalies being wrongly flagged as anomalies) and masking (a large group of anomalies concealing one another).
40. Which of the following is true regarding the shape of clusters found by K-Means vs DBSCAN?
A. K-Means tends to find spherical shapes; DBSCAN finds arbitrary shapes.
Correct Answer: K-Means tends to find spherical shapes; DBSCAN finds arbitrary shapes.
Explanation:
K-Means minimizes variance from a centroid (spherical), while DBSCAN follows density chains, allowing it to model complex geometric shapes.
41. When using PCA for anomaly detection, if a point has a very low projection on the principal components but a high reconstruction error, it implies:
A. The point lies far from the subspace defined by the principal components (Anomaly).
B. The point is the mean of the data.
C. The point lies on the principal hyperplane.
D. The point is normal.
Correct Answer: The point lies far from the subspace defined by the principal components (Anomaly).
Explanation:
High reconstruction error means information was lost when projecting to lower dimensions, implying the point does not conform to the correlation structure of normal data.
42. In hierarchical clustering, what is the time complexity of the standard agglomerative algorithm (naive implementation)?
A.
B.
C.
D.
Correct Answer: $O(n^3)$
Explanation:
The standard naive implementation involves calculating the distance matrix ($O(n^2)$) and updating it $n$ times, leading to $O(n^3)$. Optimized versions can reach $O(n^2 \log n)$.
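A naive bottom-up loop shows where the cost comes from. The sketch below uses single linkage and recomputes cluster distances from scratch in every round instead of maintaining a distance matrix; the function name and the toy points are assumptions for illustration, not an optimized implementation.

```python
from math import dist

def naive_single_linkage(points):
    """Merge clusters bottom-up, returning the height of every merge.

    Each round rescans all cluster pairs, which is what makes the naive
    version so expensive on large datasets.
    """
    clusters = [[p] for p in points]  # start: every point is its own cluster
    heights = []
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
        heights.append(d)
    return heights

points = [(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]
heights = naive_single_linkage(points)
print(heights)  # [1.0, 1.0, 4.0, 14.0]
```

The distant point at 20 is merged last and at the greatest height, which is how outliers typically show up near the top of a dendrogram.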
43. For a dataset with varying cluster sizes and significant noise, which algorithm is generally most robust?
A. DBSCAN
B. K-Means
C. Linear Regression
D. Single Linkage Agglomerative Clustering
Correct Answer: DBSCAN
Explanation:
DBSCAN is designed to handle noise explicitly and does not assume equal cluster sizes or shapes.
44. What is 'masking' in the context of anomaly detection?
A. When the presence of a cluster of anomalies makes it difficult to detect individual anomalies.
B. When an anomaly is hidden because it is too similar to normal data.
C. When features are removed from the dataset.
D. When the algorithm runs out of memory.
Correct Answer: When the presence of a cluster of anomalies makes it difficult to detect individual anomalies.
Explanation:
Masking occurs when a group of anomalies is dense enough to appear as a normal cluster or affect the isolation process of a single anomaly.
45. Which of the following is NOT a distance metric commonly used in Hierarchical Clustering?
A. Cosine Similarity
B. Manhattan Distance
C. Euclidean Distance
D. Gini Impurity
Correct Answer: Gini Impurity
Explanation:
Gini Impurity is a metric for split quality in Decision Trees (classification), not a distance metric for clustering.
46. If 'epsilon' is chosen to be very small in DBSCAN, what is the likely outcome?
A. It will act exactly like K-Means.
B. The algorithm will crash.
C. Most points will be classified as noise/outliers.
D. All points will be in one cluster.
Correct Answer: Most points will be classified as noise/outliers.
Explanation:
If epsilon is too small, the neighborhood of points will not contain enough neighbors to satisfy 'MinPts', resulting in no dense regions and many noise points.
47. If 'epsilon' is chosen to be very large in DBSCAN, what is the likely outcome?
A. The clusters will be very small.
B. All points will likely be merged into a single cluster.
C. It creates a hierarchical tree.
D. Every point will be a noise point.
Correct Answer: All points will likely be merged into a single cluster.
Explanation:
If epsilon is large enough to cover the whole dataset, every point is reachable from every other point, merging everything into one cluster.
48. Why is 'Average Linkage' often preferred over Single and Complete Linkage?
A. It balances the extremes of chaining (Single) and sensitivity to outliers (Complete).
B. It always produces k=2 clusters.
C. It does not require a distance matrix.
D. It is the fastest method.
Correct Answer: It balances the extremes of chaining (Single) and sensitivity to outliers (Complete).
Explanation:
Average linkage uses the average distance between all pairs, making it a compromise that avoids the chaining of Single linkage and the overcrowding of Complete linkage.
49. In the context of fraud detection, why might one use Supervised Learning over Anomaly Detection?
A. If the dataset is small.
B. If there are absolutely no examples of fraud available.
C. If the fraud patterns change every day completely.
D. If the company has a large, historically labeled database of verified fraud cases.
Correct Answer: If the company has a large, historically labeled database of verified fraud cases.
Explanation:
If sufficient labeled examples of the 'anomalous' class exist, Supervised Learning usually yields better predictive performance than unsupervised anomaly detection.
50. The 'root' of a dendrogram in hierarchical clustering represents:
A. The noise points.
B. A single cluster containing all data points.
C. The first data point in the set.
D. The cluster with the highest variance.
Correct Answer: A single cluster containing all data points.
Explanation:
The top (root) of the dendrogram represents the final state of agglomerative clustering where all data points have been merged into one unique cluster.