1. What is the fundamental principle behind the Isolation Forest algorithm for anomaly detection?
A. It groups similar data points into high-density clusters.
B. It isolates anomalies by randomly selecting a feature and a split value.
C. It projects data onto a lower-dimensional hyperplane to find outliers.
D. It calculates the distance of each point to the k-nearest neighbors.
Correct Answer: It isolates anomalies by randomly selecting a feature and a split value.
Explanation: Isolation Forest works on the principle that anomalies are 'few and different,' making them easier to isolate with fewer random partitions than normal points.
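The "random feature, random split" idea from Question 1 is easy to see in practice. Below is a minimal sketch, assuming scikit-learn's IsolationForest; the synthetic dataset and all parameters are illustrative, not from the quiz:

```python
# Sketch: anomalies are 'few and different', so random splits isolate them quickly.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X_normal = rng.normal(loc=0.0, scale=1.0, size=(300, 2))  # dense 'normal' cloud
X_outliers = rng.uniform(low=-6, high=6, size=(10, 2))    # few, scattered points
X = np.vstack([X_normal, X_outliers])

clf = IsolationForest(random_state=42).fit(X)
labels = clf.predict(X)  # +1 = normal, -1 = anomaly
print("points flagged as anomalies:", int(np.sum(labels == -1)))
```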
2. In an Isolation Forest, how are anomalies distinguished from normal observations based on tree structure?
A. Anomalies have longer path lengths from the root.
B. Anomalies have shorter path lengths from the root.
C. Anomalies end up in the largest leaf nodes.
D. Anomalies are always found at the root node.
Correct Answer: Anomalies have shorter path lengths from the root.
Explanation: Because anomalies are distinct and rare, they require fewer random splits to be isolated, resulting in shorter path lengths in the trees.
3. Which of the following is a primary advantage of Isolation Forest over distance-based anomaly detection methods?
A. It requires labeled data for training.
B. It is computationally expensive for high-dimensional data.
C. It has linear time complexity and handles high-dimensional data well.
D. It calculates the density of every point precisely.
Correct Answer: It has linear time complexity and handles high-dimensional data well.
Explanation: Isolation Forest does not rely on expensive distance calculations between all points, making it efficient (linear time complexity) and effective in high dimensions.
4. When deciding between Anomaly Detection and Supervised Learning, which scenario favors Anomaly Detection?
A. When the dataset is balanced with equal positive and negative examples.
B. When the number of positive examples (anomalies) is very small compared to negative examples.
C. When you have a massive amount of labeled data for all classes.
D. When the anomalies look exactly like the normal data.
Correct Answer: When the number of positive examples (anomalies) is very small compared to negative examples.
Explanation: Anomaly detection is preferred when the 'positive' class (anomalies) is extremely rare or when the nature of future anomalies is unknown.
5. In the context of supervised learning vs. anomaly detection, what is a 'skewed class' problem?
A. When the data has too many features.
B. When one class has significantly more samples than the other.
C. When the data is not normalized.
D. When the decision boundary is non-linear.
Correct Answer: When one class has significantly more samples than the other.
Explanation: A skewed class problem occurs when the distribution of classes is highly imbalanced, such as having 99.9% normal transactions and 0.1% fraudulent ones.
6. Which metric is generally NOT suitable for evaluating a model trained on a highly skewed dataset (anomaly detection scenario)?
A. Precision
B. Recall
C. F1-Score
D. Accuracy
Correct Answer: Accuracy
Explanation: In highly skewed datasets, a model that predicts 'normal' for every instance can achieve high accuracy (e.g., 99%) while failing to detect any anomalies.
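Questions 5 and 6 are easy to verify numerically. A small sketch (scikit-learn's metrics assumed; the 999:1 split is illustrative) shows how a degenerate "always predict normal" model scores high accuracy but zero recall:

```python
# Sketch: accuracy is misleading on skewed classes; recall/precision expose the failure.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = np.array([0] * 999 + [1])  # 999 normal instances, 1 anomaly
y_pred = np.zeros_like(y_true)      # degenerate model: predicts 'normal' for everything

print("accuracy: ", accuracy_score(y_true, y_pred))                    # 0.999
print("recall:   ", recall_score(y_true, y_pred, zero_division=0))     # 0.0 (the anomaly is missed)
print("precision:", precision_score(y_true, y_pred, zero_division=0))  # 0.0 (no positives predicted)
```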
7. How can Principal Component Analysis (PCA) be used for anomaly detection?
A. By increasing the number of dimensions to separate points.
B. By identifying points with a high reconstruction error.
C. By clustering points based on the first principal component only.
D. By labeling the data using eigenvectors.
Correct Answer: By identifying points with a high reconstruction error.
Explanation: PCA captures the normal variance of data. Anomalies often cannot be well-reconstructed using the principal components, leading to a high reconstruction error.
8. Why is feature scaling (e.g., Mean Normalization) critical before applying PCA for anomaly detection?
A. PCA is a tree-based algorithm and requires scaling.
B. PCA seeks to maximize variance, so features with larger scales will dominate.
C. PCA only works with categorical data.
D. It ensures the reconstruction error is always zero.
Correct Answer: PCA seeks to maximize variance, so features with larger scales will dominate.
Explanation: PCA projects data in the direction of maximum variance. If features are not scaled, variables with large absolute values will dominate the principal components purely due to their scale.
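Questions 7 and 8 combine naturally in code. The sketch below (scikit-learn assumed; the data and the deliberately dominating feature are synthetic) scales the features first, then flags the points that PCA reconstructs worst:

```python
# Sketch: scale -> project -> reconstruct -> score by squared reconstruction error.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
X[:, 0] *= 1000  # an unscaled feature that would otherwise dominate the variance

X_scaled = StandardScaler().fit_transform(X)  # mean 0, unit variance per feature
pca = PCA(n_components=2).fit(X_scaled)

X_proj = pca.transform(X_scaled)         # project onto the principal subspace
X_back = pca.inverse_transform(X_proj)   # reconstruct in the original space
recon_error = np.sum((X_scaled - X_back) ** 2, axis=1)

# The largest reconstruction errors mark the anomaly candidates.
print("top-3 anomaly candidates:", np.argsort(recon_error)[-3:])
```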
9. In Hierarchical Clustering, what is the visual representation of the cluster hierarchy called?
A. Histogram
B. Scatter Plot
C. Dendrogram
D. Heatmap
Correct Answer: Dendrogram
Explanation: A dendrogram is a tree-like diagram that records the sequences of merges or splits in hierarchical clustering.
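To make the dendrogram concrete, here is a minimal sketch using SciPy (the two-blob dataset is illustrative):

```python
# Sketch: linkage() records the merge history; dendrogram() draws it as a tree.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.RandomState(1)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(6, 1, (20, 2))])

Z = linkage(X, method="ward")  # each row: (cluster_i, cluster_j, merge_distance, size)
dendrogram(Z)
plt.xlabel("data points")
plt.ylabel("merge distance")
plt.show()
```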
10. Which of the following is NOT a requirement for Hierarchical Clustering?
A. A distance metric.
B. A linkage criterion.
C. Specifying the number of clusters (k) beforehand.
D. A dataset of points.
Correct Answer: Specifying the number of clusters (k) beforehand.
Explanation: Unlike K-Means, hierarchical clustering does not require the number of clusters to be specified in advance; the number of clusters is determined by cutting the dendrogram.
11. Agglomerative Clustering is often referred to as a strategy of which type?
A. Top-down
B. Bottom-up
C. Divide and conquer
D. Density-based
Correct Answer: Bottom-up
Explanation: Agglomerative clustering starts with every point as its own cluster and iteratively merges the closest pairs, hence a 'bottom-up' approach.
12. What is the first step in Agglomerative Clustering?
A. Assign all points to a single cluster.
B. Calculate the centroid of the entire dataset.
C. Treat each data point as an individual cluster.
D. Randomly pick k centroids.
Correct Answer: Treat each data point as an individual cluster.
Explanation: The algorithm initializes by treating every single data point as a distinct cluster (N clusters for N points).
13. In Agglomerative Clustering, 'Single Linkage' defines the distance between two clusters as:
A. The distance between their centroids.
B. The maximum distance between any single point in one cluster and any single point in the other.
C. The minimum distance between any single point in one cluster and any single point in the other.
D. The average distance between all pairs of points.
Correct Answer: The minimum distance between any single point in one cluster and any single point in the other.
Explanation: Single linkage uses the shortest distance between a point in cluster A and a point in cluster B (nearest neighbor approach).
14. What is a known disadvantage of using Single Linkage in Agglomerative Clustering?
A. It forces clusters to be spherical.
B. It is sensitive to the order of data.
C. It suffers from the 'chaining' effect.
D. It is computationally too fast.
Correct Answer: It suffers from the 'chaining' effect.
Explanation: Single linkage tends to merge clusters via long, thin chains of points, which can merge distinct groups if a chain of noise points connects them.
15. Which linkage method in Agglomerative Clustering minimizes the variance of the clusters being merged?
A. Single Linkage
B. Complete Linkage
C. Average Linkage
D. Ward's Method
Correct Answer: Ward's Method
Explanation: Ward's method merges the two clusters that result in the minimum increase in total within-cluster variance (Sum of Squared Errors).
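Questions 13-15 map directly onto the `linkage` parameter of scikit-learn's AgglomerativeClustering. A short sketch (synthetic blobs and parameters are illustrative) runs all four criteria on the same data:

```python
# Sketch: same data, four linkage criteria; 'ward' merges by minimum variance increase.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.RandomState(2)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(4, 0.5, (30, 2))])

for method in ["single", "complete", "average", "ward"]:
    labels = AgglomerativeClustering(n_clusters=2, linkage=method).fit_predict(X)
    print(method, "cluster sizes:", np.bincount(labels))
```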
16. DBSCAN stands for:
A. Density-Based Spatial Clustering of Applications with Noise
B. Distance-Based Spatial Clustering of Algorithms with Noise
C. Density-Based Statistical Clustering of Applications with Networks
D. Dual-Based Spatial Clustering of Applications with Nodes
Correct Answer: Density-Based Spatial Clustering of Applications with Noise
Explanation: DBSCAN is an acronym for Density-Based Spatial Clustering of Applications with Noise.
17. What are the two main hyperparameters required for DBSCAN?
A. Number of clusters (k) and iterations.
B. Epsilon (eps) and Minimum Points (MinPts).
C. Learning rate and batch size.
D. Tree depth and number of estimators.
Correct Answer: Epsilon (eps) and Minimum Points (MinPts).
Explanation: DBSCAN requires 'eps' (the radius of the neighborhood) and 'MinPts' (the minimum number of points required to form a dense region).
18. In DBSCAN, a point is classified as a 'Core Point' if:
A. It is the centroid of the data.
B. It has at least 'MinPts' neighbors within radius 'eps'.
C. It is reachable from a core point but has fewer than 'MinPts' neighbors.
D. It is far away from all other points.
Correct Answer: It has at least 'MinPts' neighbors within radius 'eps'.
Explanation: By definition, a core point is a point that has a dense neighborhood, specifically containing at least MinPts within the Epsilon radius.
19. How does DBSCAN classify a point that is within the 'eps' radius of a core point but has fewer than 'MinPts' neighbors itself?
A. Core Point
B. Noise Point
C. Border Point
D. Centroid
Correct Answer: Border Point
Explanation: Border points are part of a cluster (reachable from a core point) but are not dense enough to be core points themselves.
20. Which of the following is a major advantage of DBSCAN over K-Means?
A. It is faster for all dataset sizes.
B. It can discover clusters of arbitrary shapes.
C. It works well with varying densities.
D. It does not require any parameters.
Correct Answer: It can discover clusters of arbitrary shapes.
Explanation: Unlike K-Means, which assumes spherical clusters, DBSCAN forms clusters based on density connectivity, allowing it to find non-convex shapes like crescents or rings.
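The point types from Questions 18-19 and the arbitrary shapes from Question 20 can both be recovered from a fitted scikit-learn DBSCAN model. A sketch (the make_moons data and the eps/min_samples values are illustrative):

```python
# Sketch: core points via core_sample_indices_, noise via label -1, border = the rest.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.08, random_state=3)  # two non-convex crescents
db = DBSCAN(eps=0.2, min_samples=5).fit(X)

core_mask = np.zeros(len(X), dtype=bool)
core_mask[db.core_sample_indices_] = True
noise_mask = db.labels_ == -1
border_mask = ~core_mask & ~noise_mask

print("core:", core_mask.sum(), "border:", border_mask.sum(), "noise:", noise_mask.sum())
```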
21. What happens to 'Noise' points in DBSCAN?
A. They are assigned to the nearest cluster.
B. They are treated as a separate cluster containing outliers.
C. They are deleted from the dataset before clustering.
D. They are assigned to the largest cluster.
Correct Answer: They are treated as a separate cluster containing outliers.
Explanation: DBSCAN explicitly identifies noise points (points not reachable from any core point) and leaves them unassigned to any main cluster, effectively performing outlier detection.
22. In Isolation Forest, the 'anomaly score' is derived from:
A. The Euclidean distance to the nearest neighbor.
B. The number of points in the epsilon radius.
C. The average path length of the point across the ensemble of trees.
D. The variance of the cluster it belongs to.
Correct Answer: The average path length of the point across the ensemble of trees.
Explanation: The anomaly score is a function of the average path length; shorter average paths indicate a higher likelihood of being an anomaly.
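In scikit-learn the path-length-based score from Question 22 is exposed (negated) by score_samples. A minimal sketch with one planted outlier (the data is illustrative):

```python
# Sketch: lower score_samples values = shorter average paths = more anomalous.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(4)
X = np.vstack([rng.normal(0, 1, (300, 2)), [[8.0, 8.0]]])  # one obvious outlier at index 300

iso = IsolationForest(random_state=4).fit(X)
scores = iso.score_samples(X)  # negated anomaly score; in (-1, 0]
print("most anomalous index:", int(np.argmin(scores)))  # typically 300
```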
23. Which of the following is an example of 'Novelty Detection' rather than 'Outlier Detection'?
A. Detecting credit card fraud in historical transaction data.
B. Cleaning a dataset by removing errors.
C. Training on only 'normal' images of dogs to detect a cat image during testing.
D. Finding a malfunction in a machine during a live run based on past failures.
Correct Answer: Training on only 'normal' images of dogs to detect a cat image during testing.
Explanation: Novelty detection involves training on a clean dataset (only normal data) and identifying new observations that differ from this training data.
24. When choosing features for anomaly detection, what is a desirable property?
A. Features should be highly correlated with the anomaly label (if available).
B. Features should take on unusually large or small values for anomalies compared to normal instances.
C. Features should be categorical only.
D. Features should have zero variance.
Correct Answer: Features should take on unusually large or small values for anomalies compared to normal instances.
Explanation: Good features for anomaly detection allow the algorithm to distinguish normal behavior from abnormal behavior, often manifested as extreme values.
25. What is the 'curse of dimensionality' in the context of distance-based clustering?
A. Distance metrics become less meaningful as dimensions increase, making all points appear equidistant.
B. The algorithm runs faster as dimensions increase.
C. High dimensions make visualization easier.
D. It refers to the difficulty of collecting data.
Correct Answer: Distance metrics become less meaningful as dimensions increase, making all points appear equidistant.
Explanation: In high-dimensional spaces, the volume increases so much that data becomes sparse, and the contrast between the nearest and farthest neighbors diminishes.
26. In hierarchical clustering, what does 'cutting the tree' determine?
A. The linkage criteria used.
B. The number of clusters in the final solution.
C. The distance metric used.
D. The root of the tree.
Correct Answer: The number of clusters in the final solution.
Explanation: Cutting the dendrogram at a specific height determines how many vertical lines are intersected, which corresponds to the number of clusters.
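"Cutting the tree" from Question 26 corresponds to SciPy's fcluster: the same merge history yields different cluster counts depending on where you cut. A sketch with three synthetic blobs (illustrative data and cut height):

```python
# Sketch: one linkage matrix, two cuts -- by desired cluster count and by height.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.RandomState(5)
X = np.vstack([rng.normal(0, 1, (20, 2)),
               rng.normal(6, 1, (20, 2)),
               rng.normal(12, 1, (20, 2))])

Z = linkage(X, method="ward")
labels_k3 = fcluster(Z, t=3, criterion="maxclust")    # cut so that 3 clusters remain
labels_h = fcluster(Z, t=10.0, criterion="distance")  # cut the dendrogram at height 10
print("k=3 cluster sizes:", np.bincount(labels_k3)[1:])  # fcluster labels start at 1
print("clusters when cutting at height 10:", labels_h.max())
```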
27. Which of the following algorithms is essentially an ensemble of random decision trees?
A. K-Means
B. DBSCAN
C. Isolation Forest
D. Agglomerative Clustering
Correct Answer: Isolation Forest
Explanation: Isolation Forest builds an ensemble of 'Isolation Trees' (iTrees) to isolate points.
28. Complete Linkage in Agglomerative Clustering is calculated based on:
A. The maximum distance between points in two clusters.
B. The minimum distance between points in two clusters.
C. The average distance between all points.
D. The distance between centroids.
Correct Answer: The maximum distance between points in two clusters.
Explanation: Complete linkage considers the farthest distance between pairs of points in two clusters, tending to produce compact, spherical clusters.
29. If your dataset has clusters with significantly different densities, which algorithm might struggle?
A. Gaussian Mixture Models
B. DBSCAN
C. Decision Tree
D. Isolation Forest
Correct Answer: DBSCAN
Explanation: Standard DBSCAN uses a global 'epsilon' and 'MinPts'. If clusters have varying densities, a single density threshold cannot capture all clusters effectively.
30. What is the primary goal of PCA when used as a preprocessing step for clustering?
A. To increase the number of features.
B. To label the data.
C. To reduce noise and computational complexity by dimensionality reduction.
D. To ensure all clusters are the same size.
Correct Answer: To reduce noise and computational complexity by dimensionality reduction.
Explanation: PCA reduces dimensions by keeping components with high variance, thereby filtering out noise and making clustering algorithms more efficient.
31. In an Isolation Forest, what is the maximum possible path length for a tree trained on n samples?
A. log2(n)
B. n/2
C. n - 1
D. 2n
Correct Answer: n - 1
Explanation: In the worst-case scenario (a completely unbalanced tree), a path can be linear, equal to n - 1 edges, though the average depth is usually logarithmic.
32. Which supervised learning algorithm is most similar to the concept of Hierarchical Clustering?
A. Linear Regression
B. Decision Trees
C. Support Vector Machines
D. Neural Networks
Correct Answer: Decision Trees
Explanation: Both Hierarchical Clustering and Decision Trees involve splitting data recursively, resulting in a tree-like structure.
33. Why is 'Divisive' hierarchical clustering less common than 'Agglomerative'?
A. It is less accurate.
B. It is computationally more expensive (2^(n-1) - 1 possible ways to split a cluster of n points).
C. It cannot produce a dendrogram.
D. It requires labeled data.
Correct Answer: It is computationally more expensive (2^(n-1) - 1 possible ways to split a cluster of n points).
Explanation: Divisive clustering (top-down) must decide how to split a cluster, and the number of candidate splits grows exponentially with cluster size. Finding the optimal split is therefore far more expensive than merging the closest pair in agglomerative clustering.
34. In the context of Anomaly Detection, what is a False Negative?
A. A normal point flagged as an anomaly.
B. An anomaly classified as normal.
C. A normal point classified as normal.
D. An anomaly classified as an anomaly.
Correct Answer: An anomaly classified as normal.
Explanation: A False Negative in anomaly detection means the algorithm failed to detect an anomaly (it tested negative for the condition, but was actually positive).
35. Which of the following scenarios is BEST suited for Supervised Learning rather than Anomaly Detection?
A. Manufacturing quality control with 1 defective part per 10,000.
B. Email spam detection with thousands of examples for both spam and ham.
C. Intrusion detection with unknown attack patterns.
D. Detecting new stars in astronomy images.
Correct Answer: Email spam detection with thousands of examples for both spam and ham.
Explanation: Since there are ample examples of both classes (spam and non-spam), the model can learn the characteristics of both efficiently, making it a supervised problem.
36. What does the 'MinPts' parameter in DBSCAN represent?
A. The minimum distance between clusters.
B. The minimum number of points required to form a dense region.
C. The minimum number of clusters to find.
D. The minimum number of iterations to run.
Correct Answer: The minimum number of points required to form a dense region.
Explanation: MinPts defines the threshold for a region to be considered 'dense' enough to be a core part of a cluster.
37. Which PCA component captures the most variance in the data?
A. The last principal component.
B. The first principal component.
C. The second principal component.
D. All components capture equal variance.
Correct Answer: The first principal component.
Explanation: By definition, the first principal component is the direction in the feature space along which the data varies the most.
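Question 37 can be checked in one line with scikit-learn, since explained_variance_ratio_ is always sorted in decreasing order (the correlated synthetic data is illustrative):

```python
# Sketch: the first entry of explained_variance_ratio_ is always the largest.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(6)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # linearly correlated features

pca = PCA().fit(X)
print(pca.explained_variance_ratio_)  # decreasing; the first component dominates
```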
38. How does Agglomerative Clustering handle outliers?
A. It deletes them automatically.
B. They are usually merged into clusters very late in the process.
C. It assigns them to a 'noise' bucket immediately.
D. It cannot run if outliers are present.
Correct Answer: They are usually merged into clusters very late in the process.
Explanation: Since outliers are far from other points, they are merged only when the distance threshold becomes very large, appearing near the top of the dendrogram.
39. In Isolation Forest, subsampling (using a small subset of data to build each tree) helps to:
A. Increase the training time.
B. Reduce the ability to detect anomalies.
C. Minimize the effects of swamping and masking.
D. Increase memory usage.
Correct Answer: Minimize the effects of swamping and masking.
Explanation: Subsampling improves performance by reducing the likelihood that normal points surround anomalies (swamping) or anomalies hide other anomalies (masking).
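In scikit-learn the subsampling from Question 39 is controlled by max_samples; the original Isolation Forest paper recommends a subsample size of 256. A brief sketch:

```python
# Sketch: each of the 100 iTrees is built on an independent subsample of <= 256 points,
# limiting swamping (normal points crowding anomalies) and masking (anomalies hiding each other).
from sklearn.ensemble import IsolationForest

iso = IsolationForest(n_estimators=100, max_samples=256, random_state=7)
```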
40. Which of the following is true regarding the shape of clusters found by K-Means vs. DBSCAN?
A. Both algorithms can only find spherical clusters.
B. K-Means tends to find spherical clusters, while DBSCAN can find arbitrarily shaped clusters.
C. DBSCAN can only find spherical clusters, while K-Means finds arbitrary shapes.
D. Cluster shape is irrelevant to both algorithms.
Correct Answer: K-Means tends to find spherical clusters, while DBSCAN can find arbitrarily shaped clusters.
Explanation: K-Means minimizes variance from a centroid (spherical), while DBSCAN follows density chains, allowing it to model complex geometric shapes.
41. When using PCA for anomaly detection, if a point has a very low projection on the principal components but a high reconstruction error, it implies:
A. The point is normal.
B. The point lies on the principal hyperplane.
C. The point lies far from the subspace defined by the principal components (Anomaly).
D. The point is the mean of the data.
Correct Answer: The point lies far from the subspace defined by the principal components (Anomaly).
Explanation: High reconstruction error means information was lost when projecting to lower dimensions, implying the point does not conform to the correlation structure of normal data.
42. In hierarchical clustering, what is the time complexity of the standard agglomerative algorithm (naive implementation)?
A. O(n)
B. O(n log n)
C. O(n^2)
D. O(n^3)
Correct Answer: O(n^3)
Explanation: The standard naive implementation computes the full distance matrix (O(n^2)) and updates it over O(n) merge steps, leading to O(n^3). Optimized versions can reach O(n^2 log n).
43. For a dataset with varying cluster sizes and significant noise, which algorithm is generally most robust?
A. K-Means
B. DBSCAN
C. Single Linkage Agglomerative Clustering
D. Linear Regression
Correct Answer: DBSCAN
Explanation: DBSCAN is designed to handle noise explicitly and does not assume equal cluster sizes or shapes.
44. What is 'masking' in the context of anomaly detection?
A. When an anomaly is hidden because it is too similar to normal data.
B. When the presence of a cluster of anomalies makes it difficult to detect individual anomalies.
C. When features are removed from the dataset.
D. When the algorithm runs out of memory.
Correct Answer: When the presence of a cluster of anomalies makes it difficult to detect individual anomalies.
Explanation: Masking occurs when a group of anomalies is dense enough to appear as a normal cluster or affect the isolation process of a single anomaly.
45. Which of the following is NOT a distance metric commonly used in Hierarchical Clustering?
A. Euclidean Distance
B. Manhattan Distance
C. Cosine Similarity
D. Gini Impurity
Correct Answer: Gini Impurity
Explanation: Gini Impurity is a metric for split quality in Decision Trees (classification), not a distance metric for clustering.
46. If 'epsilon' is chosen to be very small in DBSCAN, what is the likely outcome?
A. All points will be in one cluster.
B. Most points will be classified as noise/outliers.
C. The algorithm will crash.
D. It will act exactly like K-Means.
Correct Answer: Most points will be classified as noise/outliers.
Explanation: If epsilon is too small, the neighborhood of points will not contain enough neighbors to satisfy 'MinPts', resulting in no dense regions and many noise points.
47. If 'epsilon' is chosen to be very large in DBSCAN, what is the likely outcome?
A. All points will likely be merged into a single cluster.
B. Every point will be a noise point.
C. The clusters will be very small.
D. It creates a hierarchical tree.
Correct Answer: All points will likely be merged into a single cluster.
Explanation: If epsilon is large enough to cover the whole dataset, every point is reachable from every other point, merging everything into one cluster.
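Both failure modes from Questions 46 and 47 show up when sweeping eps on the same data. A sketch (synthetic blobs; the eps values are deliberately extreme and illustrative):

```python
# Sketch: tiny eps -> almost everything is noise; huge eps -> one merged cluster.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=8)

for eps in [0.01, 0.5, 50.0]:
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # exclude the noise label
    print(f"eps={eps}: clusters={n_clusters}, noise points={int(np.sum(labels == -1))}")
```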
48. Why is 'Average Linkage' often preferred over Single and Complete Linkage?
A. It is the fastest method.
B. It balances the extremes of chaining (Single) and sensitivity to outliers (Complete).
C. It does not require a distance matrix.
D. It always produces k=2 clusters.
Correct Answer: It balances the extremes of chaining (Single) and sensitivity to outliers (Complete).
Explanation: Average linkage uses the average distance between all pairs of points, making it a compromise that avoids the chaining effect of Single linkage and the outlier sensitivity of Complete linkage.
49. In the context of fraud detection, why might one use Supervised Learning over Anomaly Detection?
A. If there are absolutely no examples of fraud available.
B. If the fraud patterns change every day completely.
C. If the company has a large, historically labeled database of verified fraud cases.
D. If the dataset is small.
Correct Answer: If the company has a large, historically labeled database of verified fraud cases.
Explanation: If sufficient labeled examples of the 'anomalous' class exist, Supervised Learning usually yields better predictive performance than unsupervised anomaly detection.
50. The 'root' of a dendrogram in hierarchical clustering represents:
A. The first data point in the set.
B. A single cluster containing all data points.
C. The cluster with the highest variance.
D. The noise points.
Correct Answer: A single cluster containing all data points.
Explanation: The top (root) of the dendrogram represents the final state of agglomerative clustering where all data points have been merged into one unique cluster.