Unit 1 - Practice Quiz

INT396 60 Questions

1 What is the primary characteristic of the data used in unsupervised learning?

Supervised vs. Unsupervised vs. Semi-Supervised Learning Easy
A. It requires an external reward signal to guide learning.
B. It consists of completely unlabeled data.
C. It contains only sequential data with timestamps.
D. It contains both features and explicitly labeled target variables.

2 Which learning paradigm uses a dataset containing a small amount of labeled data and a large amount of unlabeled data?

Supervised vs. Unsupervised vs. Semi-Supervised Learning Easy
A. Semi-Supervised Learning
B. Unsupervised Learning
C. Supervised Learning
D. Reinforcement Learning

3 If a machine learning model is trained to predict house prices based on historical sales data where the final price is known, which type of learning is this?

Supervised vs. Unsupervised vs. Semi-Supervised Learning Easy
A. Semi-Supervised Learning
B. Self-Supervised Learning
C. Supervised Learning
D. Unsupervised Learning

4 In the mathematical formulation of an unsupervised learning problem, the dataset is typically represented as:

Problem formulation for unsupervised learning Easy
A. where is a reward
B.
C.
D.

5 What is the primary objective when formulating an unsupervised learning problem?

Problem formulation for unsupervised learning Easy
A. To manually assign categories to all incoming data streams.
B. To discover underlying patterns, groupings, or structures within the data.
C. To map input features to specific, predefined output labels accurately.
D. To maximize a numerical reward signal over time.

6 How is unsupervised learning applied in market segmentation?

Real-life use cases: Market segmentation Easy
A. By mapping product images to specific text descriptions.
B. By predicting the exact dollar amount a customer will spend next month.
C. By automatically grouping customers with similar purchasing habits into distinct segments.
D. By classifying whether a transaction is approved or declined by a bank.

7 Why is market segmentation considered an unsupervised learning problem?

Real-life use cases: Market segmentation Easy
A. Because the company already knows exactly which segment each customer belongs to.
B. Because the algorithm discovers natural customer groupings without predefined category labels.
C. Because the data relies heavily on external rewards from ad clicks.
D. Because it requires a large team of humans to manually label the customer data.

8 In customer behavior analysis, what does 'association rule learning' (an unsupervised method) typically discover?

Real-life use cases: Customer behavior analysis Easy
A. The precise date a customer will cancel their subscription.
B. The exact age and gender of an anonymous website visitor.
C. Rules showing which products are frequently bought together (e.g., 'If bread, then butter').
D. The mathematical equation for the company's total annual revenue.

9 When an e-commerce platform groups website visitors based on their browsing paths to better understand user journeys, this is an example of:

Real-life use cases: Customer behavior analysis Easy
A. Reinforcement Learning
B. Customer behavior analysis using unsupervised learning
C. Supervised Image Classification
D. Predictive regression modeling

10 What is the main goal of anomaly detection in data mining?

Real-life use cases: Anomaly & fraud detection Easy
A. To find the most common and frequent items in a dataset.
B. To identify rare data points or events that deviate significantly from normal behavior.
C. To classify images into predefined categories.
D. To calculate the average value of all numerical features.

11 Why is unsupervised learning highly suitable for detecting new, unseen types of credit card fraud?

Real-life use cases: Anomaly & fraud detection Easy
A. Because it trains the fraudster on how to avoid detection.
B. Because it detects deviations from normal spending patterns without needing prior examples of the new fraud.
C. Because it relies on explicitly labeled examples of past fraud.
D. Because fraud detection requires algorithms to generate realistic fake credit cards.

12 In bioinformatics, clustering algorithms are often used to group genes. What is the algorithm attempting to discover?

Real-life use cases: Pattern discovery in biological and social data Easy
A. The English translation of the genetic code.
B. Genes that have similar expression patterns under certain conditions.
C. The specific name of the disease caused by a single gene.
D. The exact age of the organism from which the gene was extracted.

13 In a social network, discovering tightly knit groups of friends or users who interact frequently is known as:

Real-life use cases: Pattern discovery in biological and social data Easy
A. Community detection
B. Regression analysis
C. Time-series forecasting
D. Supervised binary classification

14 Which distance metric represents the shortest straight-line distance between two points in Euclidean space?

Distance & similarity metrics: Euclidean, Manhattan, Cosine similarity Easy
A. Jaccard Similarity
B. Euclidean Distance
C. Manhattan Distance
D. Cosine Similarity

15 Which metric is calculated as the sum of the absolute differences between the Cartesian coordinates of two points?

Distance & similarity metrics: Euclidean, Manhattan, Cosine similarity Easy
A. Cosine Distance
B. Manhattan Distance
C. Euclidean Distance
D. Minkowski Distance with

16 The formula represents which distance metric?

Distance & similarity metrics: Euclidean, Manhattan, Cosine similarity Easy
A. Euclidean Distance
B. Cosine Similarity
C. Manhattan Distance
D. Hamming Distance

17 Cosine similarity measures the similarity between two vectors based on:

Distance & similarity metrics: Euclidean, Manhattan, Cosine similarity Easy
A. The angle between them in a multidimensional space.
B. The number of identical features they share.
C. The sum of their absolute differences.
D. The straight-line distance between their endpoints.

18 If two vectors point in exactly the same direction, their Cosine similarity is:

Distance & similarity metrics: Euclidean, Manhattan, Cosine similarity Easy
A. $0$
B. $1$
C.
D.

19 Why is the choice of distance metric crucial in an unsupervised learning algorithm like K-Means?

When and why distance choice matters Easy
A. Because it determines the learning rate of the neural network.
B. Because it provides the mathematical definition of 'similarity', which dictates how data points are grouped.
C. Because unsupervised algorithms cannot run unless distance is measured in miles.
D. Because it acts as the labeled target variable for the model.

20 If you are clustering text documents based on word frequencies, why is Cosine similarity usually preferred over Euclidean distance?

When and why distance choice matters Easy
A. Because text documents are always measured in a 2D space.
B. Because Euclidean distance cannot be computed on numerical data.
C. Because Cosine similarity automatically translates languages.
D. Because Cosine similarity focuses on the orientation (content similarity) rather than the magnitude (document length).
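
The contrast between orientation and magnitude in question 20 can be sketched numerically. This is a minimal illustration, assuming two hypothetical word-count vectors over a three-word vocabulary; the vectors and helper names are made up for the example and are not part of the quiz.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Straight-line distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical word counts: same word proportions, ten times the length.
short_doc = [2, 1, 0]
long_doc = [20, 10, 0]

cos_sim = cosine_similarity(short_doc, long_doc)
euc_dist = euclidean_distance(short_doc, long_doc)

print(f"cosine similarity:  {cos_sim:.4f}")
print(f"Euclidean distance: {euc_dist:.4f}")
```

Under these assumptions the two documents point in the same direction, so their cosine similarity is essentially 1, while their Euclidean distance is large purely because of the length difference.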

21 A medical research facility has a dataset of 50,000 X-ray images, but only 1,000 of them have been diagnosed and tagged by expert radiologists. The facility wants to build a model to categorize the remaining images. Which learning paradigm is most appropriate for this scenario?

Supervised vs. Unsupervised vs. Semi-Supervised Learning Medium
A. Reinforcement Learning, by rewarding the model for correctly guessing the labels of the 49,000 images.
B. Supervised Learning, by training only on the 1,000 labeled images and ignoring the rest.
C. Semi-Supervised Learning, by using the 1,000 labeled images to guide the learning process over the entire dataset.
D. Unsupervised Learning, by clustering all 50,000 images and ignoring the labels entirely.

22 Which of the following scenarios describes a transition from a supervised learning problem to an unsupervised learning problem?

Supervised vs. Unsupervised vs. Semi-Supervised Learning Medium
A. Switching from using a small set of labeled data and a large set of unlabeled data to using exclusively labeled data.
B. Switching from clustering news articles by similarity to using a pre-trained model to classify them into 'Sports', 'Politics', and 'Tech'.
C. Switching from predicting customer churn based on past data to predicting future sales revenue.
D. Switching from classifying emails as spam or not spam to identifying natural topical groupings in a large corpus of text without using predefined categories.

23 In contrasting learning paradigms, which fundamental characteristic distinguishes how a model's performance is typically evaluated in supervised versus unsupervised learning?

Supervised vs. Unsupervised vs. Semi-Supervised Learning Medium
A. Supervised learning evaluation relies on comparing predictions to known ground-truth labels, whereas unsupervised learning often relies on internal metrics like intra-cluster variance.
B. Supervised learning uses distance metrics like Euclidean distance, whereas unsupervised learning strictly uses loss functions like Cross-Entropy.
C. There is no difference; both paradigms require a hold-out test set with ground-truth labels to evaluate generalization.
D. Supervised learning evaluates the speed of convergence, whereas unsupervised learning evaluates the size of the dataset processed.

24 Let $X \in \mathbb{R}^{n \times d}$ represent a dataset. Which of the following best represents a common mathematical objective in an unsupervised dimensionality reduction task?

Problem formulation for unsupervised learning Medium
A. Find a transformation to project $X$ into $\mathbb{R}^{k}$ (where $k < d$) such that the variance of the projected data is maximized or reconstruction error is minimized.
B. Maximize the reward function over a sequence of actions taken within the data space.
C. Assign each data point to a class by maximizing the margin between the classes.
D. Find a mapping function $f: X \to Y$, where $f$ minimizes the Mean Squared Error against a target variable.

25 In the formulation of a clustering problem, the objective is often to partition a set of $n$ observations into $k$ sets $S = \{S_1, S_2, \dots, S_k\}$. What is the typical goal of the objective function in this context?

Problem formulation for unsupervised learning Medium
A. To assign equal numbers of observations to each set regardless of the distance between points.
B. To maximize the distance between data points within the same set $S_i$.
C. To minimize the distance between the centroids of different sets $S_i$ and $S_j$.
D. To minimize the within-cluster sum of squares (intra-cluster variance) and maximize the inter-cluster variance.
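
The within-cluster sum of squares mentioned in question 25 can be computed by hand for a tiny case. The sketch below assumes four hypothetical 2D points and two candidate partitions; the data and function names are illustrative only.

```python
def centroid(points):
    """Arithmetic mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def wcss(clusters):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        for p in points:
            total += sum((a - b) ** 2 for a, b in zip(p, c))
    return total

# Hypothetical data: two tight pairs that are far apart from each other.
compact_partition = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
stretched_partition = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]

wcss_compact = wcss(compact_partition)
wcss_stretched = wcss(stretched_partition)

print("WCSS, compact partition:  ", wcss_compact)
print("WCSS, stretched partition:", wcss_stretched)
```

With these assumed points, the partition that keeps nearby points together has a much lower value than the one that pairs distant points.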

26 A retail brand applies a clustering algorithm to demographic and purchasing data to achieve market segmentation. What is the primary operational benefit of the output generated by this unsupervised learning task?

Real-life use cases: Market segmentation Medium
A. It perfectly predicts the exact monetary value of the next purchase for every individual customer.
B. It automatically labels which customers will churn in the next 30 days based on historical ground truth.
C. It eliminates the need for capturing future customer demographic data.
D. It identifies distinct customer profiles based on shared behaviors, allowing for highly targeted and personalized marketing campaigns.

27 An e-commerce platform uses association rule mining, an unsupervised learning technique, to analyze customer transaction logs. Which of the following insights is most likely derived from this analysis?

Customer behavior analysis Medium
A. The identification of fraudulent credit card transactions during checkout.
B. The discovery that customers who purchase laptops are 70% more likely to also purchase a wireless mouse in the same transaction.
C. The exact probability that a specific customer will return an item next month.
D. The classification of user reviews into positive, neutral, or negative sentiments.
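
The kind of statistic behind an association rule in question 27 can be sketched with support and confidence. The transaction log below is hypothetical and tiny, and the helper names are assumptions made for illustration; real rule mining (for example with Apriori) searches over many itemsets rather than checking a single rule.

```python
# Hypothetical transaction log; each transaction is a set of item names.
transactions = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "bag"},
    {"laptop"},
    {"mouse"},
    {"phone", "case"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent) from the transaction log."""
    return support(antecedent | consequent) / support(antecedent)

rule_support = support({"laptop", "mouse"})
rule_confidence = confidence({"laptop"}, {"mouse"})

print(f"support(laptop, mouse)     = {rule_support:.2f}")
print(f"confidence(laptop -> mouse) = {rule_confidence:.2f}")
```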

28 When building an unsupervised anomaly detection system for credit card fraud, the model learns the distribution of normal transactions. How does it identify a potentially fraudulent transaction?

Anomaly & fraud detection Medium
A. By flagging transactions that fall into low-density regions of the learned probability distribution.
B. By classifying the transaction using a decision tree trained on labeled examples of past fraud.
C. By matching the transaction against a hard-coded database of known fraudulent IP addresses.
D. By looking up the user's past history of reported frauds.

29 What is a significant limitation of using strictly unsupervised learning for anomaly detection in a network intrusion system?

Anomaly & fraud detection Medium
A. It often suffers from high false positive rates because novel but benign network behaviors may be flagged as anomalies.
B. It is unable to process high-dimensional network data.
C. It requires a massive, perfectly balanced dataset of both normal and attack traffic.
D. It cannot detect zero-day (previously unseen) attacks.

30 Biologists are analyzing gene expression data (RNA-Seq) across different tissue samples. How can unsupervised learning help them discover new biological patterns?

Pattern discovery in biological and social data Medium
A. By classifying the tissue samples as 'diseased' or 'healthy' using prior clinical labels.
B. By predicting the exact protein structure of a gene based on known homologous structures.
C. By grouping genes that exhibit similar expression profiles across samples, potentially revealing co-regulated gene networks.
D. By determining the precise mutation rate of the DNA sequence over time.

31 In social network analysis, researchers apply a community detection algorithm to a graph of user interactions. What is the fundamental assumption underlying this unsupervised approach?

Pattern discovery in biological and social data Medium
A. Every user must belong to exactly one perfectly sized community.
B. The distance between users is strictly proportional to their geographical distance.
C. Users within a community will have more connections to each other than they do to users outside the community.
D. The algorithm requires labeled examples of 'influencer' users to seed the communities.

32 Consider two data points in a 2D space: and . What are the Euclidean distance and the Manhattan distance between point and point , respectively?

Distance & similarity metrics: Euclidean, Manhattan, Cosine similarity Medium
A. Euclidean: 5, Manhattan: 5
B. Euclidean: 25, Manhattan: 7
C. Euclidean: 5, Manhattan: 7
D. Euclidean: 7, Manhattan: 5
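
The two metrics in question 32 can be compared side by side. This is a minimal sketch that assumes two hypothetical points, (0, 0) and (6, 8), chosen only for illustration; they are not the coordinates from the question.

```python
import math

def euclidean_distance(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan_distance(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical points, not taken from the question.
p = (0, 0)
q = (6, 8)

euclid = euclidean_distance(p, q)
manhattan = manhattan_distance(p, q)

print("Euclidean:", euclid)
print("Manhattan:", manhattan)
```

For any pair of points the Manhattan distance is at least as large as the Euclidean distance, with equality only when the points differ in at most one coordinate.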

33 Which of the following scenarios best justifies the use of Cosine similarity over Euclidean distance?

Distance & similarity metrics: Euclidean, Manhattan, Cosine similarity Medium
A. When the dataset consists of categorical variables that have been one-hot encoded, and exact matches matter most.
B. When comparing documents represented by TF-IDF vectors, where the length of the document should not heavily influence the similarity.
C. When trying to find the shortest driving path in a city grid.
D. When clustering locations based on GPS coordinates where absolute physical distance is important.

34 Given two non-zero vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$, under what condition will the Cosine similarity between them be equal to $0$?

Distance & similarity metrics: Euclidean, Manhattan, Cosine similarity Medium
A. When $\mathbf{u}$ and $\mathbf{v}$ point in exactly opposite directions.
B. When $\mathbf{u}$ and $\mathbf{v}$ are identical.
C. When $\mathbf{u}$ and $\mathbf{v}$ are orthogonal (perpendicular) to each other.
D. When the Euclidean distance between $\mathbf{u}$ and $\mathbf{v}$ is exactly $1$.

35 A dataset has two features: Age (ranging from 18 to 80) and Annual Income (ranging from $20,000 to $150,000). If a clustering algorithm uses Euclidean distance on the raw, unscaled data, what is the most likely outcome?

Distance & similarity metrics: Euclidean, Manhattan, Cosine similarity Medium
A. The Annual Income feature will dominate the distance calculations, making Age almost irrelevant to cluster formation.
B. The algorithm will fail to compute the distance because the units (years vs. dollars) are different.
C. The Age feature will dominate the distance calculations because smaller numbers are squared.
D. The algorithm will naturally balance both features because Euclidean distance is scale-invariant.
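
The effect of feature scale in question 35 can be sketched with three hypothetical customers. The sketch assumes min-max scaling with the feature ranges stated in the question (18 to 80 years, $20,000 to $150,000); the customer values and helper names are made up for illustration.

```python
import math

AGE_RANGE = (18, 80)
INCOME_RANGE = (20_000, 150_000)

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def min_max_scale(point):
    """Rescale (age, income) so each feature lies in [0, 1]."""
    age, income = point
    return (
        (age - AGE_RANGE[0]) / (AGE_RANGE[1] - AGE_RANGE[0]),
        (income - INCOME_RANGE[0]) / (INCOME_RANGE[1] - INCOME_RANGE[0]),
    )

# Hypothetical customers as (age, annual income).
a = (25, 50_000)
b = (60, 50_500)  # 35 years older than a, almost the same income
c = (26, 58_000)  # almost the same age as a, income $8,000 higher

raw_ab, raw_ac = euclidean(a, b), euclidean(a, c)
scaled_ab = euclidean(min_max_scale(a), min_max_scale(b))
scaled_ac = euclidean(min_max_scale(a), min_max_scale(c))

print(f"raw:    d(a,b)={raw_ab:.1f}  d(a,c)={raw_ac:.1f}")
print(f"scaled: d(a,b)={scaled_ab:.3f}  d(a,c)={scaled_ac:.3f}")
```

On the raw values the 35-year age gap barely registers next to a few hundred dollars of income difference, so `b` looks closer to `a` than `c` does; after scaling, the ordering flips.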

36 Why might a data scientist choose Manhattan distance ($L_1$ norm) over Euclidean distance ($L_2$ norm) when dealing with a high-dimensional dataset that contains several extreme outliers?

When and why distance choice matters Medium
A. Because Manhattan distance uses absolute differences, making it less sensitive to the large deviations caused by extreme outliers compared to the squared differences of Euclidean distance.
B. Because Manhattan distance inherently reduces the dimensionality of the dataset by ignoring zero-valued features.
C. Because Manhattan distance squares the differences, penalizing outliers more heavily and separating them into their own clusters.
D. Because Manhattan distance always guarantees that the clustering algorithm will converge to a global optimum.

37 In a recommendation system, you want to cluster users based on their ratings of movies. The data is highly sparse because most users have only rated a few movies out of thousands. Which distance/similarity measure is generally most effective here, and why?

When and why distance choice matters Medium
A. Minkowski distance with $p = \infty$, because it isolates the single largest difference in movie ratings between two users.
B. Manhattan distance, because it evaluates the exact number of rating differences step-by-step.
C. Cosine similarity, because it focuses on the overlapping non-zero ratings and ignores the massive number of mutual zeros.
D. Euclidean distance, because it treats all unrated movies as zeros and directly computes the spatial distance.

38 A routing algorithm for automated delivery drones must navigate a warehouse composed of strictly arranged parallel and perpendicular aisles. Which distance metric mathematically represents the true navigation path of the drone?

When and why distance choice matters Medium
A. Euclidean distance
B. Cosine similarity
C. Manhattan distance
D. Mahalanobis distance

39 Assume you are clustering data using K-Means, which traditionally minimizes the within-cluster sum of squares (squared Euclidean distance). If you change the underlying objective to minimize the sum of absolute differences (Manhattan distance), forming the K-Medians algorithm, how does the geometric shape of the theoretical cluster boundaries shift?

When and why distance choice matters Medium
A. The boundaries shift from spherical/circular curves to piece-wise linear, axis-aligned (diamond-like) shapes.
B. The boundaries shift from axis-aligned squares to perfectly spherical curves.
C. The boundaries remain completely unchanged; only the centroid locations differ.
D. The boundaries disappear completely, as Manhattan distance cannot form enclosed regions.

40 Suppose you are comparing two sets of vectors. Set A has dense, normally distributed continuous features. Set B consists of binary feature vectors where a 1 indicates the presence of a trait. Why does the choice of distance matter fundamentally between Set A and Set B?

When and why distance choice matters Medium
A. Set B requires a distance metric that evaluates magnitudes (like Euclidean), whereas Set A requires metrics focusing on logical overlaps (like Jaccard).
B. Set B's binary nature means metrics like Jaccard or Hamming distance provide meaningful interpretations of trait overlap, whereas Euclidean is better suited for Set A's continuous magnitudes.
C. Set A represents categorical data better than Set B, requiring Cosine similarity.
D. The choice does not matter; Euclidean distance is universally optimal regardless of data distribution or type.
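
The overlap-based measures named in question 40 can be sketched for binary trait vectors. The two vectors below are hypothetical and the function names are assumptions made for the example.

```python
def hamming_distance(u, v):
    """Number of positions where the two binary vectors disagree."""
    return sum(1 for a, b in zip(u, v) if a != b)

def jaccard_similarity(u, v):
    """Shared traits divided by traits present in either vector (mutual zeros ignored)."""
    both = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(u, v) if a == 1 or b == 1)
    return both / either

# Hypothetical binary trait vectors (1 = trait present).
u = [1, 1, 0, 0, 1]
v = [1, 0, 0, 0, 1]

hamming = hamming_distance(u, v)
jaccard = jaccard_similarity(u, v)

print("Hamming distance:  ", hamming)
print("Jaccard similarity:", round(jaccard, 4))
```

Note that this Jaccard helper divides by the number of traits present in either vector, so it assumes at least one trait is present.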

41 In semi-supervised learning, algorithms often rely on specific structural assumptions about the underlying data distribution to leverage unlabeled data effectively. Which of the following describes a scenario where applying semi-supervised learning is highly likely to degrade model performance compared to purely supervised learning?

Supervised vs. Unsupervised vs. Semi-Supervised Learning Hard
A. The data strictly follows the manifold assumption, where high-dimensional data lies on a lower-dimensional structure.
B. The marginal distribution $P(x)$ is uniformly distributed and provides no information about the conditional distribution $P(y \mid x)$.
C. The unlabeled data contains a high degree of missing values that are Missing Completely At Random (MCAR).
D. The cluster assumption holds, meaning points in the same dense region share the same class label.

42 Consider a Positive-Unlabeled (PU) learning scenario formulated to find anomalies. Under what condition can PU learning be theoretically reduced to a standard semi-supervised learning problem?

Supervised vs. Unsupervised vs. Semi-Supervised Learning Hard
A. When the negative class heavily outnumbers the positive class in the unlabeled dataset.
B. When the unlabeled set is drawn from the exact same marginal distribution as the test data.
C. PU learning cannot be reduced to semi-supervised learning because the absence of labeled negatives strictly defines it as unsupervised density estimation.
D. When the Selected Completely At Random (SCAR) assumption holds, meaning labeled positives are a uniform random sample of all true positives.

43 Transductive learning is often contrasted with inductive semi-supervised learning. Which of the following is a strict mathematical limitation of transductive Support Vector Machines (TSVMs) applied to a clustering-like formulation?

Supervised vs. Unsupervised vs. Semi-Supervised Learning Hard
A. They can only operate using linear kernels because the manifold assumption is violated in infinite-dimensional spaces.
B. They require the unlabeled data to be mapped to a strictly orthogonal feature space relative to the labeled data.
C. They optimize the margin over both labeled and unlabeled data but fail to produce a global generalization function for unseen, out-of-sample data.
D. They inherently assume that the unlabeled data follows a Gaussian distribution, limiting their use in non-parametric setups.

44 In latent variable models for unsupervised learning, the goal is often to maximize the log-likelihood of the observed data, $\log p(x)$. Because the true posterior of the latent variables is often intractable, variational inference is used. Which inequality forms the fundamental basis for this problem formulation?

Problem formulation for unsupervised learning Hard
A. Markov's Inequality
B. Cauchy-Schwarz Inequality
C. Chebyshev's Inequality
D. Jensen's Inequality

45 When formulating a centroid-based clustering algorithm (like K-Means) as an optimization problem, it is mathematically posed as minimizing the within-cluster sum of squares (WCSS). What makes finding the exact global minimum of this formulation computationally prohibitive?

Problem formulation for unsupervised learning Hard
A. The problem requires computing the pseudo-inverse of a singular covariance matrix at each iteration.
B. The objective function is strictly convex but contains non-differentiable points at the cluster boundaries.
C. The optimization is fundamentally ill-posed because distance metrics fail to satisfy the triangle inequality in Euclidean space.
D. The search space involves combinatorial assignments of points to clusters, making the problem NP-hard in general, even for $k = 2$ clusters in arbitrary dimension or for $d = 2$ dimensions with arbitrary $k$.

46 An unsupervised anomaly detection system minimizes a reconstruction error function $L(x, \hat{x})$. If the model is an undercomplete linear autoencoder (equivalent to PCA), which of the following best describes the subspace onto which the model projects the data to formulate the 'normal' profile?

Problem formulation for unsupervised learning Hard
A. The subspace spanned by the eigenvectors corresponding to the largest eigenvalues of the data's covariance matrix.
B. The subspace spanned by the eigenvectors corresponding to the smallest eigenvalues of the data's covariance matrix.
C. A non-linear manifold mapping defined by the kernel trick applied to the covariance matrix.
D. The subspace orthogonal to the principal components, capturing maximum data variance.

47 In anomaly detection, particularly in fraud detection systems using distance-based unsupervised methods, the phenomenon of 'swamping' occurs. How is 'swamping' mathematically defined or observed in this context?

Real-life use cases: Market segmentation, Customer behavior analysis, Anomaly & fraud detection, Pattern discovery in biological and social data Hard
A. When distance metrics collapse in high dimensions, causing all pairwise distances between normal and anomalous points to become uniform.
B. When the presence of massive amounts of normal data masks the outliers, pushing the outlier score below the detection threshold.
C. When normal instances are incorrectly classified as anomalies because they are drawn into the sparse feature space heavily influenced by true outliers.
D. When anomalous points are clustered so tightly together that they artificially inflate the local density, appearing as a normal cluster.

48 When applying unsupervised learning for pattern discovery in biological data (e.g., gene expression microarrays), 'biclustering' is often preferred over standard clustering. What specific problem formulation makes biclustering uniquely suited for this use case?

Real-life use cases: Market segmentation, Customer behavior analysis, Anomaly & fraud detection, Pattern discovery in biological and social data Hard
A. It utilizes non-Euclidean distance metrics exclusively to account for non-linear gene interactions.
B. It forces the clusters into a hierarchical binary tree structure, aligning with phylogenetic evolutionary mapping.
C. It simultaneously clusters rows (genes) and columns (conditions), discovering genes that exhibit similar behavior only under a specific subset of conditions.
D. It automatically projects the data into a 2-dimensional latent space to handle the high sparsity of biological matrices.

49 A bank uses K-means for market segmentation based on continuous customer behavioral features. To evaluate the quality of the unsupervised segmentation, the data science team uses the Silhouette Coefficient. In which of the following edge cases will the Silhouette Coefficient misleadingly report a near-zero or negative score despite the clusters being perfectly separated?

Real-life use cases: Market segmentation, Customer behavior analysis, Anomaly & fraud detection, Pattern discovery in biological and social data Hard
A. When the clusters are densely packed spherical distributions with identical variances.
B. When the clusters form concentric rings (e.g., non-convex geometries) and rely on density-based separation.
C. When the number of features (dimensions) far exceeds the number of observations ($d \gg n$).
D. When all features have been rigorously standardized to have a mean of 0 and a variance of 1.

50 In customer behavior analysis, sequential pattern mining (e.g., PrefixSpan) is used to find frequent subsequences of purchases. If a transaction dataset has a very high diversity of items but short transaction lengths, why might an unsupervised frequent itemset mining algorithm like Apriori fail computationally while a sequential pattern approach is required?

Real-life use cases: Market segmentation, Customer behavior analysis, Anomaly & fraud detection, Pattern discovery in biological and social data Hard
A. Short transaction lengths inherently violate the anti-monotonicity property of the support measure.
B. Sequential pattern mining models strictly assume a Gaussian distribution of item frequencies.
C. Apriori cannot handle categorical variables without one-hot encoding, leading to memory overflow.
D. Apriori generates a massive number of unpruned candidate itemsets at lower levels before finding support, causing a combinatorial explosion.

51 When building an autoencoder for fraud detection on credit card transactions, the training dataset consists exclusively of 'normal' transactions. If the autoencoder is designed with a massive latent dimension (overcomplete) without proper regularization, what will be the expected outcome during inference on real-world data containing fraud?

Real-life use cases: Market segmentation, Customer behavior analysis, Anomaly & fraud detection, Pattern discovery in biological and social data Hard
A. The model will naturally approximate a linear PCA projection, maintaining standard anomaly detection capabilities.
B. The model will enforce extreme sparsity, causing normal transactions to have higher reconstruction errors than fraudulent ones.
C. The model will achieve a near-zero reconstruction error for both normal and fraudulent transactions, failing to detect fraud.
D. The model will easily detect fraud because the reconstruction error for all data points will universally increase.

52 Let $x$ and $y$ be two real-valued feature vectors representing text documents. If $x$ and $y$ are both strictly $L_2$-normalized such that $\|x\|_2 = 1$ and $\|y\|_2 = 1$, what is the exact mathematical relationship between their squared Euclidean distance $\|x - y\|_2^2$ and their Cosine similarity $\cos(x, y)$?

Distance & similarity metrics: Euclidean, Manhattan, Cosine similarity, when and why distance choice matters Hard
A.
B.
C.
D.
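
The relationship asked about in question 52 can be checked numerically. For unit-length vectors, the squared Euclidean distance expands to $\|x\|^2 + \|y\|^2 - 2\,x \cdot y$, so it depends only on the dot product, which equals the cosine similarity. The sketch below assumes two hypothetical 2D vectors and compares the distance against $2 - 2\cos(x, y)$.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical vectors, normalized to unit length.
x = l2_normalize([3.0, 4.0])
y = l2_normalize([1.0, 0.0])

cos_xy = dot(x, y)  # equals cosine similarity because both norms are 1
squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
from_cosine = 2 - 2 * cos_xy

print("squared Euclidean distance:", round(squared_distance, 6))
print("2 - 2 * cosine similarity: ", round(from_cosine, 6))
```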

53 According to the phenomenon associated with the 'curse of dimensionality' in distance metric spaces, as the dimensionality $d \to \infty$, what happens to the ratio $\frac{D_{\max} - D_{\min}}{D_{\min}}$ for a given query point (where $D_{\max}$ and $D_{\min}$ are the maximum and minimum distances to other points)?

Distance & similarity metrics: Euclidean, Manhattan, Cosine similarity, when and why distance choice matters Hard
A. It oscillates unpredictably, which is why Cosine similarity is exclusively used in high dimensions.
B. It approaches $0$, meaning the distance to the nearest neighbor and the farthest neighbor become virtually indistinguishable.
C. It approaches $\infty$, meaning nearest neighbors become exponentially identifiable.
D. It converges to a non-zero constant dependent solely on the chosen norm.
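
The behavior of distances in high dimensions from question 53 can be explored with a small simulation. This is a rough sketch, assuming points drawn uniformly from the unit hypercube and measuring the relative contrast (max distance − min distance) / min distance from one random query point; the sample size, seed, and dimensions are arbitrary choices for illustration.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of Euclidean distances from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.sqrt(sum((a - b) ** 2 for a, b in zip(query, point))))
    return (max(distances) - min(distances)) / min(distances)

contrast_low_dim = relative_contrast(dim=2)
contrast_high_dim = relative_contrast(dim=1000)

print(f"relative contrast in 2 dimensions:    {contrast_low_dim:.3f}")
print(f"relative contrast in 1000 dimensions: {contrast_high_dim:.3f}")
```

Under these assumptions the contrast in 1000 dimensions comes out far smaller than in 2 dimensions, which is the concentration effect the question refers to.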

54 A data scientist is designing a custom clustering algorithm using 'Cosine Distance', defined as $1 - \text{CosineSimilarity}(x, y)$. They intend to use a metric tree (e.g., Ball Tree) to speed up neighbor searches. Why will this approach fundamentally fail or yield incorrect optimizations?

Distance & similarity metrics: Euclidean, Manhattan, Cosine similarity, when and why distance choice matters Hard
A. Cosine Distance cannot handle negative feature values, breaking the metric tree's split criteria.
B. Cosine Distance is mathematically equivalent to the norm, rendering the spherical bounds of a Ball Tree inefficient.
C. Cosine Distance forces the metric tree to map all data into an infinite-dimensional Hilbert space.
D. Cosine Distance does not satisfy the triangle inequality, which is a strict requirement for metric trees.
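
Whether a dissimilarity behaves like a true metric can be tested on concrete points. The sketch below assumes the usual definition of cosine distance as 1 minus cosine similarity and uses three hypothetical 2D vectors to compare the direct distance with the distance through an intermediate point.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity (assumed definition of cosine distance)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1 - dot / (norm_u * norm_v)

# Hypothetical vectors: y lies halfway (in angle) between x and z.
x = (1.0, 0.0)
y = (1.0, 1.0)
z = (0.0, 1.0)

d_xz = cosine_distance(x, z)
d_xy = cosine_distance(x, y)
d_yz = cosine_distance(y, z)

print(f"d(x, z)           = {d_xz:.4f}")
print(f"d(x, y) + d(y, z) = {d_xy + d_yz:.4f}")
```

Here the direct distance from `x` to `z` exceeds the sum of the two legs through `y`, so the triangle inequality does not hold for this triple.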

55 When performing clustering on high-dimensional data, researchers sometimes prefer fractional distance metrics ($L_p$ norm where $0 < p < 1$) over standard Euclidean ($p = 2$) or Manhattan ($p = 1$) metrics. What is the primary theoretical justification for this choice?

Distance & similarity metrics: Euclidean, Manhattan, Cosine similarity, when and why distance choice matters Hard
A. Fractional norms are less sensitive to the 'distance concentration' effect, providing better relative contrast between near and far points in high dimensions.
B. Fractional norms are computationally cheaper to compute because they bypass the need for floating-point exponentiation.
C. Fractional norms implicitly perform feature scaling, removing the need for standardization prior to clustering.
D. Fractional norms guarantee the convexity of the cluster boundaries, ensuring global convergence of K-Means.

56 In a dataset with highly correlated features with differing variances, standard Euclidean distance often creates skewed, elongated clusters. A common mitigation is to use the Mahalanobis distance. Which of the following data preprocessing steps followed by standard Euclidean distance is mathematically equivalent to computing the Mahalanobis distance on the original data?

Distance & similarity metrics: Euclidean, Manhattan, Cosine similarity, when and why distance choice matters Hard
A. Applying an $L_2$ vector normalization to all data points to project them onto a unit hypersphere.
B. Min-Max scaling all features to the range $[0, 1]$.
C. Applying independent Z-score normalization (standardization) to each feature individually.
D. Transforming the data using Principal Component Analysis (PCA) and then dividing each principal component by its standard deviation (whitening).

57 Consider the Minkowski distance metric $D(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$. As the parameter $p$ approaches infinity ($p \to \infty$), which widely known distance metric does this equation mathematically converge to?

Distance & similarity metrics: Euclidean, Manhattan, Cosine similarity, when and why distance choice matters Hard
A. Mahalanobis distance
B. Chebyshev distance ($L_\infty$ norm)
C. Manhattan distance ($L_1$ norm)
D. Cosine distance
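
The limiting behavior in question 57 can be explored by evaluating the Minkowski distance for growing values of $p$. This is a minimal sketch assuming two hypothetical 3D points; the largest absolute coordinate difference is printed alongside for comparison.

```python
def minkowski_distance(x, y, p):
    """(sum_i |x_i - y_i|^p)^(1/p) for a finite p >= 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

# Hypothetical points with coordinate differences 3, 4 and 0.
x = (1.0, 5.0, 2.0)
y = (4.0, 1.0, 2.0)

largest_difference = max(abs(a - b) for a, b in zip(x, y))

for p in (1, 2, 10, 100):
    print(f"p = {p:>3}: {minkowski_distance(x, y, p):.6f}")
print("largest coordinate difference:", largest_difference)
```

With these points, $p = 1$ gives the Manhattan value, $p = 2$ the Euclidean value, and larger $p$ moves the result toward the largest single coordinate difference.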

58 Why is Cosine similarity typically preferred over Euclidean distance when analyzing document similarity using TF-IDF (Term Frequency-Inverse Document Frequency) vectors?

Distance & similarity metrics: Euclidean, Manhattan, Cosine similarity, when and why distance choice matters Hard
A. Because Euclidean distance is invalid for categorical distributions, whereas Cosine similarity implicitly performs a cross-entropy calculation.
B. Because TF-IDF requires non-negative distances, and Cosine similarity ensures distances strictly remain between 0 and 1, unlike Euclidean.
C. Because Euclidean distance is highly sensitive to the magnitude of the vectors, meaning a long document and a short document with the same topic distribution would appear artificially distant.
D. Because TF-IDF vectors are inherently dense, and Cosine similarity executes faster on dense matrix multiplications than Euclidean distance.

59 If an unsupervised algorithm relies on updating cluster centroids using the arithmetic mean of the assigned points, but uses Cosine similarity for assignments (e.g., Spherical K-Means), what critical adjustment must be made to the centroid update step to maintain mathematical consistency?

Distance & similarity metrics: Euclidean, Manhattan, Cosine similarity, when and why distance choice matters Hard
A. The centroids must be updated using the geometric mean rather than the arithmetic mean.
B. The centroids must be defined using the median of the vectors rather than the mean to minimize angular distance.
C. The centroids must be shifted by a constant factor of to account for angular variance.
D. The computed arithmetic mean centroid must be $L_2$-normalized to project it back onto the unit hypersphere.
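
The centroid update discussed in question 59 can be sketched for one cluster. This assumes three hypothetical member vectors that have already been scaled to unit length; the helper names are illustrative and this is not a full Spherical K-Means implementation.

```python
import math

def l2_norm(v):
    return math.sqrt(sum(a * a for a in v))

def l2_normalize(v):
    norm = l2_norm(v)
    return [a / norm for a in v]

def mean_vector(vectors):
    """Component-wise arithmetic mean."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical cluster members, each scaled to unit length.
members = [l2_normalize(v) for v in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]

raw_mean = mean_vector(members)    # generally lies inside the unit sphere
centroid = l2_normalize(raw_mean)  # rescaled back to unit length

print("norm of raw mean:           ", round(l2_norm(raw_mean), 4))
print("norm of normalized centroid:", round(l2_norm(centroid), 4))
```

The arithmetic mean of unit vectors that are not all identical has length below 1, which is why the rescaling step changes it.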

60 A dataset contains features representing specific geolocational grids in a city, where travel is strictly limited to an orthogonal street network. If one attempts to use Euclidean distance to cluster optimal distribution hubs, what specific topological error is being introduced?

Distance & similarity metrics: Euclidean, Manhattan, Cosine similarity, when and why distance choice matters Hard
A. The model ignores the triangle inequality, allowing points to map outside the permissible grid.
B. The model will overestimate distances because the $L_2$ norm is strictly greater than the $L_1$ norm.
C. The model fails to account for the curse of dimensionality, causing grid points to collapse into a singular origin.
D. The model assumes traversal along the hypotenuse, systematically underestimating the true traversal cost between coordinate points.