1. What is a local feature in the context of computer vision?
Overview of feature detection
Easy
A.A completely flat, uniform region in an image
B.A metadata tag describing the image content
C.An image pattern that is easily distinguishable from its immediate neighborhood
D.The overall color histogram of the entire image
Correct Answer: An image pattern that is easily distinguishable from its immediate neighborhood
Explanation:
A local feature is a specific, distinguishable patch or pattern in an image, such as an edge, corner, or interesting point, that differs from its immediate surroundings.
2. What type of image structure is the Harris detector primarily designed to find?
Harris corner detector
Easy
A.Corners
B.Smooth gradients
C.Flat uniform regions
D.Straight horizontal lines
Correct Answer: Corners
Explanation:
The Harris corner detector calculates intensity variations in all directions to specifically identify corners in an image.
3. What does the acronym SIFT stand for in computer vision?
scale invariant feature transform
Easy
A.Scale-Invariant Feature Transform
B.Spatial Intensity Feature Transform
C.Speeded-up Invariant Feature Test
D.Standard Image Feature Tool
Correct Answer: Scale-Invariant Feature Transform
Explanation:
SIFT stands for Scale-Invariant Feature Transform, an algorithm used to detect and describe local features in images.
4. The SURF algorithm was primarily designed as a faster alternative to which popular feature detector?
speeded up robust features
Easy
A.HOG
B.RANSAC
C.BRIEF
D.SIFT
Correct Answer: SIFT
Explanation:
SURF (Speeded Up Robust Features) is a patented local feature detector and descriptor that performs a similar task to SIFT but is computationally much faster.
5. What is the primary advantage of using binary feature descriptors over floating-point descriptors?
binary feature detectors
Easy
A.They are highly efficient to compute and match
B.They provide a continuously varying gradient map
C.They capture more color information
D.They are entirely unaffected by extreme lighting changes
Correct Answer: They are highly efficient to compute and match
Explanation:
Binary descriptors represent features as strings of 1s and 0s, making them extremely fast to compute and match using simple bitwise operations.
6. How does the FAST algorithm typically identify a corner?
FAST
Easy
A.By calculating the second derivative of the entire image
B.By comparing a central pixel's intensity to a circular ring of surrounding pixels
C.By analyzing the frequency domain using a Fourier transform
D.By building a histogram of oriented gradients
Correct Answer: By comparing a central pixel's intensity to a circular ring of surrounding pixels
Explanation:
FAST (Features from Accelerated Segment Test) detects corners by examining a circle of 16 pixels around a candidate pixel and checking whether a sufficiently long contiguous arc of those pixels is all brighter or all darker than the center by more than a threshold.
7. BRIEF creates an image descriptor by performing what specific operation?
BRIEF
Easy
A.Computing gradient histograms
B.Calculating complex matrix eigenvalues
C.Extracting scale-space extrema using Difference of Gaussians
D.Simple binary intensity comparisons between random pixel pairs
Correct Answer: Simple binary intensity comparisons between random pixel pairs
Explanation:
BRIEF (Binary Robust Independent Elementary Features) generates a binary string descriptor by performing simple intensity comparisons between predefined pairs of pixels in a patch.
8. ORB was introduced as a free and efficient alternative to SIFT and SURF. What two algorithms does it combine?
ORB
Easy
A.FAST and BRIEF
B.Harris and HOG
C.RANSAC and SIFT
D.K-D tree and LSH
Correct Answer: FAST and BRIEF
Explanation:
ORB (Oriented FAST and Rotated BRIEF) combines the FAST keypoint detector and the BRIEF descriptor with modifications to handle rotation.
9. What does HOG stand for in computer vision?
HOG
Easy
A.Highly Optimized Graphics
B.Heuristic Object Generator
C.Histogram of Oriented Gradients
D.Hashing of Grayscale
Correct Answer: Histogram of Oriented Gradients
Explanation:
HOG stands for Histogram of Oriented Gradients, a feature descriptor used primarily for object detection.
10. Which of the following is a common real-world application of feature detection and description?
applications of descriptors
Easy
A.Typing text into a document
B.Adjusting monitor brightness
C.Converting images from JPEG to PNG
D.Image panorama stitching
Correct Answer: Image panorama stitching
Explanation:
Feature detection and description are widely used in panorama stitching to find overlapping points between different images and align them correctly.
11. What is the primary goal of feature matching?
Overview of feature matching
Easy
A.To compress an image size
B.To establish correspondences between features in two or more images
C.To apply color filters to an image
D.To detect human faces only
Correct Answer: To establish correspondences between features in two or more images
Explanation:
Feature matching is the process of finding corresponding features across different images, which is essential for tasks like 3D reconstruction and tracking.
12. Which similarity measure is computationally best suited for comparing binary feature descriptors like ORB and BRIEF?
similarity measures
Easy
A.Mahalanobis distance
B.Cosine similarity
C.Hamming distance
D.Euclidean distance
Correct Answer: Hamming distance
Explanation:
The Hamming distance counts the number of differing bits between two binary strings. It is extremely fast to compute using the XOR operation, making it ideal for binary descriptors.
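The XOR-and-count idea can be sketched in a few lines. This is a minimal illustration that assumes each descriptor is packed into a Python integer; the function name `hamming_distance` is hypothetical and not taken from any library.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count differing bits between two descriptors packed as integers."""
    # XOR leaves a 1 wherever the two bit strings disagree;
    # counting those 1s (a "popcount") gives the Hamming distance.
    return bin(a ^ b).count("1")


d1 = 0b10110010
d2 = 0b10011010
distance = hamming_distance(d1, d2)  # the two strings differ in 2 positions
```

On real hardware the count is usually a single popcount instruction per machine word, which is where the speed advantage comes from.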
13. How does a Brute Force matcher find the best match for a feature from a first image?
brute force matching
Easy
A.It randomly guesses the best match to save time
B.It skips matching and outputs the identity matrix
C.It compares the feature against every single feature in the second image
D.It only compares the feature with features of the same color
Correct Answer: It compares the feature against every single feature in the second image
Explanation:
A Brute Force matcher takes a descriptor from one set and calculates its distance against all descriptors in the second set, returning the closest one.
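A minimal sketch of this exhaustive search, assuming floating-point descriptors stored as tuples and Euclidean distance as the metric. The name `brute_force_match` is illustrative only.

```python
import math


def brute_force_match(descs_a, descs_b):
    """For each descriptor in descs_a, return (index_a, index_b, distance)
    of its closest descriptor in descs_b."""
    matches = []
    for i, da in enumerate(descs_a):
        best_j, best_d = -1, math.inf
        # Compare against every descriptor in the second set.
        for j, db in enumerate(descs_b):
            d = math.dist(da, db)  # Euclidean distance
            if d < best_d:
                best_j, best_d = j, d
        matches.append((i, best_j, best_d))
    return matches
```

The two nested loops make the cost proportional to the product of the two set sizes, which is why tree or hashing structures are used for large descriptor sets.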
14. In the context of feature matching, what is a K-D tree primarily used for?
K-D tree
Easy
A.Calculating the focal length of a camera
B.Speeding up nearest neighbor searches in high-dimensional spaces
C.Storing pixel colors in a 2D array
D.Drawing edges on a black and white image
Correct Answer: Speeding up nearest neighbor searches in high-dimensional spaces
Explanation:
A K-D (K-dimensional) tree is a space-partitioning data structure that prunes large parts of the search space during nearest neighbor queries (exactly, or approximately when backtracking is limited), making feature matching much faster than a brute-force approach.
15. What is the core idea behind Locality-Sensitive Hashing (LSH)?
Locality-Sensitive Hashing (LSH)
Easy
A.Hashing dissimilar items into the exact same bucket
B.Encrypting image features so they cannot be read
C.Hashing similar input items into the same buckets with high probability
D.Sorting features alphabetically
Correct Answer: Hashing similar input items into the same buckets with high probability
Explanation:
LSH maps similar data points into the same hash buckets to quickly find approximate nearest neighbors, which is highly efficient for matching binary descriptors.
16. What does RANSAC stand for?
RANSAC for robust matching
Easy
A.Random Sample Consensus
B.Robust Analysis of Scale and Corners
C.Randomized System Algorithm Code
D.Rapid Sampling and Calculation
Correct Answer: Random Sample Consensus
Explanation:
RANSAC stands for Random Sample Consensus, an iterative method used to estimate parameters of a mathematical model from a set of observed data containing outliers.
17. In the context of feature matching, what is the main purpose of the RANSAC algorithm?
RANSAC for robust matching
Easy
A.To extract more features from an image
B.To filter out incorrect matches (outliers) and find a robust mathematical model
C.To increase the brightness of the image
D.To compress the feature descriptors
Correct Answer: To filter out incorrect matches (outliers) and find a robust mathematical model
Explanation:
RANSAC is used to robustly fit a model (like a homography matrix) by ignoring outlier matches and focusing only on geometrically consistent inlier matches.
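The sample, score, and keep-the-best loop can be sketched on a deliberately simple model. The example below assumes the two images differ by a pure 2D translation (so one match is a minimal sample) rather than a homography; the function name, tolerance, iteration count, and seed are all illustrative assumptions.

```python
import random


def ransac_translation(matches, tol=1.0, iterations=50, seed=0):
    """Estimate a 2D translation from matches [((x1, y1), (x2, y2)), ...].

    Assumption: the true model is a pure translation, so a single match
    is enough to instantiate a hypothesis.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # 1. Draw a minimal random sample and build a hypothesis from it.
        (x1, y1), (x2, y2) = rng.choice(matches)
        tx, ty = x2 - x1, y2 - y1
        # 2. Collect the matches that agree with the hypothesis.
        inliers = [
            (a, b) for a, b in matches
            if abs(b[0] - a[0] - tx) <= tol and abs(b[1] - a[1] - ty) <= tol
        ]
        # 3. Keep the hypothesis with the largest consensus set.
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit the model on all inliers of the best hypothesis.
    n = len(best_inliers)
    tx = sum(b[0] - a[0] for a, b in best_inliers) / n
    ty = sum(b[1] - a[1] for a, b in best_inliers) / n
    return (tx, ty), best_inliers
```

For a homography the structure is the same; only the minimal sample size and the model-fitting and error functions change.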
18. Which of the following properties makes SIFT highly reliable for matching images taken from different distances and angles?
scale invariant feature transform
Easy
A.It converts the image to purely black and white
B.It is invariant to uniform image scaling and rotation
C.It only detects vertical lines
D.It relies entirely on color histograms
Correct Answer: It is invariant to uniform image scaling and rotation
Explanation:
SIFT extracts features that are robust to changes in image scale, noise, illumination, and rotation, which is why it is called a Scale-Invariant Feature Transform.
19. Which distance metric is most commonly used to match continuous floating-point descriptors like SIFT and SURF?
similarity measures
Easy
A.Jaccard index
B.Hamming distance
C.Euclidean distance ($L_2$ norm)
D.Levenshtein distance
Correct Answer: Euclidean distance ($L_2$ norm)
Explanation:
Floating-point descriptors are typically represented as vectors of real numbers. The standard way to measure the similarity between two such vectors is the Euclidean distance.
20. HOG descriptors were traditionally and most famously popularized for which specific computer vision task?
HOG
Easy
A.Color correction
B.Pedestrian detection
C.Image compression
D.Lens distortion removal
Correct Answer: Pedestrian detection
Explanation:
The HOG descriptor was heavily popularized by Dalal and Triggs in 2005 for the specific task of robust pedestrian detection in images.
21. Which of the following characteristics is most crucial for a local feature detector to ensure that the same physical point in a scene is extracted across images taken from different viewpoints?
Overview of feature detection
Medium
A.Sparsity
B.Repeatability
C.Low contrast
D.High dimensionality
Correct Answer: Repeatability
Explanation:
Repeatability is the property that a feature detector will find the exact same scene point in two different images despite changes in viewpoint, scale, or lighting. This is essential for reliable feature matching.
22. In the Harris corner detector, the structure tensor (second-moment matrix) $M$ is computed. Let $\lambda_1$ and $\lambda_2$ be its eigenvalues. What does it indicate if $\lambda_1 \gg 0$ and $\lambda_2 \approx 0$?
Harris corner detector
Medium
A.An isolated point of noise
B.An edge
C.A corner
D.A flat region
Correct Answer: An edge
Explanation:
When one eigenvalue is large and the other is close to zero, it indicates strong gradient variation in only one direction, which corresponds to an edge.
23. The Harris response function is given by $R = \det(M) - k\,(\operatorname{trace}(M))^2$. Why is this specific formula used instead of directly computing the eigenvalues?
Harris corner detector
Medium
A.To avoid the computationally expensive eigenvalue decomposition
B.To normalize the image gradients against illumination changes
C.To achieve scale invariance
D.To enforce rotation invariance in the descriptor
Correct Answer: To avoid the computationally expensive eigenvalue decomposition
Explanation:
Computing the determinant and trace of a matrix only requires basic arithmetic operations, making it much faster than calculating the actual eigenvalues while still providing a reliable measure of cornerness.
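A short sketch of the determinant-and-trace computation on a 2x2 second-moment matrix, using the standard form of the Harris response, det(M) minus k times the squared trace. The entries `sxx`, `sxy`, `syy` stand for the smoothed products of image gradients; the default `k = 0.04` is an assumed, commonly quoted value, not one stated in this quiz.

```python
def harris_response(sxx, sxy, syy, k=0.04):
    """Harris response det(M) - k * trace(M)^2 for M = [[sxx, sxy], [sxy, syy]]."""
    det = sxx * syy - sxy * sxy   # equals the product of the eigenvalues
    trace = sxx + syy             # equals the sum of the eigenvalues
    return det - k * trace * trace


corner = harris_response(10.0, 0.0, 10.0)  # both eigenvalues large
edge = harris_response(10.0, 0.0, 0.0)     # one large, one near zero
flat = harris_response(0.01, 0.0, 0.01)    # both near zero
```

With these inputs the response is clearly positive for the corner, negative for the edge, and close to zero for the flat patch, all without an eigenvalue decomposition.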
24. How does the Scale Invariant Feature Transform (SIFT) algorithm initially identify potential keypoints across different scales?
scale invariant feature transform
Medium
A.By computing the Harris response at multiple image resolutions
B.By searching for local maxima in the Histogram of Oriented Gradients (HOG)
C.By finding local extrema in a Difference of Gaussians (DoG) pyramid
D.By finding local extrema in a Laplacian of Gaussian (LoG) pyramid
Correct Answer: By finding local extrema in a Difference of Gaussians (DoG) pyramid
Explanation:
SIFT uses the Difference of Gaussians (DoG) as an efficient approximation of the Laplacian of Gaussian (LoG) to find scale-space extrema, which serve as potential keypoints.
25. To achieve rotation invariance, SIFT assigns a dominant orientation to each keypoint. How is this dominant orientation determined?
scale invariant feature transform
Medium
A.By calculating the direction of the principal eigenvector of the Harris matrix
B.By aligning the keypoint patch with the absolute horizontal axis of the image
C.By analyzing the peak of a histogram of local image gradient orientations around the keypoint
D.By computing the phase of the Fourier transform of the keypoint patch
Correct Answer: By analyzing the peak of a histogram of local image gradient orientations around the keypoint
Explanation:
SIFT computes gradient magnitudes and orientations in a region around the keypoint and builds an orientation histogram. The highest peak in this histogram dictates the dominant orientation.
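The histogram-peak idea can be sketched as follows. This simplified version assumes the gradients around the keypoint are already available as `(gx, gy)` pairs and omits the Gaussian weighting and peak interpolation that a full SIFT implementation applies; the function name is hypothetical.

```python
import math


def dominant_orientation(gradients, n_bins=36):
    """Return the center (in degrees) of the strongest orientation bin.

    gradients: iterable of (gx, gy) pairs sampled around the keypoint.
    """
    bin_width = 360.0 / n_bins
    hist = [0.0] * n_bins
    for gx, gy in gradients:
        magnitude = math.hypot(gx, gy)
        angle = math.degrees(math.atan2(gy, gx)) % 360.0
        # Each gradient votes for its orientation bin, weighted by magnitude.
        hist[int(angle / bin_width) % n_bins] += magnitude
    peak = max(range(n_bins), key=hist.__getitem__)
    return (peak + 0.5) * bin_width
```

Rotating the descriptor sampling grid by this angle is what makes the resulting descriptor insensitive to in-plane rotation.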
26. SURF accelerates the feature detection process compared to SIFT. What mathematical tool does SURF heavily rely on to speed up its filtering operations?
speeded up robust features
Medium
A.Singular Value Decomposition (SVD)
B.Integral images
C.Discrete Cosine Transform (DCT)
D.Fast Fourier Transform (FFT)
Correct Answer: Integral images
Explanation:
SURF uses integral images to allow the extremely fast computation of box filters, which are used to approximate the second-order Gaussian derivatives (approximating the Hessian matrix).
27. The FAST detector examines a circle of 16 pixels around a candidate keypoint. To quickly reject non-corners, which pixels are typically tested first?
FAST
Medium
A.All odd-numbered pixels
B.All even-numbered pixels
C.Pixels 1, 2, 3, and 4
D.Pixels 1, 5, 9, and 13
Correct Answer: Pixels 1, 5, 9, and 13
Explanation:
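To speed up the algorithm, FAST first checks the four pixels at the compass directions (1, 5, 9, and 13). With the standard 12-pixel segment test, a corner requires at least three of these four to be all brighter or all darker than the center by the threshold; if fewer than three pass, the point cannot be a corner and is immediately rejected.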
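A compact sketch of the segment test with the compass-point shortcut. It assumes a grayscale image stored as a list of rows, a threshold `t = 20`, and the 12-contiguous-pixel variant; these values and the function name are illustrative assumptions, and the candidate must lie at least 3 pixels from the image border.

```python
# Bresenham circle of radius 3 as (dx, dy) offsets, starting at the top
# (pixel 1) and going clockwise to pixel 16.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img, x, y, t=20, n=12):
    """Segment test: n contiguous ring pixels all brighter or all darker."""
    center = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    # +1 = brighter than center + t, -1 = darker than center - t, 0 = similar.
    labels = [1 if v > center + t else -1 if v < center - t else 0 for v in ring]
    # High-speed rejection (valid for n >= 12): pixels 1, 5, 9, 13 are
    # list indices 0, 4, 8, 12; at least three must share the same label.
    if n >= 12:
        compass = [labels[i] for i in (0, 4, 8, 12)]
        if compass.count(1) < 3 and compass.count(-1) < 3:
            return False
    # Full test: look for a run of n equal labels, wrapping around the ring.
    for sign in (1, -1):
        run = 0
        for label in labels + labels[:n - 1]:
            run = run + 1 if label == sign else 0
            if run >= n:
                return True
    return False
```

Most pixels in a typical image fail the four-pixel check, so the full 16-pixel scan runs only for a small fraction of candidates.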
28. How is a BRIEF descriptor formed for a given keypoint patch?
BRIEF
Medium
A.By creating a histogram of local gradient orientations
B.By performing simple binary intensity tests between random pairs of pixels
C.By calculating the Haar wavelet responses in horizontal and vertical directions
D.By computing the eigenvectors of the local covariance matrix
Correct Answer: By performing simple binary intensity tests between random pairs of pixels
Explanation:
BRIEF is a binary descriptor that creates a bit string. Each bit is the result of an intensity comparison between a specific, pre-defined random pair of pixels within the smoothed patch around the keypoint.
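The pairwise-comparison construction can be sketched as below. The example assumes the patch has already been smoothed, uses a uniformly random sampling pattern fixed by a seed, and builds a 32-bit string; the pattern size, radius, and function names are illustrative assumptions rather than the parameters of any particular implementation.

```python
import random


def make_brief_pattern(n_bits=32, patch_radius=4, seed=0):
    """Fixed list of pixel-offset pairs ((ax, ay), (bx, by)), drawn once."""
    rng = random.Random(seed)
    r = patch_radius
    return [((rng.randint(-r, r), rng.randint(-r, r)),
             (rng.randint(-r, r), rng.randint(-r, r))) for _ in range(n_bits)]


def brief_descriptor(img, x, y, pattern):
    """Build the bit string for the keypoint at (x, y) as a Python integer."""
    bits = 0
    for (ax, ay), (bx, by) in pattern:
        # One bit per test: 1 if the first sample is darker than the second.
        bit = 1 if img[y + ay][x + ax] < img[y + by][x + bx] else 0
        bits = (bits << 1) | bit
    return bits
```

The same pattern must be reused for every keypoint in every image, otherwise the bits of two descriptors would not be comparable.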
29. Standard FAST keypoints lack orientation, and standard BRIEF descriptors are not rotationally invariant. How does ORB modify FAST to compute a keypoint orientation?
ORB
Medium
A.It aligns the patch to the strongest edge detected nearby
B.It fits an ellipse to the keypoint patch and uses the major axis
C.It computes a gradient histogram similar to SIFT
D.It calculates the intensity centroid of the keypoint patch
Correct Answer: It calculates the intensity centroid of the keypoint patch
Explanation:
ORB achieves rotation invariance for its keypoints by calculating the intensity centroid (using image moments) of the patch. The vector from the center of the patch to the centroid defines the orientation.
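A small sketch of the intensity-centroid orientation using first-order image moments. For simplicity it assumes a square patch given as a list of rows (ORB itself restricts the moments to a circular region), and the function name is hypothetical.

```python
import math


def intensity_centroid_angle(patch):
    """Angle (radians) of the vector from the patch center to its intensity centroid."""
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    m10 = m01 = 0.0
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            m10 += (x - cx) * value  # first-order moment along x
            m01 += (y - cy) * value  # first-order moment along y
    # The centroid is (m10 / m00, m01 / m00); its direction depends only
    # on the ratio of the two first-order moments.
    return math.atan2(m01, m10)
```

A patch that is brighter on its right side yields an angle of 0, and one that is brighter toward increasing row index yields an angle of pi/2.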
30. In the HOG (Histogram of Oriented Gradients) descriptor, what is the primary purpose of applying block normalization across overlapping blocks?
HOG
Medium
A.To make the descriptor invariant to rotation
B.To reduce the dimensionality of the descriptor
C.To provide invariance to local changes in illumination and contrast
D.To ensure the descriptor is invariant to scale changes
Correct Answer: To provide invariance to local changes in illumination and contrast
Explanation:
Gradients are highly sensitive to overall lighting and contrast changes. Normalizing the histograms over overlapping spatial blocks ensures that the HOG descriptor is robust to these illumination variations.
31. When building an object recognition system for a heavily cluttered scene, why are local features (like SIFT) preferred over global features (like global color histograms)?
applications of descriptors
Medium
A.Local features are highly robust to partial occlusions
B.Local features are faster to extract than global features
C.Local features require significantly less memory to store
D.Local features inherently capture the absolute color of the object better
Correct Answer: Local features are highly robust to partial occlusions
Explanation:
Because local features describe small patches of the image independently, a significant portion of them can still be detected and matched even if parts of the object are occluded by clutter.
32. Which distance metric is computationally optimal for matching binary descriptors such as BRIEF, ORB, or BRISK?
similarity measures
Medium
A.Euclidean distance ($L_2$ norm)
B.Manhattan distance ($L_1$ norm)
C.Cosine similarity
D.Hamming distance
Correct Answer: Hamming distance
Explanation:
Binary descriptors consist of strings of 0s and 1s. The Hamming distance measures the number of differing bits, which can be computed extremely quickly using bitwise XOR and popcount operations.
33. What is the primary objective of Lowe's ratio test during the feature matching phase?
Overview of feature matching
Medium
A.To ensure features are matched across the same scale space
B.To normalize the descriptor lengths before computing Euclidean distances
C.To reject ambiguous matches by comparing the nearest neighbor distance to the second nearest neighbor distance
D.To filter out features that do not have a strong gradient magnitude
Correct Answer: To reject ambiguous matches by comparing the nearest neighbor distance to the second nearest neighbor distance
Explanation:
Lowe's ratio test rejects matches where the distance to the closest match is too similar to the distance to the second-closest match, which typically occurs in highly repetitive textures or background clutter.
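A minimal sketch of the ratio test for a single query descriptor, assuming Euclidean distance and a ratio threshold of 0.75. The threshold is an assumed, commonly used value (implementations typically choose something in the 0.7 to 0.8 range), and the function name is illustrative.

```python
import math


def ratio_test_match(query, candidates, ratio=0.75):
    """Return the index of the best match, or None if it is ambiguous."""
    distances = sorted((math.dist(query, c), j) for j, c in enumerate(candidates))
    if len(distances) < 2:
        return None  # the test needs a second-nearest neighbor
    (d1, j1), (d2, _) = distances[0], distances[1]
    # Accept only if the nearest neighbor is clearly closer than the runner-up.
    return j1 if d1 < ratio * d2 else None
```

A query with one clearly closest candidate is accepted, while a query whose two best candidates lie at nearly the same distance is rejected.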
34. Given an image $I_1$ with $N$ descriptors and an image $I_2$ with $M$ descriptors, what is the time complexity of brute force matching if we want to find the best match in $I_2$ for every descriptor in $I_1$?
brute force matching
Medium
A.
B.
C.
D.
Correct Answer: $O(N \cdot M)$
Explanation:
Brute force matching computes the distance between every descriptor in the first set and every descriptor in the second set, resulting in $N \times M$ distance computations.
35. While K-D trees are often used to accelerate nearest neighbor searches in feature matching, under what condition does a K-D tree's performance typically degrade to that of a linear (brute force) search?
K-D tree
Medium
A.When the descriptors contain negative values
B.When the dimensionality of the descriptors is very high (e.g., > 100)
C.When the number of descriptors is very small
D.When the descriptors are highly clustered
Correct Answer: When the dimensionality of the descriptors is very high (e.g., > 100)
Explanation:
K-D trees suffer from the 'curse of dimensionality.' In high-dimensional spaces (like a 128-dimensional SIFT descriptor), the search space volume grows exponentially, forcing the algorithm to visit almost all branches, neutralizing the speed advantage.
36. What is the core principle behind Locality-Sensitive Hashing (LSH) for approximate nearest neighbor search?
Locality-Sensitive Hashing (LSH)
Medium
A.It ensures that dissimilar items are mapped to the same hash bucket to save memory
B.It partitions the space deterministically using the median values of each dimension
C.It uses random projections so that similar descriptors fall into the same hash bucket with high probability
D.It uses cryptographic hash functions to secure descriptor data
Correct Answer: It uses random projections so that similar descriptors fall into the same hash bucket with high probability
Explanation:
LSH maps high-dimensional data into lower-dimensional representations (hash buckets) such that similar items have a much higher probability of colliding (sharing the same hash) than dissimilar items.
37. In RANSAC, let $w$ be the probability that a selected match is an inlier. If a model requires $n$ matches to be computed, what does the expression $1 - (1 - w^n)^k$ represent?
RANSAC for robust matching
Medium
A.The expected number of inliers found after $k$ iterations
B.The variance of the error in the final model fit
C.The probability that RANSAC finds a correct model after $k$ iterations
D.The probability of selecting at least one outlier in $k$ iterations
Correct Answer: The probability that RANSAC finds a correct model after $k$ iterations
Explanation:
$w^n$ is the probability of picking $n$ inliers. $1 - w^n$ is the probability of failing to pick all inliers in one iteration. $(1 - w^n)^k$ is the probability of failing $k$ times in a row. Therefore, $1 - (1 - w^n)^k$ is the probability of succeeding at least once in $k$ iterations.
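The chain of probabilities in this explanation can be checked numerically. In the sketch below, `w` is the inlier probability, `n` the number of matches per sample, and `k` the number of iterations; these symbol names and the function name are assumptions made for illustration.

```python
def ransac_success_probability(w, n, k):
    """Probability that at least one of k samples of size n is all inliers."""
    p_all_inliers = w ** n            # one sample contains only inliers
    p_one_failure = 1.0 - p_all_inliers
    p_all_fail = p_one_failure ** k   # every one of the k samples is contaminated
    return 1.0 - p_all_fail


# With 50% inliers and 4 matches per sample, 72 iterations give
# slightly more than 99% confidence, while 71 fall just short.
p72 = ransac_success_probability(0.5, 4, 72)
p71 = ransac_success_probability(0.5, 4, 71)
```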
38. When using RANSAC to compute a homography matrix between two images, what is the minimum number of point correspondences ($n$) required to instantiate a model hypothesis in a single iteration?
RANSAC for robust matching
Medium
A.8
B.2
C.4
D.3
Correct Answer: 4
Explanation:
A homography matrix has 8 degrees of freedom. Since each 2D point correspondence provides 2 equations, a minimum of 4 correspondences is required to compute the homography.
39. What happens to the required number of RANSAC iterations if the proportion of outliers in your feature matches increases dramatically, assuming you want to maintain a 99% confidence of finding a good model?
RANSAC for robust matching
Medium
A.The required iterations increase exponentially
B.The required iterations remain the same, but the model fitting takes longer
C.The required iterations decrease because outliers are easier to reject
D.The required iterations increase linearly
Correct Answer: The required iterations increase exponentially
Explanation:
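The number of iterations required is $k = \frac{\log(1 - p)}{\log(1 - w^n)}$, where $p$ is the desired confidence, $w$ is the inlier ratio, and $n$ is the sample size. As the inlier ratio $w$ drops (meaning outliers increase), $w^n$ becomes very small, causing the required iterations to grow exponentially.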
40. Binary descriptors are primarily favored in real-time computer vision applications (like mobile AR) because they are memory efficient and fast to match. Which of the following is NOT a binary descriptor?
binary feature detectors
Medium
A.SURF
B.BRISK
C.BRIEF
D.ORB
Correct Answer: SURF
Explanation:
SURF uses a floating-point descriptor based on Haar wavelet responses, whereas BRIEF, ORB, and BRISK generate binary bit strings.
41. The Harris corner detector computes a corner response function $R = \det(M) - k\,(\operatorname{trace}(M))^2$. If the eigenvalues $\lambda_1$ and $\lambda_2$ of the second-moment matrix $M$ satisfy $\lambda_1 \gg \lambda_2$ (where $\lambda_1$ is very large), how does the Harris corner response behave, and what does it indicate structurally?
Harris corner detector
Hard
A., indicating a flat region
B.$R < 0$, indicating an edge
C., indicating an isolated point
D., indicating a corner
Correct Answer: $R < 0$, indicating an edge
Explanation:
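If $\lambda_1 \gg \lambda_2$, the determinant $\det(M) = \lambda_1 \lambda_2$ is small compared to the squared trace $(\lambda_1 + \lambda_2)^2$. Because of the subtraction, the response $R$ becomes negative, which correctly identifies an edge structure rather than a corner.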
42. In the SIFT algorithm, extreme points are detected in the Difference of Gaussian (DoG) scale space. During sub-pixel localization, the Taylor expansion of the DoG function is used. If the calculated offset has a component larger than $0.5$ in any dimension, what action is taken?
scale invariant feature transform
Hard
A.The local contrast threshold is increased to filter out noise.
B.The offset is clamped to exactly 0.5 to prevent divergence.
C.The extremum is rejected as an unstable edge.
D.The extremum is moved to the adjacent sample point and the localization is recomputed.
Correct Answer: The extremum is moved to the adjacent sample point and the localization is recomputed.
Explanation:
An offset larger than $0.5$ in any dimension indicates that the true extremum lies closer to an adjacent sample point. To ensure accurate sub-pixel and sub-scale localization, SIFT updates the sample point to the neighboring one and repeats the interpolation.
43. SURF achieves computational efficiency by using Box Filters to approximate the Hessian matrix. How does SURF evaluate these Box Filters independent of the filter scale in $O(1)$ time?
speeded up robust features
Hard
A.By utilizing an Integral Image representation
B.By precomputing the Fourier transform of the image
C.By downsampling the original image progressively
D.By applying 1D Gaussian smoothing separably
Correct Answer: By utilizing an Integral Image representation
Explanation:
SURF uses integral images to compute the sum of intensities over any upright rectangular area in constant time. This allows Box Filters of any size (scale) to be applied extremely fast without downsampling the image.
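A short sketch of why the box sum costs the same for any box size. The integral image here carries one extra row and column of zeros, and the box is given with exclusive upper bounds; both are conventions chosen for this example, and the function names are hypothetical.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x0, y0, x1, y1):
    """Sum over columns x0..x1-1 and rows y0..y1-1 using four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
ii = integral_image(image)
total = box_sum(ii, 0, 0, 3, 3)         # whole image: 45
lower_right = box_sum(ii, 1, 1, 3, 3)   # 5 + 6 + 8 + 9 = 28
```

Because every box sum needs only four array reads, a larger filter costs no more than a small one, so the filter can be scaled up instead of shrinking the image.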
44. The FAST feature detector evaluates a circle of 16 pixels around a candidate pixel $p$. To accelerate the rejection of non-corners, FAST initially examines a specific subset of pixels. Which subset is typically checked first to quickly reject a non-corner candidate?
FAST
Hard
A.Pixels 2, 6, 10, and 14
B.Pixels 1, 5, 9, and 13
C.Pixels 1, 3, 5, and 7
D.All even-numbered pixels around the circle
Correct Answer: Pixels 1, 5, 9, and 13
Explanation:
FAST initially checks the pixels at positions 1, 5, 9, and 13 (top, right, bottom, left). With the standard 12-pixel segment test, at least three of these four must be all brighter or all darker than the central pixel by the threshold; if fewer than three are, the candidate cannot be a corner and is quickly rejected.
45. BRIEF descriptors are highly sensitive to in-plane rotation. How does the ORB algorithm modify the BRIEF descriptor extraction to achieve rotation invariance (steerable BRIEF)?
ORB
Hard
A.By calculating the intensity centroid of the patch and steering the sampling pattern accordingly
B.By computing the eigenvectors of the second moment matrix to align the patch
C.By extracting the dominant gradient orientation using a SIFT-like histogram
D.By replacing random point pairs with symmetric circular patterns
Correct Answer: By calculating the intensity centroid of the patch and steering the sampling pattern accordingly
Explanation:
ORB achieves rotation invariance by finding the intensity centroid of the keypoint patch. The vector from the geometric center to this centroid defines the patch's orientation, which is then used to rotate the BRIEF sampling pattern.
46. In HOG descriptor computation, block normalization is a critical step for contrast invariance. If L2-Hys normalization is applied to a block vector $v$, which of the following best describes the process?
HOG
Hard
A.Apply a low-pass filter to $v$, compute the L2-norm, and scale by a factor of 2.
B.Normalize $v$ using L1-norm, compute the square root of each element, and divide by the maximum value.
C.Normalize $v$ using L2-norm, clip values at a threshold (e.g., 0.2), and renormalize using L2-norm.
D.Subtract the mean of $v$, divide by the standard deviation, and apply a sigmoid function.
Correct Answer: Normalize $v$ using L2-norm, clip values at a threshold (e.g., 0.2), and renormalize using L2-norm.
Explanation:
L2-Hys (L2-Hysteresis) normalization involves first normalizing the vector using the L2-norm, then clipping the maximum values (typically at 0.2) to mitigate the influence of extreme gradients, and finally renormalizing the clipped vector.
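The three steps can be sketched directly. The example assumes a non-negative block vector (as HOG histograms are), the clipping threshold of 0.2 mentioned above, and a small `eps` added for numerical stability; `eps` and the function name are assumptions for illustration.

```python
import math


def l2_hys(v, clip=0.2, eps=1e-12):
    """L2-normalize, clip large entries, then L2-normalize again."""
    norm = math.sqrt(sum(x * x for x in v) + eps)
    clipped = [min(x / norm, clip) for x in v]   # limit dominant gradients
    norm = math.sqrt(sum(x * x for x in clipped) + eps)
    return [x / norm for x in clipped]


block = [10.0, 1.0, 1.0, 1.0]
normalized = l2_hys(block)
```

In this example the first entry is ten times larger than the others before normalization but only about twice as large afterwards, which is the intended damping of extreme gradients.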
47. When using RANSAC to estimate a model from a set of putative point matches containing an unknown outlier ratio $\epsilon$, the number of iterations $N$ required to ensure a probability $p$ of picking at least one outlier-free sample of size $s$ is given by:
RANSAC for robust matching
Hard
A.
B.
C.
D.
Correct Answer: $N = \frac{\log(1 - p)}{\log(1 - (1 - \epsilon)^s)}$
Explanation:
The probability of picking a completely outlier-free sample is $(1 - \epsilon)^s$. The probability of failing to do this for $N$ consecutive iterations is $(1 - (1 - \epsilon)^s)^N$. Setting this failure probability to $1 - p$ and solving for $N$ yields the correct formula.
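Solving the failure-probability equation for the iteration count gives a one-line formula, sketched below. The parameter names (`outlier_ratio`, `sample_size`, `confidence`) and the rounding up to a whole iteration are choices made for this example.

```python
import math


def ransac_iterations(outlier_ratio, sample_size, confidence=0.99):
    """Smallest iteration count whose chance of at least one clean sample
    reaches the requested confidence."""
    p_clean = (1.0 - outlier_ratio) ** sample_size
    if p_clean >= 1.0:
        return 1  # no outliers: the first sample is already clean
    if p_clean <= 0.0:
        raise ValueError("no clean sample is possible with only outliers")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_clean))


easy = ransac_iterations(0.2, 4)   # 20% outliers, homography-sized sample
hard = ransac_iterations(0.5, 4)   # 50% outliers
worse = ransac_iterations(0.7, 4)  # 70% outliers
```

For a 4-point sample at 99% confidence, 20% outliers need 9 iterations and 50% outliers need 72, which shows how quickly the count grows as the outlier ratio rises.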
48. In the context of approximate nearest neighbor search using LSH for binary descriptors like ORB, which hash function family is commonly utilized to preserve the Hamming distance?
Locality-Sensitive Hashing (LSH)
Hard
A.MinHash using Jaccard similarity over the non-zero gradients
B.Random projection hashing where each bit samples a specific index of the binary descriptor
C.E2LSH (Exact Euclidean LSH) using $p$-stable distributions
D.SimHash operating on the TF-IDF weights of visual words
Correct Answer: Random projection hashing where each bit samples a specific index of the binary descriptor
Explanation:
For binary descriptors compared via Hamming distance, LSH is typically implemented using random bit sampling. The hash functions simply read specific bit indices of the descriptor, and the probability of a hash collision is linearly related to the Hamming distance.
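Bit sampling is simple enough to sketch in full. The example assumes 32-bit descriptors packed into integers and an 8-bit hash key; the sizes, the seed, and the function names are illustrative assumptions.

```python
import random


def make_bit_sampler(n_bits, key_bits, seed=0):
    """Choose which descriptor bit positions form the hash key."""
    rng = random.Random(seed)
    return rng.sample(range(n_bits), key_bits)


def lsh_key(descriptor, positions):
    """Concatenate the sampled bits of the descriptor into a bucket key."""
    key = 0
    for pos in positions:
        key = (key << 1) | ((descriptor >> pos) & 1)
    return key


positions = make_bit_sampler(32, 8)
a = 0xDEADBEEF
key_a = lsh_key(a, positions)
```

Two descriptors share a bucket exactly when they agree on all sampled bits, so descriptors at a small Hamming distance collide far more often than distant ones. Practical systems use several independent samplers (hash tables) to raise recall.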
49. Why does the standard K-D tree approach degrade to linear search when matching high-dimensional descriptors like the 128-dimensional SIFT?
K-D tree
Hard
A.The Euclidean distance metric loses its triangle inequality property in dimensions greater than 20.
B.The median-finding algorithm during tree construction fails to partition data evenly in high dimensions.
C.The tree depth exceeds the maximum stack limit during recursive traversal, forcing a linear scan.
D.The volume of the search sphere intersects an exponentially increasing number of hyperplane boundaries, requiring backtracking through almost all nodes.
Correct Answer: The volume of the search sphere intersects an exponentially increasing number of hyperplane boundaries, requiring backtracking through almost all nodes.
Explanation:
This is a manifestation of the 'curse of dimensionality'. In high dimensions, the search hyper-sphere intersects many bounding boxes of adjacent leaf nodes. The algorithm must backtrack and evaluate a large fraction of the tree, eliminating its speed advantage over a linear scan.
50. When matching two sets of feature descriptors using the Sum of Squared Differences (SSD) versus Normalized Cross-Correlation (NCC), which of the following conditions strictly favors NCC over SSD for robust matching?
similarity measures
Hard
A.The feature descriptors are strictly binary strings extracted via BRIEF.
B.The images have undergone an affine change in illumination intensity ($I' = aI + b$).
C.The images contain high levels of zero-mean Gaussian noise.
D.The images have extreme scale variations and require scale-space extrema detection.
Correct Answer: The images have undergone an affine change in illumination intensity ($I' = aI + b$).
Explanation:
NCC is invariant to affine illumination changes because it subtracts the mean (handling the shift $b$) and divides by the standard deviation (handling the scaling $a$). SSD has no such normalization and fails under these conditions.
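The invariance is easy to verify numerically. The sketch below uses the mean-subtracted form of NCC on flat lists of intensities and assumes a positive gain; the helper names and the sample values are illustrative.

```python
import math


def ssd(a, b):
    """Sum of squared differences between two equally sized patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ncc(a, b):
    """Normalized cross-correlation with mean subtraction.

    Undefined (division by zero) for a constant patch.
    """
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom


patch = [10.0, 20.0, 30.0, 50.0]
relit = [2.0 * x + 15.0 for x in patch]  # same patch with gain 2 and offset 15

score_ncc = ncc(patch, relit)  # stays at 1 despite the relighting
score_ssd = ssd(patch, relit)  # large, although the content is identical
```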
51. A common strategy to eliminate ambiguous feature matches is Lowe's ratio test. If the nearest neighbor distance is $d_1$ and the second nearest neighbor distance is $d_2$, the match is kept if $d_1 / d_2 < \tau$ for a chosen threshold $\tau$. What underlying assumption justifies this test in complex scenes?
Overview of feature matching
Hard
A.The descriptor space forms a convex hull where the first and second neighbors always lie on opposite sides of the hyperplane.
B.The scale and rotation invariance of SIFT guarantees that $d_2$ is always an affine transformation of $d_1$.
C.Correct matches are significantly closer to the query than any incorrect match, whereas incorrect matches have many neighbors at similar distances due to background clutter.
D.Outliers in the dataset always cluster together at a distance exactly twice that of the true inlier.
Correct Answer: Correct matches are significantly closer to the query than any incorrect match, whereas incorrect matches have many neighbors at similar distances due to background clutter.
Explanation:
Lowe's ratio test relies on the observation that false matches (often from repetitive textures) tend to have multiple similar candidates nearby in the descriptor space. A true, distinct match will be uniquely close, making the ratio small.
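The test can be sketched in a few lines. The descriptors below are made up, the function names are hypothetical, and the threshold of 0.8 is a commonly quoted default rather than a required value.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def ratio_test_match(query, candidates, threshold=0.8):
    """Return the index of the nearest candidate if it passes the ratio test, else None.

    Needs at least two candidates so that a second nearest neighbor exists.
    """
    dists = sorted((euclidean(query, c), i) for i, c in enumerate(candidates))
    (d1, best), (d2, _) = dists[0], dists[1]
    if d2 == 0:                      # two exact duplicates: ambiguous by definition
        return None
    return best if d1 / d2 < threshold else None

query = [1.0, 0.0, 0.0]
distinct = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
repetitive = [[0.9, 0.1, 0.0], [0.9, 0.0, 0.1], [0.0, 1.0, 0.0]]

distinct_result = ratio_test_match(query, distinct)       # one clear winner
repetitive_result = ratio_test_match(query, repetitive)   # two equally good candidates
```

In the second set the two closest candidates are equally far from the query, so the ratio is about 1 and the match is rejected as ambiguous.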
52ORB uses rBRIEF (rotation-aware BRIEF) for its descriptor. To reduce the correlation among the binary tests and maximize variance, ORB applies a specific learning step during its design. What method is used to select the optimal pairs of pixels for the binary tests?
ORB
Hard
A.Support Vector Machines (SVM) to classify stable versus unstable bit pairs across the scale pyramid.
B.K-means clustering on the intensity differences of random Gaussian patches.
C.A greedy search over all possible pixel pairs maximizing variance and minimizing absolute correlation.
D.Principal Component Analysis (PCA) to project the binary strings into a lower-dimensional orthogonal space.
Correct Answer: A greedy search over all possible pixel pairs maximizing variance and minimizing absolute correlation.
Explanation:
ORB selects its binary tests by running a greedy algorithm over a massive set of training keypoints. It specifically selects tests (pixel pairs) that have high variance (mean near 0.5) and are mostly uncorrelated with previously chosen tests.
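A simplified sketch of such a greedy selection follows. It assumes each candidate test has already been evaluated on a set of training patches, giving one list of 0/1 outputs per test; the toy data, the function names, and the correlation threshold are hypothetical.

```python
import math

def correlation(u, v):
    """Pearson correlation of two equal-length 0/1 sequences (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 0.0
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

def greedy_select(test_outputs, n_select, max_corr):
    """Pick tests whose mean is closest to 0.5 and that are weakly correlated with earlier picks."""
    order = sorted(
        range(len(test_outputs)),
        key=lambda t: abs(sum(test_outputs[t]) / len(test_outputs[t]) - 0.5),
    )
    chosen = []
    for t in order:
        if all(abs(correlation(test_outputs[t], test_outputs[c])) <= max_corr for c in chosen):
            chosen.append(t)
        if len(chosen) == n_select:
            break
    return chosen

outputs = [
    [1, 0, 1, 0, 1, 0, 1, 0],   # test 0: mean 0.5
    [1, 0, 1, 0, 1, 0, 1, 0],   # test 1: duplicate of test 0, fully correlated
    [1, 1, 0, 0, 1, 1, 0, 0],   # test 2: mean 0.5, uncorrelated with test 0
    [1, 1, 1, 1, 1, 1, 1, 0],   # test 3: mean 0.875, low variance
]
selected = greedy_select(outputs, n_select=2, max_corr=0.2)
```

Test 1 is skipped because it duplicates test 0, and test 3 is never reached because its mean is far from 0.5. If too few tests survive, one option is to raise the threshold and run the search again.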
53Which mathematical concept provides the theoretical foundation for treating feature detection as finding locations where the signal changes significantly in multiple directions?
Overview of feature detection
Hard
A.The Radon transform integrated over a full rotation
B.The Laplacian of Gaussian (LoG) zero-crossings
C.The Fourier transform phase spectrum evaluated at high frequencies
D.The auto-correlation function and the eigenvalues of the local structure tensor
Correct Answer: The auto-correlation function and the eigenvalues of the local structure tensor
Explanation:
Feature (especially corner) detection is grounded in the auto-correlation function, which measures patch variation when shifted. The local structure tensor (second-moment matrix) approximates this, and its eigenvalues summarize the signal's directional variations.
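As a rough illustration, the sketch below sums the second-moment matrix over a tiny patch using central differences and no Gaussian weighting (both are simplifying assumptions) and returns its two eigenvalues. The three 7x7 patches are synthetic.

```python
import math

def structure_tensor_eigenvalues(img):
    """Eigenvalues (largest first) of the second-moment matrix summed over the patch interior."""
    h, w = len(img), len(img[0])
    sxx = syy = sxy = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (img[y][x + 1] - img[y][x - 1]) / 2.0   # horizontal derivative
            iy = (img[y + 1][x] - img[y - 1][x]) / 2.0   # vertical derivative
            sxx += ix * ix
            syy += iy * iy
            sxy += ix * iy
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    return half_trace + root, half_trace - root

flat = [[5.0] * 7 for _ in range(7)]
edge = [[0.0 if x < 3 else 1.0 for x in range(7)] for _ in range(7)]
corner = [[1.0 if (x >= 3 and y >= 3) else 0.0 for x in range(7)] for y in range(7)]

flat_eigs = structure_tensor_eigenvalues(flat)      # both zero: no variation
edge_eigs = structure_tensor_eigenvalues(edge)      # one large, one zero: variation in one direction
corner_eigs = structure_tensor_eigenvalues(corner)  # both large: variation in two directions
```

Only the corner patch has two large eigenvalues, which is the condition corner detectors built on this matrix look for.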
54In bag-of-visual-words (BoVW) image classification, visual vocabularies are created by clustering feature descriptors. If an image has multiple repetitive structures yielding many identical descriptors, how does the Term Frequency-Inverse Document Frequency (TF-IDF) weighting scheme handle these specific visual words?
applications of descriptors
Hard
A.It assigns them a constant weight equal to the median frequency of the vocabulary.
B.It heavily penalizes them if they also appear frequently across the entire corpus of training images.
C.It entirely filters them out before the support vector machine classification stage.
D.It boosts their weight proportionally to the square of their occurrences to emphasize repetitive textures.
Correct Answer: It heavily penalizes them if they also appear frequently across the entire corpus of training images.
Explanation:
TF-IDF weights a visual word based on its occurrence in the current image (Term Frequency), but inversely scales it by how often it appears across all images (Inverse Document Frequency). Common textures across the dataset are thus penalized as uninformative.
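A small sketch of this weighting, assuming the common form in which term frequency is the word count divided by the image's total count and the inverse document frequency is the logarithm of the number of images over the number of images containing the word. The three-image corpus is invented for illustration.

```python
import math

def tf_idf(histograms):
    """Weight per-image visual-word counts by term frequency times inverse document frequency."""
    n_images = len(histograms)
    n_words = len(histograms[0])
    # Document frequency: in how many images does each word occur at least once?
    df = [sum(1 for h in histograms if h[w] > 0) for w in range(n_words)]
    idf = [math.log(n_images / d) if d > 0 else 0.0 for d in df]
    weighted = []
    for h in histograms:
        total = sum(h)
        weighted.append([(c / total) * idf[w] if total else 0.0 for w, c in enumerate(h)])
    return weighted

# Word 0 is a repetitive texture present in every image; words 1 and 2 are rare.
corpus = [[30, 5, 0], [25, 0, 4], [40, 0, 0]]
weights = tf_idf(corpus)
```

Word 0 has by far the highest counts, but because it occurs in all three images its weight is zero, while the rare word 1 keeps a positive weight in the first image.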
55SIFT handles rotation invariance by computing a dominant gradient orientation for each keypoint. If a local region has multiple peaks in the orientation histogram that are within 80% of the highest peak, how does SIFT process this?
scale invariant feature transform
Hard
A.It rejects the keypoint as being too ambiguous and prone to mismatches.
B.It increases the smoothing of the histogram until only a single dominant peak remains.
C.It computes the weighted average of all peaks to form a single orientation vector.
D.It creates multiple keypoints at the same location and scale, but with different orientations.
Correct Answer: It creates multiple keypoints at the same location and scale, but with different orientations.
Explanation:
To maintain robustness and prevent instability in matching, SIFT generates an entirely separate keypoint for each orientation peak that is at least 80% of the maximum peak, all sharing the same spatial location and scale.
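The peak-selection rule can be sketched as follows. This is a simplification: it works on an already smoothed 36-bin histogram and reports bin start angles, whereas a full implementation would also interpolate the peak position. The histogram values and function names are made up.

```python
def dominant_orientations(hist, peak_ratio=0.8):
    """Indices of local maxima of a circular histogram that reach peak_ratio of the global maximum."""
    n = len(hist)
    top = max(hist)
    peaks = []
    for i, v in enumerate(hist):
        left, right = hist[(i - 1) % n], hist[(i + 1) % n]
        if v > left and v > right and v >= peak_ratio * top:
            peaks.append(i)
    return peaks

def keypoints_for(x, y, scale, hist):
    """One (x, y, scale, angle_in_degrees) keypoint per qualifying orientation peak."""
    bin_width = 360.0 / len(hist)
    return [(x, y, scale, i * bin_width) for i in dominant_orientations(hist)]

hist = [1.0] * 36
hist[4] = 10.0    # highest peak
hist[20] = 9.0    # 90% of the highest peak: kept
hist[30] = 5.0    # 50% of the highest peak: ignored

keypoints = keypoints_for(12.0, 7.0, 1.6, hist)
```

Both resulting keypoints share the same position and scale and differ only in their assigned orientation.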
56In PROSAC (Progressive Sample Consensus), an extension of RANSAC, how is the sampling strategy modified to achieve faster convergence in feature matching?
RANSAC for robust matching
Hard
A.It incrementally adds dimensions to the descriptor space during the consensus phase.
B.It selects samples based on their spatial dispersion to maximize geometric constraints.
C.It progressively decreases the threshold for what constitutes an inlier during iterations.
D.It samples from a progressively expanding subset of matches ordered by a quality metric, such as Lowe's ratio score.
Correct Answer: It samples from a progressively expanding subset of matches ordered by a quality metric, such as Lowe's ratio score.
Explanation:
PROSAC sorts the putative matches by a quality metric. It begins by sampling from the most highly rated matches, progressively expanding the sampling pool if a consensus isn't reached, vastly speeding up convergence when top matches are true inliers.
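The sampling idea can be sketched as below. This is a deliberately simplified version: the pool grows by one match per iteration, which is an assumption made for brevity and not the growth schedule of the published algorithm. Matches are hypothetical (name, ratio score) pairs, where a lower ratio means a better match.

```python
import random

def progressive_samples(matches, quality, sample_size, iterations, seed=0):
    """Yield minimal samples drawn from a pool of top-ranked matches that grows each iteration."""
    rng = random.Random(seed)
    ranked = sorted(matches, key=quality, reverse=True)   # best matches first
    pool_size = sample_size
    for _ in range(iterations):
        yield rng.sample(ranked[:pool_size], sample_size)
        pool_size = min(pool_size + 1, len(ranked))

matches = [("m0", 0.95), ("m1", 0.30), ("m2", 0.55),
           ("m3", 0.20), ("m4", 0.80), ("m5", 0.40)]

# Quality is the negated ratio score, so the most distinctive matches rank first.
samples = list(progressive_samples(matches, quality=lambda m: -m[1],
                                   sample_size=4, iterations=3))
```

The first sample consists of exactly the four best-ranked matches; weaker matches such as m0 only become eligible in later iterations. Each sample would then be passed to the usual model-fitting and inlier-counting steps of RANSAC.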
57FAST uses a machine learning approach to generalize and speed up corner detection. How is the decision tree formulated in the machine learning version of FAST?
FAST
Hard
A.A deep neural network applies 1D convolutions around the circular perimeter to detect edge patterns.
B.The ID3 algorithm is used to select the pixel in the 16-pixel ring that yields the most information gain for classifying corner vs. non-corner.
C.A Random Forest is trained on the continuous intensity gradients of the central pixel.
D.A Support Vector Machine maps the 16-pixel ring into a high-dimensional space to find the optimal separating hyperplane.
Correct Answer: The ID3 algorithm is used to select the pixel in the 16-pixel ring that yields the most information gain for classifying corner vs. non-corner.
Explanation:
The machine learning implementation of FAST uses the ID3 algorithm to construct a decision tree. It evaluates which pixel test in the 16-pixel ring provides the highest information gain to quickly classify the region as a corner or non-corner.
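The selection criterion can be sketched on toy data. The example assumes each ring pixel has already been labeled as darker ('d'), similar ('s'), or brighter ('b') than the center, and it uses only four ring positions instead of sixteen to stay short; the training samples and function names are invented.

```python
import math

def entropy(labels):
    """Binary entropy in bits of a list of corner (True) / non-corner (False) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    corners = sum(labels)
    h = 0.0
    for k in (corners, n - corners):
        if k:
            p = k / n
            h -= p * math.log2(p)
    return h

def information_gain(samples, labels, position):
    """Entropy reduction from splitting on the state of one ring position."""
    gain = entropy(labels)
    for state in "dsb":
        subset = [lab for s, lab in zip(samples, labels) if s[position] == state]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain

def best_position(samples, labels):
    """Ring position whose test yields the highest information gain."""
    return max(range(len(samples[0])), key=lambda p: information_gain(samples, labels, p))

samples = ["bbds", "bbbs", "sbds", "sdbs"]   # one character per ring position
labels = [True, True, False, False]          # True = corner

root_test = best_position(samples, labels)
root_gain = information_gain(samples, labels, root_test)
```

Position 0 separates corners from non-corners perfectly here, so it becomes the root test. ID3 would then repeat the same selection recursively on each of the three resulting subsets.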
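58In the formulation of the Difference of Gaussian (DoG) function $D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y)$, the DoG provides an approximation to the scale-normalized Laplacian of Gaussian (LoG). What is the theoretical relationship between the DoG and the scale-normalized LoG $\sigma^2 \nabla^2 G$?
scale invariant feature transform
Hard
A.
B.
C.
D.
Correct Answer: $G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\sigma^2 \nabla^2 G$
Explanation:
Using the heat diffusion equation $\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G$ and a finite difference approximation for the derivative, $\frac{\partial G}{\partial \sigma} \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}$, we rearrange to find $G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\sigma^2 \nabla^2 G$.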
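The rearrangement can be written out step by step. The notation is an assumption: $G(x, y, \sigma)$ is the Gaussian kernel and $k$ is the constant factor between adjacent scales.

```latex
% Heat diffusion equation, parameterized by sigma rather than t = sigma^2
\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G

% Finite difference approximation of the derivative using nearby scales k*sigma and sigma
\sigma \nabla^2 G = \frac{\partial G}{\partial \sigma}
  \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}

% Multiply both sides by (k - 1) * sigma
G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\, \sigma^2 \nabla^2 G
```

Because $k$ is held constant across scales, the factor $(k - 1)$ is the same at every scale and does not change where the extrema are located.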
59Binary descriptors like BRISK rely on a specific sampling pattern around the keypoint. Unlike BRIEF, which uses random point pairs, what defines the spatial sampling pattern of BRISK?
binary feature detectors
Hard
A.A cross-shaped sampling mask that aligns with the dominant gradient orientation of the image patch.
B.A dense rectangular grid of overlapping blocks similar to the HOG cell structure.
C.A set of concentric rings with evenly spaced sampling points, where points are smoothed with Gaussian kernels proportional to their distance from the center.
D.A spiral pattern defined by the Fibonacci sequence to ensure equal area sampling.
Correct Answer: A set of concentric rings with evenly spaced sampling points, where points are smoothed with Gaussian kernels proportional to their distance from the center.
Explanation:
BRISK uses a deterministic sampling pattern consisting of concentric rings. To mitigate aliasing, the points are smoothed using Gaussian filters whose standard deviation increases as the points get further from the central keypoint.
60When applying Brute Force matching between image A ($m$ features) and image B ($n$ features) using cross-check (mutual consistency) validation, a match between feature $a_i$ and feature $b_j$ is retained only if:
brute force matching
Hard
A.$b_j$ is the absolute nearest neighbor to $a_i$ in B, and $a_i$ is the absolute nearest neighbor to $b_j$ in A.
B.The distance ratio is below a threshold for both directions.
C.The distance between $a_i$ and $b_j$ is less than the median distance of all possible pairs.
D.The descriptor vector of $a_i$ and the descriptor vector of $b_j$ are linearly dependent.
Correct Answer: $b_j$ is the absolute nearest neighbor to $a_i$ in B, and $a_i$ is the absolute nearest neighbor to $b_j$ in A.
Explanation:
Cross-check validation requires mutual consistency. A match is only considered valid if the best match for feature $a_i$ in image A is feature $b_j$ in image B, AND the best match for feature $b_j$ in image B is feature $a_i$ in image A.
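A minimal brute-force sketch of this rule, using squared Euclidean distance on made-up two-dimensional descriptors (the function names are hypothetical):

```python
def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def nearest(query, candidates):
    """Index of the candidate descriptor closest to the query."""
    return min(range(len(candidates)), key=lambda j: sq_dist(query, candidates[j]))

def cross_check_matches(desc_a, desc_b):
    """Keep (i, j) only if j is the best match for i and i is the best match for j."""
    a_to_b = [nearest(a, desc_b) for a in desc_a]
    b_to_a = [nearest(b, desc_a) for b in desc_b]
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]

desc_a = [[0.0, 0.0], [10.0, 10.0], [4.0, 4.0]]
desc_b = [[0.0, 1.0], [10.0, 9.0]]

matches = cross_check_matches(desc_a, desc_b)
```

The third feature of A prefers the first feature of B, but that feature prefers the first feature of A, so the one-sided pairing is discarded and at most one match per feature survives.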