1. Why is color information highly valuable in computer vision applications?
Importance of color in computer vision
Easy
A.It increases the processing speed of algorithms
B.It eliminates the need for edge detection
C.It reduces the memory required to store images
D.It provides a powerful descriptor that simplifies object identification and extraction
Correct Answer: It provides a powerful descriptor that simplifies object identification and extraction
Explanation:
Color provides additional feature dimensions compared to grayscale, making it much easier to distinguish, identify, and segment objects based on their visual appearance.
2. Which of the following tasks benefits the most from adding color information to a standard grayscale image?
Importance of color in computer vision
Easy
A.Calculating the spatial frequency of an image
B.Differentiating a red apple from a green apple of the same brightness
C.Finding the geometric center of a shape
D.Detecting straight lines using the Hough transform
Correct Answer: Differentiating a red apple from a green apple of the same brightness
Explanation:
Grayscale images represent intensity, so two objects with different colors but the same brightness will look identical. Color information easily resolves this ambiguity.
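The ambiguity can be illustrated with a small sketch. It assumes the common Rec. 601 luma weights for the grayscale conversion, and the two "apple" colors are hypothetical values picked so that their brightness matches.

```python
# Rec. 601 luma weights (an assumption; other weightings exist).
WEIGHTS = (0.299, 0.587, 0.114)

def luma(rgb):
    """Weighted-average grayscale value of an (R, G, B) triple in [0, 1]."""
    return sum(w * c for w, c in zip(WEIGHTS, rgb))

# Hypothetical apple colors chosen so that both have the same luma.
red_apple = (1.0, 0.0, 0.0)
green_apple = (0.0, WEIGHTS[0] / WEIGHTS[1], 0.0)

# In grayscale the two pixels are indistinguishable; in RGB they are far apart.
gray_gap = abs(luma(red_apple) - luma(green_apple))
color_gap = sum((a - b) ** 2 for a, b in zip(red_apple, green_apple)) ** 0.5
print(f"grayscale difference: {gray_gap:.6f}")
print(f"RGB distance:         {color_gap:.3f}")
```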
3. Which color model is primarily based on how the human eye perceives colors and is standard for displaying images on digital screens?
color models
Easy
A.CMYK
B.HSV
C.YIQ
D.RGB
Correct Answer: RGB
Explanation:
The RGB (Red, Green, Blue) color model is an additive model based on human vision and is universally used for electronic displays.
4. What do the letters in the HSV color model stand for?
color models
Easy
A.Hue, Saturation, Value
B.High, Standard, Variable
C.Hue, Shade, Volume
D.Hue, Saturation, Variance
Correct Answer: Hue, Saturation, Value
Explanation:
HSV stands for Hue (the color type), Saturation (the purity or vividness of the color), and Value (the brightness of the color).
5. Why is it common to convert images from RGB to the HSV color space in computer vision?
conversion between color spaces
Easy
A.To convert a color image into a 3D model
B.To increase the resolution of the image
C.To compress the image size
D.To separate color information from intensity or lighting information
Correct Answer: To separate color information from intensity or lighting information
Explanation:
In HSV, the 'Value' channel holds the lighting information, while 'Hue' and 'Saturation' hold the color. This separation makes algorithms more robust to lighting changes.
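A minimal sketch of this separation, using Python's standard `colorsys` module. The pixel value and the dimming factor are hypothetical, and uniform scaling of all three channels is only a crude stand-in for a real lighting change.

```python
import colorsys

def dim(rgb, factor):
    """Scale all three channels by the same factor to mimic dimmer lighting."""
    return tuple(c * factor for c in rgb)

bright = (0.8, 0.4, 0.2)   # hypothetical orange pixel, RGB in [0, 1]
dark = dim(bright, 0.5)

h1, s1, v1 = colorsys.rgb_to_hsv(*bright)
h2, s2, v2 = colorsys.rgb_to_hsv(*dark)

# Hue and Saturation stay (numerically) the same; only Value changes.
print(f"hue: {h1:.4f} vs {h2:.4f}")
print(f"saturation: {s1:.4f} vs {s2:.4f}")
print(f"value: {v1:.4f} vs {v2:.4f}")
```

A threshold on Hue would therefore select the same pixel in both versions, which is the robustness the explanation refers to.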
6. When converting an RGB image to a grayscale image, what mathematical operation is typically performed on the pixels?
conversion between color spaces
Easy
A.Taking a weighted average of the $R$, $G$, and $B$ channels
B.Multiplying all three channels together
C.Finding the maximum value among $R$, $G$, and $B$
D.Subtracting one channel from another
Correct Answer: Taking a weighted average of the $R$, $G$, and $B$ channels
Explanation:
Grayscale conversion is usually done by taking a weighted average of the Red, Green, and Blue channels to account for human perception of brightness (e.g., Green contributes the most).
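As a rough illustration, the weighted average can be written as a dot product over the channel axis. The sketch assumes NumPy, an image stored as an H x W x 3 float array in [0, 1], and the Rec. 601 weights; other standards use slightly different weights.

```python
import numpy as np

def rgb_to_gray(image):
    """Convert an H x W x 3 RGB array to H x W grayscale by weighted average."""
    weights = np.array([0.299, 0.587, 0.114])  # Rec. 601 luma weights, sum to 1
    return image @ weights

# One row holding a pure red, a pure green, and a pure blue pixel.
pixels = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]])
gray = rgb_to_gray(pixels)
print(gray)  # green maps to the brightest gray, blue to the darkest
```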
7. What is the primary purpose of applying color augmentation techniques to an image dataset?
color augmentation techniques
Easy
A.To reduce the color depth to 8-bit
B.To artificially increase dataset diversity and make models robust to lighting changes
C.To compress the dataset for storage
D.To convert 2D images into 3D point clouds
Correct Answer: To artificially increase dataset diversity and make models robust to lighting changes
Explanation:
Color augmentation (like changing brightness or saturation) creates variations of existing images, helping machine learning models generalize better to unseen lighting conditions.
8. Which of the following is a common color augmentation technique?
color augmentation techniques
Easy
A.Max pooling
B.Edge detection
C.Image rotation
D.Color jittering (randomly changing brightness, contrast, and saturation)
Correct Answer: Color jittering (randomly changing brightness, contrast, and saturation)
Explanation:
Color jittering involves randomly altering the color properties of an image, such as brightness, contrast, and saturation, to augment the training data.
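One possible way to implement such jittering is sketched below with NumPy. The function name, the jitter ranges, and the order of the three operations are illustrative assumptions and do not reproduce any particular library's implementation.

```python
import numpy as np

def color_jitter(image, rng, brightness=0.2, contrast=0.2, saturation=0.2):
    """Randomly perturb brightness, contrast and saturation of an RGB image in [0, 1]."""
    weights = np.array([0.299, 0.587, 0.114])
    b = rng.uniform(1 - brightness, 1 + brightness)
    c = rng.uniform(1 - contrast, 1 + contrast)
    s = rng.uniform(1 - saturation, 1 + saturation)

    out = image * b                          # brightness: global scale
    mean = (out @ weights).mean()
    out = (out - mean) * c + mean            # contrast: stretch around the mean luma
    gray = (out @ weights)[..., None]
    out = (out - gray) * s + gray            # saturation: blend with the grayscale image
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))                # stand-in for a real training image
augmented = color_jitter(image, rng)
```

Calling the function with a fresh random draw for every training sample yields a different variant of the same image each epoch.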
9. What does 'color constancy' refer to in computer vision?
color constancy
Easy
A.The ability to perceive the true color of an object despite changes in the color of the illumination
B.The technique of using only primary colors in an image
C.The process of keeping the file size constant when adding color
D.The ability to convert a color image into grayscale
Correct Answer: The ability to perceive the true color of an object despite changes in the color of the illumination
Explanation:
Color constancy is a feature of the human color perception system (and a goal in CV) which ensures that the perceived color of objects remains relatively constant under varying illumination conditions.
10. Which simple algorithm is often used as a baseline to achieve color constancy by assuming that the average color of a scene is neutral gray?
color constancy
Easy
A.The Gray World assumption
B.The Hough Transform
C.The RGB to HSV conversion
D.The Canny Edge Detector
Correct Answer: The Gray World assumption
Explanation:
The Gray World assumption is a classic color constancy algorithm. It assumes that given an image with a sufficient amount of color variations, the average value of the R, G, and B components should equal a common gray value.
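A minimal NumPy sketch of this idea follows. The synthetic scene and its reddish cast are made up for illustration, and the result is left unclipped, so values may exceed 1.

```python
import numpy as np

def gray_world(image):
    """Scale each channel so that all three channel means equal their common average."""
    channel_means = image.reshape(-1, 3).mean(axis=0)
    gains = channel_means.mean() / channel_means
    return image * gains

# Hypothetical scene under a warm (reddish) illuminant.
rng = np.random.default_rng(0)
scene = rng.random((8, 8, 3)) * np.array([1.0, 0.8, 0.5])
balanced = gray_world(scene)
means = balanced.reshape(-1, 3).mean(axis=0)
print(means)  # the three channel means now coincide
```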
11. What does the pixel value represent in a range image?
introduction to range images
Easy
A.The intensity of light reflected by an object
B.The color hue of the object
C.The distance or depth from the sensor to the object
D.The temperature of the object
Correct Answer: The distance or depth from the sensor to the object
Explanation:
In a range image (also called a depth map), each pixel's value directly corresponds to the physical distance between the sensor and the objects in the scene.
12. What is another common name for a range image?
introduction to range images
Easy
A.Grayscale map
B.Depth map
C.Thermal map
D.Texture map
Correct Answer: Depth map
Explanation:
Range images map the range (distance) to objects, and are therefore commonly referred to as depth maps.
13. Which of the following best describes the difference between a 2D intensity image and a 3D range image?
difference between 2D intensity images and 3D range images
Easy
A.Intensity images depend on light reflectance; range images measure geometric distance.
B.Intensity images are 3D; range images are 2D.
C.Intensity images can only be captured actively; range images are captured passively.
D.Intensity images capture distance; range images capture color.
Correct Answer: Intensity images depend on light reflectance; range images measure geometric distance.
Explanation:
A 2D intensity image records the amount of light reflected from a scene, whereas a 3D range image records the actual physical distance (geometry) to the surfaces in the scene.
14. Which type of image is generally immune to shadows and variations in ambient lighting?
difference between 2D intensity images and 3D range images
Easy
A.Grayscale images
B.3D range images
C.2D intensity images
D.RGB images
Correct Answer: 3D range images
Explanation:
Because range images measure physical distance rather than reflected light, they are inherently robust to shadows, illumination changes, and variations in surface color.
15. How do active range sensors operate?
active range sensors
Easy
A.They passively measure the ambient sunlight reflecting off objects.
B.They estimate depth strictly using two passive cameras (stereo vision).
C.They calculate depth based purely on the color of objects.
D.They emit their own energy (like light or sound) and measure the reflection to calculate distance.
Correct Answer: They emit their own energy (like light or sound) and measure the reflection to calculate distance.
Explanation:
Active sensors, unlike passive ones, project their own energy (such as lasers in LiDAR or infrared light) into the environment and measure the return signal to determine distance.
16. Which of the following is an example of an active range sensor?
active range sensors
Easy
A.Passive Stereo Camera
B.LiDAR (Light Detection and Ranging)
C.Thermal imaging camera
D.Standard RGB Web camera
Correct Answer: LiDAR (Light Detection and Ranging)
Explanation:
LiDAR is an active sensor that emits laser pulses and measures the time it takes for the light to bounce back to calculate precise distances.
17. Why is preprocessing often necessary for raw range data?
preprocessing of range data
Easy
A.To add artificial colors to the image
B.To convert the 3D data into 2D intensity data
C.To increase the physical range of the sensor
D.To remove missing data points and noise spikes common in depth sensors
Correct Answer: To remove missing data points and noise spikes common in depth sensors
Explanation:
Raw range data is often noisy and may contain 'holes' (missing data) due to reflective surfaces or sensor limitations. Preprocessing steps like filtering and interpolation fix these issues.
18. Which filter is commonly used in preprocessing range images to remove outlier noise (salt-and-pepper noise) without blurring the sharp geometric edges?
preprocessing of range data
Easy
A.Median filter
B.Fourier filter
C.High-pass filter
D.Mean filter
Correct Answer: Median filter
Explanation:
A median filter is highly effective at removing impulse noise (outliers) while preserving the sharp edges (like object boundaries) in a range image.
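The behavior can be checked on a toy example. The sketch below is a plain-NumPy 3x3 median filter with edge replication (libraries such as SciPy or OpenCV provide tuned versions), applied to a made-up range image containing a depth step and one outlier spike.

```python
import numpy as np

def median_filter3(depth):
    """3x3 median filter with edge replication, written with plain NumPy."""
    padded = np.pad(depth, 1, mode="edge")
    h, w = depth.shape
    # Nine shifted views of the image, one per position in the 3x3 window.
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(windows), axis=0)

# Hypothetical range image: near surface (1 m) on the left, far surface (3 m) on the right.
depth = np.full((5, 6), 1.0)
depth[:, 3:] = 3.0
depth[2, 1] = 9.0   # a single outlier spike
filtered = median_filter3(depth)
print(filtered)     # spike removed, depth step still sharp
```

A mean filter on the same input would smear both the spike and the 1 m to 3 m step into intermediate depths that exist nowhere in the scene.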
19. Which of the following fields heavily relies on range data for obstacle avoidance and mapping?
applications of range data
Easy
A.Autonomous driving and robotics
B.Audio processing
C.Digital typography
D.Text document scanning
Correct Answer: Autonomous driving and robotics
Explanation:
Autonomous vehicles and robots use range data (via LiDAR or depth cameras) to understand the 3D geometry of their environment, allowing them to detect and avoid obstacles safely.
20. How are range images primarily used in modern smartphone facial recognition systems (like FaceID)?
applications of range data
Easy
A.To check the skin tone of the user
B.To capture a 3D structural map of the face, preventing spoofing with 2D photos
C.To measure the room temperature
D.To improve the color saturation of selfies
Correct Answer: To capture a 3D structural map of the face, preventing spoofing with 2D photos
Explanation:
Modern facial recognition uses range (depth) data to build a 3D map of the face. This provides higher security because a flat 2D photograph cannot trick the depth sensor.
21. Why is color information often considered more robust than grayscale for object tracking under partial occlusion?
Importance of color in computer vision
Medium
A.Color provides a multi-dimensional feature space that helps distinguish objects from similar-intensity backgrounds.
B.Color components are entirely invariant to extreme changes in illumination.
C.Color images require less computational power to process during real-time tracking.
D.Grayscale images cannot be used to compute optical flow, making color essential for tracking.
Correct Answer: Color provides a multi-dimensional feature space that helps distinguish objects from similar-intensity backgrounds.
Explanation:
Color adds chromatic dimensions (e.g., Hue and Saturation) to the intensity data. This extra information creates a distinct feature space, allowing algorithms to separate objects from backgrounds even when their grayscale intensities match.
22. In image segmentation, how does the inclusion of color channels typically affect the performance of clustering algorithms like K-means compared to using only intensity?
Importance of color in computer vision
Medium
A.It forces the algorithm to rely solely on edge detection rather than regional properties.
B.It transforms the problem into a linear regression task, bypassing the need for clustering.
C.It increases the separability of classes by mapping pixels into a 3D feature space instead of a 1D space.
D.It reduces the separability of classes by introducing noise into the clustering process.
Correct Answer: It increases the separability of classes by mapping pixels into a 3D feature space instead of a 1D space.
Explanation:
Using a 3D color space (like RGB or Lab) instead of a 1D grayscale space increases the distance between different classes in the feature space, making clustering algorithms like K-means more effective at grouping similar pixels.
23. Which color model is most suitable for developing an application that requires intuitive manipulation of "tint" and "vividness" by a human user?
Color models
Medium
A.RGB (Red, Green, Blue)
B.HSV (Hue, Saturation, Value)
C.YUV (Luminance, Chrominance)
D.CMYK (Cyan, Magenta, Yellow, Key)
Correct Answer: HSV (Hue, Saturation, Value)
Explanation:
The HSV color model separates color into Hue (tint), Saturation (vividness), and Value (brightness). This closely aligns with human perception, making it highly intuitive for users to manipulate colors.
24. In the YCbCr color model used in video compression, why is the 'Y' channel typically sampled at a higher resolution than 'Cb' and 'Cr'?
Color models
Medium
A.Compression algorithms can only process the 'Cb' and 'Cr' channels at lower resolutions.
B.The human visual system is more sensitive to spatial variations in brightness than in color.
C.The 'Y' channel is used to synchronize the audio and video streams.
D.The 'Y' channel contains the color information, which requires more bits to store.
Correct Answer: The human visual system is more sensitive to spatial variations in brightness than in color.
Explanation:
The Y channel represents luminance (brightness), while Cb and Cr represent chrominance (color). The human visual system resolves fine spatial detail in brightness much better than in color, so chroma subsampling can reduce the color data without a noticeable loss in perceived image quality.
25. When converting from an RGB image to grayscale using the standard NTSC formula $Y = 0.299R + 0.587G + 0.114B$, why are the weights unequal?
Conversion between color spaces
Medium
A.To account for the non-linear response of digital camera sensors.
B.Because the human eye has varying sensitivity to different wavelengths, being most sensitive to green.
C.To compensate for the fact that blue pixels use more electrical power in LCD displays.
D.Because the RGB color space is inherently flawed and requires statistical correction.
Correct Answer: Because the human eye has varying sensitivity to different wavelengths, being most sensitive to green.
Explanation:
The human visual system does not perceive all colors with equal brightness. We are most sensitive to green light and least sensitive to blue. The weights in the luminance formula reflect this biological sensitivity to create a perceptually accurate grayscale image.
26. What happens if the non-linear gamma correction step is omitted during the conversion from linear RGB to sRGB space?
Conversion between color spaces
Medium
A.The color hue will shift by 180 degrees, resulting in a negative image.
B.The image will become completely desaturated, appearing as grayscale.
C.The resulting image will appear unnaturally dark on standard display monitors.
D.The image resolution will be mathematically halved.
Correct Answer: The resulting image will appear unnaturally dark on standard display monitors.
Explanation:
Standard display monitors apply a gamma expansion (typically around 2.2). If linear RGB is not gamma-encoded (compressed) before display, the monitor's expansion will make the midtones of the image appear excessively dark.
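The darkening can be estimated numerically. The sketch below uses the piecewise sRGB encoding function together with a simplified display model (a pure power law with exponent 2.2, which is an assumption; real displays differ).

```python
def linear_to_srgb(x):
    """sRGB encoding (gamma compression) of a linear value in [0, 1]."""
    if x <= 0.0031308:
        return 12.92 * x
    return 1.055 * x ** (1 / 2.4) - 0.055

def display_gamma(v, gamma=2.2):
    """Simplified display model (an assumption): output luminance = v ** gamma."""
    return v ** gamma

mid_gray = 0.18  # linear scene value often used as photographic mid-gray
shown_without_encoding = display_gamma(mid_gray)
shown_with_encoding = display_gamma(linear_to_srgb(mid_gray))

print(f"without encoding: {shown_without_encoding:.3f}")  # far below 0.18
print(f"with encoding:    {shown_with_encoding:.3f}")     # close to 0.18
```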
27. When applying color jittering as an augmentation technique to train a robust neural network, which parameter should be perturbed to simulate varying lighting intensities without changing the object's intrinsic color?
Color augmentation techniques
Medium
A.Saturation in the HSV space
B.Hue in the HSV space
C.Value/Brightness in the HSV space
D.The a* channel in the CIELAB space
Correct Answer: Value/Brightness in the HSV space
Explanation:
Changing the Value (or Brightness) alters the perceived illumination intensity of the image without affecting the intrinsic color properties (Hue) or the purity of the color (Saturation).
28. What is the primary purpose of applying PCA-based color augmentation (like Fancy PCA) on RGB images during deep learning model training?
Color augmentation techniques
Medium
A.To convert the RGB image into a binary mask for faster feature extraction.
B.To perfectly normalize the image so all pixel values equal exactly zero mean and unit variance.
C.To alter RGB intensities proportionally along their principal components, preserving natural illumination variations.
D.To compress the image size and speed up training epochs.
Correct Answer: To alter RGB intensities proportionally along their principal components, preserving natural illumination variations.
Explanation:
Fancy PCA (used in AlexNet) performs Principal Component Analysis on the RGB pixel values and adds multiples of the principal components. This creates realistic color variations that respect the natural covariance of colors in the dataset.
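A simplified sketch of the procedure is shown below. It computes the principal components from a single image rather than from a whole training set, which is an assumption made to keep the example short. The function name and the default `sigma` are illustrative.

```python
import numpy as np

def fancy_pca(image, rng, sigma=0.1):
    """Add a random offset along the RGB principal components of the image."""
    pixels = image.reshape(-1, 3)
    cov = np.cov(pixels, rowvar=False)            # 3 x 3 covariance of R, G, B
    eigvals, eigvecs = np.linalg.eigh(cov)        # columns of eigvecs are the components
    alphas = rng.normal(0.0, sigma, size=3)       # one random draw per component
    offset = eigvecs @ (alphas * eigvals)         # the same RGB shift for every pixel
    return np.clip(image + offset, 0.0, 1.0), offset

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))                     # stand-in for a real training image
augmented, offset = fancy_pca(image, rng)
```

Because each component is scaled by its eigenvalue, the shift is largest along the directions in which the colors already vary the most.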
29. Which algorithm applies the assumption that the spatial average of surface reflectances in a scene is achromatic (gray) to achieve color constancy?
Color constancy
Medium
A.Histogram Equalization
B.Gamut Mapping
C.Gray World Assumption
D.White Patch Retinex
Correct Answer: Gray World Assumption
Explanation:
The Gray World Assumption postulates that given an image with sufficient color variation, the average color of the scene should be neutral gray. Any deviation from gray is assumed to be caused by the color of the illuminant.
30. An image taken under a strong yellow tungsten light looks unnaturally warm. How does the White Patch (Max-RGB) algorithm attempt to correct this illumination bias?
Color constancy
Medium
A.By finding the maximum pixel value in each color channel and scaling all pixels so these maximums become pure white.
B.By applying a median filter to remove the yellow wavelengths from the image spectrum.
C.By calculating the average of all pixels and subtracting it from the yellow channel.
D.By converting the image to grayscale and colorizing it using a pre-trained neural network.
Correct Answer: By finding the maximum pixel value in each color channel and scaling all pixels so these maximums become pure white.
Explanation:
The White Patch algorithm assumes there is a perfectly reflecting surface in the scene. It estimates the illuminant by taking the maximum value in the R, G, and B channels independently, and then normalizes the image so that this "patch" appears white.
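In code, the normalization reduces to a per-channel division. The sketch assumes NumPy, a float image in [0, 1], and no channel that is zero everywhere. The yellowish test scene is synthetic.

```python
import numpy as np

def white_patch(image):
    """Max-RGB: scale each channel so its maximum maps to 1.0 (white)."""
    channel_max = image.reshape(-1, 3).max(axis=0)
    return image / channel_max

rng = np.random.default_rng(1)
scene = rng.random((8, 8, 3)) * np.array([1.0, 0.85, 0.4])   # hypothetical yellowish cast
corrected = white_patch(scene)
print(corrected.reshape(-1, 3).max(axis=0))  # every channel now peaks at 1.0
```

In practice a high percentile is often used instead of the strict maximum, since a single saturated or noisy pixel would otherwise dominate the estimate.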
31. In a standard range image, what does the scalar value at a given pixel coordinate explicitly represent?
Introduction to range images
Medium
A.The velocity of the object relative to the camera.
B.The amount of light reflected from the object surface to the sensor.
C.The color intensity of the object in the near-infrared spectrum.
D.The physical distance from the sensor to the surface of the scene at that spatial location.
Correct Answer: The physical distance from the sensor to the surface of the scene at that spatial location.
Explanation:
A range image (or depth map) is a 2D grid where each pixel value corresponds to the depth or distance from the sensor to the physical object surface, rather than representing light intensity or color.
32. Which of the following best describes the structural representation of a standard range image?
Introduction to range images
Medium
A.A fully volumetric 3D voxel grid.
B.A continuous mathematical surface defined by a set of B-spline equations.
C.A 2D array where each element stores a depth value, often referred to as a 2.5D representation.
D.A 1D array of distance measurements sorted by magnitude.
Correct Answer: A 2D array where each element stores a depth value, often referred to as a 2.5D representation.
Explanation:
A range image is structured exactly like a 2D image, but stores depth instead of color. Because it only captures the visible surface from a single viewpoint (and not the back or inside of objects), it is commonly called a 2.5D representation.
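To make the 2.5D idea concrete, the sketch below back-projects a depth map into 3D points: one point per pixel, all seen from a single viewpoint. It assumes a pinhole camera model and that each pixel stores depth along the optical axis (not radial range). The intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical values.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an H x W depth map (z along the optical axis) to an N x 3 point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel column and row indices
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

depth = np.full((4, 6), 2.0)   # hypothetical flat wall 2 m away
points = depth_to_points(depth, fx=500.0, fy=500.0, cx=2.5, cy=1.5)
print(points.shape)            # one 3D point per pixel
```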
33. Why are edge detection algorithms applied to range images fundamentally different in physical interpretation compared to 2D intensity images?
Difference between 2D intensity images and 3D range images
Medium
A.Range images have fewer pixels, making edge detection mathematically simpler but less accurate.
B.Intensity images only have straight edges, while range images capture curved geometric edges.
C.Edge detection cannot be applied to range images because depth values are not continuous.
D.Edges in range images correspond to physical depth discontinuities, whereas in intensity images they may represent shadows or texture boundaries.
Correct Answer: Edges in range images correspond to physical depth discontinuities, whereas in intensity images they may represent shadows or texture boundaries.
Explanation:
In an intensity image, edges can be caused by changes in lighting, shadows, or surface texture. In a range image, a sharp change in pixel value (a step edge) indicates a jump in depth such as an occlusion boundary, while a sharp change in the depth gradient (a crease edge) indicates a change in surface orientation. Apart from sensor noise, both reflect the physical structure of the scene.
34. How do changes in ambient illumination affect 2D intensity images compared to 3D range images?
Difference between 2D intensity images and 3D range images
Medium
A.Range images are highly sensitive to illumination changes, while intensity images are invariant.
B.Ambient illumination changes only affect the spatial resolution of range images.
C.Intensity images are highly sensitive to illumination changes, while range images are generally invariant to ambient lighting.
D.Both are equally degraded by changes in ambient illumination.
Correct Answer: Intensity images are highly sensitive to illumination changes, while range images are generally invariant to ambient lighting.
Explanation:
Intensity images rely entirely on reflected ambient light, meaning shadows and varying light drastically change the pixel values. Range images map physical geometry and (especially with active sensors) are mostly unaffected by normal changes in room lighting or shadows.
35. How does a Time-of-Flight (ToF) camera estimate the depth of a scene?
Active range sensors
Medium
A.By capturing two images simultaneously from different angles and computing stereo disparity.
B.By projecting a grid pattern and measuring the geometric distortion of the grid on the object.
C.By analyzing the blur radius of objects moving rapidly across the sensor's field of view.
D.By measuring the phase shift or time delay of an emitted light pulse reflecting off the scene and returning to the sensor.
Correct Answer: By measuring the phase shift or time delay of an emitted light pulse reflecting off the scene and returning to the sensor.
Explanation:
Time-of-Flight (ToF) sensors actively illuminate the scene (usually with infrared light) and calculate depth from the round-trip travel time of the light, measured either directly as a time delay (pulsed ToF) or indirectly as the phase shift of a modulated signal (continuous-wave ToF).
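Both measurement principles reduce to short formulas, sketched here under idealized assumptions (no noise, no multipath, phase already unwrapped). The delay, phase, and modulation frequency in the example are made-up numbers.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def depth_from_delay(round_trip_seconds):
    """Pulsed ToF: the light covers the distance twice, so halve the path."""
    return C * round_trip_seconds / 2.0

def depth_from_phase(phase_rad, modulation_hz):
    """Continuous-wave ToF: depth from the phase shift of the modulated signal."""
    return C * phase_rad / (4.0 * math.pi * modulation_hz)

d_pulse = depth_from_delay(20e-9)            # 20 ns round trip
d_phase = depth_from_phase(math.pi, 20e6)    # half-cycle shift at 20 MHz
print(f"pulsed: {d_pulse:.3f} m, continuous-wave: {d_phase:.3f} m")
```

The phase-based variant is only unambiguous up to a shift of one full cycle, which limits the usable range for a given modulation frequency.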
36. In a structured light active range sensor, what is the primary purpose of projecting a known pattern (e.g., a grid or stripes) onto the scene?
Active range sensors
Medium
A.To directly measure the speed of light reflecting off the object's surface.
B.To illuminate the scene brightly enough so the camera can capture color.
C.To establish pixel correspondences between the projector and the camera for triangulation.
D.To confuse ambient infrared sensors that might interfere with the measurement.
Correct Answer: To establish pixel correspondences between the projector and the camera for triangulation.
Explanation:
Structured light sensors project a known pattern onto the scene. By observing how the pattern deforms over the object's surface from an offset camera, the system can reliably find corresponding points and use triangulation to calculate depth.
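Once a correspondence is known, the triangulation step itself is simple. The sketch assumes a rectified projector-camera pair, so that depth follows the same relation as in stereo vision. The focal length, baseline, and disparities are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Triangulated depth for a rectified projector-camera (or stereo) pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 600 px focal length, 7.5 cm baseline.
near = depth_from_disparity(600.0, 0.075, 45.0)   # large pattern shift: close surface
far = depth_from_disparity(600.0, 0.075, 15.0)    # small pattern shift: distant surface
print(f"near: {near:.2f} m, far: {far:.2f} m")
```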
37. Range images obtained from active sensors often contain "flying pixels" (mixed pixels) at object boundaries. Which filter is most effective for removing these artifacts while preserving sharp depth edges?
Preprocessing of range data
Medium
A.Gaussian filter
B.Mean filter
C.Median filter
D.High-pass filter
Correct Answer: Median filter
Explanation:
Flying pixels are outlier depth values created when a sensor's ray hits the edge of an object and part of the background simultaneously. A median filter is a non-linear filter excellent at removing such outliers (salt-and-pepper noise) while preserving sharp, distinct edges.
38. What is a common technique used to fill in missing depth values (holes) in range images caused by highly reflective surfaces or occlusions?
Preprocessing of range data
Medium
A.Bilateral filtering or spatial interpolation using valid neighboring depth pixels.
B.Converting the depth map to the frequency domain and applying a low-pass filter.
C.Multiplying the depth map by the ambient lighting intensity.
D.Applying a global Fourier transform to reconstruct missing frequencies.
Correct Answer: Bilateral filtering or spatial interpolation using valid neighboring depth pixels.
Explanation:
Missing depth values are typically handled via spatial interpolation (such as nearest-neighbor or bilinear) or with edge-preserving filters such as the bilateral filter, which estimate the missing values from the valid depth data surrounding the hole.
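The simplest form of such interpolation is sketched below: each hole is replaced by the mean of its valid 3x3 neighbors, repeated until the holes are closed. This is plain neighbor averaging, not a bilateral filter, and the convention that `0.0` marks a missing value is an assumption.

```python
import numpy as np

def fill_holes(depth, invalid=0.0, max_iters=10):
    """Iteratively replace invalid pixels with the mean of their valid 3x3 neighbours."""
    filled = depth.astype(float).copy()
    h, w = filled.shape
    for _ in range(max_iters):
        valid = filled != invalid
        if valid.all():
            break
        pv = np.pad(valid, 1).astype(float)              # zero padding: outside counts as invalid
        pd = np.pad(np.where(valid, filled, 0.0), 1)
        total = sum(pd[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))
        count = sum(pv[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))
        fillable = ~valid & (count > 0)                  # holes with at least one valid neighbour
        filled[fillable] = total[fillable] / count[fillable]
    return filled

depth = np.full((5, 5), 2.0)     # hypothetical flat surface 2 m away
depth[1, 1] = depth[2, 2] = 0.0  # two missing measurements
filled = fill_holes(depth)
```

Averaging across a depth discontinuity would create intermediate depths, which is why edge-preserving variants are preferred near object boundaries.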
39. In autonomous navigation, how is range data primarily utilized for obstacle avoidance?
Applications of range data
Medium
A.By analyzing the color signature of objects to determine if they are moving vehicles.
B.By reading text on traffic signs using optical character recognition (OCR).
C.By constructing a 3D occupancy grid or point cloud to identify the exact distance and geometry of obstacles in the vehicle's path.
D.By tracking the sun's position to estimate the vehicle's global coordinates.
Correct Answer: By constructing a 3D occupancy grid or point cloud to identify the exact distance and geometry of obstacles in the vehicle's path.
Explanation:
Autonomous vehicles use range data (from LiDAR or depth cameras) to build spatial representations of their environment, like 3D occupancy grids, allowing them to detect the physical presence, size, and proximity of obstacles to plan safe paths.
40. How are range images utilized in industrial quality inspection for manufactured parts?
Applications of range data
Medium
A.By evaluating the color consistency of the paint applied to the manufactured part.
B.By extracting 3D surface geometry from the scan and comparing it against a reference CAD model to detect dimensional defects.
C.By analyzing the 2D shadows cast by the part to estimate its weight.
D.By listening to the acoustic resonance of the part when hit with an infrared laser.
Correct Answer: By extracting 3D surface geometry from the scan and comparing it against a reference CAD model to detect dimensional defects.
Explanation:
In automated inspection, active range sensors capture the precise 3D shape of a manufactured part. This depth data is then aligned and compared with the original CAD model to measure tolerances and detect surface defects, dents, or misalignments.
41. Metamerism poses a significant challenge in color-based object recognition. Two physically distinct surfaces are considered metamers under a specific illuminant if they exhibit which of the following properties?
Importance of color in computer vision
Hard
A.They possess identical spectral reflectance curves but yield different RGB sensor responses.
B.They possess identical spectral reflectance curves and yield identical sensor responses under all possible illuminants.
C.They possess different spectral reflectance curves but yield identical sensor responses when integrated over the sensor's spectral sensitivities and the illuminant's power distribution.
D.They possess different spectral reflectance curves and yield different sensor responses, but map to the same chromaticity coordinates in the CIE xy diagram.
Correct Answer: They possess different spectral reflectance curves but yield identical sensor responses when integrated over the sensor's spectral sensitivities and the illuminant's power distribution.
Explanation:
Metamerism occurs when two surfaces with fundamentally different spectral reflectance functions produce the same integrated response in a trichromatic visual system (or camera) under a specific light source.
42. According to the Dichromatic Reflection Model, the total radiance from a dielectric inhomogeneous material is given by $L = L_s + L_b$. If an algorithm successfully factors out $L_s$, what is the primary consequence for computer vision tasks?
Importance of color in computer vision
Hard
A.The algorithm completely neutralizes the color of the illuminant, achieving perfect color constancy.
B.The algorithm maps all out-of-gamut colors to the closest boundary on the CIE spectral locus.
C.The algorithm eliminates specular highlights, ensuring the remaining radiance is purely a function of the material's albedo and Lambertian shading.
D.The algorithm removes the Lambertian diffuse reflection, leaving only the material's structural geometry.
Correct Answer: The algorithm eliminates specular highlights, ensuring the remaining radiance is purely a function of the material's albedo and Lambertian shading.
Explanation:
In the Dichromatic Reflection Model, $L_s$ represents the surface (specular) reflection, which typically retains the color of the illuminant, and $L_b$ represents the body (diffuse/Lambertian) reflection. Removing $L_s$ removes specular highlights.
43. The sRGB color space applies a non-linear transfer function (gamma correction) to linear RGB values. From a mathematical and information-theoretic perspective, what is the primary purpose of applying this specific non-linearity prior to 8-bit quantization?
Color models
Hard
A.To distribute quantization steps more uniformly according to the human visual system's logarithmic sensitivity to luminance, thereby minimizing banding in dark regions.
B.To map the theoretically infinite dynamic range of physical scene radiance to the bounded interval required for digital processing.
C.To compensate for the non-linear voltage-to-luminance response of modern OLED and LCD displays.
D.To orthogonalize the color channels, ensuring that modifications in the Red channel do not affect the perceived brightness of the Blue channel.
Correct Answer: To distribute quantization steps more uniformly according to the human visual system's logarithmic sensitivity to luminance, thereby minimizing banding in dark regions.
Explanation:
Human vision is more sensitive to small changes in dark tones than in bright tones. Gamma encoding redistributes the limited 256 quantization levels of an 8-bit channel so that more of them fall in darker regions, perceptually minimizing quantization noise (banding).
44. In the YCbCr color model, chroma subsampling (e.g., 4:2:0) reduces bandwidth. Which mathematical property of the RGB to YCbCr conversion makes this subsampling perceptually viable?
Color models
Hard
A.The conversion applies a localized Fourier transform, allowing the $C_b$ and $C_r$ channels to inherently represent only low-frequency spatial patterns.
B.The transformation orthogonalizes the space such that the $Y$ channel contains a weighted sum of RGB optimized for human luminance sensitivity, while $C_b$ and $C_r$ represent purely chromatic differences to which high-frequency human spatial sensitivity is low.
C.The transformation is highly non-linear, pushing high-frequency chromatic data into the $Y$ channel.
D.The $C_b$ and $C_r$ channels are mathematically derived from the eigenvectors of the human retina's cone responses.
Correct Answer: The transformation orthogonalizes the space such that the $Y$ channel contains a weighted sum of RGB optimized for human luminance sensitivity, while $C_b$ and $C_r$ represent purely chromatic differences to which high-frequency human spatial sensitivity is low.
Explanation:
YCbCr separates luminance ($Y$) from chrominance ($C_b$, $C_r$). The human visual system has much lower spatial acuity for color differences than for luminance differences, allowing $C_b$ and $C_r$ to be spatially subsampled without severe perceived degradation.
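A minimal sketch of the idea, assuming the full-range BT.601 (JPEG-style) conversion coefficients and a simple 2x2 block average for the 4:2:0 step. Real codecs differ in filter choice and chroma siting, and the function names here are illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 conversion for 8-bit RGB values (0..255)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b                 # luminance: weighted sum of RGB
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b      # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b      # red-difference chroma
    return y, cb, cr


def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (assumes even width and height)."""
    h, w = len(plane), len(plane[0])
    return [
        [
            (plane[i][j] + plane[i][j + 1] + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


# A 4x4 chroma plane shrinks to 2x2, so Y + Cb + Cr needs 16 + 4 + 4 = 24 samples
# instead of the 48 samples of the original RGB image.
cb_plane = [[128.0] * 4 for _ in range(4)]
small = subsample_420(cb_plane)
print(len(small), len(small[0]))
```

Only the chroma planes are reduced; the $Y$ plane keeps full resolution, which is where the eye's spatial acuity is highest.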
45When converting from linear RGB to the HSI (Hue, Saturation, Intensity) color space, the Hue becomes mathematically undefined when $R = G = B$. How does this singularity impact gradient-based optimization in deep learning models operating directly in HSI space?
Conversion between color spaces
Hard
A.It has no impact, because automatic differentiation engines inherently handle divisions by substituting a gradient of $1.0$.
B.It induces infinite or undefined gradients (NaNs) when traversing the achromatic axis, leading to severe instability during backpropagation.
C.It guarantees that the gradient of the loss with respect to hue is zero, naturally freezing the hue parameters.
D.It requires the optimizer to use complex-valued arithmetic to traverse the hue circle smoothly.
Correct Answer: It induces infinite or undefined gradients (NaNs) when traversing the achromatic axis, leading to severe instability during backpropagation.
Explanation:
At the achromatic axis ($R = G = B$), saturation is zero and hue is mathematically undefined (a singularity). This discontinuity causes the derivative of the hue with respect to RGB to become undefined or infinite, injecting NaNs into backpropagation gradients.
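The singularity can be seen numerically with the common arccos form of the HSI hue. This is a plain-Python sketch rather than an autodiff framework, and `hsi_hue` is an illustrative name: on the grey axis the denominator is exactly zero, and just off the axis a perturbation of one millionth swings the hue by 120 degrees.

```python
import math


def hsi_hue(r, g, b):
    """Hue in radians from the standard HSI arccos formula (undefined when r == g == b)."""
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    ratio = max(-1.0, min(1.0, num / den))  # clamp against rounding just outside [-1, 1]
    theta = math.acos(ratio)
    return theta if b <= g else 2 * math.pi - theta


eps = 1e-6
hue_reddish = hsi_hue(0.5 + eps, 0.5, 0.5)    # grey nudged towards red
hue_greenish = hsi_hue(0.5, 0.5 + eps, 0.5)   # grey nudged towards green
# Finite-difference slope across the grey axis: hue change per unit of RGB change.
slope = abs(hue_greenish - hue_reddish) / eps
print(math.degrees(hue_reddish), math.degrees(hue_greenish), slope)

try:
    hsi_hue(0.5, 0.5, 0.5)
except ZeroDivisionError:
    print("hue is undefined on the achromatic axis")
```

A common workaround is to add a small epsilon to the denominator or to mask out low-saturation pixels, which bounds the gradient at the cost of a slightly biased hue near grey.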
46The conversion from CIE XYZ to CIE $L^*a^*b^*$ incorporates a non-linear function $f(t) = t^{1/3}$ (for $t > (6/29)^3$). What is the exact geometrical objective of this specific transformation?
Conversion between color spaces
Hard
A.To compress the gamut of highly saturated colors to prevent out-of-bound errors during 8-bit quantization.
B.To achieve perceptual uniformity, ensuring that the Euclidean distance between two coordinates roughly corresponds to the perceived color difference by the human eye.
C.To map the white point of the illuminant exactly to the origin in $L^*a^*b^*$ space.
D.To decouple the chromaticity coordinates from luminance, allowing for perfectly linear alpha blending.
Correct Answer: To achieve perceptual uniformity, ensuring that the Euclidean distance between two coordinates roughly corresponds to the perceived color difference by the human eye.
Explanation:
CIE $L^*a^*b^*$ was designed to be perceptually uniform. The cube root transformation approximates the non-linear response of the human eye, meaning a set Euclidean distance anywhere in the space represents a similar perceived color difference.
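A short sketch of the conversion and the resulting Euclidean color difference (the CIE76 $\Delta E$). It assumes a D65 reference white with $Y_n = 1$; the white-point constants and function names are illustrative choices.

```python
import math

D65_WHITE = (0.95047, 1.0, 1.08883)  # assumed reference white (X_n, Y_n, Z_n)
DELTA = 6.0 / 29.0


def f(t):
    """Cube root above the threshold, linear segment below it."""
    if t > DELTA ** 3:
        return t ** (1.0 / 3.0)
    return t / (3 * DELTA ** 2) + 4.0 / 29.0


def xyz_to_lab(x, y, z, white=D65_WHITE):
    fx, fy, fz = f(x / white[0]), f(y / white[1]), f(z / white[2])
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def delta_e76(lab1, lab2):
    """Euclidean distance in L*a*b*, the quantity meant to track perceived difference."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))


white_lab = xyz_to_lab(*D65_WHITE)
black_lab = xyz_to_lab(0.0, 0.0, 0.0)
print(white_lab, black_lab, delta_e76(white_lab, black_lab))
```

The linear segment near zero avoids the infinite slope of the cube root at the origin.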
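47In Fancy PCA (used in AlexNet) for color augmentation, an RGB pixel is augmented by adding $[\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3][\alpha_1 \lambda_1, \alpha_2 \lambda_2, \alpha_3 \lambda_3]^T$, where $\mathbf{p}_i$ and $\lambda_i$ are eigenvectors and eigenvalues of the RGB covariance matrix and each $\alpha_i$ is a random scalar drawn from a zero-mean Gaussian. What specific invariance is this technique injecting into the model?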
Color augmentation techniques
Hard
A.Invariance to spatial frequency shifts in the color channels.
B.Invariance to changes in the intensity and color of the scene illumination by shifting pixel values along the principal axes of the dataset's color distribution.
C.Invariance to local geometric distortions and chromatic aberration.
D.Invariance to extreme nonlinear gamma shifts caused by different camera sensors.
Correct Answer: Invariance to changes in the intensity and color of the scene illumination by shifting pixel values along the principal axes of the dataset's color distribution.
Explanation:
PCA color augmentation alters the intensities of RGB channels based on the principal components of the dataset's color distribution. This simulates realistic variations in lighting intensity and color temperature without creating unnatural color shifts.
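The shift itself is a small computation once the eigen-decomposition is known. The sketch below assumes the eigenvectors and eigenvalues have already been computed for the dataset; the numeric values are made-up placeholders for illustration, not measured statistics, and the function name is hypothetical.

```python
import random

# Placeholder eigen-decomposition of an RGB covariance matrix (illustrative values only).
EIGVECS = [(0.577, 0.577, 0.577), (0.707, 0.0, -0.707), (0.408, -0.816, 0.408)]
EIGVALS = [0.2, 0.02, 0.005]


def pca_color_shift(eigvecs, eigvals, alphas):
    """Return the RGB offset sum_i alpha_i * lambda_i * p_i."""
    return [
        sum(a * lam * p[c] for a, lam, p in zip(alphas, eigvals, eigvecs))
        for c in range(3)
    ]


# One set of alphas is drawn per image and the same offset is added to every pixel.
rng = random.Random(0)
alphas = [rng.gauss(0.0, 0.1) for _ in EIGVALS]
shift = pca_color_shift(EIGVECS, EIGVALS, alphas)
pixel = (0.4, 0.5, 0.6)
augmented = tuple(v + s for v, s in zip(pixel, shift))
print(shift, augmented)
```

Because the offset is scaled by the eigenvalues, most of the variation lands on the dominant axis of the color distribution, which for natural images is close to overall brightness.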
48When applying color jittering (randomly varying Hue, Saturation, and Value) in HSV space to augment data, which mathematical precaution MUST be taken for the Hue channel that is not required for Saturation or Value?
Color augmentation techniques
Hard
A.The Hue channel must be augmented using modulo arithmetic (e.g., modulo $360^\circ$ or $1.0$) because it represents an angular, periodic space.
B.The Hue channel must be transformed logarithmically to prevent perceptually massive color shifts.
C.The Hue channel must be clamped strictly to its nominal range before converting back to RGB.
D.The Hue channel must be normalized to have zero mean and unit variance before jittering.
Correct Answer: The Hue channel must be augmented using modulo arithmetic (e.g., modulo $360^\circ$ or $1.0$) because it represents an angular, periodic space.
Explanation:
Hue is typically represented as an angle on a color wheel (0 to 360 degrees). When adding a jitter value, wrapping around using modulo arithmetic is required to maintain valid colors (e.g., adding 10 degrees to 355 degrees should yield 5 degrees).
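The wrap-around is a one-line operation. The sketch below contrasts it with clamping, which would pin every overshooting hue to the same boundary color; both function names are illustrative.

```python
def jitter_hue(hue_deg, delta_deg):
    """Shift a hue angle and wrap it back into [0, 360)."""
    return (hue_deg + delta_deg) % 360.0


def jitter_hue_clamped(hue_deg, delta_deg):
    """Incorrect for hue: clamping treats the periodic axis as a bounded interval."""
    return max(0.0, min(360.0, hue_deg + delta_deg))


print(jitter_hue(355.0, 10.0))          # wraps past 360 degrees
print(jitter_hue(5.0, -10.0))           # wraps below 0 degrees
print(jitter_hue_clamped(355.0, 10.0))  # sticks at the boundary instead
```

Saturation and Value are bounded, non-periodic quantities, so for those two channels clamping is the appropriate fix.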
49The Grey-World assumption is a popular algorithm for color constancy. Under which of the following real-world scenarios will the Grey-World algorithm fail most catastrophically, producing severely distorted colors?
Color constancy
Hard
A.An image of a diverse outdoor landscape containing sky, grass, and dirt.
B.An image of a uniformly bright red brick wall taking up 90% of the camera's field of view.
C.An image with a wide dynamic range containing both extremely dark shadows and bright specular highlights.
D.An image of a highly textured, multi-colored carpet taken under a standard D65 illuminant.
Correct Answer: An image of a uniformly bright red brick wall taking up 90% of the camera's field of view.
Explanation:
The Grey-World algorithm assumes that the average reflectance of the scene is achromatic (grey). If the scene is dominated by a single large, highly chromatic object (like a red wall), the algorithm will incorrectly assume the illuminant is red and shift the entire image towards cyan to compensate.
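The failure mode is easy to reproduce on a toy image. This is a minimal sketch of one common Grey-World formulation (scale each channel so its mean equals the overall mean); the pixel values are invented for illustration, and the function assumes no channel mean is zero.

```python
def grey_world(pixels):
    """Scale each channel so that the channel means become equal."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    grey = sum(means) / 3.0
    gains = [grey / m for m in means]  # assumes every channel mean is non-zero
    return [tuple(p[c] * gains[c] for c in range(3)) for p in pixels]


# Toy scene: nine red-brick pixels and one truly white pixel, under neutral light.
scene = [(0.8, 0.2, 0.2)] * 9 + [(0.9, 0.9, 0.9)]
corrected = grey_world(scene)
white_after = corrected[-1]
print(white_after)  # red is suppressed, green and blue are boosted
```

The pixel that was white before correction ends up with far more green and blue than red, i.e. a cyan cast, even though the illuminant was neutral to begin with.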
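50The von Kries transform models illuminant changes using a diagonal matrix $D$ such that $\mathbf{c}' = D\mathbf{c}$, where $\mathbf{c}$ and $\mathbf{c}'$ are the sensor responses under the original and new illuminants. For this diagonal approximation to be strictly true mathematically (without error), what underlying constraint must be satisfied regarding the camera sensors?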
Color constancy
Hard
A.The camera sensor spectral sensitivities must be perfectly non-overlapping (narrowband or Dirac delta functions).
B.The camera sensors must possess completely overlapping spectral sensitivities with differing amplitudes.
C.The camera sensors must have identical peak sensitivities at exactly $450$nm, $550$nm, and $650$nm.
D.The camera must have a linear dynamic range exceeding 14 bits per channel.
Correct Answer: The camera sensor spectral sensitivities must be perfectly non-overlapping (narrowband or Dirac delta functions).
Explanation:
The von Kries diagonal model assumes that the effect of an illuminant change can be modeled as independent scalar multipliers on the R, G, and B channels. This independent scaling only holds mathematically true if the sensor spectral sensitivities do not overlap (often modeled as Dirac delta functions).
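In code, the diagonal model is nothing more than three independent per-channel gains. The sketch below derives the gains from the sensor response to a white patch under each illuminant; the white-point values and the function name are illustrative assumptions.

```python
def von_kries_adapt(rgb, src_white, dst_white):
    """Apply the diagonal transform D = diag(dst_white / src_white) to one pixel."""
    return tuple(v * d / s for v, s, d in zip(rgb, src_white, dst_white))


# Hypothetical white-patch responses under a warm source light and a neutral target.
warm_white = (1.0, 0.8, 0.5)
neutral_white = (1.0, 1.0, 1.0)

print(von_kries_adapt(warm_white, warm_white, neutral_white))   # white maps to white
print(von_kries_adapt((0.6, 0.4, 0.1), warm_white, neutral_white))
```

No channel ever contributes to another. With overlapping sensor sensitivities an exact correction generally needs off-diagonal terms, which is why the diagonal form is only an approximation for real cameras.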
51In a range image, pixel values represent distance from a reference plane or sensor. If a range image is generated using perspective projection, how does the spatial resolution (the physical area covered by a single pixel) change as the depth value increases?
Introduction to range images
Hard
A.The physical area covered by a pixel decreases linearly with depth.
B.The physical area covered by a pixel increases logarithmically with depth.
C.The spatial resolution remains strictly constant regardless of depth.
D.The physical area covered by a pixel increases quadratically with depth.
Correct Answer: The physical area covered by a pixel increases quadratically with depth.
Explanation:
Under perspective projection, the physical width and height that a pixel projects onto in the real world both increase linearly with depth $z$. Therefore, the physical area (width $\times$ height) expands quadratically ($\propto z^2$) as depth increases.
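Under an ideal pinhole model with focal lengths expressed in pixels, a pixel viewing a fronto-parallel surface at depth $z$ covers roughly $z / f_x$ by $z / f_y$ in world units. The sketch below uses that approximation; the focal length value is an arbitrary example.

```python
def pixel_footprint_area(z, fx, fy):
    """Approximate world-space area seen by one pixel at depth z (pinhole model)."""
    width = z / fx    # grows linearly with depth
    height = z / fy   # grows linearly with depth
    return width * height


fx = fy = 500.0  # example focal length in pixels
for z in (1.0, 2.0, 4.0):
    print(z, pixel_footprint_area(z, fx, fy))
```

Each doubling of depth multiplies the covered area by four, so distant surfaces are sampled far more sparsely than near ones.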
52When computing geometric properties, how does the evaluation of Gaussian Curvature on a 3D range image fundamentally differ from applying edge detection (e.g., Canny) on a 2D intensity image?
Difference between 2D intensity images and 3D range images
Hard
A.There is no fundamental difference; both operators mathematically evaluate the second-order partial derivatives of the image grid.
B.Gaussian Curvature is an extrinsic property dependent on the viewpoint, whereas 2D edge detection is invariant to perspective transformations.
C.Gaussian Curvature requires a perfectly Lambertian surface to be computed accurately, whereas 2D edge detection works on any surface type.
D.Gaussian Curvature measures an intrinsic, viewpoint-invariant geometric property of the underlying surface, whereas 2D edges are heavily dependent on viewpoint, illumination, and albedo.
Correct Answer: Gaussian Curvature measures an intrinsic, viewpoint-invariant geometric property of the underlying surface, whereas 2D edges are heavily dependent on viewpoint, illumination, and albedo.
Explanation:
Gaussian Curvature is an intrinsic geometric property of a 3D surface, meaning it does not change based on how the surface is oriented, illuminated, or colored. In contrast, 2D intensity edges are highly susceptible to shadows, lighting changes, and texture differences.
53A key challenge in 2D computer vision is 'scale ambiguity', which is generally resolved in 3D range images. However, 3D range imaging suffers from a distinct physical limitation not typically affecting ideal 2D pinhole cameras. Which of the following is it?
Difference between 2D intensity images and 3D range images
Hard
A.Range images are inherently limited by the speed of light, rendering them useless for moving objects.
B.Range images cannot resolve high-frequency texture information due to chromatic aberration.
C.Range images suffer from perspective foreshortening, making distant objects appear smaller.
D.Active range sensors inherently struggle with multi-path interference and specular reflections that distort geometry.
Correct Answer: Active range sensors inherently struggle with multi-path interference and specular reflections that distort geometry.
Explanation:
Unlike standard 2D cameras capturing passive light, active range sensors (ToF, structured light, LiDAR) emit their own signals. These signals can bounce off multiple surfaces (multi-path) or scatter unpredictably off specular (shiny) surfaces, causing severe geometric distortions in the depth map.
54In an Amplitude Modulated Continuous Wave (AMCW) Time-of-Flight sensor, the distance $d$ is calculated using the phase shift $\Delta\phi$ of the modulated signal at frequency $f$. The measurement is subject to phase wrapping. What is the expression for the unambiguous range interval $d_{max}$?
Active range sensors
Hard
A.
B.
C.
D.
Correct Answer: $d_{max} = \frac{c}{2f}$
Explanation:
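In an AMCW ToF sensor, the light travels to the object and back, covering a distance of $2d$. Phase wrapping occurs at $\Delta\phi = 2\pi$. Therefore, $2\pi = 2\pi f \Delta t$, where $\Delta t = \frac{2 d_{max}}{c}$ and $c$ is the speed of light. Solving for $d_{max}$ gives $d_{max} = \frac{c}{2f}$.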
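As a numeric check, the unambiguous range is the speed of light divided by twice the modulation frequency, and a measured phase shift maps to distance as $d = \frac{c \, \Delta\phi}{4 \pi f}$. The 20 MHz frequency below is just an example value, and the function names are illustrative.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def unambiguous_range(f_mod):
    """Largest distance measurable before the phase wraps past 2*pi."""
    return C / (2.0 * f_mod)


def phase_to_distance(phase, f_mod):
    """Distance for a phase shift in radians (round trip covers 2*d)."""
    return C * phase / (4.0 * math.pi * f_mod)


f_mod = 20e6  # example modulation frequency: 20 MHz
print(unambiguous_range(f_mod))             # about 7.5 m
print(phase_to_distance(math.pi, f_mod))    # half the unambiguous range
```

Lowering the modulation frequency extends the unambiguous range but reduces depth precision, which is why many sensors combine two or more frequencies.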
55Structured light scanners project known patterns to establish corresponding points. In a highly cluttered scene with strong inter-reflections, a standard binary Gray code projection often fails. Why is phase-shifting (using sinusoidal patterns) often combined with Gray codes to mitigate this?
Active range sensors
Hard
A.Phase-shifting relies on high-frequency temporal changes that inherently cancel out low-frequency inter-reflections.
B.Phase-shifting provides continuous sub-pixel accuracy that relies on relative intensity rather than absolute thresholding, though it requires Gray codes to unwrap the phase globally.
C.Phase-shifting uses distinct color channels to separate the direct signal from the inter-reflected signal.
D.Gray codes cause heavy quantization errors that physically scatter photons, whereas smooth sinusoids do not.
Correct Answer: Phase-shifting provides continuous sub-pixel accuracy that relies on relative intensity rather than absolute thresholding, though it requires Gray codes to unwrap the phase globally.
Explanation:
Binary Gray codes require hard intensity thresholding, which fails when inter-reflections alter local intensities. Phase-shifting calculates correspondence using the relative phase of sinusoids, which is more robust to intensity offsets, achieving sub-pixel accuracy. Gray codes are then just used to resolve the phase ambiguity (unwrapping).
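The robustness to intensity offsets can be seen in the classic four-step formula. This sketch assumes four sinusoidal patterns shifted by 90 degrees each, so a pixel observes $I_k = A + B\cos(\phi + k\pi/2)$; the offset $A$ and amplitude $B$ stand in for ambient light and surface reflectance, and the function names are illustrative.

```python
import math


def observe(phi, offset, amplitude):
    """Simulated intensities of one pixel under four patterns shifted by 90 degrees."""
    return [offset + amplitude * math.cos(phi + k * math.pi / 2) for k in range(4)]


def wrapped_phase(i0, i1, i2, i3):
    """Recover the wrapped phase; the offset and the amplitude both cancel out."""
    return math.atan2(i3 - i1, i0 - i2)


true_phi = 1.0
dim = wrapped_phase(*observe(true_phi, offset=0.2, amplitude=0.1))
bright = wrapped_phase(*observe(true_phi, offset=0.7, amplitude=0.3))
print(dim, bright)  # both close to the true phase despite different lighting
```

The recovered phase is only known modulo $2\pi$, which is the ambiguity the Gray code sequence resolves by identifying the fringe period each pixel belongs to.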
56When smoothing a noisy range image (depth map), applying a standard linear Gaussian filter causes severe blurring across object boundaries (depth discontinuities). The Bilateral Filter solves this by modifying the convolution kernel. How does the Bilateral Filter achieve this mathematically?
Preprocessing of range data
Hard
A.It weights adjacent pixels not only by their spatial Euclidean distance but also by the difference in their depth values, assigning lower weights to pixels across depth discontinuities.
B.It applies the Fourier transform to remove high-frequency components before performing spatial convolution.
C.It calculates the surface normals first and only convolves pixels whose normals are strictly parallel.
D.It replaces the weighted average with the median value of the local neighborhood, which naturally preserves edges.
Correct Answer: It weights adjacent pixels not only by their spatial Euclidean distance but also by the difference in their depth values, assigning lower weights to pixels across depth discontinuities.
Explanation:
The bilateral filter combines a spatial kernel (standard Gaussian based on pixel distance) with a range kernel (Gaussian based on intensity or depth difference). If a neighboring pixel has a drastically different depth (an edge), its range weight drops to near zero, preventing it from blurring across the boundary.
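A one-dimensional sketch on a single depth scanline shows the two kernels at work. The window radius and both sigma values are arbitrary example parameters, and the function name is illustrative.

```python
import math


def bilateral_1d(depth, radius=2, sigma_s=1.0, sigma_r=0.1):
    """Bilateral filter on a 1D depth profile."""
    out = []
    for i, center in enumerate(depth):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(depth), i + radius + 1)):
            w_spatial = math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
            w_range = math.exp(-((depth[j] - center) ** 2) / (2 * sigma_r ** 2))
            num += w_spatial * w_range * depth[j]
            den += w_spatial * w_range
        out.append(num / den)  # den >= 1 because the center pixel has weight 1
    return out


# Noisy foreground at about 1 m next to a background at about 3 m.
scanline = [1.00, 1.02, 0.98, 1.01, 0.99, 3.00, 3.02, 2.98, 3.01, 2.99]
smoothed = bilateral_1d(scanline)
print([round(v, 3) for v in smoothed])
```

Within each surface the noise is averaged out, while the 2 m jump makes the range weight of pixels on the other side effectively zero, so the step stays sharp.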
57In the Iterative Closest Point (ICP) algorithm used for range data registration, using a 'point-to-plane' error metric instead of a 'point-to-point' metric generally results in which of the following optimization behaviors?
Preprocessing of range data
Hard
A.It turns the optimization into a strictly convex problem, preventing convergence to local minima.
B.It restricts transformations to purely translations, eliminating rotational degrees of freedom.
C.It allows the source points to slide smoothly along the target surfaces, leading to significantly faster convergence in flat regions.
D.It eliminates the need for an initial alignment guess, guaranteeing a global optimum.
Correct Answer: It allows the source points to slide smoothly along the target surfaces, leading to significantly faster convergence in flat regions.
Explanation:
The point-to-plane metric minimizes the distance between the source point and the tangent plane of the target point. This allows sliding along planar surfaces without penalty, which massively accelerates convergence compared to point-to-point, which penalizes any movement away from the exact corresponding point.
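The difference between the two residuals is visible on a single correspondence. In this sketch `q` is the matched target point and `n` its unit normal; both helper names are illustrative.

```python
def point_to_point_sq(p, q):
    """Squared distance between the source point and its matched target point."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def point_to_plane_sq(p, q, n):
    """Squared distance from the source point to the tangent plane at q (n is unit length)."""
    return sum((a - b) * c for a, b, c in zip(p, q, n)) ** 2


q = (0.0, 0.0, 0.0)
n = (0.0, 0.0, 1.0)            # target surface is the plane z = 0
slid = (0.5, 0.2, 0.0)         # source point displaced within the plane
lifted = (0.0, 0.0, 0.1)       # source point displaced along the normal

print(point_to_point_sq(slid, q), point_to_plane_sq(slid, q, n))
print(point_to_point_sq(lifted, q), point_to_plane_sq(lifted, q, n))
```

Sliding within the plane costs nothing under the point-to-plane metric, while offsets along the normal are penalized identically by both metrics.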
58'Flying pixels' are artifacts in ToF range images occurring at depth discontinuities, caused by the sensor integrating light from both foreground and background objects within a single pixel. Which filtering approach is most suited to systematically identify and remove these without eroding legitimate thin structures?
Preprocessing of range data
Hard
A.Applying a Laplacian of Gaussian (LoG) filter and removing zero-crossings.
B.Checking local gradient magnitude; pixels with a depth gradient exceeding a dynamic threshold relative to the surrounding variance are flagged and removed.
C.Applying a standard 3x3 median filter iteratively until convergence.
D.Global intensity thresholding based on the active illumination amplitude.
Correct Answer: Checking local gradient magnitude; pixels with a depth gradient exceeding a dynamic threshold relative to the surrounding variance are flagged and removed.
Explanation:
Flying pixels represent false depth values bridging the foreground and background. They create a ramp-like edge (high gradient). Identifying pixels with steep, unnatural depth gradients compared to the local neighborhood variance is the standard way to detect and mask out these mixed-integration artifacts.
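A simplified one-dimensional heuristic in that spirit is sketched below. It is not a specific published filter: it flags a pixel when the depth steps on both sides are large and point in the same direction, meaning the pixel sits on a ramp between two surfaces. The depth-proportional threshold is an assumed stand-in for the local-variance threshold described above.

```python
def flag_flying_pixels(scanline, rel_thresh=0.05):
    """Return indices of pixels that sit mid-ramp between two depth levels."""
    flagged = []
    for i in range(1, len(scanline) - 1):
        left = scanline[i] - scanline[i - 1]
        right = scanline[i + 1] - scanline[i]
        limit = rel_thresh * scanline[i]      # assumed threshold, scales with depth
        same_direction = left * right > 0     # ramp, not a peak or a valley
        if same_direction and abs(left) > limit and abs(right) > limit:
            flagged.append(i)
    return flagged


ramp = [1.0, 1.0, 1.0, 2.0, 3.0, 3.0, 3.0]   # index 3 bridges the 1 m and 3 m surfaces
thin = [3.0, 3.0, 1.0, 3.0, 3.0]             # one-pixel-wide foreground structure
print(flag_flying_pixels(ramp), flag_flying_pixels(thin))
```

The same-direction test is what spares the thin structure, whose two neighboring steps have opposite signs. Steeply slanted real surfaces can still trigger the rule, so the threshold needs tuning per sensor.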
59To estimate the surface normal at a specific point in a 3D point cloud (derived from range data), one standard method involves applying Principal Component Analysis (PCA) to the local $k$-nearest neighbors. What does the eigenvector corresponding to the smallest eigenvalue of the local covariance matrix represent?
Applications of range data
Hard
A.The principal axis of the local surface texture.
B.The vector pointing directly toward the sensor origin.
C.The direction of maximum curvature on the surface.
D.The estimated surface normal vector at that point.
Correct Answer: The estimated surface normal vector at that point.
Explanation:
When doing PCA on a local neighborhood of points on a surface, the two largest eigenvalues correspond to the eigenvectors spanning the tangent plane (directions of maximum variance). The smallest eigenvalue corresponds to the direction of minimum variance, which is orthogonal to the tangent plane, i.e., the surface normal.
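A dependency-free sketch of this estimate is given below. Instead of a full eigen-solver it runs power iteration on $\mathrm{tr}(C) I - C$, whose dominant eigenvector is the eigenvector of $C$ with the smallest eigenvalue. The start vector and iteration count are arbitrary choices, and the neighborhood is assumed to be non-degenerate (not all points identical).

```python
import math


def estimate_normal(points, iters=200):
    """Smallest-eigenvalue eigenvector of the neighborhood covariance matrix."""
    n = len(points)
    mean = [sum(p[k] for p in points) / n for k in range(3)]
    cov = [
        [sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / n for b in range(3)]
        for a in range(3)
    ]
    trace = cov[0][0] + cov[1][1] + cov[2][2]
    # Shifted matrix: its largest eigenvalue belongs to the smallest eigenvalue of cov.
    m = [[(trace if a == b else 0.0) - cov[a][b] for b in range(3)] for a in range(3)]
    v = [1.0, 2.0, 3.0]  # arbitrary start; fails only if exactly orthogonal to the normal
    for _ in range(iters):
        v = [sum(m[a][b] * v[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


# Neighborhood sampled from the plane z = 0; the normal should be (0, 0, +/-1).
patch = [(float(x), float(y), 0.0) for x in range(5) for y in range(5)]
print(estimate_normal(patch))
```

The sign of the result is arbitrary, so in practice the normal is usually flipped to point towards the sensor.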
60In 3D object recognition from range data, a 'Spin Image' is a local descriptor. How does the Spin Image achieve invariance to rigid 3D transformations (rotations and translations)?
Applications of range data
Hard
A.By defining the descriptor strictly through the eigenvalues of the global covariance matrix, which are invariant to rotation.
B.By establishing a local cylindrical coordinate system centered at a specific point, aligned with that point's surface normal, and accumulating neighbor points into a 2D histogram of radial distance and elevation.
C.By projecting the 3D data onto the three principal Cartesian planes (XY, YZ, XZ) and extracting 2D SIFT features.
D.By computing a global 3D Fourier transform of the entire object and discarding the phase information.
Correct Answer: By establishing a local cylindrical coordinate system centered at a specific point, aligned with that point's surface normal, and accumulating neighbor points into a 2D histogram of radial distance and elevation.
Explanation:
A Spin Image is created by taking a point and its normal to define a local reference frame. As the reference frame moves and rotates rigidly with the object, the relative cylindrical coordinates (radial distance and height along the normal) of neighboring points remain constant, granting 6-DOF pose invariance.
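The invariance can be checked directly on the two cylindrical coordinates. In this sketch `p` is the basis point, `n` its unit normal, and `x` a neighbor; the histogram binning step is omitted, and the function name is illustrative.

```python
import math


def spin_coords(p, n, x):
    """Return (alpha, beta): radial distance from the normal line and height along it."""
    d = [xi - pi for xi, pi in zip(x, p)]
    beta = sum(di * ni for di, ni in zip(d, n))
    alpha = math.sqrt(max(0.0, sum(di * di for di in d) - beta * beta))
    return alpha, beta


def rigid(v, t=(0.0, 0.0, 0.0)):
    """Rotate 90 degrees about the x-axis, then translate by t."""
    x, y, z = v
    return (x + t[0], -z + t[1], y + t[2])


p, n, x = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (3.0, 4.0, 2.0)
t = (1.0, 2.0, 3.0)

before = spin_coords(p, n, x)
# Points are rotated and translated; the normal is a direction, so it is only rotated.
after = spin_coords(rigid(p, t), rigid(n), rigid(x, t))
print(before, after)
```

Both calls give the same pair, so the 2D histogram built from these coordinates is unchanged by the pose of the object. The angle around the normal is deliberately discarded, which is what removes the remaining rotational ambiguity.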