Unit2 - Subjective Questions
INT345 • Practice Questions with Detailed Answers
Explain the concept of the pinhole camera model. Derive the mathematical relationship between a 3D point and its 2D projection on the image plane.
Pinhole Camera Model:
The pinhole camera is the simplest mathematical model of a camera. It describes the relationship between the 3D coordinates of a point in space and its 2D projection onto the image plane. It assumes that light enters through a single, infinitely small aperture (the pinhole).
Derivation:
- Let the center of projection be the origin of a 3D coordinate system.
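- Let the image plane be situated at a distance $f$ (the focal length) along the Z-axis, i.e., at $Z = f$.
- Consider a 3D point $\mathbf{P} = (X, Y, Z)^T$ in the world.
- A ray of light from $\mathbf{P}$ passes through the origin and intersects the image plane at the point $\mathbf{p} = (x, y)$.
- By the principle of similar triangles formed by the rays, we can relate the coordinates:
$$x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}$$
- In homogeneous coordinates, this projection can be represented linearly as:
$$Z \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$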
Significance: This model is the basis for multi-view geometry, though it ignores lens distortion effects.
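The similar-triangles relation can be checked numerically. The following is a minimal sketch, assuming a camera-centred frame with the image plane at distance `f` along the Z-axis; the function name `project_pinhole` is illustrative and does not come from any library.

```python
def project_pinhole(point, f):
    """Project a 3D point (X, Y, Z) in the camera frame onto the image plane Z = f."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Similar triangles: x / f = X / Z and y / f = Y / Z.
    return (f * X / Z, f * Y / Z)


# The same lateral offset seen at twice the depth projects half as far from the centre.
near = project_pinhole((1.0, 2.0, 4.0), f=2.0)
far = project_pinhole((1.0, 2.0, 8.0), f=2.0)
```

The division by `Z` is the non-linear step that the homogeneous form hides inside the scale factor.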
Why are lenses used in cameras instead of a simple pinhole? Discuss the concepts of focal length, aperture, and depth of field in cameras with lenses.
Need for Lenses:
While a pinhole camera produces images without geometric distortion, a true pinhole lets in very little light, requiring long exposure times. Increasing the pinhole size blurs the image. Lenses are introduced to gather more light (increasing brightness) while simultaneously focusing it onto a sharp point on the image sensor.
Key Concepts:
- Focal Length ($f$): The distance between the lens's optical center and the image sensor when focused on an object at infinity. It determines the field of view and magnification. A longer focal length provides a narrower field of view (telephoto), while a shorter one gives a wider view.
- Aperture: The opening in the lens that controls the amount of light entering the camera. It is usually measured in f-stops (e.g., f/2.8, f/8). A larger aperture (smaller f-stop number) lets in more light.
- Depth of Field (DoF): The distance between the nearest and farthest objects in a scene that appear acceptably sharp in an image.
- Shallow DoF: Achieved with a wide aperture (e.g., f/1.8), blurring the background.
- Deep DoF: Achieved with a narrow aperture (e.g., f/16), keeping most of the scene in focus.
Describe the working principle of CCD (Charge-Coupled Device) cameras. What are some common artifacts associated with CCD sensors?
Working Principle of CCD Cameras:
- A CCD (Charge-Coupled Device) sensor is a silicon-based integrated circuit consisting of a dense matrix of photodiodes (pixels).
- Photoelectric Effect: When photons strike these photodiodes, they generate electron-hole pairs. The number of electrons accumulated at each pixel is proportional to the light intensity striking it.
- Charge Transfer: The accumulated charge is then sequentially transferred across the chip through a shifting process (coupling) to a readout node at the edge of the sensor.
- Digitization: The charge is converted into a measurable voltage, amplified, and passed through an Analog-to-Digital Converter (ADC) to form a digital image.
Common Artifacts in CCDs:
- Blooming: Occurs when a pixel receives too much light (overexposure), and its charge capacity overflows into adjacent pixels, creating streaks or halos around bright light sources.
- Smear: Vertical streaks in the image caused by light hitting the sensor during the charge transfer process, especially if the camera lacks a mechanical shutter.
- Dark Current Noise: Electrons generated by thermal energy rather than light, resulting in "grain" or "noise," especially noticeable in low-light, long-exposure shots.
Define the general projective camera model. Explain the components of the camera projection matrix $P$.
General Projective Camera Model:
The general projective camera model maps a 3D point in world coordinates to a 2D point in the image plane using a $3 \times 4$ projection matrix $P$. Mathematically, for a 3D point $\mathbf{X}$ and its 2D projection $\mathbf{x}$ (both in homogeneous coordinates):
$$\mathbf{x} = P\,\mathbf{X}$$
Components of the Projection Matrix $P$:
The matrix is decomposed into $P = K\,[R \mid \mathbf{t}]$, representing intrinsic and extrinsic parameters.
- Intrinsic Parameters Matrix ($K$): A $3 \times 3$ upper triangular matrix defining camera-specific internal settings.
$$K = \begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}$$
Where $f_x, f_y$ are focal lengths in pixel dimensions, $(c_x, c_y)$ is the principal point offset, and $s$ is the skew parameter (often zero).
- Extrinsic Parameters ($[R \mid \mathbf{t}]$): A $3 \times 4$ matrix representing the camera's pose in the 3D world.
- $R$ (Rotation Matrix): A $3 \times 3$ orthogonal matrix that aligns the world coordinate axes with the camera axes.
- $\mathbf{t}$ (Translation Vector): A $3 \times 1$ vector representing the position of the world origin in camera coordinates.
The matrix $P$ has 11 degrees of freedom (5 intrinsic + 3 rotation + 3 translation).
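As a concrete illustration of the decomposition into intrinsic and extrinsic parameters, written here as P = K [R | t], the sketch below assembles a projection matrix and projects one world point. All numeric values (focal lengths, principal point, pose) are assumed example values, not taken from any real camera.

```python
import numpy as np

# Intrinsics (assumed example values): focal lengths and principal point in pixels, zero skew.
fx, fy, cx, cy, s = 800.0, 800.0, 320.0, 240.0, 0.0
K = np.array([[fx, s, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# Extrinsics (assumed): camera axes aligned with the world axes,
# world origin located 5 units in front of the camera.
R = np.eye(3)
t = np.array([[0.0], [0.0], [5.0]])

P = K @ np.hstack([R, t])                  # 3x4 projection matrix

X_world = np.array([1.0, 0.5, 0.0, 1.0])   # homogeneous world point
x_h = P @ X_world                          # homogeneous image point
u, v = x_h[:2] / x_h[2]                    # perspective division gives pixel coordinates
```

Here the point lands at pixel (480, 320): the extrinsics first express it in the camera frame as (1, 0.5, 5), and the intrinsics then scale and shift it into pixels.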
What are affine cameras? How do they differ from general projective cameras?
Affine Cameras:
An affine camera is a simplified camera model that assumes the depth variation of the object is very small compared to its distance from the camera (i.e., the camera is very far away, or the field of view is very narrow). In this case, parallel lines in 3D remain parallel in the 2D projected image. Examples include orthographic, weak-perspective, and paraperspective projections.
Mathematical Representation:
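In an affine camera, the last row of the projection matrix is fixed to $(0, 0, 0, 1)$:
$$P_A = \begin{pmatrix} m_{11} & m_{12} & m_{13} & t_1 \\ m_{21} & m_{22} & m_{23} & t_2 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$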
Differences from General Projective Cameras:
- Parallelism: Affine cameras preserve parallel lines, whereas projective cameras map parallel lines to intersecting lines (meeting at vanishing points).
- Perspective Division: In an affine camera, the homogeneous scale factor (the depth $Z$ in the camera frame) is treated as constant for all points, so no non-linear perspective division is required. General projective cameras require division by depth.
- Matrix Structure: The third row of the projection matrix is strictly $(0, 0, 0, 1)$ in an affine camera, while it contains variables in a general projective camera.
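The parallelism property can be verified with a small numerical sketch. The matrix `P_affine` below is a hypothetical affine camera (any matrix whose last row is (0, 0, 0, 1) would do), and `project` is an illustrative helper, not a library function.

```python
import numpy as np

# Hypothetical affine camera: the last row is (0, 0, 0, 1).
P_affine = np.array([[2.0, 0.0, 0.5, 10.0],
                     [0.0, 2.0, 0.3, 20.0],
                     [0.0, 0.0, 0.0, 1.0]])

def project(P, X):
    """Project a 3D point X (length-3 array) and return inhomogeneous image coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]          # for an affine camera x[2] is always 1

d = np.array([1.0, 2.0, 3.0])    # shared 3D direction of two parallel lines
a0 = np.array([0.0, 0.0, 5.0])   # a point on the first line
b0 = np.array([4.0, -1.0, 9.0])  # a point on the second line

# Image-space direction of each projected line.
da = project(P_affine, a0 + d) - project(P_affine, a0)
db = project(P_affine, b0 + d) - project(P_affine, b0)

# The 2D cross product vanishes when the two image directions are parallel.
parallel = abs(da[0] * db[1] - da[1] * db[0]) < 1e-12
```

With a general projective camera the same test would usually fail, because the division by a depth-dependent third coordinate bends the two image directions towards a vanishing point.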
What is camera calibration? Briefly explain the significance of estimating intrinsic and extrinsic parameters.
Camera Calibration:
Camera calibration is the process of estimating the parameters of a camera model (both internal and external) that describe how 3D world points are projected onto the 2D image plane. This is usually done by capturing images of a known calibration pattern (like a checkerboard) from various viewpoints.
Significance of Parameters:
- Intrinsic Parameters: These belong to the camera itself (focal length, principal point, skew, and distortion coefficients).
- Significance: Knowing these allows us to correct lens distortions (converting a barrel-distorted image into a rectilinear one) and relates pixel coordinates to normalized camera coordinates. This is crucial for metric measurements from images.
- Extrinsic Parameters: These describe the camera's pose (Rotation and Translation) relative to the 3D world coordinate system.
- Significance: They allow us to map the camera's location and orientation in a 3D space. This is essential for applications like stereo vision (calculating depth), robotics (navigation and object manipulation), and augmented reality (placing virtual objects accurately in the real world).
Explain the concepts of representation in projective coordinates (homogeneous coordinates) for points and lines in 2D planar geometry.
Representation in Projective Coordinates (Homogeneous Coordinates):
Projective geometry uses homogeneous coordinates to represent points and lines, allowing us to represent points at infinity and apply linear matrix operations for transformations like translation and perspective projection.
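- Points in $\mathbb{P}^2$:
A point $(x, y)$ in 2D Euclidean space is represented in the 2D projective space $\mathbb{P}^2$ by a 3-vector $\mathbf{x} = (x_1, x_2, x_3)^T$, such that $x = x_1 / x_3$ and $y = x_2 / x_3$, where $x_3 \neq 0$. Any scalar multiple $k\mathbf{x}$ (where $k \neq 0$) represents the exact same geometric point.
- Lines in $\mathbb{P}^2$:
A line in a 2D plane is given by the equation $ax + by + c = 0$. In projective space, this line is represented by a 3-vector $\mathbf{l} = (a, b, c)^T$.
- Point-Line Incidence:
A point $\mathbf{x}$ lies on a line $\mathbf{l}$ if and only if their inner product is zero:
$$\mathbf{x}^T \mathbf{l} = 0$$
- Intersection and Joining:
- The intersection of two lines $\mathbf{l}$ and $\mathbf{l}'$ is the point $\mathbf{x} = \mathbf{l} \times \mathbf{l}'$ (cross product).
- The line passing through two points $\mathbf{x}$ and $\mathbf{x}'$ is $\mathbf{l} = \mathbf{x} \times \mathbf{x}'$.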
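The join and intersection rules reduce to two cross products, which the short sketch below works through with integer coordinates. The helper names `cross` and `dot` are illustrative; points are written as (x, y, 1) and lines as (a, b, c).

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Join: the line through the points (0, 0) and (1, 1) is -x + y = 0.
p, q = (0, 0, 1), (1, 1, 1)
line_pq = cross(p, q)

# Incidence: both points satisfy x^T l = 0.
incident = dot(p, line_pq) == 0 and dot(q, line_pq) == 0

# Intersection: meet the line y = x with the vertical line x = 2, i.e. (1, 0, -2).
meet = cross(line_pq, (1, 0, -2))
meet_xy = (meet[0] / meet[2], meet[1] / meet[2])   # back to Euclidean coordinates
```

The homogeneous result `meet` is only defined up to scale; dividing by its third coordinate recovers the Euclidean point (2, 2).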
Discuss the concept of ideal points and the line at infinity in 2D projective space.
Ideal Points (Points at Infinity):
In standard Euclidean geometry, parallel lines never intersect. However, in projective geometry, we introduce the concept of "ideal points" where parallel lines are said to meet.
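- If a point in homogeneous coordinates is represented as $(x_1, x_2, x_3)^T$, taking the limit as $x_3 \to 0$ gives a point $(x_1, x_2, 0)^T$.
- This vector represents an ideal point or a point at infinity. It signifies the direction rather than a specific location.
- Two parallel lines $\mathbf{l} = (a, b, c)^T$ and $\mathbf{l}' = (a, b, c')^T$ intersect at the ideal point $\mathbf{l} \times \mathbf{l}' = (c' - c)\,(b, -a, 0)^T$, i.e. $(b, -a, 0)^T$ up to scale.
Line at Infinity:
- The set of all ideal points (all points where the third coordinate is 0) forms a line called the "line at infinity."
- Mathematically, it is represented by the vector $\mathbf{l}_\infty = (0, 0, 1)^T$.
- For any ideal point $(x_1, x_2, 0)^T$, it is clear that $(x_1, x_2, 0)\,\mathbf{l}_\infty = 0$, meaning all ideal points lie on $\mathbf{l}_\infty$.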
- Projective transformations (like homographies) can map the line at infinity to a finite line in the image, which is the mathematical basis for the appearance of a horizon.
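A short worked example makes the ideal point concrete: intersecting two parallel lines with a cross product yields a vector whose third coordinate is zero. The two lines are arbitrary example values and the helper names are illustrative.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

l1 = (1, 2, 3)     # x + 2y + 3 = 0
l2 = (1, 2, -5)    # x + 2y - 5 = 0, parallel to l1

meet = cross(l1, l2)            # third coordinate is 0: an ideal point
l_inf = (0, 0, 1)               # the line at infinity
on_line_at_infinity = dot(meet, l_inf) == 0
```

The result (-16, 8, 0) is a multiple of (2, -1, 0), which is the common direction (b, -a) of both lines.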
State and explain the principle of Duality in 2D projective geometry.
Principle of Duality:
The principle of duality in 2D projective geometry ($\mathbb{P}^2$) is a powerful concept stating that for any valid theorem or geometric property, there exists a "dual" theorem obtained by simply swapping the terms "point" and "line", and their corresponding operations.
Explanation:
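- In homogeneous coordinates, both points and lines are represented by $3 \times 1$ vectors ($\mathbf{x}$ for points, $\mathbf{l}$ for lines).
- The incidence relation $\mathbf{x}^T \mathbf{l} = 0$ is completely symmetric, since $\mathbf{x}^T \mathbf{l} = \mathbf{l}^T \mathbf{x}$. It can be read as "point $\mathbf{x}$ lies on line $\mathbf{l}$" or as "line $\mathbf{l}$ passes through point $\mathbf{x}$".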
Examples of Dual Concepts:
- Theorem: Two distinct points define a single line (their join via cross product: $\mathbf{l} = \mathbf{x} \times \mathbf{x}'$).
Dual: Two distinct lines define a single point (their intersection via cross product: $\mathbf{x} = \mathbf{l} \times \mathbf{l}'$).
- Collinear points: A set of points all lying on a single line.
Dual: Concurrent lines, i.e. a set of lines all passing through a single point.
This principle allows researchers to prove one theorem and automatically gain the proof of its dual for free.
Define a Homography. How many degrees of freedom does a 2D homography have? Justify your answer.
Definition of Homography:
A homography (also known as a projective transformation or collineation) is an invertible mapping from 2D projective space $\mathbb{P}^2$ to itself that maps straight lines to straight lines. Mathematically, it is represented by a non-singular $3 \times 3$ matrix $H$, such that a point $\mathbf{x}$ maps to $\mathbf{x}'$ via the equation:
$$\mathbf{x}' = H\,\mathbf{x}$$
where $\mathbf{x}$ and $\mathbf{x}'$ are in homogeneous coordinates.
Degrees of Freedom (DoF):
A 2D homography matrix has 8 degrees of freedom.
- Justification: The matrix $H$ is a $3 \times 3$ matrix, which has 9 elements.
- However, because it operates in homogeneous coordinates, the matrix is only defined up to a non-zero scale factor. If we multiply $H$ by any scalar $k \neq 0$, the resulting point $kH\mathbf{x}$ represents the exact same 2D geometric point as $H\mathbf{x}$.
- Therefore, one parameter is constrained by the scale (often handled by setting $h_{33} = 1$ or by normalizing the matrix such that $\|H\| = 1$).
- This leaves $9 - 1 = 8$ independent parameters (degrees of freedom).
- Because each 2D point mapping provides 2 constraints, we need at least 4 point correspondences (no three of which are collinear) to compute a homography.
List and explain the key properties of a Homography.
Key Properties of Homography:
- Preservation of Collinearity: A homography maps lines to lines. If three points $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3$ lie on a single line $\mathbf{l}$, their mapped points $\mathbf{x}'_i = H\mathbf{x}_i$ will also lie on a single mapped line $\mathbf{l}'$.
- Invertibility: A homography matrix must be non-singular (i.e., its determinant is non-zero, $\det(H) \neq 0$). Thus, the transformation is invertible. If $\mathbf{x}' = H\mathbf{x}$, then $\mathbf{x} = H^{-1}\mathbf{x}'$.
- Cross-Ratio Preservation: While distances, angles, and ratios of lengths are generally NOT preserved under a homography, the cross-ratio of four collinear points is strictly preserved.
- Transformation of Lines: If points transform according to $\mathbf{x}' = H\mathbf{x}$, then lines transform according to the inverse transpose rule: $\mathbf{l}' = H^{-T}\mathbf{l}$.
- Scale Invariance: The transformation is homogeneous; $H$ and $kH$ (where $k$ is a non-zero scalar) represent the identical projective transformation.
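Several of these properties can be checked numerically in a few lines. The sketch below uses an arbitrary non-singular example matrix `H` and an illustrative helper `apply_h`; it verifies that collinear points stay on the line mapped by the inverse transpose, and that scaling `H` does not change the mapped point.

```python
import numpy as np

# Arbitrary non-singular example homography (assumed values).
H = np.array([[1.2, 0.1, 5.0],
              [-0.2, 0.9, 3.0],
              [0.001, 0.002, 1.0]])

def apply_h(H, pt):
    """Map a Euclidean 2D point through H and return Euclidean coordinates."""
    x = H @ np.array([pt[0], pt[1], 1.0])
    return x[:2] / x[2]

l = np.array([1.0, -1.0, 0.0])                 # the line y = x
pts = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0)]     # three collinear points on it

l_new = np.linalg.inv(H).T @ l                 # lines map by the inverse transpose

# Incidence residuals x'^T l' for the mapped points; all should vanish.
residuals = [np.append(apply_h(H, p), 1.0) @ l_new for p in pts]

# Scale invariance: H and 7H send a point to the same image location.
same_point = np.allclose(apply_h(H, (2.0, 1.0)), apply_h(7.0 * H, (2.0, 1.0)))
```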
How is homography used in Image Stitching? Outline the basic steps.
Homography in Image Stitching:
Image stitching combines multiple photographic images with overlapping fields of view to produce a single high-resolution panoramic image. When a camera rotates around its optical center without translating, or when the scene is perfectly planar, the relationship between overlapping images is purely a 2D homography.
Basic Steps:
- Feature Detection and Extraction: Detect keypoints (e.g., using SIFT, SURF, ORB) in the images to be stitched.
- Feature Matching: Match these keypoints between image pairs to establish point correspondences.
- Homography Estimation (RANSAC): Because matches can contain outliers, the RANSAC (Random Sample Consensus) algorithm is used. It randomly selects 4 pairs of matching points, computes the homography matrix $H$ (often using DLT), and tests how many other matched points agree with this $H$. The $H$ with the most inliers is kept.
- Warping: The target image is geometrically warped using the estimated homography matrix to align its coordinate space with the base image.
- Blending: The overlapping aligned images are blended together (e.g., using alpha blending or multi-band blending) to remove visible seams and exposure differences, forming the final panorama.
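The homography-estimation step can be sketched as follows, assuming feature detection and matching have already produced two `(n, 2)` arrays of corresponding points. The function names, the iteration count, and the pixel threshold are illustrative assumptions; a production pipeline would typically also normalise coordinates and reject degenerate samples.

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate a 3x3 H from >= 4 correspondences; src and dst are (n, 2) arrays."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
    _, _, vt = np.linalg.svd(np.array(rows, dtype=float))
    return vt[-1].reshape(3, 3)          # null vector of the stacked system

def transfer(H, pts):
    """Map (n, 2) points through H and return (n, 2) Euclidean points."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

def ransac_homography(src, dst, iters=200, thresh=1.0, seed=0):
    """Return (H, inlier_mask) using a basic RANSAC loop with 4-point samples."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=4, replace=False)
        H = dlt_homography(src[idx], dst[idx])
        # Degenerate samples may divide by zero; such errors become inf/nan (not inliers).
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(transfer(H, src) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 4:
        raise ValueError("RANSAC failed to find a consistent model")
    # Refit on every inlier of the best model, then fix the scale (assumes h33 != 0).
    H = dlt_homography(src[best_inliers], dst[best_inliers])
    return H / H[2, 2], best_inliers
```

Given matched arrays `src` and `dst`, `ransac_homography(src, dst)` returns the refitted matrix and a boolean mask marking which matches were accepted; the warping and blending steps would then consume that matrix.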
Explain the concept of Perspective Correction using homography with a real-world example.
Perspective Correction using Homography:
Perspective distortion occurs when capturing an image of a planar object (like a building facade, a document, or a painting) at an oblique angle. Parallel lines in the real world converge to a vanishing point in the image. Perspective correction aims to warp the image so that the planar surface appears as if it were viewed completely front-on (orthogonally).
How it works:
- We identify four points in the distorted image that we know should form a perfect rectangle in the real world (e.g., the four corners of a document).
- We define a target set of four points that form a perfectly axis-aligned rectangle.
- We compute the homography matrix that maps the four distorted points to the four target rectangular points.
- We apply this homography to the entire image. This effectively "un-tilts" the image plane.
Real-World Example:
Document scanning apps on smartphones (like CamScanner). When you photograph a document on a desk, it looks like a trapezoid. The app detects the document's corners, calculates the homography to a perfect rectangle (standard A4 aspect ratio), and warps the pixels to produce a flat, top-down scan of the document.
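The four-corner computation can be sketched by fixing h33 = 1 and solving the resulting 8x8 linear system. The corner coordinates below are assumed example values for a photographed page, the target is an A4-proportioned rectangle in millimetres, and the function names are illustrative. Resampling the actual pixels is not shown; a library routine such as OpenCV's `cv2.warpPerspective` is commonly used for that step.

```python
import numpy as np

def homography_from_4_points(src, dst):
    """Solve for H (with h33 = 1) from exactly four correspondences (x, y) -> (u, v)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and similarly for v.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, pt):
    """Map one Euclidean point through H."""
    x = H @ np.array([pt[0], pt[1], 1.0])
    return x[:2] / x[2]

# Detected document corners in the photo (assumed pixel values), clockwise from top-left.
corners_photo = [(120, 80), (520, 110), (560, 700), (60, 650)]
# Target: an axis-aligned A4 rectangle, 210 x 297 units.
corners_flat = [(0, 0), (210, 0), (210, 297), (0, 297)]

H = homography_from_4_points(corners_photo, corners_flat)
mapped = [warp_point(H, c) for c in corners_photo]   # should reproduce corners_flat
```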
Describe the process of Image Rectification in the context of stereo vision.
Image Rectification:
Image rectification is a critical preprocessing step in stereo computer vision. Given two images taken from different viewpoints, rectification applies 2D projective transformations (homographies) to both images so that the corresponding epipolar lines become perfectly horizontal and collinear across the two images.
The Process:
- Epipolar Geometry Estimation: Compute the Fundamental matrix $F$ from corresponding points between the left and right images.
- Homography Computation: Compute two homography matrices, $H_L$ for the left image and $H_R$ for the right image. These homographies map the epipoles (the points where the baseline intersects the image planes) to infinity along the horizontal axis ($x$-axis).
- Warping: Apply $H_L$ and $H_R$ to the respective images.
Significance:
After rectification, any 3D point projecting onto $(x_L, y)$ in the left image will project onto $(x_R, y)$ in the right image. The $y$-coordinate is exactly the same. This reduces the complex 2D search for matching pixels to a much simpler and faster 1D search along the same horizontal row, immensely speeding up dense depth/disparity map computation.
Derive the Direct Linear Transformation (DLT) algorithm used for computing the homography matrix from a set of point correspondences.
Direct Linear Transformation (DLT) for Homography:
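Given a set of 2D point correspondences $\mathbf{x}_i = (x_i, y_i, w_i)^T$ and $\mathbf{x}'_i = (x'_i, y'_i, w'_i)^T$, we want to find $H$ such that $\mathbf{x}'_i = H\mathbf{x}_i$ (an equality up to scale, since the coordinates are homogeneous).
Derivation:
- The equation $\mathbf{x}'_i = H\mathbf{x}_i$ implies that the vectors $\mathbf{x}'_i$ and $H\mathbf{x}_i$ are in the same direction. Thus, their cross product is zero:
$$\mathbf{x}'_i \times H\mathbf{x}_i = \mathbf{0}$$
- Let the rows of $H$ be $\mathbf{h}^{1T}, \mathbf{h}^{2T}, \mathbf{h}^{3T}$. Then $H\mathbf{x}_i = (\mathbf{h}^{1T}\mathbf{x}_i,\ \mathbf{h}^{2T}\mathbf{x}_i,\ \mathbf{h}^{3T}\mathbf{x}_i)^T$.
- Writing out the cross product $\mathbf{x}'_i \times H\mathbf{x}_i = \mathbf{0}$, we get three equations:
$$\begin{aligned} y'_i\,\mathbf{h}^{3T}\mathbf{x}_i - w'_i\,\mathbf{h}^{2T}\mathbf{x}_i &= 0 \\ w'_i\,\mathbf{h}^{1T}\mathbf{x}_i - x'_i\,\mathbf{h}^{3T}\mathbf{x}_i &= 0 \\ x'_i\,\mathbf{h}^{2T}\mathbf{x}_i - y'_i\,\mathbf{h}^{1T}\mathbf{x}_i &= 0 \end{aligned}$$
- The third equation is linearly dependent on the first two, so we only use the first two.
- Let $\mathbf{h} = (\mathbf{h}^{1T}, \mathbf{h}^{2T}, \mathbf{h}^{3T})^T$ be the $9 \times 1$ column vector containing the elements of $H$. We can rewrite the two equations as $A_i\mathbf{h} = \mathbf{0}$, where $A_i$ is a $2 \times 9$ matrix:
$$A_i = \begin{pmatrix} \mathbf{0}^T & -w'_i\,\mathbf{x}_i^T & y'_i\,\mathbf{x}_i^T \\ w'_i\,\mathbf{x}_i^T & \mathbf{0}^T & -x'_i\,\mathbf{x}_i^T \end{pmatrix}$$
- For $n \geq 4$ points, we stack the $A_i$ matrices to form a $2n \times 9$ matrix $A$. The solution for $\mathbf{h}$ is the null space of $A$, typically found via Singular Value Decomposition (SVD), taking the right singular vector associated with the smallest singular value.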
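The derivation translates almost line for line into NumPy. The sketch below builds the two independent rows per correspondence from homogeneous 3-vectors and takes the last right singular vector; the names `dlt_rows` and `dlt` are illustrative, and coordinate normalisation (usually recommended for noisy data) is omitted for brevity.

```python
import numpy as np

def dlt_rows(x, xp):
    """Two independent rows A_i for one correspondence x <-> x' (homogeneous 3-vectors)."""
    x = np.asarray(x, dtype=float)
    xp_, yp_, wp_ = xp
    zero = np.zeros(3)
    return np.array([
        np.concatenate([zero, -wp_ * x, yp_ * x]),    # y' h3.x - w' h2.x = 0
        np.concatenate([wp_ * x, zero, -xp_ * x]),    # w' h1.x - x' h3.x = 0
    ])

def dlt(xs, xps):
    """Estimate H (up to scale) from n >= 4 homogeneous correspondences."""
    A = np.vstack([dlt_rows(x, xp) for x, xp in zip(xs, xps)])   # shape (2n, 9)
    _, _, vt = np.linalg.svd(A)
    # The right singular vector of the smallest singular value spans the null space.
    return vt[-1].reshape(3, 3)
```

Because the result is only defined up to scale (and sign), it is usually normalised afterwards, for example by dividing by its bottom-right entry when that entry is non-zero.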
What are radial and tangential distortions in cameras with lenses? How are they mathematically modeled?
Lens Distortions:
Unlike the ideal pinhole camera, real lenses introduce optical aberrations causing non-linear distortions in the image.
- Radial Distortion: Caused by the spherical shape of the lens, which bends light rays more near the edges of the lens than at the optical center.
- Barrel distortion: Straight lines curve outwards (common in wide-angle/fisheye lenses).
- Pincushion distortion: Straight lines curve inwards (common in telephoto lenses).
- Mathematical Model: Described using a Taylor series expansion around $r = 0$ (where $r = \sqrt{x^2 + y^2}$ is the distance from the optical center). The corrected coordinates are:
$$x_{\text{corrected}} = x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6), \qquad y_{\text{corrected}} = y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$
where $k_1, k_2, k_3$ are radial distortion coefficients.
- Tangential Distortion: Occurs when the lens assembly is not perfectly parallel to the image sensor plane, often due to manufacturing defects. It makes some areas of the image look closer than others.
- Mathematical Model:
$$x_{\text{corrected}} = x + \left[2 p_1 x y + p_2 (r^2 + 2x^2)\right], \qquad y_{\text{corrected}} = y + \left[p_1 (r^2 + 2y^2) + 2 p_2 x y\right]$$
where $p_1, p_2$ are tangential distortion coefficients.
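The two polynomial models are often applied together to normalised image coordinates. The sketch below combines them in one function, assuming coefficients named k1, k2, k3 (radial) and p1, p2 (tangential); the function name and the coefficient values in the example are illustrative assumptions.

```python
def apply_distortion_model(x, y, k=(0.0, 0.0, 0.0), p=(0.0, 0.0)):
    """Evaluate the radial + tangential polynomial model at normalised coordinates (x, y)."""
    k1, k2, k3 = k
    p1, p2 = p
    r2 = x * x + y * y                                   # squared distance from the centre
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_tan = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_tan = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x * radial + x_tan, y * radial + y_tan


# The optical centre is a fixed point of the model, whatever the coefficients.
centre = apply_distortion_model(0.0, 0.0, k=(-0.2, 0.0, 0.0), p=(0.01, 0.02))

# A negative k1 pulls an off-centre point towards the centre.
edge = apply_distortion_model(0.5, 0.0, k=(-0.2, 0.0, 0.0))
```

The effect grows with the distance from the centre because every term carries at least one factor of r squared or a product of coordinates.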
Define Conics in 2D projective space. How do conics and dual conics transform under a homography $H$?
Conics in Projective Space:
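A conic (ellipse, parabola, hyperbola) in $\mathbb{P}^2$ is defined as the locus of points satisfying a second-degree equation. In homogeneous coordinates, this is represented by a symmetric $3 \times 3$ matrix $C$:
$$\mathbf{x}^T C\,\mathbf{x} = 0$$
The matrix $C$ has 6 distinct elements, but being defined up to a scale factor, it has 5 degrees of freedom. Five points in general position in a plane uniquely define a conic.
Transformation of Conics:
If points undergo a homography transformation $\mathbf{x}' = H\mathbf{x}$ (which implies $\mathbf{x} = H^{-1}\mathbf{x}'$), the equation of the conic becomes:
$$\mathbf{x}^T C\,\mathbf{x} = \mathbf{x}'^T H^{-T} C\,H^{-1} \mathbf{x}' = 0$$
Thus, the conic matrix transforms as:
$$C' = H^{-T} C\,H^{-1}$$
Dual Conics (Line Conics):
A dual conic $C^*$ is the envelope of lines tangent to the conic $C$. Its equation is $\mathbf{l}^T C^* \mathbf{l} = 0$. For a non-degenerate conic, $C^* = C^{-1}$ (up to scale).
The dual conic transforms as:
$$C^{*\prime} = H\,C^*\,H^T$$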
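Both transformation rules can be checked numerically. The sketch below maps the unit circle through an arbitrary example homography and confirms that mapped points satisfy the transformed point conic and that mapped tangent lines satisfy the transformed dual conic. The matrix values are assumptions chosen only to be non-singular.

```python
import numpy as np

C = np.diag([1.0, 1.0, -1.0])                   # unit circle x^2 + y^2 - 1 = 0
H = np.array([[2.0, 0.3, 1.0],
              [0.1, 1.5, -2.0],
              [0.05, 0.02, 1.0]])
H_inv = np.linalg.inv(H)

C_new = H_inv.T @ C @ H_inv                     # point conic: C' = H^-T C H^-1
C_dual_new = H @ np.linalg.inv(C) @ H.T         # dual conic:  C*' = H C* H^T

theta = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
pts = np.stack([np.cos(theta), np.sin(theta), np.ones_like(theta)])   # 3 x 8, on the circle
pts_new = H @ pts                               # points map as x' = H x
lines_new = H_inv.T @ (C @ pts)                 # tangent l = C x, mapped as l' = H^-T l

# x'^T C' x' and l'^T C*' l' for every sample; all entries should be ~0.
point_residuals = np.einsum("in,ij,jn->n", pts_new, C_new, pts_new)
line_residuals = np.einsum("in,ij,jn->n", lines_new, C_dual_new, lines_new)
```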
What is the cross-ratio in projective geometry? Prove that the cross-ratio of four collinear points is invariant under projective transformations.
Cross-Ratio:
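The cross-ratio is a fundamental numerical invariant of projective geometry. Given four distinct collinear points $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4$, their cross-ratio is defined as:
$$\mathrm{Cross}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4) = \frac{d_{12}\,d_{34}}{d_{13}\,d_{24}}$$
where $d_{ij}$ denotes the signed distance between point $\mathbf{x}_i$ and point $\mathbf{x}_j$.
Proof of Projective Invariance:
- Let the four points lie on a line parameterised by a scalar parameter $\lambda$. The points can be represented in homogeneous coordinates as $\mathbf{x}_i = \mathbf{a} + \lambda_i \mathbf{b}$, where $i = 1, \dots, 4$ and $\mathbf{a}$, $\mathbf{b}$ are two fixed points on the line.
- The distance between two points is proportional to $\lambda_i - \lambda_j$ when $\mathbf{b}$ is taken as the line's ideal point (its direction). For any other choice of $\mathbf{b}$, the position along the line is a fractional-linear function of $\lambda$, and the extra factors cancel between the numerator and denominator of the cross-ratio.
- The cross ratio can be written in terms of these parameters:
$$\mathrm{Cross}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4) = \frac{(\lambda_1 - \lambda_2)(\lambda_3 - \lambda_4)}{(\lambda_1 - \lambda_3)(\lambda_2 - \lambda_4)}$$
- Let an invertible projective transformation (homography) $H$ be applied. The line maps to $\mathbf{x}'_i = H\mathbf{x}_i = H\mathbf{a} + \lambda_i H\mathbf{b}$.
- Let $\mathbf{a}' = H\mathbf{a}$ and $\mathbf{b}' = H\mathbf{b}$. The transformed points are $\mathbf{x}'_i = \mathbf{a}' + \lambda_i \mathbf{b}'$.
- Notice that the parameter $\lambda_i$ associated with each point remains unchanged under this linear mapping.
- Since the cross-ratio of the transformed points is computed using the exact same parameters $\lambda_i$, the resulting value is identical.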
Thus, the cross-ratio is invariant under homography.
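The invariance can also be observed numerically. The sketch below measures the cross-ratio from signed positions along the line, before and after an arbitrary example homography; the ordering convention (d12 d34) / (d13 d24), the helper names, and the matrix values are assumptions made for this illustration.

```python
import numpy as np

def cross_ratio(p1, p2, p3, p4):
    """Cross-ratio (d12 * d34) / (d13 * d24) of four collinear 2D points."""
    pts = np.array([p1, p2, p3, p4], dtype=float)
    direction = pts[3] - pts[0]
    direction /= np.linalg.norm(direction)
    t = pts @ direction                  # signed position of each point along the line
    return ((t[0] - t[1]) * (t[2] - t[3])) / ((t[0] - t[2]) * (t[1] - t[3]))

def apply_h(H, pt):
    x = H @ np.array([pt[0], pt[1], 1.0])
    return x[:2] / x[2]

# Four points on the line y = 2x + 1, at x = 0, 1, 3, 7.
points = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 3.0, 7.0)]

H = np.array([[1.5, 0.2, 4.0],
              [-0.3, 1.1, 2.0],
              [0.01, 0.02, 1.0]])

before = cross_ratio(*points)
after = cross_ratio(*[apply_h(H, p) for p in points])
```

Distances between the mapped points differ from the originals, yet `before` and `after` agree to floating-point precision (both equal 2/9 for these points).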
Distinguish between intrinsic and extrinsic parameters in the context of camera calibration. Provide examples of each.
Intrinsic Parameters:
- Definition: These parameters define the internal properties of the camera and its lens. They relate the camera's internal coordinate system to the idealized camera model.
- Invariance: They remain constant as long as the lens and focal length (zoom) are not altered.
- Examples:
- Focal length ($f_x, f_y$)
- Principal point / Image center ($c_x, c_y$)
- Skew factor ($s$)
- Lens distortion coefficients (radial $k_1, k_2, k_3$ and tangential $p_1, p_2$).
Extrinsic Parameters:
- Definition: These parameters specify the position and orientation of the camera relative to a known world 3D coordinate system. They map 3D world points into the camera's 3D coordinate frame.
- Invariance: They change every time the camera moves, rotates, or pans in the environment.
- Examples:
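- Rotation Matrix ($R$): A $3 \times 3$ matrix representing the roll, pitch, and yaw of the camera.
- Translation Vector ($\mathbf{t}$): A $3 \times 1$ vector giving the position of the world origin in camera coordinates. The physical location of the camera's optical center in the world frame follows from it as $\mathbf{C} = -R^T\mathbf{t}$.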
Explain the concept of camera center in the general projective camera model. How can the camera center be derived directly from the projection matrix $P$?
Camera Center (Optical Center):
In the general projective camera model, the camera center (or center of projection) is the point in 3D space where all rays of light pass through before hitting the image plane. It is the origin of the camera coordinate system.
Derivation from Matrix $P$:
Let the camera matrix be $P$, which is a $3 \times 4$ matrix. Let the camera center in homogeneous world coordinates be $\mathbf{C}$.
- By definition, the camera center is the point for which the projection is undefined (it projects to a null vector), because it sits exactly at the origin of the projection rays.
- Therefore, multiplying the projection matrix by the camera center yields the zero vector:
$$P\,\mathbf{C} = \mathbf{0}$$
- This means $\mathbf{C}$ spans the 1-dimensional right null-space of the matrix $P$.
- Write $P$ in block form as $P = [M \mid \mathbf{p}_4]$, where $M$ is a non-singular $3 \times 3$ matrix and $\mathbf{p}_4$ is the fourth column.
- Let $\mathbf{C} = (\tilde{\mathbf{C}}^T, 1)^T$. Then:
$$P\,\mathbf{C} = M\tilde{\mathbf{C}} + \mathbf{p}_4 = \mathbf{0}$$
- Thus, the non-homogeneous coordinates of the camera center can be explicitly computed as $\tilde{\mathbf{C}} = -M^{-1}\mathbf{p}_4$.
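Both routes to the camera center, the null space and the closed form, can be compared on a synthetic camera. In the sketch below the intrinsics, the rotation (30 degrees about the Y-axis), and the true center are assumed example values; the camera is built with t = -R C so that the expected answer is known in advance.

```python
import numpy as np

# Synthetic camera with a known centre (all values are assumptions for the example).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
a = np.deg2rad(30.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
C_true = np.array([1.0, 2.0, 3.0])
t = -R @ C_true                                  # translation implied by the centre
P = K @ np.hstack([R, t.reshape(3, 1)])          # 3x4 projection matrix

# Route 1: right null space of P via SVD, then dehomogenise.
_, _, vt = np.linalg.svd(P)
C_h = vt[-1]
C_svd = C_h[:3] / C_h[3]

# Route 2: closed form from the block decomposition P = [M | p4].
M, p4 = P[:, :3], P[:, 3]
C_closed = -np.linalg.solve(M, p4)               # equals -M^{-1} p4 without forming the inverse
```

The closed form requires `M` to be non-singular, which holds for a finite camera; the SVD route also covers cameras whose center lies at infinity, where the last homogeneous coordinate is zero and the division must be skipped.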