Unit3 - Subjective Questions
INT345 • Practice Questions with Detailed Answers
Explain the fundamental concepts of Epipolar Geometry. Define epipole, epipolar line, and epipolar plane.
Epipolar Geometry is the intrinsic projective geometry between two views. It is independent of scene structure and depends only on the cameras' internal parameters and relative pose.
Key definitions:
- Epipole: The point of intersection of the line joining the camera centers (the baseline) with the image plane. Equivalently, the epipole is the image in one view of the camera center of the other view.
- Epipolar Plane: A plane containing the baseline and a 3D scene point . This plane intersects the two image planes.
- Epipolar Line: The intersection of an epipolar plane with the image plane. For a given point in one image, its corresponding point in the other image must lie on the corresponding epipolar line. This is known as the epipolar constraint, which reduces the search for corresponding points from 2D to 1D.
State and explain the epipolar constraint equation using the Fundamental Matrix.
The epipolar constraint relates a point in the first image to its corresponding point in the second image.
Mathematically, it is expressed as:
Where:
- is the homogeneous coordinate of a point in the first image .
- is the homogeneous coordinate of the corresponding point in the second image .
- is the Fundamental Matrix.
Explanation:
The term represents the epipolar line in the second image corresponding to the point in the first image (i.e., ). The condition that the point must lie on this line is given by the dot product being zero: . Substituting , we get . This constraint means that to find a match for in the second image, we only need to search along the 1D line .
Define the Fundamental Matrix. What are its key mathematical properties?
The Fundamental Matrix () is a matrix that encapsulates the epipolar geometry of two uncalibrated cameras. It relates corresponding points in stereo images via .
Key Properties:
- Rank 2: is a matrix but its determinant is zero (). This implies its rank is 2.
- Degrees of Freedom (DOF): has 7 degrees of freedom. A matrix has 9 elements, minus 1 for scale (since it's homogeneous), and minus 1 for the rank-2 constraint ().
- Point-to-line mapping: maps a point in the first image to an epipolar line in the second image: .
- Epipoles: The left and right epipoles and are the null spaces of and respectively. Thus, and .
- Transpose: If is the fundamental matrix for the camera pair , then is the fundamental matrix for the pair .
Distinguish between the Fundamental Matrix () and the Essential Matrix ().
Both matrices relate corresponding points in two views, but they operate in different spaces based on camera calibration.
-
Essential Matrix ():
- Operates on calibrated cameras (camera intrinsics and are known).
- Relates normalized image coordinates: .
- Encodes only the relative pose (rotation and translation ) between the cameras: .
- Has 5 degrees of freedom (3 for rotation, 2 for translation direction).
- Its singular values are of the form .
-
Fundamental Matrix ():
- Operates on uncalibrated cameras (intrinsics are unknown).
- Relates raw pixel coordinates: .
- Encodes both camera intrinsics and relative pose: .
- Has 7 degrees of freedom.
- Its singular values are of the form with no specific constraint on and other than being non-zero.
Describe the basic (unnormalized) 8-point algorithm for estimating the Fundamental Matrix.
The 8-point algorithm computes the Fundamental Matrix from a set of point correspondences.
Steps:
- Formulate the equation: For each corresponding pair and , the epipolar constraint is . Expanding this gives a linear equation in the 9 unknown entries of : .
- Construct matrix : Stack equations from point pairs to form an matrix . The system becomes , where is the 9-vector of 's entries.
- Solve for : Solve using Singular Value Decomposition (SVD). The solution for is the right singular vector corresponding to the smallest singular value of .
- Enforce rank-2 constraint: The estimated might not have rank 2 due to noise. Perform SVD on : . Set the smallest singular value in to 0 to get . Reconstruct the final matrix as .
Why is data normalization required before applying the 8-point algorithm? Explain the steps of the Normalized 8-point algorithm.
Need for Normalization: In the basic 8-point algorithm, pixel coordinates (e.g., ) create entries in matrix of vastly different magnitudes (e.g., can be $250,000$ while the constant term is $1$). This makes the matrix highly ill-conditioned, leading to numerical instability and poor estimates of .
Steps of Normalized 8-point algorithm:
- Translation: Compute the centroid (mean) of the points in each image. Translate the points so their centroid is at the origin .
- Scaling: Scale the translated points so that the average distance from the origin is . This means the average coordinates are roughly . Let and be the transformation matrices for image 1 and 2.
- Compute : Apply the standard 8-point algorithm to the normalized points and to find a fundamental matrix (including the rank-2 enforcement step).
- Denormalization: Transform the matrix back to the original coordinate space using the relation: .
Derive or construct the transformation matrix used for data normalization in the normalized 8-point algorithm.
The normalization transformation consists of a translation followed by a uniform scaling.
- Translation: Let the centroid of the points in the image be . The points are shifted by .
- Scaling: Compute the mean distance of the shifted points from the origin. Let this mean distance be . The scale factor is chosen such that the new mean distance is . Thus, .
The combined 2D affine transformation applied to homogeneous coordinates is:
Multiplying the matrices gives the final transformation matrix :
Explain the Algebraic Minimization Algorithm (using SVD) for solving homogeneous linear equations like in Computer Vision.
Many computer vision problems (like estimating , homography, or triangulation) reduce to solving a homogeneous system of linear equations of the form .
Because of noise, an exact solution (other than the trivial solution ) rarely exists. Instead, we seek an that minimizes the algebraic error . To avoid the trivial solution, we impose a constraint on the scale of , typically .
Algorithm using SVD:
- The problem is to minimize subject to .
- We decompose the matrix using Singular Value Decomposition: , where and are orthogonal matrices and is a diagonal matrix of singular values in descending order.
- The norm becomes . Since is orthogonal, this is equivalent to . Let .
- The constraint becomes , which means (since is orthogonal).
- We now minimize subject to .
- Since are sorted descending, the minimum is achieved by setting .
- Substituting back , the solution for is the last column of (or the right singular vector corresponding to the smallest singular value).
What is Geometric Distance Computation in the context of Fundamental Matrix estimation? Explain the Sampson error.
Geometric Distance Computation involves estimating a model by minimizing physically meaningful errors in the image plane (reprojection error), as opposed to algebraic error which minimizes a mathematical artifact. For the Fundamental matrix, the true geometric error involves finding the closest points on the epipolar lines to the measured points such that .
Because minimizing true geometric error is highly non-linear and computationally expensive, the Sampson error is used as a first-order approximation.
Sampson Error:
For a point correspondence and a given , the algebraic error is . The Sampson error scales this algebraic error based on the gradients of the constraint.
It is given by:
Where denotes the -th component of the vector .
The denominator represents the sum of squared distances from the points to their respective estimated epipolar lines. Minimizing the Sampson error yields results very close to minimizing the true geometric error but is computationally much faster.
Compare Algebraic Error minimization and Geometric Distance minimization in estimating the Fundamental Matrix.
Algebraic Error Minimization:
- Concept: Minimizes a mathematical residual, typically , where contains point coordinates and contains parameters of .
- Solution: Linear, solved directly using Singular Value Decomposition (SVD) (e.g., 8-point algorithm).
- Advantages: Very fast, easy to implement, provides a good initial estimate, guarantees a global minimum for the algebraic cost function.
- Disadvantages: The error term has no physical or geometric meaning. It is sensitive to noise and coordinate scale (unless normalized).
Geometric Distance Minimization:
- Concept: Minimizes the physical distance between measured points and estimated models (e.g., perpendicular distance from points to epipolar lines).
- Solution: Non-linear optimization (like Levenberg-Marquardt) is required.
- Advantages: Statistically optimal, physically meaningful, robust to noise, produces the most accurate estimation of .
- Disadvantages: Computationally expensive, iterative process, requires a good initial guess (often provided by the algebraic method) to avoid falling into local minima.
Explain the concept of Camera Motion (Ego-motion) and how it relates to 3D reconstruction.
Camera Motion (Ego-motion) refers to estimating a camera's position and orientation (pose) relative to a static environment as it moves through space.
Relation to 3D Reconstruction:
- To reconstruct a 3D scene from 2D images, multiple views of the scene are required. These multiple views are often obtained by a single moving camera.
- If the camera's motion (translation and rotation) between frames is known or can be estimated, we establish a baseline and epipolar geometry.
- By matching features (like points or edges) between the frames, and knowing the camera motion, we can use triangulation to determine the depth (3D position) of those features.
- Essentially, 3D reconstruction pipelines rely on two main pillars: estimating structure (3D points) and estimating motion (camera poses). Ego-motion estimation provides the rigid transformation between camera views necessary for the Linear Triangulation step to project rays into space and find their intersection.
Describe the common 2D motion models (Translation, Rigid, Affine, Projective) used in computer vision. State their degrees of freedom.
Motion models describe how an image patch or the entire image transforms from one frame to the next.
- Translation (2 DOF):
- Models simple displacement along X and Y axes.
- Equation: , .
- Preserves orientation, length, and angles.
- Rigid (Euclidean) (3 DOF):
- Models translation plus 2D rotation.
- Equation:
- Preserves distances and angles (shape).
- Affine (6 DOF):
- Models translation, rotation, scaling, and shear.
- Represented by a matrix.
- Preserves parallelism of lines, but not lengths or angles.
- Projective (Homography) (8 DOF):
- Models the transformation of a plane under perspective projection.
- Represented by a matrix (with scale ambiguity).
- Preserves straight lines, but parallelism, lengths, and angles can be distorted.
Define Optical Flow and derive the Optical Flow constraint equation (Brightness Constancy Equation).
Optical Flow is the pattern of apparent motion of image objects between two consecutive frames caused by the movement of the object or camera. It is a 2D vector field indicating the displacement of pixels.
Derivation of the Constraint Equation:
Assumption: The intensity (brightness) of a pixel representing a moving object remains constant between consecutive frames (Brightness Constancy).
Let be the image intensity at pixel and time . After a small time interval , the pixel moves by and .
Expanding the right side using a first-order Taylor series:
Equating the two and canceling :
Dividing by gives the Optical Flow constraint equation:
Where are spatial image gradients, is the temporal gradient, and are the optical flow vectors.
Explain the Lucas-Kanade method for computing optical flow. What assumptions does it make?
The Lucas-Kanade method is a widely used differential method for estimating optical flow.
The Problem: The optical flow constraint equation has two unknowns ( and ) for every pixel, making it ill-posed (this is the aperture problem).
Lucas-Kanade Assumptions:
- Brightness Constancy: Pixel intensity doesn't change between frames.
- Small Motion: Points do not move very far.
- Spatial Coherence (Local Window): Neighboring pixels belonging to the same surface have similar motion. The method assumes that the flow vector is constant in a small local window (e.g., or ) around the pixel of interest.
Method:
For a window of pixels, we get equations:
for .
This forms an overdetermined linear system , where contains the spatial gradients, , and contains temporal gradients.
The system is solved using the least squares method:
This requires the matrix (the structure tensor) to be invertible, which means the window must contain texture/gradients in at least two directions (like a corner).
Discuss the Aperture Problem in optical flow and how local methods attempt to solve it.
The Aperture Problem:
When viewing a moving edge through a small aperture (or local window), it is impossible to determine the true motion of the object. Only the component of motion perpendicular (normal) to the edge can be observed. Any motion parallel to the edge leaves the appearance unchanged. Mathematically, this corresponds to the optical flow constraint equation being one equation with two unknowns, meaning there are infinitely many solutions for along a line.
How local methods solve it:
Methods like Lucas-Kanade solve the aperture problem by assuming spatial coherence. They assume that a small local neighborhood (window) of pixels moves with the exact same velocity .
By taking multiple pixels in the window, they generate a system of multiple equations for the two unknowns. As long as the window contains edges oriented in different directions (e.g., a corner or a textured patch), the intersection of these constraint lines uniquely determines the true flow vector, effectively overcoming the aperture problem.
How does the Horn-Schunck method differ from the Lucas-Kanade method in calculating optical flow?
Both are differential methods to calculate optical flow, but they use different assumptions to overcome the aperture problem.
-
Lucas-Kanade (Local Method):
- Assumption: Flow is constant in a small local neighborhood (spatial coherence).
- Approach: Solves an overdetermined least-squares problem for each window independently.
- Pros: Robust to noise, fast to compute.
- Cons: Cannot estimate flow in completely flat or 1D edge regions (matrix becomes singular); produces a sparse flow field unless interpolation is used.
-
Horn-Schunck (Global Method):
- Assumption: Flow is smooth over the entire image. Neighboring pixels should have similar velocities.
- Approach: Formulates a global energy minimization problem containing a data term (brightness constancy) and a smoothness term (magnitude of flow gradients).
- Pros: Yields a dense flow field (a vector for every pixel), handles textureless regions by interpolating from boundaries.
- Cons: Computationally expensive (requires iterative solution over the whole image), sensitive to noise, and struggles at object boundaries where flow is actually discontinuous.
Explain the Linear Triangulation method for 3D reconstruction. Provide the underlying mathematical formulation.
Linear Triangulation is a method used to find the 3D position of a point in space given its 2D projections and in two images, and the known camera projection matrices and .
Mathematical Formulation:
The fundamental projection equation is and . Because these are homogeneous coordinates, and represent the same point up to a scale factor. This means they are collinear vectors in 3D projective space, so their cross product is zero:
For the first camera, let and be the -th row of .
The cross product yields three equations, two of which are linearly independent:
Applying the same logic to the second camera and yields two more equations. These four equations can be stacked into a linear system , where is the unknown 3D homogeneous coordinate. Solving this system gives the 3D point.
Detail the steps to solve the linear triangulation problem using the Direct Linear Transformation (DLT) approach.
Steps to solve DLT for Triangulation:
- Given 2D points and , and camera matrices and . Let be the rows of and be the rows of .
- Construct the matrix from the cross-product equations and :
Here, is a matrix and the system is . - Due to noise, an exact intersection of rays rarely exists, so we minimize the algebraic error subject to .
- Apply Singular Value Decomposition (SVD) on matrix : .
- The solution for is the unit right singular vector corresponding to the smallest singular value of . This is the last column of matrix .
- The result is a homogeneous 4D vector . Convert it to standard 3D coordinates by dividing by the last component: .
Why is the linear triangulation method sometimes suboptimal? Distinguish between algebraic and geometric errors in triangulation.
Linear triangulation using DLT is sometimes suboptimal because it minimizes an algebraic error rather than a meaningful geometric error.
- Algebraic Error: In linear triangulation, we solve by minimizing . The entries of depend on image coordinates and camera matrices. This algebraic residual lacks physical meaning. Furthermore, it treats coordinates unequally (e.g., points near the center of the image have different weight contributions compared to points at the edges), which makes it sensitive to projective transformations and noise.
- Geometric Error: The optimal approach minimizes the reprojection error. This means finding a 3D point that projects to points and on the images such that the sum of squared distances is minimized.
Because the reprojection error function is highly non-linear, linear triangulation is typically used to get a fast initial estimate, which is then refined using non-linear optimization (like Bundle Adjustment or Levenberg-Marquardt) to minimize the true geometric error.
Summarize the complete pipeline for Sparse 3D Reconstruction from two stereo images.
The pipeline for sparse 3D reconstruction from a stereo pair involves several sequential steps:
- Feature Detection and Matching: Extract salient features (like SIFT, SURF, ORB) from both images. Match these features to obtain a set of 2D point correspondences .
- Robust Estimation of Epipolar Geometry: Use an algorithm like RANSAC with the Normalized 8-point algorithm to estimate the Fundamental Matrix (). RANSAC helps to reject outliers (incorrect matches).
- Compute Essential Matrix: If intrinsic parameters are known (calibrated cameras), compute the Essential Matrix .
- Camera Pose Estimation (Motion): Decompose the Essential Matrix using SVD to extract the relative camera motion: Rotation and Translation . There are four possible solutions; the correct one is identified by checking which configuration places the 3D points in front of both cameras (Cheirality check).
- Triangulation (Structure): With known camera projection matrices ( and ) and the 2D matches, use Linear Triangulation to compute the 3D coordinates of the points.
- Refinement (Optional but recommended): Apply non-linear optimization (Bundle Adjustment) to jointly refine the 3D point coordinates and camera parameters by minimizing the geometric reprojection error.