Unit 3 - Notes

INT345

Unit 3: Stereo Geometry, Camera motion and 3D Reconstruction

1. Epipolar Geometry

Epipolar geometry is the intrinsic projective geometry between two views. It is independent of scene structure and depends only on the cameras' internal parameters and relative pose (rotation and translation). Its main practical consequence is that the search for the correspondence of a point is reduced from the whole 2D image to a 1D line.

Key Concepts and Terminology

Baseline: The line segment connecting the optical centers of the two cameras ( $O_1$ and $O_2$ ).
Epipole ( $e_1, e_2$ ): The point where the baseline (extended if necessary) intersects each image plane, $e_1$ in image 1 and $e_2$ in image 2. Equivalently, each epipole is the image of the other camera's optical center.
Epipolar Plane: The plane formed by a 3D point $X$ and the two optical centers $O_1$ and $O_2$ .
Epipolar Line: The intersection of the epipolar plane with the image plane. All epipolar lines in an image intersect at the epipole.
Epipolar Constraint: If a point $x$ in the first image corresponds to a 3D point $X$ , the corresponding point $x'$ in the second image must lie on the epipolar line corresponding to $x$ .
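These definitions can be checked numerically without any matrix machinery. The following is a minimal NumPy sketch under stated assumptions: the intrinsics `K`, rotation `R`, translation `t` and the scene points are invented for illustration, and `project` is a hypothetical helper. Each epipole is computed as the image of the other camera's center, and two scene points at different depths on the same viewing ray of camera 1 are shown to land on one epipolar line in image 2.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X (shape (3,)) with a 3x4 camera matrix P; returns (x, y, 1)."""
    x = P @ np.append(X, 1.0)
    return x / x[2]

# Hypothetical rig: camera 1 at the origin, camera 2 translated and rotated 10 degrees about y.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
th = np.deg2rad(10)
R = np.array([[np.cos(th), 0.0, np.sin(th)],
              [0.0, 1.0, 0.0],
              [-np.sin(th), 0.0, np.cos(th)]])
t = np.array([-1.0, 0.0, 0.2])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t[:, None]])

O1 = np.zeros(3)          # optical center of camera 1
O2 = -R.T @ t             # optical center of camera 2 (solves R @ O2 + t = 0)
e1 = project(P1, O2)      # epipole in image 1: image of O2
e2 = project(P2, O1)      # epipole in image 2: image of O1

# Two scene points on the same viewing ray of camera 1 (both project to the same pixel x).
X_a = np.array([0.3, -0.2, 4.0])
X_b = 2.5 * X_a
x2_a, x2_b = project(P2, X_a), project(P2, X_b)

l2 = np.cross(e2, x2_a)                           # epipolar line through e2 and x2_a
dist = abs(l2 @ x2_b) / np.linalg.norm(l2[:2])    # pixel distance of x2_b from that line
print(dist)                                       # ~0 up to rounding
```

Any depth along the ray gives a point on the same line, which is exactly why the correspondence search becomes one-dimensional.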

2. Fundamental Matrix ( $F$ )

The Fundamental Matrix is a $3 \times 3$ matrix of rank 2 that algebraically represents epipolar geometry. It maps a point in one image to its corresponding epipolar line in the other image.

Mathematical Definition

For any pair of corresponding points $x$ (in image 1) and $x'$ (in image 2) expressed in homogeneous coordinates, the fundamental matrix $F$ satisfies the equation:
$x'^T F x = 0$

Properties of the Fundamental Matrix

Size and Rank: It is a $3 \times 3$ matrix with rank 2. The determinant is zero ( $\det(F) = 0$ ).
Degrees of Freedom: It has 7 degrees of freedom (9 elements - 1 for scale - 1 for $\det(F) = 0$ ).
Mapping to Lines: $l' = Fx$ is the epipolar line in the second image corresponding to point $x$ in the first image. Similarly, $l = F^T x'$ is the epipolar line in the first image.
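Epipoles: The epipoles are the null vectors of $F$ . The epipole $e$ in the first image ( $e_1$ above) is the right null vector, $F e = 0$ , and the epipole $e'$ in the second image ( $e_2$ ) is the left null vector, $F^T e' = 0$ .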
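The properties above can be verified on a synthetic camera pair. This sketch assumes NumPy and cameras $P = K[I \mid 0]$, $P' = K'[R \mid t]$, for which the standard closed form is $F = K'^{-T} [t]_\times R K^{-1}$ (a textbook result not derived in these notes). The rig and scene point are hypothetical, and `epipoles_from_F` assumes finite epipoles (nonzero third coordinate).

```python
import numpy as np

def skew(v):
    """[v]_x, the matrix with skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def fundamental_from_cameras(K1, K2, R, t):
    """F for P = K1[I|0] and P' = K2[R|t]: F = K2^{-T} [t]_x R K1^{-1}."""
    return np.linalg.inv(K2).T @ skew(t) @ R @ np.linalg.inv(K1)

def epipoles_from_F(F):
    """e with F e = 0 (image 1) and e' with F^T e' = 0 (image 2), scaled so w = 1."""
    U, _, Vt = np.linalg.svd(F)
    e, e_prime = Vt[-1], U[:, -1]     # right / left singular vectors of the zero singular value
    return e / e[2], e_prime / e_prime[2]

# Hypothetical rig.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
th = np.deg2rad(10)
R = np.array([[np.cos(th), 0.0, np.sin(th)], [0.0, 1.0, 0.0], [-np.sin(th), 0.0, np.cos(th)]])
t = np.array([-1.0, 0.0, 0.2])
F = fundamental_from_cameras(K, K, R, t)
F /= np.linalg.norm(F)                 # F is only defined up to scale

X = np.array([0.3, -0.2, 4.0])         # hypothetical scene point
x1 = K @ X
x1 /= x1[2]
x2 = K @ (R @ X + t)
x2 /= x2[2]
l2 = F @ x1                            # epipolar line of x1 in image 2 (l' = F x)
e1, e2 = epipoles_from_F(F)
print(x2 @ F @ x1, e2 @ l2)            # both ~0 up to rounding
```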

3. Normalized 8-Point Algorithm

The 8-point algorithm is a linear method to compute the Fundamental Matrix from 8 or more point correspondences. The normalized version is crucial: with raw pixel coordinates (values in the hundreds), the columns of the matrix $A$ built below differ in magnitude by several orders, so $A$ is badly ill-conditioned and the estimate becomes highly sensitive to noise.

Steps of the Normalized 8-Point Algorithm

Normalization:
- Translate the points in both images so that their centroid is at the origin $(0,0)$ .
- Scale the points so that the average distance from the origin is $\sqrt{2}$ .
- Apply these transformations via matrices $T$ and $T'$ such that $\hat{x} = Tx$ and $\hat{x}' = T'x'$ .
Build the Matrix $A$ :
- Using the constraint $\hat{x}'^T \hat{F} \hat{x} = 0$ , construct a linear system $Af = 0$ , where $f$ is the $9 \times 1$ vector containing the entries of $\hat{F}$ .
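- Each point correspondence contributes one row to $A$ : $[x'_i x_i, x'_i y_i, x'_i, y'_i x_i, y'_i y_i, y'_i, x_i, y_i, 1]$ , where $(x_i, y_i)$ and $(x'_i, y'_i)$ are the normalized coordinates of the $i$ -th pair and $f$ lists the entries of $\hat{F}$ row by row.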
Solve for $\hat{F}$ (Linear Solution):
- Find the vector $f$ that minimizes $\|Af\|$ subject to $\|f\| = 1$ .
- The solution is the right singular vector of $A$ corresponding to the smallest singular value (computed via Singular Value Decomposition - SVD).
Enforce Rank-2 Constraint:
- Because of noise, the computed $\hat{F}$ will in general have full rank (rank 3).
- Perform SVD on $\hat{F} = U \Sigma V^T$ .
- Set the smallest singular value in $\Sigma$ to 0 to get $\Sigma'$ .
- Reconstruct the rank-2 matrix: $\hat{F}_{rank2} = U \Sigma' V^T$ .
Denormalization:
- Transform the fundamental matrix back to the original image coordinate space: $F = T'^T \hat{F}_{rank2} T$ .
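The steps above translate into a short NumPy sketch. The function names are hypothetical, and the check uses synthetic, noise-free correspondences from an invented camera pair so that the estimate can be compared with the true $F = K^{-T}[t]_\times R K^{-1}$ (up to sign and scale).

```python
import numpy as np

def normalize_points(pts):
    """Similarity T moving the centroid of (N, 2) points to the origin, mean distance sqrt(2)."""
    centroid = pts.mean(axis=0)
    scale = np.sqrt(2) / np.linalg.norm(pts - centroid, axis=1).mean()
    T = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return pts_h @ T.T, T                  # normalized homogeneous points, and T

def eight_point(x1, x2):
    """Normalized 8-point estimate of F (x2^T F x1 = 0) from (N, 2) pixel arrays, N >= 8."""
    p1, T1 = normalize_points(x1)
    p2, T2 = normalize_points(x2)
    # One row per correspondence: [x'x, x'y, x', y'x, y'y, y', x, y, 1] (F stored row by row).
    A = np.column_stack([p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
                         p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
                         p1[:, 0], p1[:, 1], np.ones(len(p1))])
    # min ||A f|| s.t. ||f|| = 1: right singular vector of the smallest singular value.
    F_hat = np.linalg.svd(A)[2][-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value of F_hat.
    U, S, Vt = np.linalg.svd(F_hat)
    F_hat = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    # Denormalize: F = T'^T F_hat T.
    F = T2.T @ F_hat @ T1
    return F / np.linalg.norm(F)

# Hypothetical noise-free check with an invented pair P1 = K[I|0], P2 = K[R|t].
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
th = np.deg2rad(10)
R = np.array([[np.cos(th), 0.0, np.sin(th)], [0.0, 1.0, 0.0], [-np.sin(th), 0.0, np.cos(th)]])
t = np.array([-1.0, 0.0, 0.2])
rng = np.random.default_rng(0)
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(20, 3))
h1 = X @ K.T                     # homogeneous pixels in image 1
h2 = (X @ R.T + t) @ K.T         # homogeneous pixels in image 2
x1, x2 = h1[:, :2] / h1[:, 2:], h2[:, :2] / h2[:, 2:]

F_est = eight_point(x1, x2)
tx = np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])
F_true = np.linalg.inv(K).T @ tx @ R @ np.linalg.inv(K)
F_true /= np.linalg.norm(F_true)
F_est *= np.sign(np.sum(F_est * F_true))   # align the arbitrary sign
print(np.abs(F_est - F_true).max())
```

With real, noisy matches the recovered $F$ will no longer match exactly, which is what the geometric refinement in the next section addresses.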

4. Algebraic Minimization Algorithm & Geometric Distance Computation

While the 8-point algorithm gives a closed-form linear solution, the algebraic error it minimizes has no direct geometric meaning, so the result is not optimal in the presence of image noise. Iterative minimization algorithms are therefore used to refine $F$ by minimizing a geometric error.

Algebraic vs. Geometric Error

Algebraic Error: Minimizes $\|Af\|$ . It has no physical meaning and is just a mathematical convenience for linear solvers.
Geometric Distance (Error): Measures the distance between a point and its estimated epipolar line, or the distance between observed points and perfectly consistent projected points.

Geometric Distance Computation

Sampson Error: A first-order approximation to the geometric (reprojection) error. It is much cheaper to evaluate than the full reprojection error, because no corrected points have to be estimated, and in practice it gives very similar results.
$E_{Sampson} = \sum_i \frac{(x_i'^T F x_i)^2}{(Fx_i)_1^2 + (Fx_i)_2^2 + (F^T x_i')_1^2 + (F^T x_i')_2^2}$
where $(Fx)_j$ represents the $j$ -th component of the vector.
Reprojection Error (Gold Standard): Finds the perfectly consistent corresponding points $\hat{x}_i, \hat{x}'_i$ that exactly satisfy $\hat{x}_i'^T F \hat{x}_i = 0$ while being as close as possible to the measured points $x_i, x'_i$ .
$E_{Reproj} = \sum_i (d(x_i, \hat{x}_i)^2 + d(x'_i, \hat{x}'_i)^2)$
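A direct transcription of $E_{Sampson}$ , assuming NumPy and $(N, 2)$ arrays of pixel coordinates (the function name is hypothetical). The check uses the fundamental matrix of a rectified pair (pure horizontal translation, identical intrinsics), for which $x'^T F x = y - y'$ . Because that constraint is linear in the coordinates, the first-order approximation is exact here, and the Sampson error equals the gold-standard error $(y - y')^2 / 2$ obtained by moving both points halfway to a common row.

```python
import numpy as np

def sampson_error(F, x1, x2):
    """E_Sampson for (N, 2) pixel arrays x1 (image 1) and x2 (image 2) under x2^T F x1 = 0."""
    x1h = np.column_stack([x1, np.ones(len(x1))])
    x2h = np.column_stack([x2, np.ones(len(x2))])
    Fx1 = x1h @ F.T                        # row i is F x_i
    Ftx2 = x2h @ F                         # row i is F^T x'_i
    num = np.sum(x2h * Fx1, axis=1) ** 2   # (x'_i^T F x_i)^2
    den = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return np.sum(num / den)

# Rectified pair: x'^T F x = y - y', so matching points should lie on the same row.
F_rect = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
x1 = np.array([[0.0, 0.0], [10.0, 3.0]])
x2 = np.array([[5.0, 1.0], [2.0, 3.0]])   # first match is off by one row, second is exact
print(sampson_error(F_rect, x1, x2))      # 0.5 = (0 - 1)^2 / 2
```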

Iterative Minimization Algorithms

To minimize these nonlinear, non-convex geometric cost functions, iterative optimization algorithms are used:

Levenberg-Marquardt (LM) Algorithm: The standard technique for nonlinear least-squares problems. A damping parameter lets it interpolate between the Gauss-Newton algorithm (small damping) and gradient descent (large damping).
Process: Start with an initial guess of $F$ (usually from the normalized 8-point algorithm), parameterize $F$ (often taking care of the rank-2 constraint), and iteratively update the parameters to minimize the Sampson or Reprojection error.
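A deliberately small LM sketch, assuming NumPy: it refines the 9 entries of $F$ to reduce the Sampson error, using a forward-difference Jacobian and the simple damping $J^T J + \lambda I$ . Unlike the constrained parameterizations mentioned above, it does not enforce rank 2 during the iterations (a final SVD projection, as in the 8-point algorithm, would still be needed), and it assumes coordinates that are already well scaled, here normalized camera coordinates with $K = I$ . All names, the data, the noise level and the perturbed starting point are hypothetical.

```python
import numpy as np

def sampson_residuals(f, x1h, x2h):
    """Signed Sampson residual per match for F = f.reshape(3, 3); the squared sum is E_Sampson."""
    F = f.reshape(3, 3)
    Fx1 = x1h @ F.T                       # row i is F x_i
    Ftx2 = x2h @ F                        # row i is F^T x'_i
    algebraic = np.sum(x2h * Fx1, axis=1)
    den = Fx1[:, 0] ** 2 + Fx1[:, 1] ** 2 + Ftx2[:, 0] ** 2 + Ftx2[:, 1] ** 2
    return algebraic / np.sqrt(den)

def levenberg_marquardt(residual_fn, p0, n_iter=50, lam=1e-3, h=1e-7):
    """Minimal LM: forward-difference Jacobian, damped normal equations (J^T J + lam I) dp = -J^T r."""
    p = np.asarray(p0, dtype=float).copy()
    r = residual_fn(p)
    cost = r @ r
    for _ in range(n_iter):
        J = np.empty((r.size, p.size))
        for k in range(p.size):
            dp = np.zeros_like(p)
            dp[k] = h
            J[:, k] = (residual_fn(p + dp) - r) / h
        step = np.linalg.solve(J.T @ J + lam * np.eye(p.size), -J.T @ r)
        r_new = residual_fn(p + step)
        if r_new @ r_new < cost:          # accepted: behave more like Gauss-Newton
            p = (p + step) / np.linalg.norm(p + step)   # fix the free scale of F
            r = residual_fn(p)
            cost = r @ r
            lam /= 10.0
        else:                             # rejected: behave more like gradient descent
            lam *= 10.0
    return p

# Hypothetical data in normalized camera coordinates (K = I), so all entries of F are O(1).
rng = np.random.default_rng(1)
th = np.deg2rad(10)
R = np.array([[np.cos(th), 0.0, np.sin(th)], [0.0, 1.0, 0.0], [-np.sin(th), 0.0, np.cos(th)]])
t = np.array([-1.0, 0.0, 0.2])
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(30, 3))
X2 = X @ R.T + t
x1h = np.column_stack([X[:, :2] / X[:, 2:] + 1e-3 * rng.standard_normal((30, 2)), np.ones(30)])
x2h = np.column_stack([X2[:, :2] / X2[:, 2:] + 1e-3 * rng.standard_normal((30, 2)), np.ones(30)])

E = np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]]) @ R   # [t]_x R
f0 = E.ravel() + 0.05 * rng.standard_normal(9)   # deliberately perturbed starting guess
f0 /= np.linalg.norm(f0)

def residuals(f):
    return sampson_residuals(f, x1h, x2h)

f_refined = levenberg_marquardt(residuals, f0)
cost_before = np.sum(residuals(f0) ** 2)
cost_after = np.sum(residuals(f_refined) ** 2)
print(cost_before, cost_after)
```

Because a step is only accepted when it lowers the cost, the refined Sampson error can never exceed the starting one.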

5. Camera Motion and Motion Models

Camera motion describes how a camera moves through a 3D scene, which directly induces apparent motion (optical flow) of objects in the 2D image plane.

Parametric Motion Models

To analyze motion between two frames, we map coordinates $(x,y)$ in frame 1 to $(x',y')$ in frame 2 using motion models of varying complexity:

Translational Model (2 DOF): Assumes pure 2D translation in the image plane.
- $x' = x + t_x$
- $y' = y + t_y$
Euclidean / Rigid Model (3 DOF): Translation plus 2D rotation.
- $\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}$
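Similarity Model (4 DOF): Rigid motion plus uniform scaling by a factor $s$ .
- $\begin{bmatrix} x' \\ y' \end{bmatrix} = s \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}$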
Affine Model (6 DOF): Accounts for translation, rotation, scaling (possibly different along each axis), and skew. Preserves parallelism.
- $\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}$
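Projective / Homography Model (8 DOF): A general projective transformation of the plane, relating two perspective images of a planar surface. Preserves straight lines but not parallelism. It describes the image motion exactly when the camera purely rotates about its optical center or observes a planar scene. Of its nine entries $h_{ij}$ , one is fixed by the overall scale (here $h_{33} = 1$ ), leaving 8 DOF.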
- $x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + 1}$ , $y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + 1}$
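Every model in the list can be written as a $3 \times 3$ matrix acting on homogeneous coordinates, with last row $[0, 0, 1]$ for all but the homography. A small NumPy sketch; the helper names (`translation`, `similarity`, `affine`, `apply_model`) and the sample homography `H` are hypothetical.

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])

def similarity(s, theta, tx, ty):
    """Uniform scale s, rotation theta, translation (tx, ty); s = 1 gives the Euclidean model."""
    c, sn = np.cos(theta), np.sin(theta)
    return np.array([[s * c, -s * sn, tx], [s * sn, s * c, ty], [0.0, 0.0, 1.0]])

def affine(a1, a2, a3, a4, tx, ty):
    return np.array([[a1, a2, tx], [a3, a4, ty], [0.0, 0.0, 1.0]])

def apply_model(M, pts):
    """Map (N, 2) points through a 3x3 model; the division only matters for homographies."""
    ph = np.column_stack([pts, np.ones(len(pts))]) @ M.T
    return ph[:, :2] / ph[:, 2:3]

H = np.array([[1.0, 0.2, 3.0], [0.1, 1.0, -2.0], [0.001, 0.002, 1.0]])  # hypothetical, h33 = 1
pts = np.array([[10.0, 20.0], [1.0, 0.0]])
print(apply_model(similarity(2.0, np.pi / 2, 1.0, 0.0), pts))
print(apply_model(H, pts))
```

For the homography, the division by the third coordinate reproduces exactly the fractions $x' = (h_{11}x + h_{12}y + h_{13}) / (h_{31}x + h_{32}y + 1)$ and $y' = (h_{21}x + h_{22}y + h_{23}) / (h_{31}x + h_{32}y + 1)$ .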

6. Optical Flow

Optical flow is the pattern of apparent motion of image objects between two consecutive frames caused by the movement of the object or the camera. It is a 2D vector field $(u, v)$ indicating where every pixel moved.

Brightness Constancy Constraint Equation (BCCE)

The fundamental assumption of most optical flow methods is brightness constancy: a scene point keeps the same intensity as it moves between consecutive frames.
$I(x, y, t) = I(x + \Delta x, y + \Delta y, t + \Delta t)$
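Expanding the right-hand side in a first-order Taylor series, cancelling $I(x, y, t)$ on both sides and dividing by $\Delta t$ yields the optical flow equation, with $u = \Delta x / \Delta t$ and $v = \Delta y / \Delta t$ :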
$I_x u + I_y v + I_t = 0$
Where:

$I_x, I_y$ are the spatial image gradients.
$I_t$ is the temporal image gradient.
$u, v$ are the horizontal and vertical components of the optical flow vector (velocities, e.g. in pixels per frame).
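A quick numerical check of the linearization, assuming NumPy: a synthetic smooth image is shifted by a known sub-pixel $(u, v)$ , and $I_x u + I_y v + I_t$ is evaluated with finite-difference derivatives. The image, the shift and the grid size are illustrative choices. The residual is small compared with $I_t$ itself but not exactly zero, because the Taylor expansion drops second-order terms.

```python
import numpy as np

# Hypothetical smooth test image and a known sub-pixel shift (u, v), in pixels per frame.
u, v = 0.2, 0.1
y, x = np.mgrid[0:64, 0:64].astype(float)
I0 = np.sin(0.2 * x) + np.cos(0.15 * y)
I1 = np.sin(0.2 * (x - u)) + np.cos(0.15 * (y - v))   # frame 2: content moved by (u, v)

Iy, Ix = np.gradient(I0)   # spatial derivatives; axis 0 (rows) is y
It = I1 - I0               # temporal derivative with dt = 1 frame

inner = (slice(1, -1), slice(1, -1))   # skip the border, where np.gradient is one-sided
residual = (Ix * u + Iy * v + It)[inner]
max_residual = np.abs(residual).max()
max_It = np.abs(It[inner]).max()
print(max_residual, max_It)   # the residual is a small fraction of |I_t|
```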

The Aperture Problem

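The BCCE provides one equation per pixel with two unknowns ( $u, v$ ), so the flow cannot be determined locally without further constraints: only the flow component along the image gradient (the normal flow) is fixed, while the component along an edge cannot be observed. This is known as the aperture problem.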
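At a single pixel the BCCE determines only the component of the flow along the gradient, $-I_t \nabla I / \|\nabla I\|^2$ . The plain-Python sketch below (the helper `normal_flow` and the gradient values are hypothetical) shows that on a vertical edge, where $I_y = 0$ , three different true motions produce the same measurement and therefore the same recoverable normal flow $(1, 0)$ .

```python
def normal_flow(Ix, Iy, It):
    """Flow component along the gradient, -It * grad(I) / |grad(I)|^2 (undefined if the gradient is 0)."""
    g2 = Ix ** 2 + Iy ** 2
    return (-It * Ix / g2, -It * Iy / g2)

# Vertical edge: intensity varies only with x, so Iy = 0 (hypothetical gradient values).
Ix, Iy = 2.0, 0.0
for true_u, true_v in [(1.0, 0.0), (1.0, 5.0), (1.0, -3.0)]:
    It = -(Ix * true_u + Iy * true_v)   # the I_t this motion would produce under the BCCE
    print((true_u, true_v), normal_flow(Ix, Iy, It))
```

The vertical component of the motion never enters $I_t$ here, so no local computation can recover it.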

Algorithms to Compute Optical Flow

Lucas-Kanade Method (Local Method):
- Assumes that optical flow is constant in a small local neighborhood (e.g., a $3 \times 3$ or $5 \times 5$ window) around the pixel.
- Sets up an overdetermined system of equations for the neighborhood and solves it using least squares.
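- Works best for small motions (the first-order Taylor expansion only holds locally) and in textured regions such as corners: where the gradients in the window vanish (flat areas) or all point in one direction (edges), the $2 \times 2$ matrix of the normal equations is singular or ill-conditioned.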
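A minimal sketch of the single-window Lucas-Kanade estimate, assuming NumPy, frames stored as arrays indexed `[row, col]` = `[y, x]`, a temporal derivative taken as the plain frame difference, and a hypothetical eigenvalue threshold `min_eig` for rejecting windows without texture in two directions. Practical implementations also iterate and use image pyramids for larger motions; this one does a single linear solve.

```python
import numpy as np

def lucas_kanade_at(I0, I1, row, col, half=7, min_eig=1e-6):
    """Flow (u, v) at (row, col) from the (2*half+1)^2 window; frames indexed [y, x]."""
    Iy, Ix = np.gradient(I0)                 # spatial gradients (axis 0 is y)
    It = I1 - I0                             # temporal gradient, dt = 1 frame
    win = (slice(row - half, row + half + 1), slice(col - half, col + half + 1))
    A = np.column_stack([Ix[win].ravel(), Iy[win].ravel()])
    b = -It[win].ravel()
    ATA = A.T @ A                            # 2x2 structure tensor of the window
    # min_eig is a hypothetical threshold: reject windows without texture in two directions.
    if np.linalg.eigvalsh(ATA)[0] < min_eig:
        raise ValueError("ill-conditioned window (flat region or single edge)")
    return np.linalg.solve(ATA, A.T @ b)     # least-squares solution of A [u, v]^T = b

# Hypothetical frames: a Gaussian blob moved by (u, v) = (0.3, -0.2) pixels.
y, x = np.mgrid[0:64, 0:64].astype(float)

def blob(cx, cy):
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * 6.0 ** 2))

I0, I1 = blob(32.0, 32.0), blob(32.3, 31.8)
flow = lucas_kanade_at(I0, I1, 32, 32)
print(flow)   # approximately [0.3, -0.2]
```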
Horn-Schunck Method (Global Method):
- Introduces a global smoothness constraint. It assumes optical flow is smooth over the entire image.
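- Minimizes a global energy functional containing a data term (BCCE) and a smoothness term (penalizing large gradients in the flow field): $E = \iint \left[ (I_x u + I_y v + I_t)^2 + \alpha^2 \left( \|\nabla u\|^2 + \|\nabla v\|^2 \right) \right] dx \, dy$ , where $\alpha$ controls the strength of the smoothness term.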
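A sketch of the classic Horn-Schunck iteration, assuming NumPy. Setting the derivative of the energy to zero gives, per pixel, $u = \bar{u} - I_x (I_x \bar{u} + I_y \bar{v} + I_t) / (\alpha^2 + I_x^2 + I_y^2)$ and the analogous update for $v$ , where $\bar{u}, \bar{v}$ are local averages of the flow and $\alpha$ weights the smoothness term. For brevity the averages here use the 4 nearest neighbours rather than the weighted $3 \times 3$ kernel of the original paper; `alpha`, `n_iter` and the test frames are illustrative choices.

```python
import numpy as np

def neighbour_mean(f):
    """Average of the 4-connected neighbours, replicating values at the image border."""
    p = np.pad(f, 1, mode="edge")
    return 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])

def horn_schunck(I0, I1, alpha=1.0, n_iter=300):
    """Jacobi-style Horn-Schunck iterations; returns flow fields (u, v) the size of the frames."""
    Iy, Ix = np.gradient(I0)
    It = I1 - I0
    u = np.zeros_like(I0)
    v = np.zeros_like(I0)
    denom = alpha ** 2 + Ix ** 2 + Iy ** 2
    for _ in range(n_iter):
        u_bar, v_bar = neighbour_mean(u), neighbour_mean(v)
        # Shared correction: how much the smoothed flow violates the BCCE at each pixel.
        common = (Ix * u_bar + Iy * v_bar + It) / denom
        u = u_bar - Ix * common
        v = v_bar - Iy * common
    return u, v

# Hypothetical frames: a Gaussian blob moved by (u, v) = (0.3, -0.2) pixels, intensities 0..255.
y, x = np.mgrid[0:64, 0:64].astype(float)

def blob(cx, cy):
    return 255.0 * np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * 6.0 ** 2))

I0, I1 = blob(32.0, 32.0), blob(32.3, 31.8)
u, v = horn_schunck(I0, I1)
centre = (slice(25, 40), slice(25, 40))
print(u[centre].mean(), v[centre].mean())
```

Unlike the local method, every pixel receives a flow estimate, including the flat blob centre where the gradient vanishes; there the value is filled in from neighbours by the smoothness term.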

7. 3D Reconstruction & Linear Triangulation Method

Once the camera matrices ( $P$ and $P'$ ) and the image correspondences ( $x$ and $x'$ ) are known, 3D reconstruction involves finding the 3D position of the point $X$ . This process is called Triangulation.

Direct Linear Transformation (DLT) Triangulation

In ideal conditions, the rays back-projected from $x$ and $x'$ would intersect exactly at $X$ . Due to noise, they rarely intersect. The Linear Triangulation method finds the best approximation of $X$ algebraically.

Formulation:
- Let $x = P X$ and $x' = P' X$ . In homogeneous coordinates, these denote equality up to a scale factor.
- This collinearity can be expressed using the cross product: $x \times (PX) = 0$ and $x' \times (P'X) = 0$ .
Constructing the Equations:
Let $P_i^T$ be the $i$ -th row of camera matrix $P$ . Expanding $x \times (PX) = 0$ gives three equations, two of which are linearly independent:
$x(P_3^T X) - (P_1^T X) = 0$
$y(P_3^T X) - (P_2^T X) = 0$
Doing the same for the second image ( $x'$ and $P'$ ), we get a system of 4 linear equations for the unknown 3D point $X$ (which is a $4 \times 1$ homogeneous vector).
Matrix Form ( $A X = 0$ ):
$A = \begin{bmatrix} x P_3^T - P_1^T \\ y P_3^T - P_2^T \\ x' P_3'^T - P_1'^T \\ y' P_3'^T - P_2'^T \end{bmatrix}$
Solving via SVD:
- The system $AX = 0$ is solved by finding the unit vector $X$ that minimizes $\|AX\|$ .
- As with the 8-point algorithm, we apply SVD to $A$ ( $A = U \Sigma V^T$ ).
- The solution for $X$ is the last column of $V$ (the right singular vector corresponding to the smallest singular value).
- Convert $X$ back from homogeneous to Euclidean coordinates by dividing by its 4th component.
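The formulation above maps almost line for line onto NumPy. A sketch assuming pixel coordinates and two known $3 \times 4$ cameras; the function name, rig and 3D point are hypothetical. With noise-free projections the point is recovered up to rounding; with noisy data the result is the algebraic least-squares point, not the geometrically optimal one.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one correspondence; x1, x2 are (x, y) pixel coordinates."""
    A = np.array([x1[0] * P1[2] - P1[0],     # x  P_3^T  - P_1^T
                  x1[1] * P1[2] - P1[1],     # y  P_3^T  - P_2^T
                  x2[0] * P2[2] - P2[0],     # x' P_3'^T - P_1'^T
                  x2[1] * P2[2] - P2[1]])    # y' P_3'^T - P_2'^T
    X = np.linalg.svd(A)[2][-1]              # unit X minimizing ||A X|| (last row of V^T)
    return X[:3] / X[3]                      # homogeneous -> Euclidean

# Hypothetical rig and 3D point, projected without noise.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
th = np.deg2rad(10)
R = np.array([[np.cos(th), 0.0, np.sin(th)], [0.0, 1.0, 0.0], [-np.sin(th), 0.0, np.cos(th)]])
t = np.array([-1.0, 0.0, 0.2])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t[:, None]])
X_true = np.array([0.3, -0.2, 5.0])
h1, h2 = P1 @ np.append(X_true, 1.0), P2 @ np.append(X_true, 1.0)
X_est = triangulate_dlt(P1, P2, h1[:2] / h1[2], h2[:2] / h2[2])
print(X_est)
```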
