Given 2 images of an unknown object taken from 2 different views, can we recover the rotation and translation between the 2 views?

The answer is yes! If we can find and match 5 feature points in the 2 images, the rotation can be recovered. Translation, however, can only be recovered up to a scale factor, because depth information is lost during projection. In other words, it is not possible to tell from the object's image alone how far the object is from the camera.
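The scale ambiguity is easy to see numerically: scaling the scene and the translation by the same factor leaves every image point unchanged. A minimal numpy sketch (the rotation, translation, and point values are arbitrary choices for illustration):

```python
import numpy as np

def project(X):
    """Pinhole projection of a 3D point to normalized image coordinates."""
    return X[:2] / X[2]

# An arbitrary rotation (30 degrees about the z-axis) and translation.
c, s = np.cos(np.pi / 6), np.sin(np.pi / 6)
R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
t = np.array([0.3, -0.1, 0.5])

X = np.array([1.0, 2.0, 4.0])      # a 3D point in the first camera frame
k = 2.5                            # an arbitrary scale factor

x  = project(R @ X + t)            # image of X in the second camera
xs = project(R @ (k * X) + k * t)  # scene and translation both scaled by k

print(np.allclose(x, xs))         # True: the two projections coincide
```

Since both hypotheses produce identical images, no algorithm can distinguish them, which is why only the direction of the translation is recoverable.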

Furthermore, there are generally 10 solutions to the problem when 5 feature points are used. To uniquely recover the pose, a third view of the object or more feature points is required. Why 5 feature points? Five is the minimum number of point matches required to recover the pose (i.e. rotation and scaled translation) between two arbitrary views. Using the minimal number of points reduces the likelihood of false matches that can cause the algorithm to fail.

Using Quaternions to Represent the Pose Estimation Problem

It is known that 8 point matches admit a linear solution via the well-known Essential matrix. Existing 5-point algorithms also rely on the Essential matrix to recover the pose. However, the Essential matrix has fundamental weaknesses, and it introduces these weaknesses into any 5-point algorithm that employs it. We propose a new and practical method that eschews the Essential matrix by representing the pose estimation problem in quaternion space. Our method has several advantages:

  • When relative translation between two camera views is zero, the Essential matrix is undefined. Since our proposed algorithm does not use the Essential matrix, this problem is avoided.
  • If the 3D feature points are coplanar, there are generally two solutions for the Essential matrix. Since our proposed algorithm does not rely on the Essential matrix, planar structure degeneracy is avoided.
  • Our approach requires no secondary decomposition of the Essential matrix into rotation and translation terms; rotation and translation are estimated directly. Furthermore, the depths of the points with respect to both camera frames are simultaneously recovered.
  • The rotation is estimated in the quaternion form. In applications such as computer graphics and controls, quaternions are often the preferred representation of rotation.
  • The algorithm can be easily extended to more than five points. When there are more than six feature points available, the algorithm can be used to uniquely determine the pose difference between two camera views.
  • In simulations, our proposed algorithm shows a good resilience to noise and error in camera calibration.
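Because the rotation is returned in quaternion form, converting it to a matrix when one is needed takes only a few lines. A minimal numpy sketch of the standard conversion (using the (w, x, y, z) convention; this helper is illustrative and not part of the proposed algorithm itself):

```python
import numpy as np

def quat_to_rot(q):
    """Rotation matrix from a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q / np.linalg.norm(q)  # normalize defensively
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

# Example: a 90-degree rotation about the z-axis.
q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
R = quat_to_rot(q)
print(np.round(R, 6))  # [[0 -1 0], [1 0 0], [0 0 1]]
```

The result is always a proper rotation matrix (orthonormal, determinant +1), which is one reason quaternions are convenient as an internal representation: they cannot drift away from the rotation group the way a raw 3x3 matrix can under repeated numerical updates.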

Applications and Experimental Results

The 5-point pose estimation problem has applications in vision based robot control, also known as visual servoing. Feature points obtained from a vision sensor mounted on a robot are used in the pose estimation algorithm to estimate consecutive robot motions and provide motion feedback to control the robot.
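The feedback loop described above can be sketched as follows. This is a minimal simulation with hypothetical names: in the real system the error quaternion would be produced each iteration by the 5-point estimator from matched feature points, whereas here a known desired orientation stands in for it so the proportional update can be shown converging:

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternions a and b, in (w, x, y, z) order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_conj(q):
    """Conjugate (inverse, for unit quaternions)."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

q_desired = np.array([np.cos(0.4), 0.0, 0.0, np.sin(0.4)])  # goal orientation
q_current = np.array([1.0, 0.0, 0.0, 0.0])                  # start at identity
gain = 0.5                                                  # proportional gain

for _ in range(50):
    # In the real loop, q_err would come from the 5-point pose estimator.
    q_err = quat_mul(q_desired, quat_conj(q_current))
    # Rotate by a fraction of the error (small-angle proportional step).
    step = np.array([1.0, *(gain * q_err[1:])])
    q_current = quat_mul(step, q_current)
    q_current /= np.linalg.norm(q_current)

print(np.allclose(q_current, q_desired, atol=1e-6))
```

With a gain below one, the orientation error shrinks geometrically each iteration, mirroring how the recovered pose drives the camera toward the desired view.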

For instance, in the experiment below, given an image taken from a desired view, the pose estimation algorithm recovers the pose difference between the camera's current view and the desired view from 5 feature points. The recovered pose is then used as feedback in the control algorithm to move the camera to the desired view: