Computers face three main challenges when reconstructing 3-D structures from
pairs of stereoscopic images: 1) finding where the same scene point appears in each image
(correspondences), 2) estimating the original geometry of the cameras, and 3) rendering
the scene. Once these steps are complete, it becomes possible to recreate the original
3-D scene.
Figure 1 illustrates a hypothetical reconstruction scenario: we have two images
and correspondences between them. The two planes represent the image planes of the two
cameras, and the O’s mark their projection centers. Although the actual scene is not
pictured, the cameras are assumed to have been pointing at the building and positioned
and oriented as shown. This arrangement, which normally must be estimated but is given
in this example, is referred to as the geometry of the cameras.
In this example we are given the information needed to project points out into
3-D: the correspondences and the camera geometry. Each pair of corresponding points is
projected by intersecting two lines, one per camera, each defined by that camera's center
of projection and the corresponding point in its image. Here two pairs of correspondences
have been projected into 3-D, but the goal is to do this for every possible location in
the image.
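The intersection step can be sketched as follows. With real, noisy correspondences the two lines almost never meet exactly, so this sketch substitutes a common approximation, the midpoint method: it finds the closest point on each line and returns the point halfway between them. It assumes an ideal pinhole camera (projection center, world-to-camera rotation R, focal length f, principal point at the image origin); the helper names ray_direction, triangulate_midpoint, and IDENTITY are hypothetical.

```python
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ray_direction(image_point, R, f):
    """World-space direction of the line from the projection center through
    image point (x, y). R is orthonormal, so its inverse is its transpose."""
    x, y = image_point
    v = (x, y, f)  # the image point in camera coordinates
    return tuple(sum(R[j][i] * v[j] for j in range(3)) for i in range(3))  # R^T v


def triangulate_midpoint(o1, d1, o2, d2, eps=1e-12):
    """Approximate the intersection of lines o1 + s*d1 and o2 + t*d2 by the
    midpoint of the shortest segment joining them."""
    w0 = tuple(p - q for p, q in zip(o1, o2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < eps:
        raise ValueError("lines are (nearly) parallel; no unique intersection")
    # Parameters of the closest points, from setting the gradient of
    # |w0 + s*d1 - t*d2|^2 with respect to s and t to zero.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(o + s * k for o, k in zip(o1, d1))
    p2 = tuple(o + t * k for o, k in zip(o2, d2))
    return tuple((u + v) / 2 for u, v in zip(p1, p2))
```

For example, with an unrotated left camera at the origin and an unrotated right camera at (1, 0, 0), both with f = 1, the image points (0, 0) and (-0.2, 0) triangulate to (0, 0, 5). The gap between the two closest points is also a cheap sanity check: a large gap usually signals a bad correspondence.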
Once we have a large set of correspondences and have projected them into space,
we end up with a large cloud of points that resembles the actual scene. The last step is
to create a surface and texture for this set of points.
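Surfaces can be built from a point cloud in many ways, and the text does not say which method is used. One simple possibility, sketched here under the assumption that the reconstructed points are stored on the same pixel grid as the image they came from (None where no correspondence was found), is to reuse the grid's neighborhood structure: each 2x2 block of pixels becomes two triangles, and each vertex's pixel position becomes its texture coordinate into that image. The function name grid_mesh is hypothetical.

```python
def grid_mesh(points):
    """Build a textured triangle mesh from a grid of reconstructed points.

    points[r][c] is the 3-D point recovered for pixel (r, c), or None when no
    correspondence was found. Returns (vertices, uvs, triangles): 3-D vertices,
    texture coordinates in [0, 1] locating each vertex in the source image, and
    triangles as index triples into vertices.
    """
    if not points:
        return [], [], []
    rows, cols = len(points), len(points[0])
    index = {}
    vertices, uvs = [], []
    for r in range(rows):
        for c in range(cols):
            if points[r][c] is not None:
                index[(r, c)] = len(vertices)
                vertices.append(points[r][c])
                # Texture coordinate: where this point sits in the image.
                uvs.append((c / max(cols - 1, 1), r / max(rows - 1, 1)))
    triangles = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            # The four grid neighbors forming one cell.
            a, b = index.get((r, c)), index.get((r, c + 1))
            d, e = index.get((r + 1, c)), index.get((r + 1, c + 1))
            # Split the cell into two triangles with the same winding,
            # skipping any triangle that has a missing corner.
            if None not in (a, b, d):
                triangles.append((a, b, d))
            if None not in (b, e, d):
                triangles.append((b, e, d))
    return vertices, uvs, triangles
```

A practical version would also discard triangles whose edges are very long in 3-D, since those usually bridge a depth discontinuity rather than a real surface.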