The creation of 3D models for use in an interactive virtual
environment is an expensive and tedious process and
is still a challenging problem in computer vision. Typically
the requirement that the virtual environment should
mirror an existing scene demands accurate three dimensional
(3D) geometry, as well as surface materials or textures.
Thus, there is a need for a method to directly extract
realistic 3D models from real photographs.
In the fields of photogrammetry and computer vision
many approaches have been developed which allow the
production of photorealistic 3D models [Pollefeys et al.
2000], [Zisserman et al. 2000]. In general, these algorithms
take as input multiple images of a real environment, captured
with a calibrated camera, and recover from them the
3D structure of the scene. The output of such an algorithm
is a dense point cloud, corresponding to important
features in the scene. These point clouds must be converted
into logical objects in order to create suitable representations
for a virtual environment. Currently available
methods for automatic segmentation are not yet robust
enough to build useful geometric models for visualization;
fully automatic segmentation thus remains an
ill-posed problem.
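As background, such reconstruction pipelines rest on the calibrated pinhole camera model, in which a 3D point X projects to a pixel via the intrinsic matrix K and the camera pose [R | t]. The following minimal sketch illustrates this projection; the intrinsic parameters and the scene point are illustrative assumptions, not values from this paper.

```python
import numpy as np

# Intrinsic matrix K: focal length and principal point are
# illustrative values for a hypothetical calibrated camera.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)       # camera aligned with the world frame
t = np.zeros(3)     # camera at the world origin

def project(X):
    """Project a 3D world point X into pixel coordinates."""
    x = K @ (R @ X + t)   # homogeneous image coordinates
    return x[:2] / x[2]   # perspective division

# A scene point 4 units in front of the camera.
pixel = project(np.array([0.3, -0.2, 4.0]))
```

With exact calibration, reconstruction algorithms effectively invert this mapping from several views at once.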
In this paper we discuss how we can make the modeling
process more convenient and efficient. Computer vision
has so far treated two separate research problems:
reconstruction and recognition. In our approach we solve the reconstruction
problem by exploiting highly redundant information about
the scene, in our case image sequences. The recognition
problem is handed over to a human operator, who is supported
by an intelligent user interface. Thus the operator
can focus on the segmentation and interpretation of the
scene using only one image, while the system takes care
of the associated 3D information.
Essentially our interactive modeling system, called VR
Modeler (Virtual Reality Modeler), allows a user to construct
a geometric model of the scene from a set of photographs.
The images are taken with a hand-held digital
consumer camera using short baselines. After some
preprocessing, the relative orientation of the image sequences
is calculated fully automatically. Our orientation
method, which is not the topic of this paper, is based on
work described by Horn [Horn 1990], Klaus et al. [Klaus
et al. 2002] and Nister [Nister 2003]. Once we have determined
the relative orientation between all image pairs
we are able to extract 3D information from the photographs
automatically by employing area- and feature-based
matching techniques. The 3D information consists
of 3D points, 3D lines and 3D surfaces, as illustrated in
Figure 1.
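Once the relative orientation between an image pair is known, matched image points can be lifted to 3D points by triangulation. The following is a minimal sketch of linear (DLT) triangulation from two views, not the paper's actual matching pipeline; the camera matrices, the short baseline, and the scene point are assumed for illustration.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    # Linear (DLT) triangulation: each observation contributes two
    # rows of a homogeneous system A X = 0, solved via SVD.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]   # dehomogenize

# Two cameras with a short baseline along the x-axis, mirroring the
# hand-held short-baseline capture described above (values assumed).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# A hypothetical scene point, its two projections, and the
# triangulated estimate recovered from them.
X_true = np.array([0.3, -0.2, 4.0])
x1, x2 = project(P1, X_true), project(P2, X_true)
X_est = triangulate_point(P1, P2, x1, x2)
```

With noise-free correspondences the estimate matches the scene point exactly; with real matches, the same linear system is typically solved in a least-squares sense over many observations.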
After applying this automatic reconstruction process all