We have proposed a system and methods which build on previous
work for computing depth maps and performing segmentations. To
address inaccuracies in the current depth maps, we have specifically
aimed to make the segmentation refinement, segmentation propagation,
alignment, stereo billboards, and occlusion methods robust to
those inaccuracies. As explained, this is largely achieved by approximating
the objects’ associated 3D point clouds with proxy geometry.
For simplicity, we currently use planar proxy geometry.
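As a rough illustration of what a planar proxy amounts to, the sketch below fits a plane z = a*x + b*y + c to a small 3D point cloud by ordinary least squares. This is an assumption made for illustration only and not necessarily the fitting procedure used in our system; the function names `det3` and `fit_plane` are hypothetical, and this parameterization cannot represent planes parallel to the z axis.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to (x, y, z) points.

    Illustrative only: minimizes squared residuals along z and solves
    the 3x3 normal equations with Cramer's rule.
    """
    n = len(points)
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)

    normal_matrix = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(normal_matrix)
    if abs(d) < 1e-12:
        raise ValueError("points are degenerate (e.g. collinear in x, y)")

    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column with the right-hand side.
        m = [row[:] for row in normal_matrix]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    return tuple(coeffs)


# Hypothetical sample points lying exactly on z = 2x - y + 3.
cloud = [(0, 0, 3), (1, 0, 5), (0, 1, 2), (1, 1, 4), (2, 1, 6)]
a, b, c = fit_plane(cloud)
```

With noisy depth maps, a robust variant (for example, fitting only to inlier points) would likely be preferable to a plain least-squares fit.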
Approximating geometry with planar proxies has its limitations.
Planar proxies do not preserve detailed depth structure, such as the
grass surface in Figure 13a. Because partial occlusions are then
absent, the copied object appears to float. Large orientation
changes using planar proxies can introduce distortions. An example
of this is shown in Figure 13b. The copied object (left person)
appears distorted compared to the person in the target image. An alternative
would be to use in-painting [Wang et al. 2008]. However,
high-quality in-painting is a difficult task and is therefore typically
limited to painting in relatively small areas.

Figure 13: (a) No partial occlusion: the lack of fine depth structures
after planar approximation makes the copied object appear to float.
(b) Large orientation change: large warps with planar proxy geometry
lead to distortions of the copied (left) object.

Another important problem is that planar stereo billboards may no
longer respect the epipolar geometry, which can introduce vertical
disparities that strongly interfere with stereopsis. To
evaluate the amount of vertical disparity that is introduced, we use
an object which is not well represented by a plane, shown in Figure
14. The object is copied from the source scene into two target
scenes with the support surface at a different orientation: 10° and
35°. For comparison we also show the ground truth images for each
case. The vertical disparities for the 10° case are around 0.8% of
the object height, and for the 35° case around 2.4%. The reader can
verify that even for the 35° orientation change the stereo images
can still be comfortably fused. The maximal vertical disparity for
all other result images used in this paper is around 0.5%. Although
the vertical disparity tolerance varies depending on scene content,
for comparison Fukuda et al. [2009] report a tolerance of 45 arcmin
for random dot stereograms. Given a display at 100 dpi, viewed at
a distance of 50 cm, this amounts to a vertical disparity tolerance of
about 26 pixels. The vertical disparity for our 35° case is about 10
pixels. This is well within the reported tolerance, although a more
thorough analysis should be conducted. In summary, our system
produces plausible results for moderate orientation changes. The
limitations for larger orientation changes could be overcome with
more accurate depth reconstruction, but this problem of obtaining
more accurate depth maps is notoriously difficult to solve robustly.
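The conversion of the reported angular tolerance into pixels can be reproduced with a short calculation. The sketch below assumes a flat display viewed head-on, so that an angle θ at viewing distance d spans d·tan(θ) on the screen; the function name `arcmin_to_pixels` is hypothetical.

```python
import math


def arcmin_to_pixels(arcmin, dpi, distance_cm):
    """Convert a visual angle in arcminutes to an on-screen extent in pixels.

    Assumes a flat screen viewed head-on at `distance_cm`.
    """
    angle_rad = math.radians(arcmin / 60.0)
    extent_cm = distance_cm * math.tan(angle_rad)
    return extent_cm / 2.54 * dpi  # 2.54 cm per inch


# Tolerance of 45 arcmin on a 100 dpi display viewed at 50 cm.
tolerance_px = arcmin_to_pixels(45, dpi=100, distance_cm=50)
print(round(tolerance_px))  # 26

# Fraction of the tolerance used by the roughly 10-pixel disparity
# measured for the 35-degree case.
fraction_used = 10 / tolerance_px
```

Under these assumptions the 35° case uses a little under 40% of the reported tolerance.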
Stereo billboards help to preserve the stereo volume of the copied
source object. However, if the initial depth volume in the source
image is relatively flat, as is the case for narrow baselines (or
interocular distances), stereo billboards will not be able to increase
the stereo volume in the target. Furthermore, for large differences in baseline
between source and target, stereo billboards may not be able to preserve
volume. In particular, achieving artistic stereo effects such as
hypostereo (gigantism) and hyperstereo (miniaturization) [Koppal
et al. 2010] in copy & paste is an interesting topic for future work.
We may be able to exploit the work by Lang et al. [2010] in such
scenarios.
For plausible appearance of copied objects, we approximate contact
shadows to prevent objects from appearing to float. However, illumination
differences between the source and target images are a larger
problem that we did not address in this paper. This problem is not
specific to 3D; see, for example, Lalonde et al. [2007]. Although we
use the color transfer method described by Reinhard et al. [2001],
this does not always give the desired results. For truly
plausible appearance of pasted objects, more information about the
scene illumination should be recovered, and exploited to relight the
objects. The depth map could then also be used for shadow casting
and light attenuation. However, relighting is an active area of
research with no good solution to date.
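For reference, the core of statistics-based color transfer in the style of Reinhard et al. [2001] is a per-channel shift and scale that matches the mean and standard deviation of the copied object's colors to those of the target. The sketch below is a simplified illustration on a single channel: it omits the conversion to the lαβ color space used in the original method, and the function name `match_channel_stats` and the sample values are hypothetical.

```python
from statistics import fmean, pstdev


def match_channel_stats(source, target):
    """Shift and scale `source` so its mean and std match those of `target`.

    `source` and `target` are flat sequences of values from one color
    channel. The color-space conversion of the full method is omitted.
    """
    src_mean, src_std = fmean(source), pstdev(source)
    tgt_mean, tgt_std = fmean(target), pstdev(target)
    # Guard against a constant source channel (zero spread).
    scale = tgt_std / src_std if src_std > 0 else 1.0
    return [(v - src_mean) * scale + tgt_mean for v in source]


# Hypothetical channel values for a copied object and a target region.
object_channel = [0.2, 0.4, 0.6]
target_channel = [0.1, 0.5, 0.9]
adjusted = match_channel_stats(object_channel, target_channel)
```

Because only global first- and second-order statistics are matched, such a transfer cannot account for directional lighting or shadows, which is consistent with the limitation noted above.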
Segmentations and disparity maps are closely related in that segmentation
boundaries often correspond to depth discontinuities