However, it is unreasonable to expect such a model to be a perfect
match to the depicted object—the visual world is too varied to
ever be captured perfectly no matter how large the dataset. Therefore,
our approach addresses three types of mismatch between
the photographed object and the stock 3D model:
Geometry Mismatch. Interestingly, even among standard, mass-produced
household brands (e.g., detergent bottles), there is often
subtle geometric variability as manufacturers tweak the shape of
their products. Of course, for natural objects (e.g., a banana), the
geometry of each instance will be slightly different. Even in cases
where a perfect match could be found (e.g., a car of a specific
make, model, and year), many 3D models are created with artistic
license, and their geometry is unlikely to be metrically accurate or
may contain scanning errors.
Appearance Mismatch. Although both artists and scanning techniques
often provide detailed descriptions of object appearance
(surface reflectance), these descriptions may not match the colors
and textures (and aging and weathering effects) of the particular
instance of the object in the photograph.
Illumination Mismatch. To perform realistic manipulations in
3D, we need to generate plausible lighting effects, such as shadows
on an object and on contact surfaces. The environment illumination
that generates these effects is not known a priori, and the user may
not have access to the original scene to take illumination measurements
(e.g., in dynamic environments or for legacy photographs).
Our approach uses the pixel information in visible parts of the
object to correct these three sources of mismatch. The user semi-automatically
aligns the stock 3D model to the photograph using a
real-time geometry correction interface that preserves symmetries
in the object. Using the aligned model and photograph, our approach
automatically estimates environment illumination and appearance
information in hidden parts of the object. While a photograph
and 3D model may still not contain all the information needed
to precisely recreate the scene, our approach approximates the
illumination, geometry, and appearance of the underlying object and
scene well enough to produce a plausible completion of uncovered areas.
Indeed, as shown by the user study in Section 8, our approach
plausibly reveals hidden areas of manipulated objects.
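To make the symmetry-based completion idea above concrete, the following is a minimal sketch, not the full method: it mirrors visible surface samples across an assumed, known symmetry plane and copies their colors to hidden counterparts by nearest-neighbor lookup. The array layout and the reflect_across_plane helper are illustrative assumptions.

```python
import numpy as np

def reflect_across_plane(points, plane_point, plane_normal):
    """Mirror 3D points across the plane defined by a point and a normal."""
    n = plane_normal / np.linalg.norm(plane_normal)
    d = (points - plane_point) @ n            # signed distances to the plane
    return points - 2.0 * d[:, None] * n      # reflected positions

def complete_appearance(points, colors, visible, plane_point, plane_normal):
    """Color hidden surface points by copying from their (approximately)
    symmetric, visible counterparts via nearest-neighbor lookup."""
    out = colors.copy()
    mirrored = reflect_across_plane(points[~visible], plane_point, plane_normal)
    vis_pts, vis_colors = points[visible], colors[visible]
    # For each hidden point, find the visible point closest to its mirror image.
    dists = np.linalg.norm(mirrored[:, None, :] - vis_pts[None, :, :], axis=-1)
    out[~visible] = vis_colors[np.argmin(dists, axis=1)]
    return out
```

A real implementation would also need to factor out shading before transferring colors, since observed pixel values mix surface albedo with the unknown scene illumination.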
The ability to manipulate objects in 3D while maintaining realism
greatly expands the repertoire of creative manipulations that can
be performed on a photograph. Users are able to quickly perform
object-level motions that would be time-consuming or simply impossible
in 2D. For example, from just one photograph, users can
cause grandma’s car to perform a backflip, and fake a baby lifting a
heavy sofa. We tie our approach to standard modeling and animation
software to animate objects from a single photograph. In this
way, we re-imagine typical Photoshop edits—such as object rotation,
translation, rescaling, deformation, and copy-paste—as object
manipulations in 3D, and enable users to more directly translate
what they envision into what they can create.
Contributions. Our key contribution is an approach that allows
out-of-plane 3D manipulation of objects in consumer photographs,
while providing a seamless break from the original image. To do
so, our approach leverages approximate object symmetries and a
new non-parametric model of image-based lighting for appearance
completion of hidden object parts and for illumination-aware compositing
of the manipulated object into the image. We make no assumptions
about the structure or nature of the object being manipulated
beyond the fact that an approximate stock 3D model is available.
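To illustrate what a non-parametric, image-based lighting estimate can look like under the Lambertian assumption stated below, the following sketch fits per-direction light intensities to observed pixel values by least squares. The direction sampling, the single-channel simplification, and all names are assumptions made for exposition and do not reproduce our actual formulation.

```python
import numpy as np

def estimate_environment_light(normals, albedo, observed, light_dirs):
    """Least-squares fit of per-direction light intensities under a
    Lambertian model: observed ~= albedo * sum_j max(dot(n, l_j), 0) * L_j.

    normals    : (P, 3) unit surface normals at visible pixels
    albedo     : (P,)   per-pixel diffuse albedo (single channel)
    observed   : (P,)   observed pixel intensities (single channel)
    light_dirs : (D, 3) unit directions sampling the environment sphere
    Returns    : (D,)   non-negative light intensity per direction
    """
    # Lambertian transport matrix: albedo times clamped cosine between
    # each surface normal and each sampled light direction.
    T = albedo[:, None] * np.maximum(normals @ light_dirs.T, 0.0)   # (P, D)
    # Unconstrained least squares, then clamp to enforce non-negativity
    # (a proper solver would use NNLS, e.g. scipy.optimize.nnls).
    L, *_ = np.linalg.lstsq(T, observed, rcond=None)
    return np.maximum(L, 0.0)
```

In practice, a non-negative solver and smoothness regularization across neighboring light directions would typically be needed, since the clamped-cosine transport matrix is poorly conditioned.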
Assumptions. In this paper, we assume Lambertian surface reflectance.
We do not model material properties such as refraction