This article describes a technique that can reliably align arbitrary 2D depictions
of an architectural site, including drawings, paintings, and historical
photographs, with a 3D model of the site. This is a tremendously difficult
task, as the appearance and scene structure in the 2D depictions can be very
different from the appearance and geometry of the 3D model, for example,
due to the specific rendering style, drawing error, age, lighting, or change of
seasons. In addition, we face a hard search problem: the number of possible
alignments of the painting to a large 3D model, such as a partial reconstruction
of a city, is huge. To address these issues, we develop a new compact representation
of complex 3D scenes. The 3D model of the scene is represented
by a small set of discriminative visual elements that are automatically learned
from rendered views. As in object detection, both the set of visual elements
and the weights of individual features for each element are learned
in a discriminative fashion. We show that the learned visual elements are
reliably matched in 2D depictions of the scene despite large variations in rendering
style (e.g., watercolor, sketch, historical photograph) and structural
changes (e.g., missing scene parts, large occluders) of the scene. We demonstrate
an application of the proposed approach to automatic rephotography
to find an approximate viewpoint of historical paintings and photographs
with respect to a 3D model of the site. The proposed alignment procedure
is validated via a human user study on a new database of paintings and
sketches spanning several sites. The results demonstrate that our algorithm
produces significantly better alignments than several baseline methods.
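To make the notion of a discriminatively learned visual element concrete, the following is a minimal sketch, not the article's actual pipeline. It assumes each candidate element is summarized by a fixed-length feature vector (e.g., a gradient-based descriptor of a rendered patch) and uses one common closed-form discriminative scheme, LDA-style whitening against background statistics, to obtain per-feature weights; all names and parameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each candidate visual element is a feature vector
# (e.g., a descriptor of a patch from a rendered view). Background
# statistics come from a large pool of generic patch descriptors.
dim = 64
negatives = rng.normal(size=(5000, dim))        # generic background pool
mu = negatives.mean(axis=0)                     # background mean
cov = np.cov(negatives, rowvar=False) + 0.01 * np.eye(dim)  # regularized covariance

def learn_element(x):
    """Closed-form LDA-style weights for one element: w = Sigma^{-1}(x - mu).
    Feature dimensions that are common in the background pool are
    down-weighted, leaving the discriminative ones."""
    return np.linalg.solve(cov, x - mu)

def score(w, y):
    """Linear detection score of descriptor y under element weights w."""
    return float(w @ y)

# A distinctive rendered patch (offset from the background distribution)
# versus a generic background patch:
element = rng.normal(size=dim) + 2.0
w = learn_element(element)

print(score(w, element) > score(w, negatives[0]))  # element scores itself higher
```

A full system along these lines would learn many such elements from rendered views, keep only those whose scores separate well from the background, and run them as sliding-window detectors on a painting or photograph to propose an alignment.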