This paper presents a system for 3D reconstruction of
large-scale outdoor scenes based on monocular motion
stereo. Ours is the first such system to run at interactive
frame rates on a mobile device (Google Project Tango
Tablet), thus allowing a user to reconstruct scenes “on the
go” by simply walking around them. We utilize the device’s GPU to compute depth maps using plane sweep stereo.
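To make this step concrete, below is a minimal CPU sketch of plane sweep stereo, not the paper’s GPU implementation: each fronto-parallel depth plane induces a homography between the reference view and a neighboring view, the neighbor is warped into the reference frame, and every pixel keeps the depth whose aggregated matching cost is lowest. The SAD cost, patch size, and two-view, fronto-parallel-only sweep are simplifying assumptions on our part.

```python
# Minimal plane sweep stereo sketch (illustrative, not the paper's code).
# Assumes grayscale float32 images, intrinsics K, and a relative pose
# (R, t) taking points from the reference camera to the neighbor camera.
import numpy as np
import cv2

def plane_sweep_depth(ref, neighbor, K, R, t, depths, patch=5):
    """Pick, per pixel, the depth plane with the lowest SAD matching cost."""
    K_inv = np.linalg.inv(K)
    n = np.array([0.0, 0.0, 1.0])            # fronto-parallel plane normal
    h, w = ref.shape
    best_cost = np.full((h, w), np.inf, np.float32)
    best_depth = np.zeros((h, w), np.float32)
    for d in depths:
        # Homography induced by the plane Z = d (i.e. n^T X = d) that maps
        # reference pixels into the neighbor view: H = K (R + t n^T / d) K^-1.
        H = K @ (R + np.outer(t, n) / d) @ K_inv
        # With WARP_INVERSE_MAP, warped(x) = neighbor(H x), i.e. the neighbor
        # image resampled into the reference frame for this depth hypothesis.
        warped = cv2.warpPerspective(neighbor, H, (w, h),
                                     flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
        # Aggregate absolute differences over a small square patch.
        cost = cv2.boxFilter(np.abs(ref - warped), -1, (patch, patch))
        better = cost < best_cost
        best_cost[better] = cost[better]
        best_depth[better] = d
    return best_depth
```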
We then fuse the depth maps into a global model of the environment, represented as a truncated signed distance function (TSDF) in a spatially hashed voxel grid.
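The fusion itself follows standard TSDF integration over a voxel-hashed grid. The sketch below uses a Python dict as a stand-in for the GPU hash table; the voxel size, block size, truncation band, and block-allocation rule are assumed values, and the paper’s actual data layout will differ.

```python
# Minimal TSDF fusion sketch over a spatially hashed voxel grid
# (a dict stands in for the GPU hash table; constants are assumptions).
import numpy as np

VOXEL = 0.05   # voxel size in meters (assumed)
BLOCK = 8      # 8x8x8 voxels per hashed block (assumed)
TRUNC = 0.2    # truncation band in meters (assumed)

blocks = {}    # integer block coordinates -> (tsdf, weight) arrays

def get_block(key):
    if key not in blocks:
        blocks[key] = (np.ones((BLOCK,) * 3, np.float32),
                       np.zeros((BLOCK,) * 3, np.float32))
    return blocks[key]

def integrate(depth, K, T_wc):
    """Fuse one depth map with camera-to-world pose T_wc into the TSDF."""
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    T_cw = np.linalg.inv(T_wc)
    # Allocate only blocks containing back-projected surface points
    # (a real system would also cover the truncation band around them).
    v, u = np.nonzero(depth > 0)
    z = depth[v, u]
    pts_c = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=1)
    pts_w = pts_c @ T_wc[:3, :3].T + T_wc[:3, 3]
    keys = set(map(tuple, np.floor(pts_w / (VOXEL * BLOCK)).astype(int)))
    idx = np.indices((BLOCK,) * 3).reshape(3, -1).T   # voxel offsets
    for key in keys:
        tsdf, weight = get_block(key)
        # World positions of this block's voxel centers.
        centers = (np.asarray(key) * BLOCK + idx + 0.5) * VOXEL
        pc = centers @ T_cw[:3, :3].T + T_cw[:3, 3]   # into camera frame
        zc = np.maximum(pc[:, 2], 1e-6)               # guard against z <= 0
        px = np.round(pc[:, 0] * fx / zc + cx).astype(int)
        py = np.round(pc[:, 1] * fy / zc + cy).astype(int)
        ok = (pc[:, 2] > 0) & (px >= 0) & (px < w) & (py >= 0) & (py < h)
        d = np.where(ok, depth[py.clip(0, h - 1), px.clip(0, w - 1)], 0)
        sdf = d - pc[:, 2]                            # signed distance along ray
        ok &= (d > 0) & (sdf > -TRUNC)                # stay inside the band
        val = np.clip(sdf / TRUNC, -1.0, 1.0)
        t, wt = tsdf.reshape(-1), weight.reshape(-1)  # views into the block
        # Standard per-voxel running weighted average.
        t[ok] = (t[ok] * wt[ok] + val[ok]) / (wt[ok] + 1)
        wt[ok] += 1
```

Hashing blocks instead of allocating a dense grid is what makes unbounded scenes feasible here: only space near observed surfaces consumes memory.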
We observe that, in contrast to reconstructing objects in a small volume of interest or to fusing the nearly outlier-free data provided by depth sensors, one can rely less on free-space measurements to suppress outliers in unbounded large-scale scenes.
Consequently, we propose a set of simple filtering operations to remove unreliable depth estimates, and we experimentally demonstrate the benefit of strongly filtering the depth maps.
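The abstract does not name the individual filters, so the sketch below only illustrates the kind of simple per-pixel operations meant here; the matching-cost, local-variation, and temporal-consistency tests and all thresholds are our assumptions, not the paper’s.

```python
# Illustrative depth-map filtering; tests and thresholds are assumptions.
import numpy as np

def filter_depth(depth, cost, prev_depth,
                 max_cost=0.5, max_rel_std=0.01, max_rel_diff=0.05):
    """Zero out depth estimates that fail simple reliability tests."""
    h, w = depth.shape
    out = depth.copy()
    # 1. Ambiguity: drop pixels whose best matching cost is too high.
    out[cost > max_cost] = 0
    # 2. Local noise: drop pixels whose depth varies strongly within a
    #    3x3 neighborhood relative to its magnitude.
    pad = np.pad(depth, 1, mode='edge')
    win = np.stack([pad[dy:dy + h, dx:dx + w]
                    for dy in range(3) for dx in range(3)])
    rel_std = win.std(axis=0) / np.maximum(depth, 1e-6)
    out[rel_std > max_rel_std] = 0
    # 3. Temporal consistency: require rough agreement with the previous
    #    frame's estimate wherever one exists.
    rel_diff = np.abs(depth - prev_depth) / np.maximum(depth, 1e-6)
    out[(prev_depth > 0) & (rel_diff > max_rel_diff)] = 0
    return out
```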
We extensively evaluate the system on both real and synthetic datasets.