We present a novel framework for jointly tracking a camera in 3D and reconstructing the 3D model of an observed object. Due to the region based approach, our formulation can handle untextured objects, partial occlusions, motion blur, dynamic backgrounds and imperfect lighting. Our formulation also allows for a very efficient implementation which achieves real-time performance on a mobile phone, by running the pose estimation and the shape optimisation in parallel. We use a level set based pose estimation but completely avoid the, typically required, explicit computation of a global distance. This leads to tracking rates of more than 100 Hz on a desktop PC and 30 Hz on a mobile phone. Further, we incorporate additional orientation information from the phone's inertial sensor which helps us resolve the tracking ambiguities inherent to region based formulations. The reconstruction step first probabilistically integrates 2D image statistics from selected keyframes into a 3D volume, and then imposes coherency and compactness using a total variational regularisation term. The global optimum of the overall energy function is found using a continuous max-flow algorithm and we show that, similar to tracking, the integration of per voxel posteriors instead of likelihoods improves the precision and accuracy of the reconstruction.