Since we target manual workflows recorded from the user's
perspective, we generally have to deal with close-up images in which
the user's hands frequently or even permanently occlude large parts of
the scene. As we cannot assume
that tools or interaction objects are observable, an in-depth scene
analysis is often infeasible, because occlusion further hinders the
already difficult task of object detection. Instead, we propose a novel measure
derived from image distance that evaluates image properties
jointly without prior interpretation.