For each frame that needs to be processed, a sparse SIFT descriptor must first be computed. With the hardware and image resolution described above, this step takes 1200 ms on average. It cannot be avoided and is not influenced by the localization. Once the descriptor of the current frame has been computed, it has to be matched against the artwork templates. Since their descriptors are pre-computed,