As Table I shows, our system achieves the best results when it exploits localization information, closely followed by its own variant that does not rely on localization. The small gap in recognition performance between the two is mainly due to the robustness of the method: the proposed solution has good discriminative capabilities and can effectively identify the correct artwork regardless of the number of pieces in the museum dataset. The baseline achieves 100% recall because, lacking the detection component, it treats every frame as an artwork. However, this recall comes at the cost of a large number of false positives, which cause a significant loss in accuracy and validate the use of a detection threshold.
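
The trade-off between recall and accuracy discussed for the baseline can be illustrated with a minimal sketch. The scores, labels, and threshold value below are invented toy data, not the values behind Table I, and the function name `recall_and_accuracy` is hypothetical. The sketch assumes each frame has a detection score and a binary ground-truth label (artwork or not), and that a threshold of 0 reproduces the behaviour of a system without a detection component.

```python
def recall_and_accuracy(scores, labels, threshold):
    """Predict 'artwork' for every frame whose score is >= threshold,
    then return (recall, accuracy) against the ground-truth labels."""
    tp = fp = tn = fn = 0
    for score, is_artwork in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_artwork:
            tp += 1
        elif predicted and not is_artwork:
            fp += 1
        elif not predicted and is_artwork:
            fn += 1
        else:
            tn += 1
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(labels)
    return recall, accuracy


# Toy data (assumption): three artwork frames and five background frames.
scores = [0.9, 0.8, 0.5, 0.4, 0.3, 0.2, 0.1, 0.7]
labels = [True, True, True, False, False, False, False, False]

# Threshold 0 accepts every frame, mimicking a baseline with no detector.
baseline = recall_and_accuracy(scores, labels, threshold=0.0)

# A hypothetical detection threshold rejects low-scoring frames.
thresholded = recall_and_accuracy(scores, labels, threshold=0.6)

print("baseline    recall=%.3f accuracy=%.3f" % baseline)
print("thresholded recall=%.3f accuracy=%.3f" % thresholded)
```

On this toy data the threshold-free setting keeps every artwork frame but also accepts all five background frames as false positives, whereas the thresholded setting misses one artwork frame and rejects most of the background. The figures only illustrate the mechanism and say nothing about the magnitude of the effect in the actual experiments.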