For video detection, we used accuracy as the evaluation metric. For each video clip, the final label was the most frequent detection result across all frames of the target video, where a frame's detection was counted only if its confidence exceeded the score threshold: