To evaluate the effectiveness of our approach, we
compare three distance measures:
Euclidean distance from the mean frame (used as
the baseline), Euclidean distance from all frames, and
Euclidean distance from the typical features. The
global feature is the average of the feature vectors
over all frames (the mean frame), so there is one
global feature vector per audio file. The number of
typical features obtained from the GMM is set by the
number of cluster centers; we choose 5 per audio track.
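
The three measures can be illustrated with a minimal sketch. It assumes each audio file is a NumPy array of shape (frames, feature dimensions) and that the typical features are already available as a (5, feature dimensions) array of cluster centers. The function names, the toy data, and the choice to average over all pairwise distances are assumptions made for illustration; the text does not state how distances over multiple frames or typical features are aggregated.

```python
import numpy as np


def mean_frame_distance(frames_a, frames_b):
    """Baseline: Euclidean distance between the two per-file mean frames."""
    return float(np.linalg.norm(frames_a.mean(axis=0) - frames_b.mean(axis=0)))


def mean_pairwise_distance(set_a, set_b):
    """Average Euclidean distance over every (row of set_a, row of set_b) pair.

    Averaging is an assumption of this sketch, not a rule from the text.
    """
    # Broadcasting gives an array of shape (len(set_a), len(set_b), dims).
    diffs = set_a[:, None, :] - set_b[None, :, :]
    return float(np.linalg.norm(diffs, axis=2).mean())


# Toy data standing in for frame-level features of two audio files.
rng = np.random.default_rng(0)
frames_a = rng.normal(loc=0.0, size=(200, 13))
frames_b = rng.normal(loc=0.5, size=(180, 13))

# Hypothetical stand-ins for the 5 typical features (GMM cluster centers)
# of each file; in practice these would come from the fitted GMM.
typical_a = rng.normal(loc=0.0, size=(5, 13))
typical_b = rng.normal(loc=0.5, size=(5, 13))

d_mean = mean_frame_distance(frames_a, frames_b)        # mean frame (baseline)
d_all = mean_pairwise_distance(frames_a, frames_b)      # all frames
d_typical = mean_pairwise_distance(typical_a, typical_b)  # typical features

print(d_mean, d_all, d_typical)
```

With 5 typical features per track, the third measure compares 25 pairs per file pair, whereas the all-frames measure in this sketch compares every frame of one file with every frame of the other.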