This representation quantizes the
continuous high-dimensional space of image features
to a manageable vocabulary of “visual words”. This
is achieved, for instance, by grouping the low-level
features collected from an image corpus into a specified
number of clusters using an unsupervised algorithm
such as K-Means (for other methods of generating
the vocabulary see (Jurie and Triggs, 2005)).