Recent years have seen the emergence of hybrid approaches
[18, 11, 10, 1] that capture appearance information through
a collection of local image patches. Shape information is
encoded via spatial relationships between the local patches.
The local patches are placed at locations chosen by various interest point operators, and are represented either as
raw pixel values [11] or as histograms of image gradients [18,
10], termed SIFT (Scale-Invariant Feature Transform) descriptors.
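The following is a minimal Python sketch (standard library only) of the two patch representations. It treats a patch as a list of rows of grey values and assumes an interest point operator has already chosen where to cut it out. `raw_pixel_descriptor` and `gradient_histogram_descriptor` are hypothetical helper names. The histogram is a deliberately simplified SIFT-like descriptor: it builds one 8-bin, magnitude-weighted orientation histogram over the whole patch. Full SIFT instead concatenates a 4×4 grid of such 8-bin histograms (128 dimensions) and adds Gaussian weighting and interpolation.

```python
import math


def raw_pixel_descriptor(patch):
    """Hypothetical helper: represent a patch by its raw intensities,
    flattened row by row into a single vector."""
    return [float(v) for row in patch for v in row]


def gradient_histogram_descriptor(patch, num_bins=8):
    """Hypothetical helper: a simplified SIFT-like descriptor.

    Builds ONE magnitude-weighted histogram of gradient orientations over
    the whole patch. Full SIFT uses a 4x4 spatial grid of such histograms
    plus Gaussian weighting and interpolation, all omitted here.
    """
    h, w = len(patch), len(patch[0])
    hist = [0.0] * num_bins
    # Skip the border so central differences stay inside the patch.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences approximate the image gradient.
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(dx, dy)
            if mag == 0.0:
                continue  # flat region: no orientation to vote for
            angle = math.atan2(dy, dx) % (2 * math.pi)  # map to [0, 2*pi)
            # The final modulo guards against rounding up to exactly 2*pi.
            b = int(angle / (2 * math.pi) * num_bins) % num_bins
            hist[b] += mag
    # L2-normalise so a uniform contrast scaling leaves the descriptor unchanged.
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist
```

For a 5×5 horizontal intensity ramp (`patch[y][x] = x`), every interior gradient points along +x. All of the weight therefore lands in bin 0, and the normalised descriptor is `[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]`. Multiplying the patch by a constant leaves this histogram unchanged, whereas the raw pixel vector scales with it.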