Feature selection and extraction

The definition of a local feature in an image admits a wide range of options; for instance, both the size and the shape of the feature can vary. In this work, however, we consider only square windows of size w × w. Regarding the locations at which to extract these patches, the simplest choice is a fixed sampling grid shared by all images. However, this choice makes learning computationally demanding because of the huge number of patches obtained. We therefore propose to select only patches with high information content, discarding those whose associated p(accuracy|x[i]) is likely to be very low (for instance, uniform patches). To this end, for each image we compute a binary mask in which each active pixel marks the center of a patch to be extracted. The mask is created as follows. First, we apply a Sobel filter to the image, which emphasizes edges and transitions. We then apply a low-pass filter to the result and binarize the values using a threshold. Finally, we extract the patches centered on each active pixel of the binary mask, removing those that extend beyond the image boundary. Each patch is normalized to have zero mean and unit variance. The entire process is illustrated in Fig. 2. A similar method was proposed in [26] to extract only informative patches.
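The patch selection procedure can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the implementation used in this work: the text does not specify the low-pass filter, the threshold value, or the border handling, so a 3 × 3 box filter, a user-supplied threshold, and edge replication are assumed here. The function names (filter3x3, informative_mask, extract_patches) are hypothetical, and w is assumed to be odd so that each patch has a well-defined center.

```python
import numpy as np


def filter3x3(img, kernel):
    """Correlate a 2-D image with a 3x3 kernel, replicating border pixels."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def informative_mask(img, threshold):
    """Binary mask of patch centers: Sobel magnitude, low-pass, threshold."""
    img = np.asarray(img, dtype=float)
    sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    gx = filter3x3(img, sobel_x)      # horizontal intensity changes
    gy = filter3x3(img, sobel_x.T)    # vertical intensity changes
    magnitude = np.hypot(gx, gy)
    # Assumption: a 3x3 mean filter stands in for the unspecified low-pass filter.
    smoothed = filter3x3(magnitude, np.full((3, 3), 1.0 / 9.0))
    return smoothed > threshold


def extract_patches(img, mask, w, eps=1e-8):
    """Extract normalized w x w patches (w odd) centered on active mask pixels."""
    img = np.asarray(img, dtype=float)
    r = w // 2
    height, width = img.shape
    patches, centers = [], []
    for y, x in zip(*np.nonzero(mask)):
        # Discard patches that would extend beyond the image boundary.
        if y - r < 0 or x - r < 0 or y + r >= height or x + r >= width:
            continue
        patch = img[y - r:y + r + 1, x - r:x + r + 1]
        # Zero mean and unit variance; eps guards against a perfectly uniform
        # patch, which is then mapped to all zeros (an assumption of this sketch).
        patches.append((patch - patch.mean()) / max(patch.std(), eps))
        centers.append((int(y), int(x)))
    return np.array(patches).reshape(-1, w, w), centers
```

For a given grayscale image img, calling informative_mask(img, threshold) followed by extract_patches(img, mask, w) returns the normalized patches together with their center coordinates. The threshold controls the trade-off between the number of patches retained and the cost of learning, and would have to be tuned for the data at hand.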