Our Local-DNN has several parameters to set. First, we fixed the patch size at 13 × 13 pixels. This choice was guided by our experience in previous work that also used a local feature framework on face images, extracting patches roughly the size of an eye in the image [26]. Second, the DNN itself also has several parameters. In this work, we used hidden layers of 512 ReLU units and varied the number of hidden layers to compare classification performance. A representation of this network is shown in Fig. 1. Note that the input layer has 169 units because the patch size is 13 × 13; the input dimension becomes 171 if the patch location information is also used.

Finally, it should be mentioned that we used five-fold cross-validation on both databases. The network is trained until the average cross-entropy error on the training data falls below a pre-specified threshold. To determine this threshold, we train another network with the same architecture using only three folds of the training data, keeping the remaining fold as a validation set. The cross-entropy threshold is then set to the value that yields the smallest classification error on the validation set. The test results over the five combinations are then averaged.
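The architecture described above can be sketched as follows. This is a minimal NumPy forward-pass illustration, not the authors' implementation: the number of hidden layers and the number of output classes are hypothetical placeholders (the paper varies the depth and does not fix the class count here), and the weights are randomly initialized rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

PATCH_SIZE = 13              # patches are 13 x 13 pixels
INPUT_DIM = PATCH_SIZE ** 2  # 169 input units (171 if patch location is appended)
HIDDEN_UNITS = 512           # each hidden layer has 512 ReLU units
N_HIDDEN = 2                 # hypothetical depth; the paper compares several values
N_CLASSES = 7                # hypothetical number of output classes

def init_mlp(n_hidden=N_HIDDEN):
    """Random weights for an MLP: INPUT_DIM -> (512 ReLU) x n_hidden -> softmax."""
    dims = [INPUT_DIM] + [HIDDEN_UNITS] * n_hidden + [N_CLASSES]
    return [(rng.normal(0.0, 0.01, (d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def forward(layers, x):
    """Forward pass: ReLU on hidden layers, softmax on the output layer."""
    for W, b in layers[:-1]:
        x = np.maximum(0.0, x @ W + b)          # ReLU activation
    W, b = layers[-1]
    logits = x @ W + b
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)    # class probabilities

patch = rng.random((1, INPUT_DIM))              # one flattened 13 x 13 patch
probs = forward(init_mlp(), patch)              # shape (1, N_CLASSES), rows sum to 1
```

In the actual system each patch (optionally augmented with its two location coordinates) would be classified this way, and training would minimize the cross-entropy between `probs` and the patch labels until the threshold described above is reached.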