This paper presents a new discriminative model called Local Deep Neural Network (Local-DNN), which is based on two key concepts: local features and deep architectures. The model learns to classify small patches extracted from images using a standard DNN, and the final classification of each image is performed with a simple voting scheme that aggregates the contributions from all of its patches. The experiments evaluate the model on the gender recognition problem using unconstrained face images, following the two benchmarks proposed for the LFW and Gallagher datasets.
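As an illustration of this patch-based pipeline, the following is a minimal sketch in Python/NumPy. It assumes a grayscale image stored as a 2D array and a generic `patch_dnn` callable that returns per-patch class posteriors; the patch size and stride values are illustrative and not taken from the paper.

```python
import numpy as np

def extract_patches(image, patch_size=13, stride=4):
    """Slide a window over the image and collect small patches
    together with the (row, col) position each one came from."""
    h, w = image.shape
    patches, positions = [], []
    for r in range(0, h - patch_size + 1, stride):
        for c in range(0, w - patch_size + 1, stride):
            patches.append(image[r:r + patch_size, c:c + patch_size].ravel())
            positions.append((r, c))
    return np.array(patches), positions

def classify_image(image, patch_dnn):
    """Classify one image by majority voting over its patch predictions.
    `patch_dnn(batch)` is assumed to return an array of shape
    (n_patches, n_classes) with the class posteriors of each patch."""
    patches, _ = extract_patches(image)
    posteriors = patch_dnn(patches)
    votes = np.argmax(posteriors, axis=1)    # hard decision per patch
    return np.bincount(votes).argmax()       # class receiving most votes
```

With the sum-of-posteriors rule discussed below, the last two lines would instead sum `posteriors` over the patches and take the argmax of that sum.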
The results obtained in the experiments confirm the advantage of learning independently from small regions of the visual field when using DNNs for the problem at hand. In particular, our Local-DNN model requires networks with at least two hidden layers to learn effectively from small patches. In addition, the final decision rule based on summing posteriors yields slightly better results than the simple voting scheme. It is also worth noting the improvement obtained by preserving the topological information of each patch, i.e., by feeding the network the location from which the patch was extracted. However, weighting the contribution of each patch in the final decision by an estimate of its classification accuracy did not improve the results. With this configuration, our Local-DNN model outperforms the other Deep Learning models evaluated in this work, namely pre-trained DNNs and Deep Convolutional Neural Networks (DCNNs). It also improves over other state-of-the-art results on the LFW dataset, which were obtained with traditional handcrafted features and a Support Vector Machine (SVM) classifier. In fact, we obtain the best published result under this protocol without discarding any image from the original database. The result obtained on the Gallagher dataset is also competitive, considering the simplicity and generalization capability of the proposed model. Finally, the cross-database results, obtained by training on one database and testing on the other, demonstrate that our approach generalizes well and outperforms the only previously published cross-database result using the same databases.
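As a sketch of the two decision rules compared above (the notation is ours, not taken verbatim from the paper), let $x_1, \dots, x_N$ denote the patches of an image and $P(c \mid x_j)$ the posterior assigned by the DNN to class $c$ for patch $x_j$. The simple voting scheme and the sum-of-posteriors rule can then be written as

$$
\hat{y}_{\text{vote}} = \arg\max_{c} \sum_{j=1}^{N} \mathbf{1}\!\left[\arg\max_{c'} P(c' \mid x_j) = c\right],
\qquad
\hat{y}_{\text{sum}} = \arg\max_{c} \sum_{j=1}^{N} P(c \mid x_j).
$$

The weighted variant replaces each term of the second sum with $w_j\,P(c \mid x_j)$, where $w_j$ is an estimate of the reliability of the $j$-th patch; as noted above, this variant did not bring further gains.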