A graphical representation of this Local-DNN model is shown in Fig. 1. As it can be seen, several patches are extracted from the input image and then are fed into a DNN. The network is formed by an input layer and several fully connected hidden layers. An output layer with C softmax units, that represents the posterior probability of each local patch. Finally, local posteriors are fused at the end. During training, the network works at a patch level by learning to classify each patch with the label of the image that it belongs. During testing all the contributions from the patches extracted of the image are combined in order to classify this image with a final label.