For the first layer, the non-linearity of the decoder is the logistic sigmoid, the corruption process is masking noise (i.e., each active input is set to 0 with probability P), and the training criterion is the Kullback-Leibler divergence.
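The first-layer setup described above can be illustrated with a minimal sketch. It rests on several assumptions not stated in the text: "active" is read as "non-zero", the Kullback-Leibler divergence is taken component-wise between Bernoulli distributions with means given by the clean input and the sigmoid reconstruction (for binary inputs this coincides with the cross-entropy), and the decoder pre-activations are hypothetical placeholder values rather than the output of a trained model. The function names are illustrative only.

```python
import math
import random


def masking_noise(x, p, rng):
    """Set each active (assumed: non-zero) input to 0 with probability p."""
    return [0.0 if v != 0 and rng.random() < p else v for v in x]


def sigmoid(a):
    """Logistic sigmoid, used here as the decoder non-linearity."""
    return 1.0 / (1.0 + math.exp(-a))


def bernoulli_kl(x, z, eps=1e-12):
    """Sum over components of KL(Bernoulli(x_i) || Bernoulli(z_i)).

    Assumption: this is one common reading of a KL training criterion
    for a sigmoid decoder; the source does not spell out the formula.
    """
    total = 0.0
    for xi, zi in zip(x, z):
        zi = min(max(zi, eps), 1.0 - eps)  # keep log arguments finite
        if xi > 0.0:
            total += xi * math.log(xi / zi)
        if xi < 1.0:
            total += (1.0 - xi) * math.log((1.0 - xi) / (1.0 - zi))
    return total


rng = random.Random(0)
x = [1.0, 0.0, 1.0, 1.0, 0.0]               # clean binary input
x_tilde = masking_noise(x, p=0.8, rng=rng)  # corrupted copy fed to the encoder

# Hypothetical decoder pre-activations; a real model would compute these
# from the encoding of x_tilde.
pre_activation = [2.0, -2.0, 2.0, 2.0, -2.0]
z = [sigmoid(a) for a in pre_activation]

# The reconstruction is scored against the clean input x, not x_tilde.
loss = bernoulli_kl(x, z)
```

Note that the noise only removes active inputs and never switches an inactive input on, and that the loss compares the reconstruction with the uncorrupted input, which is what makes the autoencoder a denoising one.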