1. Train the first layer as an RBM that models the raw input $x = h^{(0)}$ as its visible layer.
2. Use that first layer to obtain a representation of the input that will serve as data for the second layer. Two common choices exist: the representation can be the mean activations $p(h^{(1)} = 1 \mid h^{(0)})$ or samples drawn from $p(h^{(1)} \mid h^{(0)})$.
3. Train the second layer as an RBM, taking the transformed data (samples or mean activations) as training examples for its visible layer.
4. Repeat steps 2 and 3 for the desired number of layers, each time propagating upward either samples or mean values; see the sketch after this list.
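
The following is a minimal NumPy sketch of this greedy layer-wise procedure, assuming binary units and one step of contrastive divergence (CD-1) as the per-layer training rule. The class name `RBM`, the function `train_stack`, the hyperparameters, and the toy data are illustrative assumptions, not part of the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RBM:
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible biases
        self.c = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        # Mean activations p(h = 1 | v)
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        # p(v = 1 | h)
        return sigmoid(h @ self.W.T + self.b)

    def cd1_update(self, v0):
        # One CD-1 update on a mini-batch v0 of visible vectors
        ph0 = self.hidden_probs(v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample h ~ p(h | v0)
        pv1 = self.visible_probs(h0)                      # reconstruction
        ph1 = self.hidden_probs(pv1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b += self.lr * (v0 - pv1).mean(axis=0)
        self.c += self.lr * (ph0 - ph1).mean(axis=0)

def train_stack(x, layer_sizes, epochs=10, use_samples=False):
    """Greedily train a stack of RBMs: each layer models the output of the
    previous one, propagating upward mean activations or samples (step 2)."""
    data, rbms = x, []
    for n_hidden in layer_sizes:
        rbm = RBM(data.shape[1], n_hidden)
        for _ in range(epochs):
            rbm.cd1_update(data)           # steps 1/3: train this layer as an RBM
        probs = rbm.hidden_probs(data)     # step 2: transform data for the next layer
        data = ((rng.random(probs.shape) < probs).astype(float)
                if use_samples else probs)
        rbms.append(rbm)                   # step 4: repeat for the next layer
    return rbms

# Toy usage on random binary "data": a three-layer stack
x = (rng.random((64, 20)) < 0.5).astype(float)
stack = train_stack(x, layer_sizes=[15, 10, 5])
```

The `use_samples` flag corresponds to the two choices in step 2: passing mean activations gives a lower-variance signal to the next layer, while passing samples matches the generative semantics of the model more closely.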