Following Helmholtz, we view the human perceptual system as a statistical
inference engine whose function is to infer the probable causes
of sensory input. We show that a device of this kind can learn how to
perform these inferences without requiring a teacher to label each sensory
input vector with its underlying causes. A recognition model is used
to infer a probability distribution over the underlying causes from the
sensory input, and a separate generative model, which is also learned, is
used to train the recognition model (Zemel 1994; Hinton and Zemel 1994;
Zemel and Hinton 1995).
As an example of the generative models in which we are interested,
consider the shift patterns in Figure 1, which are on four 1 x 8 rows of
binary pixels. These were produced by the two-level stochastic hierarchical
generative process described in the figure caption. The task of
learning is to take a set of examples generated by such a process and
induce the model. Note that underlying any pattern there are multiple