The Helmholtz machine can be viewed as a hierarchical generalization
of the type of learning procedure described by Zemel (1994) and Hinton
and Zemel (1994). Instead of using a fixed independent prior distribution
for each of the hidden units in a layer, the Helmholtz machine makes this
prior more flexible by deriving it from the top-down activities of units
in the layer above. In related work, Zemel and Hinton (1995) show that a
system can learn a redundant population code in a layer of hidden units,
provided the activities of the hidden units are represented by a point in
a multidimensional constraint space with pre-specified dimensionality.
The role of their constraint space is to capture statistical dependencies
among the hidden unit activities, and this can again be achieved in a more
uniform way by using a second hidden layer in a hierarchical generative
model of the type described here.
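The contrast between a fixed independent prior and a hierarchically derived one can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes binary stochastic units with logistic activations in a two-layer generative model, and all layer sizes, weights, and function names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes and parameters (not from the paper).
n_top, n_hidden = 3, 5
b_top = np.zeros(n_top)                       # biases of the top layer
b_hidden = np.zeros(n_hidden)                 # biases of the hidden layer
W_gen = rng.normal(0.0, 1.0, (n_top, n_hidden))  # top-down generative weights

def sample_fixed_prior():
    # Fixed independent prior: each hidden unit's probability
    # depends only on its own bias.
    p = sigmoid(b_hidden)
    return (rng.random(n_hidden) < p).astype(float)

def sample_hierarchical_prior():
    # Top layer is sampled from an independent prior...
    s_top = (rng.random(n_top) < sigmoid(b_top)).astype(float)
    # ...but the hidden layer's prior is derived from the top-down
    # activities of the layer above, so it can capture dependencies
    # among the hidden units.
    p_hidden = sigmoid(s_top @ W_gen + b_hidden)
    s_hidden = (rng.random(n_hidden) < p_hidden).astype(float)
    return s_top, s_hidden

s_top, s_hidden = sample_hierarchical_prior()
print(s_top, s_hidden)
```

In the hierarchical case the marginal distribution over the hidden layer is a mixture over top-layer states, which is how a second hidden layer can absorb the statistical dependencies that the constraint space of Zemel and Hinton (1995) was designed to capture.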