The standard recurrent neural network language model (RNNLM) generates sentences one word at a time and does not work from an explicit global sentence representation.
In this work, we introduce and study an RNN-based variational autoencoder generative model that incorporates distributed latent representations of entire sentences. This factorization allows it to explicitly model holistic properties of sentences such as style, topic, and high-level syntactic features.
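For concreteness, such a model maximizes the usual variational lower bound on the log-likelihood, where $x$ is a sentence, $z$ its latent code, $q_\phi(z \mid x)$ the recognition (encoder) model, $p_\theta(x \mid z)$ the decoder, and $p(z)$ a standard Gaussian prior; this is the generic VAE objective, not an equation quoted from the paper:

\[
\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] - \mathrm{KL}\!\left(q_\phi(z \mid x) \,\big\|\, p(z)\right)
\]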
Samples from the prior over these sentence representations remarkably produce diverse and well-formed sentences through simple deterministic decoding.
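A minimal sketch of this procedure in PyTorch, drawing z from the standard normal prior and decoding greedily; every name here (SentenceDecoder, greedy_decode, the dimensions) is a hypothetical stand-in, not the paper's implementation:

```python
import torch
import torch.nn as nn

class SentenceDecoder(nn.Module):
    """Hypothetical RNN decoder conditioned on a latent code z."""
    def __init__(self, vocab_size, embed_dim, hidden_dim, latent_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.init_hidden = nn.Linear(latent_dim, hidden_dim)  # map z to the initial RNN state
        self.rnn = nn.GRUCell(embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    @torch.no_grad()
    def greedy_decode(self, z, bos_id=1, eos_id=2, max_len=30):
        h = torch.tanh(self.init_hidden(z))   # condition the decoder on z
        tok = torch.tensor([bos_id])
        words = []
        for _ in range(max_len):
            h = self.rnn(self.embed(tok), h)
            tok = self.out(h).argmax(dim=-1)  # deterministic: argmax, no sampling
            if tok.item() == eos_id:
                break
            words.append(tok.item())
        return words

latent_dim = 32
decoder = SentenceDecoder(vocab_size=10_000, embed_dim=64,
                          hidden_dim=128, latent_dim=latent_dim)
z = torch.randn(1, latent_dim)       # one sample from the standard normal prior
print(decoder.greedy_decode(z))      # token ids of one generated sentence
```

Because the decoding step is a deterministic argmax, all variation in the output comes from the latent code alone.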
By examining paths through this latent space, we are able to generate coherent novel sentences that interpolate between known sentences.
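Continuing the hypothetical sketch above, one such path is the straight line between the codes of two sentences, decoded step by step; in practice z_a and z_b would come from encoding two known sentences, and random codes stand in here:

```python
def interpolate(z_a, z_b, steps=5):
    """Decode along the straight line between two latent codes."""
    for t in torch.linspace(0.0, 1.0, steps):
        z = (1 - t) * z_a + t * z_b   # convex combination of the two codes
        yield decoder.greedy_decode(z)

z_a = torch.randn(1, latent_dim)      # stand-in for an encoded sentence
z_b = torch.randn(1, latent_dim)
for ids in interpolate(z_a, z_b):
    print(ids)                        # token ids of one intermediate sentence
```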
We present techniques for solving the difficult learning problem presented by this model, demonstrate its effectiveness in imputing missing words, explore many interesting properties of the model's latent sentence space, and present negative results on the use of the model in language modeling.
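The abstract does not name the training techniques. A common remedy for the known failure mode of such models, where the KL term collapses to zero and the latent code goes unused, is to anneal the weight of the KL term from 0 to 1 over training; the linear schedule below is an assumption about the flavor of technique involved, not the paper's recipe:

```python
def kl_weight(step: int, warmup_steps: int = 10_000) -> float:
    """KL weight grows linearly from 0 to 1 over warmup_steps."""
    return min(1.0, step / warmup_steps)

def annealed_elbo_loss(recon_nll: float, kl: float, step: int) -> float:
    # Negative ELBO with an annealed KL term: the decoder first learns to
    # reconstruct, then is gradually forced to use the latent code.
    return recon_nll + kl_weight(step) * kl
```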