One technique that has previously been used for Audio Driven Facial
Animation is to build a joint audio-visual model using Active
Appearance Models (AAMs) to represent possible facial variations
and Hidden Markov Models (HMMs) to select the correct appearance
based on the input audio alone [Cosker 2006]. However, several
questions remain unanswered. In particular, the choice of clustering
technique and the number of clusters used in the HMM may
significantly influence the quality of the produced videos.
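
To make the cluster-count question concrete, the following is a minimal sketch, not the method of [Cosker 2006]. It assumes that plain k-means stands in for the clustering technique, and that toy 2-D points stand in for AAM parameter vectors. The helper names (make_toy_data, kmeans, inertia_by_k) are hypothetical. It reports the within-cluster sum of squares for several candidate cluster counts, which is one simple way to compare them.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_toy_data(seed=0, per_blob=30):
    """Toy stand-in for AAM parameter vectors: three 2-D Gaussian blobs (assumption)."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    return [
        (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for cx, cy in centers
        for _ in range(per_blob)
    ]


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: sqdist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iters):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the centroid to the mean of its members.
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Keep the old centroid if the cluster became empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    inertia = sum(sqdist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


def inertia_by_k(points, ks, seed=0):
    """Within-cluster sum of squares for each candidate cluster count."""
    return {k: kmeans(points, k, seed=seed)[2] for k in ks}


if __name__ == "__main__":
    data = make_toy_data()
    for k, value in inertia_by_k(data, [1, 2, 3, 4, 5]).items():
        print(k, round(value, 2))
```

In this sketch each cluster would play the role of one HMM state. The within-cluster sum of squares tends to fall as clusters are added, so on its own it does not determine the best number of clusters; the quality of the produced video remains the criterion of interest.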