2.1.1 Valence and Activation from Linguistics
Schuller [11] presented for the first time an SER system
that fused linguistic and acoustic features in order to produce
a continuous emotional score in the dimensions of valence,
activation, and dominance. The study used the VAM corpus,
which, like IEMOCAP, consists of emotional dyadic
dialogues. Utterance-level statistical acoustic features that
had previously demonstrated success on the corpus were
used, and a range of vector space modelling methods were
evaluated for the linguistic analysis. An early fusion approach
with Support Vector Regression was used. The most
successful linguistic features were bag of n-grams (BoNG)
and bag of character n-grams (BoCNG). Interestingly, the
addition of linguistic features improved performance in all of
the emotional axes, with valence seeing the most improvement.
Note that the linguistic models were trained and tested within the
VAM corpus itself, although speaker independence was preserved
during the experimental cross-validation.
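
As an illustration of the pipeline described above, the following is a minimal sketch of BoNG and BoCNG feature extraction, early fusion by concatenation, and speaker-independent fold assignment. It is not the implementation from [11]: the n-gram orders, the raw-count weighting, the round-robin fold assignment, and all function names and toy data are assumptions made for this example. The regressor is left out to keep the sketch dependency-free; in an early fusion setup, the fused vectors would be passed to a single Support Vector Regression model per emotional dimension.

```python
from collections import Counter

def word_ngrams(text, n_max=2):
    """Return all word n-grams of order 1..n_max (n_max is an assumed setting)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def char_ngrams(text, n=3):
    """Return all character n-grams of a fixed order n (assumed setting)."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def build_vocab(docs, extract):
    """Map every n-gram seen in the given documents to a column index."""
    terms = sorted({t for d in docs for t in extract(d)})
    return {t: i for i, t in enumerate(terms)}

def bag_vector(doc, vocab, extract):
    """Raw-count bag vector; n-grams outside the vocabulary are ignored."""
    counts = Counter(extract(doc))
    vec = [0.0] * len(vocab)
    for term, c in counts.items():
        if term in vocab:
            vec[vocab[term]] = float(c)
    return vec

def early_fusion(acoustic, bong, bocng):
    """Early fusion: concatenate all feature streams into one vector."""
    return list(acoustic) + list(bong) + list(bocng)

def speaker_independent_folds(speakers, k):
    """Assign whole speakers to folds so no speaker is in both train and test."""
    unique = sorted(set(speakers))
    fold_of = {spk: i % k for i, spk in enumerate(unique)}
    folds = []
    for f in range(k):
        test = [i for i, s in enumerate(speakers) if fold_of[s] == f]
        train = [i for i, s in enumerate(speakers) if fold_of[s] != f]
        folds.append((train, test))
    return folds

# Toy data, invented for illustration only.
transcripts = ["i am so happy", "i am not happy", "this is bad"]
speakers = ["spk_a", "spk_a", "spk_b"]
acoustic = [[0.2, 1.1], [0.4, 0.9], [0.7, 0.3]]

# For brevity the vocabularies are built on all transcripts; in a real
# experiment they should be built on the training fold only.
word_vocab = build_vocab(transcripts, word_ngrams)
char_vocab = build_vocab(transcripts, char_ngrams)

fused = [
    early_fusion(a,
                 bag_vector(t, word_vocab, word_ngrams),
                 bag_vector(t, char_vocab, char_ngrams))
    for a, t in zip(acoustic, transcripts)
]
folds = speaker_independent_folds(speakers, k=2)
```

Each row of `fused` holds the acoustic values followed by the BoNG and BoCNG counts, so a single regressor sees both modalities at once. Grouping folds by speaker rather than by utterance is what keeps the evaluation speaker independent.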