An experimental study was presented that investigated
acoustic modelling configurations for speech recognition
in two Indian languages, Hindi and Marathi. The study
used data from a small-vocabulary agricultural
commodities task domain, collected for configuring
spoken dialogue systems.
Two acoustic modelling techniques for mono-lingual
ASR were compared: the conventional CDHMM and the
SGMM. The SGMM mono-lingual models were seen to
outperform their CDHMM counterparts when acoustic
training data was insufficient. For the Hindi and
Marathi language pair, a multi-lingual SGMM training
scenario was presented.
It was concluded that cross-corpus mismatch is an
important issue that must be addressed when building
systems of this nature. Not accounting for cross-corpus
mismatch was seen to decrease performance in the target
language, Hindi, which has limited training data. After
accounting for this mismatch, a gain in multi-lingual
SGMM performance was observed. Further,
interesting anecdotal results were obtained showing
that the parameters of the multi-lingual SGMM system
capture phonetic characteristics in a structured and
meaningful manner, especially across the “similar”
languages used in this experimental study.
Finally, sharing “similar” context-dependent states
from Marathi was found to improve Hindi language
speech recognition performance.
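
As an illustration of the parameter sharing that underlies multi-lingual
SGMM training, the sketch below assumes the standard SGMM formulation, in
which each state's Gaussian means and mixture weights are derived from a
low-dimensional state vector through globally shared projections. It is
not the configuration used in this study: the function names, dimensions
and random values are hypothetical, and covariances, sub-states and
training are omitted.

```python
import math
import random


def make_shared_params(num_gauss, feat_dim, sub_dim, seed=0):
    """Hypothetical shared parameters, common to all languages."""
    rng = random.Random(seed)
    # M[i]: feat_dim x sub_dim mean projection for shared Gaussian i.
    M = [[[rng.gauss(0, 1) for _ in range(sub_dim)]
          for _ in range(feat_dim)]
         for _ in range(num_gauss)]
    # w[i]: sub_dim weight projection for shared Gaussian i.
    w = [[rng.gauss(0, 1) for _ in range(sub_dim)]
         for _ in range(num_gauss)]
    return M, w


def state_gmm(M, w, v):
    """Derive one state's means and mixture weights from its vector v."""
    # Mean of Gaussian i in this state: M[i] times v.
    means = [[sum(m * x for m, x in zip(row, v)) for row in M_i]
             for M_i in M]
    # Mixture weights: softmax over the dot products of w[i] and v.
    logits = [sum(a * x for a, x in zip(w_i, v)) for w_i in w]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return means, weights


# The shared parameters could be estimated from pooled Hindi and Marathi
# data, while each language keeps its own state vectors (illustrative
# random values here, not trained ones).
M, w = make_shared_params(num_gauss=4, feat_dim=3, sub_dim=2)
rng = random.Random(1)
hindi_state = [rng.gauss(0, 1) for _ in range(2)]
marathi_state = [rng.gauss(0, 1) for _ in range(2)]

hindi_means, hindi_weights = state_gmm(M, w, hindi_state)
marathi_means, marathi_weights = state_gmm(M, w, marathi_state)
```

Because only the state vectors are language-specific in this sketch, a
language with little training data has comparatively few parameters of
its own to estimate, which is consistent with the low-resource gains
summarised above.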