Automatic segmentation of continuous
speech into syllable units is therefore an important
issue. Jittiwarangkul et al. (2000) proposed using several
prosodic features, including short-time energy, zero-crossing
rate, and pitch, together with heuristic rules for syllable
segmentation. Ratsameewichai et al. (2002) suggested the
use of a dual-band energy contour for phoneme segmentation.
This method decomposed the input speech into low- and
high-frequency components using a wavelet transform,
computed the time-domain normalized energy
of both components, and applied heuristic rules
for selecting the endpoints of syllables and phonemes based
on the energy contours. Although no comparative
experiment against other typical techniques was reported, dividing the
speech signal into detailed frequency bands before applying
energy-based segmentation rules seems to be an effective
approach. A phoneme-segmentation experiment on 1000
isolated Thai syllables from 10 speakers achieved an average
accuracy of nearly 95%
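
The dual-band energy idea described above can be illustrated with a minimal sketch. It is not the implementation of Ratsameewichai et al. (2002): the one-level Haar wavelet, the peak normalization, the frame length, the threshold rule, and all function names (haar_split, normalized_energy, find_endpoints) are assumptions chosen only to keep the example self-contained, and the input is a synthetic tone rather than Thai speech.

```python
import math


def haar_split(signal):
    """One-level Haar wavelet transform: return (low band, high band)."""
    n = len(signal) - len(signal) % 2  # drop a trailing odd sample
    s = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, n, 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, n, 2)]
    return low, high


def normalized_energy(band, frame_len):
    """Energy of each non-overlapping frame, scaled so the peak is 1.

    Peak normalization is an assumption; the source does not state
    how the energy was normalized.
    """
    energies = [
        sum(x * x for x in band[i:i + frame_len])
        for i in range(0, len(band) - frame_len + 1, frame_len)
    ]
    peak = max(energies, default=0.0)
    return [e / peak for e in energies] if peak > 0 else energies


def find_endpoints(contour, threshold):
    """Hypothetical rule: first and last frame above a fixed threshold."""
    active = [i for i, e in enumerate(contour) if e > threshold]
    return (active[0], active[-1]) if active else None


# Synthetic test signal at 8 kHz: silence, a 200 Hz tone, silence.
rate = 8000
silence = [0.0] * 800
tone = [math.sin(2 * math.pi * 200 * t / rate) for t in range(1600)]
signal = silence + tone + silence

low, high = haar_split(signal)
# 40 coefficients per frame = 80 input samples = 10 ms at 8 kHz.
low_contour = normalized_energy(low, frame_len=40)
high_contour = normalized_energy(high, frame_len=40)
endpoints = find_endpoints(low_contour, threshold=0.1)
print(endpoints)
```

In this sketch the two contours are inspected independently; a rule set of the kind described above would combine them, for example by using the high-band contour to locate fricative-like segments that carry little low-band energy.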