The frame-based detectors [3]–[6] assume that F0 is invariant over a short speech segment spanning several F0 periods; consequently, the F0 information may be blurred if F0 varies markedly within the frame. The event detectors [7]–[10], on the other hand, rely on F0 marking or epoch detection. They derive the duration of each F0 period by detecting events of the glottal cycle, e.g., instants of glottal closure, but their sensitivity to the shape of the speech waveform [11] may cause them to fail when the instants of glottal closure are not clearly evident.
Existing methods for F0 detection can roughly be divided into two classes: frame-based and event-based [2].
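
To make the frame-based assumption concrete, the following is a minimal sketch of a generic frame-based estimator that picks the lag maximizing the normalized cross-correlation of a frame. It is an illustration only and is not the method of any of the cited detectors [3]–[6]; the function names (`ncc_f0`, `synth`), the 8 kHz sampling rate, the 40 ms frame, and the 60–400 Hz search range are all assumptions chosen for the example. The point it shows is that such an estimator returns a single F0 value per frame, even when F0 glides within that frame.

```python
import math

def ncc_f0(frame, fs, f0_min=60.0, f0_max=400.0):
    """Return one F0 estimate (Hz) for the whole frame: the candidate lag
    with the highest normalized cross-correlation (illustrative only)."""
    lag_min = int(fs / f0_max)
    lag_max = int(fs / f0_min)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        n = len(frame) - lag
        a = frame[:n]              # frame samples
        b = frame[lag:lag + n]     # same samples delayed by `lag`
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if den == 0.0:
            continue
        val = num / den
        if val > best_val:
            best_lag, best_val = lag, val
    return fs / best_lag

def synth(f0_start, f0_end, fs, duration):
    """Sinusoid whose F0 moves linearly from f0_start to f0_end (Hz)."""
    n = int(fs * duration)
    phase, out = 0.0, []
    for i in range(n):
        f0 = f0_start + (f0_end - f0_start) * i / n
        phase += 2.0 * math.pi * f0 / fs
        out.append(math.sin(phase))
    return out

fs = 8000                                  # assumed sampling rate (Hz)
steady = synth(100.0, 100.0, fs, 0.04)     # F0 constant over the frame
gliding = synth(100.0, 150.0, fs, 0.04)    # F0 rises within the frame

f0_steady = ncc_f0(steady, fs)
f0_gliding = ncc_f0(gliding, fs)           # one number for a 100-150 Hz glide
print(f0_steady, f0_gliding)
```

For the steady frame the estimate should match the true F0 closely, whereas for the gliding frame the single returned value cannot represent the 100 to 150 Hz trajectory, which is the blurring effect described above.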