The signal is cut into 2048-point frames (50 ms), and for each
frame we compute the short-time spectrum. We then use Mel-frequency
cepstral coefficients ([10]) to estimate the spectral envelope of
each frame. The spectral envelope of a signal is a curve in the
frequency-magnitude space that "envelopes" the peaks of its short-time
spectrum. In the widely studied problem of instrument recognition
mentioned above, this feature has been shown to account for a large
part of the timbre of instruments ([11]).
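
The processing chain described above can be sketched as follows. This is a minimal illustration and not necessarily the implementation used here: the text does not specify the sampling rate, frame overlap, analysis window, number of mel filters or number of cepstral coefficients, so the 44.1 kHz rate, non-overlapping frames, Hann window, 40 filters and 13 coefficients below are all assumptions, and every function name is hypothetical.

```python
import cmath
import math

FRAME_SIZE = 2048  # frame length in samples (from the text)


def split_frames(signal, size=FRAME_SIZE):
    """Cut the signal into consecutive frames (no overlap: an assumption)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def power_spectrum(frame):
    """Short-time power spectrum of one frame (Hann window: an assumption)."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    return [abs(c) ** 2 for c in fft(windowed)[:n // 2 + 1]]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    # Filter edges expressed in (fractional) FFT bin units.
    edges = [mel_to_hz(top * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        bank.append([max(0.0, min((k - lo) / (mid - lo), (hi - k) / (hi - mid)))
                     for k in range(n_fft // 2 + 1)])
    return bank


def log_mel_energies(frame, sample_rate, n_filters=40):
    """Log energy in each mel band of one frame."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # The small floor avoids log(0) on silent bands.
    return [math.log(max(sum(w * p for w, p in zip(f, spec)), 1e-12))
            for f in bank]


def dct(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * q * (j + 0.5) / n)
                for j, v in enumerate(values))
            for q in range(n_coeffs)]


def envelope_from_cepstrum(coeffs, n_filters=40):
    """Inverse DCT of the truncated cepstrum: a smoothed log energy per band."""
    env = []
    for j in range(n_filters):
        acc = coeffs[0] / 2.0
        for q in range(1, len(coeffs)):
            acc += coeffs[q] * math.cos(math.pi * q * (j + 0.5) / n_filters)
        env.append(2.0 * acc / n_filters)
    return env
```

For one frame, `dct(log_mel_energies(frame, 44100), 13)` gives the cepstral coefficients, and `envelope_from_cepstrum` maps them back to a smooth curve over the 40 mel bands. Keeping only the low-order coefficients discards the fine harmonic structure of the spectrum and retains its overall shape, which is why the truncated cepstrum serves as an envelope estimate; keeping all 40 coefficients would reproduce the log mel energies exactly.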