A complete automatic multimedia surveillance system would then consist of several modules, each providing information from a different modality, whose outputs are merged by an information fusion system for situation analysis (see figure 1).
In the targeted system, the audio module will exploit both vocal and non-vocal manifestations of abnormal situations, dealing with emotional content [2] as well as typical events such as cries, shots or explosions. In this paper we propose an approach to developing an audio key-event detection system. Although our event detection system is currently limited to shot detection, the methodology and approach followed for this system could be extended to other classes of sounds characteristic of abnormal situations in a given environment.