In this paper, we introduced a robust audio-based gunshot detection system. Such a system is an essential building block of a complete multimedia surveillance system. It relies on a binary classifier (shot vs. normal), and we conducted several experiments aimed at reducing the false rejection and false detection rates. We showed that the noise level of the training database has a significant impact on system performance. This makes it possible to select the most appropriate training noise level for a targeted false rejection rate. Adopting a hierarchical approach also improved performance significantly. Future work will extend the current system to other types of acoustic events that occur in abnormal situations, such as shouts, cries, or manifestations of fear.
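The following Python sketch illustrates one way to choose a training noise level for a targeted false rejection rate. It computes the false rejection rate (shots classified as normal) and the false detection rate (normal segments classified as shots), then selects, among candidate training noise levels, the one with the lowest false detection rate whose false rejection rate meets the target. The selection criterion, the `OperatingPoint` structure, the encoding of noise level as an SNR in dB, and all function names are hypothetical assumptions made for illustration. They are not the procedure used in this work.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OperatingPoint:
    """Validation results for a classifier trained at one noise level (hypothetical structure)."""
    train_snr_db: float      # training-database noise level, encoded as SNR in dB (assumption)
    false_rejection: float   # fraction of true shots classified as normal
    false_detection: float   # fraction of normal segments classified as shots


def error_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float]:
    """Return (false rejection rate, false detection rate); labels: 1 = shot, 0 = normal."""
    shot_preds = [p for t, p in zip(y_true, y_pred) if t == 1]
    normal_preds = [p for t, p in zip(y_true, y_pred) if t == 0]
    # Missed shots over all true shots.
    frr = sum(1 for p in shot_preds if p == 0) / len(shot_preds) if shot_preds else 0.0
    # False alarms over all true normal segments.
    fdr = sum(1 for p in normal_preds if p == 1) / len(normal_preds) if normal_preds else 0.0
    return frr, fdr


def select_training_noise_level(
    points: Sequence[OperatingPoint], target_frr: float
) -> Optional[OperatingPoint]:
    """Hypothetical criterion: meet the FRR target, then minimize false detections."""
    feasible = [p for p in points if p.false_rejection <= target_frr]
    if not feasible:
        return None  # no training noise level reaches the targeted false rejection rate
    return min(feasible, key=lambda p: p.false_detection)
```

Constraining the false rejection rate first reflects the surveillance setting, where a missed shot is usually costlier than a false alarm. Other trade-offs (for example, a weighted cost of both error rates) would be equally easy to plug into `select_training_noise_level`.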