Automatic musical instrument recognition is an essential part of many tasks, such as music indexing, automatic transcription, retrieval and audio database querying. The perception of timbre by humans has been widely studied over the past five decades, but there has been little work on musical instrument identification; most recent work has focused on speech or speaker recognition. Automatic sound recognition has two subtasks. The first is to find a set of features that represents the entire sound with a minimal number of parameters; the second is to design a classifier that recognizes the sound from these features. Recognition performance clearly depends on the information carried by the feature set, so much of the recent work has focused on finding better feature sets. The classifier, by contrast, has not received as much research interest as the features.
Brown [1] reported a system that recognizes four woodwind instruments with performance comparable to human abilities. Eronen [2] classified 30 different instruments with an accuracy of 32% using MFCC as the feature, with a mixture of k-NN and GMM/HMMs in a hierarchical recognizer. In [3], Eronen reported 68% performance for 27 instruments. Fujinaga and MacMillan [4] classified 23 instruments with 63% accuracy using a genetic algorithm. Martin’s system recognized a wide set of instruments, although it did not perform as well as human subjects in a similar task [5].
In this paper, a new musical sound recognition system is presented. The main goal of this paper is not to find better features, but to develop a better classifier. Linear prediction coefficients (LPC) and mel-frequency cepstral coefficients (MFCC) are used as feature sets. Both are well known, easy to calculate, and have been reported several times to be better feature sets than the alternatives [2, 6, 7]. The classifier used in this work is an active learning probabilistic neural network (PNN).
In active learning, the learner is not merely a passive observer. It can select the new instances that are needed to improve generalization performance, and it can likewise reject redundant instances from the training set [8]. By combining these two abilities, the active learner can assemble a better training set, one that represents the entire sample space well.
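The select-and-reject idea above can be illustrated with a small sketch. It is not the method of this paper: the selection rule (smallest margin between the two largest class scores), the stopping threshold, and all names (`pnn_scores`, `active_select`, `margin_tol`, `sigma`) are assumptions made for illustration only. The PNN is modelled as a plain Gaussian Parzen-window classifier, and a labelled pool stands in for the oracle that supplies labels on request.

```python
import numpy as np


def pnn_scores(train_x, train_y, query_x, n_classes, sigma=0.5):
    """Per-class Parzen-window estimates (pattern and summation layers of a PNN)."""
    # Squared Euclidean distance between every query and every stored pattern.
    d2 = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))
    scores = np.zeros((query_x.shape[0], n_classes))
    for c in range(n_classes):
        mask = train_y == c
        if mask.any():
            # Average kernel response over the stored patterns of class c.
            scores[:, c] = kernel[:, mask].mean(axis=1)
    return scores


def active_select(pool_x, pool_y, n_classes, n_seed=2, budget=20,
                  margin_tol=0.2, sigma=0.5, seed=0):
    """Grow a training set from a pool, one instance at a time (assumed rule)."""
    rng = np.random.default_rng(seed)
    chosen = []
    # Start from a few random instances per class.
    for c in range(n_classes):
        idx = np.flatnonzero(pool_y == c)
        chosen.extend(rng.choice(idx, size=n_seed, replace=False).tolist())
    seeded = set(chosen)
    remaining = [i for i in range(len(pool_x)) if i not in seeded]

    while remaining and len(chosen) < budget:
        s = pnn_scores(pool_x[chosen], pool_y[chosen],
                       pool_x[remaining], n_classes, sigma)
        p = s / (s.sum(axis=1, keepdims=True) + 1e-12)
        top2 = np.sort(p, axis=1)[:, -2:]
        margin = top2[:, 1] - top2[:, 0]
        best = int(np.argmin(margin))
        if margin[best] > margin_tol:
            # Every remaining instance is already classified confidently,
            # so adding it would be redundant: reject the rest.
            break
        # Select the least certain instance and add it to the training set.
        chosen.append(remaining.pop(best))
    return np.array(chosen)
```

Calling `active_select(features, labels, n_classes)` on a feature matrix (for example LPC or MFCC vectors, one row per sound) returns the indices of the retained training instances; the remaining pool instances are the ones treated as redundant under the assumed margin rule.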
