As an important preprocessing technique in text classification, feature selection can improve the scalability,
efficiency and accuracy of a text classifier. In general, a good feature selection method should consider
domain and algorithm characteristics. Because the Naïve Bayesian classifier is simple, efficient,
and highly sensitive to feature selection, research on feature selection designed specifically for it is significant.
This paper presents two feature evaluation metrics for the Naïve Bayesian classifier applied to multi-class
text datasets: Multi-class Odds Ratio (MOR) and Class Discriminating Measure (CDM). Text classification
experiments with Naïve Bayesian classifiers were carried out on two multi-class text collections.
The results indicate that CDM and MOR select features clearly better than other feature selection
approaches.
