is widely used as a term importance
criterion for text data [36]. The information gain of an outcome
O from an attribute A is defined as the expected decrease
in entropy of O conditioned on A. The following equations
can be used to calculate the information gain about a discrete
outcome O from a discrete attribute A, denoted by IG(O, A).
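As an illustration of the definition, the expected decrease in entropy can be estimated from paired samples of O and A. The following is a minimal sketch, not the implementation used in this work: the function names `entropy` and `information_gain` are hypothetical, probabilities are taken as empirical frequencies, and the logarithm is assumed to be base 2 (entropy in bits).

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`.

    Assumption: log base 2; the source does not fix the base.
    """
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(outcomes, attributes):
    """Estimate IG(O, A) as H(O) minus the expected entropy of O given A.

    `outcomes` and `attributes` are equal-length sequences of discrete values,
    where position i holds the (outcome, attribute) pair of sample i.
    """
    n = len(outcomes)
    # Group the outcomes by the attribute value they co-occur with.
    groups = {}
    for o, a in zip(outcomes, attributes):
        groups.setdefault(a, []).append(o)
    # Weight each group's entropy by the empirical probability of its attribute value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(outcomes) - conditional
```

For example, with outcomes `["spam", "spam", "ham", "ham"]`, an attribute `[1, 1, 0, 0]` that separates the two classes perfectly yields a gain of 1 bit, while an attribute `[1, 0, 1, 0]` that is independent of the outcome yields a gain of 0.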
We use H(O) to denote the entropy of O, H(O/A) to denote
the entropy of O given A, and P(a) to denote the probability