A supervised learning algorithm receives a set of labeled training examples, each consisting of a feature vector and a class label. Irrelevant or redundant features in the feature set often hurt the accuracy of the induced classifier (John et al., 1994). Feature selection, the process of choosing a subset of the features and ignoring all others during both induction and classification, is an effective way to improve the performance and decrease the training time of a supervised learning algorithm.
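As a minimal sketch of these mechanics (the data, the chosen subset, and the one-nearest-neighbor rule below are hypothetical stand-ins, since no particular selector or learner is implied here), induction and classification both restrict attention to the selected columns:

    import numpy as np

    # Hypothetical training data: four features, three examples.
    X_train = np.array([[1.0, 0.3, 5.2, 0.0],
                        [0.9, 7.1, 4.8, 1.0],
                        [0.2, 6.4, 5.1, 1.0]])
    y_train = np.array([0, 1, 1])

    # Indices of the retained features, fixed by hand here purely
    # for illustration; a real selection method would choose them.
    selected = np.array([1, 3])

    # Induction sees only the selected columns.
    X_reduced = X_train[:, selected]

    # Classification applies the same mask to an unseen example.
    x_new = np.array([0.5, 6.8, 5.0, 1.0])[selected]

    # E.g., a 1-nearest-neighbor prediction on the reduced representation.
    pred = y_train[np.argmin(np.linalg.norm(X_reduced - x_new, axis=1))]
    print(pred)  # -> 1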
Feature selection typically improves classifier performance when the training set is small, without significantly degrading performance on large training sets (Hall, 1999). It also makes the induced concept more comprehensible to humans, since concepts that depend on many features are hard to understand. Feature selection is sometimes essential
to the success of a learning algorithm. For example, Kushmerick (1999) points out that it is not feasible to apply a nearest-neighbor algorithm to the Internet Advertisements dataset (described later) because of its overabundance of features. Feature selection can reduce the number of features to the point where such an algorithm becomes applicable.
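A short sketch of this effect (assuming scikit-learn; the synthetic data, the chi-squared score function, and the choice of k are illustrative assumptions, not the procedure used by Kushmerick) reduces roughly 1500 features to 20 before fitting a nearest-neighbor classifier:

    import numpy as np
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    # Synthetic stand-in for a dataset with an overabundance of
    # binary features; shapes are illustrative, not the real data.
    X = rng.integers(0, 2, size=(400, 1500)).astype(float)
    y = rng.integers(0, 2, size=400)
    X[y == 1, :5] = 1.0  # plant a few genuinely informative features

    # Keep only the 20 highest-scoring features, then run
    # nearest neighbors on the reduced representation.
    selector = SelectKBest(chi2, k=20).fit(X, y)
    knn = KNeighborsClassifier(n_neighbors=3).fit(selector.transform(X), y)
    print(knn.score(selector.transform(X), y))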