A central problem in machine learning is identifying a representative set of features from
which to construct a classification model for a particular task. This thesis addresses the
problem of feature selection for machine learning through a correlation-based approach.
The central hypothesis is that good feature sets contain features that are highly correlated
with the class, yet uncorrelated with each other. A feature evaluation formula, based
on ideas from test theory, provides an operational definition of this hypothesis. CFS
(Correlation-based Feature Selection) is an algorithm that couples this evaluation formula
with an appropriate correlation measure and a heuristic search strategy.
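
The hypothesis can be made concrete with a small sketch. The abstract does not state
the evaluation formula, the correlation measure, or the search strategy, so all three
are assumptions here: the merit of a subset of k features is taken as the test-theory
composite correlation k * r_cf / sqrt(k + k(k - 1) * r_ff), where r_cf is the mean
feature-class correlation and r_ff the mean feature-feature correlation; absolute
Pearson correlation stands in for the correlation measure; and greedy forward selection
stands in for the heuristic search. The names `pearson`, `merit`, and `forward_select`
are illustrative only.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Absolute Pearson correlation of two equal-length numeric sequences.

    Assumption: used here only as a stand-in correlation measure.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation; treat as 0
    return abs(cov / sqrt(var_x * var_y))


def merit(subset, features, target):
    """Assumed merit: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean correlation between each selected feature and the class.
    r_cf = sum(pearson(features[f], target) for f in subset) / k
    # Mean pairwise correlation among the selected features.
    pairs = list(combinations(subset, 2))
    r_ff = 0.0
    if pairs:
        r_ff = sum(pearson(features[a], features[b]) for a, b in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def forward_select(features, target):
    """Greedy forward search (an assumed strategy): add the feature that most
    improves merit, and stop when no addition improves it."""
    selected, best = [], 0.0
    remaining = sorted(features)
    while remaining:
        scored = [(merit(selected + [f], features, target), f) for f in remaining]
        score, name = max(scored)
        if score <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = score
    return selected, best
```

As a usage illustration, take two uncorrelated features x1 = [1, 1, -1, -1] and
x2 = [1, -1, 1, -1] with target [2, 0, 0, -2]. Each feature alone has merit of about
0.707, the pair {x1, x2} has merit 1.0, and pairing x1 with a scaled copy of itself
leaves the merit at about 0.707, which is the behaviour the hypothesis predicts.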