Machine Learning Algorithms
Two machine learning algorithms representing two
diverse approaches to learning were used in the
experiments: a probabilistic learner (naive Bayes) and
a decision tree learner (C4.5).
Naive Bayes employs a simplified version of Bayes'
formula to classify each novel example. The posterior
probability of each possible class is calculated using conditional
probabilities for feature values and prior probabilities
of classes estimated from the training data; each
novel instance is assigned the class with the highest
posterior probability. Due to the assumption that feature
values are independent given the class, the naive
Bayes classifier's predictive performance can be adversely
affected by the presence of redundant features
in the training data.
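The posterior calculation described above can be sketched as follows. This is an illustrative implementation for categorical features, not the one used in the experiments; the Laplace smoothing shown here is one common choice for handling unseen feature values and is an assumption on our part.

```python
from collections import Counter, defaultdict

def train_nb(X, y):
    """Estimate priors P(c) and per-class feature-value counts from data."""
    n = len(y)
    priors = {c: cnt / n for c, cnt in Counter(y).items()}
    class_counts = Counter(y)
    # cond[(i, v, c)] counts feature i taking value v within class c
    cond = defaultdict(int)
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            cond[(i, v, c)] += 1
    return priors, cond, class_counts

def classify_nb(xs, priors, cond, class_counts):
    """Assign the class with the highest (unnormalised) posterior,
    multiplying the prior by the conditional probability of each
    feature value -- the independence assumption in action."""
    best, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for i, v in enumerate(xs):
            # Laplace smoothing avoids zero probabilities for unseen values
            score *= (cond[(i, v, c)] + 1) / (class_counts[c] + 2)
        if score > best_score:
            best, best_score = c, score
    return best
```

Because each feature contributes an independent multiplicative factor, a redundant (duplicated) feature effectively counts the same evidence twice, which is why redundancy can hurt the classifier's estimates.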
C4.5 (Quinlan, 1993) is an algorithm that summarises
the training data in the form of a decision tree. Along
with systems that induce logical rules, decision tree algorithms
have proved popular in practice. This is due
in part to their robustness and execution speed, and to
the fact that explicit concept descriptions are produced,
which users can interpret.
C4.5 grows" decision trees recursively, using a
greedy approach to decide which attributes to test at
the nodes of the tree. An information theoretic measure
similar to symmetric uncertainty guides the process.
C4.5 can sometimes overfit training data, resulting
in large trees. In many cases, feature selection can
result in C4.5 producing smaller trees.
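The greedy attribute choice described above can be sketched with C4.5's gain ratio criterion (information gain normalised by the split's own entropy, the measure "similar to symmetric uncertainty" mentioned earlier). This is a minimal illustration of the split selection step only, not the full C4.5 algorithm.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(X, y, i):
    """Information gain of splitting on attribute i, divided by the
    entropy of the split itself (the 'split info'), which counters
    the gain measure's bias toward many-valued attributes."""
    n = len(y)
    partitions = {}
    for xs, label in zip(X, y):
        partitions.setdefault(xs[i], []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(y) - remainder
    split_info = entropy([xs[i] for xs in X])
    return gain / split_info if split_info > 0 else 0.0

def best_attribute(X, y, attrs):
    """Greedy step: test the attribute with the highest gain ratio
    at the current node; recursion on each partition grows the tree."""
    return max(attrs, key=lambda i: gain_ratio(X, y, i))
```

Growing the tree until every leaf is pure is what allows overfitting on noisy data; pruning (or prior feature selection, as the text notes) curbs the resulting tree size.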