1 Introduction
The task of learning a target concept in a given representation language, from
a set of positive and negative realizations of that concept (examples) and some
background knowledge, is called inductive concept learning (ICL) [31]. If the
representation language is a fragment of first-order logic, the task is called
inductive logic programming (ILP) [34].
Real-life learning tasks are often described by nominal as well as continuous,
real-valued attributes. However, most inductive learning systems treat all
attributes as nominal and hence cannot exploit the linear order of real values.
This limitation may negatively affect not only the execution speed but also
the learning capabilities of such systems.
To overcome these drawbacks, continuous-valued attributes are transformed
into nominal ones by splitting the range of the attribute's values into
a finite number of intervals. Alternatively, continuous attributes are treated
by means of inequalities describing attribute subranges, whose boundaries are
computed during the learning process. This process, called discretization, is
supervised when it uses the class labels of the examples, and unsupervised
otherwise.
Discretization can be applied prior to or during the learning process (global
and local discretization, respectively), and can either discretize one attribute
at a time (univariate discretization) or take attribute interdependencies into
account (multivariate discretization) [14].
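To make these distinctions concrete, the following minimal Python sketch (the function name and data are illustrative, not taken from any system discussed here) implements the simplest combination: unsupervised, univariate, global discretization by equal-width binning, applied to a single attribute before learning.

def equal_width_cut_points(values, n_bins):
    """Unsupervised, univariate, global discretization: split the observed
    range of one continuous attribute into n_bins equal-width intervals
    and return the interior cut points."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]

# Example: discretizing one attribute prior to learning (global
# discretization); the class labels are never consulted (unsupervised).
temps = [3.1, 4.7, 10.2, 15.8, 21.4, 29.9]
print(equal_width_cut_points(temps, 3))  # -> [12.033..., 20.966...]

A supervised method would instead place the cut points where they best separate the class labels, as in the entropy-based algorithm discussed next.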
Researchers in the Machine Learning community have introduced many
discretization algorithms; overviews of the various types can be found, e.g.,
in [20, 21, 28, 30]. Most of these algorithms perform an iterative greedy
heuristic search in the space of candidate discretizations, using different
types of scoring functions to evaluate a discretization. For instance, the
popular Fayyad & Irani discretization algorithm [6, 19] considers one attribute
at a time, uses a class information entropy measure to choose a cut point
that partitions the attribute's domain, recursively applies the procedure to
both resulting subintervals, and uses the minimum description length principle
as the criterion for stopping the recursion.
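The following Python sketch illustrates this scheme for a single attribute. It assumes the standard formulation of the MDLP stopping test from the literature cited above; the function names and the toy data are illustrative, and the code is a sketch rather than a reference implementation.

import math
from collections import Counter

def entropy(labels):
    """Class information entropy of a sequence of class labels (in bits)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mdlp_cut_points(values, labels):
    """Recursive entropy-based discretization of one continuous attribute
    with the MDL stopping criterion of Fayyad & Irani. Returns the sorted
    list of accepted cut points."""
    pairs = sorted(zip(values, labels))
    cuts = []
    _split(pairs, cuts)
    return sorted(cuts)

def _split(pairs, cuts):
    n = len(pairs)
    ys = [y for _, y in pairs]
    base_ent = entropy(ys)
    # Choose the candidate cut (midpoint between consecutive distinct
    # values) that minimizes the weighted class entropy of the partition.
    best = None  # (weighted entropy, split index, cut value)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        w = (i * entropy(ys[:i]) + (n - i) * entropy(ys[i:])) / n
        if best is None or w < best[0]:
            best = (w, i, (pairs[i - 1][0] + pairs[i][0]) / 2)
    if best is None:
        return
    w, i, cut = best
    gain = base_ent - w
    # MDL acceptance test: split only if the information gain pays for
    # the cost of encoding the partition (k, k1, k2 = number of classes
    # in the whole set and in the two halves).
    k, k1, k2 = len(set(ys)), len(set(ys[:i])), len(set(ys[i:]))
    delta = math.log2(3 ** k - 2) - (
        k * base_ent - k1 * entropy(ys[:i]) - k2 * entropy(ys[i:]))
    if gain <= (math.log2(n - 1) + delta) / n:
        return  # stop: the split is not worth its description length
    cuts.append(cut)
    _split(pairs[:i], cuts)
    _split(pairs[i:], cuts)

# Toy usage: a clean class boundary near 1.45 is found, and the resulting
# pure subintervals are not split further.
xs = [0.1 * i for i in range(10)] + [2.0 + 0.1 * i for i in range(10)]
ys = ['+'] * 10 + ['x'] * 10
print(mdlp_cut_points(xs, ys))  # -> [1.45] (up to float rounding)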
A typical example showing a drawback of univariate discretization methods
based on class information entropy is the problem of separating the two classes
shown in Figure 1, where positive and negative examples are labeled + and ×,
respectively.
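Although Figure 1 is not reproduced here, the kind of configuration involved can be sketched on a hypothetical XOR-like data set (an assumption for illustration; the actual figure may differ): positives occupy two opposite corners and negatives the other two, so that the projection onto either single attribute mixes the classes and every candidate cut has zero information gain.

import math
from collections import Counter

def entropy(labels):
    """Class information entropy (in bits)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Hypothetical XOR-like layout: '+' marks positives, 'x' negatives.
data = [((0, 0), '+'), ((1, 1), '+'), ((0, 1), 'x'), ((1, 0), 'x')]
labels = [y for _, y in data]
for axis in (0, 1):
    pairs = sorted((p[axis], y) for p, y in data)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = entropy(labels) - (len(left) * entropy(left)
                                  + len(right) * entropy(right)) / len(pairs)
        print(f"axis {axis}, cut before value {pairs[i][0]}: gain = {gain:.3f}")
# Both axes yield gain 0.000 for every candidate cut, so an entropy-based
# univariate method finds no useful cut point, although the two classes
# are easily separated when both attributes are considered jointly.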