There has been much work devoted to speeding up search in set mining
(Narendra and Fukunaga, 1977; Webb, 1995; Bayardo, 1998) and there are many
efficient algorithms when all of the data is discrete or categorical. The problem is
that data is not always discrete and is typically a mix of discrete and continuous
variables. A central problem for set mining, and one that we address in this paper,
is ‘How should continuous values be handled?’
The most common approach to handling continuous values is to discretize
them into a number of disjoint regions and then apply the same set-mining
algorithms used for discrete data.
Discretization is useful in that it can reduce the number of distinct values,
thereby reducing the complexity of the search and the number of mined results