2. The Problem Domain
Data mining on large data warehouses for nuggets of
decision-support knowledge is becoming crucial for
business organizations. Existing algorithms address two
problems: how to mine sequential patterns
quickly and how to maintain the discovered sequential
patterns. Although data mining has been widely and
successfully used in the domain of business operations,
data mining in sport is still in its infancy. So far,
research on incremental updating of sequential pattern
mining has focused on two aspects. The first is how to
update the mined patterns when new transactions and new
data-sequences are appended to the original database. The
second is how to maintain the mined patterns when the
minimum support threshold changes while the original
database does not.
However, in the fields of e-commerce and Web usage
mining, we often delete some information from the sequence
database, either to save storage space or because the
information is no longer interesting or has become
invalid. Data mining for e-commerce sales promotion
systems frequently uses statistical models or techniques
for marketing, sales, and customer relationship
management. The tasks performed in the area of data mining
include classification, estimation, prediction, and
profiling. The incremental updating of sequential pattern
mining when data is deleted [15], [16] has received little
attention in previous studies. When data mining tasks
are outsourced, it is important to protect the following
three elements, from which business intelligence and
private customer information can be extracted:
(i) The source data, which is the database of all the
transactions.
(ii) The mining results, which in our context are the
frequent item sets together with their supports.
(iii) The mining requests, which are the item sets of
interest.
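To make the three elements concrete, the following minimal Python sketch computes the supports of requested item sets over a small transaction database. The function name answer_request, the toy data, and the relative support threshold are illustrative assumptions that do not come from the original work, and the sketch shows only what the three elements are, not any scheme for protecting them.

```python
from typing import Dict, FrozenSet, List, Set


def answer_request(
    transactions: List[Set[str]],
    requested: List[FrozenSet[str]],
    min_support: float,
) -> Dict[FrozenSet[str], float]:
    """Return the requested item sets that are frequent, with their supports.

    transactions : the source data (element i)
    requested    : the mining request, i.e. item sets of interest (element iii)
    return value : the mining result, frequent item sets and supports (element ii)
    """
    n = len(transactions)
    if n == 0:
        return {}
    result: Dict[FrozenSet[str], float] = {}
    for itemset in requested:
        # Relative support: fraction of transactions containing every item.
        support = sum(itemset <= t for t in transactions) / n
        if support >= min_support:
            result[itemset] = support
    return result


# Hypothetical toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
requested = [frozenset({"bread", "milk"}), frozenset({"butter", "milk"})]
mining_result = answer_request(transactions, requested, min_support=0.5)
```

In this toy run, only the item set {bread, milk} reaches the threshold, since it appears in two of the four transactions. Each of the three values (transactions, requested, mining_result) could reveal business or customer information to an outside party that performs the mining.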
In this paper, we investigate this issue and develop a new
algorithm, called MA_D, that maintains the discovered
sequential patterns when some information is deleted from
a sequence database. Our algorithm reuses the information
obtained from prior mining processes and stores the sets
of frequent sequences discovered in the original database
for further mining. It also adopts a new method to
generate the sets of candidate sequences, which reduces
the size of the candidate sets to some extent.
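
The maintenance problem under deletion can be illustrated with a naive baseline. The Python sketch below is not the MA_D algorithm. It only shows how support counts stored from a prior mining run could be updated by scanning the deleted sequences alone. It assumes, for simplicity, that each data-sequence is a list of single items and that the minimum support is a fraction of the database size. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def contains(sequence: Sequence[str], pattern: Pattern) -> bool:
    """Return True if `pattern` occurs in `sequence` as an ordered subsequence."""
    it = iter(sequence)
    # `item in it` advances the iterator, so the order of items is enforced.
    return all(item in it for item in pattern)


def support_count(db: List[Sequence[str]], pattern: Pattern) -> int:
    """Count the data-sequences in `db` that contain `pattern`."""
    return sum(contains(s, pattern) for s in db)


def update_after_deletion(
    old_counts: Dict[Pattern, int],
    deleted: List[Sequence[str]],
    new_db_size: int,
    min_support: float,
) -> Dict[Pattern, int]:
    """Update stored support counts using only the deleted sequences.

    Assumption: `old_counts` holds the exact counts of the patterns that
    were frequent in the original database.
    """
    updated: Dict[Pattern, int] = {}
    for pattern, count in old_counts.items():
        new_count = count - support_count(deleted, pattern)
        if new_count >= min_support * new_db_size:
            updated[pattern] = new_count
    return updated


# Hypothetical example: four sequences, the last one is deleted.
old_counts = {
    ("a",): 3, ("b",): 3, ("c",): 3,
    ("a", "b"): 2, ("a", "c"): 2, ("b", "c"): 2,
}
deleted = [["a", "b"]]
still_frequent = update_after_deletion(
    old_counts, deleted, new_db_size=3, min_support=0.5
)
```

Here the pattern (a, b) loses one supporting sequence and falls below the threshold, while the other stored patterns remain frequent. The baseline does not find patterns that were infrequent before but become frequent because the database shrinks. Finding those requires generating and counting candidate sequences, which is the step the candidate-generation method described above targets.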