2.2. Erasable pattern mining based on tree structure
Erasable pattern mining is a method that discovers all of the
erasable patterns from a given product database according to a
maximum gain threshold given by the user. In particular, unlike
traditional frequent pattern mining approaches, erasable pattern
miners find patterns with gain values smaller than or equal to
a given threshold. META [29] is a representative erasable pattern
mining algorithm based on a Breadth-First Search (BFS) manner.
META performs Apriori-like mining operations to mine erasable
patterns; therefore, it also operates in a generate-and-test manner,
which causes excessive candidate pattern creations. MERIT [30]
is another approach for discovering such patterns on the basis
of tree data structures. The algorithm was proposed to solve
the efficiency problem of META, and it effectively improved the
performance by using its own data structures, WPPC-tree and NCSet.
However, MERIT has several problems as follows. The first
one is the error occurring when the algorithm uses its equivalent
class-based pattern expanding technique. Because of this error,
the algorithm may cause a fatal pattern loss problem during the
mining process [31,23]. The second problem is that it requires two
database scans to mine patterns and does not consider the different
importance of each item in product databases. MERIT+ [23] is a
method solving the pattern loss problem of MERIT by excluding
this defective technique. dMERIT+ [23] is another approach based
on MERIT+. The algorithm additionally employs a hash table
and an advanced version of NC-set, dNC-set, in order to improve
mining efficiency. With more sophisticated considerations for the
ancestor–descendant relations, the algorithm constructs dNC-set
by removing duplicated information that can occur in NC-set. The
hash table of the algorithm is used to map post-order information
to a table form. In addition, there are other erasable pattern mining
algorithms based on list data structures, VME [32] and MEI [31].
VME employs its own data structure, PID_set, to mine erasable
patterns in a different way from the tree-based methods. MEI also
follows a similar way with VME, but it additionally considers the
difference of indexes by using its data structure, dPID_set. In spite of
the various efforts to improve erasable pattern mining techniques,
there is no algorithm for mining erasable patterns over sliding
window-based data streams before the proposed algorithm, WEPS.
In addition, our method is a more advanced approach that can also
deal with different importance of items in data streams.