As discussed in Chapter 5, frequent-pattern mining finds a set of patterns that occur
frequently in a data set, where a pattern can be a set of items (called an itemset),
a subsequence, or a substructure. A pattern is considered frequent if its count satisfies
a minimum support. Scalable methods for mining frequent patterns have been
extensively studied for static data sets. However, mining such patterns in dynamic
data streams poses substantial new challenges. Many existing frequent-pattern mining
algorithms require the system to scan the whole data set more than once, but
this is unrealistic for infinite data streams. How, then, can we perform incremental
updates of frequent itemsets for stream data? An infrequent itemset can become frequent
later on, and hence cannot be ignored. Moreover, a frequent itemset can become
infrequent as well. Because the number of infrequent itemsets is exponential in the
number of items, it is impossible to keep track of all of them.
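To get a sense of this scale: over d distinct items there are 2^d - 1 possible nonempty itemsets, so even a modest item universe rules out tracking every candidate (a small illustrative computation, not part of the original text):

```python
# Number of possible (nonempty) itemsets over d distinct items is 2**d - 1.
# Even d = 50 already yields more candidates than any stream system could track.
for d in (10, 20, 50):
    print(f"d = {d:2d}: {2**d - 1:,} possible itemsets")
```
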
To overcome this difficulty, there are two possible approaches. The first is to keep
track of only a predefined, limited set of items and itemsets. This method has very
limited usage and expressive power because it confines the scope of examination
to a set of itemsets fixed beforehand. The second
approach is to derive an approximate set of answers. In practice, approximate
answers are often sufficient. A number of approximate item or itemset counting
algorithms have been developed in recent research. Here we introduce one such
algorithm: the Lossy Counting algorithm. It approximates the frequency of items
or itemsets within a user-specified error bound, ε. This concept is illustrated as
follows.
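Before working through the concept, it may help to see the single-item variant of Lossy Counting in code. The sketch below is our own illustrative rendering, not the book's pseudocode: it divides the stream into buckets of width w = ⌈1/ε⌉, keeps a (count, Δ) pair per tracked item, and prunes entries at each bucket boundary; the class and method names are assumptions for this example.

```python
import math

class LossyCounter:
    """Illustrative sketch of Lossy Counting for single items.

    Guarantees that each estimated count undercounts the true count
    by at most eps * N, where N is the stream length so far.
    """

    def __init__(self, eps):
        self.eps = eps
        self.width = math.ceil(1 / eps)  # bucket width w = ceil(1/eps)
        self.n = 0                       # total elements seen so far
        self.counts = {}                 # item -> (count, delta)

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # current bucket id
        if item in self.counts:
            count, delta = self.counts[item]
            self.counts[item] = (count + 1, delta)
        else:
            # delta = bucket - 1 bounds how many occurrences the item
            # could have had before it was first inserted
            self.counts[item] = (1, bucket - 1)
        if self.n % self.width == 0:
            # At each bucket boundary, drop entries whose count can no
            # longer matter: count + delta <= current bucket id.
            self.counts = {k: (c, d) for k, (c, d) in self.counts.items()
                           if c + d > bucket}

    def frequent(self, support):
        """Items whose true frequency may reach min support `support`."""
        threshold = (support - self.eps) * self.n
        return {k: c for k, (c, d) in self.counts.items() if c >= threshold}
```

Querying with threshold (s − ε)N rather than sN is what makes the answer approximate: every truly frequent item is reported, but some items with frequency between (s − ε)N and sN may be reported as well.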