4. Discovering frequent itemsets from a real-time data stream
Table 1 lists example transactions. Assume that the window size is 2 (one window holds 2 batches) and the minimum
support threshold is 0.2. window1 consists of Batch1 and Batch2, and window2 consists of Batch2 and Batch3. When Batch3
arrives, the window is already full, so we remove Batch1 from the CanTree and then add Batch3. Fig. 1 shows the CanTree,
iTable, and lTable for window1 (Batch1 and Batch2). For each data item, iTable records the list of nodes in the CanTree
of Fig. 1 that hold that item. lTable records the last node of each transaction in Fig. 1. Fig. 6 shows CanTree after removing
Batch1. We use lTable to remove Batch1. Table 2 shows iTable and lTable for CanTree in Fig. 6. Fig. 7 shows CanTree after
removing Batch1 and adding Batch3 (CanTree for window2). Table 3 shows iTable and lTable for CanTree in Fig. 7. We
add Batch3 simply by sorting the data items of each transaction and inserting the transactions into CanTree. Therefore, a
batch is added without any restructuring of CanTree.
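The window update described above can be illustrated with a minimal sketch. It assumes a lexicographic canonical item order and removes a transaction by walking its sorted path from the root, which is a simplification: it does not model the iTable or lTable used in the paper. The names Node, SlidingCanTree, add_batch, and item_counts are hypothetical and chosen only for this illustration.

```python
from collections import deque


class Node:
    """One tree node: an item, its count, and children keyed by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


class SlidingCanTree:
    """Illustrative canonical-order prefix tree over a sliding window of batches."""

    def __init__(self, window_size):
        self.root = Node(None)
        self.window_size = window_size
        self.batches = deque()  # batches currently in the window, oldest first

    def _insert(self, transaction):
        # Items are sorted into the canonical (here: lexicographic) order,
        # so an insertion never forces existing nodes to be rearranged.
        node = self.root
        for item in sorted(set(transaction)):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            child.count += 1
            node = child

    def _remove(self, transaction):
        # Assumption: removal re-walks the sorted path instead of using lTable.
        node = self.root
        for item in sorted(set(transaction)):
            child = node.children[item]
            child.count -= 1
            if child.count == 0:
                # Only this transaction passed through the node, so the
                # remaining path below it can be dropped with it.
                del node.children[item]
                return
            node = child

    def add_batch(self, batch):
        # When the window is full, evict the oldest batch before adding.
        if len(self.batches) == self.window_size:
            for transaction in self.batches.popleft():
                self._remove(transaction)
        for transaction in batch:
            self._insert(transaction)
        self.batches.append(batch)

    def item_counts(self):
        # Support count of each single item, summed over all of its nodes.
        counts = {}
        stack = list(self.root.children.values())
        while stack:
            node = stack.pop()
            counts[node.item] = counts.get(node.item, 0) + node.count
            stack.extend(node.children.values())
        return counts
```

With a window size of 2, calling add_batch three times leaves only the second and third batches in the tree; the first batch is evicted when the third arrives, and no existing node is reordered at any point.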
We now describe in detail the CanTree-GTree algorithm with a set of transactions in window1. Fig. 1 shows iTable for
CanTree of window1. We use iTable to construct the GTrees that are needed to discover frequent itemsets. Because the
minimum support threshold is 0.2, the minimum support count is ceil(8 × 0.2) = 2. Only the support counts of data items