We use a “stripes” approach. Each term's associated pairs are stored in a hash map H, and H as a whole is emitted as the value, with the term as the key. In contrast, Fan et al. take an alternative approach and directly emit each term and each co-occurring term pair.
Our approach generates far fewer intermediate key-value pairs than Fan et al.’s approach. For example, if a document contains m unique terms, our approach generates O(m) pairs, while Fan et al.’s approach produces O(m^2) pairs.
Because the intermediate outputs produced by the Map() method are sorted locally so that key-value pairs sharing the same key can be grouped, the MapReduce execution framework performs less sorting under our approach and is thus more efficient.
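The difference in intermediate output size can be illustrated with a minimal Python sketch. It is not the implementation used here or by Fan et al. It assumes document-level co-occurrence, ordered pairs, and a count of 1 per co-occurrence, and the function names stripes_map and pairs_map are hypothetical. The pairs mapper emits one record per term plus one per pair, which is one possible reading of "each term and each co-occurring term pair".

```python
from collections import Counter
from itertools import permutations


def stripes_map(terms):
    """Emit one (term, stripe) record per unique term.

    The stripe plays the role of the hash map H: it maps every other
    unique term in the document to a co-occurrence count.
    """
    unique = sorted(set(terms))
    for term in unique:
        stripe = Counter(other for other in unique if other != term)
        yield term, dict(stripe)


def pairs_map(terms):
    """Emit one record per term and one per ordered co-occurring pair.

    Assumption: this mirrors the "emit each term and each pair"
    strategy only in spirit, not in implementation detail.
    """
    unique = sorted(set(terms))
    for term in unique:
        yield term, 1
    for term, other in permutations(unique, 2):
        yield (term, other), 1


# A toy document with m = 4 unique terms.
doc = ["map", "reduce", "hadoop", "map", "sort"]
stripes = list(stripes_map(doc))
pairs = list(pairs_map(doc))

print(len(stripes), len(pairs))  # prints "4 16"
```

With m = 4 unique terms, the stripes mapper emits m = 4 records, while the pairs mapper emits m + m(m - 1) = m^2 = 16 records, so the local sort has four times as many keys to order. Each stripe is larger than a single pair record, so the saving is in the number of records sorted rather than in the total bytes emitted.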