The approach first calculates the frequency count of each term in the corpus and
then extracts association rules between pairs of terms. Pairs of terms are considered
correlated if they co-occur more frequently than would be expected if they were distributed independently. Pairs are selected only if the two terms occur within a window of 20 tokens
of each other. For each correlated pair (A, C), a two-by-two contingency table of the occurrences of A and C is
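constructed and a one-tailed Fisher’s exact p-value is computed. The p-value is the probability
of observing at least as many co-occurrences as were found if the two terms were distributed independently; a small value indicates that the co-occurrence is unlikely to be due to chance.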
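
As an illustration of the test described above, the following minimal sketch computes the expected co-occurrence count under independence and the one-tailed Fisher's exact p-value from a two-by-two table. The function names and the cell layout (a = contexts containing both A and C, b = A without C, c = C without A, d = neither) are assumptions made for this example; the text does not specify how the table cells are counted.

```python
from math import comb


def expected_cooccurrence(a, b, c, d):
    """Expected count of the 'both terms' cell if A and C were independent."""
    n = a + b + c + d
    return (a + b) * (a + c) / n


def fisher_one_tailed(a, b, c, d):
    """One-tailed (greater) Fisher's exact p-value for the 2x2 table
    [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration only):
      a = contexts with both A and C    b = contexts with A only
      c = contexts with C only          d = contexts with neither
    Returns the probability of seeing a or more joint occurrences
    when the row and column totals are held fixed.
    """
    row1 = a + b          # contexts containing A
    row2 = c + d          # contexts not containing A
    col1 = a + c          # contexts containing C
    n = row1 + row2       # all contexts
    k_max = min(row1, col1)
    # Sum the hypergeometric probabilities for k = a, ..., k_max.
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, k_max + 1))
    return tail / comb(n, col1)


# Toy table: 3 joint occurrences where 2 would be expected under independence.
p_value = fisher_one_tailed(3, 1, 1, 3)
```

For the toy table, the observed joint count (3) exceeds the expected count (2), but the p-value of 17/70 (about 0.24) shows that an excess of this size is not unusual for so few observations. The sketch uses exact integer arithmetic from the standard library; for large corpora, a library routine working in log space would be a more practical choice.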