As suggested by Nation (2001:18), “one way of making a technical vocabulary is to
compare the frequency of words in a specialized text with their frequency in a general
corpus.” Putting this suggestion into practice, Chujo & Utiyama (2004) and Utiyama et al.
(2004) proposed using multiple statistical measures to compare these two
kinds of frequencies and to extract specialized word lists at various levels. Such measures
include the ‘log-likelihood ratio,’ or LLR (Dunning, 1993), and ‘mutual information,’ or MI
(Church & Hanks, 1989). They suggested that LLR, for example, identifies
intermediate-level or sub-technical words, and that MI identifies
upper-intermediate-level or technical words. Currently, LLR and MI are the two most
commonly used statistical measures in the field of corpus linguistics.
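
To make the comparison of the two kinds of frequencies concrete, the following is a minimal sketch of how LLR and MI scores might be computed for each word in a specialized corpus against a general corpus. It assumes a common textbook formulation (the G-squared statistic over a 2x2 contingency table for LLR, and pointwise mutual information between a word and the specialized corpus for MI); the passage above does not give the exact formulas used by Chujo & Utiyama (2004), so the details here are assumptions. The function names (`llr`, `mi`, `score_words`) and the toy token lists are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def _term(observed, expected):
    # One cell's contribution O * ln(O / E); taken as 0 when O == 0.
    return observed * math.log(observed / expected) if observed > 0 else 0.0


def llr(a, b, n_spec, n_gen):
    """Log-likelihood ratio (G^2) for one word over a 2x2 table.

    a, b: frequency of the word in the specialized / general corpus.
    n_spec, n_gen: total token counts of the two corpora.
    Assumed formulation, not necessarily the one used in the cited studies.
    """
    c, d = n_spec - a, n_gen - b          # all other tokens in each corpus
    total = n_spec + n_gen
    word, other = a + b, c + d            # row totals
    expected = (
        word * n_spec / total, word * n_gen / total,
        other * n_spec / total, other * n_gen / total,
    )
    return 2.0 * sum(_term(o, e) for o, e in zip((a, b, c, d), expected))


def mi(a, b, n_spec, n_gen):
    """Pointwise mutual information (in bits) between a word and the
    specialized corpus: log2( P(word, spec) / (P(word) * P(spec)) )."""
    if a == 0:
        return float("-inf")
    return math.log2(a * (n_spec + n_gen) / ((a + b) * n_spec))


def score_words(spec_tokens, gen_tokens):
    """Return {word: (llr, mi)} for every word in the specialized corpus."""
    spec, gen = Counter(spec_tokens), Counter(gen_tokens)
    n_spec, n_gen = len(spec_tokens), len(gen_tokens)
    return {
        w: (llr(a, gen[w], n_spec, n_gen), mi(a, gen[w], n_spec, n_gen))
        for w, a in spec.items()
    }


# Toy, made-up token lists purely for illustration.
specialized = ["enzyme", "the", "enzyme", "cell"]
general = ["the", "the", "a", "cell"]
scores = score_words(specialized, general)
```

Sorting `scores` by either value yields a ranked candidate list. Note that G-squared as computed here is non-directional: it is also large for words that are unusually rare in the specialized corpus, so in practice it is often combined with a check that the word's relative frequency is higher in the specialized corpus than in the general one.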