The original information-based semantic similarity approach was introduced by Resnik
[23], in which the similarity of two concepts is the maximum of the information content
of the concepts that subsume them in the taxonomy hierarchy [Equation 24]. The
information content of a concept depends on the probability of encountering an instance
of that concept in a corpus, and the information content is calculated as the negative
logarithm of that probability [Equation 28]. The probability of a concept is, in turn,
determined by the frequency of occurrence of the concept and its subconcepts in the
corpus [Equation 27]. As the information-based measures use corpus statistics, these
similarity measures can be adapted well to particular applications using suitable corpora.
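
The computation described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the toy taxonomy (PARENT), the corpus counts (COUNTS), and all function names are hypothetical and do not come from Resnik's work, and the natural logarithm is assumed because the base is not fixed here. The frequency of a concept is taken as its own count plus the counts of all its subconcepts, the probability is that frequency divided by the frequency of the root, and the similarity is the maximum information content over the common subsumers.

```python
import math

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "tree": "plant",
}

# Hypothetical corpus counts of direct mentions of each concept.
COUNTS = {"entity": 0, "animal": 10, "plant": 5, "dog": 40, "cat": 30, "tree": 15}


def ancestors(concept):
    """Return the concept followed by all of its subsumers, up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def cumulative_frequencies():
    """Frequency of a concept = its own count plus the counts of its subconcepts."""
    freq = {c: 0 for c in PARENT}
    for concept, count in COUNTS.items():
        for subsumer in ancestors(concept):
            freq[subsumer] += count
    return freq


FREQ = cumulative_frequencies()
TOTAL = FREQ["entity"]  # the root subsumes every counted instance


def information_content(concept):
    """IC(c) = -log p(c), with p(c) = freq(c) / TOTAL (natural log assumed)."""
    return -math.log(FREQ[concept] / TOTAL)


def resnik_similarity(c1, c2):
    """Maximum IC over the concepts that subsume both c1 and c2."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(information_content(c) for c in common)
```

With these toy counts, "dog" and "cat" share the subsumer "animal", so their similarity equals the information content of "animal", whereas "dog" and "tree" share only the root, whose information content is zero. A concept whose cumulative frequency is zero would need smoothing before taking the logarithm; the sketch does not handle that case.
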
For more information about the pure information-based approach, please refer to Resnik’s
work [22]. Following Resnik’s work, some information-based measures were introduced
to improve the performance of the pure information-based approach by considering the
weight/strength of edges/links between concept nodes in the ontology. The links between
ontology nodes are not equal in terms of strength/weight, and link strength can be
determined by local density, information content, and link type [9,26]. The measure of
Jiang and Conrath [9] determines the similarity of two concept nodes by calculating the
“weighted path” between them, that is, by summing all weighted links between them
[Equation 25]. The measure of Lin [Equation 26] is similar to the measure of Wu
and Palmer [Equation 5], except that Lin’s measure uses the information content of
concept nodes instead of their depth; in effect, the depth is replaced by a “weighted
depth”. The following are the formulas of the Resnik, Jiang and Conrath, and Lin measures. They
all use the information content (IC) of the individual concept nodes C1 and C2 and/or of
their LCS (least common subsumer):