2) Resnik similarity Algorithm
Philip Resnik (1995) [9], of Sun Microsystems Laboratories, presents an alternative to path finding based on the notion of information content. This is a measure of specificity assigned to each concept in a hierarchy, based on evidence found in a corpus. A concept with high information content is very specific, while a concept with lower information content is more general. The information content of a concept is estimated by counting the frequency of that concept in a large corpus, along with the frequency of all the concepts that are subordinate to it in the hierarchy. The probability of a concept is determined via a maximum likelihood estimate, and the information content is the negative log of this probability.
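The estimate described above can be illustrated with a minimal sketch. The hierarchy, the concept names, and the counts below are hypothetical values chosen for illustration only; they do not come from Resnik's paper or from any real corpus.

```python
import math

# Hypothetical toy hierarchy (child -> parent); "entity" is the root.
PARENT = {
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
}

# Hypothetical corpus counts for the words attached directly to each concept.
COUNTS = {"entity": 0, "animal": 10, "plant": 30, "dog": 40, "cat": 20}


def cumulative_counts(parent, counts):
    """Credit each concept's count to itself and to every ancestor."""
    total = {c: 0 for c in counts}
    for concept, n in counts.items():
        node = concept
        while node is not None:
            total[node] += n
            node = parent.get(node)  # None once the root is passed
    return total


def information_content(parent, counts):
    """IC(c) = -log P(c), where P(c) is the maximum likelihood estimate."""
    total = cumulative_counts(parent, counts)
    n_words = sum(counts.values())  # N: all counted occurrences
    # Concepts never observed have P(c) = 0 and are left out.
    return {c: -math.log(total[c] / n_words) for c in total if total[c] > 0}


IC = information_content(PARENT, COUNTS)
```

In this toy setup the root covers every counted occurrence, so its probability is 1 and its information content is 0, while a leaf such as "dog" receives a higher value than its parent "animal".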
Resnik defines a measure of similarity that holds that two concepts are semantically related in proportion to the amount of information they share. The quantity of shared information is determined by the information content of the lowest concept in the hierarchy that subsumes both of the given concepts.
Information-content word similarity:
• Still relies on the structure of the thesaurus
• Refines the path-based approach using normalizations based on hierarchy depth
• Represents the distance associated with each edge
• Adds probabilistic information derived from a corpus
• Probability of a random word being an instance of a concept:
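Written out from the description above, this probability and the measures built on it take the following form. The notation is an assumption introduced here for illustration, not taken from the source: words(c) is the set of words subsumed by concept c, count(w) is the corpus frequency of w, N is the total number of counted word occurrences, and LCS(c1, c2) is the lowest concept in the hierarchy that subsumes both c1 and c2.

```latex
\begin{align*}
P(c) &= \frac{\sum_{w \in \mathrm{words}(c)} \mathrm{count}(w)}{N} \\
\mathrm{IC}(c) &= -\log P(c) \\
\mathrm{sim}_{\mathrm{Resnik}}(c_1, c_2) &= \mathrm{IC}\bigl(\mathrm{LCS}(c_1, c_2)\bigr)
\end{align*}
```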