Although the structural similarity can represent the
weight of an SI-Tree, it does not account for text
similarity, which corresponds to document relevance in
the IR literature; yet the number of keywords in an
SI-Tree also indicates its relevance. Therefore, a text
similarity measure is
developed based on term frequency, inverse document
frequency, and normalized document length. A larger
term frequency yields a larger text similarity score, and
a larger inverse document frequency indicates that the
SI-Tree is more closely related to the keyword query.
Moreover, the normalized document length is used to
normalize the overall text similarity score, since more
terms in an SI-Tree indicate a higher probability that