mainly used to describe how the structural information
is organized. For instance, DBLP [42] is a typical
data-centric XML document. Most existing studies use
the subtrees that contain all of the keywords (under
conjunctive predicates) or some of the keywords (under
disjunctive predicates) as the results of a given
keyword query. They first compute the LCAs of the
content nodes that directly contain at least one keyword
and then take the subtrees rooted at those LCAs as
the results. However, this computation is expensive
and therefore inefficient. In addition, existing approaches
leave out relevant nodes that do not themselves contain
any keyword. For example, in Fig. 1,
suppose a user wants to retrieve the authors of the
papers that were published in 2006 and contain the keyword
“XML”, but does not know which keyword to enter
for the author, since conferences and journals
use different structures for the author tag. Thus,
existing studies cannot handle this situation, since it is