To achieve high performance, previous work on FAQ retrieval relied on high-level knowledge bases or handcrafted rules.
However, constructing these knowledge bases and rules is time-consuming and labor-intensive, and it must be repeated whenever the application domain
changes. To overcome this problem, we propose a high-performance FAQ retrieval system that uses only users’ query logs
as knowledge sources. During indexing time, the proposed system efficiently clusters users’ query logs using classification
techniques based on latent semantic analysis. During retrieval time, the proposed system smooths FAQs using the query
log clusters. In our experiments, the proposed system outperformed conventional information retrieval systems in FAQ
retrieval. Based on various experiments, we found that the proposed system could alleviate critical lexical disagreement
problems in short document retrieval. In addition, we believe that the proposed system is more practical and reliable than
previous FAQ retrieval systems because it uses only data-driven methods and requires no high-level knowledge sources.
© 2006 Elsevier Ltd. All rights reserved.
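
The retrieval-time idea summarized above, smoothing a short FAQ with the query logs clustered around it, can be illustrated with a minimal sketch. The abstract does not give the system's actual smoothing formula, so the linear interpolation of unigram models used here is an assumption chosen for illustration. The function names (`unigram`, `smoothed_score`, `rank`), the parameters `lam` and `eps`, and the toy FAQs and query logs are all hypothetical. The clusters are given by hand, so the indexing-time step based on latent semantic analysis is not shown.

```python
from collections import Counter
import math


def unigram(texts):
    """Relative term frequencies over a list of whitespace-tokenized texts."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def smoothed_score(query, faq_text, cluster_logs, lam=0.5, eps=1e-6):
    """Log-likelihood of the query under a FAQ model mixed with its cluster model.

    lam weights the FAQ's own terms; (1 - lam) weights the query-log cluster.
    This interpolation is an assumed stand-in, not the paper's formula.
    """
    p_faq = unigram([faq_text])
    p_cluster = unigram(cluster_logs) if cluster_logs else {}
    score = 0.0
    for tok in query.lower().split():
        p = lam * p_faq.get(tok, 0.0) + (1 - lam) * p_cluster.get(tok, 0.0)
        score += math.log(p + eps)  # eps avoids log(0) for unseen terms
    return score


# Hypothetical FAQs and hand-made query-log clusters (one cluster per FAQ).
faqs = {
    "faq_cost": "how much does the service cost",
    "faq_cancel": "how do i cancel my account",
}
clusters = {
    "faq_cost": ["what is the monthly fee", "price of the service"],
    "faq_cancel": ["close my account", "stop my subscription"],
}


def rank(query, use_clusters=True):
    """Return (faq_id, score) pairs sorted from best to worst."""
    scored = [
        (fid, smoothed_score(query, text, clusters[fid] if use_clusters else []))
        for fid, text in faqs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(rank("monthly fee", use_clusters=False))
    print(rank("monthly fee", use_clusters=True))
```

In this toy setting the query "monthly fee" shares no word with either FAQ, so the unsmoothed scores tie. Once the query-log cluster is mixed in, the cost-related FAQ ranks first, which is the kind of lexical disagreement the abstract refers to.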