To distinguish the different subject domains within the documents found for each entity,
clustering techniques are used. Recall that the retrieval session is keyword-based (Step 1.2); consequently,
the terms (entities) can belong to many domains. Clustering makes it possible to find these domains. The Lingo
algorithm, from the Carrot2 API (Carrot2, 2009), is used since it performs well for both snippets and full-text
documents. The result of this step is a set of clusters for each entity. In addition, for each cluster a
cluster feature vector (CLFV) is created. A CLFV is a combination of all the DFVs of a cluster. In the
following step, we select the relevant cluster w.r.t. the domain of interest.
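
The construction of a CLFV can be illustrated with a minimal sketch. The text does not specify how the DFVs are combined, so the sketch assumes that each DFV is a mapping from terms to weights and that the combination is a term-wise sum; the function name build_clfv and the sample terms are hypothetical and serve only as illustration.

```python
from collections import Counter


def build_clfv(dfvs):
    """Combine the document feature vectors (DFVs) of one cluster.

    Assumption: each DFV maps a term to a numeric weight, and the
    'combination' is a term-wise sum. Other schemes (averaging,
    maximum weight) would fit the description equally well.
    """
    clfv = Counter()
    for dfv in dfvs:
        for term, weight in dfv.items():
            clfv[term] += weight
    return dict(clfv)


# Hypothetical DFVs of two documents assigned to the same cluster.
cluster_dfvs = [
    {"jaguar": 2.0, "car": 1.0},
    {"jaguar": 1.0, "engine": 3.0},
]
clfv = build_clfv(cluster_dfvs)
```

Under this assumption, terms shared by several documents of a cluster accumulate weight, which would make them more prominent when the clusters are compared against the domain of interest.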