The evaluation framework was experimental (Lykke & Eslau, p. 91). Testing was performed on a database of 25,384 documents, and 10 “realistic” search tasks (SJ1-SJ10) were developed. Relevance was assessed on a 4-point scale (highly relevant, fairly relevant, marginally relevant, and irrelevant). Precision and relative recall were calculated for each strategy and search task.
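The two measures can be illustrated with a minimal sketch. It rests on assumptions not stated in the source: the 4-point scale is coded numerically as 3 to 0, a document counts as relevant when its grade meets a chosen threshold, and relative recall is taken against the pool of relevant documents retrieved by all strategies for the same task. The function names, the grade coding, and the sample document IDs are hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical numeric coding of the 4-point relevance scale (an assumption).
GRADES: Dict[str, int] = {
    "highly relevant": 3,
    "fairly relevant": 2,
    "marginally relevant": 1,
    "irrelevant": 0,
}


def precision(retrieved: Set[str], judgments: Dict[str, int], threshold: int = 1) -> float:
    """Share of retrieved documents whose grade is at or above the threshold."""
    if not retrieved:
        return 0.0
    relevant = sum(1 for doc in retrieved if judgments.get(doc, 0) >= threshold)
    return relevant / len(retrieved)


def relative_recall(
    retrieved: Set[str],
    pooled: Iterable[str],
    judgments: Dict[str, int],
    threshold: int = 1,
) -> float:
    """Relevant documents found by one strategy, divided by the relevant
    documents found by all strategies together (the pool)."""
    pool_relevant = {doc for doc in pooled if judgments.get(doc, 0) >= threshold}
    if not pool_relevant:
        return 0.0
    return len(pool_relevant & retrieved) / len(pool_relevant)


# Made-up judgments for a single search task, for illustration only.
judgments = {
    "d1": GRADES["highly relevant"],
    "d2": GRADES["marginally relevant"],
    "d3": GRADES["irrelevant"],
    "d4": GRADES["fairly relevant"],
}
strategy_a = {"d1", "d3"}
strategy_b = {"d1", "d2", "d4"}
pool = strategy_a | strategy_b

print(precision(strategy_a, judgments))
print(relative_recall(strategy_a, pool, judgments))
```

Raising `threshold` to 2 or 3 would restrict the calculation to fairly or highly relevant documents; whether the study binarized the scale this way is not stated in the passage.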