Pseudo-relevance feedback automates the “manual” part of true relevance feedback.
Pseudo-relevance feedback algorithm:
Retrieve a ranked list of hits for the user’s query.
Assume that the top k documents are relevant.
Do relevance feedback (e.g., Rocchio), using those k documents as the relevant set.
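The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes raw term-frequency vectors, cosine ranking, and the positive part of the Rocchio update only, since pseudo-relevance feedback has no judged non-relevant documents. The function names (tf_vector, cosine, rank, pseudo_relevance_feedback), the weights alpha and beta, and the toy documents are all made up for the example.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector as a dict (a deliberately simple weighting)."""
    return dict(Counter(text.lower().split()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_vec, doc_vecs):
    """Step 1: document indices sorted by decreasing similarity to the query."""
    return sorted(range(len(doc_vecs)),
                  key=lambda i: cosine(query_vec, doc_vecs[i]),
                  reverse=True)


def pseudo_relevance_feedback(query_vec, doc_vecs, k=2, alpha=1.0, beta=0.75):
    """Steps 2 and 3: treat the top k hits as relevant and apply a Rocchio-style
    update, new_q = alpha * q + beta * centroid(top k). No negative term is used
    because no documents have been judged non-relevant."""
    top_k = rank(query_vec, doc_vecs)[:k]
    new_q = {t: alpha * w for t, w in query_vec.items()}
    if not top_k:
        return new_q
    for i in top_k:
        for t, w in doc_vecs[i].items():
            new_q[t] = new_q.get(t, 0.0) + beta * w / len(top_k)
    return new_q


# Toy collection (hypothetical data, for illustration only).
docs = [
    "jaguar car engine speed",
    "jaguar car dealer price",
    "jaguar cat jungle habitat",
]
doc_vecs = [tf_vector(d) for d in docs]

query = tf_vector("jaguar car")
expanded = pseudo_relevance_feedback(query, doc_vecs, k=2)
print(sorted(expanded.items(), key=lambda kv: -kv[1]))
print(rank(expanded, doc_vecs))
```

Here the expanded query picks up terms such as "engine" and "dealer" from the two top-ranked documents. If the top k hits had instead been about the animal, the same update would have pulled the query toward "cat" and "jungle", which is how the method can fail for an individual query.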
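Works very well on average.
But can go horribly wrong for some queries, e.g., when few of the top k documents are actually relevant.
Several iterations can cause query drift: each round can move the expanded query further from the user’s original information need.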