In their paper "Predicting Relevance based on Assessor Disagreement: Analysis and Practical Applications for Search Evaluation," Demeester et al. (2016) address the generalizability of relevance assessments and how it affects the reliability of retrieval evaluation results.