The story is interesting and the results are potentially relevant, particularly given the paucity of studies on inference when assessing inter-rater agreement. Nevertheless, I have some major concerns with this paper and, in what follows, I’ll try to suggest possible ways to improve it.