In addition to the agreement rate, which reflects overall rater performance at the item level, the ETS
Statistical Analysis (SA) team analyzes responses that received double ratings within a 3-month
period to identify individual raters whose scoring behavior is inconsistent with that of other raters,
so that additional training can be provided to these individuals. Each rater's scoring leniency or
severity and scoring scale preferences are evaluated by comparing the rater's mean, standard
deviation, and score distribution with those of the final ratings assigned to the same responses,
which may come from different items on different forms.
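
As a rough illustration of this kind of comparison, the sketch below (Python with pandas, using the hypothetical column names rater_id, rater_score, and final_score) summarizes each rater's mean, standard deviation, and mean difference from the final ratings over a set of double-scored responses. It is not the SA team's actual procedure, only a minimal approximation of the summary statistics described above.

```python
import pandas as pd

def summarize_rater_severity(df: pd.DataFrame) -> pd.DataFrame:
    """Compare each rater's score statistics with the final ratings of the
    same double-scored responses to flag possible leniency or severity.

    Expects hypothetical columns: rater_id, rater_score, final_score.
    """
    # Signed difference per response: positive values suggest leniency,
    # negative values suggest severity relative to the final rating.
    df = df.assign(score_diff=df["rater_score"] - df["final_score"])

    summary = (
        df.groupby("rater_id")
          .agg(
              n_responses=("rater_score", "size"),
              rater_mean=("rater_score", "mean"),
              final_mean=("final_score", "mean"),
              mean_diff=("score_diff", "mean"),
              rater_sd=("rater_score", "std"),
              final_sd=("final_score", "std"),
          )
          .reset_index()
    )
    return summary

if __name__ == "__main__":
    # Made-up example data for three raters over a handful of responses.
    data = pd.DataFrame({
        "rater_id":    ["R1", "R1", "R2", "R2", "R2", "R3", "R3"],
        "rater_score": [3,    4,    2,    2,    3,    4,    4],
        "final_score": [3,    3,    3,    3,    4,    4,    3],
    })
    print(summarize_rater_severity(data))
```

Score-distribution comparisons (e.g., tabulating how often a rater uses each scale point versus the final ratings) could be added in the same spirit, but the core leniency/severity check reduces to the mean and standard deviation comparison shown here.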