Cohen introduced the statistics kappa (Cohen, 1960) and weighted kappa (Cohen, 1968) to measure the degree of agreement between two raters who rate each of a sample of subjects on a nominal scale.
Both kappa and weighted kappa incorporate a correction for the extent of agreement expected by chance alone.
Kappa is appropriate when all disagreements are regarded as equally serious; weighted kappa is useful when the relative seriousness of the different kinds of disagreement can be specified.
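In the notation adopted here for illustration (not taken from the original), let $p_o$ and $p_e$ denote the observed and chance-expected proportions of agreement, $p_{ij}$ and $e_{ij}$ the observed and chance-expected proportions in cell $(i,j)$ of the two raters' cross-classification, and $w_{ij}$ agreement weights scaled so that $w_{ii}=1$ and $0 \le w_{ij} \le 1$. The two statistics then take the familiar chance-corrected form
\[
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
\kappa_w = \frac{\sum_{i,j} w_{ij}\, p_{ij} - \sum_{i,j} w_{ij}\, e_{ij}}{1 - \sum_{i,j} w_{ij}\, e_{ij}} .
\]
With $w_{ii}=1$ and $w_{ij}=0$ for $i \ne j$, weighted kappa reduces to kappa.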
Properties of these two statistics have been studied by Everitt (1968) and by Fleiss, Cohen and Everitt (1969).