The typical procedure for testing concurrent criterion validity can be illustrated by the screening tests described in Chapter 8 on mental status testing; their validation represents a major category of criterion validation studies. Here, the test is applied to a varied sample of respondents, some of whom suffer from the condition of interest (e.g., dementia) and some of whom do not. The criterion takes the form of a diagnosis made independently by a clinician who has not seen the test result; it could also include information from magnetic resonance imaging scans, neuropsychological assessments or other diagnostic testing. Statistical analyses show how well the test agrees with the diagnosis and also identify the threshold score on the test that most clearly distinguishes between healthy and sick respondents. Note that in this screening test paradigm, the goal is usually to show how well the new test divides the sample into two groups, healthy and sick, but criterion validation can also be used to show agreement with a scaled score of severity. In the two-category paradigm, two potential errors can occur: the test may fail to identify people who have the disease, or it may falsely classify people without the disease as being sick. The “sensitivity” of a test refers to the proportion of people with the disease who are correctly classified as diseased by the test, while “specificity” refers to the proportion of people without the disease who are correctly classified as disease-free by the test. For those unfamiliar with these terms, the crucial element to recognize is that the denominators are the people who truly have, or do not have, the disease according to the criterion standard. The terms sensitivity and specificity are logical: the sensitivity of the test indicates whether the test can sense or detect the presence of the disease, whereas a specific test identifies only that disease and not another condition. Specificity corresponds to “discriminal validity” in the language of psychometrics.
Accordingly, testing specificity may involve comparing scores for people with the disease with those of others who have different diseases, rather than with those of people who are completely healthy; as is so often the case in research, the choice of comparison group is subtle but critical.
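
The two-category calculation described above can be made concrete with a short sketch. It is illustrative only: the function names, the scores, and the diagnoses are invented for the example, and it assumes that lower test scores indicate impairment. To pick a threshold it uses Youden's J (sensitivity + specificity - 1), which is one common criterion among several and is not necessarily the one used in the validation studies discussed here.

```python
from typing import Optional, Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], has_disease: Sequence[bool], cutoff: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a given cutoff.

    Assumption for this sketch: a score at or below the cutoff counts as
    test-positive, i.e. lower scores indicate impairment.
    """
    tp = fn = tn = fp = 0
    for score, diseased in zip(scores, has_disease):
        test_positive = score <= cutoff
        if diseased:
            if test_positive:
                tp += 1  # diseased and detected by the test
            else:
                fn += 1  # diseased but missed by the test
        else:
            if test_positive:
                fp += 1  # not diseased but classified as sick
            else:
                tn += 1  # not diseased and classified as healthy
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased respondent")
    # The denominators are the groups defined by the criterion standard,
    # not the groups defined by the test result.
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(
    scores: Sequence[float], has_disease: Sequence[bool]
) -> Tuple[float, float, float, float]:
    """Try each observed score as a cutoff; return (cutoff, J, sens, spec)
    for the cutoff with the largest Youden's J."""
    best: Optional[Tuple[float, float, float, float]] = None
    for cutoff in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, has_disease, cutoff)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    if best is None:
        raise ValueError("no scores supplied")
    return best


# Invented data: ten respondents, five with a criterion diagnosis.
scores = [17, 19, 21, 22, 26, 23, 25, 27, 28, 29]
has_disease = [True, True, True, True, True, False, False, False, False, False]

cutoff, j, sens, spec = best_cutoff(scores, has_disease)
print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
# cutoff 22: sensitivity 0.80, specificity 1.00
```

Replacing the healthy respondents in `has_disease` with people who have a different disease would change the specificity estimate without any change to the code, which is why the choice of comparison group matters.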