Content validity
Once the items have been generated from these various sources, the scale developer is ideally left with far more items than will ultimately end up on the scale. In Chapter 5 we will discuss various statistical techniques to select the best items from this pool. For the moment, though, we address the converse of this: ensuring that the scale has enough items and adequately covers the domain under investigation. The technical term for this is content validity, although some theorists have argued that 'content relevance' and 'content coverage' would be more accurate descriptors (Messick 1980).

These concepts arose from achievement testing, where students are assessed to determine whether they have learned the material in a specific content area; final examinations are the prime example. With this in mind, each item on the test should relate to one of the course objectives (content relevance). Items which are not related to the content of the course introduce error into the measurements, in that they discriminate among the students on some dimension other than the one purportedly tapped by the test, a dimension that may be totally irrelevant to it. Conversely, each part of the syllabus should be represented by one or more questions (content coverage). If not, then students may differ in some important respects, but this would not be reflected in the final score.

Table 3.1 shows how these two components of content validity can be checked in a course of, for example, cardiology. Each row reflects a different item on the test, and each column a different content area. Every item is examined in turn, and a mark placed in the appropriate column(s). Although a single number does not emerge at the end, as with other types of validity estimates, the visual display yields much information.
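The blueprint check described above lends itself to a simple mechanical form. The sketch below is a minimal illustration, not part of the original text: the item names and cardiology content areas are hypothetical, and the item-to-area mapping stands in for the judgements a reviewer would record in a table like Table 3.1. It flags items that mark no column (poor content relevance) and columns that no item marks (poor content coverage).

```python
# Hypothetical content-validity blueprint for a cardiology examination.
# Rows are test items, columns are content areas; a mark means the
# reviewer judged that item to tap that area. All names are illustrative.

content_areas = ["anatomy", "electrophysiology", "pharmacology", "imaging"]

# Which content area(s) each item was judged to cover.
item_map = {
    "item_1": ["anatomy"],
    "item_2": ["electrophysiology", "pharmacology"],
    "item_3": [],            # relates to no course objective
    "item_4": ["anatomy"],
}

# Content relevance: every item should mark at least one column.
irrelevant_items = [item for item, areas in item_map.items() if not areas]

# Content coverage: every column should be marked by at least one item.
covered = {area for areas in item_map.values() for area in areas}
uncovered_areas = [a for a in content_areas if a not in covered]

print("Items tapping no content area:", irrelevant_items)
print("Content areas with no items:", uncovered_areas)
```

As in the text, no single number emerges; the point is the display itself, here reduced to the two lists of gaps a reviewer would want to see.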