Wordiness should make the items more time-consuming so that fewer could
be included in a test of given duration. Obviously a test composed of simple
items will yield more independent scorable responses per hour of testing time and
hence will tend to yield more reliable scores than a test composed of complex
items. Simple test items should also be easier to comprehend and present fewer
ambiguities or occasions for misinterpretation by the examinees. Because of
these differences one would expect scores of higher reliability from simple than
from complex items in tests of similar duration. Experimental studies by Howard
(1943) and by Ebel (1953) have confirmed these expectations. It seems difficult
to obtain scores of reasonable reliability in tests of reasonable duration if the test
items are situation-based. This has been true of patient-management problems in
medicine (Skakun, 1979), of air crew problems derived from critical incidents in
military aviation, and of simulations in legal education (Alderman, Evans, &
Wilder, 1981). There seems to be an inverse relation between the realism of the
problem situations in the test and the reliability of the scores yielded by the test.
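The argument that simple items yield more reliable scores because more of them fit into a given testing time can be made concrete with the Spearman-Brown prophecy formula, a standard psychometric result that this passage does not itself invoke. The sketch below is illustrative only. It assumes that both item types have the same average inter-item correlation, and it treats that correlation as the reliability of a one-item test. The per-hour item counts and the correlation value are hypothetical and are not taken from Howard (1943), Ebel (1953), or the other studies cited.

```python
def spearman_brown(reliability: float, k: float) -> float:
    """Predicted reliability of a test lengthened by a factor k,
    assuming the added items are parallel to the existing ones."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if k <= 0:
        raise ValueError("length factor k must be positive")
    return k * reliability / (1 + (k - 1) * reliability)


# Hypothetical figures, not data from the studies cited above.
# Assumption: both formats share the same average inter-item correlation,
# used here as the reliability of a single-item test.
ITEM_CORRELATION = 0.10
COMPLEX_ITEMS_PER_HOUR = 20  # hypothetical count of complex items answered per hour
SIMPLE_ITEMS_PER_HOUR = 60   # hypothetical: three times as many simple items fit

# Lengthening a one-item test to n items is Spearman-Brown with k = n.
complex_rel = spearman_brown(ITEM_CORRELATION, COMPLEX_ITEMS_PER_HOUR)
simple_rel = spearman_brown(ITEM_CORRELATION, SIMPLE_ITEMS_PER_HOUR)
print(f"complex: {complex_rel:.2f}, simple: {simple_rel:.2f}")
# prints "complex: 0.69, simple: 0.87"
```

Under these assumptions, tripling the number of scorable responses in the same hour raises predicted reliability from about 0.69 to about 0.87. If simple items are also less ambiguous, as the passage suggests, their inter-item correlation would likely be higher, and the gap would widen further.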