INTRODUCTION

Reliability can be defined broadly as the degree to which measures are free from error and therefore yield consistent results (6). One of the basic methods of reliability estimation is the test-retest correlation coefficient. Within the field of advertising research, the reliability of alternative copy testing methods is often of interest, and as Silk (7) notes, the measure of reliability most often reported for such studies in the advertising literature is the test-retest correlation. Other areas of marketing research are, of course, also
concerned with reliability estimation. For example, Peter (6) cites several studies in the consumer behavior area that report test-retest reliability estimates for the measurement scales used.

Both Silk (7) and Peter (6) recognize the potential for serious problems when employing the test-retest correlation coefficient as a reliability measure: nonuniform lengths of time between the test and retest measures, the occurrence of a shift in true score values between measures, and, in general, a lack of controls to achieve comparable test and retest conditions. One particular practice in the advertising literature called into question by Silk (7) is that of using data originally gathered for purposes other than error measurement as the basis for computing reliability measures for a particular copy testing method. This practice pays no attention to the requirements that test-retest data must satisfy to produce meaningful results. Expressing concern over its uncritical application in the literature, Silk describes the theoretical conditions under which a correlation coefficient between observed test and retest scores is equivalent to a reliability coefficient. Silk then develops diagnostic tests that can alert the researcher to certain departures from the assumptions necessary for the test-retest correlation to serve as a reliability estimate.

While a researcher may be able to assure more valid measures of reliability by following the suggestions and diagnostic checks for score distribution stability over time recommended by Silk, computed reliabilities may still be strongly influenced by other effects discussed in this paper. The author believes these additional considerations are generally unrecognized and unappreciated; furthermore, they may make comparisons of reliability between various studies highly questionable.
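To make the basic procedure concrete, the following is an illustrative sketch (not drawn from any study cited here) of how a test-retest reliability estimate is computed: under classical test theory each observed score is X = T + E, and for parallel test and retest measurements the Pearson correlation between the two administrations estimates the reliability ratio var(T)/var(X). The recall scores below are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between paired score lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical recall scores for five commercials, each tested twice.
test   = [24, 31, 18, 40, 27]
retest = [26, 29, 20, 38, 30]

r = pearson_r(test, retest)  # test-retest reliability estimate
```

Note that the correlation is interpretable as a reliability coefficient only when the parallel-measurement assumptions hold; as discussed above, shifts in true scores or noncomparable testing conditions between administrations can distort the estimate.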
In addition, it will be shown that even a single study's reliability estimate can be strongly influenced by factors having little to do with "freedom from error" or "consistent results," the terms used by Peter (6) in defining reliability. The problem setting used to illustrate these concerns will be test-retest scores for advertising copy testing methods, although other marketing and advertising research areas could have been chosen.