Procedures
We used the theory-building and theory-testing
axes shown in Figure 1 to code the articles. Both
axes were conceptualized as “nearly interval”
scales (Schwab, 2005), with the anchor descriptions
in the figure used to reduce ambiguity, as in a
behaviorally anchored rating scale (Smith & Kendall,
1963). The first step in data collection involved
ensuring that the scales in Figure 1 would
allow us to code the AMJ articles in a reliable
manner. To check reliability, both authors coded
articles from the 1983 volume—a volume that was
not included in our review. This volume included
50 empirical articles. We checked interrater reliability
using the ICC(1) form of the intraclass correlation
(James, 1982; Shrout & Fleiss, 1979). The
magnitude of the ICC(1) can be interpreted as the
reliability associated with a single assessment of an
article’s building or testing rating, with high values
being around .30 (Bliese, 2000). The ICC(1) for our
theory-building rating was .51, and the ICC(1) for
our theory-testing rating was .59. Having established
adequate reliability, the first author coded
half of the articles in each issue included in our review, and the
second author coded the other half.
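
For readers unfamiliar with the statistic, the following is a minimal sketch of how an ICC(1) can be computed from a one-way analysis of variance, using the standard formula ICC(1) = (MSB - MSW) / (MSB + (k - 1) MSW), where MSB and MSW are the between-article and within-article mean squares and k is the number of raters per article. The function name `icc1` and the example ratings are hypothetical illustrations; they are not the data or the software used in this study.

```python
from statistics import mean


def icc1(ratings):
    """Return ICC(1) for a list of per-article rating lists.

    Each inner list holds the k ratings given to one article.
    Assumes a balanced design: every article has the same k >= 2 ratings.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 articles, each with the same k >= 2 ratings")

    grand_mean = mean([x for row in ratings for x in row])
    row_means = [mean(row) for row in ratings]

    # Between-article and within-article sums of squares (one-way ANOVA).
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    )

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical ratings: four articles, each coded by two raters on a 1-5 scale.
example = [[1, 2], [3, 3], [5, 4], [2, 2]]
print(round(icc1(example), 2))
```

With two coders, k = 2, so the denominator reduces to MSB + MSW. The value is interpreted as the reliability of a single coder's rating, which is why it suits a design in which each article in the main sample is coded by only one author.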