ARTICLES: DATA SELECTION AND ANALYSIS
The advent of very large data sources, including Compustat, CRSP, and EDGAR, plus
computerized statistical software packages, has led to an explosion of statistically oriented papers in
accounting and elsewhere. Unfortunately, the ability to run classical null hypothesis statistical tests
(NHSTs) has often been accompanied by a failure to recognize that the primary purpose of an
investigation is to assess the economic significance of its findings. The research program should not be driven
by the availability of a specific data set or adaptable computer program but instead by the research
question(s). We will have more to say about the sampling process itself after examining aspects of
the data analysis.
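The gap between statistical and economic significance can be illustrated with a small simulation. The sketch below is illustrative only: the data are simulated rather than drawn from Compustat, CRSP, or EDGAR, and the function names, the assumed true slope of 0.02, and the sample sizes are hypothetical choices made for this example. It fits a simple regression of y on x by ordinary least squares and reports the slope and its t statistic at two sample sizes.

```python
import math
import random


def slope_t_stat(x, y):
    """Return (slope, t statistic) for a simple OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, then the usual standard error of the slope.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / std_err


def simulate(n, true_slope=0.02, seed=0):
    """Simulate y = true_slope * x + noise (hypothetical data) and fit it."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [true_slope * xi + rng.gauss(0, 1) for xi in x]
    return slope_t_stat(x, y)


if __name__ == "__main__":
    for n in (500, 500_000):
        slope, t = simulate(n)
        print(f"n={n:>7}: slope={slope:.4f}, t={t:.2f}")
```

Under these assumptions the effect is equally small at both sample sizes, a slope of roughly 0.02 standard deviations of y per standard deviation of x. The t statistic, however, grows approximately with the square root of n. In a sample of the size that large databases make routine, a coefficient of negligible economic magnitude can therefore pass a conventional significance test with ease.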
Data Analysis: Randomness and Tests of Significance in Regression Studies
The regression papers in our study typically examine an existing data set that relates to the
researcher’s posed research question. The selection of the topic, which should precede the data-gathering
process, is described in the paper’s introduction.13 An early task is to ensure that the data are
rigorously checked for correctness. In addition, the author should document what has been done in
this regard. Special attention needs to be given to the theory behind the research, not only because
it shapes the choice of topic but also because it determines the variables to be used and how they are
measured. In particular, the independent variables should reflect the theory and should not be selected
on the basis of prior tests of their significance, as is not uncommon. This work needs to be
done before the statistical analysis is performed. The actual process to this point is critically
important to the reader in evaluating the value of the research, but, unfortunately, is seldom if ever