Missing data are unavoidable in epidemiological and clinical research but their potential to undermine the validity of research results has often been overlooked in the medical literature.1 This is partly because statistical methods that can tackle problems arising from missing data have, until recently, not been readily accessible to medical researchers. However, multiple imputation—a relatively flexible, general purpose approach to dealing with missing data—is now available in standard statistical software,2 3 4 5 making it possible to handle missing data semiroutinely. Results based on this computationally intensive method are increasingly reported, but it needs to be applied carefully to avoid misleading conclusions.
In this article, we review the reasons why missing data may lead to bias and loss of information in epidemiological and clinical research. We discuss the circumstances in which multiple imputation may help by reducing bias or increasing precision, and describe potential pitfalls in its application. Finally, we describe the recent use and reporting of analyses using multiple imputation in general medical journals, and suggest guidelines for the conduct and reporting of such analyses.
Consequences of missing data
Researchers usually address missing data by including in the analysis only complete cases: those individuals who have no missing data in any of the variables required for that analysis. However, results of such analyses can be biased. Furthermore, the cumulative effect of missing data in several variables often leads to exclusion of a substantial proportion of the original sample, which in turn causes a substantial loss of precision and power.
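The cumulative effect can be made concrete with a small calculation. The sketch below is illustrative only: the function name, the 10% missingness rate, and the assumption that values are missing independently across variables are ours, not figures from any study discussed here.

```python
def complete_case_fraction(missing_rates):
    """Expected fraction of individuals with no missing values.

    Assumes missingness is independent across variables, which is a
    simplifying assumption made for illustration only.
    """
    fraction = 1.0
    for rate in missing_rates:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("each missing rate must lie between 0 and 1")
        # An individual is retained only if this variable is observed.
        fraction *= 1.0 - rate
    return fraction


# Hypothetical example: five variables, each missing for 10% of individuals.
rates = [0.10] * 5
retained = complete_case_fraction(rates)
print(f"Retained in complete case analysis: {retained:.1%}")
```

Under these assumed rates, only about 59% of the sample would remain in a complete case analysis, even though no single variable is missing for more than 10% of individuals. For simple estimators whose standard error scales with the inverse square root of the sample size, that reduction would widen standard errors by roughly 30%.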