Introduction
Goodness-of-fit tests are used to assess whether data
are consistent with a hypothesized null distribution.
The χ² test is the best-known parametric goodness-of-fit
test, while the most popular nonparametric
tests are the classic test proposed by Kolmogorov
and Smirnov, followed closely by several variants on
Cramér-von Mises tests.
In their most basic forms, these nonparametric
goodness-of-fit tests are intended for continuous hypothesized
distributions, but they have also been
adapted for discrete distributions. Unfortunately,
most modern statistical software packages and programming
environments have failed to incorporate
these discrete versions. As a result, researchers typically
rely upon the χ² test or a nonparametric
test designed for a continuous null distribution. For
smaller sample sizes, in particular, both of these
choices can produce misleading inferences.
This paper presents a revision of R’s ks.test()
function and a new cvm.test() function to fill this
void for researchers and practitioners in the R environment.
This work was motivated by the need for such
goodness-of-fit testing in a study of Olympic figure
skating scoring (Emerson and Arnold, 2011). We first
present overviews of the theory and general implementation
of the discrete Kolmogorov-Smirnov and
Cramér-von Mises tests. We discuss the particular implementation
of the tests in R and provide examples.
We conclude with a short discussion, including the
state of existing continuous and two-sample
Cramér-von Mises testing in R.