In this work, we argued that the development of representative standardized corpora for digital forensics research is
essential for the long-term scientific health and legal
standing of the field. We developed a baseline taxonomy of
such corpora and outlined the legal and ethical hurdles that
complicate their development. We also presented a number of
data sets that attempt to cover the spectrum of scenarios, and
we have made them openly available to researchers. Special care
has been taken to document the source of the data, as well as
to avoid as many legal restrictions on its distribution as
possible.