Open data has tremendous potential for science,
but, in human subjects research, there is a tension
between privacy and releasing high-quality open data.
Federal law governing student privacy and the release
of student records suggests that anonymizing student
data protects student privacy. Guided by this standard,
we de-identified and released a dataset from 16 massive
open online courses (MOOCs) from MITx and HarvardX
on the edX platform. In this article, we show that these
and other de-identification procedures necessitate
changes to datasets that threaten replication and