3. Data uniformity
A key goal of the PDB is to make the archive as consistent and
error-free as possible. As indicated above, all new depositions
are reviewed carefully by annotators before release. Errors
found subsequent to release by authors and PDB users are
addressed as rapidly as possible. Minor errors result in
revisions to the entry which are annotated within the entry;
major errors lead to a superseding entry or entry withdrawal.
Corrections and updates to entries are sent to
deposit@rcsb.rutgers.edu.
'Legacy data', that is, data submitted prior to October 1998,
comply with several different PDB formats, and variation
exists in how the same features are described for different
structures within each format. The inconsistency of formats
and nomenclature conventions makes it difficult to parse these
data consistently and to query across the archive. As an
immediate solution to the query problem, particular records
across all entries in the archive were corrected; these included
citation, R factor and resolution (Bhat et al., 2001). These
corrections were loaded into the database and thus it was
possible to query on these features and obtain accurate results
(Table 3). However, these data were not available in the PDB
files. To provide uniform data for each structure we used the
software that was developed and tested for primary processing
and revalidated all the data in the archive. Corrections were
made to nomenclature and special attention was paid to
consistency of the chemical description of the macromolecule
and the ligand. Examples of the types of errors that were
found and corrected are shown in Table 4.
The corrected files were released in mmCIF format and can be
found at ftp://beta.rcsb.org/pub/pdb/uniformity/data/mmCIF/
(Westbrook et al., 2002).
The original PDB files will continue to be available as they
are a historical record and have been the basis of many
research projects. Software is available from the PDB to
transform the mmCIF files to PDB-formatted files. In the
future, these files will form the basis of the PDB databases
accessible via the WWW.
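
As an illustration of why uniform mmCIF files simplify querying, the
following minimal Python sketch reads resolution and R factor from
mmCIF text. It is not the PDB's own software. The function names
(read_simple_items, to_float) and the sample entry are hypothetical,
and the sketch assumes that the values are stored as single-line
key-value items named _refine.ls_d_res_high and
_refine.ls_R_factor_R_work, and that categories are separated by '#'
lines. Looped categories and multi-line values are skipped.

```python
def read_simple_items(text):
    """Collect single-line, non-loop `_category.item value` pairs."""
    items = {}
    in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            # Assumption: '#' lines separate categories, ending any loop.
            in_loop = False
            continue
        if line.startswith("loop_"):
            in_loop = True
            continue
        if in_loop or not line.startswith("_"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            value = parts[1].strip()
            # Drop matching surrounding quotes, if any.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                value = value[1:-1]
            items[parts[0]] = value
    return items


def to_float(value):
    """Return a float, or None for the mmCIF placeholders '?' and '.'."""
    if value is None or value in ("?", "."):
        return None
    return float(value)


# Made-up sample entry for illustration only.
SAMPLE = """\
data_EXAMPLE
#
_refine.ls_d_res_high        1.80
_refine.ls_R_factor_R_work   0.195
#
loop_
_citation_author.citation_id
_citation_author.name
primary 'Author, A.'
#
"""

items = read_simple_items(SAMPLE)
resolution = to_float(items.get("_refine.ls_d_res_high"))
r_work = to_float(items.get("_refine.ls_R_factor_R_work"))
print(resolution, r_work)
```

Because every item carries an explicit name, the same lookup applies to
any entry that follows these conventions, whereas the legacy files
require format-specific handling for each record variant.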