The public domain at Monash University uses the ARROW software solution (Treloar
and Groenewegen, 2007). ARROW was designed for document objects, and the ingestion
of large datasets is currently being trialled. It is possible that a separate public data
repository may be needed, but at present the existing institutional repository appears
able to fulfil both functions.
Two examples from different domains will serve to illustrate how we are currently
applying the Data Curation Continuum at Monash.
The first example comes from the domain of Protein Crystallography. The end result
of applying this model and the associated migration process is a paper in the
prestigious journal Science (Rosado et al., 2007), where the final published version
points to a dataset that has been migrated across the curation boundary into the
ARROW Repository (see http://arrow.monash.edu.au/hdl/1959.1/5863). This process
was somewhat ad hoc and involved a lot of manual work and creative problem-solving
by Andrew Harrison, the Monash ARROW Librarian, in part because of the size of
the datasets involved. The entire repository object totalled 36 GB (after
compression!), with many datastreams being 2 GB in size. This is
significantly larger than the software was initially designed for, although it is being
reconfigured to support larger file sizes. Procedures are also being put in place that
will allow the researchers themselves to undertake much of the work of lodging the
dataset objects, with the ARROW Librarian performing more of a quality control and
authorisation function. Under this approach, the researchers will provide the quality
control over the technical metadata and the library staff will review (and augment)
the descriptive metadata.
The second example comes from the domain of musicology. We have some
researchers who are working with archival recordings of Jewish music performance.
They currently hold about 400 GB of digitised audio content in their LaRDS
space, which is used for private research within their research team. They are
now migrating a subset of this content into ARROW for publication. This will be a
progressive migration as copyright issues are gradually resolved. We estimate that
approximately 10% (40 GB) will eventually be published. From there it will be further
harvested