Curator Training and HPC Adoption
To make the metadata extraction workflow an integral part of the test collection’s data management activities, the ICA team needed to learn how to perform every step independently. Because the curators were mostly unfamiliar with HPC environments, they first required training on the resources at TACC, along with the basic Linux commands for transferring data, setting file permissions, and running scripts. Although the learning curve was somewhat steep for those with no prior Linux experience, about two days of practice were enough to become proficient at copying and syncing the data collection, basic troubleshooting, and running batch scripts. After an initial one-on-one training session and some self-paced online Linux tutorials, the curators were ready to transfer the data collection from a remote server to the TACC system. They were then given further instructions and the corresponding scripts for running DROID serially and in parallel, and they completed both tasks successfully.
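As a rough illustration of the difference between the serial and parallel runs mentioned above, the following minimal Python sketch applies one command to several parts of a collection, first one after another and then concurrently. It is not the set of scripts given to the curators. The function names, the worker count, the directory names, and the placeholder command are all assumptions made for illustration, and the actual DROID invocation is deliberately not shown.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(command, target):
    """Run `command` with `target` appended; return (target, exit code)."""
    result = subprocess.run(command + [target], capture_output=True, text=True)
    return target, result.returncode


def run_serial(command, targets):
    """Process the targets one after another."""
    return [run_one(command, target) for target in targets]


def run_parallel(command, targets, workers=4):
    """Process the targets concurrently; results keep the input order."""
    # Threads are sufficient here because each worker only waits on an
    # external process rather than doing Python-level computation.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda target: run_one(command, target), targets))


if __name__ == "__main__":
    # Placeholder command (an assumption): the Python interpreter echoing
    # its argument. A real run would substitute the profiling tool's
    # command line, which is not reproduced here.
    command = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
    # Hypothetical sub-directories of the collection.
    targets = ["collection/part1", "collection/part2", "collection/part3"]
    for target, code in run_parallel(command, targets):
        print(target, "exit code", code)
```

Returning the exit code for each target gives a novice user a simple way to see which part of the collection failed before opening a support ticket.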
IJDC | General Article
26 | Data Management and High Performance Computing doi:10.2218/ijdc.v9i2.331
Most of the instructions for the trial runs of these methods came in the form of cookbook-style recipes. These are easy enough to follow and can be deconstructed to understand what each parameter does, but if an error occurs at any step of a routine, it is difficult for novice users to troubleshoot. A ticket system is in place for questions, but users would need further training to gain enough confidence to ask for help in a meaningful way. Further training toward a deeper understanding of the system architecture, advanced Linux commands, and batch scripting would go a long way toward ensuring the adoption of these resources by non-traditional HPC users.
Conclusions and Future Work