The parallel run with two instances of DROID, each working on one subdirectory of the parent directory of the data collection, took about four hours to complete. Figure 3 compares the runtimes of parallel DROID runs using different levels of nesting and numbers of subdirectories. Note that the depth of the directory hierarchy, along with the number of files in each directory, affects the overall runtime of the DROID jobs. As Figure 3 shows, in our test case, running 31 instances of DROID (one per directory at the second level of nesting) gave the shortest time-to-result with the optimal load-balancing scheme: one hour and fifteen minutes in total. The rate-determining step, the one that took longest to complete, involved one of the subdirectories of a level-1 directory in the nesting hierarchy.

Since each compute node involved in the parallel runs of DROID had only 32 GB of memory, we ran only one instance of DROID on this compute node. In principle, up to 16 instances of DROID can run on each compute node of Stampede, one per core, as each node has 16 cores. In practice, however, the size of the directory assigned to each DROID instance limits how many instances can be launched on a node, because all instances running on a node share that node's memory, which is 32 GB in the case of Stampede. Therefore, to prevent a job from crashing due to memory starvation on a node, it is important to distribute the parallel runs over multiple compute nodes so that each DROID instance has enough memory available to it.
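The memory-aware distribution described above can be sketched as a simple bin-packing step. The following is a minimal illustration, not the authors' actual scheduling code: the directory names, the per-directory memory estimates, and the greedy first-fit-decreasing strategy are all assumptions introduced here for clarity. It packs one DROID instance per subdirectory onto 32 GB nodes, never exceeding a node's memory or its 16 cores.

```python
# Hypothetical sketch of memory-aware placement of DROID instances
# (not from the paper): pack one instance per subdirectory onto nodes
# so that combined memory estimates stay within a node's capacity.
NODE_MEMORY_GB = 32  # memory per Stampede compute node
MAX_CORES = 16       # cores per node; upper bound on instances per node

def pack_instances(mem_estimates_gb):
    """Greedy first-fit-decreasing bin packing.

    mem_estimates_gb: dict mapping subdirectory name to an estimated
    memory footprint (GB) for the DROID instance scanning it.
    Returns a list of nodes, each a list of subdirectory names.
    """
    nodes = []  # each entry: [remaining_gb, [assigned subdirectories]]
    # Place the largest directories first to reduce fragmentation.
    for name, mem in sorted(mem_estimates_gb.items(), key=lambda kv: -kv[1]):
        for node in nodes:
            if node[0] >= mem and len(node[1]) < MAX_CORES:
                node[0] -= mem
                node[1].append(name)
                break
        else:
            # No existing node has room: allocate a new compute node.
            nodes.append([NODE_MEMORY_GB - mem, [name]])
    return [assigned for _, assigned in nodes]
```

For example, two 20 GB directories cannot share a 32 GB node, so they land on separate nodes, while small directories fill the remaining headroom. The same idea extends to the 31-instance run: directories whose estimated footprints sum past a node's memory are spread across additional nodes rather than co-scheduled.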