4. APPLICATIONS & DAMASC
Scientists rely on file systems to store and manipulate large volumes
of data. Because the POSIX interface offers little support for
structured data beyond directories and files, scientific applications
typically depend on external data management libraries or APIs
(e.g., HDF5) that provide higher-level abstractions for accessing
and updating structured data.
Damasc can be leveraged by existing applications without rewriting
the application code, as long as data management libraries (e.g.,
NetCDF [20], NetCDF-4 [34], HDF5 [13], MPI-IO [10]) are rewritten
to interface with Damasc. That is, only the external libraries need
to be rewritten, so that they issue declarative queries instead of
reading and writing byte extents. For instance, the HDF API can be
implemented on top of Damasc. In
this way, any existing application that makes use of the HDF API
can interface with Damasc without any changes to its code.
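The idea of changing only the library body can be illustrated with a
minimal sketch. Everything named here is an assumption made for
illustration: the Query record, the RecordingBackend class, and the
read_hyperslab function are hypothetical and are not part of Damasc
or of the HDF API. The sketch only shows a library call with an
HDF-style (start, count) selection building a declarative request
rather than computing byte offsets.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical declarative request: a dataset plus index ranges."""
    dataset: str
    ranges: Tuple[Tuple[int, int], ...]  # half-open [start, stop) per dimension


class RecordingBackend:
    """Stand-in for a query-capable store (not Damasc's real interface).

    It only records the queries it receives so the sketch stays runnable.
    """

    def __init__(self) -> None:
        self.log: List[Query] = []

    def execute(self, query: Query) -> Query:
        self.log.append(query)
        return query


def read_hyperslab(backend: RecordingBackend, dataset: str,
                   start: Sequence[int], count: Sequence[int]) -> Query:
    """Library-level read using an HDF-style (start, count) selection.

    The application calls this exactly as before; only the library body
    changes, building a query instead of computing byte offsets.
    """
    if len(start) != len(count):
        raise ValueError("start and count must have the same rank")
    ranges = tuple((s, s + c) for s, c in zip(start, count))
    return backend.execute(Query(dataset, ranges))
```

In this sketch the application-facing signature stays fixed, which is
the property the text relies on: the caller never sees whether the
library reads byte extents or issues a query.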
Because Damasc serves as a common interface to multiple
applications, it can synthesize access information across them.
The aggregated access pattern can then be analyzed to derive a
data layout and storage strategy better suited to that pattern.
For instance, the
striping strategy for a file may take into account the corresponding
query patterns in order to improve locality of access within stripes.
This type of optimization becomes very difficult when the file system
only observes low-level information such as reads and writes
to byte extents.
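As a rough illustration of how aggregated query information could
inform layout, the sketch below picks a candidate stripe shape from a
shared query log. The heuristic (choose the most frequently requested
selection shape) and the function name suggest_stripe_shape are
assumptions made for this example; the text does not prescribe a
particular analysis.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

Ranges = Tuple[Tuple[int, int], ...]  # half-open [start, stop) per dimension


def suggest_stripe_shape(
    log: Iterable[Tuple[str, Ranges]],
) -> Optional[Tuple[int, ...]]:
    """Return the selection shape requested most often across applications.

    `log` holds (application_name, ranges) pairs. This is a hypothetical
    heuristic: a stripe matching the dominant query shape keeps the data
    for a typical query within few stripes.
    """
    shapes = Counter(
        tuple(stop - start for start, stop in ranges)
        for _app, ranges in log
    )
    if not shapes:
        return None  # no queries observed, so nothing to suggest
    shape, _count = shapes.most_common(1)[0]
    return shape


# Queries from two hypothetical applications over the same 2-D dataset.
query_log = [
    ("viz", ((0, 4), (0, 100))),
    ("stats", ((4, 8), (0, 100))),
    ("viz", ((0, 50), (0, 2))),
]
stripe = suggest_stripe_shape(query_log)
```

The input to this analysis is a set of index ranges per dimension,
which a file system observing only byte extents would have to
reconstruct before any comparable decision could be made.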