In conclusion, the amount of data stored is growing one to two orders
of magnitude faster than the speed of the computers accessing
that data, so moving the data to the processors is becoming
a significant overhead in large-scale data processing. User-level
libraries and databases provide much-needed functionality to applications,
but suffer from performance overheads. User-level libraries
require data to be moved to the host processors and hide
important data layout and access pattern information from the file
system, limiting the opportunity for effective prefetching, intelligent
data layout, and other possible optimizations. Databases provide
important functionality, but cannot provide the raw throughput
required by high-performance, data-intensive applications. Damasc
will address these shortcomings by adding a layer of data management
functionality to file systems. This layer has several key benefits
over existing file systems and databases. It will provide additional
information to the storage system, allowing intelligent data layout
and storage management that are aware of data structures and access patterns.
It will also facilitate in-place processing of the data using
knowledge of the data layout. This will initially be used for data
management operations such as indexing, querying, and provenance
management. Eventually, we hope this will lay the foundation
for full-scale distributed processing over the storage
system. Our long-term goal is to provide standard interfaces to support
arbitrary applications running on remote processors or directly
over the storage system.