Self-Organizable Indexing. We expect that indexing will be critical
for the optimized evaluation of declarative queries. However,
selecting the appropriate indices is a difficult optimization task that
must balance the benefit of indexing against the overhead in storage
and computation (indices have to be kept up-to-date with respect to
modifications in the indexed file). To avoid delegating this problem
to applications, we extend Damasc with a service that continuously
monitors the querying patterns of applications and automatically
installs indices to speed up query processing. The set of constructed
indices may thus evolve over time to match the queries issued by
applications. We term this paradigm self-organizable indexing.
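A minimal sketch of such a monitoring service follows. It rests on assumptions that are ours, not Damasc's: each query and update is tagged with the part of the file it touches (here a path string), each query on an indexed key saves a fixed cost, each update to that key adds a fixed maintenance cost, and building an index has a one-time cost. The class `IndexAdvisor`, its methods, and all cost constants are hypothetical.

```python
from collections import deque


class IndexAdvisor:
    """Hypothetical online index advisor (illustrative only).

    Tracks recent query and update events per indexable key (e.g., a path
    in a file) over a sliding window, and keeps an index on a key only while
    its estimated benefit outweighs its maintenance cost.
    """

    def __init__(self, window=100, saving_per_query=1.0,
                 cost_per_update=0.5, build_cost=5.0):
        # Sliding window of recent events: ("q", key) or ("u", key).
        self.window = deque(maxlen=window)
        self.saving_per_query = saving_per_query   # assumed constant saving
        self.cost_per_update = cost_per_update     # assumed upkeep per update
        self.build_cost = build_cost               # assumed one-time build cost
        self.installed = set()

    def observe_query(self, key):
        self.window.append(("q", key))

    def observe_update(self, key):
        self.window.append(("u", key))

    def net_benefit(self, key):
        """Estimated savings minus maintenance cost within the window."""
        queries = sum(1 for kind, k in self.window if kind == "q" and k == key)
        updates = sum(1 for kind, k in self.window if kind == "u" and k == key)
        return queries * self.saving_per_query - updates * self.cost_per_update

    def reorganize(self):
        """Install indices whose net benefit exceeds the build cost and drop
        installed indices whose net benefit has become negative."""
        keys = {k for _, k in self.window} | self.installed
        for key in keys:
            benefit = self.net_benefit(key)
            if key not in self.installed and benefit > self.build_cost:
                self.installed.add(key)
            elif key in self.installed and benefit < 0:
                self.installed.discard(key)
        return sorted(self.installed)


# Usage: a burst of title queries triggers an index on that path.
advisor = IndexAdvisor(window=50)
for _ in range(10):
    advisor.observe_query("/bib/pub/title")
advisor.observe_query("/bib/pub/year")
print(advisor.reorganize())  # ['/bib/pub/title']

# A wave of updates to titles makes the index unprofitable, so it is dropped.
for _ in range(30):
    advisor.observe_update("/bib/pub/title")
print(advisor.reorganize())  # []
```

Installing an index only when its net benefit exceeds the build cost, but dropping it only when the benefit turns negative, adds hysteresis so that small shifts in the workload do not cause an index to be built and discarded repeatedly.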
Clearly, we expect indices to be created based on the access patterns
of recently evaluated queries. This points to the idea of hybrid, partial indexing, where different parts of the same file may
be indexed using different structures. Let us consider a bibliography
file as an example. The file system may choose to build an
inverted-list index on the keywords appearing in title entries of publications,
and a path index [22] that records the paths found in the
semi-structured representation of the file. The two indices may be
combined in the same query plan or used in isolation; this decision
depends on the query and is again left to the Query Executor.
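The bibliography example can be sketched as follows. For illustration only, we assume the file has already been parsed into a list of Python dicts (one per publication), that title keywords are whitespace-separated tokens, and that paths are written as `/key/subkey`. The helper names are hypothetical, and intersecting the two postings sets is just one plan a query executor might choose; it is not Damasc's Query Executor.

```python
from collections import defaultdict


def iter_paths(node, prefix=""):
    """Yield every root-to-node path in a nested dict/list structure."""
    if isinstance(node, dict):
        for key, child in node.items():
            path = f"{prefix}/{key}"
            yield path
            yield from iter_paths(child, path)
    elif isinstance(node, list):
        for child in node:
            yield from iter_paths(child, prefix)


def build_path_index(records):
    """Path index: path -> ids of records in which that path occurs."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        for path in iter_paths(rec):
            index[path].add(rid)
    return index


def build_title_inverted_list(records):
    """Inverted list covering only one part of the file: title keywords."""
    index = defaultdict(set)
    for rid, rec in enumerate(records):
        for word in rec.get("title", "").lower().split():
            index[word].add(rid)
    return index


# Hypothetical parsed bibliography file.
records = [
    {"title": "Adaptive Indexing in Main Memory", "year": 2011,
     "venue": {"name": "VLDB"}},
    {"title": "Partial Indexing Revisited", "year": 2009},
    {"title": "Query Processing on Files", "venue": {"name": "SIGMOD"}},
]
title_idx = build_title_inverted_list(records)
path_idx = build_path_index(records)

# Combined plan: titles mentioning "indexing" that also have a venue name.
both = title_idx.get("indexing", set()) & path_idx.get("/venue/name", set())
print(sorted(both))  # [0]

# The path index used in isolation.
print(sorted(path_idx.get("/venue/name", set())))  # [0, 2]
```

Note that the two structures cover different parts of the file: the inverted list sees only title text, while the path index sees only structure, which is the hybrid, partial arrangement described above.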
Overall, Damasc’s indexing paradigm differs significantly from
existing approaches in file systems [2, 15, 37, 19], which typically
index all information in a file using a single type of physical structure.
At the same time, it makes online index selection harder, since the
system must decide not only which indices to build but also which
parts of a file each one should cover; this calls for novel solutions.
Indices bring important benefits to query processing, but they also
incur overhead when files are updated. The self-organizable indexing
module will take such overheads into account when deciding which
indices to create. An alternative is to propagate updates to indices
only when the system load permits. As a consequence, certain parts
of an index may be temporarily out of date and thus unavailable
for query processing. This can also be viewed as a form of
partial indexing, except that partialness now refers to the validity
of the information in the index.
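The deferred-maintenance approach can be illustrated with a small sketch. It assumes, hypothetically, that the file is divided into blocks, that the index keeps one keyword set per block, and that the caller supplies a load figure between 0 and 1; the class name, thresholds, and budget are illustrative. An update only marks its block's entry stale, a lookup falls back to scanning the raw block wherever the entry is stale, and `maintain` refreshes stale entries only when the load is below a threshold.

```python
class LazyBlockIndex:
    """Hypothetical per-block keyword index with deferred maintenance.

    Updating a block marks its index entry stale instead of refreshing it;
    stale entries are refreshed later, only when system load permits.
    """

    def __init__(self, blocks):
        self.blocks = list(blocks)    # file contents, one string per block
        self.entries = [self._index_block(b) for b in self.blocks]
        self.stale = set()            # ids of blocks whose entries are out of date

    @staticmethod
    def _index_block(text):
        return set(text.lower().split())

    def update_block(self, bid, text):
        self.blocks[bid] = text
        self.stale.add(bid)           # defer index maintenance

    def maintain(self, load, max_load=0.7, budget=1):
        """Refresh up to `budget` stale entries, but only if load < max_load."""
        if load >= max_load:
            return 0
        refreshed = 0
        for bid in sorted(self.stale)[:budget]:
            self.entries[bid] = self._index_block(self.blocks[bid])
            self.stale.discard(bid)
            refreshed += 1
        return refreshed

    def lookup(self, word):
        """Return ids of blocks containing `word`, never trusting stale entries."""
        word = word.lower()
        hits = set()
        for bid, entry in enumerate(self.entries):
            if bid in self.stale:
                # Out-of-date entry is unusable: scan the block itself.
                if word in self.blocks[bid].lower().split():
                    hits.add(bid)
            elif word in entry:
                hits.add(bid)
        return hits


idx = LazyBlockIndex(["damasc file system", "scientific data"])
idx.update_block(1, "scientific data indexing")
print(idx.lookup("indexing"))  # {1}  (answered by scanning the stale block)
print(idx.maintain(load=0.9))  # 0    (system busy: nothing refreshed)
print(idx.maintain(load=0.2))  # 1
print(idx.stale)               # set()
```

Because stale entries are bypassed rather than trusted, queries stay correct while the index is partially out of date; they only lose the speedup for the affected blocks until maintenance catches up.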