2.2 Related Work
The state-of-the-art in m-BCF computation features a
dual-tree algorithm, which follows a general approach: a
kd-tree [9] or a region Oct-tree [22] is used to partition the
space. Each node of the tree caches the count of particles within
its partition. Starting from some level l0 of the tree, the
algorithm attempts to resolve all m-node tuples; if the m nodes
of a tuple do not resolve, the tuples formed from their child
nodes are tried instead. An attempt to resolve m nodes succeeds when all the
m-particle tuples (each coming from one of the m nodes) fall
into one bucket in the BCF. For the case of 2-BCF, it has
been mathematically proved that the dual-tree algorithm
has a complexity of O(N^{5/3}). Although superior to the brute-force
approach, its running time is undesirably long for large N.
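The resolve-or-descend procedure described above can be illustrated for the 2-BCF case, taking the BCF to be a histogram of pairwise distances with fixed-width buckets. The following is only a minimal sketch under stated assumptions, not the implementation of the cited work: it uses a kd-tree with median splits, starts from the root rather than from a chosen level l0, and all names (Node, bucket_of, resolve_pair, sdh_dual_tree, and so on) are hypothetical. Two nodes resolve when the minimum and maximum distances between their bounding boxes fall into the same bucket, in which case the product of the cached counts is added to that bucket without visiting individual particles.

```python
import math
import random
from itertools import combinations


def bucket_of(dist, width, nbuckets):
    """Index of the bucket holding `dist`; the last bucket is open-ended."""
    return min(int(dist // width), nbuckets - 1)


class Node:
    """Hypothetical kd-tree node caching its particle count and bounding box."""

    def __init__(self, points, leaf_size=8, depth=0):
        self.points = points
        self.count = len(points)  # cached count used when a node pair resolves
        dims = len(points[0])
        self.lo = [min(p[d] for p in points) for d in range(dims)]
        self.hi = [max(p[d] for p in points) for d in range(dims)]
        self.children = []
        if self.count > leaf_size:
            axis = depth % dims
            ordered = sorted(points, key=lambda p: p[axis])
            mid = self.count // 2
            self.children = [Node(ordered[:mid], leaf_size, depth + 1),
                             Node(ordered[mid:], leaf_size, depth + 1)]


def box_distance_range(a, b):
    """Smallest and largest possible distance between points of boxes a and b."""
    gaps, spans = [], []
    for alo, ahi, blo, bhi in zip(a.lo, a.hi, b.lo, b.hi):
        gaps.append(max(alo - bhi, blo - ahi, 0.0))
        spans.append(max(ahi - blo, bhi - alo))
    return math.hypot(*gaps), math.hypot(*spans)


def resolve_pair(a, b, hist, width):
    """Count all cross pairs between two distinct nodes a and b."""
    nb = len(hist)
    dmin, dmax = box_distance_range(a, b)
    lo_bucket = bucket_of(dmin, width, nb)
    if lo_bucket == bucket_of(dmax, width, nb):
        # Resolved: every cross pair falls into the same bucket.
        hist[lo_bucket] += a.count * b.count
    elif not a.children and not b.children:
        # Two leaves that do not resolve: compute the distances directly.
        for p in a.points:
            for q in b.points:
                hist[bucket_of(math.dist(p, q), width, nb)] += 1
    else:
        # Not resolved: try the child nodes (a leaf stands in for itself).
        for child_a in a.children or [a]:
            for child_b in b.children or [b]:
                resolve_pair(child_a, child_b, hist, width)


def resolve_self(node, hist, width):
    """Count all pairs of particles that lie inside a single node."""
    if not node.children:
        for p, q in combinations(node.points, 2):
            hist[bucket_of(math.dist(p, q), width, len(hist))] += 1
    else:
        left, right = node.children
        resolve_self(left, hist, width)
        resolve_self(right, hist, width)
        resolve_pair(left, right, hist, width)


def sdh_dual_tree(points, width, nbuckets, leaf_size=8):
    """Distance histogram (2-BCF) computed with the dual-tree sketch."""
    hist = [0] * nbuckets
    if len(points) >= 2:
        resolve_self(Node(list(points), leaf_size), hist, width)
    return hist


def sdh_brute_force(points, width, nbuckets):
    """Reference O(N^2) computation used to check the sketch."""
    hist = [0] * nbuckets
    for p, q in combinations(points, 2):
        hist[bucket_of(math.dist(p, q), width, nbuckets)] += 1
    return hist


if __name__ == "__main__":
    rng = random.Random(0)
    particles = [(rng.random(), rng.random()) for _ in range(300)]
    print(sdh_dual_tree(particles, width=0.25, nbuckets=6))
```

The sketch requires Python 3.8 or later for math.dist and the multi-argument form of math.hypot. The brute-force function is included only as a correctness check; the savings of the dual-tree approach come from the resolved branch, which grows more frequent as the nodes become small relative to the bucket width.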
Approximate query processing is a general technique to
address the time cost and complexity of queries over
simulation data sets. Since most queries on such databases
are analytical and do not require exact answers, approximate
techniques seem suitable. There have been some
efforts to answer the queries using random sampling [4], histograms
[12], and wavelets [3]. These techniques keep a synopsis
of the data for answering the queries. Use of data
synopses is a promising approach for answering queries in
decision support systems, where the focus is on minimizing
query response time.
However, such approaches have the following issues:
(1) the underlying disk space usage is not addressed; and (2)
the solutions available in the literature focus on relational
databases only. There are no such databases for scientific
data. Thus, the task of approximate query processing is
challenging. In this thesis, we propose to compress the data
in a way that facilitates efficient query processing while saving
disk storage space (details in Section 3).
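As an illustration of the synopsis-based techniques surveyed in this section, the sketch below answers a distance-histogram (2-BCF) query from a uniform random sample instead of the full data set. It is an assumption-laden example rather than the method of any cited work: the function names are hypothetical, the sample is drawn without replacement, and each sampled pair count is scaled by N(N-1)/(n(n-1)), the inverse of the probability that a given pair of the N particles appears in a sample of size n.

```python
import math
import random
from itertools import combinations


def exact_distance_histogram(points, width, nbuckets):
    """Exact pairwise-distance histogram; the last bucket is open-ended."""
    hist = [0] * nbuckets
    for p, q in combinations(points, 2):
        hist[min(int(math.dist(p, q) // width), nbuckets - 1)] += 1
    return hist


def sampled_distance_histogram(points, sample_size, width, nbuckets, rng=random):
    """Estimate the histogram from a random sample acting as the synopsis."""
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2")
    n_total = len(points)
    sample = rng.sample(points, sample_size)  # uniform, without replacement
    # Scale sampled pair counts up to the number of pairs in the full data.
    scale = (n_total * (n_total - 1)) / (sample_size * (sample_size - 1))
    counts = exact_distance_histogram(sample, width, nbuckets)
    return [c * scale for c in counts]
```

The estimate costs O(n^2) distance computations instead of O(N^2), at the price of a sampling error that shrinks as n grows. Note that the sample must be stored in addition to the original data, so this kind of synopsis does not by itself reduce disk space usage.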