3.3 Spatial Locality for Compression
Processing continuous SDH queries with a quad-tree is an
efficient way of answering RDF-like queries. By splitting the
space into fixed-size blocks, we expect to find regions that do
not change for some time during the simulation. This identifies
the spatial locality of particles that remain unchanged over a
period of time. Once such regions are identified, storage space
can be saved by compressing the corresponding cells.
We identify a level l_i of the quad-tree that has the maximum
number of cells retaining an almost constant number of particles
over a fraction of time. In other words, we look for the level
with the maximum number of cells whose density ratio r is close
to 1. A compression technique such as DCT can then be applied.
Cells with r close to 1 need not be decompressed to answer
queries. This is similar to compressing images and videos by
segmenting them into blocks. However, choosing the correct level
l_i is critical to the success of the technique, and it has to be
determined experimentally because it is data specific.
In our experiments on molecular simulations, we observed
that level 9 of the quad-tree has the most cells with
particle count close to 1. Therefore, there is potential
for achieving compression by exploiting the spatial locality
of the data. The following two issues are to be
addressed in this part of the thesis work.
• Analyzing the quad-tree structure to check the feasibility
of combining the locality features.
• Combining the advantages of temporal and spatial locality
to achieve better compression and to facilitate queries on
compressed data.
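The level selection described above can be illustrated with a minimal sketch. It assumes that the density ratio r of a cell is its particle count in a later frame divided by its count in an earlier frame, that the space is a 2-D square, and that level l corresponds to a uniform 2^l x 2^l grid. The function names, the tolerance `tol`, and the `extent` parameter are hypothetical and are not taken from our implementation.

```python
from collections import Counter

def cell_counts(points, level, extent=1.0):
    """Count particles per cell of a 2^level x 2^level grid over [0, extent)^2."""
    n = 2 ** level
    counts = Counter()
    for x, y in points:
        # Clamp so that points lying exactly on the upper boundary stay in range.
        ix = min(int(x / extent * n), n - 1)
        iy = min(int(y / extent * n), n - 1)
        counts[(ix, iy)] += 1
    return counts

def stable_cells(frame_a, frame_b, level, tol=0.1, extent=1.0):
    """Number of cells occupied in frame_a whose ratio r is within tol of 1.

    Assumption: r = (count in frame_b) / (count in frame_a).
    """
    a = cell_counts(frame_a, level, extent)
    b = cell_counts(frame_b, level, extent)
    stable = 0
    for cell, count_a in a.items():
        r = b.get(cell, 0) / count_a
        if abs(r - 1.0) <= tol:
            stable += 1
    return stable

def best_level(frame_a, frame_b, levels, tol=0.1, extent=1.0):
    """Pick the candidate level with the most stable cells."""
    return max(levels, key=lambda lv: stable_cells(frame_a, frame_b, lv, tol, extent))
```

In practice the comparison would span more than two frames and the candidate levels would be evaluated on the actual simulation data, since the best level is data specific.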
3.4 Initial Results
The proposed compression method analyzes the dimensions
along which the data vary. By applying the DCT
along these dimensions, we are able to achieve a higher compression
ratio. The technique achieves a compression
ratio of about 17, which is a significant improvement over the
DCT-only method. Figure 5 shows the result by comparing
the compression ratio and the error, measured as root mean square
error (RMSE), introduced in the compressed data. The results
were obtained on a particle simulation data set of 280,000
atoms using a window size (W) of 8 frames.
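To make the two reported quantities concrete, the following is a minimal sketch of how a compression ratio and an RMSE can be measured for one window of W = 8 frames. It is a plain DCT truncation along the time axis of a single coordinate, not the proposed method, and the helper names and the `keep` parameter are hypothetical. The compression ratio here is simply W divided by the number of retained coefficients.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(c):
    """Inverse of the orthonormal DCT-II above."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out

def compress_window(values, keep):
    """Keep only the first `keep` (lowest-frequency) DCT coefficients."""
    return dct(values)[:keep]

def decompress_window(coeffs, window):
    """Zero-pad the retained coefficients and invert the transform."""
    return idct(list(coeffs) + [0.0] * (window - len(coeffs)))

def rmse(a, b):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

# One coordinate of one atom over a window of W = 8 frames (made-up values).
window = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7]
coeffs = compress_window(window, keep=2)
restored = decompress_window(coeffs, len(window))
ratio = len(window) / len(coeffs)
error = rmse(window, restored)
```

Retaining fewer coefficients raises the ratio and the error together, which is the trade-off plotted in Figure 5.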
The effect of compression on the analytical query result is
shown in Figure 6, which plots the results of the RDF query on
both the original and the decompressed data. The error introduced
by compression is minimal and the results are
acceptable. We also applied PCA-only [21] compression
to the same data, selecting only the top two eigenvectors. This
gave a compression ratio of 4.5 with an RMSE
of about 2.2. Therefore, the proposed technique achieves the best
compression of the methods compared and bears a low computational cost.
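The query-level comparison above can be sketched as follows, assuming a brute-force spatial distance histogram (SDH) stands in for the RDF query. The function names and the error measure are hypothetical and chosen only for illustration; a quad-tree based algorithm would replace the quadratic loop on real data.

```python
import math
from itertools import combinations

def sdh(points, bucket_width, num_buckets):
    """Brute-force histogram of all pairwise distances."""
    hist = [0] * num_buckets
    for p, q in combinations(points, 2):
        d = math.dist(p, q)
        # Distances beyond the last bucket are clamped into it.
        b = min(int(d / bucket_width), num_buckets - 1)
        hist[b] += 1
    return hist

def histogram_error(h_original, h_decompressed):
    """Fraction of pairs that land in a different bucket after decompression."""
    total = sum(h_original)
    moved = sum(abs(a - b) for a, b in zip(h_original, h_decompressed))
    return moved / (2 * total)

# Made-up positions: the second list mimics small decompression error.
original = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
decompressed = [(0.01, 0.0), (1.0, 0.01), (0.0, 1.0), (3.0, 4.01)]

h_orig = sdh(original, bucket_width=2.0, num_buckets=3)
h_dec = sdh(decompressed, bucket_width=2.0, num_buckets=3)
err = histogram_error(h_orig, h_dec)
```

A small positional error leaves most pairs in the same bucket, which is why the query result can remain close to the original even when the per-atom RMSE is nonzero.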