systems. However, intrinsic mismatches between
OLAP-style querying and SQL (e.g., lack of sequential
processing, column aggregation) can cause performance
bottlenecks for OLAP servers.
• MOLAP Servers: These servers directly support the
multidimensional view of data through a
multidimensional storage engine. This makes it possible
to implement front-end multidimensional queries on the
storage layer through direct mapping. An example of
such a server is Essbase (Arbor). Such an approach has
the advantage of excellent indexing properties, but
provides poor storage utilization, especially when the
data set is sparse. Many MOLAP servers adopt a two-level
storage representation to adapt to sparse data sets and
use compression extensively. In the two-level storage
representation, a set of one- or two-dimensional subarrays
that are likely to be dense is identified, through the use
of design tools or by user input, and represented in
array format. A traditional indexing structure is then
used to index into these “smaller” arrays. Many of the
techniques that were devised for statistical databases
appear to be relevant for MOLAP servers.
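
The two-level storage representation described above can be illustrated with a minimal sketch. It is not the layout of any particular MOLAP product: the class name TwoLevelCube, the choice of (product, city) as the sparse dimensions and month as the dense dimension, and the use of a hash map as the "traditional indexing structure" are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

MONTHS = 12  # assumed dense dimension: most months have a value


class TwoLevelCube:
    """Hypothetical two-level store: a sparse index over small dense arrays."""

    def __init__(self) -> None:
        # Level 1: an ordinary index (a dict here) keyed by the sparse
        # dimension combination (product, city).
        # Level 2: one dense 1-D array per key, addressed by month offset.
        self.blocks: Dict[Tuple[str, str], List[Optional[float]]] = {}

    def put(self, product: str, city: str, month: int, sales: float) -> None:
        # Allocate a dense block only when a sparse combination first occurs.
        block = self.blocks.setdefault((product, city), [None] * MONTHS)
        block[month] = sales

    def get(self, product: str, city: str, month: int) -> Optional[float]:
        block = self.blocks.get((product, city))
        return None if block is None else block[month]

    def allocated_cells(self) -> int:
        # Storage grows with the number of populated combinations,
        # not with the full cross product of all dimension values.
        return len(self.blocks) * MONTHS


cube = TwoLevelCube()
cube.put("Widget", "Seattle", 0, 120.0)
cube.put("Widget", "Seattle", 1, 95.5)
cube.put("Gadget", "Boston", 11, 40.0)
print(cube.get("Widget", "Seattle", 1))
print(cube.allocated_cells())
```

In this sketch only two (product, city) combinations are populated, so 24 cells are allocated regardless of how many products and cities exist. A fully dense array would instead reserve a cell for every product, city, and month combination.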
SQL Extensions
Several extensions to SQL that facilitate the expression and
processing of OLAP queries have been proposed or
implemented in extended relational servers. Some of these
extensions are described below.
• Extended family of aggregate functions: These include
support for rank and percentile (e.g., all products in the
top 10 percent or the top 10 products by total sales) as
well as support for a variety of functions used in
financial analysis (mean, mode, median).
• Reporting Features: The reports produced for business
analysis often require aggregate features evaluated over a
time window, e.g., moving average. In addition, it is
important to be able to provide breakpoints and running
totals. Red Brick’s SQL extensions provide such
primitives.
• Multiple Group-By: Front-end tools such as
multidimensional spreadsheets require grouping by
different sets of attributes. This can be simulated by a set
of SQL statements that require scanning the same data
set multiple times, but this can be inefficient. Recently,
two new operators, Rollup and Cube, have been
proposed to augment SQL to address this problem [29].
For example, a Rollup of the attribute list (Product, Year,
City) over a data set produces answer sets for the following
group-bys: (a) group by (Product, Year, City), (b) group by
(Product, Year), and (c) group by Product. On the other
hand, given a list of k columns, the
Cube operator provides a group-by for each of the 2^k
combinations of columns. Such multiple group-by
operations can be executed efficiently by recognizing