If some dimension tables are themselves used as central tables for other stars, the
overall schema is called a “snowflake schema”. A set of star schemas with shared dimension
tables is sometimes called a “galaxy”.
Data warehouses and data marts are used for online analytical processing (OLAP)
and data mining. The term “OLAP” was introduced by Edgar Codd to describe interactive
analysis of dimensioned and aggregated data for decision support (Codd et al.
1993). Data mining involves deeper analysis of the data, typically using sophisticated
statistical techniques and complex algorithms for detecting patterns. Nowadays, many
tools for OLAP and data mining are in use. Each topic deserves a book of its own. The
remainder of this section provides a brief overview of OLAP.
There are three main approaches to OLAP. Each uses base data as well as aggregated
data (e.g., sales figures might be summed and grouped at various levels). Multidimensional
OLAP (MOLAP) stores both base and aggregated data in multidimensional
structures rather than tables. Relational OLAP (ROLAP) stores both base
and aggregated data in relational tables. Hybrid OLAP (HOLAP) stores base data in
relational tables and aggregated data in multidimensional structures. Some DBMSs,
such as Microsoft SQL Server, support all three kinds of OLAP.
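
To make the storage distinction concrete, the following minimal Python sketch aggregates the same base data in two ways: as a table of rows (in the spirit of ROLAP) and as a dense array indexed by dimension position (one simple kind of multidimensional structure, in the spirit of MOLAP). The sample rows, the function names, and the choice of a nested list as the multidimensional structure are all assumptions made for illustration; real OLAP engines use far more elaborate storage.

```python
from collections import defaultdict

# Hypothetical base data, invented for illustration:
# each row is (year, region, category, qty).
base_rows = [
    (2007, "N", "SW", 10),
    (2007, "N", "SW", 5),
    (2007, "S", "HW", 7),
    (2008, "N", "HW", 3),
]


def rolap_aggregate(rows):
    """ROLAP style: the aggregate is itself a table, here a sorted list of rows."""
    totals = defaultdict(int)
    for year, region, category, qty in rows:
        totals[(year, region, category)] += qty
    return sorted(key + (total,) for key, total in totals.items())


def molap_aggregate(rows):
    """MOLAP style: the aggregate is a dense array with one axis per dimension."""
    years = sorted({r[0] for r in rows})
    regions = sorted({r[1] for r in rows})
    categories = sorted({r[2] for r in rows})
    # One cell per (year, region, category) combination, empty cells hold 0.
    cube = [[[0 for _ in categories] for _ in regions] for _ in years]
    for year, region, category, qty in rows:
        y, r, c = years.index(year), regions.index(region), categories.index(category)
        cube[y][r][c] += qty
    return (years, regions, categories), cube


if __name__ == "__main__":
    print(rolap_aggregate(base_rows))
    (years, regions, categories), cube = molap_aggregate(base_rows)
    y, r, c = years.index(2007), regions.index("N"), categories.index("SW")
    print(cube[y][r][c])
```

In this sketch the table form stores only the combinations that actually occur, whereas the array form reserves a cell for every combination and gives direct positional access to any aggregate. A HOLAP arrangement would correspond to keeping base_rows as a table while holding only the aggregates in the array.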
The multidimensional structures used for OLAP are popularly known as cubes. In
geometry, a cube is a three-dimensional box structure. In OLAP theory, a cube can
have as many dimensions as you like. Cubes provide an intuitive way to visualize and
browse data, as well as fast access to aggregate data. Let’s consider a simple example.
With reference to the star schema in Figure 16.3, suppose we wanted to list the
number of units sold in the years 2007 through 2009 for each geographic region and
each item category. As an exercise you might like to formulate the SQL query for this.
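
One possible formulation is sketched here, run against an in-memory SQLite database from Python. Only saleYear, region, category and qty are taken from the text; the table names (Sale, TimeDimension, Store, Item), the join columns (saleDate, storeNr, itemCode) and the sample rows are hypothetical stand-ins, since the actual schema is the one given in Figure 16.3.

```python
import sqlite3

# The query relies on each dimension table sharing exactly one column name
# with the fact table, so that NATURAL JOIN matches on the keys only.
QUERY = """
    SELECT saleYear, region, category, SUM(qty) AS nrUnits
    FROM Sale NATURAL JOIN TimeDimension NATURAL JOIN Store NATURAL JOIN Item
    WHERE saleYear BETWEEN 2007 AND 2009
    GROUP BY saleYear, region, category
    ORDER BY saleYear, region, category
"""


def build_demo_db():
    """Create a tiny star schema with hypothetical names and invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TimeDimension (saleDate TEXT PRIMARY KEY, saleYear INTEGER);
        CREATE TABLE Store (storeNr INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE Item (itemCode TEXT PRIMARY KEY, category TEXT);
        CREATE TABLE Sale (saleDate TEXT, storeNr INTEGER, itemCode TEXT, qty INTEGER);

        INSERT INTO TimeDimension VALUES
            ('2006-12-31', 2006), ('2007-03-01', 2007), ('2008-06-15', 2008);
        INSERT INTO Store VALUES (1, 'N'), (2, 'S');
        INSERT INTO Item VALUES ('A1', 'SW'), ('B2', 'HW');
        INSERT INTO Sale VALUES
            ('2006-12-31', 1, 'A1', 99),
            ('2007-03-01', 1, 'A1', 4),
            ('2007-03-01', 1, 'A1', 6),
            ('2007-03-01', 2, 'B2', 2),
            ('2008-06-15', 1, 'B2', 5);
    """)
    return conn


def units_by_year_region_category(conn):
    """Return (saleYear, region, category, nrUnits) rows for 2007 through 2009."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in units_by_year_region_category(build_demo_db()):
        print(row)
```

The WHERE clause is what limits the result to the requested years; without it, the 2006 sale in the sample data would also be grouped and reported.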
You need to perform the natural join of the four tables, restrict saleYear to the range
2007 through 2009, group by saleYear, region and category, and compute sum(qty).
An extract of a possible result is shown in Table 16.1,
assuming only two categories (SW = Software, HW = Hardware) and four regions (N
= North, S = South, E = East, W = West). The fact type underlying this table is the
quaternary: Year in Region had sales of items of Category in NrUnits. The full table display of
the