brief overview of related work is given. Section 3 introduces the family of XCube formats. Besides the basic standards for storing dimensions, classification hierarchies and facts, some advanced, dynamic standards are also dealt with that allow a basic service infrastructure to be built. Section 5 closes with an outlook on how XCube can be combined with Web Services and some remarks on our prototype.

2. MOTIVATING USE CASES

Before explaining XCube, this section sketches some typical use cases that could benefit from our approach. The driving force behind all the scenarios presented below is that many data warehouses exist, containing a great deal of valuable information spread over different departments of the same company or even over different companies. The exchange of data between these heterogeneous systems would be much easier if there were a single standardized format for describing data warehouse data.

2.1 The “Download” Use Case

A web server offers data warehouse cubes for download (figure 1). Every client data warehouse interested in this data can download it and integrate it into its local database. First, the description of the multidimensional schema of the cube is downloaded. In a second step the master data is exchanged, i.e. the classification hierarchies of the individual dimensions. Finally, the transaction data is sent to the requestor and integrated into its local data warehouse.

The integration on the client side is easy as long as no expression in the new cube conflicts with those of existing data cubes. If, on the other hand, a dimension already exists, for example, the exact relationship between the two has to be checked carefully, which leads to the problem of semantic integration. Standardized reference dimensions could be an interesting tool for integrating data from various sources.
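The three download steps and the conflict check can be illustrated with a minimal sketch. It assumes plain in-memory Python dictionaries in place of the actual XCube documents; the names `LocalWarehouse` and `integrate_cube` are hypothetical and not part of XCube. A dimension name that already exists locally is simply rejected here, whereas real semantic integration would have to examine the relationship between the two dimensions.

```python
from dataclasses import dataclass, field


@dataclass
class LocalWarehouse:
    """Hypothetical stand-in for a client data warehouse."""
    # dimension name -> classification hierarchy (master data)
    dimensions: dict = field(default_factory=dict)
    # list of fact records (transaction data)
    facts: list = field(default_factory=list)


def integrate_cube(warehouse, schema, master_data, transaction_data):
    """Integrate a downloaded cube: schema first, then master data, then facts."""
    # Step 1: inspect the schema and detect dimensions that already exist locally.
    conflicts = [d for d in schema["dimensions"] if d in warehouse.dimensions]
    if conflicts:
        # Assumption: conflicts are rejected instead of being resolved semantically.
        raise ValueError(f"conflicting dimensions: {conflicts}")

    # Step 2: store the classification hierarchy of each dimension.
    for dim in schema["dimensions"]:
        warehouse.dimensions[dim] = master_data[dim]

    # Step 3: append the transaction data.
    warehouse.facts.extend(transaction_data)
    return warehouse
```

Because the conflict check runs before anything is written, a rejected cube leaves the local warehouse unchanged.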
2.2 The “Query” Use Case

As the amount of data in a multidimensional cube is usually rather large (especially the transaction data part), the possibility of analyzing data cubes online, i.e. on the (web) server side, is highly desirable in order to reduce the consumption of network capacity. In the online query case only the schema and master data, which are smaller by orders of magnitude, have to be downloaded completely from the web server (steps 1 and 2 in figure 2). With this description in hand, the client application (not only a data warehouse but also, e.g., an OLAP tool) can decide which subset of the data cube it needs and can then send a corresponding query to the web server (step 3). The server then computes the desired transaction data subcube and sends it to the client (step 4). If the client application is a data warehouse that wants to integrate the result into its own database, this use case is a generalization of the download case presented in section 2.1.
For the query scenario it is important to think about an update strategy for schema and master data on the client side, because this data can become outdated if the client keeps it for a longer time. The problem of conflicting expressions has to be treated in the same way as mentioned above.
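Step 4 of the query case, the server-side computation of a subcube, can be sketched as a simple selection over fact records. This is an illustration under stated assumptions only: facts are modelled as Python dictionaries, a query is a mapping from dimension name to the set of requested values, and the function name `compute_subcube` and the example dimensions are hypothetical rather than taken from XCube.

```python
def compute_subcube(facts, selection):
    """Return the facts whose dimension values all lie in the requested sets.

    facts:     list of dicts, one per fact record
    selection: dict mapping a dimension name to the set of allowed values;
               dimensions not mentioned are left unrestricted
    """
    return [
        fact
        for fact in facts
        if all(fact[dim] in allowed for dim, allowed in selection.items())
    ]


# Hypothetical transaction data held on the server.
server_facts = [
    {"product": "A", "region": "north", "sales": 10},
    {"product": "A", "region": "south", "sales": 5},
    {"product": "B", "region": "north", "sales": 7},
]

# Step 3: the client asks only for the "north" slice of the cube.
query = {"region": {"north"}}

# Step 4: the server computes the subcube and would send it to the client.
subcube = compute_subcube(server_facts, query)
```

Only the two matching fact records would travel over the network, which is the saving the query case aims at.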
2.3 The “Generating” Use Case

The generating case is about how to create data cubes that can be offered by a web server. In principle, data of any format from all kinds of data sources can be converted to a multidimensional cube. The conversion process is the same as when integrating a new source into a warehouse: the data has to undergo the complete ETL workflow.

The more interesting case is generating an online cube from an existing data warehouse. Here the expensive integration task is already done, and it only has to be decided which subset of the data is to be published. The transformation of multidimensional data (independent of relational or multidimensional storage) is simple and can be done automatically. For implementing this case, the introduction of a data mart holding only the subset of data that is meant to go online might be useful. Another interesting design choice for the implementation is whether the warehouse data is transformed into online cubes statically or dynamically on demand by a set of SQL queries.

2.4 Requirements for Representing Online Data Cubes

The previous three subsections gave a set of examples motivating the need for a web-based format for exchanging data cubes. This paper describes XCube, a family of XML schemas to precisely describe these online cubes. As the cubes are supposed to be transferred over the Internet and contain highly structured data, using XML ([1], [2]) seems reasonable. Before the XCube standards are explained in the next section, a list of requirements for a format expressing online data cubes is introduced, which can easily be derived from the scenarios presented above.

1. The format has to support a multidimensional data model.
2. The conceptual distinction between the description of schema, master or dimension data and transaction or fact data has to be supported.
3. The format has to be trans