Describing the Data Warehouse

A data warehouse is a repository of historical data that are organized by subject to support decision makers in the organization. Data warehouses facilitate business intelligence activities, such as data mining and decision support. The basic characteristics of a data warehouse include:
• Organized by business dimension or subject. Data are organized by subject (for example, by customer, vendor, product, price level, and region) and contain information relevant for decision support and data analysis.
• Consistent. Data in different databases may be encoded differently. For example, gender data may be encoded 0 and 1 in one operational system and “m” and “f” in another. In the data warehouse, though, all data must be coded in a consistent manner.
• Historical. The data are kept for many years so that they can be used for identifying trends, forecasting, and making comparisons over time.
• Nonvolatile. Data are not updated after they are entered into the warehouse.
• Uses online analytical processing. Typically, organizational databases are oriented toward handling transactions. That is, databases use online transaction processing (OLTP), where business transactions are processed online as soon as they occur. The objectives are speed and efficiency, which are critical to a successful Internet-based business operation. Data warehouses, which are designed to support decision makers rather than OLTP, use online analytical processing. Online analytical processing (OLAP) involves the analysis of accumulated data by end users.
• Multidimensional. Typically, the data warehouse uses a multidimensional data structure. Recall that relational databases store data in two-dimensional tables. In contrast, data warehouses store data in more than two dimensions. For this reason, the data are said to be stored in a multidimensional structure. A common representation for this multidimensional structure is the data cube.
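
Two of the characteristics above, consistency and multidimensionality, can be illustrated with a short Python sketch. It is a toy model, not how any particular warehouse product works. The system names (orders_system, crm_system), the warehouse codes "M" and "F", and the sales figures are all hypothetical and chosen only for illustration. The cube is held in an ordinary dictionary keyed by (product, region, year).

```python
from collections import defaultdict

# Hypothetical mapping from each source system's gender codes to a single
# warehouse encoding. Which source value means which gender is an assumption.
GENDER_CODES = {
    "orders_system": {0: "M", 1: "F"},
    "crm_system": {"m": "M", "f": "F"},
}


def standardize_gender(source, raw_value):
    """Translate a source-specific gender code into the warehouse encoding."""
    return GENDER_CODES[source][raw_value]


# Illustrative sales facts: (product, region, year, amount).
SALES = [
    ("nuts", "east", 2023, 50),
    ("nuts", "west", 2023, 60),
    ("bolts", "east", 2023, 40),
    ("nuts", "east", 2024, 70),
    ("bolts", "west", 2024, 90),
]


def build_cube(facts):
    """Store amounts in a three-dimensional cube keyed by (product, region, year)."""
    cube = defaultdict(int)
    for product, region, year, amount in facts:
        cube[(product, region, year)] += amount
    return cube


def roll_up(cube, keep):
    """Sum the cube over every dimension except the positions listed in keep."""
    totals = defaultdict(int)
    for key, amount in cube.items():
        totals[tuple(key[i] for i in keep)] += amount
    return dict(totals)


cube = build_cube(SALES)
by_product = roll_up(cube, keep=(0,))           # total sales per product
by_region_year = roll_up(cube, keep=(1, 2))     # total sales per region and year
```

Each call to roll_up answers a different analytical question from the same stored cube, which is the kind of end-user analysis that OLAP is meant to support.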