In the context of BI and data mining, the phrase “Extract, Transform, and Load”
(ETL) is used to describe the process that involves: (a) extracting data from outside
sources, (b) transforming it to fit operational needs (dealing with syntactic and semantic issues while ensuring predefined quality levels), and (c) loading it into
the target system, e.g., a data warehouse or relational database. A data warehouse
is a single logical repository of an organization’s transactional and operational data.
The data warehouse does not produce data but simply taps off data from operational
systems. The goal is to unify information such that it can be used for reporting,
analysis, forecasting, etc. Figure 4.1 shows that ETL activities can be used to populate a data warehouse. Creating the common view required for a data warehouse may take considerable effort. Different data sources may use different keys, formatting conventions, etc. For example, one data source may identify a patient by her last name and birth date while another data source uses her social security number. One data source may use the date format “31-12-2010” whereas another uses the format “2010/12/31”.
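The key and date mismatches described above can be illustrated with a minimal Python sketch of the transform and load steps. Everything named here is an assumption made for illustration: the field names, the `crosswalk` table linking (last name, birth date) to a social security number, and the in-memory dictionary standing in for the warehouse do not come from any particular system.

```python
from datetime import datetime

# Hypothetical set of date formats used by the source systems.
KNOWN_DATE_FORMATS = ("%d-%m-%Y", "%Y/%m/%d")


def normalize_date(raw: str) -> str:
    """Return the date as YYYY-MM-DD, trying each known source format."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


def etl(source_a, source_b, crosswalk):
    """Merge two sources into one store keyed by social security number.

    source_a identifies patients by last name and birth date,
    source_b by social security number. crosswalk is an assumed
    lookup table mapping (last_name, birth_date) to that number.
    """
    warehouse = {}
    for row in source_a:
        key = (row["last_name"].lower(), normalize_date(row["birth_date"]))
        ssn = crosswalk[key]  # unify the patient key
        warehouse.setdefault(ssn, []).append(
            {"event": row["event"], "date": normalize_date(row["date"])}
        )
    for row in source_b:
        warehouse.setdefault(row["ssn"], []).append(
            {"event": row["event"], "date": normalize_date(row["date"])}
        )
    return warehouse


# Made-up example records, one per source.
source_a = [{"last_name": "Jones", "birth_date": "01-02-1970",
             "event": "admission", "date": "31-12-2010"}]
source_b = [{"ssn": "123-45-6789", "event": "blood test",
             "date": "2010/12/31"}]
crosswalk = {("jones", "1970-02-01"): "123-45-6789"}

warehouse = etl(source_a, source_b, crosswalk)
```

After the run, both events are stored under the same patient key with the same date representation. In practice the crosswalk itself is often the hard part, since last name and birth date need not identify a patient uniquely.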