File-based, application-led approaches to data storage often lead to problems. The duplication of data over many files, each being the responsibility of a different person or department, can lead to update difficulties and the presence of inconsistent data in the organization. The same data may also be represented in different storage formats in different files, and the files themselves may have different organization and access characteristics. The dependence of application programs on the files that serve them increases the difficulty of changing data storage structures without having to change the programs that access them.
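The update difficulty described above can be illustrated with a minimal sketch. The two departmental "files" here are hypothetical in-memory dictionaries, and the customer record, field names and function names are invented purely for illustration.

```python
# Hypothetical example: two departments each keep their own copy
# of the same customer's address in separate "files".
sales_file = {"C001": {"name": "A. Smith", "address": "1 High Street"}}
accounts_file = {"C001": {"name": "A. Smith", "address": "1 High Street"}}


def update_address(records, customer_id, new_address):
    """Update the address in one department's file only."""
    records[customer_id]["address"] = new_address


def inconsistent_customers(file_a, file_b):
    """Return ids of customers whose addresses differ between two files."""
    return [
        cid
        for cid in file_a
        if cid in file_b and file_a[cid]["address"] != file_b[cid]["address"]
    ]


# Sales learns of a change of address; accounts is never told.
update_address(sales_file, "C001", "22 Mill Lane")

# The organization now holds two conflicting addresses for one customer.
print(inconsistent_customers(sales_file, accounts_file))
```

Because each copy is maintained separately, nothing forces the second file to change when the first one does.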
The database approach, on the other hand, recognizes the importance of developing an integrated store of data structured in a meaningful manner for the organization. The database contains data stored with minimal redundancy and organized in a manner that is a logical reflection of the relationships between the entities on which data is held.
Database management systems are sophisticated software packages that maintain the database and present an interface to users and user programs that is independent of physical storage details. This logical presentation of the data facilitates user enquiries and application program development: programmers need be concerned only with what data is required for an application, not with how it is physically stored. It also allows physical reorganization of the database without the need for application program changes. Commercial database systems define the logical structure of the database using a data definition language (DDL) and allow data alterations through a data manipulation language (DML). Other facilities provided are data dictionaries, accounting utilities, concurrency control, backup, recovery and security features.
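The distinction between a DDL and a DML can be sketched as follows. The example assumes SQL as the language and SQLite (through Python's standard `sqlite3` module) as the database management system; neither is prescribed by the text, and the `customer` table and its columns are hypothetical.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")

# DDL: define the logical structure of the data.
conn.execute(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT
    )
    """
)

# DML: insert and then alter data held in that structure.
conn.execute(
    "INSERT INTO customer (customer_id, name, city) VALUES (?, ?, ?)",
    (1, "A. Smith", "Leeds"),
)
conn.execute(
    "UPDATE customer SET city = ? WHERE customer_id = ?",
    ("York", 1),
)

# DML: retrieve data by stating what is wanted, not where it is stored.
rows = conn.execute("SELECT name, city FROM customer").fetchall()
print(rows)
```

Nothing in the program refers to file layouts or access paths, which is the independence from physical storage that the paragraph above describes.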
In understanding database systems, it is useful to identify three separate levels at which data may be represented:
1. The conceptual schema (an overall logical view of the database);
2. The external schema (a logical presentation of part of the database in the way most suitable to meet a user’s requirements);
3. The internal schema (the representation of storage and access characteristics for the data).
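As a rough illustration of the three levels, the sketch below again assumes SQL and SQLite via Python's `sqlite3` module. The `employee` table, the `staff_directory` view and the index name are hypothetical. An index stands in only loosely for the internal schema, which in a full system covers far more storage detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the overall logical view, here a single table.
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Jones", "Sales", 30000.0), (2, "Patel", "Accounts", 32000.0)],
)

# External level: a view presenting only the part one user group needs
# (salary is deliberately left out).
conn.execute(
    "CREATE VIEW staff_directory AS SELECT name, department FROM employee"
)

# Internal level (approximation): an index changes how rows are accessed
# without changing what the table or the view look like to users.
conn.execute("CREATE INDEX idx_employee_department ON employee (department)")

directory = conn.execute(
    "SELECT name, department FROM staff_directory ORDER BY name"
).fetchall()
print(directory)
```

The query against the view is unaffected by whether the index exists, which mirrors the separation between the external and internal levels.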
Three data models have had significant impact on the development of commercial database management systems software. They are, chronologically, the hierarchical, network and relational models.
Both the hierarchical and network models impose restrictions on the way relationships can be represented and data accessed. The hierarchical model is the more limiting, restricting data to tree structures using downward-pointing 1:n relationships. Network structures do not allow the direct representation of m:n relationships. Relational database management systems are table-based logical representations of data structures that allow simple and powerful data manipulation. The advantages of relational systems must be weighed against their slow speed of operation, which makes them unsuitable for high-volume, transaction-based data processing.
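One common way of holding an m:n relationship in tables is a link table that pairs the keys of the two entities. The sketch below assumes SQL and SQLite via Python's `sqlite3` module. The `student`, `course` and `enrolment` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two entity tables plus a link table: each enrolment row pairs one
# student with one course, so many students can take many courses.
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE enrolment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Ada'), (2, 'Ben');
    INSERT INTO course  VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO enrolment VALUES (1, 10), (1, 20), (2, 10);
    """
)

# Joining through the link table recovers every student-course pairing.
pairs = conn.execute(
    """
    SELECT s.name, c.title
    FROM enrolment AS e
    JOIN student AS s ON s.student_id = e.student_id
    JOIN course  AS c ON c.course_id  = e.course_id
    ORDER BY s.name, c.title
    """
).fetchall()
print(pairs)
```

The join is expressed declaratively. How the system carries it out, and at what cost in speed, is left to the database management system.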
The way that a data model is developed for an organization and the design of a database to incorporate this model are reserved for Chapter 13 on data analysis and modeling. The entity-relationship modeling approach will be used, and the techniques of normalization (often associated with the design of effective relational databases) will be explained there.
Recently, a great deal of interest has been expressed in the development of data warehouses. These repositories of aggregated data lie outside the day-to-day transaction-processing systems. They provide a series of time-snapshots of data, which can be extracted and presented in many formats. Various techniques have evolved to search (or mine) the data. Data mining can be a valuable source of new knowledge to an organization, as trends and patterns can be detected that would not otherwise be evident. Data warehouses can prove to be a high-cost solution, but they often provide improvements in customer relationship management and can lead to significant competitive business advantage.
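The idea of detecting a trend across time-snapshots can be shown with a toy sketch. The snapshot periods, regions and sales totals are invented for illustration, and the check is deliberately simple. It is not a real data-mining technique, only an indication of the kind of question a series of snapshots makes possible.

```python
# Hypothetical monthly snapshots of aggregated sales totals per region.
snapshots = [
    ("2024-01", {"North": 120, "South": 200}),
    ("2024-02", {"North": 150, "South": 190}),
    ("2024-03", {"North": 180, "South": 195}),
]


def rising_regions(snapshots):
    """Return regions whose total increased in every successive snapshot."""
    regions = snapshots[0][1].keys()
    rising = []
    for region in regions:
        values = [totals[region] for _, totals in snapshots]
        # Compare each snapshot with the one that follows it.
        if all(earlier < later for earlier, later in zip(values, values[1:])):
            rising.append(region)
    return rising


print(rising_regions(snapshots))
```

A transaction-processing system that holds only current values could not answer this question, since the earlier totals would already have been overwritten.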