1. Introduction
Data quality issues have taken on increasing importance in recent years. In our research, we have discovered
that many “data quality” problems are actually “data misinterpretation” problems, that is, problems
with data semantics. To illustrate how complex this can become, consider Fig. 1. The figure summarizes the
P/E ratio for DaimlerChrysler as reported by four different financial information sources, all retrieved on
the same day within minutes of each other. Note that the four sources gave radically different values for
the P/E ratio.
The obvious questions to ask are: “Which source is correct?” and “Why are the other sources wrong, i.e.,
of bad data quality?” The possibly surprising answer is: they are all correct!
The issue is: what do you really mean by “P/E ratio”?1 The answer lies in the multiple interpretations and
uses of the term “P/E ratio” in financial circles. In some sources the earnings are for the entire year, but in one
source they are only for the last quarter. Even when earnings are for a full year, are they