Improving data quality through effective use of data semantics
Data quality issues have taken on increasing importance in recent years. In our research, we have discovered that many “data quality” problems are actually “data misinterpretation” problems—that is, problems caused by heterogeneous data semantics. In this paper, we first identify semantic heterogeneities that, when not resolved, often cause data quality problems. We discuss the especially challenging problem of aggregational ontological heterogeneity, which concerns how complex entities and their relationships are aggregated. Then we illustrate how COntext INterchange (COIN) technology can be used to capture data semantics and reconcile semantic heterogeneities, thereby improving data quality.