The quality of a large real-world data set depends on a number of issues [9, 39, 40], but the
source of the data is the crucial factor. Data entry and acquisition are inherently prone to errors,
both simple and complex. Much effort can be devoted to reducing entry errors in this front-end
process, but the fact often remains that errors in a large data set are common.
Unless an organization takes extreme measures to avoid data errors, field error
rates are typically around 5% or more.