The collapse of the data silo is related to the larger phenomenon
of overall data aggregation on the web.
A growing pool of online structured data, better
tools, and, again, a large economic driver are all
pushing organizations to aggregate and use data
from a multitude of sources.
Finally, one of the most significant shifts in the
software industry is the explicit transition toward
agile software development. Agile development
includes iterative software development methodologies
in which both requirements and solutions
evolve during the development of the software.
Agile methodologies are in contrast to waterfall
methodologies, which imply that requirements are
well known before development starts (Larman
and Basili 2003). More effective agile processes are
important not only for software development but also
for organizations of all types and sizes. The need
to keep organizations aligned with opportunities
and external competitive forces is driving this
shift. Being agile allows an organization to adapt
more rapidly to external forces, which in turn
increases the chances of survival.
For companies, the above trends are resulting in
more effective use of enterprise software as well as
more efficient business operations. Effectiveness is
driving adoption across the business landscape,
across industries, and from very small companies
up to the Global 2000. Efficiency is driving application
acceptance and usage within the company.
This combination of effectiveness and efficiency,
driving adoption and usage, is fueling enormous
growth of structured business data.
Large volumes of structured business data
require significant effort to maintain the quality of
the data. For instance, with customer-relationship
management (CRM) systems (deployed under SaaS
or traditional software installations), ActivePrime
and its partners have found that data quality has
become the number one issue that limits return on
investment. As the volume of data grows, the pain
caused by poor-quality data grows more
acute. Data quality has been an ongoing issue in
the IT industry for the past 30 years, and the problem
continues to expand, fueling the
growth of the data-quality industry to one billion
dollars in 2008. It is also estimated that companies
are losing 6 percent of sales because of poor management
of customer data (Experian QAS 2006).
Several competing definitions of data quality
exist. The pragmatic definition is considered here;
specifically, if data effectively and efficiently supports
an organization’s analysis, planning, and
operations, then that data is considered of high
quality. In addition, data cleansing is defined as
manual or automated processes that are expected
to increase the quality of data.
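
As a concrete illustration of an automated data-cleansing process in the sense defined above, the following minimal Python sketch normalizes CRM-style contact records and drops duplicates. The field names (name, email, phone), the sample records, and the rule that two records with the same normalized e-mail address are duplicates are all assumptions made for this example; they are not taken from ActivePrime's products or from any particular CRM system.

```python
import re


def normalize(record):
    """Return a copy of a contact record with canonical formatting."""
    name = " ".join(record["name"].split()).title()  # collapse spaces, fix case
    email = record["email"].strip().lower()
    phone = re.sub(r"\D", "", record["phone"])  # keep digits only
    return {"name": name, "email": email, "phone": phone}


def cleanse(records):
    """Normalize records and drop later ones that repeat an earlier e-mail.

    Treating the normalized e-mail as the duplicate key is an assumption
    for this sketch; real systems typically use richer matching rules.
    """
    seen = set()
    cleaned = []
    for record in records:
        fixed = normalize(record)
        if fixed["email"] in seen:
            continue
        seen.add(fixed["email"])
        cleaned.append(fixed)
    return cleaned


# Hypothetical sample data: the first two entries describe the same person.
contacts = [
    {"name": "  jane   DOE ", "email": "Jane.Doe@Example.com ", "phone": "(555) 010-1234"},
    {"name": "Jane Doe", "email": "jane.doe@example.com", "phone": "555.010.1234"},
    {"name": "john smith", "email": "john@example.com", "phone": "555 010 9999"},
]
cleaned = cleanse(contacts)
```

Under the pragmatic definition, such a step counts as cleansing only to the extent that the resulting records better support the organization's analysis, planning, and operations; the normalization rules themselves are a design choice that would differ from one data set to another.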