Introduction
Big data is both a marketing and a technical term referring to a valuable enterprise asset—information. Big data
represents a trend in technology that is leading the way to a new approach in understanding the world and making
business decisions. These decisions are based on very large amounts of structured, unstructured and complex
data (e.g., tweets, videos, commercial transactions), which have become difficult to process using basic database and
data warehouse management tools. Managing and processing these ever-increasing data sets requires running specialized
software on multiple servers. For some enterprises, big data is counted in hundreds of gigabytes; for others, it is in
terabytes or even petabytes, with a frequent and rapid rate of growth and change (in some cases, almost in real time).
In essence, big data refers to data sets that are too large or change too quickly to be captured, managed, processed
and analyzed within a reasonable elapsed time using traditional relational or multidimensional database techniques or
commonly used software tools.
According to COBIT® 5, information is effective if it meets the needs of the information consumer (who is considered
a stakeholder). In the case of big data, the enterprise is the stakeholder, and one of its primary stakes is information
quality. These stakes can be related to the information goals in the COBIT 5 enabler model, which divides these goals
into three subdimensions of quality, described later in this white paper. The better the quality of the data, the better
the decisions based on the data, which ultimately creates value for the enterprise. Therefore, big data management must
ensure the quality of the data throughout the data life cycle.
Data are collected and analyzed to find patterns and correlations that may not be initially apparent but may be useful
in making business decisions. This process is called big data analytics. These data are often personal data, which are
valuable from a marketing perspective for understanding the likes and dislikes of potential buyers and for analyzing
and predicting their buying behavior. Personal data can be categorized as:
• Volunteered data—Created and explicitly shared by individuals (e.g., social network profiles)
• Observed data—Captured by recording the actions of individuals (e.g., location data when using cell phones)
• Inferred data—Data about individuals based on analysis of volunteered or observed information (e.g., credit scores)