3.5.2 Data transparency and governance
Big Data use cases often build on a smart combination
of individual data sources that jointly provide new
perspectives and insights. In many companies, however,
the reality is that three major challenges must be
addressed to ensure successful implementation.
First, to locate data that is already available in the company,
there must be full transparency of information
assets and ownership. Second, to prevent ambiguous
data mapping, data attributes must be clearly structured
and explicitly defined across multiple databases. Third,
strong governance of data quality must be maintained.
The validity of mass query results is likely to be
compromised unless effective cleansing procedures remove
incomplete, obsolete, or duplicate data records. It is of
utmost importance to ensure high overall data quality in
the individual data sources, because with the boosted
volume, variety, and velocity of Big Data it is more
difficult to implement efficient validation and
adjustment procedures.
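The cleansing step described above can be sketched in a few lines. This is a minimal, illustrative example, not a production procedure; the field names (`customer_id`, `email`) and the rule "drop records with empty mandatory fields, then deduplicate on a key" are assumptions for the sake of the sketch.

```python
def cleanse(records, required_fields, key_field):
    """Remove incomplete records, then deduplicate on a key field,
    keeping the first occurrence of each key."""
    seen = set()
    cleaned = []
    for rec in records:
        # Drop incomplete records: a mandatory field is missing or empty.
        if any(not rec.get(f) for f in required_fields):
            continue
        # Drop duplicates of a key that was already accepted.
        key = rec[key_field]
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned

raw = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "C1", "email": "a@example.com"},  # duplicate
    {"customer_id": "C2", "email": ""},               # incomplete
    {"customer_id": "C3", "email": "c@example.com"},
]
clean = cleanse(raw, required_fields=["customer_id", "email"],
                key_field="customer_id")
```

In practice such rules live in a dedicated data quality tool, but the logic is the same: validation rules first, then deduplication against a defined business key.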
3.5.3 Data privacy
In the conceptual phase of every Big Data project, it is
essential to consider data protection and privacy issues.
Personal data is often revealed when exploiting information
assets, especially when attempting to gain customer
insight. Use cases are often difficult to realize in
countries with strict data protection laws, yet
legislation is not the only constraint. Even when a use
case complies with prevailing laws, the large-scale
collection and exploitation of data often stirs public
debate, which can damage corporate reputation and
brand value.
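One common mitigation is to pseudonymize direct identifiers before analysis and to keep only the attributes the use case actually needs. The sketch below assumes a salted hash as the pseudonym; note that pseudonymization is weaker than full anonymization, since re-identification may still be possible, and the field names here are hypothetical.

```python
import hashlib

def pseudonymize(value, salt):
    """Replace a direct identifier with a truncated salted hash.
    This is pseudonymization, not anonymization: with the salt,
    the mapping can be reproduced."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]

record = {"customer_id": "C42", "email": "jane@example.com", "spend": 120.50}

# Keep only the attributes the analysis needs; hash the key,
# drop the rest of the personal data.
masked = {
    "customer_id": pseudonymize(record["customer_id"], salt="project-salt"),
    "spend": record["spend"],
}
```

Deciding which attributes count as personal data, and who controls the salt, is exactly the kind of question that must be settled in the conceptual phase of the project.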
3.5.4 Data science skills
A key to successful Big Data implementation is mastery
of the many data analysis and manipulation techniques
that turn vast raw data into valuable information. The
skillful application of computational mathematics
determines whether insights are reliable and meaningful. In most
industries, the required mathematical and statistical
skill set is scarce. In fact, a talent war is underway, as
more and more companies recognize they must source
missing data science skills externally. Very specialized
knowledge is required to deploy the right techniques
for each particular data processing problem, so organizations
must invest in new HR approaches in support
of Big Data initiatives.
3.5.5 Appropriate technology usage
Many data processing problems currently hyped
as “Big Data challenges” could, in fact, have been
technically solved five years ago. But back then, the
required technology investment would have shattered
every business case. Now, raw computing power has
increased exponentially at a fraction of the cost, and
advanced data processing concepts are available, enabling
a new dimension of performance. The most
prominent approaches are in-memory data storage
and distributed computing frameworks. However,
these new concepts require adoption of entirely new
technologies.
Implementing Big Data projects therefore requires IT
departments to evaluate both established and new
technology components thoroughly. It needs to be
established whether these components can support a
particular use case, and whether existing investments
can be scaled up for higher performance. For example,
in-memory databases (such as the SAP HANA system)
are very fast but offer limited storage volume,
while distributed computing frameworks (such as the
Apache Hadoop framework) are able to scale out to a
huge number of nodes but at the cost of delayed data
consistency across multiple nodes.
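The scale-out idea behind frameworks such as Apache Hadoop can be illustrated with a toy map/reduce word count. This is a sketch of the programming model only, not the Hadoop API: each chunk could be mapped by an independent node, and the reduce step merges the partial results.

```python
from collections import Counter
from itertools import chain

def map_phase(chunk):
    """Map: emit (word, 1) pairs for one partition of the input.
    Each partition could be processed by a separate node."""
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    """Reduce: sum the counts per word across all partitions."""
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return dict(totals)

# The input is split into chunks, mimicking distribution across nodes.
chunks = ["big data big", "data value", "big value value"]
mapped = chain.from_iterable(map_phase(c) for c in chunks)
counts = reduce_phase(mapped)
```

The trade-off noted above shows up directly in this model: because partial results are computed independently and merged later, a globally consistent view only exists after the reduce step completes.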
In summary, these are the five success factors that
must be in place for organizations to leverage data
for better business performance. Big Data is ready
to be used.