The analysis of Big Data involves multiple distinct phases as shown in the figure below, each of
which introduces challenges. Many people unfortunately focus just on the analysis/modeling phase:
while that phase is crucial, it is of little use without the other phases of the data analysis pipeline. Even
in the analysis phase, which has received much attention, there are poorly understood complexities in
the context of multi-tenanted clusters where several users’ programs run concurrently. Many significant
challenges extend beyond the analysis phase. For example, Big Data has to be managed in context,
which may be noisy, heterogeneous and not include an upfront model. Doing so raises the need to track
provenance and to handle uncertainty and error: topics that are crucial to success, and yet rarely
mentioned in the same breath as Big Data. Similarly, the questions posed to the data analysis pipeline will
typically not all be laid out in advance. We may need to figure out good questions based on the data.
Doing this will require smarter systems and also better support for user interaction with the analysis
pipeline. In fact, we currently have a major bottleneck in the number of people empowered to ask
questions of the data and analyze it [NYT2012]. We can drastically increase this number by supporting
many levels of engagement with the data, not all requiring deep database expertise. Solutions to
problems such as this will not come from the incremental improvements to business as usual that
industry may make on its own. Rather, they require us to fundamentally rethink how we manage data
analysis.