nodes on other nodes. The JobTracker receives heartbeats from
TaskTrackers. If a TaskTracker fails to communicate with the
JobTracker for a preset period of time (the TaskTracker expiry interval),
the JobTracker assumes failure and schedules all map/reduce tasks
of the failed node on other TaskTrackers. This approach is different
from most parallel databases which abort unfinished queries upon a
node failure and restart the entire query processing (using a replica
node instead of the failed node).
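To make the mechanism concrete, the following is a minimal Java sketch of expiry-based failure detection: a scheduler records the last heartbeat time of each TaskTracker and, once a tracker has been silent longer than the expiry interval, moves its unfinished tasks back onto a pending queue for other trackers. All class and method names here are illustrative; this is not Hadoop's actual JobTracker code.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Queue;

// Illustrative sketch of expiry-based failure detection; not Hadoop's actual JobTracker code.
class HeartbeatMonitor {
    private final long expiryIntervalMs;                         // e.g. 60000 for a 60-second expiry
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final Map<String, Queue<String>> assignedTasks = new HashMap<>();
    private final Queue<String> pendingTasks = new ArrayDeque<>();

    HeartbeatMonitor(long expiryIntervalMs) {
        this.expiryIntervalMs = expiryIntervalMs;
    }

    // Record a task assignment so it can be re-queued if the tracker dies.
    void assignTask(String trackerId, String taskId) {
        assignedTasks.computeIfAbsent(trackerId, k -> new ArrayDeque<>()).add(taskId);
    }

    // Called whenever a TaskTracker reports in.
    void onHeartbeat(String trackerId) {
        lastHeartbeat.put(trackerId, System.currentTimeMillis());
    }

    // Periodically scan for trackers that have been silent longer than the
    // expiry interval and move their unfinished tasks back onto the pending
    // queue so they can be scheduled on other, live TaskTrackers.
    void expireDeadTrackers() {
        long now = System.currentTimeMillis();
        Iterator<Map.Entry<String, Long>> it = lastHeartbeat.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (now - entry.getValue() > expiryIntervalMs) {
                Queue<String> lost = assignedTasks.remove(entry.getKey());
                if (lost != null) {
                    pendingTasks.addAll(lost);                   // reschedule elsewhere
                }
                it.remove();                                     // stop tracking the dead node
            }
        }
    }
}
```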
By inheriting the scheduling and job-tracking features of Hadoop, HadoopDB yields fault-tolerance and straggler-handling properties similar to Hadoop's.
To test the effectiveness of HadoopDB in failure-prone and heterogeneous
environments in comparison to Hadoop and Vertica,
we executed the aggregation query with 2000 groups (see Section
6.2.4) on a 10-node cluster and set the replication factor to two for
all systems. For Hadoop and HadoopDB we set the TaskTracker
expiry interval to 60 seconds. The following lists the system-specific settings for the experiments; a configuration sketch for the Hadoop-side settings follows the list.
Hadoop (Hive): HDFS managed the replication of data. HDFS
replicated each block of data on a different node selected uniformly
at random.
HadoopDB (SMS): As described in Section 6, each node contains
twenty 1GB-chunks of the UserVisits table. Each of these
20 chunks was replicated on a different node selected at random.
Vertica: In Vertica, replication is achieved by keeping an extra copy
of every table segment. Each table is hash partitioned across the
nodes and a backup copy is assigned to another node based on a
replication rule. On node failure, this backup copy is used until the
lost segment is rebuilt.
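For reference, the Hadoop-side settings above (a replication factor of two and a 60-second TaskTracker expiry interval) correspond to ordinary Hadoop configuration properties. The sketch below shows one way to set them programmatically, assuming the 0.19/0.20-era property names `dfs.replication` and `mapred.tasktracker.expiry.interval`; the configuration files of the deployed version remain the authoritative reference.

```java
import org.apache.hadoop.conf.Configuration;

// Configuration sketch only; property names assume the Hadoop 0.19/0.20-era keys.
public class ExperimentConf {
    public static Configuration build() {
        Configuration conf = new Configuration();
        conf.setInt("dfs.replication", 2);                           // keep two copies of every HDFS block
        conf.setLong("mapred.tasktracker.expiry.interval", 60000L);  // 60s of silence marks a TaskTracker dead
        return conf;
    }
}
```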
For fault-tolerance tests, we terminated a node at 50% query
completion. For Hadoop and HadoopDB, this is equivalent to failing
a node when 50% of the scheduled Map tasks are done. For
Vertica, this is equivalent to failing a node after 50% of the average
query completion time for the given query.
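One possible way to automate the Hadoop/HadoopDB failure injection is to poll map progress and kill the victim node once it crosses 50%. The sketch below uses the old `org.apache.hadoop.mapred` client API; the job id, victim host, and kill command are placeholders, and the actual experiments may have triggered the failure differently.

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

// Sketch: poll map progress and trigger the node kill once 50% of the
// scheduled Map tasks are done. Job id, victim host, and kill command are
// placeholders; the old org.apache.hadoop.mapred client API is assumed.
public class KillAtHalfway {
    public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        RunningJob job = client.getJob(JobID.forName(args[0]));      // e.g. "job_200906291200_0001"
        while (!job.isComplete() && job.mapProgress() < 0.5f) {
            Thread.sleep(1000);                                      // poll once per second
        }
        // Placeholder for terminating the victim node (e.g. powering off a VM).
        new ProcessBuilder("ssh", "victim-node", "sudo", "poweroff").inheritIO().start();
    }
}
```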
To measure the percentage increase in query time in heterogeneous
environments, we slow down a node by running an I/O-intensive
background job that randomly seeks values from a large file and
frequently clears OS caches. This file is located on the same disk
where data for each system is stored.
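In essence, the background job is a loop of random reads over a large file on the data disk, interleaved with OS cache flushes so that the reads keep hitting the disk. A minimal sketch is shown below; the file path is hypothetical and dropping the page cache via /proc requires root privileges.

```java
import java.io.RandomAccessFile;
import java.util.Random;

// Sketch of an I/O-intensive straggler workload: random seeks over a large file
// on the same disk as the system's data, with periodic OS page-cache drops.
public class IoHog {
    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "/data/large_scratch_file";   // hypothetical path
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            Random rnd = new Random();
            byte[] buf = new byte[4096];
            long length = file.length();
            for (long i = 0; ; i++) {
                file.seek((long) (rnd.nextDouble() * (length - buf.length)));   // random offset
                file.readFully(buf);                                            // force a disk seek and read
                if (i % 10000 == 0) {
                    // Flush dirty pages and drop the page cache so reads keep
                    // hitting the disk (requires root privileges).
                    new ProcessBuilder("sh", "-c",
                            "sync; echo 3 > /proc/sys/vm/drop_caches")
                            .inheritIO().start().waitFor();
                }
            }
        }
    }
}
```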
We observed no differences in percentage slowdown between
HadoopDB with or without SMS and between Hadoop with or without
Hive. Therefore, we only report results of HadoopDB with SMS
and Hadoop with Hive and refer to both systems as HadoopDB and
Hadoop from now on.
The results of the experiments are shown in Fig. 11. Node failure
caused HadoopDB and Hadoop to have smaller slowdowns than
Vertica. Vertica’s increase in total query execution time is due to
the overhead associated with query abortion and complete restart.
In both HadoopDB and Hadoop, the tasks of the failed node are
distributed over the remaining available nodes that contain replicas
of the data. HadoopDB slightly outperforms Hadoop. In Hadoop, TaskTrackers assigned blocks that are not local to them copy the data
first (from a replica) before processing. In HadoopDB, however,
processing is pushed into the (replica) database. Since the result returned after query processing is much smaller than the raw data it is computed from, HadoopDB does not experience Hadoop’s network overhead
on node failure.
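To make the contrast concrete: rather than copying a replica's raw blocks over the network, a HadoopDB task can open a JDBC connection to the replica's PostgreSQL chunk and pull back only the (much smaller) grouped result. The sketch below is illustrative, not HadoopDB's actual database connector; the connection string, credentials, and the simplified aggregation query are assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: run the aggregation inside the replica's PostgreSQL chunk and ship
// back only the grouped result, instead of copying the raw 1GB chunk first.
public class ReplicaChunkReader {
    public static void main(String[] args) throws Exception {
        // Hypothetical replica host, chunk database name, and credentials.
        String url = "jdbc:postgresql://replica-host:5432/uservisits_chunk_07";
        String sql = "SELECT sourceIP, SUM(adRevenue) FROM UserVisits GROUP BY sourceIP"; // simplified query
        try (Connection conn = DriverManager.getConnection(url, "hadoopdb", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                // Each row is already an aggregate, so far less data crosses the
                // network than shipping the underlying raw chunk would require.
                System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
            }
        }
    }
}
```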
In an environment where one node is extremely slow, HadoopDB
and Hadoop experience less than 30% increase in total query execution
time, while Vertica experiences more than a 170% increase
in query running time. Vertica waits for the straggler node to complete
processing. HadoopDB and Hadoop run speculative tasks on
TaskTrackers that completed their tasks. Since the data is chunked
(HadoopDB has 1GB chunks, Hadoop has 256MB blocks), multiple
TaskTrackers concurrently process different replicas of unprocessed
blocks assigned to the straggler. Thus, the delay due to processing
those blocks is distributed across the cluster.
In our experiments, we discovered an assumption made by
Hadoop’s task scheduler that contradicts the HadoopDB model.
In Hadoop, TaskTrackers will copy data not local to them from
the straggler or the replica. HadoopDB, however, does not move
PostgreSQL chunks to new nodes. Instead, the TaskTracker of the
redundant task connects to either the straggler’s database or the
replica’s database. If the TaskTracker connects to the straggler’s
database, the straggler needs to concurrently process an additional
query, leading to further slowdown. Therefore, the same feature that gives HadoopDB slightly better fault tolerance than Hadoop causes a slightly higher percentage slowdown for HadoopDB in heterogeneous environments. We plan to modify
the current task scheduler implementation to provide hints to
speculative TaskTrackers to avoid connecting to a straggler node
and to connect to replicas instead.
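A minimal sketch of such a hint is shown below: when a speculative task is launched for a chunk, it prefers any replica host over the straggler, so the slow node is not burdened with a second copy of the query. The class and method names are hypothetical; this is not the existing scheduler implementation.

```java
import java.util.List;

// Hypothetical sketch of the planned scheduling hint: a speculative task should
// connect to a replica's database rather than back to the straggler itself.
class SpeculativeConnectionChooser {
    /**
     * @param candidateHosts hosts holding a copy of the chunk (the straggler included)
     * @param stragglerHost  host on which the original, slow task is running
     * @return the host the speculative task should query
     */
    static String chooseHost(List<String> candidateHosts, String stragglerHost) {
        for (String host : candidateHosts) {
            if (!host.equals(stragglerHost)) {
                return host;                      // prefer any replica over the straggler
            }
        }
        return stragglerHost;                     // fall back only if no replica holds the chunk
    }
}
```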
7.1 Discussion
It should be pointed out that although Vertica’s percentage slowdown was larger than that of Hadoop and HadoopDB, its total query
time (even with the failure or the slow node) was still lower than
Hadoop or HadoopDB. Furthermore, Vertica’s performance in the
absence of failures is an order of magnitude faster than Hadoop and
HadoopDB (mostly because its column-oriented layout of data is a
big win for the small aggregation query). This order-of-magnitude advantage could instead be traded for the same performance as Hadoop and HadoopDB using an order of magnitude fewer nodes. Hence, failures and slow nodes become less likely for
Vertica than for Hadoop and HadoopDB. Furthermore, eBay’s
6.5 petabyte database (perhaps the largest known data warehouse
worldwide as of June 2009) [4] uses only 96 nodes in a shared-nothing cluster. Failures are still reasonably rare at fewer than 100
nodes.
We argue that in the future, 1000-node clusters will be commonplace
for production database deployments, and 10,000-node
clusters will not be unusual. There are three trends that support
this prediction. First, data production continues to grow faster than
Moore’s law (see Section 1). Second, it is becoming clear that
from both a price/performance and (an increasingly important)
power/performance perspective, many low-cost, low-power servers
are far better than fewer heavy-weight servers [14]. Third, there