nodes on other nodes. The JobTracker receives heartbeats from
TaskTrackers. If a TaskTracker fails to communicate with the
JobTracker for a preset period of time (the TaskTracker expiry interval),
the JobTracker assumes failure and schedules all map/reduce tasks
of the failed node on other TaskTrackers. This approach is different
from most parallel databases which abort unfinished queries upon a
node failure and restart the entire query processing (using a replica
node instead of the failed node).
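To make the mechanism concrete, the following is a minimal Java sketch of expiry-based failure detection: a scheduler records the last heartbeat time of each TaskTracker and, once a tracker has been silent longer than the expiry interval, moves its unfinished tasks back onto a pending queue for other trackers. All class and method names here are illustrative; this is not Hadoop's actual JobTracker code.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Queue;

// Illustrative sketch of expiry-based failure detection; not Hadoop's actual JobTracker code.
class HeartbeatMonitor {
    private final long expiryIntervalMs;                         // e.g. 60000 for a 60-second expiry
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final Map<String, Queue<String>> assignedTasks = new HashMap<>();
    private final Queue<String> pendingTasks = new ArrayDeque<>();

    HeartbeatMonitor(long expiryIntervalMs) {
        this.expiryIntervalMs = expiryIntervalMs;
    }

    // Record a task assignment so it can be re-queued if the tracker dies.
    void assignTask(String trackerId, String taskId) {
        assignedTasks.computeIfAbsent(trackerId, k -> new ArrayDeque<>()).add(taskId);
    }

    // Called whenever a TaskTracker reports in.
    void onHeartbeat(String trackerId) {
        lastHeartbeat.put(trackerId, System.currentTimeMillis());
    }

    // Periodically scan for trackers that have been silent longer than the
    // expiry interval and move their unfinished tasks back onto the pending
    // queue so they can be scheduled on other, live TaskTrackers.
    void expireDeadTrackers() {
        long now = System.currentTimeMillis();
        Iterator<Map.Entry<String, Long>> it = lastHeartbeat.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (now - entry.getValue() > expiryIntervalMs) {
                Queue<String> lost = assignedTasks.remove(entry.getKey());
                if (lost != null) {
                    pendingTasks.addAll(lost);                   // reschedule elsewhere
                }
                it.remove();                                     // stop tracking the dead node
            }
        }
    }
}
```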
By inheriting the scheduling and job-tracking features of Hadoop, HadoopDB yields fault-tolerance and straggler-handling properties similar to Hadoop's.
To test the effectiveness of HadoopDB in failure-prone and heterogeneous
environments in comparison to Hadoop and Vertica,
we executed the aggregation query with 2000 groups (see Section
6.2.4) on a 10-node cluster and set the replication factor to two for
all systems. For Hadoop and HadoopDB we set the TaskTracker
expiry interval to 60 seconds. The following lists the system-specific settings for the experiments; a configuration sketch for the Hadoop-side settings follows the list.
Hadoop (Hive): HDFS managed the replication of data. HDFS
replicated each block of data on a different node selected uniformly
at random.
HadoopDB (SMS): As described in Section 6, each node contains
twenty 1GB-chunks of the UserVisits table. Each of these
20 chunks was replicated on a different node selected at random.
Vertica: In Vertica, replication is achieved by keeping an extra copy
of every table segment. Each table is hash partitioned across the
nodes and a backup copy is assigned to another node based on a
replication rule. On node failure, this backup copy is used until the
lost segment is rebuilt.
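For reference, the Hadoop-side settings above (a replication factor of two and a 60-second TaskTracker expiry interval) correspond to ordinary Hadoop configuration properties. The sketch below shows one way to set them programmatically, assuming the 0.19/0.20-era property names `dfs.replication` and `mapred.tasktracker.expiry.interval`; the configuration files of the deployed version remain the authoritative reference.

```java
import org.apache.hadoop.conf.Configuration;

// Configuration sketch only; property names assume the Hadoop 0.19/0.20-era keys.
public class ExperimentConf {
    public static Configuration build() {
        Configuration conf = new Configuration();
        conf.setInt("dfs.replication", 2);                           // keep two copies of every HDFS block
        conf.setLong("mapred.tasktracker.expiry.interval", 60000L);  // 60s of silence marks a TaskTracker dead
        return conf;
    }
}
```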
For fault-tolerance tests, we terminated a node at 50% query
completion. For Hadoop and HadoopDB, this is equivalent to failing
a node when 50% of the scheduled Map tasks are done. For
Vertica, this is equivalent to failing a node after 50% of the average
query completion time for the given query.
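One possible way to automate the Hadoop/HadoopDB failure injection is to poll map progress and kill the victim node once it crosses 50%. The sketch below uses the old `org.apache.hadoop.mapred` client API; the job id, victim host, and kill command are placeholders, and the actual experiments may have triggered the failure differently.

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

// Sketch: poll map progress and trigger the node kill once 50% of the
// scheduled Map tasks are done. Job id, victim host, and kill command are
// placeholders; the old org.apache.hadoop.mapred client API is assumed.
public class KillAtHalfway {
    public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        RunningJob job = client.getJob(JobID.forName(args[0]));      // e.g. "job_200906291200_0001"
        while (!job.isComplete() && job.mapProgress() < 0.5f) {
            Thread.sleep(1000);                                      // poll once per second
        }
        // Placeholder for terminating the victim node (e.g. powering off a VM).
        new ProcessBuilder("ssh", "victim-node", "sudo", "poweroff").inheritIO().start();
    }
}
```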
To measure the percentage increase in query time in heterogeneous
environments, we slow down a node by running an I/O-intensive
background job that randomly seeks values from a large file and
frequently clears OS caches. This file is located on the same disk
where data for each system is stored.
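In essence, the background job is a loop of random reads over a large file on the data disk, interleaved with OS cache flushes so that the reads keep hitting the disk. A minimal sketch is shown below; the file path is hypothetical and dropping the page cache via /proc requires root privileges.

```java
import java.io.RandomAccessFile;
import java.util.Random;

// Sketch of an I/O-intensive straggler workload: random seeks over a large file
// on the same disk as the system's data, with periodic OS page-cache drops.
public class IoHog {
    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "/data/large_scratch_file";   // hypothetical path
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            Random rnd = new Random();
            byte[] buf = new byte[4096];
            long length = file.length();
            for (long i = 0; ; i++) {
                file.seek((long) (rnd.nextDouble() * (length - buf.length)));   // random offset
                file.readFully(buf);                                            // force a disk seek and read
                if (i % 10000 == 0) {
                    // Flush dirty pages and drop the page cache so reads keep
                    // hitting the disk (requires root privileges).
                    new ProcessBuilder("sh", "-c",
                            "sync; echo 3 > /proc/sys/vm/drop_caches")
                            .inheritIO().start().waitFor();
                }
            }
        }
    }
}
```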
We observed no differences in percentage slowdown between
HadoopDB with or without SMS and between Hadoop with or without
Hive. Therefore, we only report results of HadoopDB with SMS
and Hadoop with Hive and refer to both systems as HadoopDB and
Hadoop from now on.
The results of the experiments are shown in Fig. 11. Node failure
caused HadoopDB and Hadoop to have smaller slowdowns than
Vertica. Vertica’s increase in total query execution time is due to
the overhead associated with query abortion and complete restart.
In both HadoopDB and Hadoop, the tasks of the failed node are
distributed over the remaining available nodes that contain replicas
of the data. HadoopDB slightly outperforms Hadoop. In Hadoop, TaskTrackers assigned blocks that are not local to them copy the data
first (from a replica) before processing. In HadoopDB, however,
processing is pushed into the (replica) database. Since the result returned after query processing is much smaller than the raw data it is computed from, HadoopDB does not experience Hadoop’s network overhead
on node failure.
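To make the contrast concrete: rather than copying a replica's raw blocks over the network, a HadoopDB task can open a JDBC connection to the replica's PostgreSQL chunk and pull back only the (much smaller) grouped result. The sketch below is illustrative, not HadoopDB's actual database connector; the connection string, credentials, and the simplified aggregation query are assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: run the aggregation inside the replica's PostgreSQL chunk and ship
// back only the grouped result, instead of copying the raw 1GB chunk first.
public class ReplicaChunkReader {
    public static void main(String[] args) throws Exception {
        // Hypothetical replica host, chunk database name, and credentials.
        String url = "jdbc:postgresql://replica-host:5432/uservisits_chunk_07";
        String sql = "SELECT sourceIP, SUM(adRevenue) FROM UserVisits GROUP BY sourceIP"; // simplified query
        try (Connection conn = DriverManager.getConnection(url, "hadoopdb", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                // Each row is already an aggregate, so far less data crosses the
                // network than shipping the underlying raw chunk would require.
                System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
            }
        }
    }
}
```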
In an environment where one node is extremely slow, HadoopDB
and Hadoop experience less than 30% increase in total query execution
time, while Vertica experiences more than a 170% increase
in query running time. Vertica waits for the straggler node to complete
processing. HadoopDB and Hadoop run speculative tasks on
TaskTrackers that completed their tasks. Since the data is chunked
(HadoopDB has 1GB chunks, Hadoop has 256MB blocks), multiple
TaskTrackers concurrently process different replicas of unprocessed
blocks assigned to the straggler. Thus, the delay due to processing
those blocks is distributed across the cluster.
In our experiments, we discovered an assumption made by
Hadoop’s task scheduler that contradicts the HadoopDB model.
In Hadoop, TaskTrackers will copy data not local to them from
the straggler or the replica. HadoopDB, however, does not move
PostgreSQL chunks to new nodes. Instead, the TaskTracker of the
redundant task connects to either the straggler’s database or the
replica’s database. If the TaskTracker connects to the straggler’s
database, the straggler needs to concurrently process an additional
query, leading to further slowdown. Therefore, the same feature that gives HadoopDB slightly better fault tolerance than Hadoop causes a slightly higher percentage slowdown for HadoopDB in heterogeneous environments. We plan to modify
the current task scheduler implementation to provide hints to
speculative TaskTrackers to avoid connecting to a straggler node
and to connect to replicas instead.
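A minimal sketch of such a hint is shown below: when a speculative task is launched for a chunk, it prefers any replica host over the straggler, so the slow node is not burdened with a second copy of the query. The class and method names are hypothetical; this is not the existing scheduler implementation.

```java
import java.util.List;

// Hypothetical sketch of the planned scheduling hint: a speculative task should
// connect to a replica's database rather than back to the straggler itself.
class SpeculativeConnectionChooser {
    /**
     * @param candidateHosts hosts holding a copy of the chunk (the straggler included)
     * @param stragglerHost  host on which the original, slow task is running
     * @return the host the speculative task should query
     */
    static String chooseHost(List<String> candidateHosts, String stragglerHost) {
        for (String host : candidateHosts) {
            if (!host.equals(stragglerHost)) {
                return host;                      // prefer any replica over the straggler
            }
        }
        return stragglerHost;                     // fall back only if no replica holds the chunk
    }
}
```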
7.1 Discussion
It should be pointed out that although Vertica’s percentage slowdown was larger than that of Hadoop and HadoopDB, its total query
time (even with the failure or the slow node) was still lower than
Hadoop or HadoopDB. Furthermore, Vertica’s performance in the
absence of failures is an order of magnitude faster than Hadoop and
HadoopDB (mostly because its column-oriented layout of data is a
big win for the small aggregation query). This order-of-magnitude advantage could instead be traded for the same performance as Hadoop and HadoopDB using an order of magnitude fewer nodes. Hence, failures and slow nodes become less likely for
Vertica than for Hadoop and HadoopDB. Furthermore, eBay’s
6.5 petabyte database (perhaps the largest known data warehouse
worldwide as of June 2009) [4] uses only 96 nodes in a shared-nothing cluster. Failures are still reasonably rare at fewer than 100
nodes.
We argue that in the future, 1000-node clusters will be commonplace
for production database deployments, and 10,000-node
clusters will not be unusual. There are three trends that support
this prediction. First, data production continues to grow faster than
Moore’s law (see Section 1). Second, it is becoming clear that
from both a price/performance and (an increasingly important)
power/performance perspective, many low-cost, low-power servers
are far better than fewer heavy-weight servers [14]. Third, there