is now, more than ever, a requirement to perform data analysis
inside of the DBMS, rather than pushing data to external systems
for analysis. Disk-heavy architectures such as the eBay 96-node
DBMS do not have the necessary CPU horsepower for analytical
workloads [4].
Hence, awaiting us in the future are heavy-weight analytic
database jobs, requiring more time and more nodes. The probability
of failure in these next generation applications will be far larger
than it is today, and restarting entire jobs upon a failure will be
unacceptable (failures might be common enough that long-running
jobs never finish!). Thus, although Hadoop and HadoopDB pay a
performance penalty for runtime scheduling, block-level restart,
and frequent checkpointing, such an overhead to achieve robust
fault tolerance will become necessary in the future. One feature
of HadoopDB is that it can elegantly transition between both ends
of this spectrum. Since a chunk is the basic unit of work, HadoopDB
can operate in the high-performance/low-fault-tolerance space of
today's workloads (like Vertica) by setting the chunk size to infinity,
or in the high-fault-tolerance space by using more granular chunks
(like Hadoop).
In future work, we plan to explore the fault-tolerance/performance
tradeoff in more detail.
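The fault-tolerance/performance tradeoff above can be illustrated with a back-of-the-envelope expected-runtime model (a sketch for intuition only, not part of HadoopDB; the function and parameter names are our own). It uses the standard result that a task of length c, restarted from scratch on each failure under Poisson failures of rate r, takes expected time (e^{rc} - 1)/r to complete:

```python
import math

def expected_runtime(total_work, chunk_size, failure_rate, overhead_per_chunk):
    """Expected wall-clock seconds to finish `total_work` seconds of
    computation split into independently restartable chunks.

    Assumes failures arrive as a Poisson process with rate `failure_rate`
    (failures per second) and that a failure restarts only the current
    chunk from scratch. `overhead_per_chunk` models per-chunk scheduling
    and checkpointing cost.
    """
    n_chunks = total_work / chunk_size
    # Expected completion time of one chunk under restart-on-failure.
    per_chunk = (math.exp(failure_rate * chunk_size) - 1) / failure_rate
    return n_chunks * (per_chunk + overhead_per_chunk)

# A 10-hour job with, on average, one failure every 5 hours:
T = 10 * 3600.0
rate = 1 / (5 * 3600.0)

# Chunk = whole job (no checkpointing overhead): every failure restarts everything.
coarse = expected_runtime(T, chunk_size=T, failure_rate=rate, overhead_per_chunk=0.0)

# One-minute chunks with 5 seconds of overhead each: failures cost almost nothing.
fine = expected_runtime(T, chunk_size=60.0, failure_rate=rate, overhead_per_chunk=5.0)
```

Under these (illustrative) parameters, whole-job restart roughly triples the expected runtime, while fine-grained chunks finish close to the failure-free time at the cost of the per-chunk overhead, which is the tradeoff the chunk-size knob exposes.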
8. CONCLUSION
Our experiments show that HadoopDB is able to approach the
performance of parallel database systems while achieving fault tolerance,
the ability to operate in heterogeneous environments, and a
software license cost similar to Hadoop's. Although the
performance of HadoopDB does not in general match the performance
of parallel database systems, much of the gap is due to the
fact that PostgreSQL is not a column-store and that we did not use data
compression in PostgreSQL. Moreover, Hadoop and Hive are relatively
young open-source projects; we expect future releases to improve
performance, and HadoopDB will automatically benefit from those
improvements.
HadoopDB is therefore a hybrid of the parallel DBMS and
Hadoop approaches to data analysis, achieving the performance
and efficiency of parallel databases, yet still yielding the scalability,
fault tolerance, and flexibility of MapReduce-based systems.
The ability of HadoopDB to directly incorporate Hadoop and
open source DBMS software (without code modification) makes
HadoopDB particularly flexible and extensible for performing data
analysis at the large scales expected of future workloads.
9. ACKNOWLEDGMENTS
We’d like to thank Sergey Melnik and the three anonymous reviewers
for their extremely insightful feedback on an earlier version
of this paper, which we incorporated into the final version. We’d
also like to thank Eric McCall for helping us get Vertica running
on EC2. This work was sponsored by the NSF under grants IIS-
0845643 and IIS-0844480.