query. Although this build accepts a SQL query that joins, filters
and aggregates tuples from two tables, such a query fails during
execution. Additionally, we noticed that the query plan for joins of
this type uses a highly inefficient execution strategy. In particular,
the filtering operation is planned after joining the tables. Hence,
we are only able to present hand-coded results for HadoopDB and
Hadoop for this query.
In HadoopDB, we push the selection, join, and partial aggregation
into the PostgreSQL instances with the following SQL:
SELECT sourceIP, COUNT(pageRank), SUM(pageRank),
SUM(adRevenue) FROM Rankings AS R, UserVisits AS UV
WHERE R.pageURL = UV.destURL AND
UV.visitDate BETWEEN '2000-01-15' AND '2000-01-22'
GROUP BY UV.sourceIP;
We then use a single Reduce task in Hadoop that gathers all of
the partial aggregates from each PostgreSQL instance to perform
the final aggregation.
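The logic of that single Reduce task can be sketched as follows (a minimal Python sketch of combining the partial aggregates, not the actual Hadoop implementation; the tuple layout and function name are illustrative):

```python
from collections import defaultdict

def final_aggregate(partials):
    """Combine the partial aggregates emitted by each PostgreSQL instance.

    Each partial is a tuple:
      (sourceIP, count_pageRank, sum_pageRank, sum_adRevenue)
    A single reducer sums the counts and sums per sourceIP; the average
    pageRank can then be derived as sum(pageRank) / count(pageRank).
    """
    totals = defaultdict(lambda: [0, 0, 0.0])
    for source_ip, cnt, rank_sum, rev_sum in partials:
        t = totals[source_ip]
        t[0] += cnt       # COUNT(pageRank)
        t[1] += rank_sum  # SUM(pageRank)
        t[2] += rev_sum   # SUM(adRevenue)
    return {ip: (c, r, rev) for ip, (c, r, rev) in totals.items()}
```

Because COUNT and SUM are algebraic aggregates, combining per-node partials in this way yields exactly the same result as a single global GROUP BY.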
The parallel databases execute the SQL query specified in [23].
Although Hadoop has support for a join operator, this operator
requires that both input datasets be sorted on the join key. Such
a requirement limits the utility of the join operator since in many
cases, including the query above, the data is not already sorted and
performing a sort before the join adds significant overhead. We
found that even if we sorted the input data (and did not include the
sort time in total query time), query performance using the Hadoop
join was lower than the performance of the three-phase MR
program used in [23], which relied on standard 'Map' and 'Reduce' operators.
Hence, for the numbers we report below, we use an identical
MR program as was used (and described in detail) in [23].
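The core of such an MR join is a repartition (reduce-side) join: the Map phase tags each tuple with its source table and emits it keyed on the join attribute, and the Reduce phase pairs up matching tuples. The following is an illustrative sketch of that idea only, not the three-phase program from [23]; record layouts and names are assumptions:

```python
def map_phase(record, source):
    # Tag each tuple with its originating table and key it on the join
    # attribute (the URL), so matching tuples meet at the same reducer.
    if source == "rankings":
        page_url, page_rank = record
        yield (page_url, ("R", page_rank))
    else:  # uservisits tuples, assumed already filtered on visitDate
        source_ip, dest_url, ad_revenue = record
        yield (dest_url, ("UV", (source_ip, ad_revenue)))

def reduce_phase(key, tagged_values):
    # Join all Rankings and UserVisits tuples that share the same URL.
    ranks = [v for tag, v in tagged_values if tag == "R"]
    visits = [v for tag, v in tagged_values if tag == "UV"]
    for page_rank in ranks:
        for source_ip, ad_revenue in visits:
            yield (source_ip, page_rank, ad_revenue)
```

Unlike Hadoop's built-in join operator, this pattern needs no pre-sorted inputs: the framework's shuffle groups tuples by key, at the cost of moving both datasets across the network.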
Fig. 9 summarizes the results of this benchmark task. For
Hadoop, we observed similar results as found in [23]: its performance
is limited by completely scanning the UserVisits dataset on
each node in order to evaluate the selection predicate.
HadoopDB, DBMS-X, and Vertica all achieve higher performance
by using an index to accelerate the selection predicate
and having native support for joins. These systems see slight
performance degradation with a larger number of nodes due to the
final single-node aggregation of, and sorting by, adRevenue.
6.2.6 UDF Aggregation Task
The final task computes, for each document, the number of inward
links from other documents in the Documents table. URL links
appearing in each document are extracted and then aggregated by target URL.
HTML documents are concatenated into large files for Hadoop
(256MB each) and Vertica (56MB each) at load time. HadoopDB
was able to store each document separately in the Documents table
using the TEXT data type. DBMS-X processed each HTML
document file separately, as described below.
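The extract-and-count step of this task can be sketched as follows (a minimal Python sketch; the regex-based link extraction is a deliberate simplification of real HTML parsing, and all names are illustrative):

```python
import re

# Simplified link pattern; a production UDF would use a real HTML parser.
URL_RE = re.compile(r'href="(http://[^"]+)"')

def count_inlinks(documents):
    """Count, for each URL, how many links to it appear across documents.

    `documents` maps a document's own URL to its HTML contents,
    mirroring one row per document in the Documents table.
    """
    counts = {}
    for _doc_url, html in documents.items():
        for link in URL_RE.findall(html):
            counts[link] = counts.get(link, 0) + 1
    return counts
```

In the MR formulation, the extraction loop is the Map phase (emitting one record per outgoing link) and the counting is a standard sum Reduce.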
The parallel databases should theoretically be able to use a user-defined
function, F, to parse the contents of each document and