Optimizing Compilers
In the realm of highly parallel code, and even to some extent for moderately parallel code,
advanced compilers have created more instruction-level parallelism. But for slightly parallel
code, most advances in optimizing compilers [1] have actually reduced the amount of
instruction-level parallelism. For example, common subexpression elimination, code motion of
loop invariants, induction variable elimination, and elimination of redundant loads and stores all
reduce redundant computation. And computing something redundantly (e.g., twice in a basic
block) clearly provides an increase in instruction-level parallelism! In our experience with
slightly parallel code, only tree height reduction and strength reduction provide added
instruction-level parallelism.
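
As an illustration of why tree height reduction adds parallelism, here is a minimal sketch, not taken from the original text. It assumes unit latency for every operation and estimates parallelism as the operation count divided by the length of the longest dependency chain. The tuple representation and the helper names `depth`, `op_count`, and `parallelism` are hypothetical.

```python
# Expression trees are nested tuples (op, left, right); leaves are variable names.
# Assumption for this sketch: every operation takes one cycle.

def depth(node):
    """Length of the longest dependency chain, counted in operations."""
    if isinstance(node, str):
        return 0
    _, left, right = node
    return 1 + max(depth(left), depth(right))

def op_count(node):
    """Total number of operations in the tree."""
    if isinstance(node, str):
        return 0
    _, left, right = node
    return 1 + op_count(left) + op_count(right)

def parallelism(node):
    """Operations divided by critical-path length: a rough ILP estimate."""
    return op_count(node) / depth(node)

# ((a + b) + c) + d : straightforward left-to-right evaluation
serial = ("+", ("+", ("+", "a", "b"), "c"), "d")
# (a + b) + (c + d) : the same sum after tree height reduction
balanced = ("+", ("+", "a", "b"), ("+", "c", "d"))

print(depth(serial), parallelism(serial))      # 3 1.0
print(depth(balanced), parallelism(balanced))  # 2 1.5
```

Both trees perform three additions, but the balanced form shortens the dependency chain from three operations to two, so two of the additions can issue together.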
Of course, many effective compilation techniques have been developed for highly parallel code.
Vectorizing compilers, parallelizing compilers, trace scheduling, loop unrolling, and loop
jamming can all increase the accessible parallelism within code. Finally, for moderately parallel
code, techniques like loop unrolling and trace scheduling can speed up non-vectorizable
applications when running on superscalar or superpipelined machines.
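
To make the effect of loop unrolling concrete, here is a small sketch, not from the original text, of a sum reduction unrolled by four with separate accumulators. The function names `sum_rolled` and `sum_unrolled4` are hypothetical. Python is used only to show the shape of the transformation; the interpreter itself will not run the four chains in parallel, whereas a compiler targeting a superscalar or superpipelined machine could overlap them.

```python
def sum_rolled(values):
    """Original loop: each add depends on the previous one (one long chain)."""
    total = 0
    for v in values:
        total += v
    return total

def sum_unrolled4(values):
    """Unrolled by four with independent accumulators (four shorter chains)."""
    s0 = s1 = s2 = s3 = 0
    n = len(values)
    limit = n - n % 4            # largest multiple of 4 not exceeding n
    for i in range(0, limit, 4):
        s0 += values[i]          # these four adds have no dependencies
        s1 += values[i + 1]      # on each other within an iteration
        s2 += values[i + 2]
        s3 += values[i + 3]
    for i in range(limit, n):    # leftover elements when n is not a multiple of 4
        s0 += values[i]
    return (s0 + s1) + (s2 + s3)

data = list(range(1, 11))
print(sum_rolled(data), sum_unrolled4(data))  # 55 55
```

The example uses integers on purpose: splitting the accumulator reorders the additions, which can change rounding for floating-point data.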
3.3.2. Effects of Longer Base Latencies
So far our assumption has been that the latency of all operations, or at least the simple
operations, is one base machine cycle. As we discussed previously, no known machines have
this characteristic. For example, few machines have one cycle loads without a possible data
interlock either before or after the load. Similarly, few machines can execute floating-point
operations in one cycle. What are the effects of longer latencies? Consider a machine where
ALU operations are one cycle, but loads, stores, and branches are two cycles, and floating-point
operations are three cycles. Then the base machine is actually like a slightly superpipelined
machine. If we multiply the execution frequency of each instruction class by its latency and
sum the products, we get the average degree of superpipelining.

instruction    frequency    latency (cycles)    contribution
ALU/shift         40%              1                 0.4
load/store        35%              2                 0.7
branch            15%              2                 0.3
FP                10%              3                 0.3
total                                                1.7

Thus, our machine is closer to a superpipelined machine of degree two than it is to our ideal base
machine. To the extent that some operation latencies are greater than one base machine cycle,
the remaining amount of exploitable instruction-level parallelism will be reduced. In this
example, assuming the average degree of instruction-level parallelism in slightly parallel code is
around two, this machine should not stall often because of data-dependency interlocks.
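
The weighted average in the table above can be reproduced with a few lines of code. This is a minimal sketch using the frequencies and latencies given in the text. The names `MIX` and `average_superpipelining` are hypothetical.

```python
# Instruction mix from the table: class -> (execution frequency, latency in cycles)
MIX = {
    "ALU/shift":  (0.40, 1),
    "load/store": (0.35, 2),
    "branch":     (0.15, 2),
    "FP":         (0.10, 3),
}

def average_superpipelining(mix):
    """Sum of frequency * latency over all instruction classes."""
    total_freq = sum(freq for freq, _ in mix.values())
    if abs(total_freq - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(freq * latency for freq, latency in mix.values())

degree = average_superpipelining(MIX)
print(round(degree, 2))  # 1.7
```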
