GPUs are throughput-oriented processors that depend on massive
multithreading to tolerate long-latency memory accesses. The
latest GPUs are all equipped with on-chip data caches to reduce
the latency of memory accesses and to save the bandwidth of the NoC
and off-chip memory modules. Yet these tiny data caches are vulnerable
to thrashing under massive multithreading, especially when
divergent load instructions generate long bursts of cache accesses.
Meanwhile, the blocks of divergent loads exhibit high intra-warp
locality and are expected to be cached atomically so that the issuing
warp can fully hit in the L1D on the next issuance of the load. However,
GPU caches are not designed with enough awareness of either the SIMD
execution model or memory divergence.
In this work, we renovate cache management policies to design
a GPU-specific data cache, DaCache. The design starts with
the observation that warp scheduling essentially shapes the locality
pattern in cache access streams. We therefore incorporate the
warp scheduling logic into the insertion policy, so that blocks are
inserted into the LRU chain according to their issuing warp's scheduling
priority. We further prioritize coherent loads over divergent
loads. To enable thrashing resistance, the cache
ways are partitioned by the desired warp concurrency into two regions,
a locality region and a thrashing region, and replacement is
constrained to the thrashing region. When no replacement candidate
is available in the thrashing region, incoming requests
bypass the cache. We also implement a dynamic partitioning scheme
based on caching effectiveness sampled at runtime. Experiments
show that DaCache achieves a 40.4% performance improvement
over the baseline GPU and outperforms two state-of-the-art
thrashing-resistant cache management techniques.
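
To make these mechanisms concrete, the following C++ sketch shows how gauged insertion, constrained replacement, and bypass could interact within a single cache set. It is a minimal illustration under assumptions, not the paper's implementation: the 8-way set, the initial 4-way locality region, and all identifiers (Line, CacheSet, fill, and so on) are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>

// Minimal sketch of gauged insertion, constrained replacement, and bypass
// for a single L1D set. All names here are hypothetical; the real logic
// sits in the GPU's L1D pipeline, not in host C++.
struct Line {
    uint64_t tag;
    int warp_priority;  // scheduling priority of the issuing warp (0 = highest)
    bool divergent;     // set if the filling load was memory-divergent
    bool reserved;      // fill in flight; not evictable
};

class CacheSet {
    static constexpr size_t kWays = 8;
    size_t locality_ways_ = 4;   // partition point between the two regions
    std::list<Line> lru_chain_;  // front = MRU position, back = LRU position

public:
    // Fills a block on a miss; returns false if the request must be
    // bypassed around the L1D because no victim is available.
    bool fill(const Line& line) {
        if (lru_chain_.size() == kWays) {
            // Replacement is constrained to the thrashing region: only
            // unreserved lines past the locality region are candidates,
            // and the deepest one (closest to LRU) is chosen.
            auto victim = lru_chain_.end();
            size_t pos = 0;
            for (auto it = lru_chain_.begin(); it != lru_chain_.end(); ++it, ++pos)
                if (pos >= locality_ways_ && !it->reserved)
                    victim = it;
            if (victim == lru_chain_.end())
                return false;  // thrashing region exhausted -> bypass
            lru_chain_.erase(victim);
        }
        // Gauged insertion: the block enters the LRU chain at a depth set
        // by its warp's scheduling priority, with coherent loads placed
        // one slot ahead of divergent ones at the same priority level.
        size_t depth = size_t(line.warp_priority) + (line.divergent ? 1 : 0);
        auto it = lru_chain_.begin();
        for (size_t i = 0; i < depth && it != lru_chain_.end(); ++i)
            ++it;
        lru_chain_.insert(it, line);
        return true;
    }

    void set_locality_ways(size_t n) { locality_ways_ = n; }  // tuner hook
};
```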
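
The dynamic partitioning scheme can likewise be sketched as a small tuner that samples caching effectiveness and moves the partition point; in a full model, its output would drive set_locality_ways() above. The abstract only states that effectiveness is sampled at runtime, so the epoch length and hit-rate thresholds below are illustrative assumptions.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical tuner for the locality/thrashing partition. The sampling
// epoch (100K accesses) and the 0.5 / 0.2 hit-rate thresholds are made up
// for illustration; only the sampled-effectiveness idea comes from the paper.
class PartitionTuner {
    size_t locality_ways_;
    const size_t max_ways_;
    uint64_t hits_ = 0, accesses_ = 0;

public:
    PartitionTuner(size_t initial, size_t max_ways)
        : locality_ways_(initial), max_ways_(max_ways) {}

    void record(bool hit) { ++accesses_; hits_ += hit; }

    // Called periodically; resizes the partition once per sampling epoch.
    void maybe_resize() {
        if (accesses_ < 100000) return;
        double hit_rate = double(hits_) / double(accesses_);
        // Grow the protected locality region while caching is effective;
        // shrink it (allowing more replacement) when it is not.
        if (hit_rate > 0.5 && locality_ways_ + 1 < max_ways_)
            ++locality_ways_;
        else if (hit_rate < 0.2 && locality_ways_ > 0)
            --locality_ways_;
        hits_ = accesses_ = 0;
    }

    size_t locality_ways() const { return locality_ways_; }
};
```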
