GDDR5X vs. GDDR5
Unlike high bandwidth memory (HBM, HBM2), GDDR5X is an extension of the GDDR5 standard that video cards have used for nearly seven years. Like HBM, it should dramatically improve memory bandwidth.
GDDR5X accomplishes this in two separate ways. First, it moves from a DDR bus to a QDR bus. The diagram below shows the differences between an SDR (single data rate), DDR (double data rate) and QDR (quad data rate) bus.
SDR vs. DDR vs. QDR
Memory signaling. Image by MonitorInsider
An SDR bus transfers data only on the rising edge of the clock signal, as it transitions from a 0 to a 1. A DDR bus transfers data both when the clock rises and when it falls again, meaning the system can effectively transmit data twice as quickly at the same clock rate. A QDR bus transfers up to four data words per clock cycle — again, effectively doubling bandwidth (compared to DDR) without raising clock speeds.
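The relationship is simple to express numerically: effective data rate is the clock rate times the number of transfers per clock, times the bus width. The sketch below uses a hypothetical 1250MHz clock and a 32-bit channel purely for illustration — these are not figures from any particular card.

```python
# Rough effective-data-rate comparison for SDR, DDR, and QDR signaling.
# The clock and bus width below are illustrative assumptions, not real specs.

def effective_rate_gbps(clock_mhz, transfers_per_clock, bus_width_bits):
    """Data rate in Gbit/s: clock * transfers per clock * bus width."""
    return clock_mhz * 1e6 * transfers_per_clock * bus_width_bits / 1e9

CLOCK_MHZ = 1250   # hypothetical memory clock
BUS_WIDTH = 32     # bits, one GDDR5-style channel

for name, tpc in [("SDR", 1), ("DDR", 2), ("QDR", 4)]:
    rate = effective_rate_gbps(CLOCK_MHZ, tpc, BUS_WIDTH)
    print(f"{name}: {rate:.0f} Gbit/s")
```

At the same clock, QDR moves exactly twice the data of DDR and four times that of SDR — which is the whole point: more bandwidth without pushing the clock higher.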
The other change to GDDR5X is that the system now prefetches twice as much data (16n, up from 8n). This change is tied to the QDR signaling — with the higher signaling rate, it makes sense to prefetch more information at a time.
MonitorInsider has an excellent discussion of GDDR5X, where they step through the new standard and how it differs from GDDR5. One point they make is that using GDDR5X may not map well to the characteristics of Nvidia GPUs. A 16n prefetch means that each prefetch contains 64 bytes of data. Nvidia GPUs use 32-byte transactions when accessing the L2 cache, which means the DRAM is fetching twice as much information as the L2 cache line size. GDDR5X introduces a feature, Pseudo-independent Memory Accesses, which should help address this inefficiency.
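The 64-byte figure follows directly from the prefetch depth and the per-chip interface width: a GDDR5X chip has a 32-bit data interface, so a 16n prefetch moves 16 words of 32 bits each. A quick sanity check of that arithmetic:

```python
# Why a 16n prefetch yields 64-byte accesses on a 32-bit GDDR5X interface,
# and how that compares with Nvidia's 32-byte L2 cache transactions.

INTERFACE_BITS = 32  # per-chip data interface width (GDDR5/GDDR5X)

def prefetch_bytes(prefetch_n, interface_bits=INTERFACE_BITS):
    """Bytes fetched per access: n words of interface_bits bits each."""
    return prefetch_n * interface_bits // 8

gddr5 = prefetch_bytes(8)     # GDDR5's 8n prefetch
gddr5x = prefetch_bytes(16)   # GDDR5X's 16n prefetch
L2_TRANSACTION = 32           # bytes per Nvidia L2 transaction

print(f"GDDR5  8n prefetch:  {gddr5} bytes")
print(f"GDDR5X 16n prefetch: {gddr5x} bytes")
print(f"Over-fetch vs. 32-byte L2 transactions: {gddr5x // L2_TRANSACTION}x")
```

So each GDDR5X access delivers twice what a single 32-byte L2 transaction consumes — the mismatch the Pseudo-independent Memory Access feature is meant to mitigate.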
GDDR5X doesn’t address the core problem of GDDR5
Now that JEDEC has officially ratified the GDDR5X standard, I think it’s much more likely that AMD and Nvidia will adopt it. Given the additional complexity of HBM/HBM2, and the intrinsic difficulty of building stacks of chips connected by tiny wires as opposed to well-understood planar technology, some have argued that GDDR5X may actually be the better long-term option.
To be fair, we’ve seen exactly this argument play out multiple times before. RDRAM, Larrabee, and Intel’s Itanium 64-bit architecture were all radical departures from the status quo that were billed as the inevitable, expensive technology of the future. In every case, these cutting-edge new technologies were toppled by the continuous evolution of DDR memory, rasterization, and AMD’s x86-64 standard. Why should HBM be any different?
The answer is this: HBM and HBM2 don’t just improve memory bandwidth. Both standards cut absolute VRAM power consumption and dramatically improve performance per watt. In addition, HBM2 offers VRAM densities that GDDR5X can’t match. Samsung has 4GB HBM2 stacks in production right now (16GB per GPU with four stacks), and expects to produce 8GB stacks that would allow for 32GB of RAM per GPU in just four stacks of memory. Micron’s current GDDR5 tops out at 1GB per chip. Even if we double this to 2GB for future designs, it would still take 16 chips, each with its own trace routing and surface area requirement. GDDR5X does set a lower signaling voltage of 1.35V, but it’s not a large enough improvement to offset the increased chip counts and overall power consumption.
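The chip-count argument is easy to quantify. Using the per-device capacities given above (1GB for current GDDR5, a hypothetical 2GB for future GDDR5X chips, 8GB per HBM2 stack), the number of devices needed for a 32GB card works out as follows:

```python
# Device counts needed to hit a target VRAM capacity, using the per-device
# capacities cited in the text. The 2GB GDDR5X figure is a hypothetical.
import math

def devices_needed(total_gb, gb_per_device):
    """Number of memory devices required to reach total_gb of VRAM."""
    return math.ceil(total_gb / gb_per_device)

TARGET_GB = 32
print(f"GDDR5  (1GB/chip):  {devices_needed(TARGET_GB, 1)} chips")
print(f"GDDR5X (2GB/chip):  {devices_needed(TARGET_GB, 2)} chips")
print(f"HBM2   (8GB/stack): {devices_needed(TARGET_GB, 8)} stacks")
```

Sixteen discrete chips versus four stacks — each GDDR5X chip needing its own trace routing and board area — is why the density gap matters as much as the bandwidth numbers.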
