The TMH consists of the pending queue and the scoreboard, as shown in Figure 2. The pending queue holds the tag-missed memory requests. Each scoreboard entry has a target tag, a counter, and an arrival bit. The counter stores the number of memory requests in the pending queue whose tags match the target tag, and the arrival bit indicates whether the data of the target tag have been loaded into GPU memory. By tracking recently missed tags in the scoreboard, the TMH avoids sending a burst of data fetch requests to the host.

Figure 2(a) shows how the TMH handles a cold- or conflict-missed read/atomic request. First, the TMH pushes the request into the pending queue. Next, it searches the scoreboard for an entry whose target tag matches that of the request. If no matching entry exists, the TMH generates a new data fetch request for the tag, sends it to the host, and creates a new scoreboard entry with the counter set to one and the arrival bit set to zero. If a matching entry exists, the TMH does not send a new data fetch request to the host and only increments the counter of the matching entry. When the request reaches the head of the pending queue, it periodically checks the arrival bit of its scoreboard entry to track the availability of its data. When the reply from the host arrives, the TMH changes the arrival bit of the corresponding entry from zero to one. Once the request observes that the arrival bit has been set, the counter of the matching entry is decremented and the request is sent to the DRAM scheduler for replay. The matching entry is removed from the scoreboard when its counter reaches zero.

Figure 2(b) shows how the TMH handles a conflict-missed write request. The TMH simply relays the request to the host, bypassing the pending queue, and sends a reply to the sender immediately. Relaying conflict-missed write requests to the host in this way suits GPU applications because the written data are unlikely to be accessed by other GPU threads under the relaxed GPU memory model [8]. On the other hand, cold-missed write requests are treated as tag hits and their data are written directly to GPU memory.
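The C++ sketch below illustrates the pending-queue and scoreboard bookkeeping described above. It is a minimal behavioral model, not the paper's implementation; the class and function names (TagMissHandler, onReadAtomicMiss, sendFetchToHost, and so on) and the host/DRAM-scheduler interfaces are hypothetical placeholders introduced only for illustration.

```cpp
// Behavioral sketch of the TMH pending queue and scoreboard (hypothetical names).
#include <cstdint>
#include <deque>
#include <unordered_map>

struct Request {
    uint64_t tag;       // target tag of the access
    bool     is_write;  // read/atomic vs. write
    // payload omitted
};

struct ScoreboardEntry {
    uint32_t counter = 0;    // pending requests in the queue with this tag
    bool     arrived = false; // data of the tag loaded into GPU memory?
};

class TagMissHandler {
public:
    // Figure 2(a): cold- or conflict-missed read/atomic request.
    void onReadAtomicMiss(const Request& req) {
        pending_.push_back(req);
        auto it = scoreboard_.find(req.tag);
        if (it == scoreboard_.end()) {
            sendFetchToHost(req.tag);           // first miss on this tag
            scoreboard_[req.tag] = {1, false};  // counter = 1, arrival bit = 0
        } else {
            it->second.counter++;               // coalesce with the in-flight fetch
        }
    }

    // Reply from the host: mark the tag's data as available.
    void onHostReply(uint64_t tag) {
        scoreboard_[tag].arrived = true;
    }

    // Polled periodically: replay the head request once its data have arrived.
    void tick() {
        if (pending_.empty()) return;
        const Request& head = pending_.front();
        auto it = scoreboard_.find(head.tag);
        if (it != scoreboard_.end() && it->second.arrived) {
            replayToDramScheduler(head);
            if (--it->second.counter == 0)
                scoreboard_.erase(it);          // last pending request for this tag
            pending_.pop_front();
        }
    }

    // Figure 2(b): conflict-missed write bypasses the pending queue entirely.
    void onWriteMiss(const Request& req) {
        relayWriteToHost(req);
        ackSender(req);                         // reply to the sender immediately
    }

private:
    std::deque<Request> pending_;               // pending queue
    std::unordered_map<uint64_t, ScoreboardEntry> scoreboard_;

    // Interfaces to the host link, DRAM scheduler, and requester (stubs here).
    void sendFetchToHost(uint64_t) {}
    void relayWriteToHost(const Request&) {}
    void replayToDramScheduler(const Request&) {}
    void ackSender(const Request&) {}
};
```

In this model, the scoreboard coalesces every pending request to the same tag behind a single host fetch, which captures the burst-avoidance behavior described above, while missed writes never enter the queue and are acknowledged as soon as they are relayed.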
