By contrast, we leverage existing memory systems to realize PIM with minimal changes to the current ecosystem. Our approach therefore adds only a minimal amount of computing capability to the memory die, enough to offload memory-intensive operations, while leaving complex or unbounded control to the processor side. At the same time, given the gap between the internal and external bandwidth of multi-bank DRAM, the approach aims to exploit the abundant internal bandwidth as fully as possible.
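
The bandwidth gap mentioned above can be illustrated with a rough back-of-the-envelope sketch. It assumes a simplified model in which every bank can stream data concurrently at the same rate, while the external interface is a single shared channel. The function name, its parameters, and the numbers are hypothetical and chosen only for illustration; they are not taken from this work or from any specific DRAM device.

```python
def internal_to_external_ratio(num_banks: int,
                               per_bank_gbps: float,
                               external_gbps: float) -> float:
    """Ratio of aggregate bank-level bandwidth to off-chip bandwidth.

    Simplified model (an assumption): all banks operate in parallel,
    so internal bandwidth is num_banks * per_bank_gbps.
    """
    if num_banks <= 0 or per_bank_gbps <= 0 or external_gbps <= 0:
        raise ValueError("all parameters must be positive")
    internal_gbps = num_banks * per_bank_gbps
    return internal_gbps / external_gbps


# Illustrative values only (not measurements): 16 banks, 8 GB/s per bank,
# and a 32 GB/s external interface.
ratio = internal_to_external_ratio(num_banks=16,
                                   per_bank_gbps=8.0,
                                   external_gbps=32.0)
print(f"internal bandwidth is {ratio:.1f}x the external bandwidth")
```

Under this model, any ratio above 1 represents bandwidth that only logic placed near the banks can use, which is the headroom a bank-level PIM design targets.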