The proposed design is extremely memory-efficient: storing the H-matrix in the proposed design requires only 0.39% of the memory needed to store the original sparse H-matrix as is. Table V summarizes the total area, which is the sum of the memory circuit size, generated by a memory compiler, and the size of the control circuit added by the AGU. Table V also shows the latency overhead introduced by the AGU.