To cope with growing processing demands, dedicated hardware
acceleration of the most time-consuming, and therefore most
critical, algorithms is a suitable technique [2]. However, the
efficiency of this method depends heavily on the concept for the
hardware/software interaction [3]. In traditional approaches,
time-critical algorithms
are offloaded to stand-alone hardware acceleration
units, while Direct Memory Access (DMA) controllers are
used for the data transfers between hardware accelerators
and memories