OS runs in DDR rather than BRAM, we have rerun the earlier bandwidth test in DDR
as well to allow for a more direct comparison; the results are presented in
Figure 9. In this test, the results for systems B and C are combined as there was no
appreciable difference between the two configurations. We can immediately note that
the maximum achievable application bandwidth has dropped significantly and no
longer reaches saturation, even with eight cores. Single-core bandwidth has been cut
in half, and bandwidth grows more slowly as cores are added than when the application is run from
BRAM. Comparing the stand-alone results with those obtained under an OS, we see a
further reduction in achievable bandwidth in the OS case. Some additional overhead
is to be expected when running under an OS, but we expect the impact to be magnified
here because there are no caches in the system. In future work, we would like to
repeat the measurement on a system with level-one (L1) caches to see whether the
overhead of the OS remains as high.