tests across the multiple processors, we have the CPUs wait on a barrier until they are
all ready to start. Completion time is recorded once they have all exited a second
barrier at the end of the test. In this test, the application is run within BRAM, with the
only requests to Double Data Rate (DDR) memory being those issued for measuring
the read bandwidth, along with a few additional requests for the implementation of
the barriers.
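The two-barrier timing structure described above can be illustrated with a minimal sketch. It is not the stand-alone test code itself: it assumes a hosted POSIX environment and uses pthread_barrier_t as a stand-in for the bare-metal barrier, and the names NUM_WORKERS, BUF_WORDS, reader, and measure_read_bandwidth are hypothetical. An extra timing thread joins both barriers, so a worker may begin reading slightly before the start timestamp is taken.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define NUM_WORKERS 4            /* hypothetical number of reader threads */
#define BUF_WORDS   (1u << 18)   /* hypothetical buffer size: 1 MiB of 32-bit words */

static pthread_barrier_t start_barrier, end_barrier;

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Each worker waits until all are ready, reads its buffer, then waits again. */
static void *reader(void *arg)
{
    volatile uint32_t *buf = arg;   /* volatile keeps the reads from being optimized away */
    uint32_t sink = 0;

    pthread_barrier_wait(&start_barrier);
    for (size_t i = 0; i < BUF_WORDS; i++)
        sink += buf[i];
    pthread_barrier_wait(&end_barrier);

    (void)sink;
    return NULL;
}

/* Returns the aggregate read bandwidth in bytes per second, or -1.0 on allocation failure. */
double measure_read_bandwidth(void)
{
    pthread_t tid[NUM_WORKERS];
    uint32_t *bufs[NUM_WORKERS];

    for (int i = 0; i < NUM_WORKERS; i++) {
        bufs[i] = malloc(BUF_WORDS * sizeof(uint32_t));
        if (bufs[i] == NULL) {
            while (i-- > 0)
                free(bufs[i]);
            return -1.0;
        }
        memset(bufs[i], 0xA5, BUF_WORDS * sizeof(uint32_t));  /* touch every page */
    }

    /* Workers plus the timing thread take part in both barriers. */
    pthread_barrier_init(&start_barrier, NULL, NUM_WORKERS + 1);
    pthread_barrier_init(&end_barrier, NULL, NUM_WORKERS + 1);

    for (int i = 0; i < NUM_WORKERS; i++)
        if (pthread_create(&tid[i], NULL, reader, bufs[i]) != 0)
            exit(EXIT_FAILURE);   /* a missing worker would deadlock the barriers */

    pthread_barrier_wait(&start_barrier);   /* all workers are ready */
    double t0 = now_sec();
    pthread_barrier_wait(&end_barrier);     /* all workers have finished reading */
    double t1 = now_sec();

    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(tid[i], NULL);
        free(bufs[i]);
    }
    pthread_barrier_destroy(&start_barrier);
    pthread_barrier_destroy(&end_barrier);

    double bytes = (double)NUM_WORKERS * BUF_WORDS * sizeof(uint32_t);
    return (t1 > t0) ? bytes / (t1 - t0) : 0.0;
}
```

Calling measure_read_bandwidth() once per worker count and plotting the results would give a curve of the same kind as the core-count sweep, although on a cached host the figures reflect the cache hierarchy rather than DDR alone.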
Figure 8 presents the maximum system read bandwidth for the three system
configurations as the number of cores is increased from one to eight. Each system (A, B,
and C) is labeled with the clock frequencies of the three levels in the system
(processors, arbiter, memory controller). For all three systems, memory bandwidth is
saturated between four and five cores and is stable after that point. From the difference
between systems B and C we can see that there is minimal impact in running
the arbiter in the same clock domain as the processors and at half the frequency of the
memory controller. In all cases, bandwidth scales linearly with the number of cores
up to the saturation point; for system C it scales at a one-to-one ratio, and nearly
so for system B. Therefore, in terms of bandwidth,
the largest factor in our system is the operating frequency of the memory controller.
6.2. Pthreads Support
One benefit of the port to the Linux 3.7 kernel and the newer release of PetaLinux
(v12.12) is the addition of pthreads support in glibc that was not present in earlier
versions. As the pthreads library makes use of the LWX and SWX instructions, and we
have altered the semantics of these instructions to be a superset of their original behavior,
testing pthreads support in the system is another way to verify the implementation
of our conditional load/store operation.
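The kind of check this implies can be sketched as a small mutual-exclusion test. The sketch below is an assumption-laden illustration rather than the test used here: it builds a spinlock from a C11 compare-and-exchange, which compilers for load-linked/store-conditional architectures typically lower to a retry loop around the conditional load/store pair (we assume, without having verified it for this toolchain, that this means LWX/SWX on the MicroBlaze). The names lock_word, spin_lock, worker, and run_lock_test, and the constants NUM_THREADS and INCREMENTS, are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NUM_THREADS 4        /* hypothetical thread count */
#define INCREMENTS  100000   /* hypothetical increments per thread */

static atomic_int lock_word = 0;   /* 0 = free, 1 = held */
static long shared_counter = 0;    /* deliberately non-atomic: protected only by the lock */

static void spin_lock(void)
{
    int expected = 0;
    /* Retry until the word changes from 0 to 1 atomically. */
    while (!atomic_compare_exchange_weak_explicit(&lock_word, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        expected = 0;    /* a failed exchange overwrites expected with the current value */
        sched_yield();   /* let the lock holder run if we share a CPU */
    }
}

static void spin_unlock(void)
{
    atomic_store_explicit(&lock_word, 0, memory_order_release);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        spin_lock();
        shared_counter++;   /* lost updates here would indicate a broken atomic primitive */
        spin_unlock();
    }
    return NULL;
}

/* Runs the test and returns the final counter value. */
long run_lock_test(void)
{
    pthread_t tid[NUM_THREADS];

    shared_counter = 0;
    for (int i = 0; i < NUM_THREADS; i++)
        if (pthread_create(&tid[i], NULL, worker, NULL) != 0)
            exit(EXIT_FAILURE);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tid[i], NULL);

    return shared_counter;
}
```

If the underlying conditional store ever succeeded when it should have failed, two threads could hold the lock at once and the returned count would fall short of NUM_THREADS * INCREMENTS; a correct result is consistent with, but does not by itself prove, a correct implementation.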
Since we have already investigated the stand-alone bandwidth of the system, a natural
extension is to examine what impact the OS has on achievable bandwidth.
As such, we have set up a pthreads-based multi-threaded application with the same
structure as our stand-alone test using a barrier to synchronize the threads. Since the