At four threads, it reaches 424.6 MB/s. When a few of the OSTs are overloaded, the throughput of the naive algorithm drops sharply: for example, Naive(4,l) is 30% lower than Naive(4,0), because the naive algorithm cannot avoid the congested OSTs.