latency and high bandwidth when multiple threads
access the same memory position [6]. This is important
because these consecutive accesses are repeatedly
required by the SA algorithm. Each step of the Markov
chain requires three random numbers. NVIDIA’s
CURAND library generates random numbers that the
kernels can use immediately, which reduces the time
spent reading from and writing to global memory.
More about the CURAND library can be found in [7].
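
The in-kernel use of random numbers described above can be illustrated with a minimal sketch based on the CURAND device API (`curand_init`, `curand_uniform`). It is not the implementation used in this work. The toy energy function, the kernel name `sa_chains`, the host helper `run_sa`, the geometric cooling schedule, and the roles given to the three random numbers (move direction, move size, acceptance test) are all assumptions made for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical toy energy E(x) = x^2 (minimum at x = 0); an assumption for this sketch.
__device__ float energy(float x) { return x * x; }

// One independent Markov chain per thread. Every step draws three uniform
// numbers from the thread's own CURAND state, so no random numbers are
// fetched from global memory.
__global__ void sa_chains(float *final_energy, int n_chains, int n_steps,
                          float t0, float cooling, unsigned long long seed)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n_chains) return;

    curandState state;
    curand_init(seed, tid, 0, &state);  // same seed, distinct subsequence per thread

    float x = 10.0f * (2.0f * curand_uniform(&state) - 1.0f);  // random start
    float e = energy(x);
    float t = t0;

    for (int step = 0; step < n_steps; ++step) {
        float r1 = curand_uniform(&state);  // assumed role: move direction
        float r2 = curand_uniform(&state);  // assumed role: move size
        float r3 = curand_uniform(&state);  // assumed role: acceptance test

        float x_new = x + (r1 < 0.5f ? -r2 : r2);
        float e_new = energy(x_new);

        // Metropolis acceptance at the current temperature.
        if (e_new <= e || r3 < expf(-(e_new - e) / t)) {
            x = x_new;
            e = e_new;
        }
        t *= cooling;  // assumed geometric cooling schedule
    }
    final_energy[tid] = e;  // one global-memory write per chain
}

// Host helper (hypothetical): returns the lowest final energy over all
// chains, or -1 if a CUDA call fails (the energy itself is never negative).
float run_sa(int n_chains, int n_steps, float t0, float cooling)
{
    float *d_energy = nullptr;
    if (cudaMalloc(&d_energy, n_chains * sizeof(float)) != cudaSuccess) return -1.0f;

    int threads = 128;
    int blocks = (n_chains + threads - 1) / threads;
    sa_chains<<<blocks, threads>>>(d_energy, n_chains, n_steps, t0, cooling, 1234ULL);

    std::vector<float> h_energy(n_chains);
    cudaError_t err = cudaMemcpy(h_energy.data(), d_energy,
                                 n_chains * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_energy);
    if (err != cudaSuccess) return -1.0f;

    float lowest = h_energy[0];
    for (int i = 1; i < n_chains; ++i)
        if (h_energy[i] < lowest) lowest = h_energy[i];
    return lowest;
}

int main()
{
    float lowest = run_sa(1024, 2000, 1.0f, 0.995f);
    printf("lowest final energy: %f\n", lowest);
    return lowest < 0.0f ? 1 : 0;
}
```

In this sketch each thread keeps its generator state in a local variable, so global memory is touched only once per chain, when the final energy is written back. The file can be compiled with `nvcc`; the device API is provided by the `curand_kernel.h` header.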