Some parts of the code need special attention because they cannot be parallelized as written. For example, one section of PCRTM determines height values by calculating a height difference, repeatedly adding that difference to an accumulating height variable, and storing the variable in an array of z values. The z array is therefore an array of partial sums of the height differences. Because each partial sum depends on the one before it, this loop cannot be parallelized entirely. Rewriting it as a sum reduction can partially parallelize the summation, as described in [15], but this would give no substantial speedup for the small number of loop iterations involved (101 vertical steps). The code can still be modified to optimize it for the GPU, specifically by computing each height difference in parallel and then performing only the partial sums sequentially.
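
The restructuring described above can be sketched in a few lines of Python. This is only an illustration of the two-phase split, not the PCRTM code itself: the function `height_difference` and its formula are hypothetical placeholders, and the names `dz`, `z`, and `N_LEVELS` are assumptions chosen for this example. Only the count of 101 vertical steps comes from the text.

```python
N_LEVELS = 101  # number of vertical steps mentioned in the text


def height_difference(k):
    """Hypothetical stand-in for the real per-level height difference.

    The actual PCRTM formula is not reproduced here; what matters is that
    the result depends only on the level index k and its own inputs.
    """
    return 0.5 + 0.01 * k


def partial_sums(differences, start=0.0):
    """Sequential phase: each entry needs the previous running total."""
    heights = []
    total = start
    for d in differences:
        total += d
        heights.append(total)
    return heights


# Phase 1: every difference is independent of the others, so on a GPU
# each iteration of this loop could be assigned to its own thread.
dz = [height_difference(k) for k in range(N_LEVELS)]

# Phase 2: only the cheap accumulation stays sequential.
z = partial_sums(dz)
```

The design point is that the expensive work (evaluating each difference) moves into the independent phase, while the remaining sequential loop performs just one addition per level.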