Modify and assemble M. The values and row indices of the
preconditioner generated in the compute preconditioner
kernel are stored in the Mvalue and Mindex vectors in the
format shown in Fig. 4. Since the space allocated to each
column of M in global memory is fixed at $n_{2,\max}$ (which is
not necessarily equal to the number of nonzeros per
column), to assemble M each warp has to store the number
of nonzeros of the column it is generating into a vector
called Mpointer. In the Post-GSAI stage, the Mvalue, Mindex, and
Mpointer data structures are modified to match the CSC
storage format. The first kernel in the Post-GSAI stage, called
modify, converts Mpointer from per-column nonzero counts into CSC
column pointers (the modified Mpointer in Fig. 4). A second kernel,
called assemble, then rearranges the Mindex and Mvalue vectors on
the GPU into the column storage format (the modified Mindex and
Mvalue in Fig. 4). The updated index and value vectors are generated
in GPU memory and do not need to be transferred to the CPU.
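The two Post-GSAI steps can be illustrated with a small sequential sketch. This is not the GPU implementation described above: the functions `modify` and `assemble` below are hypothetical CPU stand-ins that only borrow the kernel names. Because Fig. 4 is not reproduced here, the sketch assumes that column j owns the padded slots [j*n2max, (j+1)*n2max) of Mindex and Mvalue, that its nonzeros fill the first Mpointer[j] of those slots, and that -1 marks unused padding. Under these assumptions, modify is an exclusive prefix sum over the per-column counts (on a GPU this would typically be a parallel scan), and assemble copies each column's used slots to its CSC offset (one column per warp on the GPU, one loop iteration here).

```python
from typing import List, Tuple


def modify(counts: List[int]) -> List[int]:
    """Convert per-column nonzero counts into CSC column pointers.

    Exclusive prefix sum: colptr[j] is where column j starts in the
    compacted arrays, and colptr[n] is the total number of nonzeros.
    """
    colptr = [0] * (len(counts) + 1)
    for j, c in enumerate(counts):
        colptr[j + 1] = colptr[j] + c
    return colptr


def assemble(m_index: List[int], m_value: List[float], colptr: List[int],
             n2max: int) -> Tuple[List[int], List[float]]:
    """Compact padded per-column storage into contiguous CSC arrays.

    Assumption: column j owns slots [j*n2max, (j+1)*n2max) of the padded
    arrays, and its nonzeros occupy the first colptr[j+1]-colptr[j] slots.
    """
    nnz = colptr[-1]
    row_idx = [0] * nnz
    values = [0.0] * nnz
    for j in range(len(colptr) - 1):  # one warp per column on the GPU
        count = colptr[j + 1] - colptr[j]
        src = j * n2max  # start of column j's padded block
        dst = colptr[j]  # start of column j in the CSC arrays
        for k in range(count):
            row_idx[dst + k] = m_index[src + k]
            values[dst + k] = m_value[src + k]
    return row_idx, values


if __name__ == "__main__":
    # Hypothetical 3x3 lower-triangular factor with n2max = 3.
    n2max = 3
    m_index = [0, 1, -1, 1, 2, -1, 2, -1, -1]  # -1 marks unused padding
    m_value = [4.0, 1.0, 0.0, 3.0, 2.0, 0.0, 5.0, 0.0, 0.0]
    counts = [2, 2, 1]  # Mpointer as written by the warps

    colptr = modify(counts)
    row_idx, values = assemble(m_index, m_value, colptr, n2max)
    print(colptr)   # [0, 2, 4, 5]
    print(row_idx)  # [0, 1, 1, 2, 2]
    print(values)   # [4.0, 1.0, 3.0, 2.0, 5.0]
```

Splitting the work this way means assemble needs no synchronization between columns: once the scan has fixed every column's destination offset, each column can be copied independently.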