Hi Navamin,
You should choose version 3. Use the formula below to calculate how much memory is needed, and check that it fits into your GPU. If the same GPU also drives your display, less GPU memory is available for computation, so factor that in. To fit into GPU memory, increase the number of next_permutation calls.
Memory = (total permutations * elements) / (number of next_permutation calls + 1)
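As a rough illustration, the formula can be evaluated in a few lines of Python. This is a minimal sketch, not the article's code. It assumes that all elements are distinct, so total permutations = elements!, and that the result is in element units. Multiply by the size of one element to get bytes. The function names memory_needed and min_next_perm_count are hypothetical. The second function solves the formula for the smallest next_permutation count that fits a given budget. Pass in the memory actually free after display use, not the card's full capacity.

```python
import math


def memory_needed(elements: int, next_perm_count: int) -> int:
    """Memory in element units:
    (total permutations * elements) / (next_permutation count + 1).

    Assumes all elements are distinct, so total permutations = elements!.
    Rounds up (ceiling division) so the estimate is never too small.
    """
    total = math.factorial(elements) * elements
    return -(-total // (next_perm_count + 1))


def min_next_perm_count(elements: int, budget: int) -> int:
    """Smallest next_permutation count whose memory need fits in budget.

    budget is in element units and should already exclude memory
    used by the display.
    """
    total = math.factorial(elements) * elements
    # Need ceil(total / (k + 1)) <= budget, i.e. k + 1 >= ceil(total / budget).
    k_plus_1 = -(-total // budget)
    return max(k_plus_1 - 1, 0)


if __name__ == "__main__":
    n = 10
    budget = 1_000_000  # hypothetical free memory, in element units
    k = min_next_perm_count(n, budget)
    print(k, memory_needed(n, k))
```

For 10 elements and a budget of 1,000,000 elements, this gives 36 next_permutation calls. Rounding up keeps the estimate on the safe side when the division is not exact.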