Each SP has its own set of local 32-bit registers. The SMs communicate with one another through the global (device) memory.
The global memory can be read and written by the host, and it persists across kernel launches within the same application. Shared memory is managed explicitly by the programmer.
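
As an illustration of how these memory spaces are typically used together, the following is a minimal CUDA sketch of a two-pass sum and not a definitive implementation. The names `block_sum`, `sum_on_device`, and `BLOCK` are hypothetical, and the sketch assumes the input holds at most `BLOCK * BLOCK` elements. Per-thread indices live in registers, each block reduces its slice in an explicitly declared `__shared__` array, and the blocks exchange results only through global memory, which the second kernel launch reads back.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // threads per block (assumed; must be a power of two)

// Each block sums up to BLOCK elements of `in` and writes one partial sum to `out`.
__global__ void block_sum(const float *in, float *out, int n) {
    __shared__ float tile[BLOCK];              // shared memory, declared explicitly
    int tid = threadIdx.x;                     // scalars like tid and i live in registers
    int i = blockIdx.x * blockDim.x + tid;
    tile[tid] = (i < n) ? in[i] : 0.0f;        // global memory -> shared memory
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {  // tree reduction within the block
        if (tid < s) tile[tid] += tile[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = tile[0];   // shared memory -> global memory
}

// Host wrapper (hypothetical helper). Assumes 0 < n <= BLOCK * BLOCK.
float sum_on_device(const float *host_in, int n) {
    int blocks = (n + BLOCK - 1) / BLOCK;
    float *d_in, *d_partial, *d_total;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_partial, blocks * sizeof(float));
    cudaMalloc(&d_total, sizeof(float));
    cudaMemcpy(d_in, host_in, n * sizeof(float), cudaMemcpyHostToDevice);  // host writes global memory

    // Launch 1: every block leaves its partial sum in global memory.
    block_sum<<<blocks, BLOCK>>>(d_in, d_partial, n);
    // Launch 2: a single block reads those partial sums, which persisted across launches.
    block_sum<<<1, BLOCK>>>(d_partial, d_total, blocks);

    float total = 0.0f;
    cudaMemcpy(&total, d_total, sizeof(float), cudaMemcpyDeviceToHost);    // host reads global memory
    cudaFree(d_in);
    cudaFree(d_partial);
    cudaFree(d_total);
    return total;
}

int main() {
    std::vector<float> ones(1024, 1.0f);
    printf("sum = %.1f\n", sum_on_device(ones.data(), 1024));
    return 0;
}
```

Splitting the reduction into two launches is a design choice made here for clarity: blocks cannot see each other's shared memory, so the partial sums have to pass through global memory before a later kernel can combine them. Error checking on the CUDA runtime calls is omitted for brevity.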