Although it requires extra work to tile-sort the points, and each point must be written twice before reaching its final destination, this approach produces much more coherent memory accesses. In our implementation, using the tiled z-buffer approach reduced the total time for projection and z-buffering from 42 to 25 milliseconds for a 512x512 image. Since this is the most expensive part of the render cache
computations, this is a significant savings. A tiled approach has been used previously to parallelize the render cache [3] by explicitly partitioning the point cloud to reduce communication. Because it dynamically sorts the projected points each frame, our tiling approach has fewer visual artifacts and can be more flexible in its sampling and point cloud update strategies. We hope to explore our approach as a potentially better parallelization strategy if we gain access to a suitable shared-memory parallel machine.
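
The tile-sort-then-z-buffer idea can be illustrated with a minimal sketch. This is not our implementation: the function name `tiled_zbuffer`, the 32-pixel tile size, the `(x, y, depth, payload)` point layout, and the use of a counting sort are all assumptions made for illustration. Points are assumed to be already projected to integer pixel coordinates. Each surviving point is written once into a tile-sorted buffer and once more into the z-buffer, so the depth tests for one tile touch only a small, contiguous region of the image.

```python
def tiled_zbuffer(points, width, height, tile=32):
    """Tile-sort projected points, then z-buffer them one tile at a time.

    points: list of (x, y, depth, payload) with integer pixel coordinates.
    Returns (zbuf, ids), two height x width nested lists.
    The tile size and data layout are illustrative assumptions.
    """
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    n_tiles = tiles_x * tiles_y

    def tile_of(x, y):
        # Return the tile index, or None for points that fall off screen.
        if 0 <= x < width and 0 <= y < height:
            return (y // tile) * tiles_x + (x // tile)
        return None

    # Pass 1: count how many points land in each tile.
    counts = [0] * n_tiles
    for x, y, _depth, _payload in points:
        t = tile_of(x, y)
        if t is not None:
            counts[t] += 1

    # An exclusive prefix sum turns the counts into per-tile start offsets.
    starts = [0] * n_tiles
    total = 0
    for t in range(n_tiles):
        starts[t] = total
        total += counts[t]

    # Pass 2 (first write): scatter each point into its tile's slot range.
    sorted_pts = [None] * total
    cursor = list(starts)
    for x, y, depth, payload in points:
        t = tile_of(x, y)
        if t is not None:
            sorted_pts[cursor[t]] = (x, y, depth, payload)
            cursor[t] += 1

    # Pass 3 (second write): depth-test tile by tile, keeping the nearest point.
    inf = float("inf")
    zbuf = [[inf] * width for _ in range(height)]
    ids = [[None] * width for _ in range(height)]
    for t in range(n_tiles):
        for i in range(starts[t], starts[t] + counts[t]):
            x, y, depth, payload = sorted_pts[i]
            if depth < zbuf[y][x]:
                zbuf[y][x] = depth
                ids[y][x] = payload
    return zbuf, ids
```

In this sketch the result is identical to that of a plain z-buffer; only the order of the memory accesses changes. Because the tiles are processed independently in the last pass, the same structure suggests how the work might be divided among processors, although the sketch itself is serial.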