Because of power constraints, processor clock speeds have largely stalled, and processors are instead being built with an increasing number of cores. In short, one now has to deal with parallelism within a single node. Unfortunately, the parallel data processing techniques that were applied in the past to process data across nodes do not directly carry over to intra-node parallelism, because the architecture looks very different. For example, a single node has many more hardware resources, such as processor caches and memory channels, that are shared across cores.