4.4 Thread Mapping

Since the number of threads in parallel applications on shared memory systems is currently relatively low (at most several thousand threads), it is feasible to evaluate the global communication behavior of the application when performing the thread mapping. The thread mapping problem is defined as finding a mapping of threads to processing units (PUs) that maximizes locality, given a description of the communication behavior and of the hardware hierarchy. Several algorithms have been proposed to calculate this mapping. Most require graph-based descriptions of both the communication behavior and the hierarchy.
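
To make the problem statement concrete, the following minimal sketch evaluates and searches mappings for a toy case. It rests on several assumptions that are not taken from the text: the communication behavior is given as a symmetric matrix `comm`, the hardware hierarchy is a balanced tree described by a list of arities from root to leaf, and locality is approximated by minimizing the communication-weighted tree distance between PUs. The names `pu_distance`, `mapping_cost`, and `best_mapping` are hypothetical. The exhaustive search is only an illustration for a handful of threads and is not one of the algorithms referred to above.

```python
from itertools import permutations


def pu_distance(a, b, arities):
    """Number of hierarchy levels to climb until PUs a and b share an ancestor.

    `arities` lists the fan-out of each level from root to leaf,
    e.g. [2, 2] means 2 sockets with 2 PUs each (assumed balanced tree).
    """
    dist = 0
    while a != b:
        arity = arities[len(arities) - 1 - dist]
        a //= arity
        b //= arity
        dist += 1
    return dist


def mapping_cost(comm, mapping, arities):
    """Sum of communication volume weighted by PU distance (lower is better)."""
    n = len(comm)
    return sum(
        comm[i][j] * pu_distance(mapping[i], mapping[j], arities)
        for i in range(n)
        for j in range(i + 1, n)
    )


def best_mapping(comm, arities):
    """Exhaustively try every assignment of threads to PUs (toy sizes only)."""
    n = len(comm)
    best = min(permutations(range(n)),
               key=lambda m: mapping_cost(comm, m, arities))
    return best, mapping_cost(comm, best, arities)


# Hypothetical example: threads 0 and 2 communicate heavily, as do 1 and 3.
comm = [
    [0, 1, 10, 1],
    [1, 0, 1, 10],
    [10, 1, 0, 1],
    [1, 10, 1, 0],
]
arities = [2, 2]  # 2 sockets, 2 PUs per socket

mapping, cost = best_mapping(comm, arities)
```

In this example the search places threads 0 and 2 on one socket and threads 1 and 3 on the other, which lowers the cost compared with the identity mapping. Since the number of candidate mappings grows factorially with the number of threads, practical approaches rely on heuristics over graph-based descriptions instead.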