Parallel Hash Join
A hash-based refinement of the approach offers improved performance. The main observation is that if A and B are very large, and the number of partitions k is chosen to be equal to the number of processors n, the size of each partition may still be large, leading to a high cost for each local join at the n processors.
An alternative is to execute the smaller joins Ai ⋈ Bi one after the other, but with each join executed in parallel using all processors. This approach allows us to utilize the total available main memory at all n processors in each join Ai ⋈ Bi and is described in more detail as follows:
1. At each site, apply a hash function h1 to partition the A and B tuples at this site into partitions i = 1, ..., k. Let A be the smaller relation. The number of partitions k is chosen such that each partition of A fits into the aggregate or combined memory of all n processors.
2. For i = 1, ..., k, process the join of the ith partitions of A and B. To compute Ai ⋈ Bi, do the following at every site:
(a) Apply a second hash function h2 to all Ai tuples to determine where they should be joined and send tuple t to site h2(t).
(b) As Ai tuples arrive to be joined, add them to an in-memory hash table.
(c) After all Ai tuples have been distributed, apply h2 to Bi tuples to determine where they should be joined and send tuple t to site h2(t).
(d) As Bi tuples arrive to be joined, probe the in-memory table of Ai tuples and output result tuples.
The use of the second hash function h2 ensures that tuples are (more or less) uniformly distributed across all n processors participating in the join. This approach greatly reduces the cost for each of the smaller joins and therefore reduces the overall join cost. Observe that all available processors are fully utilized, even though the smaller joins are carried out one after the other.
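The steps above can be sketched as a small single-process simulation. This is only an illustration under stated assumptions, not a definitive implementation: the names choose_k, parallel_hash_join, and mem_per_site are hypothetical, tuples are (key, payload) pairs joined on equal keys, each "site" is modeled as a Python list, and "sending" a tuple is modeled by inserting it into the destination site's hash table. The formula in choose_k also assumes that h1 spreads A evenly over the k partitions.

```python
import math
from collections import defaultdict

def h1(key, k):
    """First hash function: maps a join key to one of k partitions."""
    return hash(key) % k

def h2(key, n, k):
    """Second hash function: maps a join key to one of n sites.

    It uses different bits of the hash value than h1 does. If h2 were
    simply hash(key) % n and k == n, every tuple of partition i would
    land on the same site and the join would not be parallel at all.
    """
    return (hash(key) // k) % n

def choose_k(size_a, n, mem_per_site):
    """Smallest k such that a partition of A (about size_a / k tuples,
    assuming an even split) fits in the combined memory of n sites."""
    return max(1, math.ceil(size_a / (n * mem_per_site)))

def parallel_hash_join(sites_a, sites_b, k):
    """sites_a[s] and sites_b[s] hold the A and B tuples stored at site s."""
    n = len(sites_a)

    # Step 1: every site partitions its local A and B tuples with h1.
    parts_a = [[[] for _ in range(k)] for _ in range(n)]
    parts_b = [[[] for _ in range(k)] for _ in range(n)]
    for s in range(n):
        for t in sites_a[s]:
            parts_a[s][h1(t[0], k)].append(t)
        for t in sites_b[s]:
            parts_b[s][h1(t[0], k)].append(t)

    result = []
    # Step 2: the k smaller joins run one after the other, each on all sites.
    for i in range(k):
        # (a), (b): redistribute Ai with h2; each site builds a hash table.
        tables = [defaultdict(list) for _ in range(n)]
        for s in range(n):
            for t in parts_a[s][i]:
                tables[h2(t[0], n, k)][t[0]].append(t)
        # (c), (d): redistribute Bi with h2 and probe the receiving table.
        for s in range(n):
            for t in parts_b[s][i]:
                dest = h2(t[0], n, k)
                for a in tables[dest].get(t[0], []):
                    result.append((a, t))
    return result
```

For example, with n = 4 sites and room for 100 tuples of A at each site, choose_k(1000, 4, 100) gives k = 3, so each partition of A holds roughly 334 tuples and fits in the 400 tuples of combined memory.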