A relational query execution plan is a graph of relational algebra operators and the operators in a graph can be executed in parallel. If an operator consumes the output of a second operator, its pipelined parallelism; if not, the two operators can proceed essentially independently. An operator is said to block if it produces no output until it has consumed all its inputs. Pipelined parallelism is limited by the presence of operators (e.g., sorting or aggregation) that block.
The key to evaluating an operator in parallel is to partition the input data; we can then work on each partition in parallel and combine the results. This approach is called data-partitioned parallel evaluation.
Data Partitioning
Partitioning a large dataset horizontally across several disks enables us to exploit the I/O bandwidth of the disks by reading and writing them in parallel. There are several ways to horizontally partition a relation. If there are n processors, the ith tuple is assigned to processor i mod n in round-robin partitioning. In hash partitioning, a hash function is applied to a tuple to determine its processor. In range partitioning, tuples are sorted (conceptually), and n ranges are chosen for the sort key values so that each range contains roughly the same number of tuples; tuples in range i are assigned to processor i.
Round-robin partitioning is suitable for eciently evaluating queries that access the entire relation. If only a subset of the tuples (e.g., those that satisfy the selection con-dition age = 20) is required, hash partitioning and range partitioning are better than round-robin partitioning because they enable us to access only those disks that contain matching tuples.If range selections such as 15