In some warehouse workloads, two tables are frequently joined together.
For example, the TPC-H benchmark frequently joins the
lineitem and orders tables. A technique commonly used by MPP
databases is to co-partition the two tables on their join key
during data loading. In distributed file systems like HDFS,
the storage system is schema-agnostic, which prevents data co-partitioning.
Shark allows co-partitioning two tables on a common
key for faster joins in subsequent queries.
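
As a minimal illustration of why co-partitioning helps, the following Python sketch hash-partitions two tables on their join key using the same hash function and the same number of partitions. It is not Shark's implementation: the function names and the toy rows are assumptions made for this example. Because matching keys land in the same partition index, each pair of partitions can be joined locally without moving rows between partitions.

```python
from collections import defaultdict


def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition by hashing its join key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def copartitioned_join(left_parts, right_parts, left_key, right_key):
    """Join partition i of the left table only with partition i of the right.

    This is correct only if both tables were partitioned with the same
    hash function and the same number of partitions.
    """
    assert len(left_parts) == len(right_parts)
    joined = []
    for left, right in zip(left_parts, right_parts):
        # Build a hash index over the local right-hand partition.
        index = defaultdict(list)
        for r in right:
            index[r[right_key]].append(r)
        # Probe it with the local left-hand partition.
        for l in left:
            for r in index.get(l[left_key], []):
                joined.append({**l, **r})
    return joined


# Hypothetical toy rows loosely modelled on the TPC-H join columns.
orders = [{"o_orderkey": k, "o_status": s}
          for k, s in [(1, "O"), (2, "F"), (3, "O")]]
lineitem = [{"l_orderkey": k, "l_quantity": q}
            for k, q in [(1, 17), (1, 36), (2, 8), (3, 28), (3, 24)]]

order_parts = hash_partition(orders, "o_orderkey", 4)
lineitem_parts = hash_partition(lineitem, "l_orderkey", 4)
result = copartitioned_join(lineitem_parts, order_parts,
                            "l_orderkey", "o_orderkey")
```

Here `result` holds the five matching lineitem/orders pairs, the same rows a global join would produce, even though each partition was joined in isolation.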