In some warehouse workloads, two tables are frequently joined together.
For example, the TPC-H benchmark frequently joins the
lineitem and order tables. A technique commonly used by MPP
databases is to co-partition the two tables based on their join key in
the data loading process. In distributed file systems like HDFS,
the storage system is schema-agnostic, which prevents data copartitioning.
Shark allows co-partitioning two tables on a common
key for faster joins in subsequent queries.