The join condition is an inequality predicate, which cannot be evaluated with the traditional key partitioning used for equi-joins. This is the kind of condition that Theta-Join handles efficiently.
The configuration files from before can be used in the same way, with one small addition: an estimated cardinality must be specified for each table that is joined using Theta-Join in the query plan. If the query plan contains a Theta-Join over the results of previous Theta-Joins, the estimated cardinality of those intermediate results must be specified as well. This cardinality is used to generate the mapping described in [1] and to achieve good load balancing (details are provided in the next section). Inaccurate values, however, do not have a significant impact on performance. If no estimate is available, it is better to underestimate the cardinality of a relation.
Cardinalities are specified in the following format:
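To give an intuition for why a cardinality estimate helps load balancing, here is a minimal sketch. It is not Squall's actual implementation. It assumes the join of two relations R and S is viewed as a |R| x |S| matrix that is split into a grid of rows x cols regions, one region per join task, and it picks the grid shape that minimizes the number of input tuples each task receives. The class GridShapeSketch, the method chooseGrid, and the cost model are illustrative assumptions.

```java
public class GridShapeSketch {

    /**
     * Returns {rows, cols} with rows * cols == tasks that minimizes the
     * estimated input per task: cardR / rows + cardS / cols.
     * Hypothetical helper, used only to illustrate the idea.
     */
    public static int[] chooseGrid(long cardR, long cardS, int tasks) {
        int[] best = {1, tasks};
        double bestCost = Double.MAX_VALUE;
        for (int rows = 1; rows <= tasks; rows++) {
            if (tasks % rows != 0) {
                continue; // only consider grids that use every task
            }
            int cols = tasks / rows;
            // Each task receives one horizontal slice of R and one vertical slice of S.
            double cost = (double) cardR / rows + (double) cardS / cols;
            if (cost < bestCost) {
                bestCost = cost;
                best = new int[] {rows, cols};
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] grid = chooseGrid(10000, 10000, 16);
        System.out.println(grid[0] + " x " + grid[1]);
    }
}
```

In this sketch, with 16 tasks and an estimate of 10000 for the second relation, estimates of 40000 and 30000 for the first relation both select the same 8 x 2 grid, which illustrates how a moderately wrong estimate can leave the partitioning unchanged.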
TABLENAME_CARD
In our example, we have to specify the estimated cardinality for LINEITEM and ORDERS:
ORDERS_CARD 10000
LINEITEM_CARD 10000
If we wanted to use the result of this join in a second Theta-Join, we would specify its estimated cardinality as follows:
LINEITEM_ORDERS_CARD 100
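As an illustration of the entry format above, the following sketch reads NAME_CARD entries from configuration text into a map from table name to estimated cardinality. It is not Squall's own configuration parser: the class CardinalityConfigSketch and the method parseCardinalities are hypothetical, and the sketch assumes each entry is a key and a value separated by whitespace on one line.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CardinalityConfigSketch {

    private static final String SUFFIX = "_CARD";

    /**
     * Collects every "<NAME>_CARD <value>" line into a map NAME -> value.
     * Lines that do not match this shape are ignored.
     */
    public static Map<String, Long> parseCardinalities(String configText) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String line : configText.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2 || !parts[0].endsWith(SUFFIX)) {
                continue; // not a cardinality entry
            }
            String table = parts[0].substring(0, parts[0].length() - SUFFIX.length());
            result.put(table, Long.parseLong(parts[1]));
        }
        return result;
    }

    public static void main(String[] args) {
        String config = "ORDERS_CARD 10000\n"
                + "LINEITEM_CARD 10000\n"
                + "LINEITEM_ORDERS_CARD 100\n";
        // prints {ORDERS=10000, LINEITEM=10000, LINEITEM_ORDERS=100}
        System.out.println(parseCardinalities(config));
    }
}
```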
We now present the corresponding Java query plan using ThetaJoinComponent:
// LINEITEM source: keep only columns 0 and 5.
ProjectOperator projectionLineitem = new ProjectOperator(new int[] { 0, 5 });
DataSourceComponent relationLineitem = new DataSourceComponent(
        "LINEITEM",
        dataPath + "lineitem" + extension,
        _queryPlan)
        .addOperator(projectionLineitem);

// ORDERS source: keep only columns 0 and 3.
ProjectOperator projectionOrders = new ProjectOperator(new int[] { 0, 3 });
DataSourceComponent relationOrders = new DataSourceComponent(
        "ORDERS",
        dataPath + "orders" + extension,
        _queryPlan)
        .addOperator(projectionOrders);