In summary, we have a single upper-level intersection and multiple lower-level intersections. The upper-level intersection could be executed using the scalar merge (cf. Section 2.2) or an STTNI-based intersection. However, an important observation is that in most cases the upper-level intersection accounts for only a small fraction of the execution time of the complete intersection process. In general, the split between upper-level and lower-level execution time depends on the average cardinality of the subsets. Roughly speaking, the higher the cardinality of the subsets, the more of the execution time is spent on intersecting the subsets. For this reason, we use a scalar algorithm for the upper-level intersection and our 16-bit parallel algorithm for the lower-level intersections.
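
The two-level structure described above can be illustrated with a minimal sketch. It assumes that each set of sorted, distinct 32-bit integers is partitioned by its upper 16 bits, so that every subset holds only 16-bit values; this partitioning scheme and all names (`partition`, `intersect_low`, `intersect`) are illustrative assumptions. The lower-level routine is a plain scalar merge that stands in for the 16-bit parallel STTNI algorithm, which is not reproduced here.

```python
from typing import List, Tuple

# A partitioned set: (upper 16 bits, sorted lower 16-bit values), sorted by key.
Partitioned = List[Tuple[int, List[int]]]


def partition(values: List[int]) -> Partitioned:
    """Group sorted, distinct 32-bit integers by their upper 16 bits (assumed layout)."""
    parts: Partitioned = []
    for v in values:
        hi, lo = v >> 16, v & 0xFFFF
        if parts and parts[-1][0] == hi:
            parts[-1][1].append(lo)
        else:
            parts.append((hi, [lo]))
    return parts


def intersect_low(a: List[int], b: List[int]) -> List[int]:
    """Lower-level intersection of two subsets.

    Scalar stand-in: the text uses a 16-bit parallel algorithm at this level.
    """
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            out.append(a[i])
            i += 1
            j += 1
    return out


def intersect(a: Partitioned, b: Partitioned) -> List[int]:
    """Upper-level scalar merge over partition keys.

    One lower-level intersection is run for each key present in both sets.
    """
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        hi_a, lows_a = a[i]
        hi_b, lows_b = b[j]
        if hi_a < hi_b:
            i += 1
        elif hi_a > hi_b:
            j += 1
        else:
            # Matching upper bits: intersect the subsets and rebuild 32-bit values.
            out.extend((hi_a << 16) | lo for lo in intersect_low(lows_a, lows_b))
            i += 1
            j += 1
    return out
```

In this sketch the upper-level loop advances once per partition key, whereas the lower-level loop advances once per element, which is consistent with the observation that larger subsets shift the execution time towards the lower level.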