Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
In the counting k-mers example above, you can see there is a defined predicate and projection. The predicate allows rapid filtering of rows while a projection allows you to efficiently materialize only specific columns for analysis. For this k-mer counting example, we filter out any records that are not mapped or have a MAPQ less than 20 using a predicate and only materialize the Sequence, ReadMapped flag and MAPQ columns and skip over all other fields like Reference or Start position, e.g.