Since its introduction just a few years ago, the MapReduce framework [16] has become extremely popular for analyzing large datasets in cluster environments. The success of MapReduce stems from hiding the details of parallelization, fault tolerance, and load balancing in a simple programming framework. Despite its popularity, some have argued that MapReduce is a step backwards and ignores many of the valuable lessons learned in parallel RDBMSs over the last 30 years [18, 25]. Among other things, the critics of MapReduce like to point out its lack of a schema, lack of a declarative query language, and lack of indexes