There are, of course, many different ways of doing log processing with MapReduce. However, at an abstract level, a few common patterns emerge (e.g., [7, 9]). The log data is typically stored in the underlying distributed file system (DFS) in timestamp order. In addition, there are usually one or more reference tables containing information about users, locations, etc. These reference tables vary in size but are usually much smaller than the log data. They are typically maintained in an RDBMS but copied to the DFS to make log processing in MapReduce more efficient. For example, at Facebook, 4TB of reference data is reloaded into Hadoop’s DFS every day [7].
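The pattern above can be sketched in a few lines: a small reference table (as copied out of the RDBMS) is loaded into memory by each map task and joined against its split of the timestamp-ordered log. This is a minimal illustration with hypothetical data and names (`users`, `log`, `map_task` are not from the source), not any particular system's implementation.

```python
# Hypothetical reference table (e.g., users), small enough to load
# into memory in every map task.
users = {
    1: {"name": "alice", "region": "us"},
    2: {"name": "bob", "region": "eu"},
}

# Hypothetical log data, stored in timestamp order; in practice this
# is far larger than the reference table and lives in the DFS.
log = [
    (1000, 1, "click"),
    (1001, 2, "view"),
    (1002, 1, "view"),
]

def map_task(records, ref):
    """Join one split of the log against the in-memory reference table.

    Because the reference table fits in memory, the join happens
    entirely in the map phase; only the (large) log is scanned.
    """
    for ts, user_id, event in records:
        info = ref.get(user_id)
        if info is not None:
            yield (info["region"], event, ts)

enriched = list(map_task(log, users))
```

Copying the reference data into the DFS ahead of time is what makes this map-side join possible: each task reads a local copy of the table instead of querying the RDBMS per record.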