Unlike the Grep task’s data set, which was uploaded directly into HDFS unaltered, the UserVisits and Rankings data sets
needed to be modified so that the first and second columns were separated by a tab delimiter and all other fields in each line were separated by a unique field delimiter. Because there are no schemas in
the MR model, in order to access the different attributes at run time,
the Map and Reduce functions in each task must manually split the
value by the delimiter character into an array of strings.
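
The manual splitting described above can be illustrated with a minimal sketch. It is written in Python for brevity rather than as Hadoop code, and it rests on several assumptions not stated in the text: the field delimiter is taken to be "|", the sample record and its field order are invented, and the names parse_record and map_fn are hypothetical.

```python
# Assumption: the text does not name the field delimiter; "|" is a stand-in.
FIELD_DELIMITER = "|"


def parse_record(line):
    """Split one input line into (key, fields).

    The first tab separates the key (first column) from the value
    (all remaining columns); the value is then split by hand on the
    field delimiter, since no schema is available at run time.
    """
    key, _, value = line.rstrip("\n").partition("\t")
    fields = value.split(FIELD_DELIMITER)
    return key, fields


def map_fn(key, value):
    """Hypothetical Map function: emit (key, third field as a number).

    Attributes can only be addressed by position in the split array,
    so the index 2 below encodes knowledge of the (assumed) layout.
    """
    fields = value.split(FIELD_DELIMITER)
    yield key, float(fields[2])


# Invented sample record: key, tab, then three delimited fields.
sample = "10.0.0.1\texample.com/a|2000-01-15|0.75"
key, fields = parse_record(sample)
emitted = list(map_fn(key, FIELD_DELIMITER.join(fields)))
```

Because every attribute is addressed by its position in the array, any change to the field order in the input files requires a matching change in each Map and Reduce function that reads them.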