Many real-world systems implement a streaming strategy of partitioning the input data into fixed-size segments that are processed by a MapReduce platform.
The disadvantage of this approach is that the latency is proportional to the length of the segment plus the overhead required to do the segmentation and initiate the processing jobs.
Small segments will reduce latency but add overhead, and will make it more complex to manage inter-segment dependencies (e.g., a segment may need information from prior segments).
The optimal segment size will depend on the application.
S4 attempts to explore a different paradigm that is simple and can operate on data streams in real time.
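
The segment-size trade-off of the segmented MapReduce approach can be illustrated with a toy model. This is only a sketch under stated assumptions, not a description of any particular system: events are assumed to arrive uniformly within a segment, a job is assumed to start only when its segment closes, and all segmentation and job-startup cost is folded into a single fixed per-job overhead. The function names and the numbers used are hypothetical.

```python
def segment_latency(segment_seconds, job_overhead_seconds):
    """Return (worst_case, mean) latency in seconds for one segment.

    Assumption: an event waits until its segment closes, then waits
    for the fixed per-job overhead. Processing time itself is ignored.
    """
    # An event arriving at the very start of a segment waits the full length.
    worst_case = segment_seconds + job_overhead_seconds
    # With uniform arrivals, the average wait is half the segment length.
    mean = segment_seconds / 2 + job_overhead_seconds
    return worst_case, mean


def overhead_per_hour(segment_seconds, job_overhead_seconds):
    """Return total seconds of per-job overhead accumulated in one hour."""
    jobs_per_hour = 3600 / segment_seconds
    return jobs_per_hour * job_overhead_seconds


if __name__ == "__main__":
    # Hypothetical values: 5 s of overhead per job, three segment lengths.
    for seg in (10, 60, 600):
        worst, mean = segment_latency(seg, 5)
        total = overhead_per_hour(seg, 5)
        print(f"segment={seg}s worst={worst}s mean={mean}s overhead/hour={total}s")
```

In this model, shrinking the segment lowers latency linearly while the accumulated overhead grows inversely with segment length, which is one way to see why the best segment size is application-dependent.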