We considered extending the open source Hadoop platform to support computation of unbounded streams, but we quickly realized that the Hadoop platform was highly optimized for batch processing.
MapReduce systems typically operate on static data by scheduling batch jobs.
In stream computing, the paradigm is to have a stream of events that flow into the system at a given data rate over which we have no control.
The processing system must keep up with the event rate or degrade gracefully by eliminating events, a technique typically called load shedding.
The streaming paradigm dictates a very different architecture than the one used in batch processing.
Attempting to build a general-purpose platform for both batch and stream computing would result in a highly complex system that may end up not being optimal for either task.
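
To make the load shedding idea above concrete, the following is a minimal sketch, not a description of any particular platform. It assumes a simple policy in which a bounded buffer drops newly arriving events once it is full; the names `SheddingBuffer`, `offer`, `poll`, and `simulate` are hypothetical and chosen only for illustration. Real systems may shed by sampling, by priority, or by dropping the oldest events instead.

```python
from collections import deque


class SheddingBuffer:
    """Bounded event buffer that drops new arrivals when full (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.dropped = 0  # number of events shed so far

    def offer(self, event):
        """Accept the event if there is room; otherwise shed it."""
        if len(self.events) >= self.capacity:
            self.dropped += 1
            return False
        self.events.append(event)
        return True

    def poll(self):
        """Return the oldest buffered event, or None if the buffer is empty."""
        return self.events.popleft() if self.events else None


def simulate(arrivals_per_tick, served_per_tick, ticks, capacity):
    """Feed events at a fixed rate the consumer cannot control.

    Returns (processed, dropped, still_buffered).
    """
    buf = SheddingBuffer(capacity)
    processed = 0
    for tick in range(ticks):
        # Events arrive at the external data rate.
        for i in range(arrivals_per_tick):
            buf.offer((tick, i))
        # The consumer handles at most served_per_tick events per tick.
        for _ in range(served_per_tick):
            if buf.poll() is None:
                break
            processed += 1
    return processed, buf.dropped, len(buf.events)


if __name__ == "__main__":
    processed, dropped, buffered = simulate(
        arrivals_per_tick=3, served_per_tick=2, ticks=10, capacity=4
    )
    print(processed, dropped, buffered)
```

When the arrival rate exceeds the service rate, the buffer fills and the excess is shed rather than accumulating without bound, so memory use and queueing delay stay bounded. A batch scheduler has no equivalent concern, because its input is fixed before the job starts.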