Introduction
Simple Scalable Streaming System
A general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous, unbounded streams of data
Inspired by MapReduce and Actor models of computation
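The hybrid of the two models can be sketched in a few lines: events carry a key (as in MapReduce), and each key is routed to its own processing element, which holds private state and reacts to incoming events one at a time (as in the Actor model). All names below are illustrative assumptions, not the platform's actual API.

```python
# Minimal sketch of keyed, actor-style event processing.
# Each key gets its own ProcessingElement with private state;
# the Stream routes every event to the element owning its key.

class ProcessingElement:
    """Actor-like element: processes all events for a single key."""
    def __init__(self, key):
        self.key = key
        self.count = 0  # example private state: occurrence count

    def process(self, event):
        self.count += 1


class Stream:
    """Routes each keyed event to the processing element for that key."""
    def __init__(self):
        self.pes = {}

    def emit(self, key, event):
        pe = self.pes.setdefault(key, ProcessingElement(key))
        pe.process(event)


s = Stream()
for word in ["to", "be", "or", "not", "to", "be"]:
    s.emit(word, word)
print(s.pes["to"].count)  # the "to" element has seen 2 events
```

Because state is partitioned by key, elements never share data and can be distributed across nodes, which is what makes the model scale.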
Real-time data analysis (financial data, Twitter feeds, news data, ...)
High-frequency trading
Complex event processing
Real-time search
Social networks: incorporating Web application user feedback in real time
Search advertising personalization
Low-latency data processing pipelines
Online algorithm development
Existing large-scale MapReduce data processing platforms (e.g. Hadoop) are highly optimized for batch processing
MapReduce systems typically operate on static data by scheduling batch jobs
In stream computing, the paradigm is a stream of events that flows into the system at a data rate over which we have no control
The processing system must keep up with the event rate or degrade gracefully by eliminating events (load shedding)
The streaming paradigm dictates a very different architecture than the one used in batch processing
Attempting to build a general-purpose platform for both batch and stream computing would result in a highly complex system that may end up optimal for neither task
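The load-shedding behavior mentioned above can be sketched as a bounded queue between an uncontrolled event source and a slower consumer: when the queue is full, new events are dropped rather than blocking the source. This is a hypothetical illustration of the policy, not the platform's actual implementation.

```python
from collections import deque

class LoadSheddingQueue:
    """Bounded buffer that sheds (drops) events once capacity is reached."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.dropped = 0  # number of events shed so far

    def offer(self, event):
        """Accept an event if there is room; otherwise shed it."""
        if len(self.queue) < self.capacity:
            self.queue.append(event)
            return True
        self.dropped += 1
        return False

    def poll(self):
        """Consumer takes the next event, or None if the queue is empty."""
        return self.queue.popleft() if self.queue else None


# A burst of 10 events against capacity 4: the excess is shed.
q = LoadSheddingQueue(capacity=4)
accepted = sum(q.offer(i) for i in range(10))
print(accepted, q.dropped)  # 4 accepted, 6 shed
```

A real system would interleave `offer` and `poll` concurrently; the point here is only that the source is never blocked, so the system keeps up at the cost of losing events.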