• Processing Capability
– Pig, which is intended to let Hadoop users focus on
analyzing large datasets and spend less time writing
mapper and reducer programs [11,12];
– Chukwa, which is a data collection system for monitoring
large distributed systems [26,15];
– Oozie, which is an open-source tool for handling complex
pipelines of data processing [12,3,11]. Using Oozie, users
can define actions and the dependencies between them, and
Oozie schedules them without any manual intervention [11].
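
The idea of defining actions with dependencies and letting a scheduler decide the run order can be illustrated with a minimal sketch. This is not Oozie's API or workflow format; it is a plain Python illustration that assumes a hypothetical four-action pipeline (`ingest`, `clean`, `aggregate`, `index`) and uses the standard-library `graphlib` module (Python 3.9+) to derive an order in which every action runs only after the actions it depends on.

```python
from graphlib import TopologicalSorter


def schedule(dependencies):
    """Return action names in an order that respects all dependencies.

    `dependencies` maps each action name to the set of actions that must
    finish before it can start. A cyclic definition raises
    graphlib.CycleError.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical pipeline (names are illustrative, not taken from Oozie):
# "clean" needs "ingest"; "aggregate" and "index" both need "clean".
pipeline = {
    "ingest": set(),
    "clean": {"ingest"},
    "aggregate": {"clean"},
    "index": {"clean"},
}

order = schedule(pipeline)
for step, action in enumerate(order, start=1):
    print(f"step {step}: run {action}")
```

The user only declares what depends on what; the ordering falls out of the dependency graph. A workflow scheduler in this style would additionally launch each action and track its completion, which this sketch leaves out.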