Chukwa is a Hadoop subproject devoted to large-scale log collection and analysis. It is built on top of HDFS and the MapReduce framework, and it inherits Hadoop’s scalability and robustness. It has four primary components:
Agents that run on each machine and emit data
Collectors that receive data from the agents and write it to stable storage
MapReduce jobs for parsing and archiving the data
HICC (Hadoop Infrastructure Care Center), a web-portal-style interface for displaying data
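The flow of data through these components can be pictured as a small pipeline. The following is only a toy, in-memory sketch in Python and does not use Chukwa's actual API: the `Agent` and `Collector` classes, the `map_phase` and `reduce_phase` functions, the host names, and the log lines are all hypothetical, a plain list stands in for HDFS, and the final printout stands in for a display layer such as HICC.

```python
from collections import defaultdict


class Collector:
    """Hypothetical collector: receives data from agents and appends it to storage."""

    def __init__(self):
        # A plain list stands in for stable storage such as HDFS (assumption).
        self.storage = []

    def receive(self, host, line):
        self.storage.append((host, line))


class Agent:
    """Hypothetical agent: runs on one machine and forwards its log lines."""

    def __init__(self, host, collector):
        self.host = host
        self.collector = collector

    def emit(self, line):
        self.collector.receive(self.host, line)


def map_phase(records):
    """Parse each stored record into a ((host, level), 1) pair."""
    for host, line in records:
        # Assumes the log level is the first space-separated token.
        level = line.split(" ", 1)[0]
        yield (host, level), 1


def reduce_phase(pairs):
    """Sum the counts for each (host, level) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def run_demo():
    """Push a few made-up log lines through agents, a collector, and the batch job."""
    collector = Collector()
    web = Agent("web-1", collector)
    db = Agent("db-1", collector)
    web.emit("ERROR disk full")
    web.emit("INFO request served")
    web.emit("ERROR timeout")
    db.emit("INFO checkpoint done")
    # The batch job runs only after the data has landed in storage.
    return reduce_phase(map_phase(collector.storage))


if __name__ == "__main__":
    for (host, level), count in sorted(run_demo().items()):
        print(host, level, count)
```

The point of the sketch is the separation of concerns: agents only emit, collectors only persist, and analysis happens later as a separate job over whatever has accumulated in storage.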
Flume, from Cloudera, is similar to Chukwa in both overall architecture and features. The main architectural difference is that Chukwa is a batch system, whereas Flume is designed more as a continuous stream-processing system.
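The difference between the batch and continuous styles can be illustrated with a generic sketch. This is not code from either project: `process_in_batches` and `process_as_stream` are hypothetical names, and summing integers stands in for whatever analysis is actually performed on log events.

```python
def process_in_batches(events, batch_size):
    """Batch style: buffer events, then process each full batch at once."""
    results = []
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            results.append(sum(batch))
            batch = []
    if batch:
        # Process whatever is left over as a final, smaller batch.
        results.append(sum(batch))
    return results


def process_as_stream(events):
    """Streaming style: update and yield a running result as each event arrives."""
    total = 0
    for event in events:
        total += event
        yield total


if __name__ == "__main__":
    events = [1, 2, 3, 4, 5]
    print(process_in_batches(events, batch_size=2))
    print(list(process_as_stream(events)))
```

In the batch version no result is available until a batch has filled, which trades latency for simpler bulk processing. The streaming version produces an updated result after every event.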