• Storage costs. Because Hadoop storage is inexpensive, you can keep both versions of the data in HDFS: the original application data and the transformed data. All of your data is then in one place, making it easier to manage, reprocess (if needed) and analyze at a later date.
• Processing power. Processing data in Hadoop frees up EDW resources and gets data transformed and loaded into your EDW more quickly, so that the analysis work can begin sooner.
Back in the early days of Hadoop, some went so far as to call Hadoop the “ETL killer,” putting ETL vendors at risk and on the defensive. Fortunately, these vendors quickly responded with new HDFS connectors, making it easier for organizations to optimize their ETL investments in this new Hadoop world.
If you’re experiencing rapid application data growth and/or you’re having trouble getting all your ETL jobs to finish in a timely manner, consider handing off some of this work to Hadoop – using your ETL vendor’s Hadoop/HDFS connector or MapReduce – and get ahead of your data, not behind it.
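To make the MapReduce option more concrete, here is a minimal sketch of the kind of transformation step that could be handed off to Hadoop as a streaming-style mapper. The three-field record layout (customer ID, amount, country code) and the function names are assumptions made for illustration only; they are not taken from any particular ETL product or from a real data set.

```python
import sys


def transform_line(line):
    """Clean one raw comma-separated record.

    Assumed (hypothetical) layout: customer_id, amount, country.
    Returns a tab-separated string, or None if the row should be dropped.
    """
    fields = [f.strip() for f in line.rstrip("\n").split(",")]
    if len(fields) != 3:
        return None  # malformed row: wrong number of fields
    customer_id, amount, country = fields
    try:
        # Store the amount as whole cents to avoid float rounding downstream.
        amount_cents = round(float(amount) * 100)
    except (ValueError, OverflowError):
        return None  # amount is not a usable number
    return "\t".join([customer_id, str(amount_cents), country.upper()])


def transform_stream(lines):
    """Yield the cleaned form of every valid record in an iterable of lines."""
    for line in lines:
        record = transform_line(line)
        if record is not None:
            yield record


def main():
    # A streaming mapper reads raw records on stdin and writes results to stdout.
    for record in transform_stream(sys.stdin):
        sys.stdout.write(record + "\n")
```

If this were saved as a script that calls main(), it could in principle be supplied as the mapper of a Hadoop Streaming job, with the raw application data as input and the cleaned records written back to HDFS for loading into the EDW. Because the raw input stays in HDFS, the same step can be rerun later if the transformation rules change.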