Organization. At this point the architecture has to deal with various data formats (text formats, compressed files, variously delimited files, etc.) and must be able to parse them and extract the actual information, such as named entities and the relations between them [14]. This is also the point where data have to be cleaned, put in a computable form (structured or semi-structured), integrated, and stored in the right location (an existing data warehouse, data marts, an Operational Data Store, a Complex Event Processing engine, or a NoSQL database) [14]. Thus, a kind of ETL (extract, transform, load) process has to be
performed. Successful cleaning in a Big Data architecture is not
entirely guaranteed; in fact “the volume, velocity, variety, and
variability of Big Data may preclude us from taking the time
to cleanse it all thoroughly”
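
As an illustration of the ETL-like step described above, the following is a minimal Python sketch and not the architecture's actual implementation. It assumes a toy record layout with hypothetical `name` and `city` fields, guesses the delimiter from the header line, accepts gzip-compressed input, and uses an in-memory SQLite table as a stand-in for the target store (data warehouse, Operational Data Store, NoSQL database, etc.). The function names `extract`, `transform`, and `load` are chosen for illustration only.

```python
import csv
import gzip
import io
import sqlite3


def extract(raw: bytes) -> list:
    """Decode a possibly gzip-compressed, delimited payload into row dicts."""
    if raw[:2] == b"\x1f\x8b":  # gzip magic number
        raw = gzip.decompress(raw)
    text = raw.decode("utf-8")
    if not text.strip():
        return []
    # Assumption: the delimiter is the most frequent candidate in the header.
    header = text.splitlines()[0]
    delimiter = max(",;\t|", key=header.count)
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def transform(rows: list) -> list:
    """Clean rows: trim whitespace, drop rows without a name, deduplicate."""
    cleaned, seen = [], set()
    for row in rows:
        name = (row.get("name") or "").strip()
        city = (row.get("city") or "").strip().title()
        if not name:
            continue  # unusable record, skipped rather than repaired
        key = (name.lower(), city.lower())
        if key in seen:
            continue  # case-insensitive duplicate
        seen.add(key)
        cleaned.append({"name": name, "city": city})
    return cleaned


def load(rows: list, conn: sqlite3.Connection) -> int:
    """Store cleaned rows and return the total number of stored rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS person (name TEXT, city TEXT)")
    conn.executemany("INSERT INTO person VALUES (:name, :city)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]


if __name__ == "__main__":
    plain = b"name;city\n Alice ;paris\nalice;Paris\n;Rome\nBob;\n"
    packed = gzip.compress(b"name,city\nCarol,berlin\n")
    conn = sqlite3.connect(":memory:")
    for payload in (plain, packed):
        load(transform(extract(payload)), conn)
    print(conn.execute("SELECT name, city FROM person ORDER BY name").fetchall())
```

The cleaning in `transform` is deliberately shallow: it discards records it cannot use instead of trying to repair them, which reflects the point made above that thorough cleansing of all incoming data may not be feasible at Big Data scale.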