Locating and collecting reliable data from multiple sources in various formats.
Preparing the data for analysis.
Collected data is not usable until it has been organized and standardized, duplicates have been removed (called deduping), and other data cleansing processes have been completed.
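As a rough illustration of the standardizing and deduping steps described above, here is a minimal Python sketch. The function names `standardize` and `dedupe`, the sample records, and the choice to lowercase every string for matching are all assumptions made for this example; a real cleansing pipeline would apply rules specific to its own data.

```python
def standardize(record):
    """Trim whitespace and lowercase the keys and string values of one record.

    Lowercasing everything is an illustrative assumption, not a general rule.
    """
    return {
        key.strip().lower(): value.strip().lower() if isinstance(value, str) else value
        for key, value in record.items()
    }


def dedupe(records):
    """Drop records that match an earlier one after standardization, keeping order.

    Assumes every value is hashable (strings, numbers, and so on).
    """
    seen = set()
    unique = []
    for record in records:
        clean = standardize(record)
        # A sorted tuple of (key, value) pairs identifies the record's content.
        fingerprint = tuple(sorted(clean.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(clean)
    return unique


# Hypothetical records collected from two sources with inconsistent formatting.
raw = [
    {"Name": " Ada Lovelace ", "Email": "ADA@example.com"},
    {"name": "ada lovelace", "email": "ada@example.com"},
    {"name": "Alan Turing", "email": "alan@example.com"},
]
cleaned = dedupe(raw)
```

In this sketch the first two records differ only in spacing and capitalization, so they collapse into one entry once standardized. This shows why standardizing usually has to come before deduping.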