Tools of a data scientist
Unlike a typical programmer, who may use a standardised set of tools, data scientists tend to use a wide, ever-changing array of tools. This is because the data science landscape is evolving rapidly, and many new tools are still far from mature. That said, below we’ve compiled a list of popular tools for data scientists, grouped by practice:
Data Analysis:
Here, the tools are really just the programming languages a data scientist uses to extract and analyse data. These are typically Python, R and SQL.
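As a minimal sketch of how these languages work together, the Python snippet below uses SQL to extract the data and Python to analyse it. The in-memory SQLite database, the `orders` table and its values are all made up for illustration.

```python
import sqlite3
from statistics import mean

# Hypothetical data: an in-memory SQLite database stands in for a real source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.0), ("alice", 40.0), ("bob", 5.0)],
)

# Extract with SQL: total spend per customer.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
totals = dict(rows)

# Analyse with Python: average total spend across customers.
average_total = mean(list(totals.values()))
conn.close()
```

The same pattern carries over to a real database: only the connection and the query would change.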
Data Warehousing:
A data scientist may choose to keep their own database, into which they can extract data for analysis. MySQL is among the most popular choices for handling reasonably sized datasets. Moving into the realms of big data, they would typically turn to systems such as Hive or Redshift. You’d also be surprised how far most data scientists can get with an ordinary .CSV file before it falls over.
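To illustrate how far a plain CSV file can go, here is a small sketch that reads a file one row at a time with Python's standard `csv` module, so only the running totals are held in memory. The column names, the `units_per_region` helper and the inline sample data are hypothetical stand-ins for a real file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample; in practice this would be open("sales.csv", newline="").
sample = io.StringIO(
    "region,units\n"
    "north,10\n"
    "south,4\n"
    "north,6\n"
)

def units_per_region(handle):
    """Sum the 'units' column per 'region', reading one row at a time."""
    totals = defaultdict(int)
    for row in csv.DictReader(handle):
        totals[row["region"]] += int(row["units"])
    return dict(totals)

region_totals = units_per_region(sample)
```

Because rows are processed as they are read, this approach keeps working on files far larger than would be comfortable to open in a spreadsheet.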
Data Visualisation:
Among the most commonly mentioned tools for data visualisation are D3.js and Tableau. With D3.js, a JavaScript library, a data scientist can build almost any data visualisation they can imagine. Tableau is one of the most popular data visualisation tools at the moment; it allows data to be compiled from hundreds of inputs and then easily transformed into visualisations.
Machine Learning:
This is perhaps the area most in flux, with new tools emerging constantly. The most established and widely used is perhaps scikit-learn, a machine learning library for Python. Then of course there is Spark MLlib, Apache Spark’s own machine learning library, which can also run on Hadoop clusters.