By Dimitri Antoniou
A week ago, Codeup launched our immersive Data Science career accelerator! With our first-class kicking off in February and only 25 seats available, we’ve been answering a lot of questions from prospective students. One, in particular, has come up so many times we decided to dedicate a blog post to it. What is the difference between data science and data analytics?
First, let’s define some of our terms! Take a look at this blog to understand what Data Science is. In short, it is a method of turning raw data into action, leading to the desired outcome. Big Data refers to data sets that are large and complex, usually exceeding the capacity of computers and normal processing power to deal with. Machine Learning is the process of ‘learning’ underlying patterns of data in order to automate the extraction of intelligence from that data.
Now, let’s look at the data pipeline that data scientists work through to reach the actionable insights and outcomes we mentioned:
- We start by collecting data, which may come from social media channels, network logs, financials, employee records, or more.
- We then process that data into usable information stored in databases or streamed.
- Next, we look back on the history of that data to summarize, describe, and explain, turning the data into meaningful knowledge. Here we’re primarily using mathematics, statistics, and visualization methods.
- Now we convert that knowledge into intelligence, seeking to predict future events so that we can make decisions in the present. This is where practitioners will introduce mathematical/statistical modeling through machine learning to their data.
- Finally, we enable action by building automations, running tests, building visualizations, monitoring new data, etc.
Data professionals work at different stages of the spectrum to move data through the pipeline. On the left, Big Data Engineers specialize in collecting, storing, and processing data, getting it from Data to Information. In the middle, analysts work to understand and convert that information to knowledge. Lastly, a Machine Learning Engineer utilizes machine learning algorithms to turn intelligence into action by building automations, visualizations, recommendations, and predictions.
Data Scientists span multiple stages of this pipeline, from information to action. They will spend about 70% of their time wrangling data in the information stage. They will conduct a statistical analysis to derive knowledge. Lastly, they predict future events and build automations using machine learning.
For those technical folk out there, data science is to data engineering or machine learning engineering as full-stack development is to front-end or back-end development. For the non-technical folk, data science is the umbrella term that houses data analytics, machine learning, and other data professions.
So what’s the biggest difference between a data analyst and a data scientist? Data scientists utilize computer programming and machine learning in addition to mathematics and statistics.