The key tools you need to get started with data science in 2020

By: Sergio Infante
January 14, 2020 | Topic: Data Science  

There are a lot of tools for data scientists. Some are losing popularity, while others are becoming better known and more widely used. If I were to prepare a complete list, it would be very long and probably not very useful.

So, to prepare a more useful list for 2020, I decided to consider these market-driven criteria:

  • The skills requested on job portals.
  • Company requirements identified in round tables and meetings with Belatrix’s clients.
  • Tool popularity as reported by developer surveys and developer portals, such as Stack Overflow.

If you want to learn how to work on data science projects, or to improve how your organization uses data, there are good reasons to concentrate on the following three technologies, which are in demand and growing:

1. Python

While several programming languages have become key to data science, probably no other tool or language has become as central to the field as Python. Its popularity also continues to increase, with little sign that this will change in the near future. So if you’re a student wanting to get started, you won’t go far wrong by focusing on it. Among its benefits are the ability to handle statistical functions and the numerous libraries available (I’ll discuss some of them below). Python was the 2nd most loved technology in Stack Overflow’s 2019 survey.
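As a small illustration of the statistical functions mentioned above, here is a minimal sketch using only Python's standard `statistics` module. The `daily_orders` values are made up for this example and do not come from any real dataset.

```python
import statistics

# Hypothetical sample: daily order counts for one week (illustrative values).
daily_orders = [12, 15, 11, 19, 15, 22, 18]

# Basic descriptive statistics from the standard library.
summary = {
    "mean": statistics.mean(daily_orders),
    "median": statistics.median(daily_orders),
    "stdev": statistics.stdev(daily_orders),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

For anything beyond quick summaries like this, the libraries discussed in the rest of this post are usually the better choice.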

2. Pandas

Pandas is a Python library that has quickly become indispensable for anyone doing data science in the language. If you’re wondering about the name, it is derived from “panel data”. As one commentator pointed out, Pandas has “become the backbone of most data projects”. It is open source, and we’ve seen many organizations use it successfully together with the popular libraries Matplotlib and NumPy. In 2019, it was the fourth most popular framework/library according to Stack Overflow.
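To give a feel for how Pandas and NumPy work together, here is a minimal sketch. The `sales` table, its column names, and its values are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records, made up for illustration.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "south", "north"],
        "units": [10, 4, 7, 12, 3],
        "unit_price": [2.5, 3.0, 2.5, 3.0, 2.5],
    }
)

# Column arithmetic is vectorized: no explicit loop is needed.
sales["revenue"] = sales["units"] * sales["unit_price"]

# Group by region, then compute total revenue and mean units sold.
by_region = sales.groupby("region").agg(
    total_revenue=("revenue", "sum"),
    mean_units=("units", "mean"),
)

# NumPy functions can be applied directly to Pandas columns.
by_region["log_revenue"] = np.log(by_region["total_revenue"])

print(by_region)
```

If Matplotlib is installed, calling `by_region["total_revenue"].plot(kind="bar")` should draw the same result as a bar chart.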

3. Scikit-Learn

Increasingly, organizations are looking for the insights that machine learning algorithms can provide. Scikit-Learn is an excellent Python library for getting started with, learning, and implementing machine learning solutions. Within the teams at Belatrix, developers have commented on its ease of use, as well as the availability of different algorithms (supervised and unsupervised), which can shorten both learning and implementation time.
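The sketch below shows one supervised and one unsupervised algorithm in Scikit-Learn, using the Iris dataset that ships with the library. The choice of logistic regression and k-means, and the parameter values, are assumptions made for this example, not a recommendation from the teams mentioned above.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small built-in dataset: 150 flowers, 4 features, 3 species.
X, y = load_iris(return_X_y=True)

# Supervised learning: fit a classifier on labeled data, score on held-out data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
classifier = LogisticRegression(max_iter=1000)
classifier.fit(X_train, y_train)
accuracy = classifier.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")

# Unsupervised learning: group the same samples without using the labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)
print(f"Number of clusters found: {len(set(cluster_labels.tolist()))}")
```

Both estimators follow the same fit/predict pattern, which is a large part of why the library is considered easy to pick up.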

Based on my own experience, the best way to learn these tools and technologies is through practice, so you’ll need data and examples to work with. I recommend using Kaggle. You’ll love it because it offers challenges (competitions), datasets and notebooks.

In addition to the above three, make sure to also be familiar with:

  1. SQL for querying relational databases.
  2. Tableau for data visualization.
  3. A cloud platform (AWS, Azure or Google Cloud Platform).
  4. A deep learning framework, such as TensorFlow.
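For the SQL item in the list above, here is a minimal sketch of querying a relational database from Python. It uses the standard library's `sqlite3` module with an in-memory database. The `orders` table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")

# Hypothetical rows, made up for illustration.
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ana", 20.0), ("luis", 35.5), ("ana", 14.5)],
)

# Aggregate query: total amount per customer, largest first.
rows = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
    """
).fetchall()

for customer, total in rows:
    print(customer, total)

conn.close()
```

The same `SELECT ... GROUP BY` pattern carries over to other relational databases, although connection details differ by driver.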

Where you may not want to invest your energy

In addition to highlighting what I believe you as a data scientist will need in 2020, it’s also worth pointing out the areas where I personally recommend not focusing. Of course, this will depend on your individual situation – for example, if the organization you’re working for is already using these technologies, then you’ll likely disagree.

  • Apache projects like Hive, Hadoop and Spark. Hadoop in particular is starting to show its age, being already over a decade old. Organizations have so many options today for handling large quantities of data, and it is now quite rare for us to hear Hadoop mentioned in client conversations. Belatrix’s President and Co-founder even asked in a blog post last year whether we’re “nearing the end of Hadoop”.
  • The R programming language. In several developer surveys, R is showing a decrease in popularity. I believe this is likely to continue, and would recommend that students focus their time on the technologies I have listed above instead.

What do you think of my list? Whether or not you agree, we know that data science will be of ever greater importance for businesses in 2020 and beyond. Everything from helping to optimize the algorithms of a fashion retailer, to improving the supply chain management of a large manufacturer, will require individuals with data science expertise.
