Python Use By Data Scientists Growing

Python, SQL, and R continue to be the top programming languages for data science professionals, according to the findings of Kaggle’s State of Data Science and Machine Learning report.

The annual survey is noteworthy due to the large number of participants – it received responses from almost 20,000 data professionals from 171 countries and territories this time.

Language of choice

An analysis published on Business Broadway looked at the raw data from the Kaggle survey, and concluded that data professionals used an average of three languages in 2019. The top programming language was Python (87%), followed by SQL (44%) and R (31%). The other languages in the top 10 list include Java, C and C++, JavaScript, Bash, MATLAB, and TypeScript.

Comparing the 2019 data with the 2018 data shows that Python usage rose 4 percentage points (from 83% in 2018), while SQL usage stayed the same (44%). R usage, on the other hand, fell 5 percentage points from the previous year (36% in 2018) and 15 percentage points from two years earlier (46% in 2017).
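The year-over-year comparison is expressed in percentage points, which is not the same as relative change. The short Python sketch below reproduces the arithmetic using only the Python and R figures quoted in this article. The function names are illustrative and do not come from the Kaggle or Business Broadway analysis.

```python
# Usage shares (percent of respondents), as quoted in the article.
usage = {
    "Python": {2018: 83, 2019: 87},
    "R": {2017: 46, 2018: 36, 2019: 31},
}


def point_change(language, start, end):
    """Percentage-point change in usage share between two survey years."""
    return usage[language][end] - usage[language][start]


def relative_change(language, start, end):
    """Change expressed as a percent of the starting share."""
    base = usage[language][start]
    return 100 * (usage[language][end] - base) / base


comparisons = [("Python", 2018, 2019), ("R", 2018, 2019), ("R", 2017, 2019)]
for language, start, end in comparisons:
    points = point_change(language, start, end)
    relative = relative_change(language, start, end)
    print(f"{language} {start}->{end}: {points:+d} points, {relative:+.1f}% relative")
```

On these figures, R's 15-point drop since 2017 amounts to roughly a third of its earlier share, which is a steeper decline than the point figure alone suggests.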

In the same vein, eight in 10 data professionals (79%) would recommend the use of Python as the language to learn first. Only one in 10 (9%) would recommend going with R.

Based on the data, Python is by far the most popular language among data professionals, and its popularity is increasing. R, on the other hand, is seeing a sustained decline in use. Extrapolating from the data, Python is becoming the default language of choice for data science and machine learning.

Other findings

According to the Kaggle report, data scientists are currently mostly male, an ongoing imbalance. Over half of them are over thirty years old, and they are highly educated. Interestingly, more than half have fewer than five years of coding experience, and even less machine learning experience.

As many might have suspected, most data scientists work at either small or very large organizations, and there is a huge disparity in how much they are paid. Specifically, the report noted that salaries for data scientists in the United States “far exceed” those of other countries.

When it comes to deep learning frameworks, TensorFlow and Keras continue to be dominant, while usage of Google Cloud AutoML nearly doubled compared to the previous year. Simpler methods such as linear regression and decision trees are more heavily used than more complex techniques.

Finally, data scientists have adopted cloud computing in their work, though the Kaggle report stressed that they are not using it as a replacement for local developer environments. On that front, Amazon Web Services (AWS) and Google Cloud are the top public cloud platforms of choice.

Continuing education

Perhaps one of the most encouraging findings is the nascent nature of data science. Indeed, more than half of the surveyed companies are new to machine learning, which means it is still not too late for businesses to start exploring how data science can benefit their organizations.

Overall, the community is relatively young but highly educated, works at companies of all sizes, and is still trying to figure out the best way to adopt machine learning technologies. If there is one thing that stands out, it is the intense amount of learning inherent to the field of data science.

According to the report, most respondents continue to learn new data science skills even though many already hold advanced degrees. The cutting-edge and rapidly evolving nature of the field also makes formal education methods less practical: blogs, Kaggle forums, Coursera, and YouTube are the top avenues for ongoing education.

The Kaggle State of Data Science and Machine Learning report can be accessed here.

Photo credit: iStockphoto/tookitook