Demand for data scientists is higher than ever as businesses turn to the power of data and AI for everything from enhancing their customer experience to increasing the efficiency of operations.
Indeed, a 2021 survey found that data scientists in the United States earned a median salary of USD164,500 in 2020, up eight percent from USD152,500 in 2019.
That data scientists are increasingly hard to find and expensive to hire threatens to derail the data-driven future that organizations are working towards. But what if someone told you that the shortage isn’t as dire as it appears on the surface?
Dissecting the data talent shortage
In a contributed opinion piece on TechRepublic, Guillaume Moutier says that the real issue behind the data talent shortage isn’t what many of us may think.
Moutier is the senior principal data engineering architect at Red Hat Cloud, and he argues that the shortage does not stem from a dearth of data scientists who can perform data modeling, but from the difficulty of finding people versed in data management and manipulation.
“This is where the shortage lies: Data scientists who are nearly as skilled in software engineering as they are in data modeling. Enterprises need people who know how to productize their output so it can be used in real-world use cases, not just people who can build an effective model,” he explained.
In a nutshell, Moutier is saying we don’t have enough people to perform the required work around business analysis and data preparation, with the latter ranging from accessing data repositories to data wrangling. (Data wrangling is the process of transforming and mapping data from one format into another.)
So yes, while we might need rocket scientists to build a rocket, they are hardly the only experts needed. And within the data science field, the other roles in the data ecosystem greatly outnumber data scientists.
The making of a data engineer
Crucially, training these data professionals takes far fewer resources and less time than it takes to train full-fledged data scientists. And while it might take years for a data scientist to complete their education, followed by more years of work experience before they can make a difference, organizations can equip suitable professionals with data-centric skillsets relatively quickly.
A recent blog post by online learning platform DataCamp highlighted the wide range of technical skills that data engineers should have. Skills include database management, programming, and cloud computing, as well as familiarity with ETL (Extract, Transform, Load) frameworks for manipulating data and stream processing frameworks for working with real-time data.
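To make the ETL idea concrete, here is a minimal sketch of an extract, transform, and load step using only Python's standard library. It is an illustration, not a description of any particular framework: the sample data, the column names (customer_id, signup_date, monthly_spend), and the function names are all hypothetical, and the "load" step simply produces JSON where a real pipeline would write to a database or data warehouse.

```python
import csv
import io
import json

# Hypothetical raw export; in practice this would come from a file or database.
RAW_CSV = """customer_id,signup_date,monthly_spend
101,2021-03-04,49.90
102,2021-03-09,
103,2021-04-01,120.00
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with missing spend and convert field types."""
    cleaned = []
    for row in rows:
        if not row["monthly_spend"]:
            continue  # skip incomplete records
        cleaned.append({
            "customer_id": int(row["customer_id"]),
            "signup_month": row["signup_date"][:7],  # keep YYYY-MM only
            "monthly_spend": float(row["monthly_spend"]),
        })
    return cleaned


def load(records):
    """Load: serialize to JSON, standing in for a write to a warehouse."""
    return json.dumps(records, indent=2)


records = transform(extract(RAW_CSV))
output = load(records)
print(output)
```

Much of this is the kind of parsing, cleaning, and type conversion that a programmer or database administrator already does routinely, which is part of why the transition to data engineering can be relatively quick.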
Seen from this perspective, it is apparent that experienced database administrators or programmers already have a significant portion of the skills that data engineers require. It is no wonder that organizations are taking to training their existing employees to transition them to data science roles.
Closing the data science gap
What’s more, while data engineering roles likely make up the lion’s share of data roles, they are by no means the only ones. For instance, there is the data analyst tasked with visualizing and transforming the data, the data storyteller who finds the narrative that best expresses the data, and the business intelligence developer who provides the analytics and business insights, among many other specialized roles.
Suddenly, closing the data science gap looks a lot less frightening — and organizations run the risk of missing the forest for the trees if they focus solely on the data scientist and ignore the broader data ecosystem. Of course, there is no running away from having to draw up competitive salary and benefits packages to hire the data scientists that are needed.
To excel in data science, businesses must concentrate on building a team.
Ultimately, businesses are increasingly moving away from relying on gut feel or limited personal experience when making major decisions. They see machine learning and data-driven decisions as the way forward to improve the efficiency of their operations and increase profitability.
And as I previously observed, organizations that seek to succeed with data must first establish organization-wide competency with data by developing data-centric skills and bringing about data democratization. And they need to start today.
Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose. You can reach him at [email protected].
Image credit: iStockphoto/NickR