Developing Data Competency: Train or Hire, Build or Buy?

Around the world, organizations are rushing to develop data competency to establish supremacy in an increasingly competitive, data-driven world.

As businesses set up new analytics or data science departments, the question of whether to hire new data scientists or train existing employees in data skills often arises.

Should businesses train or hire?

In a blog post last week, Allison Buenemann, an industry principal at Seeq Corp, observed that there are advantages to hiring the best and brightest data scientists, such as immediate competency in the form of a strong data science foundation and data wrangling chops.

There are disadvantages, too. Buenemann argues that hiring might be far costlier than organizations expect, for two reasons. First, a hypothetical energy company A doesn’t just have to outbid energy company B but must also compete with other large, well-resourced organizations such as tech unicorns, investment banks, and consulting firms.

Moreover, it is not reasonable to expect even the best data scientists to be productive on day one, if only because it takes time to assimilate into the company. They also need to be brought up to speed by subject matter experts and business leads, who must set aside time for the new hires.

Her recommendation: Adopt a hybrid approach of hiring data scientists and building citizen data scientists through training, so that insiders get expert support while data scientists don’t have to start from scratch.

Speaking from the viewpoint of a process manufacturer, she wrote: “Newly hired data scientists will be most effective when they work in collaboration with site manufacturing engineers and domain experts.”

Don’t forget to equip them

The next major consideration is which tools to deploy, another common conundrum for organizations starting on their data science journey.

While most would associate data science and AI with powerful computing systems, the truth is that typical business-centric AI models rarely require the next-gen hardware that we cover from time to time. What matters more is the data stack.

According to “The State of Data Science 2020” report by Anaconda, respondents spend 45 percent of their time getting data ready (19 percent on data loading and 26 percent on data cleansing), 21 percent on data visualization, and 23 percent on model selection, training, and scoring. The last 11 percent of their time is spent deploying AI models.
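To make the breakdown easier to see at a glance, here is a minimal Python sketch that tallies the figures quoted above. The variable names are my own and purely illustrative; only the percentages come from the report as cited here.

```python
# Time shares as quoted above from Anaconda's "The State of Data Science 2020" report.
# The dictionary and variable names are illustrative, not taken from the report.
time_share = {
    "data loading": 19,
    "data cleansing": 26,
    "data visualization": 21,
    "model selection, training, and scoring": 23,
    "deploying models": 11,
}

# Data preparation is the sum of the loading and cleansing shares.
data_prep = time_share["data loading"] + time_share["data cleansing"]

# Sanity check that the quoted shares account for all of the time.
total = sum(time_share.values())

print(f"Data preparation: {data_prep} percent")
print(f"Total accounted for: {total} percent")
```

On these figures, getting data ready takes up roughly twice the time of either visualization or modeling, which is why the data stack deserves so much attention.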

Having the requisite software tools in place to bridge data silos, simplify data wrangling, and perform data visualization can hence speed up data science initiatives tremendously.

Of course, a suitable software stack must be assembled from the scores of open source libraries and tools available for everything from data analysis and visualization to machine learning. There are also commercial offerings from the likes of Tableau, Qlik, and Tibco, among others.

Hiring practices will probably need to be aligned with the chosen data stack. But unless there are specific reasons why experience with a particular tool or language is required, I’d argue it’s better to hire for data manipulation skills and the ability to analyze data.

Building a movement

Ultimately, organization-wide data competency is a movement that will only succeed when the power of data is made available to the broadest cross-section of the company. And the easiest way to get there is through the introduction of self-service analytics.

“Transforming an existing workforce into an army of citizen data scientists is best achieved through technology investment in self-service analytics software offering different experiences catering to the different user personas,” wrote Buenemann.

Humans being humans, though, just having self-service analytics available does not mean people will automatically gravitate towards them.

As I wrote in “3 Steps to Train More Citizen Data Scientists”, businesses should start by identifying champions who can lead the way as citizen data scientists and center their efforts on strategic projects that leverage the power of data.

While having more data is generally better than having less data, be careful not to go overboard with data collection. As we argued before, data for data’s sake is completely pointless, and can even be harmful when erroneous inferences are drawn out of thin air from random data. So, focus on quality data that aligns with the objectives of the organization.
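To illustrate how inferences can appear out of thin air, here is a small Python sketch under my own assumptions: it generates one random "target" series and 1,000 unrelated random "metrics" of 10 points each, then reports the strongest correlation it finds. The function names, the series counts, and the seed are all hypothetical choices made for this example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_spurious_correlation(n_metrics=1000, n_points=10, seed=0):
    """Return the strongest correlation between a random target and random metrics.

    Every series is independent noise, so any correlation found is pure chance.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_metrics):
        metric = [rng.gauss(0, 1) for _ in range(n_points)]
        r = pearson(target, metric)
        if abs(r) > abs(best):
            best = r
    return best


best_r = strongest_spurious_correlation()
print(f"Strongest correlation among unrelated metrics: {best_r:+.2f}")
```

With this many unrelated series and so few data points, at least one will almost always look strongly correlated with the target even though nothing connects them. Collecting more columns without a question in mind only makes such coincidences easier to find.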

Both data science and AI are here to stay, which means the time to develop data competency is now.

Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose. You can reach him at [email protected].

Image credit: iStockphoto/Koh Sze Kiat