One of last year's biggest topics is about to get bigger in 2022: data mesh, a term coined by Zhamak Dehghani, a principal consultant at Thoughtworks.
I’m not going to get into the fundamentals and principles behind it. My colleague did an excellent job laying it out in this article.
In short, data mesh tackles large, complex, monolithic data infrastructures much the way microservices broke up monolithic enterprise software into manageable chunks.
In the case of large data architectures, a data mesh subdivides them into discrete data domains. Smaller, cross-functional teams can then manage and work on each domain more efficiently, with every team having intimate knowledge of its data sources, consumers, and functional issues. More importantly, the teams come to understand the context of the data.
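To make the idea concrete, here is a minimal sketch in Python of what a domain-owned data product might look like. The names (`DataProduct`, `owner_team`, `publish`, and so on) are illustrative assumptions, not a standard data mesh API.

```python
from dataclasses import dataclass

# Illustrative sketch only: these types are hypothetical, not a standard
# data mesh API. Each domain team owns and publishes its own data products.

@dataclass
class DataProduct:
    name: str          # e.g. "orders.daily_summary"
    domain: str        # the business domain that owns this product
    owner_team: str    # the cross-functional team accountable for it
    schema: dict       # column name -> type, the product's contract
    description: str   # the business context the owning team knows best

# A simple registry: each domain manages its own entries independently.
registry: dict[str, list[DataProduct]] = {}

def publish(product: DataProduct) -> None:
    """Register a data product under its owning domain."""
    registry.setdefault(product.domain, []).append(product)

publish(DataProduct(
    name="orders.daily_summary",
    domain="sales",
    owner_team="sales-data",
    schema={"order_date": "date", "region": "str", "revenue": "float"},
    description="Daily revenue by region, owned end-to-end by the sales domain.",
))
```

The point of the sketch is the ownership metadata: the schema is a contract consumers can rely on, and the description captures the context that only the owning domain team has.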
We’ve been down this road before
On paper, that sounds great. But in practice, decentralized data management has often created more problems than it solved. We all understand the complexity that distributed databases introduce. And while Hadoop helped process data where it lived, it added its own layer of complexity.
Then there are the non-data issues: communication between domains and teams, limited resources on shared infrastructure, and unwieldy data flows, all of which make data meshes complex to manage.
The greatest challenge lies in mindset. As noted in earlier CDOTrends articles, data mesh treats data as a product. That is not a switch companies can flip overnight after decades of looking at data very differently. Moving from lumping data into a centralized data warehouse, an idea dating to the 1980s, to treating it as a set of products will be a mental leap for many.
The secret weapon in DataOps
All this is about to change. Part of the reason is the rising maturity of DataOps teams, which dovetails nicely with the data mesh approach.
DataOps already addresses the issues of working with decentralized organizational structures. And as this article notes, the DataOps stack is also maturing, allowing teams to build end-to-end pipelines. As the author, Eran Strod, wrote: “Data mesh encourages autonomy, while DataOps handles global orchestration, shared infrastructure, inter-domain dependencies and enables policy enforcement.”
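Strod’s split between domain autonomy and global orchestration can be sketched in a few lines of plain Python. The registration decorator and policy check below are assumptions for illustration, not any particular DataOps tool’s API.

```python
from typing import Callable

# Hypothetical sketch: each domain registers its own pipeline (autonomy),
# while a thin global layer enforces shared policy before anything runs
# (the DataOps side: orchestration and policy enforcement).

pipelines: dict[str, Callable[[], None]] = {}

def register(domain: str):
    """Decorator letting each domain team register its pipeline independently."""
    def wrap(fn: Callable[[], None]) -> Callable[[], None]:
        pipelines[domain] = fn
        return fn
    return wrap

def passes_policy(domain: str) -> bool:
    """Stand-in for a shared governance check, e.g. PII tagging or schema validation."""
    return True  # assume the check passes in this sketch

@register("sales")
def sales_pipeline() -> None:
    print("sales: build orders.daily_summary")

@register("marketing")
def marketing_pipeline() -> None:
    print("marketing: build campaigns.attribution")

# Global orchestration: run every domain's pipeline, but only under shared policy.
for domain, run in pipelines.items():
    if passes_policy(domain):
        run()
```

Each domain can change its own pipeline without touching the others; the shared loop is the only place global rules live.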
Companies like VistaPrint and Zalando are implementing data meshes. In this interview, Sebastian Klapdor, VistaPrint’s executive vice president and chief data officer, said that the cross-functional teams moved fast “without being dependent on other teams.” He also noted that the teams could build better data products because they understood the data context.
Most importantly, DataOps teams and data engineers already see their data as products. That product mindset is how they successfully adopted DevOps practices in the first place. So no mental leap is required.
The lure of democratization
As this McKinsey article notes, the data mesh is not a cure-all. Some use cases will still need centralized domain definitions with central storage. And global data governance standards are needed to make data assets interoperable.
Where data meshes will really play to their strengths is in data democratization. While past forays into this concept focused on offering data access to all users in a company, data mesh goes further by providing a framework for democratizing data management.
This is a powerful solution in a market where data talent is in short supply and the appetite for data-driven insights keeps growing with no end in sight.
Winston Thomas is the editor-in-chief of CDOTrends and DigitalWorkforceTrends. He’s a singularity believer, a blockchain enthusiast, and believes we already live in a metaverse. You can reach him at [email protected].
Image credit: iStockphoto/carloscastilla