Companies are waking up to data architecture being a companywide matter, not just an issue for the chief data officer. But as with anything in IT, this awakening has led to an explosion of hype, new concepts, acronyms, loose definitions, and confusion.
Data leaders wading through this growing alphabet soup of new IT vocabulary will find that two terms have emerged with a polarizing effect: data mesh and data fabric. Each has its merits and its reasons why companies are taking a hard look.
While both offer options for companies trying to tame the unruly mess left by data fragmentation and unlock more value from their data sets, they also put data governance front and center in their data strategies.
Reframing the data headache
Data fabric is essentially a design concept and technology architecture. Forrester analyst Noel Yuhanna was among the first to define the term in the mid-2000s.
“Data fabric is a design concept and technology architecture geared toward addressing the complexity of data management to operate in any hybrid or fragmented data ecosystem. It emphasizes data automation, a robust metadata foundation, and a flexible technology backbone,” says Jon Teo, data governance domain expert at Informatica.
The primary value of a data fabric for a disparate data ecosystem is that it unifies its representation, management, and data access without having to consolidate the data assets physically. According to Teo, the “beauty” of having a flexible, unified platform is that new or changed data can be easily added or onboarded using a low or no code approach. Essentially, it allows data stewards, engineers, analysts, and scientists to create better data pipelines and oversee them.
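To make the idea concrete, here is a minimal, purely illustrative sketch (not any vendor's product) of what "unified access without physical consolidation" means. Two in-memory dictionaries stand in for an on-premises database and a cloud data lake; the names `Fabric`, `onprem_db`, and `cloud_lake` are invented for this example.

```python
# Illustrative sketch only: a data fabric presents one access layer
# over data sources that stay where they are. Two in-memory dicts
# stand in for an on-premises database and a cloud data lake.

onprem_db = {"customers": [{"id": 1, "name": "Ada"}]}
cloud_lake = {"orders": [{"id": 10, "customer_id": 1}]}

class Fabric:
    """Unified access layer: callers never need to know which system
    physically holds a dataset."""

    def __init__(self, *sources):
        self._sources = sources

    def read(self, dataset):
        # Route the request to whichever source holds the dataset.
        for source in self._sources:
            if dataset in source:
                return source[dataset]
        raise KeyError(dataset)

fabric = Fabric(onprem_db, cloud_lake)
print(fabric.read("orders"))  # served from the cloud store, transparently
```

A real fabric adds the metadata foundation, automation, and governance tooling on top of this routing idea; the point here is only that consumers see one interface while the data assets remain in place.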
While the benefits are clear, putting a data fabric into practice has been a challenge. In the past, assembling one with a full suite of data management capabilities for both on-premises and cloud ecosystems required tools from multiple vendors, saddling you with the additional cost and associated risks of stitching them together.
Now, companies like Informatica offer all the pillars needed to establish a data fabric architecture built on AI-powered metadata across end-to-end data management capabilities. They range from data integration to data governance and from data mastering to self-service analytics.
Data mesh takes a different tack to address data fragmentation within the enterprise. Instead of an overarching technology management layer, it treats fragmentation as a human question, creating distributed teams that manage their data domains as they see fit.
“Data mesh focuses on organizational change – enabling domain teams to own the delivery of data products with the understanding that the domain teams are closer to their data and thus understand their data better,” says Teo.
The idea is that these localized data “product teams” (composed of roles such as business data stewards, analysts, and engineers) will know their domain data best. Combining their skills and domain knowledge helps companies quickly gain insights into complex questions and scale analytics capabilities across the organization.
Data fabric and data mesh are catching on because of growing frustration with managing data warehouses and data lakes. Data teams used the former to store structured data for SQL analytics, while the latter held unstructured data for machine learning modeling.
“Data meshes are a response to the evolution of data management centralization that we’ve seen for the past few years and helps to address issues of agility and scaling,” says Teo.
Where these two concepts differ is in how APIs are addressed. While data meshes require data teams to code (or adopt) APIs, data fabric takes a no-code or low-code approach where the APIs are part of the architecture, with automation playing a significant role.
Teo adds that both concepts use APIs, endpoint integration, and other connectivity options extensively “under the hood” to provide seamless data access.
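As a hypothetical sketch of the mesh side of this distinction, the snippet below shows a domain team hand-coding the access API for its data product; in a fabric, an equivalent endpoint would typically be generated from metadata instead of written by hand. All names here (`DataProduct`, `orders_fetch`, the `orders.daily` product) are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: in a data mesh, the domain team codes (or adopts)
# the API for its own data product and publishes it to consumers.

@dataclass
class DataProduct:
    name: str
    owner: str                 # accountable domain team
    schema: dict               # published output schema
    fetch: Callable[[], list]  # the access API consumers call

def orders_fetch() -> list:
    # Stand-in for a real query against the domain's order store.
    return [{"order_id": 1, "meals": 3}, {"order_id": 2, "meals": 5}]

orders = DataProduct(
    name="orders.daily",
    owner="logistics-domain",
    schema={"order_id": "int", "meals": "int"},
    fetch=orders_fetch,
)

print(orders.fetch())
```

The design choice worth noticing is that the product bundles an owner and a schema with the access function, so consumers get a contract, not just raw data.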
Whichever data architecture concept a company chooses, it will need data governance to make it work; governance is the one gaping hole in both. And for Teo, the distributed nature of the physical data assets in both architectures makes it even more urgent that companies get their data governance right before starting their journeys.
Bite-sized data governance success
HelloFresh offers an excellent example of what data governance can achieve when adopting new data architecture concepts.
The ready-meal kit delivery company, which operates in different countries and has over 11,000 employees, wanted to decentralize data ownership to democratize analytics and data science across the company.
So in 2019, it took its first steps toward adopting a data mesh, making domain owners and data producers responsible for the quality of their data and treating data assets as products delivered to the rest of the company.
But before embarking on this data quest, the global data governance lead worked with senior data leaders to establish new standards for data governance. This included finding the right tooling to catalog metadata and track data lineage.
“Those capabilities were essential for maintaining high-quality data and making it easy for all HelloFresh employees to find it,” says Teo.
The proof of the pudding came the very next year, in 2020, when COVID-19 dramatically changed the way people worked and lived. HelloFresh was drowning in a deluge of meal box orders and knew it had to make faster decisions to meet the new demand or lose the business opportunity and tarnish its reputation.
Essentially, with data mesh, the company accurately forecasted customer demand and the number of ingredients needed for its meal kits. That allowed HelloFresh to fulfill the surge in orders while fine-tuning its logistics to balance food capacity in its warehouses and ensure that meal kits were fresh when customers received them.
“These solutions helped the organization quickly pivot its analytics capabilities to navigate the shift in order and customer patterns. HelloFresh delivered over 600 million meals in 2020 and more than doubled its year-on-year revenue,” Teo observes.
Why start with data governance?
When discussing data governance, we often look for one significant benefit: visibility. The proper metadata approach and automated platforms allow data governance leaders to gain detailed visibility of data assets and their lifecycles, along with data ownership, stewardship practices, and the organizational control principles that must be enforced consistently. Properly executed, it also allows them to act quickly where data gaps are found.
Any data scientist, steward, or engineer can tell you that data transparency and easy discoverability are often difficult to achieve, especially in a sprawling organization that works with fragmented data sources and repositories. Both data fabric and data mesh specify a foundational need for “active metadata management,” adding discoverability and addressability of data products across different domains.
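What “discoverability and addressability” mean in practice can be sketched with a toy metadata catalog. This is illustrative only, not any vendor's API: assets are registered with an address (how consumers locate them) and tags (how consumers find them); the names `MetadataCatalog`, `register`, and `discover` are invented for this example.

```python
# Illustrative sketch of "active metadata": a tiny catalog that makes
# data assets discoverable (search by tag) and addressable (a stable
# address consumers can use), wherever the data physically lives.

class MetadataCatalog:
    def __init__(self):
        self._assets = {}

    def register(self, address, domain, tags):
        # The address is how consumers locate the asset;
        # the tags drive discovery across domains.
        self._assets[address] = {"domain": domain, "tags": set(tags)}

    def discover(self, tag):
        # Return the addresses of every asset carrying the tag.
        return sorted(a for a, m in self._assets.items()
                      if tag in m["tags"])

catalog = MetadataCatalog()
catalog.register("s3://lake/orders/daily", "logistics", ["orders", "daily"])
catalog.register("wh://finance/revenue", "finance", ["revenue"])
print(catalog.discover("orders"))  # → ['s3://lake/orders/daily']
```

Real metadata platforms add lineage, quality scores, and automated scanning on top, but the register-then-discover loop is the core of the visibility both architectures depend on.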
“From a data fabric perspective, data governance relies a lot on having a consistent enterprise metadata ‘map’ of the data universe that drives a lot of visibility and consistency of action,” explains Teo.
Data meshes rely on consistent data governance in a different way, one that “is much more than a technology architecture or approach,” Teo explains. “Establishing a ‘Distributed Data Products’ model relies on successful organizational cooperation and change as well.”
Companies need to consider how different, distributed “domain custodians” are empowered with the right governance models. In addition, the companies need to ensure “consistent visibility of the data assets, self-serve data facilities and more,” Teo adds.
Data meshes add an additional requirement. “With data mesh, the central data governance focus is to establish standards that allow flexibility and interoperability, instead of mainly focusing on policy control objectives,” explains Teo.
Finding the starting point
So how can companies get started with data governance and embrace these new data architecture concepts?
Well, for starters, a data mesh should not be seen as replacing a data lake or data lakehouse, says Teo. “For example, data warehouses and data lakes can still exist in the mesh architecture, but they become just another node in the mesh, rather than a centralized monolith.”
Instead, companies need to prioritize strategic governance factors and leverage their data management organization structures and operating model, “perhaps adapted to the type of interactions envisioned by a data mesh framework,” adds Teo.
This means empowering the appropriate data owners and custodians, and establishing the technical and policy standards that allow data products to be developed, discovered, interchanged, and used reliably.
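One way such standards can work in practice, sketched under assumptions of my own (the descriptor fields and the `meets_standard` check are invented for illustration): central governance defines a small set of mandatory fields every data product must declare, while domain teams stay free to implement the product however they wish.

```python
# Hedged sketch: central governance publishes an interoperability
# standard (required descriptor fields); each domain checks its data
# product descriptors against it before publishing.

REQUIRED_FIELDS = {"name", "owner", "schema", "steward_contact"}

def meets_standard(product: dict) -> bool:
    """True if the product descriptor declares every required field."""
    return REQUIRED_FIELDS.issubset(product)

good = {"name": "orders.daily", "owner": "logistics",
        "schema": {"order_id": "int"}, "steward_contact": "stewards@logistics"}
bad = {"name": "revenue"}  # missing owner, schema, and steward contact

print(meets_standard(good), meets_standard(bad))  # True False
```

This mirrors Teo's point that mesh governance centers on standards for flexibility and interoperability rather than on detailed policy control of each domain.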
Teo observes that some organizations in the region are beginning to explore these architectural patterns and concepts, given that data productization can unlock greater agility and competitiveness.
“The idea of the data catalog and ‘Enterprise Metadata’ has really taken hold as the foundational element to allow management of the disparate or hybrid IT ecosystems most organizations have,” says Teo.
“A common, enriched view of data assets in different data stores, structures, and data environments, whether on-premises or in the cloud, will allow for common data management practices and unified data consumption and usage experiences,” he adds.
With a proper data governance framework, companies can start solving complex data problems and confidently choose the right data architecture to scale their models quickly. It becomes especially vital as companywide AI plays a more significant role in helping Asia Pacific businesses navigate a challenging market environment.