Companies have been talking about data governance for over a decade, as seen in this article.
Today, every data user, from the data scientist to the decision-maker acting on the data science models' insights, understands the value of data governance. Leading companies that jumped on the bandwagon early have also shown an unfair advantage over their followers in building more accurate data models and making better data-driven decisions quickly.
However, all this enthusiasm does not change the fact that data governance is complex. Raw data can be filled with errors and garbage bits and bytes; weeding them out and validating the data sources are gargantuan tasks for many DataOps teams, which are often staffed by only a handful of data engineers.
It’s only going to get worse. As IoT sensors come online, they’ll pump more data (often not standardized) into growing data lakes, threatening to turn them into data swamps very quickly. And it is the DataOps teams’ job to build the proper pipelines from these varied sources.
That’s only half of their job. Today’s data engineers are also involved in automating and scaling data science models across the organization, which makes the traditional plumber analogy inaccurate. The nuances of creating a suitable model and ensuring that no model drift occurs add further complexity to data governance.
The DataOps world is also exploding with new tools and platforms. Many data engineers work with an array of tools, which makes managing pipelines even more complex, although logical data structures and data fabric platforms are helping to restore some sanity. This tool sprawl also raises questions about who owns the different parts of the pipeline.
Meanwhile, stricter data regulations and new data privacy laws make data governance even harder. Regulations like GDPR and CCPA add new complexities to managing data and data sources. In some jurisdictions, data generated by AI models can be viewed as personal information, creating new challenges.
Deploying a data governance framework while addressing these challenges is not easy. And very few companies are willing to press the reset button and start data governance afresh.
This is where DataGovOps comes in. It offers a simple proposition: what if we codify data governance as part of the DataOps workflows?
What is DataGovOps trying to automate?
To understand why DataGovOps is a hot trend in data science today, we need to understand what data governance does.
At its simplest, data governance involves building a business glossary or data catalog, understanding data lineage (which is becoming crucial for financial services companies), creating data quality definitions, policing data security, and ensuring defined roles and responsibilities.
DataGovOps gives these responsibilities a modern makeover. For example, it codifies data catalogs. It goes beyond data lineage by also looking at process lineage. It also promotes automated data testing and, in turn, is creating a new category of self-service sandboxes.
The common theme that links all these benefits is automation. By adding “Gov” between “Data” and “Ops,” DataGovOps seeks to codify governance and automate it.
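To make “codifying data catalogs” and “automated data testing” more concrete, here is a minimal Python sketch. It assumes a hypothetical `CatalogEntry` record kept in version control next to the pipeline code, with per-column quality rules that a pipeline step checks automatically. The names `CatalogEntry`, `run_quality_tests`, and the `sales.orders` dataset are illustrative only and do not come from any particular DataGovOps tool.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CatalogEntry:
    """Hypothetical catalog entry stored as code alongside the pipeline."""
    dataset: str
    owner: str
    description: str
    # Column name -> quality rule (a predicate every value must satisfy).
    quality_rules: dict[str, Callable[[object], bool]] = field(default_factory=dict)


def run_quality_tests(entry: CatalogEntry, rows: list[dict]) -> list[str]:
    """Return human-readable failures; an empty list means the data passed."""
    failures = []
    for i, row in enumerate(rows):
        for column, rule in entry.quality_rules.items():
            if column not in row:
                failures.append(f"row {i}: missing column '{column}'")
            elif not rule(row[column]):
                failures.append(f"row {i}: '{column}'={row[column]!r} failed rule")
    return failures


orders = CatalogEntry(
    dataset="sales.orders",
    owner="sales-data-team",
    description="One row per customer order.",
    quality_rules={
        "order_id": lambda v: isinstance(v, int) and v > 0,
        "amount": lambda v: isinstance(v, (int, float)) and v >= 0,
    },
)

sample = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": -5, "amount": 3.50},
    {"amount": 7.25},
]

for problem in run_quality_tests(orders, sample):
    print(problem)
# row 1: 'order_id'=-5 failed rule
# row 2: missing column 'order_id'
```

Because the rules live in the same repository as the pipeline, a change to a quality definition goes through the same review and deployment process as any other code change, rather than sitting in a separate governance document.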
In a Medium article “Data Governance as Code,” DataKitchen captures its essence well: “The orchestration that deploys the new data, new schema, model changes, and updated visualizations also deploys updates to the data catalog. The orchestrations that implement continuous deployment include DataGovOps governance updates into the change management process. All changes are deployed together. Nothing is forgotten or heaped upon an already-busy data analyst as extra work.”
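The idea in that quote, that catalog updates ship in the same deployment as schema changes, can be sketched as a deploy step that refuses to release a schema whose columns lack catalog definitions. This is a minimal Python illustration using assumed names (`ChangeSet`, `Environment`, `undocumented_columns`, `deploy`); real orchestration and catalog tools have their own interfaces, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass
class ChangeSet:
    """A hypothetical release bundle: schema and catalog docs travel together."""
    version: str
    schema: dict[str, str]        # column name -> data type
    catalog_docs: dict[str, str]  # column name -> business definition


@dataclass
class Environment:
    """A stand-in for a deployed environment such as production."""
    schema: dict[str, str]
    catalog_docs: dict[str, str]
    version: str = "initial"


def undocumented_columns(change: ChangeSet) -> list[str]:
    """Columns present in the schema but missing from the catalog."""
    return sorted(set(change.schema) - set(change.catalog_docs))


def deploy(change: ChangeSet, env: Environment) -> None:
    """Deploy schema and catalog updates as one unit, or not at all."""
    missing = undocumented_columns(change)
    if missing:
        # Governance gate: block the release instead of leaving docs for later.
        raise ValueError(f"{change.version}: no catalog entry for {missing}")
    # Schema and catalog are replaced in the same step so they cannot drift apart.
    env.schema = dict(change.schema)
    env.catalog_docs = dict(change.catalog_docs)
    env.version = change.version


prod = Environment(
    schema={"order_id": "int"},
    catalog_docs={"order_id": "Unique identifier for a customer order."},
)

good = ChangeSet(
    version="v2",
    schema={"order_id": "int", "region": "str"},
    catalog_docs={
        "order_id": "Unique identifier for a customer order.",
        "region": "Sales region code where the order was placed.",
    },
)
deploy(good, prod)
print(prod.version)  # v2

bad = ChangeSet(
    version="v3",
    schema={"order_id": "int", "region": "str", "discount": "float"},
    catalog_docs=dict(good.catalog_docs),  # "discount" was never documented
)
try:
    deploy(bad, prod)
except ValueError as err:
    print(err)  # v3: no catalog entry for ['discount']
print(prod.version)  # v2 (unchanged)
```

Because the check and the update happen in one function, a failed release leaves both the schema and its documentation at the previous version, which mirrors the “all changes are deployed together” property the quote describes.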
The required mindset shift
While the advantages of DataGovOps are clear, companies need to overcome two significant hurdles in deploying it, neither of which involves coding.
One is the pervasive use of the responsible, accountable, consulted, and informed (RACI) matrix for data governance. As described in this article, the matrix assumes that an individual will make all the decisions and that a single person is accountable for every decision.
In reality, decisions are made by several people. In an agile development environment, you need to empower the team, not a closed set of individuals. Besides, as companies are asked to include data ethics as part of data governance, it raises the “who watches the watcher?” question.
Such a “waterfall culture” also adds the very layers of bureaucracy that DataGovOps tries to cut through in the first place. It also limits how well data engineers can codify and automate data governance.
It’s no surprise, then, that in the data.world and DataKitchen whitepaper “Burned-out data engineers are calling for DataOps,” 69% of data engineers said their current governance programs only make their jobs more difficult.
DataGovOps also needs to focus less on compliance and outcomes and more on the end user, wrote Alation’s data governance and enablement specialist Aaron Bradshaw in his blog.
That is not an easy call for DataOps teams to make. In some regulated industries, compliance teams have more political capital. So it falls to senior management to understand the evolving role of data governance and stop seeing it as a necessary cost center for reducing risk.
Meet the DataGovOps ally
However, things are changing fast. New concepts and practices are helping to drive DataGovOps.
One is data mesh, a rising trend in its own right. It highlights the value in domain-driven ownership of data, data as a product, self-service data platform, and federated computational governance, as discussed in a ThoughtWorks webinar.
Data mesh does have its own set of challenges. But it aligns very well with the DataGovOps concept. And as companies see the value in embracing data meshes (and many vendors are now aligning their tools to this concept), it will empower DataGovOps.
Then overworked and stressed-out DataOps teams can at least breathe a little. They can start focusing more on building pipelines and automating models and worry less about hidden governance landmines blowing up in their faces.
Winston Thomas is the editor-in-chief of CDOTrends and DigitalWorkforceTrends. He’s a singularity believer, a blockchain enthusiast, and believes we already live in a metaverse.
Image credit: iStockphoto/jamesteohart