To produce a great piece of music, orchestras require a variety of independent, skilled musicians working together under an equally skilled conductor. With all the innovations in their instrumentation and the talent of composers, success for an orchestra comes down to humans working together perfectly in the moment, ideally making it all look effortless.
And so it is with maximizing data as an enterprise asset, the end game for DataOps.
DataOps, also known as modern data engineering, is the industrialization of rapid data delivery and improvement in the enterprise (just as DevOps is for software development).
Emerging DataOps infrastructures will include a high-performing, integrated “data supply chain” of data suppliers, data preparers, and data consumers. The three Vs of big enterprise data — volume, variety, and velocity — and the people who need that data both demand it.
This is why Tamr, in evolving its solution for automating DataOps, pays as much attention to the people involved in DataOps as to technologies and processes. You can have the best tools and mindset, but it won’t work at scale if you don’t have the right skill set for DataOps.
At Tamr, we’ve identified eight DataOps personas essential to delivering data as an enterprise asset — the key kinds of user roles powering modern enterprise data ecosystems. (More on these below.) You may not recognize some of these titles, but you may recognize people in your organization who are already embracing the job — or ready to step up.
Towards a Functional, Integrated Data Supply Chain
For effective DataOps, everything starts and ends with the data consumer because that’s who ultimately turns data into business value. Data consumers include an exploding generation of data citizens who bring digital-native-style data fluency, expectations shaped by early self-service data preparation, and increasing analytic skills to their jobs.
Large companies’ emerging data ecosystems include a growing collection of data suppliers: source owners, from business/IT architects responsible for data residing in SAP and SF.com applications to DBAs responsible for Oracle and mainframe-transaction databases.
Last — but certainly not least — is a new generation of data preparers, who provide the critical link between data suppliers and data consumers. Data preparers include data engineers and ELT professionals empowered with DataOps’ Agile approach and interoperable, best-of-breed tools. There are also two newer titles, data steward and data curator. Combined, these people lead the delivery and support of high-quality, mission-ready data at scale, resulting in superior operating efficiencies and transformational analytics.
Newly appointed chief data officers (CDOs), who are ultimately responsible for data assetization, have a significant task in front of them. Like chief information officers in the 1980s, CDOs need to execute on technical fronts while also overcoming the unique challenges that are human at the core. These include people’s hopes, fears, and idiosyncrasies when dealing with “their” data.
As in our orchestra example: these diversely skilled, independently tasked, and individually motivated people must collaborate under a skilled conductor (modern data engineering/DataOps and the CDO) to deliver clean, unified, and mission-ready data to the business at scale. Ideally effortlessly — or at least without the high cost, complexity, and risk of current methods like traditional MDM.
The Eight Personas of DataOps
Each persona is defined by a collection of similar roles, goals, attitudes, and jobs to be done. Only by empowering each of the DataOps personas thoughtfully can modern data engineering/DataOps automation maximize data as an enterprise asset.
Here is how we’ve helped Tamr customers build out their DataOps infrastructures with the right empowered personas.
Data consumers
The data citizen
“What if I could get the data I need for my job when, where, and in the format I need it, every time?”
Data citizens are the people on the front lines of the business who make critical business decisions every day, using common business apps and data visualization tools. Their ultimate bosses are CMOs, CFOs, CPOs, and other “C’s.” They need accurate, timely, and “mission-ready” data delivered right into their tools from one source. For example, they need to be able to go to one place for all their customer data without knowing the technical nuances of getting there.
The data analyst
“What if I could get better, cleaner data to deliver better insights to my CXO — and faster?”
Data analysts deliver critical insights to the business, typically through dashboards and reports built in Power BI or other tools. Their bosses are CMOs, CFOs, and other decision-makers in the business. They know the analytical needs of the business but need accurate, clean, and up-to-date data from multiple sources, as well as the knowledge of where to find it. While they are more data-internals-savvy than the average data citizen, poor data wastes their time and can delay time-critical insights for the business.
The data scientist
“What if I could get my models and algorithms deployed and operating faster?”
Data scientists are the rocket scientists of the data supply chain. Usually crazy-smart about applying data to solve nuanced business problems, they build the models and algorithms for predictive analytics and other sources of competitive advantage. Their bosses are CMOs, CFOs, and other high-level decision-makers in the business. They need accurate, clean, and up-to-date data from multiple sources, as well as knowledge of where to find it and who owns each source. When “bad” data finds its way into models and algorithms, it can have a domino effect on the business. Data scientists are expensive to recruit and hire, yet many still spend 70-80% of their time on data preparation tasks before they can get to the science part of their jobs. They also need real-time insight into the data behind their models to iterate and defend them (an audit trail for how models make decisions).
The data developer
“What if I could get my apps to market faster, without reinventing the wheel or bugging DBAs?”
Data developers build business applications from corporate data. Examples are customer 360 portals or supplier management applications. Their bosses are CMOs, CFOs, CPOs, and other “C’s.” They need accurate, clean, and up-to-date data from multiple sources, including from outside the business, and knowledge of where to find data sources and their owners. While they are empowered by advanced, open development tools like Python and REST APIs, the more time they have to spend worrying about the accuracy of corporate data sources, the less time they can spend on inventively weaving data into compelling business applications.
While data consumers don’t directly use Tamr Unify, they benefit from its bi-directional feedback channel with data preparers and data source owners. For example, a data citizen can hit a context-sensitive Tamr hot button in her Tableau data viz app to query a data steward or curator about a piece of data. Data consumers can also funnel requirements and provide feedback to data preparers.
Data consumers also need Golden Records, a “single version of truth” that represents the best data available about an entity so users can be sure they have the correct and complete version of master data for an entity.
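To make the idea of a Golden Record concrete, here is a minimal sketch in Python. It is an illustration only, not how Tamr Unify builds Golden Records: the `build_golden_record` function, the sample customer records, and the "most common non-empty value wins" rule are all assumptions made for this example.

```python
from collections import Counter

def build_golden_record(records):
    """Merge records describing one entity into a single record.

    Hypothetical rule for illustration: for each field, keep the most
    common non-empty value seen across the source records.
    """
    # Collect field names in the order they first appear.
    fields = []
    for record in records:
        for field in record:
            if field not in fields:
                fields.append(field)

    golden = {}
    for field in fields:
        # Ignore missing or empty values when voting.
        values = [r[field] for r in records if r.get(field) not in (None, "")]
        if values:
            golden[field] = Counter(values).most_common(1)[0][0]
    return golden

# Three hypothetical source records for the same customer.
records = [
    {"name": "Acme Corp", "city": "Boston", "phone": ""},
    {"name": "Acme Corp", "city": "Boston", "phone": "617-555-0100"},
    {"name": "ACME Corporation", "city": None, "phone": "617-555-0100"},
]

golden = build_golden_record(records)
print(golden)
```

A real system would first have to decide which records refer to the same entity and would likely weigh sources by trustworthiness; the simple majority vote here only shows the shape of the output a data consumer would see.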
Data preparers
The data engineer
“What if I could simplify and speed how I build data pipelines to better serve consumers?”
Data engineers, a long-established role in IT, are the heart and soul of DataOps, but their ultimate boss today is the CDO. Simply put, data engineers turn raw data into consumables, building complex pipelines that automate the delivery of unified data. Tamr Unify enables data engineers to get beyond the constraints of their traditional tools (ETL, SQL) and Waterfall approaches to preparing data. Instead, they enjoy Agile data-preparation methodologies, the ability to work with interoperable/best-of-breed tools, and advanced automation when building data pipelines. Our AI-driven, human-guided system accelerates the transformation of raw data from sources into business-ready data for data consumers in an iterative, repeatable, and scalable fashion. Data engineers work closely with two new sets of colleagues: data curators (to understand data consumer needs) and data stewards (to integrate what they’ve learned about how data consumers use data to continually improve the data).
The data curator
“What if I could serve my customers faster and more accurately?”
Data curators are a new/emerging role operating under DataOps and the CDO. Curators couple an in-depth knowledge of how data consumers use the data (context) with the technical chops to fine-tune it (content), ensuring that data consumers get the data in the most actionable forms. Consumers can go to one authoritative person and source for each core entity in their business (e.g., customers, suppliers, products). They can stop creating their own versions of the truth for individual projects because there’s always a better, curated source for them to use, one that has benefited from contributions outside of their own work. Curators play a role in data governance and ensure that data distribution is aligned to corporate governance standards. Look for more data curators as the enterprise data franchise becomes bigger, better, and more strategic to the business.
The data steward
“What if I could track and improve critical data sources more easily and securely?”
Data stewards are another evolving role operating under DataOps and the CDO. Stewards couple an understanding of data sources with knowledge of their operational use to create policies, ensure proper usage (governance), and continually improve the data and its usefulness. Stewards leverage new tools to gather feedback from consumers as they work with the curated data and incorporate it to drive data source remediation and governance policy updates. Through systematic collection and management of data feedback, stewards can focus their efforts on the most popular (or lowest-quality) data sources. Stewards may align to individual consumer roles: for example, highly technical stewards specializing in collaboration with data scientists have their own special requirements and contributions regarding data quality. Again, look for more data stewards, particularly as the enterprise gets more ambitious in creating/opening up critical data sources to more uses, users, and applications.
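As a rough illustration of how collected feedback could point a steward at the most popular or lowest-quality sources, consider the sketch below. Everything in it is hypothetical: the `SourceFeedback` fields, the sample numbers, and the "issues per query" quality measure are assumptions for this example, not a description of Tamr Steward.

```python
from dataclasses import dataclass

@dataclass
class SourceFeedback:
    source: str
    queries: int  # how often consumers used the source (popularity)
    issues: int   # consumer-reported data issues for the source

def prioritize(sources, top_n=2):
    """Return the names of the most-used and the most problem-prone sources."""
    most_popular = sorted(sources, key=lambda s: s.queries, reverse=True)
    # Issues per query as a crude quality signal; max() avoids dividing by zero.
    lowest_quality = sorted(
        sources, key=lambda s: s.issues / max(s.queries, 1), reverse=True
    )
    return {
        "most_popular": [s.source for s in most_popular[:top_n]],
        "lowest_quality": [s.source for s in lowest_quality[:top_n]],
    }

# Hypothetical feedback gathered from data consumers.
feedback = [
    SourceFeedback("crm", queries=900, issues=18),
    SourceFeedback("erp", queries=400, issues=40),
    SourceFeedback("vendor_feed", queries=50, issues=20),
    SourceFeedback("legacy_db", queries=10, issues=0),
]

priorities = prioritize(feedback)
print(priorities)
```

In this made-up data, the rarely used vendor feed has the highest issue rate while the CRM system is the most popular, so a steward would see both ends of the trade-off at a glance.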
Both data curators and data stewards directly benefit from Tamr Unify. For example, both use Unify’s expert sourcing in understanding consumers’ uses of data and fine-tuning data delivery. A new addition to Tamr Unify, Tamr Steward, provides data stewards with an issue tracking and feedback system built for analytics — the first of its kind.
Data suppliers
The data source owner
“What if I could give the business the maximum value from my data while protecting its integrity?”
Data source owners own the data. Their ultimate boss is the CIO, and they generally have IT skills, often at the enterprise-application-architect level. They bought, operate/maintain, and own major operational applications (e.g., ERP, CRM, SF.com), business databases (e.g., Oracle, CICS), and relationships with third-party data vendors. They also need visibility into their data and who is using it.
While data source owners don’t directly use Tamr Unify, they benefit from its bi-directional communications channel with data preparers. In this way, they can receive queries, requirements, and timely feedback from data consumers. They also benefit from the visibility of their data and its use provided by Tamr Unify.
When you put it all together, this is how Tamr Unify helps these personas work together (see below): an efficient, “closed-loop” system that maximizes the delivery of high-quality enterprise data at scale.
Activating the power of people
If you’re moving to DataOps, you may already have people with these persona titles — or, if not the title, the responsibilities or aspirations to it. The following chart can help you identify your essential personas.
Tamr can help you capture the power of people in designing Agile, repeatable, and scalable DataOps, including avoiding “persona creep.”
Andy Palmer, chief executive officer and co-founder of Tamr, wrote this article. The original article is here.
The views and opinions expressed in this article are those of the author and do not necessarily reflect those of CDOTrends. Image credit: iStockphoto/cyano66; Graphics and tables: Tamr