Why Dark Data Matters

Organizations today are increasingly turning to data and analytics, as they seek to establish a data-driven culture to leverage the vast amount of data at their disposal.

Yet the explosion of data is arguably outpacing the ability of businesses to use it, resulting in the proliferation of dark data.

What is dark data?

According to Gartner, dark data is the information assets that organizations collect, process, and store during regular business activities but generally fail to use for other purposes.

Dark data is typically generated as users interact with devices and services throughout the day and might include server log files, machine-generated data, or unstructured data such as social media feeds and customer communication. In a nutshell, dark data is the untapped and unexplored “big data” that the organization has yet to identify and use.

And dark data is more common than you might imagine. On average, 52% of a company’s data is “dark,” according to Veritas. For its part, Splunk pegs the figure at 55%, based on a survey of business and IT decision makers for its “The State of Dark Data” report.

The sheer volume of dark data can be partly attributed to the historically high cost of accessing and organizing disparate data formats. Fortunately, the onward march of technology means there are now more tools than ever to manage data, as discussed later.

Why does dark data matter?

In a recent blog entry, Matt Labovich of professional services firm PwC explained why, from a regulatory point of view, dark data isn’t something that should be swept under the carpet.

For instance, a typical utility might use a variety of sensors designed to detect gas leaks. Suppose a safety incident happens and it is later revealed that data from those sensors could have been used to predict or prevent it. This might well culminate in a costly lawsuit or a public relations nightmare.

Similarly, regulators would not be impressed by a bank attempting to deflect responsibility for fraud if it turns out that existing data might have revealed red flags and prevented it, argues Labovich. After all, financial institutions are expected to use all the tools (and data) at their disposal to detect unauthorized activities and prevent fraud.

Dark data matters beyond regulated industries, too. Specifically, it has the potential to create new revenue streams or enable more efficient processes that can lead to lower costs and greater business competitiveness. For example, dark data such as unsolicited feedback can serve as an additional window into the customer experience, while other dark data can help businesses understand machine operating capacities or perhaps even gauge employee morale and productivity.

Dark data can also uncover hidden correlations between pieces of information that were previously thought to be completely unrelated. Finally, because dark data can provide information that is not available in any other format, businesses can diversify their data analytics to offer more relevant and accurate insights.

Getting a handle on dark data

So how can businesses leverage technology to harness dark data? For a start, Databricks, with its data lakehouse architecture, makes it possible to hold data in its native, raw format while retaining the ability to generate insights from it. Informatica, meanwhile, offers both cloud services and on-premises solutions to analyze data wherever it resides.

For his part, Labovich recommends the use of AI tools to parse data such as weather patterns or sensor readings. Where such data is indecipherable to humans, AI can easily manage the volume to identify potential anomalies and uncover other hard-to-find insights, he notes.
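
To make the idea of spotting anomalies in sensor readings concrete, here is a minimal Python sketch. It is not the AI tooling Labovich describes. It swaps in a simple robust statistic, the modified z-score based on the median absolute deviation, as a stand-in. The function name `flag_anomalies`, the threshold of 3.5, and the sample readings are all assumptions made for illustration.

```python
from statistics import median


def flag_anomalies(readings, threshold=3.5):
    """Return the indices of readings whose modified z-score exceeds threshold."""
    med = median(readings)
    # Median absolute deviation (MAD): less distorted by outliers than the
    # standard deviation, which matters when outliers are what we look for.
    mad = median(abs(x - med) for x in readings)
    if mad == 0:
        # MAD is zero when more than half the readings are identical;
        # this sketch simply flags nothing in that case.
        return []
    return [
        i for i, x in enumerate(readings)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Hypothetical gas-sensor readings; the values are made up for illustration.
readings = [4.1, 3.9, 4.0, 4.2, 4.0, 19.5, 4.1, 3.8]
print(flag_anomalies(readings))  # prints [5], the index of the 19.5 reading
```

A production system would more likely learn what "normal" looks like per sensor and over time, but even a check this simple shows how data that is already being stored can be put to work.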

Separately, Ajay Bhatia, a vice president at Veritas, recommends in a contributed opinion piece using AI to identify and manage untagged and unstructured data by quickly scanning, tagging, and classifying it for use.
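
The scan, tag, and classify loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses hand-written keyword rules where Bhatia recommends AI, and the tag names, the patterns in `TAG_RULES`, the `tag_document` helper, and the sample notes are all hypothetical.

```python
import re

# Hypothetical tagging rules; a real system would more likely rely on a
# trained classifier than on hand-written patterns like these.
TAG_RULES = {
    "complaint": re.compile(r"\b(refund|broken|complain\w*|unhappy)\b", re.I),
    "billing": re.compile(r"\b(invoice|charge\w*|payment|refund)\b", re.I),
    "pii": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # rough email pattern
}


def tag_document(text):
    """Return the sorted list of tags whose rule matches the text."""
    return sorted(tag for tag, rule in TAG_RULES.items() if rule.search(text))


# Made-up snippets standing in for untagged, unstructured records.
docs = {
    "note-1": "Customer unhappy, wants a refund for a broken meter.",
    "note-2": "Send the invoice to jane@example.com.",
    "note-3": "Routine maintenance completed.",
}

# Scan every record and build a simple catalog of name -> tags.
catalog = {name: tag_document(text) or ["untagged"] for name, text in docs.items()}
for name, tags in catalog.items():
    print(name, tags)
```

Even a rough catalog like this tells an organization what it is holding, which is the first step toward deciding what to analyze, protect, or delete.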

To be clear, business leaders must also take an interest in data and establish a strong data culture within their organization. This includes driving data democratization and building data lakes to incorporate both structured and unstructured data as the foundation for new data initiatives.

“The more data that your company can analyze, the better decisions you’ll make. While it’s important to not get overloaded with information – keep going back to your business objectives and the risks you’re accounting for to determine what to observe – you should still be aware of what you’re collecting,” Labovich summed up.

As more organizations become data-focused, those that can successfully leverage dark data to their advantage will be the ones that get ahead.

Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose.

Image credit: iStockphoto/Maksym Kaplun