Can You Have Too Much Data?

Billed as one of the most valuable assets a business can have, data is considered instrumental in strategic areas such as deepening one’s understanding of the market or facilitating the development of new products.

Within this narrative, organizations must amass as much data as possible while bolstering their analytics and AI capabilities to mine it for insights. Unsurprisingly, reports project that the big data market will exceed US$100 billion in volume over the next five years.

So, the more data there is, the better things will be, right?

The case for quality data

It turns out that there are scenarios where too much data might not be a good idea, according to Karen Burns, co-founder of AI computer vision firm Fyma. In a contributed opinion piece for CPO Magazine, she argues that more data only generates value if organizations know how to use it.

“Data for data’s sake is completely pointless, and organizations are increasingly jumping the gun when it comes to pursuing ambitious insights strategies, without the know-how to make the most of new data,” wrote Burns.

In addition, juggling a vast amount of data comes with various disadvantages. More data means a larger attack surface and greater exposure to data leaks and breaches. Burns points out that the damage can be significant for at-risk industries such as healthcare, financial services, and telecommunications.

Having more data also means more data wrangling, which can drive costs up significantly, turning an initially manageable expense into an unsustainable one. Finally, we know that bad data is a serious problem when training AI models, so much so that it can render trained models effectively useless.

Start with strategic objectives

To avoid rushing headlong into data initiatives and ending up gathering data for its own sake, Burns suggests that data managers take a step back to review their strategic objectives and the best route to achieving them.

“CTOs need to be taking greater care over their cost-benefit analysis before looking to data analytics to improve their business models. Not all data will be useful, and you need a solution that is built from the ground-up to be compliant and secure,” she wrote.

Two key areas must be in place: the right data platform and the right talent to bring data projects to fruition. And businesses should probably not jump straight into more challenging areas such as AI before they get the right guidance from experts.

In a nutshell, smaller quantities of quality data leveraged to fulfil strategic objectives will do more for the bottom line than chaotic, unfocused attempts to use any and all available data. Burns noted: “People fall into the trap of thinking more data automatically generates more value and revenue.”

Growing momentum for small data

Fortunately, there is a growing awareness that the ability to make data-driven decisions doesn’t necessarily require a huge amount of data. Indeed, in a report last year, Gartner predicted that 70% of organizations will shift their focus from big data to small and wide data by 2025.

Unlike big data, small data is an approach that relies on less data while still yielding useful insights. According to Gartner, it can draw on techniques such as certain forms of time-series analysis, few-shot learning, synthetic data, or self-supervised learning.
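
As a loose illustration of one such technique, the sketch below pads out a tiny labelled dataset with synthetic, noise-perturbed copies before fitting a simple classifier. The dataset, noise scale, and library choices are illustrative assumptions rather than anything prescribed by Gartner or Burns.

```python
# Minimal sketch of one "small data" tactic: augmenting a tiny labelled
# dataset with synthetic, noise-perturbed copies before training.
# Assumptions: numpy and scikit-learn are available; the data and the
# noise scale are made up purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(seed=42)

# A deliberately small "real" dataset: 20 samples, 4 features, binary labels.
X_real = rng.normal(size=(20, 4))
y_real = (X_real[:, 0] + X_real[:, 1] > 0).astype(int)

# Create synthetic samples by jittering each real sample with small
# Gaussian noise; labels are assumed to carry over unchanged.
n_copies = 5
X_synth = np.concatenate(
    [X_real + rng.normal(scale=0.1, size=X_real.shape) for _ in range(n_copies)]
)
y_synth = np.tile(y_real, n_copies)

# Train on the combined real + synthetic data.
X_train = np.concatenate([X_real, X_synth])
y_train = np.concatenate([y_real, y_synth])
model = LogisticRegression().fit(X_train, y_train)

print("Training accuracy:", model.score(X_train, y_train))
```

The point of the sketch is simply that a modest, well-understood dataset can be stretched further with careful augmentation, rather than reaching for ever more raw data.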

It is worth pointing out that Singapore, which is working hard to position itself as a leading hub for AI, has in recent months signalled its intention to focus on data sharing and quality data. The country has previously acknowledged that its datasets are comparatively small given its size.

“Taken together, [small data and wide data] are capable of using available data more effectively, either by reducing the required volume or by extracting more value from unstructured, diverse data sources,” explained Jim Hare, a distinguished research vice president at Gartner.

Is it time to take a second look at your data strategy?

Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose. You can reach him at [email protected].

Image credit: iStockphoto/SIphotography