LinkedIn Shares More About its Graph Database

How do you handle two million queries per second across a database containing the relationships between more than 270 billion entities? Why, you put together your own graph database designed to support queries in real-time, of course.

The graph database in question is named LIquid, and it took LinkedIn four years of effort to build. According to a post on the LinkedIn Engineering blog, LIquid is a single, automated service to replace hand-coded queries; it is queried with a general-purpose query language to return results in an optimal fashion.

In a blog post last week, Bogdan Arsintescu, the director of engineering at LinkedIn, shed further light on the inner workings of the popular business-focused social media platform and elaborated on the Economic Graph, LinkedIn’s domain-specific knowledge graph implementation.

At LinkedIn scale

One way that the Economic Graph provides value for members is through second-degree connections, says Arsintescu. Think in terms of the colleague of an old school friend or the new boss of a co-worker from a prior job – and the paths to get to them.

But generating these connections at scale and in real time is no simple task. To illustrate his point, Arsintescu noted that for a LinkedIn user with 1,000 connections, if just 100 of those connections have 1,000 connections each, the number of second-degree connections quickly climbs to over 100,000.
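The two-hop expansion behind that figure can be sketched in a few lines of Python. This is only an illustration on a toy in-memory graph, not how LIquid works internally; the `second_degree` function, the member names, and the edge list are all made up for the example.

```python
from collections import defaultdict


def second_degree(graph, member):
    """Return the members exactly two hops away from `member`."""
    first = graph.get(member, set())
    reachable = set()
    for connection in first:
        # Union in everyone each direct connection knows.
        reachable |= graph.get(connection, set())
    # Drop the member and their direct connections, leaving only
    # second-degree connections.
    return reachable - first - {member}


# Hypothetical toy network: undirected "knows" edges.
edges = [
    ("ana", "ben"), ("ana", "cho"), ("ben", "cho"),
    ("ben", "dev"), ("cho", "dev"), ("cho", "eli"),
]
graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

print(sorted(second_degree(graph, "ana")))  # ['dev', 'eli']
```

The work grows with the sum of the connection counts of every direct connection, which is why 100 well-connected contacts with 1,000 connections each already imply on the order of 100,000 candidates to fetch and de-duplicate for a single member.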

The grunt work falls on LIquid, which is responsible for hosting, indexing, and providing real-time access to all connections within the Economic Graph. To handle rapid growth and ensure uninterrupted access, everything is kept in a single graph database, the entirety of which is loaded into working memory.

LinkedIn is prepared for growth: Arsintescu says LIquid was designed to scale up to ten times its current size. Indeed, he says the 2 million queries per second metric is expected to double within the next 18 months as more people join and use LinkedIn.

Powering the social networking service

But what does it take to power everything? A report on The New Stack detailed the computing horsepower required to run LIquid. To provide the much-needed memory, the system both scales up and scales out. Specifically, memory is shared across servers, with as much memory packed into each server as possible – at least a terabyte for each server.

Moreover, each complete copy of the Economic Graph is held in a replica, a cluster of between 20 and 40 servers. Each replica can serve a certain share of the load, so the system’s throughput can be increased by adding more replicas.
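A rough back-of-the-envelope calculation shows what these figures imply. The server counts, the one-terabyte floor, and the query rates come from the article; the per-replica throughput (`ASSUMED_QPS_PER_REPLICA`) is a purely hypothetical number chosen for illustration, since LinkedIn has not published one here.

```python
import math

TB_PER_SERVER = 1                # article: at least 1 TB of memory per server
SERVERS_PER_REPLICA = (20, 40)   # article: a replica spans 20 to 40 servers


def replica_memory_tb(servers, tb_per_server=TB_PER_SERVER):
    """Lower bound on the memory held by one replica, in terabytes."""
    return servers * tb_per_server


def replicas_needed(target_qps, qps_per_replica):
    """Replicas required to serve target_qps, rounding up."""
    return math.ceil(target_qps / qps_per_replica)


low, high = (replica_memory_tb(n) for n in SERVERS_PER_REPLICA)
print(low, high)  # 20 40 (terabytes, as a minimum)

# ASSUMPTION: hypothetical per-replica throughput, not a LinkedIn figure.
ASSUMED_QPS_PER_REPLICA = 100_000
print(replicas_needed(2_000_000, ASSUMED_QPS_PER_REPLICA))  # 20
print(replicas_needed(4_000_000, ASSUMED_QPS_PER_REPLICA))  # 40
```

Under that assumed rate, doubling the query load from 2 million to 4 million queries per second simply doubles the replica count, which is the scale-out behavior the article describes.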

Though confident in the current scalability, Arsintescu nonetheless revealed that the team is looking at ways to break apart this homogeneous architecture.

He notes that LIquid's homogeneous architecture treats all data equally, leading to inefficiencies as the graph grows. Addressing these challenges requires adopting a heterogeneous approach with tiered storage and workload optimization, allowing for auto-tuning and dynamic optimization of cost-saving features.

Unfortunately, there are no plans at the moment to release LIquid, which is now in its fourth generation. It is being deployed in other parts of Microsoft, however.

“LIquid is now an essential component in delivering value to our members via real-time access to relevant data for three other knowledge systems in LinkedIn. This successful implementation has inspired other teams within LinkedIn to adopt it as a graph index and for sister teams at Microsoft to show interest in the technology.”

Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose.

Image credit: iStockphoto/NicoElNino