What are knowledge graphs, and how do they work? As we wrote in May, LinkedIn uses a proprietary knowledge graph implementation to power its business networking site. But how can the typical enterprise benefit from graph databases?
Neo4j’s Jim Webber was in Singapore last month for GraphSummit. He spoke to CDOTrends on knowledge graphs, how a graph database differs from a relational database, and how graph databases can play a role in machine learning.
Webber is the chief scientist at Neo4j, the developer of the graph database management system of the same name. He also recently released a book with his colleague Jesús Barrasa titled “Building Knowledge Graphs: A Practitioner’s Guide”.
Understanding graph databases
How can we convert a relational database into a graph database? While Webber was quick to caution that plucking a random relational database table and pulling it into a graph probably isn’t ideal, he was happy to share how such a conversion might work.
“I've got a table; it's called the customer table and has rows in it. I will take all those rows and [turn them into] nodes in my [graph] database. And then because it came from the customer table, I will give each node the label ‘Customer’.”
So tables give you labels, and rows give you nodes.
The next step entails finding the foreign keys in the relational database and using them to create the relationships between nodes in the graph database.
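The mapping Webber describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Neo4j's import tooling: the `Node` and `Graph` classes, the helper functions, the sample `customers` and `orders` tables, and the `PLACED_BY` relationship type are all hypothetical names made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str        # taken from the name of the source table
    properties: dict  # the column values of the source row


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)          # (label, primary key) -> Node
    relationships: list = field(default_factory=list)  # (start key, type, end key)


def rows_to_nodes(graph, label, rows, pk="id"):
    """Turn every row of a table into a node labelled after that table."""
    for row in rows:
        graph.nodes[(label, row[pk])] = Node(label, dict(row))


def foreign_keys_to_relationships(graph, label, fk_column, target_label, rel_type):
    """Turn each foreign-key value into a relationship between two nodes."""
    for (node_label, key), node in list(graph.nodes.items()):
        if node_label != label:
            continue
        target = (target_label, node.properties.get(fk_column))
        if target in graph.nodes:
            graph.relationships.append(((node_label, key), rel_type, target))


# Hypothetical sample tables, invented for this sketch.
customers = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
orders = [
    {"id": 10, "customer_id": 1, "total": 25.0},
    {"id": 11, "customer_id": 1, "total": 40.0},
    {"id": 12, "customer_id": 2, "total": 15.0},
]

graph = Graph()
rows_to_nodes(graph, "Customer", customers)  # table name becomes the label
rows_to_nodes(graph, "Order", orders)
foreign_keys_to_relationships(graph, "Order", "customer_id", "Customer", "PLACED_BY")
```

In this toy version, the five rows become five nodes and the three `customer_id` values become three relationships. A real migration would also have to handle join tables, composite keys, and data cleaning, which the sketch ignores.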
Webber says: “With graph, you now have more degrees of freedom; you are not hamstrung by the schema, and you can enrich that graph by running algorithms to look for opportunities to link records together that the relational model would perhaps have prohibited.”
This enrichment could be carried out by a human analyst armed with visualization tools or by algorithms designed for the purpose. Either way, it entails creating relationships between nodes or tagging nodes with appropriate labels.
To graph or not to graph
But before you go running to convert all your relational databases into graph databases, are there scenarios where a graph database would be a poor fit?
“If your problem naturally decomposes or is a set, graph is a poor choice. Graphs are also a poor choice for undifferentiated bulk storage, such as storing logs for your application,” explained Webber.
“When graphs come into their own is when you want to perform analysis. When I want to understand your shopping behavior, you bet I'm pulling that data into the graph because that's where I've got the rich associativity that allows me to understand it.”
The associativity of nodes is at the heart of graph’s strength, says Webber. The opposite is true of relational databases: “Relational systems work well until you join, and then it's a bit less predictable. And the more you join, the less predictable it becomes.”
“In graphs, associativity is normal. It's a first-class citizen. [Nodes] are all linked; relationships exist. So it means they're very cheap to traverse at query time.”
“In relational, we will go to many lengths to reduce our joins. We will do denormalization, fancy indexes, and all kinds of clever strategies, whereas, in graph, we don't. If there is naturally an association and link in our domain, we put it in the graph because we know the cost is minimal.”
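To make the contrast concrete, here is a deliberately simplified Python sketch of the same "friends of friends" question answered two ways. It is an assumption-laden toy: the `friend_rows` data and both function names are hypothetical, the join version scans a plain list where a real relational engine would use indexes and a query planner, and the adjacency dictionary only loosely mirrors how a graph database stores links.

```python
from collections import defaultdict

# Hypothetical "friend" links, stored relationally as (person, friend) rows.
friend_rows = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "dave"),
    ("alice", "erin"),
]


def friends_of_friends_join(rows, person):
    """Join-style lookup: every hop scans the whole table for matching rows."""
    first_hop = [friend for (who, friend) in rows if who == person]
    return sorted({fof for (who, fof) in rows if who in first_hop and fof != person})


# Graph-style storage: each node keeps direct references to its neighbours.
adjacency = defaultdict(list)
for who, friend in friend_rows:
    adjacency[who].append(friend)


def friends_of_friends_traverse(adj, person):
    """Traversal-style lookup: every hop follows only the links on the current node."""
    return sorted(
        {fof for friend in adj.get(person, []) for fof in adj.get(friend, []) if fof != person}
    )


via_join = friends_of_friends_join(friend_rows, "alice")
via_traversal = friends_of_friends_traverse(adjacency, "alice")
```

Both calls return the same answer. The difference is in the work done: the join version touches every row on each hop, while the traversal touches only the neighbours of the nodes it visits, so its cost depends on the local neighbourhood and not on the size of the whole dataset.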
Graph and AI
So how can graph benefit AI? Webber cited analyst firm Gartner, which he said reports that half the inquiries it receives today revolve around AI or ML, and that half touch on the use of knowledge graphs: “This suggests an interesting trend that's bubbling under.”
“As a technologist, I think you've got two huge advantages with graph if you're doing traditional ML. If you are mining features from your data and training a model, graph gives you more features.”
“I can take your name, your age, your gender, and your postcode from a relational database. But what relational can't give me is your community, your PageRank, or your centrality score.”
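As a rough illustration of what a graph-derived feature looks like, the Python sketch below computes PageRank with a plain power iteration and attaches the score to ordinary relational attributes. It is a minimal sketch under assumptions: the `pagerank` function is a textbook version written for this example (not Neo4j's graph data science library), and the `follows` edges and `profiles` records are invented sample data.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "teleport" share.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # A node without outgoing links spreads its rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical "follows" relationships between four people.
follows = [
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "carol"),
    ("carol", "alice"),
]

# Hypothetical relational attributes for the same people.
profiles = {
    "alice": {"age": 34, "postcode": "049145"},
    "bob": {"age": 41, "postcode": "238801"},
    "carol": {"age": 29, "postcode": "018956"},
    "dave": {"age": 52, "postcode": "179094"},
}

ranks = pagerank(follows)

# Each training row combines relational columns with a graph-derived feature.
feature_rows = [
    {"name": name, **attrs, "pagerank": ranks[name]} for name, attrs in profiles.items()
]
```

Here `carol`, who is followed by everyone else, ends up with the highest score. That signal exists only in the relationships between records, which is why it cannot be read off any single row of a table.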
Moreover, graph databases can return related data faster than relational databases slowed by joins. The faster response could well prove pivotal when training new ML models on expensive GPUs.
Webber shared an anecdote about a recent meeting with a Southeast Asian firm that is training its AI-powered chatbot. To reduce hallucinations and erroneous responses, this firm constrained the training exclusively to data from a knowledge graph that represents how its products work, says Webber.
The future of graph databases
Adoption of graph databases by the data science community has been rapid, especially once data scientists see how graphs give them a competitive advantage, says Webber.
“I see graph data increasingly taking over enterprise workloads. I think graphs are fit for purpose for the next 30 or 40 years, in the same way that relational has been fit for purpose for the last 30 or 40 years.”
“Certainly, a lot of the systems I built in my previous roles building business systems would have been better done as a graph database. I can do certain large-scale analysis using graph in minutes or hours, versus days or weeks with relational.”
To be clear, Webber sees room for both relational and graph databases to exist alongside each other.
“When you talk to a builder, they will have tools such as hammers and saws. Can I cut a plank of wood with a hammer? Yes, if I hit it until it breaks. But I'll probably get injured, and the cut is not going to be nice. It’s about using the right tool for the right job,” he summed up.
Paul Mah is the editor of DSAITrends. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose.
Image credit: iStockphoto/NicoElNino