KGC23 Keynote: The Future of Knowledge Graphs in a World of LLMs — Denny Vrandečić, Wikimedia
Based on The Knowledge Graph Conference's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Large language models can answer questions, but knowledge graphs deliver the same kind of factual reliability far more cheaply—especially when answers require precise lookups rather than generative computation. Denny Vrandečić, co-founder of Wikidata and a longtime knowledge-graph architect, argues that the most practical future isn’t choosing between LLMs and knowledge graphs. Instead, it’s combining them: use LLMs as an interface and orchestration layer, while knowledge graphs provide ground truth, auditability, and efficient retrieval.
Vrandečić frames the moment as a rapid adoption cycle similar to past technology inflections, pointing to how quickly Stable Diffusion reached massive user numbers. That speed, he suggests, has created a kind of shock—teams are scrambling to understand what LLMs change in how knowledge is processed and where existing systems fit. He narrows the scope to the technical relationship between knowledge graphs and LLMs, explicitly avoiding broader debates about ethics, copyright, or existential risks.
The core case for knowledge graphs comes from first principles: a knowledge graph stores entities and relationships, so answering many factual queries becomes a graph lookup. A large language model, by contrast, must run inference across many layers and parameters to generate tokens—even when the question is essentially a retrieval task. Vrandečić illustrates this with a concrete example: asking who painted "The School of Athens." He reports that LLMs produce fluent, contextual answers but take seconds, while a Wikidata-style query returns almost immediately. He then scales the comparison using cost estimates: thousands of Wikidata queries in cloud settings cost cents, while running GPT-4-class inference at similar scale can cost dollars—on the order of tens of times more.
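The lookup argument can be made concrete with a toy triple store. This is an illustrative sketch, not Wikidata's actual data model or SPARQL interface: the answer lives explicitly as a stored edge, so retrieval is a constant-time dictionary access rather than a forward pass through billions of parameters.

```python
# Minimal in-memory triple store: {(subject, predicate): [objects]}.
# Entity and property names loosely mirror Wikidata's style but are
# simplified for illustration.
triples = {
    ("The School of Athens", "creator"): ["Raphael"],
    ("The School of Athens", "inception"): ["1511"],
    ("Raphael", "place of birth"): ["Urbino"],
}

def lookup(subject: str, predicate: str) -> list[str]:
    """O(1) retrieval: follow the stored edge; no generation involved."""
    return triples.get((subject, predicate), [])

print(lookup("The School of Athens", "creator"))  # ['Raphael']
```

A real deployment would issue a SPARQL query against the Wikidata Query Service, but the cost asymmetry is the same: the fact is stored once and read cheaply, instead of being recomputed at every inference.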
He also argues that LLMs struggle with consistency and provenance in ways that matter for real-world knowledge. A Wikipedia “rabbit hole” about the birthplace of actress Anna Begović yields conflicting answers across Google, Wikipedia-derived sources, and Wikidata, with contamination from earlier claims complicating verification. In another example, he describes how asking an LLM-based system for the birthplace of a person can produce different answers depending on language context, even when the underlying facts should be stable. He further notes that LLMs can miss edge cases—like “mayors of cities born after 1998”—and may respond confidently while being inconsistent with earlier statements.
From there, the proposed direction is architectural. Vrandečić advocates "augmented language models," where LLMs don't just generate text but call external tools: knowledge-base queries, math engines, and function libraries. He points to Toolformer as an example of using LLMs to decide which services to invoke. In this setup, knowledge graphs become both a knowledge store and a set of functions—something LLMs can query for ground truth rather than relearn facts repeatedly inside model parameters.
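The division of labor can be sketched as a tiny dispatcher. The LLM's role is stubbed out as a pre-made "plan" (which tool, which arguments), and the tool names and routing are illustrative assumptions—Toolformer itself learns to emit API calls during training rather than using a hand-written router like this.

```python
# Toy "augmented language model": the model (stubbed) picks a tool;
# a knowledge-graph lookup supplies the ground truth it verbalizes.
KG = {("Raphael", "place of birth"): "Urbino"}

def kg_query(subject, predicate):
    """Ground-truth retrieval from the symbolic store."""
    return KG.get((subject, predicate))

def calculator(expression):
    """Exact arithmetic instead of token-by-token guessing."""
    return eval(expression, {"__builtins__": {}})  # restricted eval: arithmetic only

TOOLS = {"kg_query": kg_query, "calculator": calculator}

def answer(plan):
    """'plan' stands in for the LLM's decision: (tool name, arguments)."""
    tool, args = plan
    result = TOOLS[tool](*args)
    # A real system would have the LLM phrase the grounded result;
    # crucially, the fact itself never comes from model weights.
    return f"{result}" if result is not None else "not found in the graph"

print(answer(("kg_query", ("Raphael", "place of birth"))))  # Urbino
print(answer(("calculator", ("2 + 2",))))  # 4
```

The design point is that a miss returns an explicit "not found" rather than a fluent guess—which is exactly the hallucination-reduction argument.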
He adds a second, structural argument about why facts shouldn’t be internalized in model weights. With models growing large to memorize trivia-like knowledge (including multilingual entity facts), he questions whether it’s efficient to embed millions of statements that could instead live in curated, editable symbolic systems. His closing vision is a world where LLMs generate infinite content, but knowledge graphs preserve the “true story”: auditable, editable, and designed to handle uncertainty explicitly. He even suggests adding a new special value for “it’s complicated” to represent cases where knowledge lacks a single clean ground truth, enabling systems to spend more effort verifying those edges. The takeaway: LLMs are powerful language interfaces, but knowledge graphs are the infrastructure for reliable, cost-effective knowledge.
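The "it's complicated" idea can be sketched as a sentinel value on a statement. Wikidata already has "unknown value" and "no value" snaks; the sentinel and field names below are illustrative assumptions for the proposed third case, where a fact is contested and should attract extra verification effort.

```python
# Sketch of an explicit "it's complicated" marker: a statement whose
# value is contested is flagged so systems can route it for review
# instead of treating it as a single clean ground truth.
ITS_COMPLICATED = object()  # sentinel: no single clean ground truth exists

statements = [
    {"subject": "Raphael", "predicate": "place of birth", "value": "Urbino"},
    {"subject": "Anna Begović", "predicate": "place of birth",
     "value": ITS_COMPLICATED, "note": "sources conflict; needs review"},
]

def needs_verification(stmt):
    """Route contested edges to human or tool-assisted fact-checking."""
    return stmt["value"] is ITS_COMPLICATED

flagged = [s for s in statements if needs_verification(s)]
print(len(flagged))  # 1
```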
Cornell Notes
Denny Vrandečić argues that knowledge graphs remain essential in an LLM world because they provide efficient, auditable factual retrieval that generation models can’t match on cost or consistency. He contrasts graph lookup—where answers come from stored entities and relationships—with LLM inference—where producing an answer requires running through many layers to generate tokens even for simple facts. He cites examples of conflicting or inconsistent birthplace answers across languages and systems, plus edge cases where LLMs fail to find facts that a knowledge graph query can return. The proposed path forward is “augmented language models”: use LLMs as an orchestration and UX layer that calls knowledge-graph queries and other tools for ground truth. This combination aims to reduce hallucinations, improve explainability, and keep costs manageable at scale.
Why does Vrandečić claim knowledge graphs can be cheaper than LLMs for factual question answering?
What kinds of failures does Vrandečić describe that motivate grounding answers in knowledge graphs?
How does the “augmented language model” idea change the division of labor between LLMs and knowledge graphs?
What does Vrandečić mean by “don’t internalize every fact in model parameters”?
Why propose a special value like “it’s complicated” in knowledge graphs?
How does language affect LLM-based answers in Vrandečić’s examples?
Review Questions
- In Vrandečić’s framework, what specific tasks should be handled by knowledge graphs versus by LLMs?
- What evidence does he give that LLM answers can vary by language or fail on edge cases, and why does that matter for reliability?
- How would an “augmented language model” reduce hallucinations compared with a pure text-generation approach?
Key Points
1. Knowledge graphs provide factual answers through explicit entity-relation storage, making many lookup-style questions cheaper and faster than token-by-token LLM inference.
2. LLMs can generate fluent context, but their inference cost and repeated computation make them inefficient for large-scale factual retrieval.
3. LLM outputs can be inconsistent across systems and languages, and they may cite sources that don't actually support the claimed facts.
4. A practical path forward is "augmented language models" that orchestrate tool calls—knowledge-graph queries, math engines, and function libraries—rather than generating everything internally.
5. Embedding millions of factual statements inside model weights is inefficient when curated, editable symbolic knowledge stores already exist.
6. Knowledge graphs can improve auditability and explainability by grounding answers in queryable, structured data rather than opaque model memory.
7. Representing uncertainty explicitly (e.g., a proposed "it's complicated" value) can help systems handle cases where truth isn't a single clean value.