Masterclass: Knowledge Graphs & Massive Language Models — The Future of AI, RelationalAI | KGC 2023

5 min read

Based on The Knowledge Graph Conference's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.

TL;DR

LLMs are framed as “instructable computers” because they can read language inputs and follow instructions to perform professional tasks, not because they behave like humans.

Briefing

Conversational AI is being treated as a new “computer for humans,” but the practical breakthrough isn’t that it behaves like people; it’s that it can reliably consume and generate language at scale, then be steered to perform professional tasks. The session frames today’s large language models (using GPT-4 as the exemplar) as instructable systems: they can read text (and, soon, multimodal inputs such as images, audio, and video), operate within private intranets, and follow instructions well enough to become an operational layer for knowledge work in finance, law, healthcare, and entertainment.

A central theme is how this capability emerged. The talk contrasts older “computer science 1.0” approaches—where engineers specify algorithms and data models explicitly—with “computer science 2.0,” where code is effectively learned from data. Neural networks learn parameterized continuous functions (with GPT-4 described as having up to a trillion parameters), trained via stochastic gradient descent to minimize prediction errors. For language, the key training mechanism is self-supervision: predict the next token from prior tokens. Scale matters: more data, more parameters, and better architectures (especially Transformers with self-attention) enable generalization across many tasks without building a separate system for each.
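
To make the training recipe concrete, here is a minimal, hedged sketch of self-supervised next-token prediction in PyTorch. A tiny recurrent model stands in for a Transformer purely to keep the example short; the `TinyLM` class, the character-level vocabulary, and all hyperparameters are illustrative, not from the talk.

```python
import torch
import torch.nn as nn

text = "knowledge graphs and language models"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}

# Inputs are all tokens but the last; targets are the same sequence shifted by one.
ids = torch.tensor([stoi[ch] for ch in text])
x, y = ids[:-1], ids[1:]

class TinyLM(nn.Module):
    """A toy language model: embed tokens, run a recurrent layer, score the vocab."""
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)  # one next-token distribution per position

model = TinyLM(len(vocab))
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # stochastic gradient descent

for step in range(200):
    logits = model(x.unsqueeze(0))
    loss = nn.functional.cross_entropy(logits.squeeze(0), y)  # next-token error
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The same objective, scaled to trillions of tokens and a Transformer with self-attention, is the recipe the talk credits for generalization across tasks.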

Yet raw language modeling isn’t enough for “instructable computers” that stay on task. The session highlights dialogue management and reinforcement learning with human feedback (RLHF) as a way to align outputs with preferences. Instead of asking humans to label the “correct” answer, evaluators compare alternatives and choose which is better for the goal—simplifying feedback into scalable preference signals.
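
The preference signal can be made precise with a small sketch. The pairwise loss below (a Bradley-Terry-style objective commonly used for RLHF reward models, assumed here rather than stated in the talk) only asks the model to score the preferred output above the rejected one; no absolute "correct answer" label is needed.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a fixed-size response representation to a scalar reward."""
    def __init__(self, dim=64):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, features):
        return self.score(features).squeeze(-1)

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in features for a batch of (chosen, rejected) response pairs; in
# practice these come from the language model's encoding of each response.
chosen = torch.randn(8, 64)
rejected = torch.randn(8, 64)

# -log sigmoid(r_chosen - r_rejected): minimized when the chosen response
# consistently scores higher, turning human comparisons into a training signal.
loss = -nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
opt.zero_grad()
loss.backward()
opt.step()
```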

From there, the talk pivots to enterprise knowledge infrastructure, especially knowledge graphs. Knowledge graphs remain valuable because they provide structured, verifiable representations with integrity constraints, and they reduce ambiguity that embeddings and free-form text can struggle with. But large language models can also generate knowledge graphs “from thin air,” producing ontologies and extracting facts from documents using instructions like “answer questions based on this ontology” or “emit a Datalog schema.” The tradeoff is control: models may hallucinate entities or relationships, so the workflow must treat the model as an unruly collaborator—iteratively asking it to revise the ontology and checking outputs against constraints.
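
A minimal sketch of that revise-and-check loop follows. `call_llm` is a hypothetical stand-in for whatever model API is used, and the prompt wording and helper names are illustrative, not from the talk; the point is that constraint violations flow back into the next prompt.

```python
def call_llm(prompt: str) -> list[tuple[str, str, str]]:
    """Placeholder: wire this to your model and parse its output into triples."""
    raise NotImplementedError

def violations(facts: list[tuple[str, str, str]], relations: set[str]) -> list[str]:
    """Flag any fact whose relation the ontology does not license."""
    return [f"unknown relation: {r}" for (_, r, _) in facts if r not in relations]

def extract_facts(document: str, relations: set[str], max_rounds: int = 3):
    prompt = (
        "Extract facts from the document as (subject, relation, object) triples, "
        f"using only these relations: {sorted(relations)}\n\n{document}"
    )
    for _ in range(max_rounds):
        facts = call_llm(prompt)
        errors = violations(facts, relations)
        if not errors:
            return facts
        # Iterative revision: tell the model what it got wrong and try again.
        prompt += f"\n\nYour previous answer had these problems: {errors}. Revise."
    raise ValueError(f"could not satisfy constraints after {max_rounds} rounds")
```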

The session also lays out practical architectures for combining systems. One approach uses knowledge graphs as a cache or pre-processing layer, then lets the language model handle natural-language querying, reasoning, and explanation. Another approach relies on vector databases (embeddings) as external memory for retrieval-augmented generation, then uses multi-hop prompting to iteratively ask for missing information. The talk emphasizes that multi-step reasoning can be expensive due to multiple model calls and token costs, so retrieval and context selection matter.
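
The retrieval step at the heart of the second approach can be sketched in a few lines. This assumes passages have already been embedded into vectors by some sentence-embedding model (not specified in the talk); numpy cosine similarity over an in-memory matrix stands in for a real vector database.

```python
import numpy as np

def top_k(query_vec: np.ndarray, passage_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Indices of the k stored passages most similar to the query (cosine)."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    return np.argsort(p @ q)[::-1][:k]  # nearest neighbors by similarity

# Usage with stand-in embeddings: 100 passages in a 384-dimensional space.
rng = np.random.default_rng(0)
store = rng.normal(size=(100, 384))
print(top_k(store[7], store))  # passage 7 retrieves itself first
```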

The “elephant in the room” is the enterprise decision: should teams invest in knowledge graphs and logical models, or in embeddings plus LLMs? The answer offered is not either/or. Knowledge graphs can improve reliability and entity consistency over time, while LLMs reduce the friction of building and querying those structures in natural language. The future described is a hybrid stack—knowledge graphs for precision and governance, language models for instruction-following and flexible interaction, and retrieval mechanisms to keep context grounded.

Cornell Notes

The session argues that modern LLMs (GPT-4 as the example) function as “instructable computers” for professionals because they can read and generate language, follow instructions, and be aligned to goals via dialogue management and RLHF. Their capabilities come from learning continuous, parameterized functions from massive text using self-supervised next-token prediction and Transformer architectures, with scale as a key driver. In enterprise settings, LLMs can draft ontologies and extract facts from documents, but they can also hallucinate—so outputs must be checked and revised against integrity constraints. Knowledge graphs remain valuable for reliability and structured reasoning, and the most practical direction is hybrid: use knowledge graphs for governance and verification, and LLMs plus retrieval/vector search for natural-language interaction and multi-step problem solving.

Why does the talk treat GPT-4-like systems as a new “computer for humans,” and what makes them different from traditional software?

They’re framed as systems that can be instructed in natural language and then execute work without engineers writing a bespoke algorithm for each task. Instead of “programming” via explicit code, users provide instructions and context; the model reads language inputs and generates outputs token-by-token. The talk stresses that the key shift is operational: humans can interact with the system using language, and the system can handle many tasks through learned patterns rather than hand-specified logic.

How did the field move from “computer science 1.0” to “computer science 2.0” in this explanation?

“1.0” is described as specifying real-world abstractions and algorithms explicitly—modeling phenomena, writing algorithms, and coding them in a programming language. “2.0” is described as learning the function from data when code isn’t known in advance: neural networks approximate continuous functions with many parameters, trained to minimize loss. The example given is image labeling (learn from labeled examples), generalized to language via next-token prediction.
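
A hedged illustration of the contrast (the numbers here are invented for the example): instead of writing the rule y = 3x + 2 explicitly, the "2.0" approach recovers its parameters from examples by gradient descent, the same recipe that scales up to neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.uniform(-1, 1, size=100)
ys = 3 * xs + 2                      # the "unknown" function, seen only via data

w, b = 0.0, 0.0                      # learned parameters, initialized at zero
for _ in range(500):
    pred = w * xs + b
    grad_w = 2 * np.mean((pred - ys) * xs)   # gradient of mean squared error
    grad_b = 2 * np.mean(pred - ys)
    w -= 0.1 * grad_w
    b -= 0.1 * grad_b

print(round(w, 2), round(b, 2))      # ~3.0 and ~2.0, recovered from data alone
```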

What training objective makes language models work without human labeling for every example?

The talk describes self-supervision: given a text sequence, remove the last token, feed the remaining tokens to the model, and train it to predict the missing next token. Because the “correct” next token is already present in the raw text, the system can generate its own training signal at scale, iterating with stochastic gradient descent.
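
A tiny, self-contained illustration of how raw text supervises itself: every prefix of the token stream becomes an input whose label is simply the token that follows it, so no human annotation is required.

```python
tokens = "knowledge graphs ground language models".split()

# Each training pair is (context so far, next token to predict).
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, target in pairs:
    print(context, "->", target)
# ['knowledge'] -> graphs
# ['knowledge', 'graphs'] -> ground
# ...
```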

Why are knowledge graphs still important if LLMs can generate ontologies and extract facts?

Knowledge graphs provide structured representations with integrity constraints, making answers more verifiable and consistent—especially for entity identity and relationships over time. The talk notes that LLMs can hallucinate entities/URLs or invent facts, so a knowledge graph (or constraints derived from it) helps catch errors. The recommended workflow treats the LLM as a collaborator that must be repeatedly asked to revise and then checked against the graph’s assumptions.
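
A minimal sketch of how integrity constraints catch hallucinations, assuming a toy entity registry and typed relations (all data below is invented for illustration): a triple is rejected if it names an unknown entity, uses an unlicensed relation, or violates the relation's domain/range types.

```python
KNOWN_ENTITIES = {"AcmeCorp": "Company", "SEC": "Regulator"}
RELATION_TYPES = {"regulated_by": ("Company", "Regulator")}

def violations(triple: tuple[str, str, str]) -> list[str]:
    """Return every constraint the triple breaks (empty list means it passes)."""
    s, r, o = triple
    problems = []
    if s not in KNOWN_ENTITIES:
        problems.append(f"unknown subject entity: {s}")
    if o not in KNOWN_ENTITIES:
        problems.append(f"unknown object entity: {o}")
    if r not in RELATION_TYPES:
        problems.append(f"unknown relation: {r}")
    elif not problems:
        dom, rng = RELATION_TYPES[r]
        if (KNOWN_ENTITIES[s], KNOWN_ENTITIES[o]) != (dom, rng):
            problems.append(f"type mismatch for {r}")
    return problems

print(violations(("AcmeCorp", "regulated_by", "SEC")))  # [] — passes
print(violations(("AcmeCorp", "founded_by", "Zeus")))   # flagged as hallucinated
```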

How do vector databases and multi-hop prompting fit into enterprise retrieval and reasoning?

Vector databases store embeddings of document passages and retrieve semantically similar text using nearest-neighbor search. Because retrieval may miss what’s needed, multi-hop prompting iteratively asks the model to identify missing information: first retrieve and attempt an answer, then request follow-up questions or additional retrieval until the original goal can be satisfied. This is positioned as an external-memory loop that mimics a human searching through files.
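
The loop can be sketched as follows. `call_llm` and `retrieve` are hypothetical stand-ins for a model API and a vector-store query, and the "MISSING:" convention is an assumed protocol, not something the talk specifies.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM provider")

def retrieve(query: str) -> list[str]:
    raise NotImplementedError("wire this to your vector database")

def multi_hop_answer(question: str, max_hops: int = 3) -> str:
    context = retrieve(question)
    for _ in range(max_hops):
        answer = call_llm(
            f"Context:\n{chr(10).join(context)}\n\nQuestion: {question}\n"
            "If the context is insufficient, reply exactly MISSING: <follow-up query>."
        )
        if not answer.startswith("MISSING:"):
            return answer
        # Each extra hop is another model call plus more tokens, which is why
        # the talk stresses retrieval and context selection to control cost.
        context += retrieve(answer.removeprefix("MISSING:").strip())
    return call_llm(f"Best-effort answer given:\n{context}\n\nQuestion: {question}")
```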

What’s the practical “hybrid” architecture direction implied by the enterprise tradeoff?

Use knowledge graphs for precision, governance, and verifiable structure, while using LLMs for natural-language querying, drafting schemas/ontologies, and explanation. Retrieval mechanisms (vector search) can supply relevant context, and dialogue management/RLHF helps keep outputs aligned to tasks. The talk repeatedly frames the best results as combining symbolic structure with language-model flexibility.

Review Questions

  1. What specific training mechanism allows language models to learn from unlabeled raw text, and how does it translate into next-token prediction?
  2. In the hybrid KG + LLM approach, what role do integrity constraints play in controlling hallucinations or invented entities?
  3. Why can multi-hop retrieval and reasoning increase cost, and what strategies does the talk suggest to manage that cost?

Key Points

  1. LLMs are framed as “instructable computers” because they can read language inputs and follow instructions to perform professional tasks, not because they behave like humans.

  2. The shift from hand-coded algorithms to learned parameterized functions is central: neural networks approximate continuous functions trained via stochastic gradient descent.

  3. Language model training relies on self-supervision—predicting the next token—so massive text can generate training signals without manual labeling for every example.

  4. Alignment for instruction-following depends on dialogue management and RLHF-style preference feedback, where humans compare outputs rather than provide absolute correctness labels.

  5. LLMs can generate ontologies and extract facts from documents, but they can also hallucinate; enterprise workflows must treat them as iterative collaborators and verify against constraints.

  6. Knowledge graphs remain valuable for reliability and entity consistency, while vector databases and retrieval loops help ground answers in relevant enterprise documents.

  7. The most practical direction is hybrid: combine knowledge graphs (precision/governance) with LLMs (natural-language interaction/reasoning) and retrieval (context selection).

Highlights

The talk’s core enterprise claim: LLMs can draft and query knowledge structures in natural language, but knowledge graphs provide the guardrails needed to prevent silent errors.
Self-supervised next-token prediction is presented as the engine that turns raw text into a trainable model without per-example human annotation.
Multi-hop prompting is described as an iterative “ask for what’s missing” loop that compensates for incomplete retrieval.
The “elephant in the room” is resolved as a hybrid stack rather than a binary choice between knowledge graphs and embeddings + LLMs.

Topics

Mentioned

  • Nick Vasiloglou
  • Kate Crawford
  • Daniel McCreery
  • John McCarthy
  • Rich Sutton
  • Jeff Dean
  • Armando Solar-Lezama
  • Ras Bodik
  • Vijay
  • Blake
  • LLM
  • GPT
  • GPT-4
  • GPT-3
  • RLHF
  • SQL
  • TSP
  • LSTM
  • RNN
  • API
  • JSON
  • CSV
  • Datalog
  • KG
  • AI
  • AGI
  • SEC