KGC 2022 Keynote: 'Deep Learning with Knowledge Graphs' by Stanford's Prof. Jure Leskovec

6 min read

Based on The Knowledge Graph Conference's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.

TL;DR

Graph neural networks extend representation learning from sequences and fixed-size grids to heterogeneous relational data expressed as nodes and edges with typed attributes.

Briefing

Graph neural networks are positioned as the next general-purpose deep learning framework for relational data—able to learn directly from heterogeneous graphs, scale to massive networks, and deliver concrete gains in recommendations, security, reasoning, and drug discovery. The core idea is that modern deep learning’s biggest productivity and performance wins come from representation learning, but most neural architectures were built for sequences (text, speech) and fixed-size grids (images). Graph neural networks extend representation learning to data where meaning lives in connections: nodes, edges, and rich attributes across many entity types.

Leskovec frames graphs as a universal modeling substrate for real-world domains. Nodes can represent users, drugs, diseases, atoms, or accounts; edges can represent relationships like “buys,” “treats,” “interacts,” or “connected by a bond,” often with their own attributes. Predictions then come in three standard forms: node-level tasks (e.g., churn), link-level tasks (e.g., whether a user will buy), and graph-level classification (e.g., whether a molecule is toxic or treats a disease). What makes graph neural networks distinct is the computation structure: each node builds a local neural computation graph shaped by its neighborhood. Message passing and permutation-invariant aggregation let the model transform neighbor information, combine it with the node’s own features, and repeat until a prediction is made.
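
The mechanics are easiest to see in code. Below is a minimal sketch of one message-passing round in plain PyTorch; the layer, its dimensions, and the toy graph are illustrative assumptions, not the implementation discussed in the talk. Transformed neighbor messages are summed (a permutation-invariant reduction, so neighbor order cannot affect the result) and combined with the node's own features.

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One round of message passing: transform neighbor features, aggregate
    them with a permutation-invariant sum, then combine the aggregate with
    the node's own features. Names and sizes here are illustrative."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.msg = nn.Linear(in_dim, out_dim)             # per-neighbor message
        self.upd = nn.Linear(in_dim + out_dim, out_dim)   # combine self + aggregate

    def forward(self, x, edge_index):
        # x: [num_nodes, in_dim]; edge_index: [2, num_edges] as (src, dst) rows
        src, dst = edge_index
        messages = self.msg(x[src])                       # one message per edge
        agg = torch.zeros(x.size(0), messages.size(1))
        agg.index_add_(0, dst, messages)                  # sum ignores neighbor order
        return torch.relu(self.upd(torch.cat([x, agg], dim=1)))

# Toy graph: nodes 0 and 1 both point at node 2.
x = torch.randn(3, 8)
edge_index = torch.tensor([[0, 1], [2, 2]])
out = MessagePassingLayer(8, 16)(x, edge_index)           # [3, 16]
```

Stacking such layers widens each node's receptive field one hop at a time, which is what shapes the per-node computation graph described above.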

A major practical challenge is scalability. Naively expanding neighborhoods leads to “neighborhood explosion,” where the number of sampled/visited neighbors grows too fast to train efficiently. Leskovec’s group developed GNNAutoScale, which prunes and approximates large parts of the neighborhood while providing mathematical bounds on approximation error and preserving expressive power. The approach also enables faster execution through optimized CPU–GPU memory transfer patterns, yielding reported 10–100× speedups and lower memory use. These advances are packaged in PyG (PyTorch Geometric), a widely used library with hundreds of architectures, many benchmark datasets, sparsity-aware CUDA kernels, and support for heterogeneous graphs and graph AutoML via neural architecture search.
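
For a sense of what working with PyG looks like, here is a minimal node-classification sketch built on the library's GCNConv layer; the toy graph and two-layer model are illustrative stand-ins for the billion-edge graphs mentioned in the talk.

```python
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

# Toy undirected graph (each edge stored in both directions).
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
                           [1, 0, 2, 1, 3, 2]])
data = Data(x=torch.randn(4, 8), edge_index=edge_index,
            y=torch.tensor([0, 1, 0, 1]))

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = GCNConv(8, 16)
        self.conv2 = GCNConv(16, 2)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

model = GCN()
logits = model(data.x, data.edge_index)   # [4, 2] node-level logits
loss = F.cross_entropy(logits, data.y)
loss.backward()
```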

In applications, the talk links graph learning to measurable improvements. At Pinterest, a graph neural recommender fuses image and text signals with graph structure to create embeddings for pins and content, improving over visual-only recommenders at very large scale (billions of nodes and tens of billions of edges). In security and finance, graph-based learning is used for fraud and intrusion detection by exploiting the structure of interactions to flag bad behavior quickly.

For reasoning, Leskovec argues that combining large language models with structured knowledge graphs improves question answering—especially when logic becomes harder, such as longer questions with multiple entities or negation, where language models struggle. In biomedicine, a knowledge graph built from publicly available clinical trials on ClinicalTrials.gov (70,000+ trials; 10 million+ nodes; 18 relation types) supports drug efficacy and adverse-effect prediction. The approach reports a 16× improvement in efficacy prediction versus text-only methods, better performance with less training data, and strong generalization to drugs never seen in training by connecting new candidates to their protein targets and the rest of the graph.

Finally, the talk extends graph learning to business data automation. Instead of heavy feature engineering and long iteration cycles, relational schemas can be represented as graphs and learned end-to-end with graph neural networks, reducing the need for feature stores and pipelines and accelerating time-to-value. The takeaway: graph neural networks generalize CNNs and Transformers as special cases, scale to huge graphs, and deliver domain-specific gains by learning task-tuned representations from relational structure.
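
A minimal sketch of that representation, under assumed toy tables (a users table and an orders table with a user_id foreign key): each row becomes a typed node and each foreign-key reference becomes a typed edge, here using PyG's HeteroData container.

```python
import torch
from torch_geometric.data import HeteroData

# Hypothetical relational table: which user placed each of 4 orders.
orders_user_id = [0, 0, 2, 1]            # foreign key into a 3-row users table

data = HeteroData()
data["user"].x  = torch.randn(3, 8)      # one node per user row (placeholder features)
data["order"].x = torch.randn(4, 4)      # one node per order row

# One edge per foreign-key reference: order -> the user who placed it.
src = torch.arange(4)                    # order row ids
dst = torch.tensor(orders_user_id)       # referenced user row ids
data["order", "placed_by", "user"].edge_index = torch.stack([src, dst])
```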

Cornell Notes

Graph neural networks (GNNs) bring representation learning to relational data by using message passing over graph neighborhoods. They support node-, link-, and graph-level predictions on heterogeneous graphs with typed nodes/edges and rich attributes. A key scalability bottleneck—neighborhood explosion—is addressed by GNNAutoScale, which prunes/approximates neighborhoods with bounded error while preserving expressive power, enabling large speedups and lower memory use. PyG (PyTorch Geometric) operationalizes these ideas with many architectures, heterogeneous-graph support, and optimized kernels. In practice, graph learning improves recommendations, fraud detection, knowledge-graph reasoning (especially with negation), and drug efficacy/adverse-effect prediction from large clinical-trial knowledge graphs.

Why does representation learning matter, and what limitation of standard deep learning does the talk target?

Representation learning is the core deep learning advantage: neural networks learn features tuned to the downstream prediction task, reducing manual feature engineering and improving both performance and productivity. Standard architectures are largely built for sequences (linear structures like text and speech) and fixed-size matrices (images). The talk targets the gap for complex relational data where structure is best expressed as graphs rather than sequences or fixed grids.

How do graph neural networks make predictions on different kinds of tasks?

GNNs can produce three main prediction types using the same framework: (1) node-level predictions such as churn (predicting a user’s future behavior), (2) link-level predictions such as whether a user will buy a product or whether a drug treats a disease, and (3) graph-level classification such as whether a molecule is toxic or treats a disease. The model’s computation is shaped by each node’s local neighborhood, using message passing plus permutation-invariant aggregation.
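
Schematically, one set of GNN-produced node embeddings can feed all three task types; the heads below are illustrative assumptions (a linear classifier and a dot-product scorer), not the models from the talk.

```python
import torch
import torch.nn as nn

# Suppose a GNN encoder has already produced one embedding per node.
h = torch.randn(5, 16)                    # [num_nodes, hidden]

# (1) Node-level: classify each node, e.g., churn vs. no churn.
node_logits = nn.Linear(16, 2)(h)         # one prediction per node

# (2) Link-level: score a candidate pair (u, v), e.g., "will user u buy item v?"
u, v = 0, 3
link_score = (h[u] * h[v]).sum()          # simple dot-product scorer

# (3) Graph-level: pool node embeddings (permutation-invariant), then classify,
#     e.g., "is this molecule toxic?"
graph_logits = nn.Linear(16, 2)(h.mean(dim=0))
```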

What is “neighborhood explosion,” and how does GNN autoscale address it?

Neighborhood explosion happens when expanding a node’s neighbors layer by layer causes the number of involved nodes to grow exponentially, making training infeasible. GNNAutoScale prunes and approximates large parts of these neighborhoods to keep computation manageable. The approach includes theoretical bounds on approximation error and claims no loss of expressive power, and it is implemented to speed up CPU–GPU execution with smaller computation graphs and faster memory transfer patterns.
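
The growth itself is simple arithmetic: with average degree d, an L-layer GNN touches on the order of d**L nodes per prediction. GNNAutoScale avoids this by caching historical embeddings so most of the neighborhood is never recomputed; the snippet below instead illustrates the simpler neighbor-sampling mitigation that PyG ships as NeighborLoader, a related but distinct technique.

```python
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader

# With average degree d and L message-passing layers, a single prediction
# touches on the order of d**L nodes: even d = 50, L = 3 gives 125,000.
d, L = 50, 3
print(d ** L)  # 125000

# Neighbor sampling caps the fan-out per layer: at most 10 neighbors per
# node per hop here, so each batch pulls in a small subgraph rather than
# the full 2-hop neighborhood.
data = Planetoid(root="/tmp/Cora", name="Cora")[0]
loader = NeighborLoader(data, num_neighbors=[10, 10], batch_size=32)
batch = next(iter(loader))
print(batch.num_nodes)  # far fewer nodes than the full graph
```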

What role does PyG play in making graph neural networks practical?

PyG (PyTorch Geometric) provides a developer-focused library for graph representation learning. It supports many architectures (reported as 550), many benchmark datasets (reported as 200), heterogeneous graphs, and graph AutoML/neural architecture search. It uses sparsity-aware, highly optimized CUDA kernels and integrates with the PyTorch ecosystem, making it easier to prototype and scale graph neural network research and production systems.
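
A sketch of the heterogeneous-graph support: PyG's to_hetero utility clones a homogeneous model across the node and edge types of a HeteroData graph. The author/paper schema below is a made-up example, not from the talk.

```python
import torch
from torch_geometric.data import HeteroData
from torch_geometric.nn import SAGEConv, to_hetero

# A tiny two-type graph: authors writing papers, with both edge directions
# so every node type receives messages.
data = HeteroData()
data["author"].x = torch.randn(3, 8)
data["paper"].x  = torch.randn(4, 12)
data["author", "writes", "paper"].edge_index = torch.tensor([[0, 1, 2, 2],
                                                             [0, 1, 2, 3]])
data["paper", "written_by", "author"].edge_index = torch.tensor([[0, 1, 2, 3],
                                                                 [0, 1, 2, 2]])

class GNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = SAGEConv((-1, -1), 16)   # lazy input sizes, one per node type
        self.conv2 = SAGEConv((-1, -1), 2)

    def forward(self, x, edge_index):
        return self.conv2(self.conv1(x, edge_index).relu(), edge_index)

# to_hetero duplicates the layers per node/edge type and wires up
# cross-type message passing automatically.
model = to_hetero(GNN(), data.metadata(), aggr="sum")
out = model(data.x_dict, data.edge_index_dict)   # {"author": [3, 2], "paper": [4, 2]}
```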

How does graph learning improve reasoning compared with relying only on large language models?

The talk describes combining large language models (e.g., GPT-3) with structured commonsense knowledge graphs. Adding structured information improves question answering accuracy, and it also enables explanation via reasoning paths through the graph. The biggest gains appear when questions require more structured logic—longer multi-entity questions and cases involving negation—where language models are described as weaker.

What does the clinical-trials knowledge graph enable in drug efficacy prediction?

Using publicly available clinical trial data from ClinicalTrials.gov, the group builds a large knowledge graph with 70,000+ trials represented as entities and extracts drugs, diseases, dosages, and eligible populations. The resulting graph has 10 million+ nodes and 18 relation types. Graph neural networks learn embeddings that support multiple prediction tasks, including drug efficacy and adverse effects. The talk reports a 16× improvement in efficacy prediction over approaches that treat trials as text, better performance with fewer training examples, and generalization to brand-new drugs by connecting them to targeted proteins and the existing graph structure.
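
A hypothetical miniature of that generalization mechanism (node types, edge names, and sizes are invented; the real graph has 10 million+ nodes and 18 relation types): a never-before-seen drug enters the graph only through edges to its known protein targets, after which message passing yields an embedding that a link-level efficacy scorer can compare against diseases.

```python
import torch
from torch_geometric.data import HeteroData

# Invented miniature of a clinical-trial knowledge graph.
kg = HeteroData()
kg["drug"].x    = torch.randn(3, 16)
kg["protein"].x = torch.randn(4, 16)
kg["disease"].x = torch.randn(2, 16)
kg["drug", "targets", "protein"].edge_index = torch.tensor([[0, 1, 2], [0, 1, 3]])
kg["drug", "treats", "disease"].edge_index  = torch.tensor([[0, 1], [0, 1]])

# A brand-new candidate becomes drug node 3, connected only through its
# known protein targets; message passing then produces an embedding for it
# even though the drug appears in no training trial.
kg["drug"].x = torch.cat([kg["drug"].x, torch.randn(1, 16)])
new_edges = torch.tensor([[3, 3], [0, 2]])   # drug 3 targets proteins 0 and 2
kg["drug", "targets", "protein"].edge_index = torch.cat(
    [kg["drug", "targets", "protein"].edge_index, new_edges], dim=1)
```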

Review Questions

  1. What mechanisms in a GNN make aggregation permutation-invariant, and why does that matter for graphs?
  2. How does GNNAutoScale preserve expressive power while reducing computation, and what problem does it target?
  3. In what types of question-answering scenarios does combining knowledge graphs with large language models produce the largest gains, according to the talk?

Key Points

  1. Graph neural networks extend representation learning from sequences and fixed-size grids to heterogeneous relational data expressed as nodes and edges with typed attributes.

  2. GNNs support node-, link-, and graph-level predictions using message passing and permutation-invariant aggregation shaped by each node’s local neighborhood.

  3. Scalability hinges on controlling neighborhood growth; GNNAutoScale prunes/approximates neighborhoods with bounded error while preserving expressive power, enabling 10–100× speedups.

  4. PyG (PyTorch Geometric) is a widely used ecosystem for graph neural network research and deployment, offering many architectures, heterogeneous-graph support, and optimized CUDA kernels.

  5. Graph neural recommenders can outperform visual-only systems by fusing image/text signals with graph structure at very large scale.

  6. Knowledge-graph-enhanced reasoning improves question answering accuracy and interpretability, with especially strong benefits for negation and multi-entity logical structure.

  7. Clinical-trial knowledge graphs built from ClinicalTrials.gov can support drug efficacy and safety prediction, including generalization to never-before-seen drugs by leveraging protein-target connections.

Highlights

  • Each node in a graph neural network builds its own computation structure based on its neighborhood, letting the model adapt to local graph shape rather than assuming a fixed input size.
  • GNNAutoScale tackles neighborhood explosion by pruning/approximating neighborhoods while providing theoretical error bounds and claiming no expressive-power loss.
  • PyG is positioned as a de facto standard for graph neural network development, with extensive architecture coverage and heterogeneous-graph support.
  • Combining GPT-3-style language modeling with structured commonsense knowledge graphs improves reasoning accuracy and enables explanation via reasoning paths.
  • A clinical-trials knowledge graph from ClinicalTrials.gov (70,000+ trials; 10 million+ nodes) supports drug efficacy prediction with a reported 16× improvement over text-only approaches.

Topics

  • Graph Neural Networks
  • Representation Learning
  • GNNAutoScale
  • PyTorch Geometric
  • Knowledge Graph Reasoning
  • Drug Discovery
  • Clinical Trials
  • Fraud Detection
  • Graph Recommenders
  • AutoML for Graphs
