KGC 2022 Keynote: 'Deep Learning with Knowledge Graphs' by Stanford's Prof. Jure Leskovec
Based on The Knowledge Graph Conference's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Briefing
Graph neural networks are positioned as the next general-purpose deep learning framework for relational data—able to learn directly from heterogeneous graphs, scale to massive networks, and deliver concrete gains in recommendations, security, reasoning, and drug discovery. The core idea is that modern deep learning’s biggest productivity and performance wins come from representation learning, but most neural architectures were built for sequences (text, speech) and fixed-size grids (images). Graph neural networks extend representation learning to data where meaning lives in connections: nodes, edges, and rich attributes across many entity types.
Leskovec frames graphs as a universal modeling substrate for real-world domains. Nodes can represent users, drugs, diseases, atoms, or accounts; edges can represent relationships like “buys,” “treats,” “interacts,” or “connected by a bond,” often with their own attributes. Predictions then come in three standard forms: node-level tasks (e.g., churn), link-level tasks (e.g., whether a user will buy), and graph-level classification (e.g., whether a molecule is toxic or treats a disease). What makes graph neural networks distinct is the computation structure: each node builds a local neural computation graph shaped by its neighborhood. Message passing and permutation-invariant aggregation let the model transform neighbor information, combine it with the node’s own features, and repeat until a prediction is made.
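As a concrete (if simplified) illustration of this message-passing loop, the sketch below trains a two-layer GCN for node-level classification with PyG. The Cora citation dataset, layer sizes, and training hyperparameters are arbitrary choices to keep the example self-contained, not models or settings from the talk.

```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

# Small citation graph used as a stand-in for any node-classification task.
dataset = Planetoid(root="data/Planetoid", name="Cora")
data = dataset[0]  # one graph: node features, edge_index, labels, train/test masks

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        # Each GCNConv layer performs one round of message passing:
        # transform neighbor features, aggregate them with a
        # permutation-invariant operation, and combine the result with
        # the node's own representation.
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))  # information from 1-hop neighbors
        return self.conv2(x, edge_index)       # stacked layer: 2-hop neighborhood

model = GCN(dataset.num_features, 16, dataset.num_classes)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

model.train()
for epoch in range(200):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
```

The same layer stack generalizes to link-level tasks (score pairs of node embeddings) and graph-level tasks (pool node embeddings into a single graph representation).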
A major practical challenge is scalability. Naively expanding neighborhoods leads to “neighborhood explosion,” where the number of visited neighbors grows exponentially with the number of layers, making training too slow and memory-hungry. Leskovec’s group developed GNNAutoScale, which prunes and approximates large parts of the neighborhood while providing mathematical bounds on approximation error and preserving expressive power. The approach also enables faster execution through optimized CPU–GPU memory transfer patterns, yielding reported 10–100× speedups and lower memory use. These advances are packaged in PyG (PyTorch Geometric), a widely used library with hundreds of architectures, many benchmark datasets, sparsity-aware CUDA kernels, and support for heterogeneous graphs and graph AutoML via neural architecture search.
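GNNAutoScale itself relies on cached historical embeddings so that most of the neighborhood never has to be recomputed; as a simpler, related way to see how fan-out can be kept bounded, the sketch below uses PyG's NeighborLoader to sample a fixed number of neighbors per hop. The Reddit dataset, fan-out values, and batch size are illustrative assumptions, not details from the talk.

```python
from torch_geometric.datasets import Reddit
from torch_geometric.loader import NeighborLoader

# A larger graph where training on full neighborhoods would exhaust memory.
dataset = Reddit(root="data/Reddit")
data = dataset[0]

# Cap the fan-out per hop: at most 10 neighbors at hop 1 and 5 at hop 2,
# so a 2-layer GNN sees at most 1 + 10 + 50 nodes per seed node instead of
# the full, exponentially growing neighborhood.
loader = NeighborLoader(
    data,
    num_neighbors=[10, 5],
    batch_size=1024,
    input_nodes=data.train_mask,
    shuffle=True,
)

for batch in loader:
    # Each batch is a sampled subgraph; the seed nodes come first,
    # followed by their sampled 1-hop and 2-hop neighbors.
    print(batch.num_nodes, batch.batch_size)
    break
```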
In applications, the talk links graph learning to measurable improvements. At Pinterest, a graph neural recommender fuses image and text signals with graph structure to create embeddings for pins and content, improving over visual-only recommenders at very large scale (billions of nodes and tens of billions of edges). In security and finance, graph-based learning is used for fraud and intrusion detection by exploiting the structure of interactions to flag bad behavior quickly.
For reasoning, Leskovec argues that combining large language models with structured knowledge graphs improves question answering—especially when logic becomes harder, such as longer questions with multiple entities or negation, where language models struggle. In biomedicine, a knowledge graph built from publicly available clinical trials on ClinicalTrials.gov (70,000+ trials; 10 million+ nodes; 18 relation types) supports drug efficacy and adverse-effect prediction. The approach reports a 16× improvement in efficacy prediction versus text-only methods, better performance with less training data, and strong generalization to drugs never seen in training by connecting new candidates to their protein targets and the rest of the graph.
Finally, the talk extends graph learning to business data automation. Instead of heavy feature engineering and long iteration cycles, relational schemas can be represented as graphs and learned end-to-end with graph neural networks, reducing the need for feature stores and pipelines and accelerating time-to-value. The takeaway: graph neural networks generalize CNNs and Transformers as special cases, scale to huge graphs, and deliver domain-specific gains by learning task-tuned representations from relational structure.
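A minimal sketch of the “relational tables as a graph” idea, using PyG's HeteroData: the table names, row counts, and randomly generated features below are hypothetical placeholders standing in for encoded database columns, not a schema from the talk.

```python
import torch
from torch_geometric.data import HeteroData

# Hypothetical business schema: a customers table, a products table, and an
# orders table linking them. Rows become typed nodes, foreign keys become
# typed edges, and remaining columns become node/edge features.
data = HeteroData()

data["customer"].x = torch.randn(1000, 8)   # placeholder for encoded profile columns
data["product"].x = torch.randn(200, 16)    # placeholder for encoded product columns

# Each order row yields one 'buys' edge from a customer to a product.
customer_ids = torch.randint(0, 1000, (5000,))
product_ids = torch.randint(0, 200, (5000,))
data["customer", "buys", "product"].edge_index = torch.stack(
    [customer_ids, product_ids]
)
data["customer", "buys", "product"].edge_attr = torch.randn(5000, 4)  # order columns

# A heterogeneous GNN (e.g., built with PyG's to_hetero or HeteroConv) can then
# learn node- or link-level predictions, such as "will this customer buy this
# product?", directly from the schema-as-graph representation.
print(data)
```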
Cornell Notes
Graph neural networks (GNNs) bring representation learning to relational data by using message passing over graph neighborhoods. They support node-, link-, and graph-level predictions on heterogeneous graphs with typed nodes/edges and rich attributes. A key scalability bottleneck—neighborhood explosion—is addressed by GNNAutoScale, which prunes/approximates neighborhoods with bounded error while preserving expressive power, enabling large speedups and lower memory use. PyG (PyTorch Geometric) operationalizes these ideas with many architectures, heterogeneous-graph support, and optimized kernels. In practice, graph learning improves recommendations, fraud detection, knowledge-graph reasoning (especially with negation), and drug efficacy/adverse-effect prediction from large clinical-trial knowledge graphs.
Why does representation learning matter, and what limitation of standard deep learning does the talk target?
How do graph neural networks make predictions on different kinds of tasks?
What is “neighborhood explosion,” and how does GNNAutoScale address it?
What role does PyG play in making graph neural networks practical?
How does graph learning improve reasoning compared with relying only on large language models?
What does the clinical-trials knowledge graph enable in drug efficacy prediction?
Review Questions
- What mechanisms in a GNN make aggregation permutation-invariant, and why does that matter for graphs?
- How does GNNAutoScale preserve expressive power while reducing computation, and what problem does it target?
- In what types of question-answering scenarios does combining knowledge graphs with large language models produce the largest gains, according to the talk?
Key Points
1. Graph neural networks extend representation learning from sequences and fixed-size grids to heterogeneous relational data expressed as nodes and edges with typed attributes.
2. GNNs support node-, link-, and graph-level predictions using message passing and permutation-invariant aggregation shaped by each node’s local neighborhood.
3. Scalability hinges on controlling neighborhood growth; GNNAutoScale prunes/approximates neighborhoods with bounded error while preserving expressive power, enabling 10–100× speedups.
4. PyG (PyTorch Geometric) is a widely used ecosystem for graph neural network research and deployment, offering many architectures, heterogeneous-graph support, and optimized CUDA kernels.
5. Graph neural recommenders can outperform visual-only systems by fusing image/text signals with graph structure at very large scale.
6. Knowledge-graph-enhanced reasoning improves question-answering accuracy and interpretability, with especially strong benefits for negation and multi-entity logical structure.
7. Clinical-trial knowledge graphs built from ClinicalTrials.gov can support drug efficacy and safety prediction, including generalization to never-before-seen drugs by leveraging protein-target connections.