Lecture 10: Research Directions - Full Stack Deep Learning - March 2019
Based on The Full Stack's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Research output has grown so quickly that keeping up requires strategy—reading papers selectively and using reading groups to cut time spent on low-value work.
Briefing
Research momentum in deep learning has accelerated to the point where thousands of papers arrive every month, making it impossible for any one person to keep up by reading everything. Against that backdrop, the lecture lays out several “frontier” directions—especially learning systems that adapt quickly—then connects them to broader themes: how to learn from fewer examples, how to learn across tasks and environments, and how to make training more data- and compute-driven.
The first major thread is few-shot learning: getting strong performance from only a handful of labeled examples per new class. The lecture frames the problem as a mismatch between what supervised learning typically needs (large labeled datasets) and what humans do naturally (generalizing from minimal exposure). A key hypothesis is that models can reuse prior knowledge about object categories—such as typical boundaries between categories—if that prior is learned in advance. In practice, this often looks like pretraining on large datasets (e.g., ImageNet) and then fine-tuning on a new task, but the lecture highlights a limitation: success is frequently assumed rather than guaranteed when the new task differs from the pretraining distribution.
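A minimal sketch of the pretrain-then-fine-tune recipe described above, in numpy. A fixed random projection stands in for a pretrained backbone (in practice this would be e.g. a ResNet trained on ImageNet), and only a small linear head is trained on 5 labeled examples per class. All names, shapes, and the toy data are illustrative assumptions, not the lecture's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained backbone: a fixed random projection.
# In practice this would be a network trained on a large dataset.
D_in, D_feat, n_classes = 32, 16, 5
W_backbone = rng.normal(size=(D_in, D_feat))  # frozen during fine-tuning

def features(x):
    return np.tanh(x @ W_backbone)  # frozen feature extractor

# Few-shot "new task": 5 classes, 5 labeled examples each.
# Toy data: each class is a Gaussian blob in input space.
centers = rng.normal(size=(n_classes, D_in)) * 2.0
X = np.vstack([c + 0.1 * rng.normal(size=(5, D_in)) for c in centers])
y = np.repeat(np.arange(n_classes), 5)

# Fine-tune only the linear head with softmax cross-entropy.
W_head = np.zeros((D_feat, n_classes))
for _ in range(200):
    F = features(X)
    logits = F @ W_head
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    onehot = np.eye(n_classes)[y]
    grad = F.T @ (p - onehot) / len(X)  # gradient w.r.t. head only
    W_head -= 0.5 * grad

acc = (np.argmax(features(X) @ W_head, axis=1) == y).mean()
```

The key mechanic is that the backbone's weights never change: the few labeled examples only have to fit a small head, which is exactly why reusable pretrained structure reduces the data requirement.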
To make adaptation more reliable, the lecture introduces model-agnostic meta-learning (MAML), an approach designed to produce an initialization that is “ready for fine-tuning.” Instead of training on examples, meta-training is organized around tasks. Each task has its own small train set and validation set; the shared parameter vector θ is optimized so that after a gradient update on a task’s training data, the resulting parameters perform well on that task’s validation data. The goal is not just good performance on one dataset, but fast learning on new tasks drawn from the same task distribution. The lecture illustrates this with standard few-shot benchmarks like Omniglot and miniImageNet, where “5-way” classification tasks are sampled repeatedly from subsets of classes. Reported results emphasize that one-shot and five-shot performance can become dramatically better than baselines that either train on all past data directly or rely on learned update rules that don’t beat gradient descent in this setting.
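The inner/outer loop above can be sketched concretely. This is a first-order simplification (full MAML differentiates through the inner update; here the query-set gradient is applied at the adapted parameters directly), on a toy task distribution of 1-D linear regressions y = a·x with varying slope a. All names and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 0.1, 0.01  # inner-loop / outer-loop learning rates
theta = -2.0             # shared meta-parameter (a single slope)

def sample_task():
    """A task is a slope a; support and query sets come from y = a * x."""
    a = rng.uniform(0.5, 2.5)
    xs, xq = rng.normal(size=5), rng.normal(size=5)
    return (xs, a * xs), (xq, a * xq)

def grad(theta, x, y):
    # d/dtheta of the MSE loss 0.5 * mean((theta*x - y)^2)
    return np.mean((theta * x - y) * x)

def query_loss(theta, x, y):
    return 0.5 * np.mean((theta * x - y) ** 2)

losses = []
for step in range(2000):
    (xs, ys), (xq, yq) = sample_task()
    theta_adapted = theta - alpha * grad(theta, xs, ys)  # inner update
    losses.append(query_loss(theta_adapted, xq, yq))
    # First-order outer update: query gradient taken at adapted params.
    theta -= beta * grad(theta_adapted, xq, yq)

early, late = np.mean(losses[:200]), np.mean(losses[-200:])
```

Note what is being measured: the loss after the inner update, on held-out query data. Meta-training drives that post-adaptation loss down, which is the "ready for fine-tuning" property rather than performance of the raw initialization.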
The lecture then expands the idea of “learning to learn” into reinforcement learning. Here, the agent interacts with an environment, and rewards often arrive long after the actions that caused them, creating three core difficulties: credit assignment (figuring out which actions caused which outcomes), stability (learning can destabilize behavior), and exploration (trying actions that may be informative but not immediately rewarding). Despite these challenges, reinforcement learning has produced major breakthroughs in Atari and Go, including self-play approaches that generate clearer learning signals than playing only against a fixed best opponent.
A central research direction is meta-learning for reinforcement learning: training an agent across many environments so it can adapt within a small number of episodes to a new, unseen environment. The lecture describes architectures such as recurrent neural networks (and later alternatives using dilated temporal convolutions plus attention) that can use past experience to make better decisions quickly. Experiments on bandits and navigation tasks illustrate both promise (rapid adaptation when task distributions align) and fragility (performance can fail when exploration doesn’t stumble upon rewarding trajectories, especially with sparse rewards and long horizons).
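The bandit experiments are easier to appreciate next to a concrete baseline. The sketch below is not the recurrent meta-learned policy from the lecture; it is a hand-coded stand-in (a UCB1 rule, my choice, not the lecture's) showing the kind of within-episode adaptation the meta-learner must discover on its own: track empirical arm means and shift pulls toward the better arm inside a single short episode.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def run_episode(probs, horizon=100):
    """UCB1 on a Bernoulli bandit: adapt within one episode."""
    k = len(probs)
    counts = np.zeros(k)
    sums = np.zeros(k)
    reward = 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # pull each arm once first
        else:
            # Empirical mean plus an exploration bonus per arm.
            ucb = sums / counts + np.sqrt(2 * math.log(t + 1) / counts)
            arm = int(np.argmax(ucb))
        r = float(rng.random() < probs[arm])
        counts[arm] += 1
        sums[arm] += r
        reward += r
    return reward

# A distribution over tasks: each episode is a fresh 2-armed bandit.
totals = [run_episode(rng.uniform(0.1, 0.9, size=2)) for _ in range(200)]
avg_reward = np.mean(totals)
```

A uniform-random policy would average about 50 reward per 100-step episode here; the adaptive rule does better because it exploits within-episode evidence. The meta-RL result is that a recurrent policy, trained across many such tasks, can learn comparable adaptive behavior from reward alone.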
Finally, the lecture broadens to other “research directions” tied to practical constraints: reward shaping for real-world robotics, explainable AI for high-stakes decisions, imitation-based meta-learning from demonstrations, and domain randomization to bridge the simulator-to-reality gap. The throughline is that modern progress increasingly depends on large-scale data, compute, and automated learning of learning rules—rather than only human-designed heuristics—while also acknowledging that real-world deployment remains constrained by how well training distributions match reality.
Cornell Notes
The lecture argues that modern deep learning research is shifting toward systems that can adapt quickly to new tasks using prior experience. Few-shot learning is framed as learning a reusable “starting point” so that fine-tuning requires only a small number of examples. Model-agnostic meta-learning (MAML) achieves this by meta-training across many tasks, optimizing shared parameters θ so one (or a few) gradient updates produce good validation performance for each task. The same “learn to adapt” idea is extended to reinforcement learning, where agents face delayed rewards and must explore; meta-learning aims to let agents adapt to new environments in only a few episodes. The practical importance is clear: with research output exploding, these methods offer a path to more reliable generalization when data is scarce or environments change.
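The meta-training objective summarized above can be written compactly (α is the inner-loop step size; the 𝒯ᵢ are sampled tasks, each with its own train and validation split):

```latex
\min_\theta \sum_{\mathcal{T}_i}
  \mathcal{L}^{\mathrm{val}}_{\mathcal{T}_i}\!\left(
    \theta \;-\; \alpha \,\nabla_\theta\,
    \mathcal{L}^{\mathrm{train}}_{\mathcal{T}_i}(\theta)
  \right)
```

The outer minimization is over validation loss evaluated *after* the inner gradient step, which is what makes θ an initialization optimized for fast adaptation rather than for any single task.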
Why does few-shot learning require more than standard supervised pretraining plus fine-tuning?
How does MAML turn “learning from examples” into “learning from tasks”?
What does “5-way one-shot” mean in the lecture’s few-shot benchmarks?
What makes reinforcement learning harder than supervised learning?
How does meta-learning for reinforcement learning differ from standard RL?
Why can meta-trained navigation agents fail even when they work for many random seeds?
Review Questions
- In MAML, what is optimized at meta-training time: training loss, validation loss, or both—and why does that matter for fast adaptation?
- Which reinforcement learning challenges (credit assignment, stability, exploration) most directly explain why sparse rewards and long horizons can break meta-learning for navigation?
- How does self-play change the learning signal compared with training only against a fixed best opponent in games like Go or Dota 2?
Key Points
1. Research output has grown so quickly that keeping up requires strategy—reading papers selectively and using reading groups to cut time spent on low-value work.
2. Few-shot learning aims to reduce the labeled data needed for new classes by learning reusable structure during pretraining or meta-training.
3. MAML trains a shared initialization θ across many tasks so that a small number of gradient updates yields strong validation performance for each task.
4. Meta-learning for reinforcement learning extends “learn to adapt” to environments, targeting rapid improvement within a few episodes on a new task.
5. RL’s core difficulties—credit assignment, stability, and exploration—become especially acute with sparse rewards and long horizons.
6. Bridging simulation to reality often relies on domain randomization and simulator diversity rather than building a single perfect simulator.
7. Practical RL/robotics gains often come from reward shaping and careful alignment between training task distributions and real-world variation.