Jeremy Howard on Platform.ai and Fast.ai (Full Stack Deep Learning - March 2019)
Based on The Full Stack's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing.
Briefing
Jeremy Howard argues that “augmented machine learning” (tight human–computer collaboration) beats fully automated ML pipelines for most practical problems. The fastest path to strong results, he says, usually combines what humans do well (rapid perception, similarity/difference judgment, and targeted labeling) with what computers do well (speed, memory, and large-scale optimization). He frames this as a direct challenge to AutoML’s goal of minimizing human involvement: until computers become better than humans at everything, a far-off scenario, the best systems will keep humans in the loop.
Howard illustrates the point with Platform.ai, a workflow designed to make labeling and dataset building dramatically more efficient. The system starts with unlabeled or minimally labeled images (cars, faces, and other categories) and uses interactive projections to let humans quickly find clusters and outliers. Humans rapidly identify where examples look similar, then zoom in to spot differences—turning perception research into a practical labeling strategy. When the model struggles (for example, distinguishing car fronts from backs), the workflow adapts: humans provide a few examples, and the system generates “find similar” results using a pretrained ImageNet model. The interaction becomes a kind of visual dialogue—humans indicate what they’re trying to separate, while the model proposes projections that maximize the difference between chosen groups. With surprisingly few labels (on the order of hundreds), the system trains a classifier that reaches high accuracy (he cites 92% in the car example) and produces better embeddings after each round.
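The talk itself shows no code, but the “find similar” step can be pictured as nearest-neighbor search over pretrained ImageNet embeddings. Below is a minimal sketch in PyTorch; the ResNet-34 backbone, the pooling choice, and the `embed`/`find_similar` helpers are illustrative assumptions, not Platform.ai’s actual implementation.

```python
# Hypothetical sketch of "find similar" over pretrained ImageNet embeddings.
import torch
import torchvision.models as models

# Pretrained backbone with the classification head removed, so the model
# outputs a pooled feature vector per image instead of class scores.
backbone = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()
backbone.eval()

@torch.no_grad()
def embed(images: torch.Tensor) -> torch.Tensor:
    """images: (N, 3, 224, 224) normalized batch -> (N, 512) unit embeddings."""
    feats = backbone(images)
    return torch.nn.functional.normalize(feats, dim=1)  # unit length for cosine

@torch.no_grad()
def find_similar(query: torch.Tensor, corpus: torch.Tensor, k: int = 20):
    """Indices of the k corpus embeddings most similar to a (512,) query."""
    sims = corpus @ query  # cosine similarity, since both sides are normalized
    return sims.topk(k).indices
```

A few human-chosen examples become queries, their nearest neighbors come back for quick confirm-or-reject labeling, and each retraining round improves the embeddings the search runs over.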
He extends the same idea beyond labeling. In a face dataset, the first projection already separates men and women well enough that bulk selection becomes feasible; later projections reveal additional structure (like sunglasses as an emergent category). The payoff is iterative: improved models yield better projections, which speed up the next labeling cycle. Howard also connects this approach to broader research on “human-augmented” training, emphasizing that studying human strengths—especially perception—can lead to better ML systems than trying to remove humans entirely.
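One plausible reading of “projections that maximize the difference between chosen groups” is a two-axis view: one axis along the difference of the two groups’ mean embeddings, the other the leading principal component of what remains. The sketch below is a guess at the idea, not Platform.ai’s actual projection code.

```python
# Hypothetical "difference-maximizing" 2-D projection over image embeddings.
import torch

def difference_projection(emb: torch.Tensor, idx_a, idx_b) -> torch.Tensor:
    """emb: (N, D) embeddings; idx_a/idx_b: indices of two human-chosen groups.
    Returns (N, 2) coordinates for an interactive scatter plot."""
    # Axis 1: unit vector pointing from group A's mean to group B's mean,
    # so the two chosen groups land at opposite ends of the x-axis.
    axis1 = emb[idx_b].mean(0) - emb[idx_a].mean(0)
    axis1 = axis1 / axis1.norm()
    x = emb @ axis1

    # Axis 2: top principal component of the variation left after removing
    # the axis-1 component, to spread out the remaining structure vertically.
    residual = emb - x.unsqueeze(1) * axis1
    _, _, v = torch.pca_lowrank(residual, q=1)
    y = residual @ v[:, 0]
    return torch.stack([x, y], dim=1)
```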
From there, the conversation shifts to how data scientists can get strong results quickly without massive compute or exhaustive hyperparameter searches. Howard highlights research from Fast.ai’s ecosystem and its fellowship program, where the central theme is “spend a little human time instead of a lot of GPU time.” Examples include the learning-rate finder (a short sweep that raises the learning rate over a few hundred mini-batches and watches the loss to pick a good value) and the finding that, for transfer learning, default hyperparameters often work nearly as well as elaborate tuning. He describes classroom outcomes where beginners frequently reach near-perfect validation accuracy with only 100–200 images after using transfer learning and sensible defaults.
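The learning-rate finder is Leslie Smith’s LR range test, exposed in fastai as a one-liner (`learn.lr_find()`). A from-scratch sketch of the idea, with illustrative defaults, might look like this:

```python
# Minimal learning-rate finder sketch: sweep the LR exponentially over a few
# hundred mini-batches, record the loss, and stop once the loss diverges.
import torch

def lr_find(model, loss_fn, loader, lr_min=1e-7, lr_max=10.0, steps=200):
    opt = torch.optim.SGD(model.parameters(), lr=lr_min)
    gamma = (lr_max / lr_min) ** (1.0 / steps)  # multiplicative LR step per batch
    lrs, losses, batches = [], [], iter(loader)
    for _ in range(steps):
        try:
            xb, yb = next(batches)
        except StopIteration:            # small dataset: cycle through it again
            batches = iter(loader)
            xb, yb = next(batches)
        loss = loss_fn(model(xb), yb)
        opt.zero_grad()
        loss.backward()
        opt.step()
        lrs.append(opt.param_groups[0]["lr"])
        losses.append(loss.item())
        if losses[-1] > 4 * min(losses):  # loss has blown up; end the sweep
            break
        for group in opt.param_groups:
            group["lr"] *= gamma
    return lrs, losses
```

Plot loss against learning rate on a log axis and pick a value where the loss is still falling steeply, typically an order of magnitude or so below the loss minimum.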
He then stacks additional practical accelerators: test-time augmentation (TTA), which averages predictions over multiple inference-time transforms; progressive resizing, which trains first on smaller images and later on larger ones, speeding training while improving generalization; and “one-cycle” learning-rate schedules paired with coordinated momentum changes to train faster and more reliably. He also covers reliability tricks for transfer learning (freezing pretrained layers at first, then giving newly initialized layers higher learning rates) and optimizer details such as correctly decoupled weight decay (AdamW) and ways to prevent Adam instability via gradient clipping or a larger epsilon. The overall message is consistent: strong ML results come from smart defaults, efficient training heuristics, and deliberate human–computer interaction rather than brute-force automation or compute-heavy search.
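Most of these heuristics are now stock PyTorch. The sketch below combines decoupled weight decay (`torch.optim.AdamW`), a one-cycle schedule (`torch.optim.lr_scheduler.OneCycleLR`, which also cycles momentum/betas), a higher learning rate for the new head than for the pretrained body, gradient clipping, and flip-based TTA. The `model.body`/`model.head` split and every hyperparameter value here are assumptions for illustration.

```python
# Hypothetical fine-tuning sketch: AdamW + one-cycle + discriminative LRs
# + gradient clipping, plus a simple test-time augmentation helper.
import torch
from torch.optim.lr_scheduler import OneCycleLR

def fine_tune(model, loss_fn, loader, epochs=5, max_lr=1e-3):
    opt = torch.optim.AdamW(
        [
            {"params": model.body.parameters(), "lr": max_lr / 10},  # pretrained
            {"params": model.head.parameters(), "lr": max_lr},       # new layers
        ],
        weight_decay=1e-2,
        eps=1e-5,  # larger than the 1e-8 default; one way to stabilize Adam
    )
    sched = OneCycleLR(opt, max_lr=[max_lr / 10, max_lr],
                       epochs=epochs, steps_per_epoch=len(loader))
    for _ in range(epochs):
        for xb, yb in loader:
            loss = loss_fn(model(xb), yb)
            opt.zero_grad()
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip grads
            opt.step()
            sched.step()  # one-cycle LR, with momentum cycled oppositely

@torch.no_grad()
def predict_tta(model, xb):
    """Test-time augmentation: average predictions over simple transforms."""
    views = [xb, torch.flip(xb, dims=[3])]  # original + horizontal flip
    return torch.stack([model(v).softmax(dim=1) for v in views]).mean(0)
```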
Cornell Notes
Jeremy Howard argues that “augmented machine learning” outperforms fully automated ML because humans excel at perception tasks like spotting similarities/differences and making targeted judgments, while computers excel at speed and optimization. He demonstrates this with Platform.ai, an interactive labeling and training workflow that uses visual projections, “find similar,” and difference-maximizing views to separate classes (cars, faces) with only a few hundred labels, reaching high accuracy quickly. He then broadens the theme to practical training: transfer learning with strong defaults often beats massive hyperparameter grids, and quick techniques like learning-rate finding, test-time augmentation (TTA), progressive resizing, and one-cycle schedules can yield state-of-the-art results on a single GPU. The key takeaway is that small, human-guided steps plus well-chosen training heuristics can replace expensive trial-and-error compute.
- Why does Howard say AutoML is the wrong target, and what alternative does he propose?
- How does Platform.ai speed up labeling compared with traditional annotation workflows?
- What role do pretrained ImageNet models play in the car and face examples?
- What does Howard claim about hyperparameter search for transfer learning?
- Which training heuristics does Howard highlight for speed and reliability, and what do they do?
- What optimizer-related issues does Howard mention for Adam/AdamW, and how can they be mitigated?
Review Questions
- In Howard’s Platform.ai workflow, what specific human actions trigger the system to generate new projections or retrieve similar images?
- Why does progressive resizing both speed up training and often improve results, according to Howard?
- What evidence does Howard cite that default hyperparameters plus a learning-rate finder can outperform or match expensive hyperparameter grids in transfer learning?
Key Points
1. Augmented machine learning—pairing human perception and judgment with computer speed—can outperform fully automated AutoML for most near-term tasks.
2. Platform.ai accelerates dataset creation by using interactive visual projections that let humans rapidly find similarity regions, zoom into differences, and iteratively refine separation.
3. A small number of human labels (hundreds) can drive large improvements when the system uses pretrained ImageNet embeddings and then retrains after each interaction cycle.
4. For transfer learning, large hyperparameter grid searches are often unnecessary because defaults plus a learning-rate finder frequently land near optimal performance.
5. Practical speedups include test-time augmentation (TTA), progressive resizing (train small, then scale up; sketched after this list), and one-cycle learning-rate schedules with coordinated momentum changes.
6. Training reliability improves when newly initialized layers are trained more aggressively (via freezing or layer-specific learning rates) and when optimizers like AdamW are implemented with correctly decoupled weight decay.
7. Adam-related instability can be mitigated using gradient clipping or by adjusting epsilon (eps), especially during long training runs.
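Progressive resizing (key point 5) is simple enough to sketch directly. Everything below, from the `FakeData` stand-in dataset to the size/epoch schedule, is an illustrative assumption; the only requirement is a model that accepts variable input sizes, such as a ResNet with adaptive pooling.

```python
# Hypothetical progressive-resizing sketch: train a few epochs on small
# images, then continue on larger ones for speed plus generalization.
import torch
import torchvision.transforms as T
from torch.utils.data import DataLoader
from torchvision.datasets import FakeData  # stand-in for a real image dataset

def make_loader(size: int, batch_size: int = 64) -> DataLoader:
    tfms = T.Compose([T.Resize((size, size)), T.ToTensor()])
    return DataLoader(FakeData(transform=tfms), batch_size=batch_size, shuffle=True)

def train_progressively(model, loss_fn, schedule=((128, 3), (224, 2))):
    """schedule: (image_size, epochs) pairs, smallest size first."""
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    for size, epochs in schedule:
        loader = make_loader(size)  # same data, higher resolution each stage
        for _ in range(epochs):
            for xb, yb in loader:
                loss = loss_fn(model(xb), yb)
                opt.zero_grad()
                loss.backward()
                opt.step()
```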