Lecture 01: When to Use ML and Course Vision (FSDL 2022)

The Full Stack · 6 min read

Based on The Full Stack's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.

TL;DR

ML products need an outer loop—deploy, measure with real users, collect data, and iterate—because offline metrics don’t guarantee real-world performance.

Briefing

Machine learning is moving into the mainstream, but the real challenge isn’t getting models to work—it’s deciding when ML is worth the added complexity and then running an iterative loop that keeps performance aligned with real users. Full Stack Deep Learning frames the course around that gap: research-style model development (“flat earth ML”) often stops after a good offline metric, while production ML products require an outer loop that measures behavior in the wild, collects new data, and retrains or redesigns until the system stays useful.

The lecture starts by placing today’s ML boom in context. In 2018, standout systems like early language models were impressive but hard to apply; now there’s broader standardization and easier deployment. Model training is increasingly commoditized through tools and libraries (including Hugging Face-style workflows), deployment can be done with minimal code, and frameworks such as Keras and PyTorch Lightning reduce the “spaghetti code” burden. At the same time, MLOps has emerged as a discipline for deploying and maintaining models at scale, reflecting the field’s shift from prototypes to operational systems.

Avoiding an “AI winter,” the argument goes, depends less on research alone and more on translating progress into real products. That translation demands a different process than academic work: after deployment, teams must monitor real-world performance, gather feedback, and build a data flywheel so the model improves as the product learns. The course positions itself as end-to-end product building rather than a deep dive into theory or training math.

From there, the lecture pivots to the first practical question for any ML project: should ML be used at all? ML projects carry a higher failure rate than traditional software efforts, driven by common breakdowns—models that are technically feasible but too slow or expensive to ship, teams that can build models but can’t deploy them, organizational misalignment on what “success” means, and projects that solve the wrong problem or deliver insufficient business value. A key takeaway is that the value must outweigh not only development cost but also the technical debt and ongoing complexity ML introduces.

To decide readiness, the lecture recommends exhausting simpler options first. Teams should ask whether rules or basic statistics could achieve most of the benefit, whether the organization can collect and store the needed data, and whether the team and ethics are aligned. If ML is still the right choice, feasibility depends on impact and cost. High-impact opportunities often come from reducing prediction cost (making decisions feasible at scale), lowering user friction, or replacing brittle rule systems with learned behavior. Cost drivers include data availability (including labeling and stability), accuracy requirements (which can raise costs super-linearly as targets tighten), and intrinsic problem difficulty.

Finally, the lecture outlines how ML product work proceeds through a lifecycle: planning, data collection and labeling, training and debugging, and deployment and monitoring—each stage feeding back into the others when real-world constraints break assumptions. Using a running example of pose estimation for robotics, it emphasizes that offline success doesn’t guarantee downstream success; metrics, data, and requirements often need revision after testing in realistic environments. The overarching message: start with the right problem, ship early enough to learn, and build the operational loop that keeps ML systems effective over time.

Cornell Notes

Machine learning is increasingly easy to build, but ML products fail when teams treat model development like an endpoint. The lecture argues that success requires an outer loop: deploy the model, measure real-world performance, collect new data, and iterate so the system stays aligned with users. Before starting, teams should ask whether ML is necessary at all—ML adds complexity and technical debt, so the project’s value must outweigh that cost. Feasibility then depends on impact and cost, with data availability, accuracy requirements, and problem difficulty acting as major cost drivers. Finally, ML work follows a lifecycle (planning → data → training → deployment/monitoring) where each phase can force changes to the others based on what happens in production.

Why does the lecture treat “flat earth ML” as insufficient for real products?

Offline ML workflows often end after a model hits good metrics on a dataset, producing a report or notebook and moving on. In production, the model’s inputs and user behavior shift, so performance can degrade. The lecture frames ML products as needing an outer loop: deploy the model, measure how it performs with real users, collect real-world data, and use that data to retrain or redesign. This creates a data flywheel where better models improve the product, which attracts more users, which generates more data for the next training cycle.
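
A minimal sketch of that outer loop, with placeholder functions standing in for whatever deployment, monitoring, and labeling tooling a team actually uses (none of these names come from the lecture or a real library):

    import random

    def deploy(model):
        """Placeholder: in practice, ship the model behind the product."""
        pass

    def measure_live_performance(model):
        """Placeholder: in practice, compute metrics from real user traffic
        and return the raw interactions for the next training round."""
        return random.random(), ["new user interaction"]

    def select_and_label(new_data):
        """Placeholder: pick the most informative points and get them labeled."""
        return new_data

    def retrain(model, labeled_data):
        """Placeholder: retrain or fine-tune on the freshly labeled data."""
        return model

    def outer_loop(model, acceptable_score, max_iterations=5):
        """Schematic deploy -> measure -> collect -> retrain cycle."""
        for _ in range(max_iterations):
            deploy(model)
            score, new_data = measure_live_performance(model)
            if score >= acceptable_score:
                continue  # still aligned with users; keep monitoring
            labeled = select_and_label(new_data)
            model = retrain(model, labeled)  # flywheel: more usage -> more data -> better model
        return model

The stubs are throwaway; the point is the shape of the cycle, in which each pass narrows the gap between offline metrics and what real users experience.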

What are the most common ways ML projects fail, even when the model looks promising?

The lecture highlights several failure modes: (1) technically infeasible or poorly scoped projects that take too long to deliver value; (2) teams that can build models but aren’t the right team to deploy and operate them in production; (3) organizational misalignment—everyone agrees on the model’s offline metric, but not on whether the system is acceptable to run for users; and (4) solving a problem that isn’t big enough—organizationally, the added complexity of ML isn’t justified by the incremental value.

How should teams decide whether ML is worth using at all?

Teams should start by exhausting simpler approaches: ask whether rules or simple statistics could capture most of the benefit. They should also check readiness for data collection (do they already collect the needed data, and can they store it in a usable way?) and confirm the team can support the work. The lecture also adds an ethics check—whether it’s appropriate to use ML for the specific problem—before committing to a complex system.
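
As a minimal illustration of the "rules first" check (the task, keywords, and toy data below are hypothetical, not from the lecture), a hand-written heuristic can be scored on the same labeled data an ML model would be evaluated against:

    def rule_based_spam_filter(message: str) -> bool:
        """Flag a message as spam using simple keyword rules."""
        suspicious = ("free money", "click here", "limited offer")
        return any(phrase in message.lower() for phrase in suspicious)

    def accuracy(predict, labeled_messages):
        """Fraction of (message, is_spam) pairs the predictor gets right."""
        correct = sum(predict(msg) == is_spam for msg, is_spam in labeled_messages)
        return correct / len(labeled_messages)

    toy_data = [
        ("Free money!!! Click here now", True),
        ("Meeting moved to 3pm tomorrow", False),
        ("Limited offer just for you", True),
        ("Can you review my pull request?", False),
    ]
    print(accuracy(rule_based_spam_filter, toy_data))  # 1.0 on this toy set

If a baseline like this already captures most of the value, the added complexity and technical debt of an ML system may not be worth it.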

What makes ML projects expensive as accuracy requirements rise?

The lecture claims project cost tends to scale super-linearly with accuracy targets. A rough rule of thumb from the lecture: each additional "nine" of accuracy (e.g., going from 99.9% to 99.99%) can increase costs by around 10×, because it often requires much more data as well as additional infrastructure, such as monitoring, to verify the model actually meets the stricter target in production.
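
As a toy calculation of that rule of thumb (an illustration only, not a real cost model from the lecture):

    import math

    def cost_multiplier(baseline_accuracy: float, target_accuracy: float) -> float:
        """Illustrative: each extra 'nine' of accuracy multiplies cost by ~10x."""
        def nines(acc: float) -> float:
            return -math.log10(1 - acc)  # 0.999 -> 3 nines, 0.9999 -> 4 nines
        return 10 ** (nines(target_accuracy) - nines(baseline_accuracy))

    print(round(cost_multiplier(0.999, 0.9999)))  # ~10x for one extra nine
    print(round(cost_multiplier(0.99, 0.9999)))   # ~100x for two extra nines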

What does “data flywheel” mean in the context of ML-powered products?

A data flywheel is a virtuous cycle connecting model quality, product quality, and data generation. As the model improves, the product becomes better, which brings in more users. More users generate more data, which can be labeled or otherwise processed to train an even better model. The lecture emphasizes that the links must be operational: teams need a data loop to collect and select data points for labeling, and they must ensure improved predictions actually improve the product experience.

How does the ML project lifecycle work, and why does it loop?

The lifecycle is planning → data collection/labeling → training/debugging → deployment/monitoring. It’s iterative: early assumptions can break later. For example, teams may discover labeling is too hard, data collection is insufficient, offline metrics don’t match downstream success, or real-world performance is worse due to train/test mismatch. Those findings feed back to earlier stages, changing requirements, data strategy, and even the chosen evaluation metrics.

Review Questions

  1. What outer-loop activities must happen after deployment to keep an ML product aligned with real user behavior?
  2. Which three cost drivers does the lecture emphasize for ML projects, and how does each affect feasibility?
  3. How do the three ML product archetypes (software 2.0, human-in-the-loop, autonomous systems) change the kinds of questions teams should ask before building?

Key Points

  1. ML products need an outer loop—deploy, measure with real users, collect data, and iterate—because offline metrics don’t guarantee real-world performance.
  2. Before using ML, teams should exhaust simpler options like rules or statistics and confirm they can collect and store the needed data.
  3. ML projects often fail due to technical infeasibility, deployment/team mismatch, organizational misalignment on success criteria, or insufficient business value.
  4. Feasibility should be judged using impact vs. cost, with data availability, accuracy requirements, and intrinsic problem difficulty as major cost drivers.
  5. Tightening accuracy targets can raise costs super-linearly, often requiring more data and stronger monitoring to maintain performance.
  6. ML work follows a lifecycle (planning → data → training → deployment/monitoring) where each phase can force changes to earlier assumptions.
  7. Avoid tool fetishization: teams don’t need perfect infrastructure to start, but they do need the right problem and a practical path to production learning.

Highlights

The lecture’s central production lesson: good offline performance isn’t enough; ML products require continuous measurement and a data flywheel after deployment.
ML complexity creates technical debt faster than traditional software, especially when predictions influence other systems and when data dependencies are expensive to maintain.
Accuracy requirements can drive costs dramatically—tightening targets can multiply effort due to data and monitoring needs.
Project feasibility is hard to predict, so teams should start with a minimum viable model (even non-ML baselines) and validate success criteria with real constraints.
Different product archetypes (software 2.0, human-in-the-loop, autonomous systems) demand different feasibility questions, especially around acceptable failure rates and data loops.
