3. Prioritizing ML Projects - Full Stack Deep Learning
Based on The Full Stack's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Use a 2x2 grid to prioritize ML work by combining business impact with feasibility (cost and risk).
Briefing
Picking the right machine learning projects comes down to a simple but disciplined tradeoff: pursue work that delivers high business impact while staying feasible in cost and execution risk. A practical way to frame that decision is a 2x2 grid of impact versus feasibility, where projects combining high impact with high feasibility (low cost and risk) occupy the priority quadrant. From there, mental models help identify where ML can pay off quickly. Two recurring targets stand out: places where "cheap prediction" can be applied broadly, and parts of a pipeline that currently rely on complicated, brittle manual rules.
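As an illustration, the quadrant placement can be sketched in a few lines. The project names, scores, and the 0.5 cutoff below are all hypothetical, chosen only to show the mechanics of the grid:

```python
# A minimal sketch of the impact-vs-feasibility 2x2 grid.
# Project names, scores, and the cutoff are made up for illustration.

def quadrant(impact: float, feasibility: float, cutoff: float = 0.5) -> str:
    """Place a project on the 2x2 grid; scores are normalized to [0, 1]."""
    hi_impact = impact >= cutoff
    hi_feasible = feasibility >= cutoff
    if hi_impact and hi_feasible:
        return "prioritize"            # high impact, low cost/risk
    if hi_impact:
        return "worth de-risking"      # high impact, but costly or risky
    if hi_feasible:
        return "quick win, low value"  # easy, but little payoff
    return "avoid"

projects = {
    "replace manual rules in pipeline": (0.8, 0.7),
    "moonshot end-to-end model": (0.9, 0.2),
    "minor dashboard prediction": (0.2, 0.9),
}

for name, (impact, feas) in projects.items():
    print(f"{name}: {quadrant(impact, feas)}")
```

In practice the scores are rough judgments rather than measurements; the value of the grid is forcing the impact and feasibility questions to be asked separately.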
The “cheap prediction” idea draws from the economics of AI, which argues that AI’s core shift is lowering the cost of making predictions. Because prediction is central to decision-making, cheaper prediction tends to spread into domains where it was previously too expensive to automate. In business terms, that means looking for workflows where prediction can be embedded into many decisions—so even modest accuracy improvements can translate into meaningful operational or revenue gains.
A second lens comes from “software 2.0,” associated with Andrej Karpathy’s framing: instead of writing explicit rules, teams specify goals and use data plus optimization to search for programs that achieve them. When this approach works, it tends to generalize better than hand-coded logic and can be implemented as neural-network-like programs, which opens the door to computational advantages. The implication for prioritization is straightforward: rule-heavy systems—especially those that are slow, brittle, or hard to maintain—are strong candidates for replacing hand-tuned heuristics with learned models.
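A toy contrast can make the software 2.0 idea concrete. The data and the single-threshold "program space" below are deliberately tiny and entirely made up; the point is only the shift from writing the rule by hand to specifying a goal (accuracy on labeled examples) and letting optimization search for the program:

```python
# Software 1.0: a human writes the rule explicitly.
def is_positive_v1(score: float) -> bool:
    return score > 0.7  # hand-tuned magic number

# Software 2.0 (in miniature): specify the goal and let optimization
# search a space of candidate "programs" (here, just thresholds).
def fit_threshold(examples):
    """Pick the threshold that maximizes accuracy on labeled data."""
    candidates = sorted(x for x, _ in examples)
    best_t, best_acc = 0.0, -1.0
    for t in candidates:
        acc = sum((x > t) == y for x, y in examples) / len(examples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Hypothetical labeled examples: (score, true label).
data = [(0.1, False), (0.3, False), (0.55, True), (0.8, True), (0.9, True)]
learned_t = fit_threshold(data)
print(f"learned threshold: {learned_t}")
```

Real software 2.0 systems search vastly larger program spaces (neural network weights) with gradient descent, but the division of labor is the same: the engineer supplies the goal and the data, and optimization supplies the program.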
Feasibility then becomes the other half of the equation, driven by three main cost levers. First is data availability, including not just whether data exists but how expensive it is to label. Second is the accuracy requirement: pushing performance from 99% to 99.9% can demand disproportionately more effort, because the remaining errors often come from rare cases that require collecting and labeling more “hard” examples. The cost growth with accuracy is described as super-linear, with the rough intuition that reducing error by 90% may require around 10x more data—often the dominant cost driver. Third is problem difficulty, which is hard to estimate but can be assessed using signals from published work and the compute required to reproduce results.
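The 10x intuition can be made concrete under a common power-law assumption, err(n) ≈ c · n^(−α). The exponent values here are illustrative, not from the lecture:

```python
# Rough intuition: error often falls as a power law in dataset size,
# err(n) ≈ c * n**(-alpha). Under the assumption alpha = 1 (hypothetical),
# cutting error by 90% requires roughly 10x more data.

def data_multiplier(error_reduction: float, alpha: float = 1.0) -> float:
    """Factor by which the dataset must grow to shrink error to
    (1 - error_reduction) of its current value, given err(n) = c * n**(-alpha)."""
    remaining = 1.0 - error_reduction  # e.g. 0.1 of the error left
    return remaining ** (-1.0 / alpha)

# Going from 99% to 99.9% accuracy is a 90% error reduction:
print(data_multiplier(0.9))             # ~10x more data when alpha = 1
print(data_multiplier(0.9, alpha=0.5))  # ~100x when scaling is weaker
```

The weaker-scaling case (smaller α) is what makes the tail expensive: the remaining errors come from rare cases, so each additional point of accuracy demands disproportionately more collection and labeling.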
Compute intensity matters because state-of-the-art results may rely on thousands of GPUs, making them unrealistic for smaller teams to replicate. Deployment constraints also change feasibility: large models with many parameters can become impractical in compute-restricted environments, and accuracy targets may be unattainable under those limits.
Finally, “how costly are wrong predictions?” can dominate the difficulty assessment. In safety-critical settings like self-driving, the cost of failure is so high that diminishing returns on accuracy may still be worth pursuing—or may be insufficient because reliability and robustness remain unsolved. In lower-stakes contexts, the same error rate might be acceptable, making the project more feasible.
The discussion also maps what tends to remain difficult in machine learning. Beyond supervised learning, many open problems involve complex outputs (3D reconstruction, video prediction, dialogue), reliability under out-of-distribution conditions, robust performance against adversarial attacks, and generalization beyond interpolation. Even within supervised learning, tasks like speech recognition in noisy real-world conditions, symbolic reasoning, and planning/causality are highlighted as persistent challenges. Returning to a running example of pose estimation for robotic grasping, the framework suggests it can be a strong target: the pipeline likely contains rule-based bottlenecks suited for software 2.0, accuracy needs may be manageable if failure costs are low, and while published results exist, adapting them to a specific robot and environment remains non-trivial.
Cornell Notes
Project selection for machine learning is framed as a tradeoff between impact and feasibility. High-impact work often comes from two places: embedding “cheap prediction” into decision-making and replacing brittle, hand-tuned rule systems with learned models (“software 2.0”). Feasibility is driven mainly by data availability (including labeling cost), the accuracy requirement (cost rises super-linearly as accuracy tightens), and problem difficulty (including compute demands and deployment constraints). The cost of wrong predictions can make an otherwise similar task far harder—safety-critical failures can dominate the evaluation. Using these levers helps teams decide what to build and how risky it will be to reach the needed performance.
How does the “impact vs. feasibility” 2x2 framework guide ML project selection in practice?
What does “cheap prediction” mean, and where should it influence project choice?
How does “software 2.0” differ from traditional software, and why does that matter for ML prioritization?
Why can improving accuracy from 99% to 99.9% become dramatically more expensive?
What signals help estimate problem difficulty when there’s limited prior work?
How should teams incorporate the cost of wrong predictions into feasibility?
Review Questions
- What are the three main cost drivers of ML project expense, and how does each one affect feasibility differently?
- Give two examples of how “cheap prediction” could create business impact, and explain why cheap prediction changes what’s automatable.
- Why might out-of-distribution robustness be a bigger challenge than improving average accuracy on a benchmark?
Key Points
1. Use a 2x2 grid to prioritize ML work by combining business impact with feasibility (cost and risk).
2. Target opportunities where prediction can be made cheap and used frequently in decision-making, since lower prediction costs expand where automation is viable.
3. Replace brittle, rule-based pipeline components with learned models when goals can be specified and data/optimization can find effective programs ("software 2.0").
4. Treat data availability as the primary cost driver, and include labeling cost explicitly when estimating project expense.
5. Expect accuracy requirements to drive super-linear cost growth; rare error cases often require large increases in data and labeling.
6. Estimate problem difficulty using reproducibility signals (recency of published work) and compute requirements (e.g., GPU counts) rather than relying on benchmark results alone.
7. Incorporate the domain cost of wrong predictions; safety-critical failures can make a task far harder even if average accuracy seems reachable.