Lecture 08: ML Teams and Project Management (FSDL 2022)
Based on The Full Stack's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their channel.
ML product success depends on team structure, hiring specificity, probabilistic planning, and product design for failure—not on model performance alone.
Briefing
Machine-learning product teams face a structural problem: ML adds uncertainty, scarce talent, and stakeholder misunderstanding on top of the usual challenges of building software products. The core takeaway is that success depends less on any single model technique and more on how roles are organized, how hiring is targeted, how timelines are managed under uncertainty, and how products are designed to match users’ expectations to what models can actually do in production.
The lecture breaks down the interdisciplinary staffing needed to ship ML systems. Common roles include:
- ML product managers: prioritize work with business, users, and stakeholders; produce design docs, wireframes, and work plans using tools like Jira and Notion.
- ML Ops/ML platform teams: build shared infrastructure and tooling to deploy and scale models, often integrating with AWS, Kafka, and vendor tooling.
- ML engineers: own the end-to-end lifecycle of a model in production (training, packaging, deployment, and ongoing maintenance), using tools such as TensorFlow and Docker.
- ML researchers: focus on trained models and prototypes that may not yet be production-critical, often delivering reports or code repos and working in tools like Jupyter notebooks.
- Data scientists: a catch-all title that can mean analytics for business questions or overlap with ML research and engineering, depending on the organization.
Hiring is treated as a high-stakes design choice. The “unicorn ML engineer” job description—requiring PhDs, years of TensorFlow, publications, and large-scale distributed systems experience—rarely matches real candidates. Instead, the lecture recommends being specific about what matters for the role: for many ML engineering needs, software engineering strength is the primary filter, with enough ML background to learn and operate effectively. For ML researchers, it argues for prioritizing quality over quantity of publications (judging the creativity and applicability of one or two strong works rather than counting papers), and for looking for candidates with an independent sense of what problems matter—sometimes using adjacent-field backgrounds as a signal. It also emphasizes sourcing beyond standard channels: tracking first authors of promising papers, recruiting from high-quality re-implementations, and leveraging conferences.
Team structure is framed as a maturity curve. At the "nascent/ad hoc" stage, ML efforts target low-hanging fruit but suffer from weak infrastructure, difficulty retaining talent, and limited leadership buy-in. In an "ML R&D" stage, ML sits in research, often with prototypes and long-term bets, but can get stuck due to data access problems and weak translation into business value. "Embedded ML in product teams" improves feedback loops and business impact but can create resource and hiring friction and conflicts with software delivery norms. A "centralized ML function" boosts talent density, data/compute access, and tooling investment, but introduces handoff friction to production. The end goal is an "ML-first organization," where centralized expertise supports line-of-business teams that deliver quick wins.
Managing ML projects and expectations is especially difficult because progress is non-linear and timelines are probabilistic. The lecture recommends probabilistic project planning—assigning probabilities to task completion, running alternative approaches in parallel when prerequisites unlock multiple paths, and avoiding critical paths that assume research will succeed. It also stresses cultural alignment between research and engineering, frequent quick wins, and educating leadership on ML’s probabilistic nature and why accuracy metrics alone don’t communicate business outcomes.
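To make the parallel-approaches idea concrete, here is a minimal sketch (not from the lecture; the approach names, probabilities, and durations are illustrative assumptions) of how per-task uncertainty can be turned into a probability of hitting a milestone, and a timeline reported as percentiles rather than a single date.

```python
import random

# Hypothetical approaches toward the same milestone, each with an assumed
# probability of succeeding this quarter (illustrative numbers only).
approaches = {"fine_tune_baseline": 0.60, "novel_architecture": 0.25, "vendor_model": 0.50}

# Running alternatives in parallel: P(at least one succeeds) = 1 - prod(1 - p_i).
p_all_fail = 1.0
for p in approaches.values():
    p_all_fail *= (1.0 - p)
print(f"P(at least one approach succeeds) = {1.0 - p_all_fail:.2f}")  # 0.85 vs 0.60 for the best single bet

# Monte Carlo over uncertain task durations (min/max days, assumed uniform),
# so the plan is communicated as a distribution rather than a single date.
tasks = [("data_pipeline", 5, 10), ("label_collection", 3, 12), ("model_training", 4, 15)]

def simulate_once() -> float:
    return sum(random.uniform(lo, hi) for _, lo, hi in tasks)

samples = sorted(simulate_once() for _ in range(10_000))
p50 = samples[len(samples) // 2]
p90 = samples[int(len(samples) * 0.9)]
print(f"Timeline: ~{p50:.0f} days (median), ~{p90:.0f} days (90th percentile)")
```

The specific numbers are made up; the point is the habit of planning against a distribution. Under these assumptions the best single bet succeeds 60% of the time, while three parallel attempts succeed roughly 85% of the time, which is exactly the argument for not putting unproven research on the critical path.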
Finally, product design must bridge the gap between user mental models and model reality. The lecture uses a “trained dog” analogy: ML systems can solve hard puzzles but fail in strange ways, generalize narrowly, and need feedback and guardrails. Good ML product design therefore explains benefits and limitations, supports human-in-the-loop control, uses confidence-based fallbacks, and builds feedback loops—ranging from implicit behavioral signals to explicit thumbs up/down and, when feasible, user correction inside the workflow—to continuously improve the system after deployment.
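As a rough illustration of confidence-based fallbacks and feedback capture (a sketch under assumed names and thresholds, not code from the lecture), the serving path of such a product might look like this:

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_THRESHOLD = 0.8  # assumed cutoff; tune against real error and review-cost data

@dataclass
class Prediction:
    label: str
    confidence: float

def serve(request_text: str, model) -> dict:
    """Return the model's answer when confident; otherwise fall back to a safe path."""
    pred: Prediction = model.predict(request_text)  # hypothetical model interface
    if pred.confidence >= CONFIDENCE_THRESHOLD:
        return {"answer": pred.label, "source": "model"}
    # Guardrail: low confidence routes to human review (or a conservative default)
    # instead of presenting a likely-wrong answer as if it were certain.
    return {"answer": None, "source": "human_review_queue"}

def record_feedback(request_id: str, signal: str, correction: Optional[str] = None) -> None:
    """Store a user feedback event for later analysis and retraining.

    signal: "implicit" (e.g. the user ignored the suggestion), "explicit"
    (thumbs up/down), or "correction" (the user fixed the output in the workflow).
    """
    # In a real system this would append to a feedback store or labeling queue.
    print({"request_id": request_id, "signal": signal, "correction": correction})
```

The design choice being illustrated is that a low-confidence prediction degrades to a safe path rather than being shown as if it were certain, and every interaction produces a signal that can feed retraining.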
Cornell Notes
Machine-learning product success hinges on organization and process, not just model quality. The lecture maps out roles (ML PM, ML Ops/platform, ML engineers, ML researchers, data scientists), argues for targeted hiring over “unicorn” profiles, and shows how team structure evolves from ad hoc ML to an ML-first organization. ML projects demand probabilistic planning because progress is non-linear and research-like work can fail; timelines and success criteria must reflect that uncertainty. On the product side, users often expect superhuman intelligence, but ML behaves more like a trained system with failure modes—so products must add guardrails, human-in-the-loop options, and feedback loops that improve models using real user signals.
How do ML roles differ in what they produce and what they own?
Why is the “unicorn ML engineer” hiring description often the wrong approach?
What distinguishes task ML engineers from platform ML engineers?
How should ML project planning differ from traditional software planning?
What are the main organizational archetypes for ML teams, and what trade-offs do they create?
How should ML product design bridge the gap between user expectations and model reality?
Review Questions
- Which hiring signals does the lecture recommend prioritizing for ML researchers, and why does it caution against counting publications?
- How does probabilistic project planning change how teams decide what to do next when ML progress stalls?
- Compare embedded ML teams versus centralized ML functions: what friction shifts between research, engineering, and production in each model?
Key Points
1. ML product success depends on team structure, hiring specificity, probabilistic planning, and product design for failure, not on model performance alone.
2. ML engineers typically own the full production lifecycle (training, packaging, deployment, maintenance), while ML Ops/platform teams focus on shared infrastructure and tooling.
3. Avoid "unicorn" ML engineer job descriptions; hire for the dominant practical skill (often software engineering for deployment-heavy roles) and require only the ML basics needed to succeed.
4. ML project timelines should be treated as uncertain; probabilistic planning and parallel exploration of alternatives help manage non-linear progress and research-like failure rates.
5. Organizational maturity matters: nascent, R&D, embedded, centralized, and ML-first structures each trade off data access, talent density, and production handoff friction.
6. ML product design must match user expectations to model reality using guardrails, human-in-the-loop options, confidence-based fallbacks, and clear communication of limitations.
7. Feedback loops (implicit, direct implicit, explicit, and user correction when feasible) are the mechanism for improving ML systems after deployment.