What machine learning role is right for you?

TL;DR

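Machine learning teams split into DevOps, data engineering, machine learning engineering, machine learning research, and data science roles that differ mainly by what they produce; machine learning engineering is the hardest to hire for, and “data scientist” is the least consistently defined title.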


Briefing

Machine learning teams hire for several distinct roles (DevOps, data engineering, machine learning engineering, machine learning research, and data science), but the boundaries blur, especially around “data scientist.” The practical difference comes down to what each role produces: DevOps engineers deploy and monitor production systems, data engineers build and maintain data pipelines, machine learning engineers train and deploy prediction systems, machine learning researchers focus on model training and exploration, and data scientists often function as a catch-all for everything from reporting to hands-on modeling.

DevOps engineers typically own the operational lifecycle of production systems. Their work product is a deployed product, and their day-to-day tools often include cloud and deployment technologies such as AWS Lambda. Data engineers, by contrast, are responsible for the plumbing that makes machine learning possible: building data pipelines, aggregating and storing data, and monitoring data quality. Their work product is usually a distributed system that enables fast access to data for training, with common tooling including Hadoop and Airflow.

Machine learning engineers bridge training and production. They train and deploy prediction models so that the resulting prediction system runs on real production data. That “bridge” requirement, deep machine learning capability plus enough software engineering skill to integrate with existing codebases and company workflows, makes the role unusually hard to hire for. People interviewed for this breakdown described it as a “rare unicorn” position and as the hardest role to fill.

Machine learning researchers share some overlap with machine learning engineers, but the distinction is often operational: researchers focus on exploration, iteration, and training models, then hand models off for deployment. In many organizations, the handoff is explicit—research produces the trained model; engineering operationalizes it.

Data scientist is the most nebulous label. No single definition dominates across companies. In some workplaces, “data scientist” means someone who can use Excel and databases, build or run trained models to answer business questions, and communicate results through reports to management. Elsewhere, the title effectively maps to what other organizations call machine learning research or machine learning engineering.

A useful way to think about fit is a 2x2 framework that places roles by software engineering skill (one axis) and machine learning knowledge (the other). The size of each role’s “bubble” reflects how much communication and technical writing matter for success. In that framing, ML DevOps is primarily a software engineering role, data engineering often looks like software engineering with the machine learning team as the “customer,” and machine learning engineering demands both strong ML depth and strong engineering execution.

Backgrounds vary widely. Machine learning engineers can come from software engineering with self-taught machine learning, or from science and engineering PhDs (often not strictly machine learning PhDs). A growing number come from structured programs such as the Google Brain fellowship or the Facebook fellowship. Data scientists can range from undergraduates and graduates of dedicated data science degree programs to people on highly technical PhD paths such as astrophysics, depending on how a company defines the title.

Cornell Notes

Machine learning hiring often splits into roles that differ by what they produce: DevOps deploys and monitors production systems, data engineers build data pipelines for training, machine learning researchers focus on model training and exploration, and machine learning engineers train and deploy prediction models into production. “Data scientist” is the least consistent title, sometimes meaning reporting and business analytics with Excel and databases, and sometimes meaning research or engineering. A 2x2 model maps roles by software engineering skill and machine learning knowledge, with communication and technical writing varying by role. The hardest role to hire for is typically machine learning engineering because it requires both state-of-the-art ML training ability and strong engineering integration skills.

How do DevOps engineers contribute to machine learning systems, and what does their “work product” look like?

DevOps engineers focus on deploying and monitoring production systems. Their work product is a deployed product, meaning they ensure the system runs reliably after it leaves the training environment. Common tooling mentioned includes AWS Lambda and other deployment tools.

What distinguishes data engineering from other ML roles?

Data engineers build and maintain data pipelines—aggregating, storing, and monitoring data. Their work product is typically a distributed system that enables fast access to data for training. Hadoop and Airflow are cited as common parts of their workflow.

Why is machine learning engineering described as unusually difficult to hire for?

Machine learning engineering requires a rare combination: deep understanding of machine learning (to train state-of-the-art models) plus enough engineering skill to integrate models into the company’s codebase and production workflow. Many people interviewed described it as the hardest role to hire for.

What’s the practical difference between machine learning research and machine learning engineering?

The line is often fuzzy, but a common pattern is that machine learning researchers handle exploration and iteration—training models and investigating data—then hand the trained model to machine learning engineers for deployment. Engineering is the function that operationalizes the model for real customer or production data.

Why does “data scientist” vary so much across companies?

The title lacks a shared definition. In some organizations, data scientists are primarily analysts who use Excel and databases, apply trained models to answer business questions, and communicate results via reports to management. In other organizations, “data scientist” effectively overlaps with machine learning research or machine learning engineering.

What kinds of backgrounds can lead to machine learning engineering or data science roles?

Machine learning engineering backgrounds range from software engineers who self-learn machine learning to science/engineering PhDs (often not strictly machine learning PhDs) and people coming from programs like the Google Brain fellowship or the Facebook fellowship. Data science backgrounds can include undergraduates and dedicated data science degree programs, but also technical PhDs such as astrophysics, depending on how the company defines the role.

Review Questions

Which role’s work product is a deployed prediction system running on real production data, and what two skill areas does it require?
How do data engineers’ responsibilities (pipelines, storage, monitoring) differ from machine learning researchers’ responsibilities (exploration, iteration, training)?
Why does the “data scientist” title create ambiguity when comparing job postings across companies?

Key Points

1. DevOps engineers own deployment and monitoring of production systems, often using tools like AWS Lambda.
2. Data engineers build and maintain data pipelines that aggregate, store, and monitor data for training, commonly with Hadoop and Airflow.
3. Machine learning engineers bridge training and production by training and deploying prediction models on real data.
4. Machine learning researchers typically focus on exploration and model training, then hand models off for deployment.
5. Machine learning engineering is widely described as the hardest role to hire for because it combines advanced ML skill with strong software engineering integration.
6. “Data scientist” is a catch-all label with inconsistent meaning, ranging from Excel-and-database reporting to hands-on ML research or engineering.
7. A 2x2 framing by software engineering skill and ML knowledge helps clarify role fit, with communication and technical writing varying by role.

Highlights

Machine learning engineering is portrayed as the “rare unicorn” role because it demands both state-of-the-art ML training ability and production-grade engineering integration.

Data engineering is the infrastructure layer: pipelines, storage, monitoring, and fast data access for training.

“Data scientist” can mean anything from business reporting with Excel to roles that look like ML research or ML engineering, depending on the company.

Topics

Machine Learning Roles
DevOps
Data Engineering
Machine Learning Engineering
Data Science Titles