Wharton & MIT Can't Agree on AI: Here's What Both Are Missing on Building Real AI Projects

5 min read

Based on AI News & Strategy Daily | Nate B Jones's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their channel.

TL;DR

MIT’s 95% failure rate and Wharton’s 75% success rate reflect different ROI definitions, time horizons, and thresholds for what counts as value.

Briefing

A 75% “success” rate and a 95% “failure” rate from two major studies on enterprise generative AI don’t contradict each other so much as they measure success against incompatible definitions. Wharton’s higher success figure reflects how executives track ROI—often through productivity, time saved, and throughput—while MIT’s failure rate comes from a much tighter standard that effectively treats projects as failures unless they can prove near-term, dollars-and-cents impact on the bottom line. The headline numbers feel mutually exclusive, but they’re built on different scorecards, different time horizons, and different thresholds for what counts as value.

That mismatch matters because it shapes how organizations decide what to fund, how to staff AI work, and what “good” looks like when results arrive unevenly. MIT’s approach pushes leadership toward a hard-nosed view of software ROI—especially important as AI tools can become dramatically more expensive per employee than prior software categories. Wharton’s approach, by contrast, mirrors how executives actually manage: they often accept operational metrics as proxies for business impact. The practical takeaway is to treat the viral top-line percentages as a starting point, not a decision framework.

To explain where enterprise AI programs succeed more steadily, the discussion points to three missing building blocks that neither study’s metrics capture well. First is institutional fluency through team-level context awareness. In this model, context engineering isn’t a narrow job for specialists; it’s a deliberate capability teams maintain. Teams articulate their domain workflows, uncertainties, and value-driving processes to AI systems so the output is locally useful. Leaders can then observe “accountable acceleration” when teams consistently apply that context in their work.

Second is AI problem-solving skill—treated as a standing complement to team fluency rather than a one-time training topic. The hard part isn’t just learning to prompt or analyze; it’s understanding how LLMs process information well enough to decompose problems into forms the model can handle. Crucially, the ownership model flips in the AI era: individual contributors must own quality and decide whether the AI’s output meets the bar. Managers and teams can hold the AI literacy and shared methods, but without individual ownership—especially the willingness to challenge the model when it’s wrong—value stalls.
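
To make the decomposition point concrete, here is a minimal hypothetical sketch (the task, function name, and steps are invented for illustration, not taken from the video) of what breaking one broad request into model-sized steps might look like:

```python
# Hypothetical sketch: one broad ask broken into narrow, model-sized steps,
# each small enough for an LLM to handle and for the owning IC to review.
def decompose_report_request(raw_request: str) -> list[str]:
    """Turn a broad request into a sequence of narrow prompts."""
    return [
        f"List the key metrics requested in: {raw_request!r}",
        "For each listed metric, draft one paragraph of analysis.",
        "Combine the paragraphs into an executive summary under 200 words.",
    ]

for step in decompose_report_request("Q3 churn vs. acquisition cost"):
    print(step)  # in practice, each step becomes a separate model call the IC checks
```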

Third is “taste,” described as a democratized quality instinct for choosing the right problems and recognizing what excellent looks like. Pre-AI organizations could centralize taste in a small priesthood of experts; AI-native speed requires pushing that judgment down to teams (or at least broader units) so they can act autonomously without sacrificing standards. Taste is not universal like core LLM skills; it’s vertical- and situation-specific, closely tied to local domain knowledge and the ability to spot where the real juice is in the profitability matrix.

Taken together, the core argument is that the path to real AI projects isn’t found in arguing over 75% versus 95%. It’s built by institutionalizing context, reconfiguring ownership and skills across individuals and teams, and socializing taste—so organizations can deliver consistent value even as study headlines keep changing.

Cornell Notes

Two enterprise AI studies report sharply different outcomes—MIT’s 95% failure rate versus Wharton’s 75% success rate—because they use incompatible ROI definitions. MIT applies a very strict standard requiring measurable dollars-and-cents bottom-line impact within a short window, while Wharton reflects executive practice using softer operational metrics like productivity, time saved, and throughput. The more actionable framework offered is “institutional fluency,” built from three capabilities: team-level context awareness, AI problem-solving skills paired with individual-level ownership, and democratized “taste” for selecting the right problems and judging quality. These elements explain why organizations can show steadier progress than headline percentages suggest.

Why do MIT’s 95% failure rate and Wharton’s 75% success rate both appear credible?

They rely on different success criteria. MIT uses an extremely tight definition of project success that effectively labels projects as failures unless teams can measure a dollars-and-cents impact on the bottom line within roughly 6–12 months. Wharton’s measure comes from executive-reported ROI practices, which often emphasize productivity, time saved, and throughput rather than immediate profit attribution. Same population of large companies, different scorecards—so the numbers aren’t directly comparable.
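
A toy calculation makes the gap visible. All numbers below are hypothetical, chosen only to show how one pilot can clear the Wharton-style bar while failing the MIT-style one:

```python
# Hypothetical numbers: one AI pilot, scored against two different ROI rubrics.
HOURLY_COST = 75              # assumed loaded cost per employee-hour
hours_saved_per_week = 120    # productivity proxy an executive survey might count
weekly_tool_cost = 2_000      # AI licensing spend for the team

# Wharton-style scorecard: operational proxies such as time saved count as value.
productivity_value = hours_saved_per_week * HOURLY_COST      # $9,000/week
wharton_verdict = productivity_value > weekly_tool_cost      # True -> "success"

# MIT-style scorecard: only dollars attributed to the P&L within ~6 months count.
attributed_pnl_impact = 0     # nothing yet traced to a specific revenue or cost line
mit_verdict = attributed_pnl_impact > weekly_tool_cost * 26  # False -> "failure"

print(wharton_verdict, mit_verdict)  # True False: same project, opposite headlines
```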

What does “institutional fluency” mean in practical terms for AI adoption?

It’s the organization’s ability to repeatedly produce useful AI outcomes by building capabilities that persist beyond individual employees. The discussion frames teams as the atomic unit: teams have stable domain ownership and vertical responsibilities, and fluent organizations help teams deliberately maintain the context they can articulate to AI systems. When teams consistently feed the right local context, leaders can observe accountable acceleration and count it as success.

How does context engineering change when it becomes a team-level responsibility?

Instead of treating context as a specialized job, the model treats context as something teams must maintain. Teams deeply understand how their domain works—value-driving workflows, unique processes, and areas of uncertainty—then translate that into instructions and artifacts that LLMs can use. Without team-level context articulation, other improvements (tools, prompts, workflows) struggle because outputs won’t fit local needs.
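
As a concrete illustration of “instructions and artifacts that LLMs can use,” here is a minimal sketch, assuming a team keeps its articulated context as a versioned data structure that gets flattened into a system prompt. The class, fields, and example team are all invented for illustration:

```python
# Minimal sketch (all names hypothetical): a team-maintained context artifact
# that is injected into every LLM call the team makes.
from dataclasses import dataclass, field

@dataclass
class TeamContext:
    domain: str                                                   # what the team owns
    value_drivers: list[str] = field(default_factory=list)        # workflows that move the needle
    known_uncertainties: list[str] = field(default_factory=list)  # where the model should hedge

    def to_system_prompt(self) -> str:
        # Flatten the team's articulated context into model-usable instructions.
        return "\n".join([
            f"You assist the {self.domain} team.",
            "Value-driving workflows: " + "; ".join(self.value_drivers),
            "Flag uncertainty when touching: " + "; ".join(self.known_uncertainties),
        ])

claims_team = TeamContext(
    domain="insurance claims intake",
    value_drivers=["first-notice-of-loss triage", "fraud-flag routing"],
    known_uncertainties=["state-specific regulation", "legacy policy codes"],
)
print(claims_team.to_system_prompt())
```

Keeping an artifact like this in version control, rather than in individual chat histories, is one way the context can persist beyond any single employee.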

What’s the proposed flip in ownership and skills for AI problem solving?

Traditional setups often let managers set quality standards while individuals contribute information fluency. In the AI era, the discussion argues that individual contributors must index highly on ownership: they must assess whether the AI’s output meets the bar and insist on correction when it doesn’t. Meanwhile, AI literacy and problem-solving skills can reside at the team level—shared via prompts, brown-bag sessions, and reusable artifacts—so knowledge spreads even as accountability stays with the individual producing the work.

What is “taste,” and why can’t it stay centralized in an AI-native company?

Taste is the ability to pick the right problems and recognize what excellent looks like—both in product quality and in problem selection (which “spicy” problems to solve). Pre-AI organizations could delegate taste to a small expert group, but AI-native speed requires democratizing that judgment so teams can move autonomously. Taste is also not universal: it depends on local domain knowledge and the organization’s specific profitability matrix.

Review Questions

  1. How do MIT’s and Wharton’s ROI measurement approaches differ, and why does that make their headline percentages non-comparable?
  2. In the proposed AI problem-solving model, what must individual contributors do that managers previously could handle at the team level?
  3. Why is “taste” described as both essential and non-universal across organizations?

Key Points

  1. MIT’s 95% failure rate and Wharton’s 75% success rate reflect different ROI definitions, time horizons, and thresholds for what counts as value.
  2. MIT’s tighter standard demands near-term, bottom-line dollar impact, while Wharton’s success metric aligns with executive use of productivity, time saved, and throughput.
  3. Enterprise AI success depends less on arguing over headline numbers and more on building “institutional fluency” that persists across teams.
  4. Team-level context awareness is treated as the foundation: teams must deliberately articulate domain workflows, uncertainties, and value drivers to AI systems.
  5. AI problem solving requires a skills/ownership inversion: AI literacy can be shared at the team level, but quality ownership must sit with individual contributors.
  6. Democratized “taste” helps teams choose the right problems and judge quality, enabling autonomy without sacrificing standards.
  7. Taste is vertical- and situation-specific, so it must be socialized locally rather than assumed as a universal capability.

Highlights

  • MIT’s methodology effectively labels projects as failures unless they can show measurable bottom-line dollar impact within about 6–12 months, making its bar far stricter than typical software ROI tracking.
  • Wharton’s 75% success figure tracks executive-defined ROI proxies—productivity, time saved, and throughput—rather than immediate profit attribution.
  • The proposed enterprise fix is “institutional fluency”: team-level context, individual-level ownership for quality, and democratized taste for selecting and judging work.
  • In AI-native work, individual contributors must be willing to challenge the model’s output; shared team skills aren’t enough without personal accountability.

Topics

  • Enterprise AI ROI
  • Institutional Fluency
  • Context Engineering
  • AI Problem Solving
  • Ownership and Skills
  • Taste and Quality
