
OpenAI Five

OpenAI · 5 min read

Based on OpenAI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

OpenAI Five is built to master full Dota as a coordinated five-hero team, not just isolated 1v1 scenarios.

Briefing

OpenAI’s “OpenAI Five” is an AI system built to play Dota as a coordinated five-player team, and early results show it can beat amateur squads in full-game matches—then even hold its own against a stronger human challenge. The effort matters because Dota demands constant teamwork, timing, and strategic map control, making it a high-stakes testbed for whether reinforcement learning can master complex, multi-agent coordination rather than just single-player tactics.

The system targets the full Dota game (not just 1v1 mini scenarios) after a prior bot that defeated top players in a smaller setting. This new approach relies on large-scale reinforcement learning with self-play, training the five bots together so they learn to act as a single unit. OpenAI describes running the game on more than 100,000 CPUs, letting the bots learn from every match they generate. A key training ingredient is a hyper-parameter called “team spirit,” which starts the bots selfishly and then tunes them to care about teammates—an explicit mechanism for turning independent agents into coordinated team behavior.
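The summary does not give the exact formula behind "team spirit," but the mechanism it describes — starting each bot selfish and gradually making it care about teammates — can be sketched as a reward-shaping coefficient that blends an agent's own reward with the team's average. The function below is an illustrative sketch of that idea, not OpenAI's actual implementation:

```python
def shaped_rewards(raw_rewards, tau):
    """Blend each agent's raw reward with the team average.

    tau = 0.0 -> fully selfish: each bot keeps only its own reward.
    tau = 1.0 -> fully team-oriented: every bot gets the team average.
    """
    team_avg = sum(raw_rewards) / len(raw_rewards)
    return [(1 - tau) * r + tau * team_avg for r in raw_rewards]

# Example: one hero earns a kill (+10) while teammates earn nothing.
print(shaped_rewards([10, 0, 0, 0, 0], tau=0.0))  # [10.0, 0.0, 0.0, 0.0, 0.0]
print(shaped_rewards([10, 0, 0, 0, 0], tau=0.5))  # [6.0, 1.0, 1.0, 1.0, 1.0]
```

Annealing `tau` upward during training lets the bots first learn basic individual play, then shifts their incentives toward outcomes that benefit all five heroes.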

In early testing, OpenAI Five began playing against amateur teams to gauge real skill. The surprising outcome: it won its first games against every team tested. A Dota expert, William Lee (known as Blitz), reviewed the matches and highlighted how the bots executed high-level strategic decisions in mirror mode, where both sides use the same heroes. Blitz pointed to a specific example involving Crystal Maiden defending barracks: her Blink Dagger and Black King Bar enable an uninterruptible Freezing Field, and the resulting combo forces multiple human players into a losing 2v5 situation—an illustration of how the bots can create and exploit teamfight advantages.

Blitz also emphasized map control and lane prioritization. In two consecutive games, the bots consistently “owned” the same crucial areas—taking away roughly two-thirds of the map while leaving certain bottom towers untouched. Blitz argues this isn’t luck: controlling the hardest-to-manage side of the map and focusing on top and mid areas reflects an intuitive grasp of what matters most in Dota’s strategic flow. He notes that it took him years to learn these kinds of strategies, suggesting the bot’s behavior reflects more than rote pattern matching.

After the amateur-team results, Blitz challenged OpenAI Five directly by pairing with top players from the audience. Despite an early moment where Blitz appeared likely to die, the bots eventually won the match, including a decisive push toward the first lane of barracks ("rax"). While OpenAI Five still isn't at the level of pro teams, the match outcomes were framed as a meaningful step: the bot's teamfighting coordination stayed coherent under pressure, and mistakes seemed to be punished consistently.

Looking ahead, OpenAI plans a live match in July against a team of top players and notes the Dota world championships in August. The broader goal is less about one game and more about generalizing the training method—using reinforcement learning and self-play to tackle complex, multi-agent problems beyond Dota.

Cornell Notes

OpenAI Five is an AI system trained to play Dota as a coordinated five-hero team, using large-scale reinforcement learning and self-play. Training runs on over 100,000 CPUs, and a “team spirit” hyper-parameter shifts the bots from selfish behavior toward teammate-aware coordination. In early tests against amateur teams, the bots won their first games against every team they faced. Dota expert William Lee (“Blitz”) highlighted consistent, high-level map control and teamfight execution, including mirror-mode examples where coordinated hero combos forced lopsided fights. After that, Blitz challenged the system with top audience players, and OpenAI Five still managed to win—signaling progress toward pro-level play.

What training approach lets OpenAI Five learn to coordinate as five agents rather than as isolated players?

OpenAI Five uses reinforcement learning with self-play at large scale. The five bots train together so they learn team behaviors, not just individual tactics. A dedicated hyper-parameter called “team spirit” starts the bots completely selfish and then tunes them to care about teammates, encouraging coordinated decision-making as a single unit.

Why did mirror mode matter in the expert’s assessment?

Mirror mode gives both teams the exact same heroes, removing hero draft advantage as an explanation for outcomes. That makes execution—teamfight timing, positioning, and map control—more visible. Blitz’s examples, including Crystal Maiden’s defensive combo, rely on how the team uses identical kits to create winning fights and pressure objectives.

What specific strategic behaviors did Blitz point to as evidence of non-luck performance?

Blitz emphasized map control and lane prioritization. In two games, the bots repeatedly took away about two-thirds of the map while leaving certain bottom towers untouched, focusing on top and mid areas. He argued that doing this consistently across games indicates learned strategy rather than coincidence.

How did the bots perform in teamfights, according to the expert and match narrative?

The teamfight aspect was described as especially strong: coordination stayed intact and mistakes were punished. Blitz compared the experience to being “hammered” when making errors, suggesting the bot’s responses are tightly linked to team positioning and timing rather than forgiving human-like play.

What happened when Blitz challenged OpenAI Five directly with top audience players?

Blitz paired with strong human players and faced early danger, with commentators noting Blitz was about to die and that the humans were down members with limited time remaining. Despite that, the bots won the match and pushed decisively toward the first lane of rax, showing they could overcome a stronger lineup.

What are the near-term milestones and the broader aim beyond Dota?

OpenAI plans a live match in July against a team of top players, while the Dota world championships are set for August. The broader aim is to treat the training method as general-purpose: learning Dota is the proving ground for applying similar reinforcement learning and self-play techniques to complex problems in other domains.

Review Questions

  1. How does the “team spirit” hyper-parameter change the bots’ learning dynamics, and why is that important for five-player coordination?
  2. What evidence suggests the bots’ map control decisions are learned strategy rather than random outcomes?
  3. Why does mirror mode strengthen the validity of performance comparisons in these Dota matches?

Key Points

  1. OpenAI Five is built to master full Dota as a coordinated five-hero team, not just isolated 1v1 scenarios.
  2. Training uses reinforcement learning with self-play at very large scale, running the game on over 100,000 CPUs.
  3. A “team spirit” hyper-parameter tunes bots from selfish play toward teammate-aware coordination.
  4. Early matches against amateur teams produced immediate wins across all tested squads.
  5. Dota expert William Lee (“Blitz”) highlighted consistent map control—especially repeated focus on top and mid areas—and coherent teamfight execution.
  6. In a direct challenge, Blitz and top audience players still lost as the bots pushed toward the first lane of rax.
  7. Future plans include a live July match against top players and continued development toward pro-level competition, with an eye on generalizing the method to other complex problems.

Highlights

OpenAI Five’s early results were unusually clean: it won its first games against every amateur team it tested in full-game Dota.
Blitz pointed to repeated, high-level map control—taking roughly two-thirds of the map and prioritizing top and mid—across multiple games.
The system’s teamfighting coordination stayed disciplined under pressure, with mistakes met by immediate, punishing responses.
A direct human challenge led by William Lee (“Blitz”) ended in an OpenAI Five win, including a decisive rax push.

Topics

Mentioned

  • William Lee