OpenAI Five
Based on OpenAI's video on YouTube.
OpenAI Five is built to master full Dota as a coordinated five-hero team, not just isolated 1v1 scenarios.
Briefing
OpenAI Five is an AI system from OpenAI built to play Dota as a coordinated five-player team. Early results show it beating amateur squads in full-game matches and then winning against a stronger lineup of human players. The effort matters because Dota demands constant teamwork, timing, and strategic map control, which makes it a demanding testbed for whether reinforcement learning can master complex multi-agent coordination rather than just single-player tactics.
The system targets the full Dota game rather than 1v1 mini-scenarios, following an earlier OpenAI bot that defeated top players in that smaller setting. The new approach relies on large-scale reinforcement learning with self-play: the five bots train together so they learn to act as a single unit. OpenAI describes running the game on more than 100,000 CPUs, with the bots learning from every match they generate. A key training ingredient is a hyper-parameter called “team spirit,” which starts the bots out selfish and then tunes them to care about their teammates. It is an explicit mechanism for turning independent agents into a coordinated team.
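One way to picture the “team spirit” idea is as a blend between each bot's own reward and the team's average reward. The sketch below is an illustration under stated assumptions, not OpenAI's implementation: the function names `blend_rewards` and `anneal_team_spirit`, the 0-to-1 range, the blending formula, and the linear schedule are all hypothetical choices made here to show how a single knob could move agents from selfish to teammate-aware behavior.

```python
from statistics import fmean


def blend_rewards(individual_rewards, team_spirit):
    """Mix each agent's own reward with the team average.

    ASSUMPTION: this blending formula is illustrative only.
    team_spirit = 0.0 -> every agent keeps its own reward (selfish).
    team_spirit = 1.0 -> every agent receives the team mean (fully shared).
    """
    if not 0.0 <= team_spirit <= 1.0:
        raise ValueError("team_spirit must be in [0, 1]")
    team_mean = fmean(individual_rewards)
    return [
        (1.0 - team_spirit) * reward + team_spirit * team_mean
        for reward in individual_rewards
    ]


def anneal_team_spirit(step, total_steps, start=0.0, end=1.0):
    """Hypothetical linear schedule: begin selfish, end teammate-aware."""
    fraction = min(max(step / total_steps, 0.0), 1.0)
    return start + fraction * (end - start)


if __name__ == "__main__":
    # One hero lands a kill (reward 1.0); the other four get nothing.
    rewards = [1.0, 0.0, 0.0, 0.0, 0.0]
    for step in (0, 50, 100):
        spirit = anneal_team_spirit(step, total_steps=100)
        print(step, spirit, blend_rewards(rewards, spirit))
```

Under this formula the total reward across the team is unchanged; only its distribution shifts. Early in training the hero that earned the reward keeps all of it, and by the end of the schedule all five heroes receive the same share, which gives each bot a reason to help its teammates succeed.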
In early testing, OpenAI Five began playing against amateur teams to gauge real skill. The surprising outcome: it won its first games against every team tested. A Dota expert, William Lee (known as Blitz), reviewed the matches and highlighted how the bots executed high-level strategic decisions in mirror mode, where both sides use the same heroes. Blitz points to a specific example involving Crystal Maiden defending barracks: her Blink Dagger and Black King Bar enable an uninterruptible Freezing Field, and the resulting combo forces the human players into a losing 2v5 situation. The play illustrates how the bots can create and exploit teamfight advantages.
Blitz also emphasized map control and lane prioritization. In two consecutive games, the bots consistently “owned” the same crucial areas, taking roughly two-thirds of the map while leaving certain bottom towers untouched. Blitz argues this isn’t luck: controlling the hardest-to-manage side of the map and focusing on the top and mid areas reflects an intuitive grasp of what matters most in Dota’s strategic flow. He notes that it took him years to learn these kinds of strategies, suggesting the bots’ behavior reflects more than rote pattern matching.
After the amateur-team results, Blitz challenged OpenAI Five directly by teaming up with top players from the audience. Despite an early moment where Blitz appeared likely to die, the bots eventually won the match, including a decisive push toward the first lane of barracks (“rax”). While OpenAI Five still isn’t at the level of pro teams, the match outcomes were framed as a meaningful step: the bots’ teamfighting coordination stayed coherent under pressure, and mistakes seemed to be punished consistently.
Looking ahead, OpenAI plans a live match in July against a team of top players, with the Dota world championships following in August. The broader goal is less about one game and more about generalizing the training method: using reinforcement learning and self-play to tackle complex, multi-agent problems beyond Dota.
Cornell Notes
OpenAI Five is an AI system trained to play Dota as a coordinated five-hero team, using large-scale reinforcement learning and self-play. Training runs on over 100,000 CPUs, and a “team spirit” hyper-parameter shifts the bots from selfish behavior toward teammate-aware coordination. In early tests against amateur teams, the bots won their first games against every team they faced. Dota expert William Lee (“Blitz”) highlighted consistent, high-level map control and teamfight execution, including mirror-mode examples where coordinated hero combos forced lopsided fights. Blitz then challenged the system alongside top audience players, and OpenAI Five still won, signaling progress toward pro-level play.
- What training approach lets OpenAI Five learn to coordinate as five agents rather than as isolated players?
- Why did mirror mode matter in the expert’s assessment?
- What specific strategic behaviors did Blitz point to as evidence of non-luck performance?
- How did the bots perform in teamfights, according to the expert and match narrative?
- What happened when Blitz challenged OpenAI Five directly with top audience players?
- What are the near-term milestones and the broader aim beyond Dota?
Review Questions
- How does the “team spirit” hyper-parameter change the bots’ learning dynamics, and why is that important for five-player coordination?
- What evidence suggests the bots’ map control decisions are learned strategy rather than random outcomes?
- Why does mirror mode strengthen the validity of performance comparisons in these Dota matches?
Key Points
1. OpenAI Five is built to master full Dota as a coordinated five-hero team, not just isolated 1v1 scenarios.
2. Training uses reinforcement learning with self-play at very large scale, running the game on over 100,000 CPUs.
3. A “team spirit” hyper-parameter tunes bots from selfish play toward teammate-aware coordination.
4. Early matches against amateur teams produced immediate wins across all tested squads.
5. Dota expert William Lee (“Blitz”) highlighted consistent map control, especially repeated focus on top and mid areas, and coherent teamfight execution.
6. In a direct challenge, Blitz and top audience players still lost as the bots pushed toward the first lane of rax.
7. Future plans include a live July match against top players and continued development toward pro-level competition, with an eye on generalizing the method to other complex problems.