Mistral 3: Europe's Answer to DeepSeek or Too Little, Too Late?
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Mistral has returned with a major open-model push of four new models: Mistral Large 3 at the top, plus smaller “Ministral 3” models available as both base and instruction-tuned checkpoints, with reasoning versions for each size. The practical impact is straightforward: developers get more choice across model sizes and training styles, and they can fine-tune from base checkpoints rather than being locked into instruction-only offerings.
At the center is Mistral Large 3, a 675B mixture-of-experts (MoE) model with 41B parameters active at a time. That active-parameter fraction is notably higher than what has become common in recent MoE releases, where systems like GPT-oss and the Qwen MoEs are described as keeping roughly 5% or less of their parameters active. Mistral also released three smaller dense models under the Ministral 3 naming scheme. The lineup is designed to be more than just “chat” models: alongside instruction-tuned versions, Mistral provides base models for each size, and it also includes reasoning versions for the Ministral family.
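For a rough sense of scale, the arithmetic below (a quick check based on the figures cited above, not a measurement from the source) computes the active-parameter fraction those numbers imply:

```python
# Back-of-the-envelope check of the active-parameter fraction implied by the
# figures above. The ~5% reference point is the rough figure the transcript
# attributes to recent MoE releases, not a measured value.
total_params_b = 675    # Mistral Large 3 total parameters, in billions
active_params_b = 41    # parameters active per forward pass, in billions

fraction = active_params_b / total_params_b
print(f"Active fraction: {fraction:.1%}")  # ~6.1%, vs roughly 5% or less elsewhere
```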
Benchmark claims place Mistral Large 3 in the same competitive neighborhood as frontier Chinese models such as DeepSeek 3.1 and Kimi K2, with comparisons that mark a shift in how Mistral positions itself. Earlier releases often compared against Western closed models like Claude Sonnet, Claude Haiku, and GPT-4o mini; those references are largely absent here. The transcript attributes that change to the pace of frontier progress—companies with smaller budgets can’t keep up with the newest closed-model baselines, so comparisons become more selective.
The open-model ecosystem angle is equally important. On aggregate leaderboards like LMArena, Mistral Large appears lower in the overall ranking (around position 28 in the cited ordering), but it ranks highly when filtered by Apache 2.0 licensing: that filter places it among the top open options and ahead of many Qwen 3 models, with only Qwen’s large MoE model edging it out. The caveat is that the benchmark set shown is selective, and the current Mistral Large 3 is described as a non-reasoning version, with a reasoning variant expected later.
For smaller models, the Ministral 3 sizes (14B, 8B, and 3B) are framed as a return to the tier where Mistral historically did well, notably with its earlier 7B open model. The 3B model is reported to be on par with Gemma 3 12B instruct for certain tasks, while the 14B model is generally competitive with Qwen 14B, with particular strength in instruction following. The transcript also emphasizes availability: unlike some competitors that focus on a single flagship release, Mistral is putting out multiple sizes and variants, including base checkpoints that enable custom fine-tuning and experimentation.
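As a concrete illustration of what base checkpoints enable, here is a minimal fine-tuning sketch using the Hugging Face transformers and peft libraries; the model id is a hypothetical placeholder, not a confirmed repository name from the release.

```python
# Minimal sketch: attach LoRA adapters to a base checkpoint for custom fine-tuning.
# Assumes the transformers and peft libraries; the model id below is hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "mistralai/Ministral-3-8B-Base"  # placeholder, not a confirmed repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Train small adapter matrices instead of updating all of the base weights.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# A standard Trainer or TRL SFT loop over your own dataset would follow here.
```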
Finally, the release is positioned as a practical tool for teams: with models improving quickly, the transcript argues that organizations should rely on their own app-specific benchmarks and fast model-swapping pipelines rather than chasing public leaderboard numbers. Mistral’s base-and-reasoning roadmap, plus the immediate availability of GGUF versions, is presented as the concrete reason the company still matters in open-source circles—even if it can’t match the newest closed-model pace.
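The sketch below illustrates that advice under stated assumptions: an OpenAI-compatible serving endpoint, a handful of app-specific test cases, and model names that are placeholders rather than details from the source.

```python
# Minimal sketch of an app-specific benchmark with fast model swapping.
# Assumes an OpenAI-compatible endpoint (local server or hosted API); the base
# URL, model names, and test cases below are placeholders, not from the source.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

# App-specific test cases: a prompt plus a simple pass/fail check.
TEST_CASES = [
    ("Answer in one word: what is the capital of France?",
     lambda out: "paris" in out.lower()),
    ("Extract just the total from: 'Invoice total due: $1,204.50'",
     lambda out: "1,204.50" in out or "1204.50" in out),
]

def pass_rate(model_name: str) -> float:
    """Run every test case against one model and return its pass rate."""
    passed = 0
    for prompt, check in TEST_CASES:
        resp = client.chat.completions.create(
            model=model_name,
            messages=[{"role": "user", "content": prompt}],
        )
        if check(resp.choices[0].message.content or ""):
            passed += 1
    return passed / len(TEST_CASES)

# Swapping models is just a loop over whatever names the serving layer exposes.
for name in ["mistral-large-3", "ministral-3-14b-instruct"]:
    print(name, pass_rate(name))
```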
Cornell Notes
Mistral’s comeback centers on Mistral Large 3, a 675B mixture-of-experts model with 41B active parameters, alongside three smaller Ministral 3 dense models (14B, 8B, 3B). The release stands out because it includes base models and instruction-tuned variants for each size, plus reasoning versions for the Ministral family—giving developers more control than instruction-only checkpoints. Benchmark positioning puts Mistral Large 3 near DeepSeek 3.1 and Kimi K2, while licensing filters (Apache 2.0) place it among the top open options. The transcript also stresses that teams should test models on their own use-case benchmarks and swap models quickly as new releases land. A reasoning version of Mistral Large 3 is expected later, which could further shift comparisons.
- What makes Mistral Large 3 technically notable compared with many recent MoE releases?
- How does the release structure (base vs instruction vs reasoning) change what developers can do?
- Where does Mistral Large 3 land in benchmark comparisons, and why are those comparisons treated cautiously?
- Why does the transcript emphasize smaller model availability (14B/8B/3B) rather than only a single flagship?
- What operational advice does the transcript give for evaluating new models?
Review Questions
- Which specific variants are included for Ministral 3 (base, instruction-tuned, reasoning), and how does that affect fine-tuning options?
- How do the transcript’s licensing-filtered comparisons (Apache 2.0) differ from its overall LMArena ranking discussion?
- What does the transcript suggest about relying on public benchmarks versus running internal, use-case benchmarks?
Key Points
1. Mistral Large 3 is a 675B MoE model with 41B active parameters, and its active-parameter fraction is described as higher than in many recent MoE releases.
2. Mistral released four models in total: Mistral Large 3 plus three Ministral 3 dense models at 14B, 8B, and 3B.
3. Ministral 3 includes base and instruction-tuned models for each size, plus reasoning versions for each size, supporting custom fine-tuning and experimentation.
4. Benchmark positioning places Mistral Large 3 near DeepSeek 3.1 and Kimi K2, but the transcript flags that the shown comparisons are selective and that the current Large 3 is non-reasoning.
5. Licensing-focused comparisons (Apache 2.0) put Mistral Large 3 near the top of open models, even if its overall leaderboard position is lower.
6. The release includes GGUF versions, making it easier to try different quantized formats (see the loading sketch after this list).
7. The transcript recommends fast model swapping and app-specific internal benchmarks over relying solely on public leaderboard results.
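For the GGUF point above, here is a minimal loading sketch assuming the llama-cpp-python package and a locally downloaded GGUF file; the file name and settings are placeholders, not details from the source.

```python
# Minimal sketch: run a locally downloaded GGUF checkpoint with llama-cpp-python.
# The file name and settings below are placeholders, not from the source.
from llama_cpp import Llama

llm = Llama(
    model_path="./ministral-3-8b-instruct.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=4096,  # context window to allocate
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize mixture-of-experts in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```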