DeepSeek Coder: AI Writes Code | Free LLM For Code Generation Beats ChatGPT, ChatDev & Code Llama
Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
DeepSeek Coder is an open-source code-focused language model from DeepSeek AI that’s trained heavily on programming data and tuned to follow coding instructions. The core claim behind it is straightforward: it can outperform major coding LLMs on benchmark suites and human evaluation, while remaining practical to use via public model releases and an online interface. That matters because it positions a freely available model as a serious option for real coding workflows—especially when developers want strong Python and algorithmic performance without paying for closed models.
Training details point to why it may work well. DeepSeek Coder is trained on roughly two trillion tokens, with about 87% coming from code and the remainder largely from English and Chinese natural language. Model sizes range from 1 billion parameters up to 33 billion parameters, and the context window is large—around 16k tokens—so it can handle longer prompts and codebases. Beyond the base model, an instruction-tuned variant called DeepSeek Coder Instruct is further fine-tuned with about 2 billion tokens of instruction data, aiming to produce more task-aligned outputs.
Performance comparisons are a major part of the pitch. The transcript cites benchmark results in which DeepSeek Coder scores roughly 8–10% higher than Code Llama baselines across multiple coding benchmarks. It also claims that DeepSeek Coder Instruct outperforms GPT-3.5 Turbo on human-evaluation data. In the practical comparisons described, the 33B instruction-tuned model is highly competitive with GPT-3.5 Turbo across most benchmarks, with one benchmark (MBPP) where it is close rather than clearly ahead. The overall positioning is that DeepSeek Coder sits between ChatGPT and GPT-4 in capability: stronger than many open alternatives, and not far behind top-tier closed models.
The transcript then tests the model on hands-on tasks. For Python, it generates correct functions quickly for simple prompts like computing the square of a sum and splitting a list into three parts, including input validation via ValueError. A more creative prompt—writing excuses in the voice of Dwight from The Office—produces working code that generates excuses, though the output doesn’t fully match the requested persona.
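The transcript doesn't reproduce the generated code, but straightforward solutions to the two simple prompts might look like the sketch below (function names and the exact splitting behavior are assumptions; the transcript only specifies that invalid input raises a ValueError):

```python
def square_of_sum(numbers):
    """Return the square of the sum of a sequence of numbers."""
    return sum(numbers) ** 2


def split_into_three(items):
    """Split a list into three roughly equal consecutive parts.

    Raises ValueError if the input is not a list.
    """
    if not isinstance(items, list):
        raise ValueError("items must be a list")
    third, remainder = divmod(len(items), 3)
    # Give one extra element to the first `remainder` parts.
    sizes = [third + (1 if i < remainder else 0) for i in range(3)]
    parts, start = [], 0
    for size in sizes:
        parts.append(items[start:start + size])
        start += size
    return parts
```

For example, `square_of_sum([1, 2, 3])` returns `36`, and `split_into_three([1, 2, 3, 4, 5, 6, 7])` returns `[[1, 2, 3], [4, 5], [6, 7]]`.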
Algorithmic tests on LeetCode show a clearer success pattern. The model produces accepted solutions for an easy Two Sum problem and a medium “array game” problem on the first attempt, with runtime and memory rankings reported as strong (e.g., beating large portions of submissions). It also tackles a hard “count of range sum” problem using a prefix-sum approach combined with sorting and Binary Indexed Tree (Fenwick tree) logic, and the solution is accepted—despite earlier attempts with the same prompt reportedly failing.
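The accepted hard solution itself isn't shown in full, but the technique the transcript names (prefix sums plus a Binary Indexed Tree, with coordinate compression so the tree stays small) can be sketched as follows; the function name and exact structure are mine, not the model's output:

```python
from itertools import accumulate


def count_range_sum(nums, lower, upper):
    """Count pairs (i, j) with lower <= sum(nums[i:j]) <= upper.

    A range sum equals prefix[j] - prefix[i], so for each prefix p we count
    previously seen prefixes q with p - upper <= q <= p - lower, using a
    Fenwick (Binary Indexed) tree over coordinate-compressed values.
    """
    prefix = [0] + list(accumulate(nums))
    # Compress every value the tree must index: each prefix and its bounds.
    all_vals = sorted({v for p in prefix for v in (p, p - lower, p - upper)})
    rank = {v: i + 1 for i, v in enumerate(all_vals)}  # 1-based ranks

    tree = [0] * (len(all_vals) + 1)

    def update(i):  # add 1 at position i
        while i < len(tree):
            tree[i] += 1
            i += i & -i

    def query(i):  # count of inserted values with rank <= i
        total = 0
        while i > 0:
            total += tree[i]
            i -= i & -i
        return total

    count = 0
    for p in prefix:
        # Earlier prefixes q with p - upper <= q <= p - lower.
        count += query(rank[p - lower]) - query(rank[p - upper] - 1)
        update(rank[p])
    return count
```

For instance, `count_range_sum([-2, 5, -1], -2, 2)` returns `3`, matching the canonical example for this problem; the overall running time is O(n log n).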
Finally, the transcript extends beyond algorithms into software building. DeepSeek Coder generates a basic Flask app that stores submitted emails in a SQLite database and provides an admin page to view them. It also creates a Flappy Bird clone in Pygame: the first version has physics issues, but a follow-up prompt fixes gravity and adds scoring and collision-based game over behavior. Across these demos, the recurring theme is fast inference and frequent first-try correctness on structured coding tasks, making DeepSeek Coder a compelling free option for developers and interview-style problem solving.
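The generated Flask app isn't reproduced in the transcript, but a minimal sketch of the described behavior (store submitted emails in SQLite, show them on an admin page) could look like this; the route paths, database filename, and HTML are assumptions:

```python
import sqlite3
from flask import Flask, request

app = Flask(__name__)
DB_PATH = "emails.db"  # assumed filename


def init_db():
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS emails "
            "(id INTEGER PRIMARY KEY, address TEXT NOT NULL)"
        )


@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        # Store the submitted email address in SQLite.
        with sqlite3.connect(DB_PATH) as conn:
            conn.execute(
                "INSERT INTO emails (address) VALUES (?)",
                (request.form["email"],),
            )
        return "Thanks for signing up!"
    return (
        '<form method="post">'
        '<input name="email" type="email">'
        "<button>Submit</button></form>"
    )


@app.route("/admin")
def admin():
    # List every stored email address.
    with sqlite3.connect(DB_PATH) as conn:
        rows = conn.execute("SELECT address FROM emails").fetchall()
    return "<br>".join(address for (address,) in rows)


if __name__ == "__main__":
    init_db()
    app.run(debug=True)
```

Parameterized queries (the `?` placeholder) keep the insert safe from SQL injection, which is the main thing to check in a generated app like this.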
Cornell Notes
DeepSeek Coder is an open-source, code-heavy LLM trained on about two trillion tokens, with roughly 87% of the data coming from code. It comes in multiple sizes (1B to 33B parameters) and supports a large ~16k token context window. An instruction-tuned variant, DeepSeek Coder Instruct, adds about 2B tokens of instruction fine-tuning to improve task-following. In the transcript’s tests, the model generates correct Python functions, produces accepted LeetCode solutions for easy and medium problems, and even succeeds on a hard range-sum problem using prefix sums plus a Binary Indexed Tree. It also builds small Flask and Pygame projects, often working on the first attempt and improving after targeted feedback.
What training choices are most likely responsible for DeepSeek Coder’s coding performance?
How does the instruction-tuned model differ from the base model in practical use?
Why do the LeetCode demos matter more than the simple Python examples?
What algorithmic technique appears in the hard range-sum solution?
How well does DeepSeek Coder handle building small applications, not just functions?
What limitation shows up in the creative coding prompt?
Review Questions
- What specific training and tuning details (token mix, instruction fine-tuning, context window) are cited as reasons DeepSeek Coder performs well on coding tasks?
- Describe the core approach used to solve the hard “count of range sum” problem as reported in the transcript.
- In the Flask and Flappy Bird demos, what kinds of issues were corrected after the initial generation?
Key Points
1. DeepSeek Coder is trained on about two trillion tokens with roughly 87% code, supporting strong code generation.
2. Model sizes range from 1B to 33B parameters, and the context window is about 16k tokens.
3. DeepSeek Coder Instruct adds about 2B tokens of instruction fine-tuning to improve task-following.
4. The transcript reports benchmark advantages for DeepSeek Coder (about 8–10% on cited coding benchmarks) and competitive results versus GPT-3.5 Turbo on human evaluation.
5. In hands-on tests, the model produced accepted LeetCode solutions for easy and medium problems and also succeeded on a hard range-sum problem using prefix sums and a Binary Indexed Tree.
6. Beyond algorithms, it generated a working Flask email-collection/admin-view app and a Pygame Flappy Bird clone, with follow-up prompts fixing gameplay issues.