Llama 3 - 8B & 70B Deep Dive
Based on Sam Witteveen's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Meta’s Llama 3 release centers on two new open-weight language models—8B and 70B—that aim to outperform last generation’s Llama 2 while matching or challenging leading proprietary systems on key benchmarks. The most consequential detail is scale and training depth: both models were trained on more than 15 trillion tokens, with a reported context length of 8K, and both use grouped-query attention. That combination—very large token exposure plus architectural efficiency—helps explain why the smaller 8B model is positioned as a leap forward, even beating the largest Llama 2 variant in several comparisons.
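Grouped-query attention cuts memory and bandwidth costs by letting several query heads share one key/value head, shrinking the KV cache without giving up multi-head queries. A minimal NumPy sketch of the head-sharing idea (head counts and dimensions here are illustrative, not Llama 3's actual configuration):

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_heads):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d).
    Each group of query heads attends using one shared k/v head."""
    n_q_heads, seq, d = q.shape
    group = n_q_heads // n_kv_heads
    # Repeat each k/v head so it serves its whole group of query heads.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# 8 query heads sharing 2 KV heads means a 4x smaller KV cache.
out = grouped_query_attention(
    np.random.rand(8, 4, 16), np.random.rand(2, 4, 16),
    np.random.rand(2, 4, 16), n_kv_heads=2)
print(out.shape)  # (8, 4, 16)
```

The output keeps the full query-head count, so downstream projections are unchanged; only the cached keys and values are smaller.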
The 8B and 70B models arrive in two forms: a base (pre-trained) version intended for fine-tuning, and an instruction-tuned version meant for everyday chat and task execution. Meta’s model cards describe text-only inputs for now, but the transcript notes strong signals that multimodal capability is likely next—especially hints from team members about a future vision-style model where images and other modalities could be added. For developers, the practical near-term takeaway is that Llama 3 is usable immediately in instruction form, while fine-tuning workflows are expected to expand as more scripts and community variants appear.
Benchmark comparisons in the transcript highlight where Llama 3 is strongest. On GSM-style math and other reasoning-oriented tasks, the 8B model is claimed to score roughly double Mistral Instruct and Gemma Instruct, suggesting a meaningful jump in reasoning quality for a relatively small model. The 70B model is described as competitive rather than dramatically dominant: it performs strongly against Gemini Pro 1.5 and Claude 3 Sonnet on a range of evaluations while also beating prior Llama 2 results. Meta’s own evaluation set is described as 800 prompts spanning 12 use cases (advice, brainstorming, classification, coding, creative writing, extraction, roleplay, reasoning, rewriting, summarization, and more), where the 70B model comes out ahead of several baselines, including GPT-3.5, Mistral Medium, and Claude 3 Sonnet.
Training and compute details add context to the performance claims. The transcript cites a training-data cutoff of March 2023 for the 8B model and December 2023 for the 70B model, implying the models were built from datasets assembled well before release. It also notes a reported training run using 24,000 GPUs, substantially fewer than the counts sometimes associated with other frontier efforts, raising questions about how the upcoming 405B model will be trained.
Access to Llama 3 weights comes with a license gate on Hugging Face, and the transcript flags restrictions that may disappoint people expecting “open source” behavior. Two standouts: a clause preventing use of Llama outputs to improve other large language models (outside Llama 3 and Llama 3 fine-tunes), and a requirement that any fine-tuned or merged model name begin with “Llama 3.” Commercial use is allowed if other terms are followed, but the output-based restriction limits certain downstream training strategies.
Finally, the transcript walks through practical ways to try Llama 3—Ollama, LM Studio, Hugging Chat, and hosted endpoints on major cloud providers—then demonstrates a Hugging Face setup using text-generation pipelines. Early hands-on tests suggest the model is generally capable at roleplay, concise answering, and some reasoning patterns, with mixed results on certain math variants that appear sensitive to prompting and system instructions. Looking ahead, the transcript points to an imminent 405B model and hints that future releases may expand context length, multimodality, and code-focused variants.
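In the Hugging Face setup described above, chat messages are converted into a single prompt using Llama 3's chat template (in practice, `transformers` applies this automatically via the tokenizer's `apply_chat_template` or the text-generation pipeline). A sketch of the underlying prompt assembly; the special-token strings follow the published Llama 3 instruct template, but this hand-rolled function is illustrative only:

```python
def format_llama3_chat(messages):
    """Assemble a Llama 3 instruct prompt from role/content messages.

    Each turn is wrapped in header tokens and terminated with <|eot_id|>;
    the trailing assistant header cues the model to generate its reply.
    """
    prompt = "<|begin_of_text|>"
    for m in messages:
        prompt += (f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
                   f"{m['content']}<|eot_id|>")
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize grouped-query attention."},
]
print(format_llama3_chat(messages))
```

Knowing this format also explains why system-prompt wording can shift results in the hands-on tests: the system turn is prepended verbatim to every exchange the model sees.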
Cornell Notes
Llama 3 arrives as two open-weight models—8B and 70B—in both base and instruction-tuned formats. Both were trained on over 15 trillion tokens and use grouped-query attention, with a reported 8K context window and text-only inputs for now. Benchmark results in the transcript emphasize strong reasoning performance, including claims that the 8B model can outperform larger prior Llama 2 variants and that the 70B model is competitive with Gemini Pro 1.5 and Claude 3 Sonnet across a 12-category evaluation set. Access requires accepting a Hugging Face license with notable restrictions, including limits on using Llama outputs to train other large models. The practical message: Llama 3 is easy to try via Ollama, Hugging Chat, and hosted endpoints, while fine-tuning and future multimodal/code variants are expected to expand quickly.
- What are the two Llama 3 models released so far, and how do their formats differ?
- Why does the training scale matter for the expected quality of Llama 3?
- How do the benchmark comparisons describe Llama 3’s strengths and limits?
- What license restrictions could affect developers trying to build new models or datasets?
- What practical options exist to run Llama 3 without hosting it yourself?
- What did hands-on prompting tests suggest about Llama 3 behavior?
Review Questions
- Which Llama 3 format would you choose if you want to fine-tune the model yourself, and why?
- How do the transcript’s benchmark claims differ between the 8B and 70B models?
- What two specific license clauses are highlighted as most likely to limit downstream model training or reuse?
Key Points
1. Llama 3 is released in two open-weight sizes—8B and 70B—each available as both base (pre-trained) and instruction-tuned variants.
2. Both models were trained on over 15 trillion tokens and use grouped-query attention, supporting strong performance despite an 8K context window.
3. Meta’s reported evaluation set spans 12 task categories (800 prompts), where the 70B model is described as beating several baselines including GPT-3.5 and Claude 3 Sonnet.
4. The Hugging Face license includes restrictions that limit using Llama outputs to improve other large language models and requires “Llama 3” to appear at the start of fine-tuned/merged model names.
5. Llama 3 can be tried quickly via Ollama, LM Studio, Hugging Chat, and hosted endpoints on major cloud providers.
6. Early prompting tests suggest Llama 3 is strong at roleplay and instruction following, but math performance can be inconsistent depending on system prompts and question framing.