The Tech that’s *probably* inside GPT-5 just got Open Sourced!
Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Claude 3 Haiku can be pushed toward Claude 3 Opus-like quality using an open-source prompting pipeline that generates a task-specific system prompt from Claude 3 Opus outputs.
Briefing
Large language models don’t get better only through bigger training runs; many of the biggest gains come from extracting more capability out of models people already have. A viral open-source notebook tied to Matt Schumer’s work claims a smaller Claude 3 variant (Claude 3 Haiku) can be pushed close to Claude 3 Opus quality by feeding it carefully constructed examples and then generating a system prompt that makes the small model behave like the top performer. The practical pitch is straightforward: similar output quality at far lower cost and latency, because the heavy lifting happens through prompting and example generation rather than paying for the most expensive model on every request.
The notebook’s workflow starts with a task description plus a single input/output example given to Claude 3 Opus. From that seed, the repository generates a diverse set of additional examples, then uses the task and those examples to produce a system prompt suitable for Claude 3 Haiku (and other small or large models). It also saves the resulting system prompt and examples into a Python file formatted for generation—aimed at developers who want to drop the technique into their own products quickly. A concrete example in the transcript imagines building an AI story-writing website: instead of paying for Claude 3 Opus or GPT-4-class APIs for every user request, the approach targets Claude 3 Haiku while claiming Opus-level results, making the product cheaper to run for both the builder and end users.
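The workflow above can be sketched in a few lines of Python. This is a minimal illustration, not code from the actual notebook: the function names, the meta-prompt wording, and the `strong_model` callable (a stand-in for a Claude 3 Opus API call) are all assumptions.

```python
def build_example_generation_prompt(task_description, seed_input, seed_output, n=5):
    """Meta-prompt asking the strong model (e.g. Claude 3 Opus) to expand
    one seed input/output pair into n diverse examples."""
    return (
        f"Task: {task_description}\n\n"
        f"Example input: {seed_input}\n"
        f"Example output: {seed_output}\n\n"
        f"Generate {n} additional diverse input/output pairs for this task."
    )

def build_system_prompt(task_description, examples):
    """Assemble a system prompt for the small model (e.g. Claude 3 Haiku)
    from the task description plus the generated examples."""
    shots = "\n\n".join(f"Input: {inp}\nOutput: {out}" for inp, out in examples)
    return (
        f"You are an expert at the following task: {task_description}\n\n"
        f"Follow the style and quality of these examples:\n\n{shots}"
    )

def run_pipeline(task_description, seed_input, seed_output, strong_model):
    """strong_model is any callable prompt -> list of (input, output) pairs;
    in practice it would wrap a Claude 3 Opus API call."""
    gen_prompt = build_example_generation_prompt(task_description, seed_input, seed_output)
    examples = [(seed_input, seed_output)] + strong_model(gen_prompt)
    return build_system_prompt(task_description, examples)
```

The returned system prompt, with the examples embedded in it, is what would then be saved to a Python file and reused with Claude 3 Haiku on every request.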
That “distillation without retraining” theme is reinforced by another open-source technique, Quiet-STaR (a “quiet” extension of the Self-Taught Reasoner approach), which the video credits to a post by Bindu Reddy. Quiet-STaR pushes reasoning into the generation process by having the model generate internal rationales, essentially token-by-token “inner monologues,” and then using a reward mechanism to reinforce the rationales that lead to better outcomes. Reported results include a 7B model jumping from 36% to 47% on common-sense question answering, alongside roughly doubled math performance. The transcript also notes that while Quiet-STaR is open source and is described as applicable after a model is already trained (including to Claude 3 and ChatGPT), it costs more to run because it adds extra reasoning tokens.
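The reward idea behind Quiet-STaR can be illustrated with a toy calculation: a sampled rationale is worth keeping when conditioning on it raises the probability the model assigns to the true next token. The function names and probabilities below are made up for illustration; the real method trains on this signal with reinforcement learning rather than just ranking candidates.

```python
import math

def rationale_reward(p_with, p_without):
    """Quiet-STaR-style reward: log-likelihood improvement of the true
    next token when the model conditions on an internal rationale."""
    return math.log(p_with) - math.log(p_without)

def score_rationales(candidates, p_without):
    """candidates: (rationale_text, p_with) pairs sampled from the model.
    Positive-reward rationales would be reinforced, negative ones
    discouraged; here we simply rank them by reward."""
    scored = [(text, rationale_reward(p, p_without)) for text, p in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A rationale that doubles the model's confidence in the right answer gets a positive reward; one that lowers it gets a negative reward, which is the signal used to teach the model which inner monologues are worth generating.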
The transcript then connects these ideas into a compounding stack: Quiet-STaR-style internal thinking, plus example-driven prompting that teaches a small model how to act, plus prompt-level Chain-of-Thought instructions that force stepwise planning. The endgame is less about waiting for GPT-5 or Claude 4 to be released and more about using existing models as teachers, turning their capabilities into reusable prompting patterns and agent workflows.
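The prompt-level layer of that stack is mechanically simple; a minimal sketch, assuming illustrative instruction wording not taken from any of the repos:

```python
def with_chain_of_thought(system_prompt):
    """Append a prompt-level Chain-of-Thought instruction to an
    example-driven system prompt (wording is illustrative)."""
    return (
        system_prompt
        + "\n\nBefore answering, think step by step: break the request into"
        + " sub-tasks, plan the output, then write the final answer."
    )
```

Layered on top of an example-driven system prompt and a generation-time reasoning method, this is the compounding the transcript describes.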
Finally, the transcript points to Schumer’s open-source “Claude investor” agent, described as a constrained system that chains multiple Claude 3 calls to gather financial data, analyze sentiment and trends, and rank stocks with price targets—while explicitly warning it isn’t financial advice. The recurring message is that value can be extracted through orchestration and prompting, even when the underlying model weights remain closed. Open sourcing these prompting and agent patterns, the transcript argues, could accelerate adoption widely—because developers can implement the techniques without waiting for new frontier model releases.
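The chained-call pattern can be sketched as below. The step prompts and the `call_model` parameter are illustrative assumptions, not code from the Claude investor repository; in practice each step would be an Anthropic API call.

```python
# Minimal sketch of chaining multiple model calls into an agent workflow,
# in the spirit of the "Claude investor" agent described above.

def analyze_ticker(ticker, call_model):
    """call_model: any callable prompt -> text (a stand-in for an API call).
    Each step feeds the previous step's output into the next prompt."""
    data = call_model(f"Summarize recent financial data for {ticker}.")
    sentiment = call_model(f"Assess market sentiment and trends given: {data}")
    ranking = call_model(
        f"Based on this analysis, rate {ticker} and suggest a price target. "
        f"Analysis: {sentiment}"
    )
    return {"ticker": ticker, "data": data, "sentiment": sentiment, "ranking": ranking}
```

Note how a single request fans out into three model calls, each carrying the previous output as context; that token multiplication is exactly why such agents get expensive.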
Cornell Notes
The transcript argues that large-model performance can be improved dramatically without retraining the base model, by extracting capability through prompting, example generation, and reasoning scaffolds. A key example is an open-source notebook attributed to Matt Schumer that uses Claude 3 Opus to generate a system prompt and diverse examples so a smaller Claude 3 Haiku can perform close to Opus quality at lower cost and latency. Another technique, Quiet-STaR (credited in the video to a post by Bindu Reddy), adds token-level “inner monologues” during generation and uses reward learning to favor better rationales, with reported gains of 36%→47% on common-sense QA for a 7B model and roughly doubled math performance. The transcript suggests these methods can be compounded with Chain-of-Thought prompting and agent orchestration to push small models toward near top-tier behavior.
How can Claude 3 Haiku be made to approach Claude 3 Opus quality without training a new model?
What exactly does Quiet-STaR change during generation, and why does it help reasoning?
Why does the transcript emphasize “compounding” techniques like Quiet-STaR, example-driven prompting, and Chain-of-Thought prompting?
What trade-offs come with these reasoning-heavy methods?
How do the described agents fit into the broader theme of extracting value from existing models?
Review Questions
- What are the two inputs used to start the Opus→Haiku prompting pipeline, and how does the system prompt get produced?
- How does Quiet Star’s token-level inner monologue differ from prompt-level Chain-of-Thought prompting?
- What kinds of cost increases are mentioned for reasoning and agent workflows, and what strategies are suggested to manage them?
Key Points
1. Claude 3 Haiku can be pushed toward Claude 3 Opus-like quality using an open-source prompting pipeline that generates a task-specific system prompt from Claude 3 Opus outputs.
2. The Opus→Haiku method begins with a task description plus one input/output example, then expands that into diverse examples and uses them to construct the system prompt.
3. Quiet-STaR improves reasoning by generating token-level internal rationales and using a reward mechanism to reinforce the rationales that lead to better results.
4. Reported Quiet-STaR gains include a 7B model improving common-sense QA from 36% to 47% and roughly doubling math performance, at the cost of higher inference compute.
5. Chain-of-Thought prompting is a prompt-level planning scaffold, while Quiet-STaR is a generation-time reasoning scaffold; both can be combined with example-driven prompting.
6. Agent systems (like the constrained Claude investor agent) can chain multiple model calls to execute tasks, but they can become expensive due to token volume.
7. A recurring thesis is that open-sourcing prompting and agent patterns lets developers extract more value from existing models without waiting for new frontier releases.