Get ChatGPT-5 Ready with These Prompting Principles

6 min read

Based on AI News & Strategy Daily | Nate B Jones's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Switching from GPT-4o to reasoning models (o3, Claude Opus 4, Gemini 2.5 Pro) is treated as the prerequisite for the most effective prompting patterns.

Briefing

The biggest practical takeaway is simple: prompting improves dramatically once people stop defaulting to GPT-4o and switch to reasoning models, then use a small set of evidence-backed prompting patterns to make those models check themselves, use tools, and follow structured instructions. The core message isn’t about memorizing dozens of tips; it’s about adopting a repeatable prompting system that reduces errors and makes outputs more reliable in 2025.

A major theme is model choice. Many users, CEOs included, stick with “4o” because it’s familiar, not because it’s best. The transcript argues that several reasoning models outperform that default: o3 (described as a reasoning model that’s colder in personality but strong at doing work), Claude Opus 4 (praised for writing, reading, and transparent tool use), and Gemini 2.5 Pro (positioned as a fast reasoning model with a large context window and tool use, available through Google Vertex and other interfaces). The claim is that the “95% problem” isn’t debating which model is best; it’s getting people to stop using the older baseline.

Once a reasoning model is in place, three techniques are presented as the most memorable, high-impact, and repeatedly validated. First is self-consistency via optionality: ask for multiple candidate answers (e.g., “give me five ways” to define a concept or solve a coding problem), then require the model to check those candidates for consistency. The point is that reasoning models can generate options cheaply, and the consistency check filters out stray wrong answers, preserving the benefit of the model’s deeper inference.
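
A minimal sketch of the pattern in Python, assuming a hypothetical `call_model(prompt) -> str` helper around whatever chat API and reasoning model you use; the helper, the majority-vote threshold, and the reconciliation prompt are illustrative choices, not prescribed by the transcript:

```python
from collections import Counter

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to your chat API of choice
    (e.g., a reasoning model); wire this up to whatever SDK you use."""
    raise NotImplementedError

def self_consistent_answer(question: str, n: int = 5) -> str:
    # Generate several independent candidate answers; this is cheap
    # for reasoning models relative to the cost of a wrong answer.
    candidates = [call_model(f"Answer concisely: {question}") for _ in range(n)]
    # Consistency check: accept only an answer with majority agreement.
    answer, votes = Counter(candidates).most_common(1)[0]
    if votes > n // 2:
        return answer
    # No majority: hand the candidates back to the model to reconcile.
    listing = "\n".join(f"- {c}" for c in candidates)
    return call_model(
        f"These candidate answers disagree:\n{listing}\n"
        "Check them against each other and return the one that "
        "survives scrutiny, with a one-line justification."
    )
```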

Second is “program of thought,” which reframes math and code prompting. Instead of asking for an explanation, the prompt should instruct the model to write a function or call a tool (like Python) to solve the problem. The transcript emphasizes that this improves numerical accuracy because the model can execute code rather than rely solely on verbal reasoning.
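
To make the contrast concrete, here is the kind of artifact a program-of-thought prompt asks for; the compound-interest task is an invented example, and the point is that the function gets executed rather than the arithmetic being narrated:

```python
# Prompt: "Write a Python function that solves this and run it;
# do not estimate the numbers in prose."
# A model following that instruction might emit something like:

def balance_after(principal: float, annual_rate: float, years: int) -> float:
    """Compound a starting balance once per year at a fixed rate."""
    balance = principal
    for _ in range(years):
        balance *= 1 + annual_rate
    return balance

# Computed, not verbally reasoned, so the digits are exact:
print(round(balance_after(1_000, 0.07, 30), 2))  # 7612.26
```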

Third is “plan and solve”: request a step-by-step plan for the task first, then move into execution. The user can critique the plan before the model produces the final work, turning the planning stage into a quality-control step.
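
A sketch of that flow as two passes, reusing the hypothetical `call_model` helper from the self-consistency example above; gating the plan behind a plain `input()` call is one possible implementation of the critique step, not something the transcript specifies:

```python
def plan_and_solve(task: str) -> str:
    """Two passes: get a plan, let a human critique it, then execute."""
    plan = call_model(
        f"Write a numbered, step-by-step plan for this task:\n{task}\n"
        "Do not produce the final output yet."
    )
    print(plan)
    # Quality-control step: the user reads and critiques the plan
    # before any final work is produced.
    feedback = input("Edits to the plan (press Enter to approve): ")
    return call_model(
        f"Task: {task}\nApproved plan:\n{plan}\n"
        f"Reviewer notes: {feedback or 'none'}\n"
        "Execute the plan now and produce the final output."
    )
```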

Beyond those tactics, the transcript shifts to structural principles for prompt design. Prompts need guardrails and edge cases: explicit fallback behavior such as “if unable to X, then do Y,” plus a clear output structure and defined handling for when instructions can only be partially followed. It also stresses “context positioning,” arguing that attention isn’t uniform: critical instructions should appear in the first 10% of the prompt, with key constraints reiterated near the end. Finally, it argues that negative examples matter more than positive ones: show what to avoid (e.g., banned phrases) so the model learns failure modes, not just desired behavior.
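
One way to lay all three structural principles out in a single template; the summarization task, the fallback string, and the banned phrases here are invented for illustration:

```python
# Illustrative template: guardrails with an explicit fallback, critical
# rules in the first ~10%, constraints reiterated at the end, and
# negative examples showing what to avoid.
PROMPT_TEMPLATE = """\
CRITICAL RULES (kept at the top, where attention is strongest):
Summarize the customer-support thread below in exactly three bullets.
If the thread is empty or unreadable, output only: NO SUMMARY AVAILABLE

THREAD:
{thread}

NEGATIVE EXAMPLES (failure modes to avoid):
- Do not open with filler such as "In today's fast-paced world".
- Do not speculate about facts that are absent from the thread.

REMINDER (key constraints reiterated near the end):
Exactly three bullets; use the fallback string if you cannot comply.
"""

print(PROMPT_TEMPLATE.format(thread="Customer: my invoice is wrong..."))
```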

The closing insight is that models can help write better prompts through “metaprompting.” Techniques include a self-improvement loop (“write out my current prompt and how to improve it”), an uncertainty check (“what parts are unclear or ambiguous?”), and capability discovery (“how would you approach this with no constraints?”). Additional diagnostic prompts like Socratic questioning (“why this approach?” “what alternatives?”) and confidence/uncertainty prompts help surface hidden assumptions. The overall message ties everything together: reasoning models make these self-correction and metaprompting loops more effective, and the same prompting principles should remain useful even as the menu of available models changes.
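
Collected as reusable strings, the transcript’s metaprompting moves might look like the following; the dictionary keys and the `{prompt}` placeholder are our packaging, while the quoted phrasings track the transcript:

```python
# Metaprompts from the transcript, packaged as a reusable lookup.
METAPROMPTS = {
    "self_improvement": (
        "Here's my current prompt: {prompt}\n"
        "Write it out, then tell me how you would improve it."
    ),
    "uncertainty_check": "What parts of this request are unclear or ambiguous?",
    "capability_discovery": "How would you approach this if you had no constraints?",
    "socratic": "Why did you choose that approach? What alternatives did you consider?",
}

# Example: ask the model to critique an existing prompt.
print(METAPROMPTS["self_improvement"].format(prompt="Summarize this memo."))
```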

Cornell Notes

Switching from GPT-4o to reasoning models (o3, Claude Opus 4, Gemini 2.5 Pro) is presented as the fastest way to improve results, because the core prompting techniques rely on models that can take time to infer, check, and use tools. The transcript then highlights three evidence-based prompting patterns: (1) self-consistency by asking for multiple candidate answers and requiring consistency checks, (2) “program of thought” by prompting the model to write code/tool calls for math and coding, and (3) “plan and solve” by requesting a step-by-step plan before execution. Structural prompt design matters too: add guardrails and edge cases, place critical instructions early and late, and include negative examples showing what to avoid. Finally, “metaprompting” (self-improvement loops, uncertainty probes, capability discovery, Socratic questioning) helps the model reveal what it needs to perform well.

Why does the transcript treat “stop using GPT-4o” as the biggest lever before any prompt-writing tips?

It frames model choice as the “95% problem”: most users default to GPT-4o because it’s familiar, not because it’s best for reasoning-heavy tasks. The transcript claims that reasoning models (o3, Claude Opus 4, and Gemini 2.5 Pro) support the exact mechanisms the prompting methods depend on: deeper inference, self-checking, and tool use. Without that reasoning-model behavior, techniques like self-consistency checks, tool-based math (“program of thought”), and metaprompting are harder to make reliably effective.

What is “self-consistency” in this context, and how is it different from asking multiple unrelated questions?

Self-consistency is implemented by asking for optionality within the same task, then checking consistency across the candidates. The transcript gives examples like: “give me five ways that you could define the answer to this question,” or “give me five possible solutions” for a coding problem, then have the model verify that the set of answers is consistent. The rationale is that generating multiple options is relatively cheap for reasoning models, while the consistency check reduces random incorrect answers.

How does “program of thought” change the way math and coding problems should be prompted?

Instead of asking for a verbal explanation (“please explain how you solve this”), the prompt should instruct the model to solve via code/tool use—e.g., “write a function to solve this” and have it call a tool like Python. The transcript argues this improves numerical accuracy because the model can compute rather than rely only on approximate reasoning.

What does “plan and solve” add that a direct request for an answer doesn’t?

It inserts a quality-control step: ask for a step-by-step plan first, then proceed to execution. The transcript emphasizes that the user can critique the plan before the model writes the final output. That makes the planning stage an opportunity to catch flawed assumptions or missing constraints early.

Why are guardrails/edge cases, context positioning, and negative examples treated as structural principles?

Guardrails and edge cases ensure the model knows what to do when conditions fail (e.g., “if unable to X, then do Y”), how to handle fallbacks, and how to format outputs. Context positioning assumes attention isn’t uniform: critical instructions should be in the first 10% of the prompt and reiterated near the end. Negative examples teach failure modes explicitly—showing what not to do (like banned phrases) is described as more important than only showing ideal examples.

What is “metaprompting” here, and what prompt phrases are offered to trigger it?

Metaprompting means asking the model to help improve the prompt itself or diagnose gaps in understanding. The transcript suggests phrases such as: a self-improvement loop (“Here’s my current prompt. Just write it out. How would you improve this prompt?”), an uncertainty check (“What parts of this request are unclear or ambiguous?”), and capability discovery (“How would you approach this if you had no constraints?”). It also mentions diagnostic approaches like Socratic questioning (“Why did you choose that approach?” “What alternatives did you consider?”) to surface implicit assumptions.

Review Questions

  1. If a reasoning model can generate multiple candidates cheaply, what additional step should be required to reduce hallucinations, and why?
  2. How would you rewrite a math prompt to use “program of thought” rather than asking for an explanation?
  3. Which prompt elements should be placed in the first 10% and last 10% of a prompt, and what is the purpose of negative examples?

Key Points

  1. Switching from GPT-4o to reasoning models (o3, Claude Opus 4, Gemini 2.5 Pro) is treated as the prerequisite for the most effective prompting patterns.

  2. Use self-consistency by asking for multiple candidate answers and then requiring a consistency check across those candidates.

  3. For math and coding, prompt for tool-based execution (“write a function” / call Python) rather than requesting only a verbal explanation.

  4. Adopt “plan and solve” by requesting a step-by-step plan first, then critiquing that plan before execution.

  5. Build prompts with guardrails and edge cases, including explicit fallbacks and clear output structure.

  6. Design prompts with context positioning (critical instructions early and reiterated late) and include negative examples that show what to avoid.

  7. Use metaprompting (self-improvement loops, uncertainty probes, capability discovery, and Socratic questioning) to make the model reveal what it needs to perform well.

Highlights

The transcript’s central claim is that the biggest bottleneck is not prompt wording but the habit of defaulting to GPT-4o instead of using reasoning models that can check their work and use tools.
Three high-retention techniques are emphasized: self-consistency via multiple candidates, “program of thought” via code/tool calls, and “plan and solve” with critique before execution.
Prompt structure matters as much as content: add guardrails/edge cases, place key constraints in the first and last parts of the prompt, and include negative examples to teach failure modes.
Metaprompting is framed as a practical workflow: ask the model to improve your prompt, identify ambiguities, and probe its own capability boundaries.
