
7 Prompting Strategies from Claude 4's "System Prompt" Leak

5 min read

Based on AI News & Strategy Daily | Nate B Jones's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Treat prompts as policy systems that prevent failure modes, not just instructions for what to do.

Briefing

A leaked “system prompt” attributed to Claude 4 is being treated less like a set of instructions and more like a safety-and-reliability policy engine—an approach that, if adopted by operators, could reduce failure modes while improving output quality. The central takeaway is a shift in mindset: prompts shouldn’t just tell a model what to do; they should define rules that prevent the model from going wrong, with special attention to edge cases, ambiguity, and tool use.

The breakdown starts by anchoring the model's identity and stable context up front, with concrete facts like the model's capabilities and the current date, so the model doesn't spend its limited attention (its effective working memory) re-deriving information that won't change. That early stabilization is paired with explicit conditional refusal templates: if certain conditions are met, the model must refuse or enforce a stated boundary. The emphasis is on clarity rather than restriction; the argument is that ambiguity breeds inconsistent behavior, and that consistency comes from spelling out exact limits.
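As a concrete illustration, a stabilizing preamble plus conditional refusal rules might look like the sketch below. The model name, wording, and specific rules are invented for this example, not quoted from the leaked text:

```python
from datetime import date

# Illustrative only: the model name and wording are invented,
# not quoted from the leaked prompt.
IDENTITY_BLOCK = (
    "You are AssistantX, a large language model.\n"
    f"Today's date is {date.today():%B %d, %Y}.\n"
    "You can analyze text and images and call the tools listed below.\n"
    "You cannot browse the web except through the provided search tool.\n"
)

# Conditional refusal templates: each rule is an explicit "if X, then Y"
# boundary rather than a vague instruction to "be safe."
REFUSAL_RULES = (
    "If asked for instructions to create weapons, refuse and explain why.\n"
    "If asked to reproduce copyrighted text in full, decline and offer a summary.\n"
    "If a request is ambiguous between harmful and benign readings, "
    "ask one clarifying question before answering.\n"
)

SYSTEM_PROMPT = IDENTITY_BLOCK + "\n" + REFUSAL_RULES
```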

Next comes a distinctive “three-tier uncertainty routing” scheme for handling ambiguous questions. The prompt directs the model to answer immediately for timeless information, to answer directly while offering verification for slow-changing information, and to search immediately for live information such as current prices. The practical lesson is that strong prompts include decision criteria—when to do what—rather than only commands. This becomes especially important for agentic setups where a policy must guide an autonomous system through uncertainty.
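A minimal sketch of how such routing criteria might be written into a system prompt; the three tiers follow the video's description, but the exact phrasing is invented:

```python
# Hypothetical wording: the three-tier structure is from the video,
# the sentences themselves are not quoted from the leak.
UNCERTAINTY_ROUTER = """
Before answering, classify how quickly the relevant information changes:
1. Timeless (math, definitions, settled history): answer immediately.
2. Slow-changing (annual statistics, organizational facts): answer directly,
   then offer to verify with a search.
3. Live (stock prices, weather, breaking news): search first; do not guess.
"""
```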

Tool use guidance is treated with unusual rigor through “lock tool grammar,” which includes both correct and incorrect function-call formats. The argument: negative examples teach the model how to use tools correctly by showing common failure patterns, not just ideal syntax. Complementing that, the prompt uses “binary style rules” that replace subjective guidance with hard on/off constraints—such as “Never start with flattery” and “No emojis unless requested”—because absolute rules are easier for models to follow than interpretive phrases like “be concise.”
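A sketch of what a tool-grammar section with negative examples could look like; the tool name, tag syntax, and failure cases here are invented stand-ins, not the leaked prompt's actual grammar:

```python
# The tool name ("get_quote") and call syntax are invented for illustration.
TOOL_GRAMMAR = """
Call tools using exactly this format:

CORRECT:
<tool_call>{"name": "get_quote", "arguments": {"ticker": "AAPL"}}</tool_call>

INCORRECT (arguments must be a JSON object, not a bare string):
<tool_call>{"name": "get_quote", "arguments": "AAPL"}</tool_call>

INCORRECT (never call tools in prose instead of the tagged format):
I'll run get_quote(ticker="AAPL") now.
"""
```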

The remaining tactics focus on keeping critical constraints active over long contexts. "Positional reinforcement" repeats key rules at strategic points throughout a lengthy instruction set, countering attention decay by acting like signposts every few hundred tokens. Finally, "post-tool reflection" adds a deliberate pause after tool outputs, urging the model to process results before deciding the next step; this boosts accuracy when tool outputs are messy or hard to parse.

Taken together, the guidance reframes prompting as “operating system” configuration: defensive programming for hallucinations, copyright, and harmful content should be explicit and exhaustive, not hand-waved. It also pushes for declarative policy framing (“If X always Y”) instead of procedural phrasing (“First do X, then do Y”), arguing that this can make prompting more systematic and easier to reason about. Even with uncertainty around whether the leaked text is authentic, the prompt-structure lessons are presented as directly reusable for operators building more reliable model behavior.
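To make that contrast concrete, here is a hypothetical side-by-side of the two framings, with wording invented for illustration:

```python
# Procedural phrasing: a sequence the model must track step by step.
PROCEDURAL = (
    "First check whether the question involves current prices. "
    "Then, if it does, call the search tool. After that, summarize the results."
)

# Declarative phrasing: standing "if X, always Y" rules that apply whenever
# their condition holds, at any point in the conversation.
DECLARATIVE = (
    "If a question involves current prices, always search before answering. "
    "If a search returns results, always cite them in the answer."
)
```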

Cornell Notes

The leaked Claude 4 system prompt is presented as a blueprint for building reliable model behavior by treating prompts like policy and "defensive programming," not magic instructions. Key techniques include early identity/context anchoring, explicit conditional refusal templates for edge cases, and a three-tier uncertainty routing system that tells the model when to answer directly, verify, or search. Tool reliability is strengthened with "lock tool grammar" that provides both valid and invalid function-call examples, plus "post-tool reflection" that forces a thinking/checkpoint step after tool outputs. Across long contexts, critical rules are reinforced through repetition ("positional reinforcement"), and ambiguous style guidance is replaced with binary on/off constraints. The result is a prompting approach aimed at preventing failure modes while improving output consistency.

Why does the prompt start with “identity” and stable facts instead of jumping straight into task instructions?

It front-loads concrete, non-changing context—such as the model’s identity, current date, and core capabilities—so the model doesn’t repeatedly spend attention re-deriving basics. The stated rationale is instructional design: establishing steady context early reduces working-memory burden and helps the model stay consistent across the rest of a long instruction set.

How does “three-tier uncertainty routing” turn ambiguity into a decision process?

It uses a decision tree based on how quickly information changes: (1) timeless information → answer immediately; (2) slow-changing information → answer directly and offer to verify; (3) live information → search immediately (example given: today’s stock prices). The key lesson is that prompts should specify when to act, not only how to respond—especially for agentic systems that must choose actions under uncertainty.

What does “lock tool grammar” add beyond telling a model to call tools?

It teaches tool use with both correct and incorrect function-call formats. Instead of only showing the valid syntax, it explicitly marks invalid formats, arguing that negative examples are powerful teaching signals—similar to showing common ways people fall while learning to ride a bike. The goal is fewer tool-call errors and more consistent API behavior.

Why are “binary style rules” emphasized over softer guidance like “be concise”?

The prompt uses hard on/off constraints that leave less room for interpretation and remove subjectivity for the model. Examples include "Never start with flattery" and "No emojis unless requested," which are clearer than instructions like "minimize formatting" or "be concise," because the latter can be read in multiple ways.
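A small illustration of the difference; the first two binary rules paraphrase the examples above, and the third is an invented rule in the same spirit:

```python
# Subjective guidance: each phrase admits many readings.
SUBJECTIVE_STYLE = "Be concise. Minimize formatting. Keep a friendly tone."

# Binary on/off rules: each is checkable as true or false for any output.
# (The third rule is an invented example, not from the leak.)
BINARY_STYLE = (
    "Never start a response with flattery.\n"
    "No emojis unless the user requests them or uses one first.\n"
    "Never use more than three bullet points unless asked for a list.\n"
)
```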

How does “positional reinforcement” help when prompts are extremely long?

It repeats critical instructions at strategic positions throughout a long context (described as every ~500 tokens). The rationale is attention degradation: as the model reads more tokens, earlier constraints can be forgotten. Repetition acts like speed-limit signs or signposts, helping the model retain key rules such as anonymization requirements (e.g., “all PII must be anonymized”).
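One way an operator might implement this when assembling a long prompt programmatically is sketched below. The ~500-token cadence is approximated by whitespace word count, and the section contents are placeholders:

```python
def reinforce(sections: list[str], rule: str, every_tokens: int = 500) -> str:
    """Assemble a long prompt, restating a critical rule roughly every
    `every_tokens` tokens (approximated here by whitespace word count)."""
    parts: list[str] = []
    words_since_rule = 0
    for section in sections:
        parts.append(section)
        words_since_rule += len(section.split())
        if words_since_rule >= every_tokens:
            parts.append(f"REMINDER: {rule}")
            words_since_rule = 0
    return "\n\n".join(parts)

prompt = reinforce(
    sections=["(long instruction section)"] * 20,  # placeholder sections
    rule="All PII in outputs must be anonymized.",
)
```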

What is the purpose of "post-tool reflection" in an agent workflow?

After tool/function results, the prompt instructs the model to strongly consider outputting a thinking/reflection block before choosing the next action. The stated reason is that tool outputs can be hard to parse; a reflection step improves accuracy and helps determine what to do next—particularly relevant for Claude 4’s multi-step reasoning plus tool-use chains.
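A minimal sketch of how an operator might inject that checkpoint into an agent's message history; the message roles and wording here are generic assumptions, not tied to any particular vendor's API:

```python
def with_post_tool_reflection(history: list[dict], tool_result: str) -> list[dict]:
    """Append a tool result plus a reflection checkpoint to a chat history.

    The injected checkpoint asks the model to interpret the (possibly messy)
    tool output before committing to its next action. Message roles follow a
    generic chat-API shape, not any specific vendor's.
    """
    return history + [
        {"role": "tool", "content": tool_result},
        {
            "role": "system",
            "content": (
                "Before acting, state in a short thinking block what this "
                "result means and whether it changes your plan."
            ),
        },
    ]

messages = with_post_tool_reflection([], '{"ticker": "AAPL", "price": 212.4}')
```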

Review Questions

  1. Which prompting tactic most directly addresses inconsistent behavior caused by ambiguity, and how does it do so?
  2. Give an example of how “three-tier uncertainty routing” would handle a question about today’s stock price versus a timeless fact.
  3. Why might repeating critical constraints (“positional reinforcement”) improve performance in long prompts compared with relying on a single instruction at the top?

Key Points

  1. Treat prompts as policy systems that prevent failure modes, not just instructions for what to do.

  2. Anchor stable identity and context early to reduce working-memory burden and improve consistency.

  3. Use explicit if/then conditional refusal templates to define boundaries and edge cases clearly.

  4. Add decision criteria for uncertainty (timeless vs slow-changing vs live) so the model knows when to answer, verify, or search.

  5. Teach tool use with both correct and incorrect examples to reduce function-call and API errors.

  6. Replace subjective style guidance with binary on/off rules that are easier for models to follow.

  7. Repeat critical constraints throughout long contexts and add a post-tool reflection checkpoint to improve reliability.

Highlights

  • The prompt reframes prompting as defensive policy: 90% of the guidance is about what the model must not do, with the remaining portion focused on what it should do.
  • A three-tier uncertainty router tells the model when to answer immediately, when to verify, and when to search, turning ambiguity into an explicit decision tree.
  • Tool reliability is strengthened by "lock tool grammar," which includes both valid and invalid function-call formats, plus a post-tool reflection step before acting next.

Topics

  • System Prompt Leaks
  • Prompting Policies
  • Uncertainty Routing
  • Tool-Use Reliability
  • Defensive Prompting
