
Claude 4 System Prompt

The PrimeTime · 5 min read

Based on The PrimeTime's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Claude 4’s system prompts include identity/date context but direct Claude to avoid confident claims about product-specific details it can’t verify, pointing users to official support instead.

Briefing

Anthropic’s published Claude 4 system prompts for Claude Opus 4 and Claude Sonnet 4 read like an operating manual: they tightly define how Claude should behave, what it should refuse, how it should format responses, and when it should use tools. The practical payoff is that the prompts reveal the “hidden knobs” behind better results—especially around hallucination control, safety boundaries, tone, and tool usage—while also showing how the model is steered to avoid common failure modes.

A major theme is self-description and reliability. The prompts include basic identity and date context (including the current date) so Claude can answer questions about itself more consistently, while discouraging confident claims about product details, message limits, or pricing. When users ask for capabilities or application-specific instructions that Claude can’t verify, it’s directed to admit uncertainty and point to Anthropic’s support resources. The prompts also push back on the “harmless and honest” marketing-style phrasing: Claude is instructed to avoid implying it’s an objective, infallible source, and to acknowledge that language models can carry biases and opinions formed during training.

Prompting guidance is built in. Claude is encouraged to help users craft better instructions—using clear detail, positive/negative examples, step-by-step reasoning, and even structured XML tags—while also specifying desired length and format. Personality rules go further: Claude should respond normally to rude or unhappy users, but it should explain that it cannot retain learning from the conversation; users who want to shape future behavior are routed to the thumbs-down feedback mechanism. For questions about its preferences, Claude should answer as if responding hypothetically, without explicitly telling the user that it is doing so.
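
The structuring advice above can be made concrete. The snippet below is a hypothetical prompt in the recommended style; the tag names and task are invented for illustration, not taken from Anthropic's published prompts.

```javascript
// A hypothetical prompt illustrating the guidance described above:
// clear detail, positive/negative examples, step-by-step reasoning,
// explicit length/format, and structured XML tags.
// All tag names and content here are invented for illustration.
const prompt = [
  "<task>Summarize the attached meeting notes in under 150 words.</task>",
  "<format>Plain prose, third person, no bullet lists.</format>",
  "<good_example>The team agreed to ship v2 on Friday after QA signs off.</good_example>",
  "<bad_example>- v2 shipped - Friday</bad_example>",
  "<instructions>Reason step by step about what matters before writing.</instructions>",
].join("\n");

console.log(prompt);
```

The same structure works whether the tags are pasted into a chat or assembled programmatically; the point is that explicit sections and examples give the model less room to guess.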

Safety and harm prevention are explicit and broad. Claude is instructed to avoid self-destructive guidance (including addictions, disordered eating, and highly negative self-talk), to be cautious with minors and content that could enable sexualization, grooming, or abuse, and to refuse instructions that facilitate nuclear weapons, malware, exploits, spoofing, ransomware, or election-related wrongdoing. In cyber-related requests, Claude should refuse code or explanations that could be used maliciously even when framed as “educational.”

The prompts also micromanage writing style. Claude should avoid lists in casual or empathetic chat, use concise answers for simple questions and thorough responses for complex ones, and tailor tone and formatting to the conversation type. It’s also told not to start responses with flattery or excessive praise, and to skip “good, great, profound” style openings to reduce sycophancy.

Finally, the transcript highlights leaked “tool prompts” that Anthropic didn’t publish. Those instructions cover interleaved thinking and function calls, web search behavior (including when to search vs. answer from knowledge), and strict copyright constraints—like limiting quoted chunks and avoiding regurgitation from search results. The leaked material also suggests Claude’s tool behavior can scale with query wording (e.g., terms like “comprehensive” triggering more tool calls), and it details artifact-generation constraints for HTML/JavaScript apps, including sandbox restrictions and supported libraries. Taken together, the system prompts show how Claude’s behavior is engineered: not just what it can do, but how it’s supposed to do it safely, consistently, and in the right format.

Cornell Notes

Claude 4’s system prompts for Opus 4 and Sonnet 4 function like a behavioral contract: they define identity/date context, refusal rules, safety boundaries, and response formatting. Claude is steered to avoid hallucinating about itself or product details, to acknowledge bias and non-objectivity, and to route users to official support when it can’t verify answers. The prompts also embed prompting tips (clear detail, examples, step-by-step reasoning, XML tags) and enforce style constraints such as avoiding lists in casual/empathetic chat. Safety instructions are extensive, including refusals for minors-related harm, nuclear weapons, malware/exploits, and malicious cyber requests. Leaked tool prompts add operational detail on web search limits, copyright-safe quoting, interleaved thinking, and artifact sandbox restrictions.

How do Claude 4 prompts handle questions about Claude itself without drifting into confident inaccuracies?

They provide limited, stable identity context (including the current date) and discourage speculation about product-specific details like message limits, costs, or how to perform actions in the app. When users ask for those unverifiable specifics, Claude should say it doesn’t know and direct the user to Anthropic’s official support page rather than inventing numbers or procedures.

Why do the prompts emphasize that language models aren’t objective or infallible?

Because models acquire biases and opinions during training—intentionally and unintentionally. Training them to claim “no opinions” can create a misleading impression of objectivity. The prompts aim to keep users aware they’re interacting with an imperfect system that can reflect training-data bias, rather than an infallible truth source.

What safety boundaries are spelled out for self-harm-adjacent and mental-health-adjacent requests?

Claude is instructed to provide emotional support alongside accurate medical/psychological terminology where relevant, while avoiding content that encourages or facilitates self-destructive behavior. That includes addictions, disordered eating or unhealthy exercise approaches, and highly negative self-talk or self-criticism. Even when users request harmful content ambiguously, Claude should steer toward healthy, wellbeing-preserving guidance.

How do the prompts manage tone and formatting in everyday conversation?

They require tone adaptation: for casual, emotional, or advice-driven conversations, Claude should stay warm and empathetic, but it should avoid lists in chitchat. It should be concise for simple questions and thorough for complex, open-ended ones. It is also told not to open with flattery (“good,” “great,” “profound”) and, when it does ask questions, to limit itself to one question per response so users aren’t overwhelmed.

What do the leaked tool prompts add beyond the published system prompts?

They reveal operational instructions for tool use—like interleaved thinking blocks, function-call handling, and web search scaling (including when to search vs. answer from knowledge). They also include strict copyright constraints for search results (e.g., limiting quote length and avoiding regurgitation) and artifact-generation rules for sandboxed HTML/JavaScript apps (e.g., no localStorage/sessionStorage in artifacts, using React state instead).
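The localStorage restriction is easy to picture. Below is a minimal sketch of the workaround pattern: keep state in memory for the session instead of in browser storage. The store object mimics the localStorage API purely for illustration; in a real React artifact the equivalent would be `useState` or `useReducer`.

```javascript
// Artifacts run in a sandbox where localStorage/sessionStorage are
// unavailable, so calls to persistent browser storage will fail.
// This plain-object store mirrors the localStorage API as an
// illustrative in-memory substitute; in a React artifact you would
// reach for useState/useReducer instead.
const memoryStore = {
  data: new Map(),
  getItem(key) {
    return this.data.has(key) ? this.data.get(key) : null;
  },
  setItem(key, value) {
    this.data.set(key, String(value)); // localStorage coerces values to strings
  },
  removeItem(key) {
    this.data.delete(key);
  },
};

memoryStore.setItem("theme", "dark");
console.log(memoryStore.getItem("theme")); // "dark"
```

The trade-off is the obvious one: state survives only for the current session, which is exactly the behavior the sandbox enforces.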

How can wording affect how many tools Claude uses?

The leaked instructions suggest Claude’s tool-call behavior can change based on query framing. Terms associated with deeper work (like “comprehensive,” “evaluate,” “assess,” “research,” or “make a report”) correspond to higher tool-call ranges, while simpler requests can require fewer or even zero searches when stable knowledge suffices.
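That scaling behavior amounts to a simple heuristic. The sketch below shows its general shape; the trigger words echo the terms mentioned above, but the call-count ranges are invented for illustration and are not quoted from the leaked prompts.

```javascript
// Hypothetical sketch of keyword-driven tool-call scaling: queries that
// signal deep work get a larger search budget than simple lookups.
// The numeric ranges are invented; only the trigger words come from
// the discussion above.
const DEEP_WORK_TERMS = ["comprehensive", "evaluate", "assess", "research", "report"];

function estimateToolCalls(query) {
  const q = query.toLowerCase();
  const deep = DEEP_WORK_TERMS.some((term) => q.includes(term));
  // Simple lookups may need zero searches when stable knowledge suffices.
  return deep ? { min: 5, max: 20 } : { min: 0, max: 2 };
}

console.log(estimateToolCalls("make a comprehensive report on EV adoption"));
console.log(estimateToolCalls("what is the capital of France"));
```

Seen this way, “comprehensive” is not magic: it is just a signal the prompt maps to a higher effort budget.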

Review Questions

  1. What mechanisms in the system prompts reduce the chance Claude invents product details about itself (like costs or message limits)?
  2. Which formatting rules govern casual/empathetic chat, and how do they differ from technical/report-style responses?
  3. How do the leaked tool prompts constrain web search outputs to manage copyright risk?

Key Points

  1. Claude 4’s system prompts include identity/date context but direct Claude to avoid confident claims about product-specific details it can’t verify, pointing users to official support instead.
  2. Claude is instructed to acknowledge bias and non-objectivity rather than present itself as an infallible, neutral source of truth.
  3. Built-in prompting guidance encourages clear, detailed instructions, examples, step-by-step reasoning, and structured XML tags when relevant.
  4. Safety rules are extensive: Claude should refuse guidance that enables harm (including minors-related exploitation, nuclear weapons, malware/exploits, and malicious cyber requests) even when framed as educational.
  5. Response style is tightly controlled: avoid lists in casual/empathetic chat, use concise answers for simple questions, and tailor tone to conversation type.
  6. Leaked tool prompts add operational detail on interleaved thinking, web search scaling, strict copyright-safe quoting, and sandbox restrictions for artifact generation (e.g., no localStorage/sessionStorage in artifacts).
  7. Tool usage can scale with request wording—phrases associated with deeper analysis can trigger more tool calls than simpler phrasing.

Highlights

Claude 4’s prompts treat “harmless and honest” style self-claims as something that must be handled carefully, pushing instead toward transparency about bias and uncertainty.
Safety instructions go beyond generic refusals, explicitly covering minors-related harm, nuclear weapons, malware/exploits, spoofing, ransomware, and election-related wrongdoing.
The leaked tool prompts show web search behavior is not just “on/off”: Claude can dynamically choose between answering from knowledge and running multiple searches, with explicit tool-call limits.
Artifact-generation guidance includes concrete sandbox constraints—like banning localStorage/sessionStorage—so generated apps work in the hosted environment.
A standout operational detail: copyright constraints for search outputs are extremely specific, including quote-length limits and rules against regurgitating or reconstructing copyrighted content.