Claude 4 System Prompt
Based on The PrimeTime's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Briefing
Anthropic’s published Claude 4 system prompts for Claude Opus 4 and Claude Sonnet 4 read like an operating manual: they tightly define how Claude should behave, what it should refuse, how it should format responses, and when it should use tools. The practical payoff is that the prompts reveal the “hidden knobs” behind better results—especially around hallucination control, safety boundaries, tone, and tool usage—while also showing how the model is steered to avoid common failure modes.
A major theme is self-description and reliability. The prompts include basic identity and date context (including the current date) so Claude can answer questions about itself more consistently, while discouraging confident claims about product details, message limits, or pricing. When users ask about capabilities or application-specific behavior that Claude can’t verify, it’s directed to admit uncertainty and point to Anthropic’s support resources. The prompts also push back on the “helpful, harmless, and honest” marketing-style phrasing: Claude is instructed to avoid implying it’s an objective, infallible source, and to acknowledge that language models can carry biases and opinions formed during training.
Prompting guidance is built in. Claude is encouraged to help users craft better instructions—using clear detail, positive/negative examples, step-by-step reasoning, and even structured XML tags—while also specifying desired length and format. Personality rules go further: Claude should respond normally to rude or unhappy users while noting that it cannot retain learning from the conversation; lasting feedback is routed through the thumbs-down mechanism. For preference questions, Claude should answer as if responding hypothetically, without explicitly telling the user it’s hypothetical.
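The prompt-crafting advice above can be made concrete. As an illustrative sketch only—the tag names, helper function, and example task below are hypothetical, not quoted from Anthropic’s prompts—an XML-structured request combining a task, positive/negative examples, a format constraint, and a step-by-step cue might be assembled like this:

```javascript
// Sketch: assembling an XML-tagged prompt along the lines the system prompt suggests.
// buildPrompt, the tag names, and the example task are all illustrative assumptions.
function buildPrompt({ task, goodExample, badExample, format }) {
  return [
    `<task>${task}</task>`,
    `<good_example>${goodExample}</good_example>`,
    `<bad_example>${badExample}</bad_example>`,
    `<format>${format}</format>`,
    "Think through the problem step by step before answering.",
  ].join("\n");
}

const prompt = buildPrompt({
  task: "Summarize the release notes for a non-technical audience",
  goodExample: "Plain-language summary, three sentences, no jargon",
  badExample: "A bulleted dump of every commit message",
  format: "One short paragraph, under 80 words",
});
```

The point is less the exact tags than the shape: each constraint lives in its own clearly delimited section, which is the structure the system prompts encourage users toward.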
Safety and harm prevention are explicit and broad. Claude is instructed to avoid self-destructive guidance (including addictions, disordered eating, and highly negative self-talk), to be cautious with minors and content that could enable sexualization, grooming, or abuse, and to refuse instructions that facilitate nuclear weapons, malware, exploits, spoofing, ransomware, or election-related wrongdoing. In cyber-related requests, Claude should refuse code or explanations that could be used maliciously even when framed as “educational.”
The prompts also micromanage writing style. Claude should avoid lists in casual or empathetic chat, use concise answers for simple questions and thorough responses for complex ones, and tailor tone and formatting to the conversation type. It’s also told not to start responses with flattery or excessive praise, and to skip “good, great, profound” style openings to reduce sycophancy.
Finally, the transcript highlights leaked “tool prompts” that Anthropic didn’t publish. Those instructions cover interleaved thinking and function calls, web search behavior (including when to search vs. answer from knowledge), and strict copyright constraints—like limiting quoted chunks and avoiding regurgitation from search results. The leaked material also suggests Claude’s tool behavior can scale with query wording (e.g., terms like “comprehensive” triggering more tool calls), and it details artifact-generation constraints for HTML/JavaScript apps, including sandbox restrictions and supported libraries. Taken together, the system prompts show how Claude’s behavior is engineered: not just what it can do, but how it’s supposed to do it safely, consistently, and in the right format.
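The artifact sandbox restriction mentioned above (no localStorage/sessionStorage) implies a specific coding pattern: any “persistence” in an artifact has to live in ordinary in-memory state for the lifetime of the page. A minimal sketch, assuming a key/value settings store (the function names here are illustrative):

```javascript
// Sketch: in-memory state for a sandboxed artifact where the browser storage
// APIs (localStorage/sessionStorage) are unavailable. State survives only as
// long as the page itself.
const store = new Map();

function saveSetting(key, value) {
  // Outside the sandbox this might be localStorage.setItem(key, value);
  // here the value is simply held in memory.
  store.set(key, value);
}

function loadSetting(key, fallback) {
  return store.has(key) ? store.get(key) : fallback;
}

saveSetting("theme", "dark");
```

Nothing survives a reload under this pattern, which is exactly the trade-off the sandbox restriction imposes.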
Cornell Notes
Claude 4’s system prompts for Opus 4 and Sonnet 4 function like a behavioral contract: they define identity/date context, refusal rules, safety boundaries, and response formatting. Claude is steered to avoid hallucinating about itself or product details, to acknowledge bias and non-objectivity, and to route users to official support when it can’t verify answers. The prompts also embed prompting tips (clear detail, examples, step-by-step reasoning, XML tags) and enforce style constraints such as avoiding lists in casual/empathetic chat. Safety instructions are extensive, including refusals for minors-related harm, nuclear weapons, malware/exploits, and malicious cyber requests. Leaked tool prompts add operational detail on web search limits, copyright-safe quoting, interleaved thinking, and artifact sandbox restrictions.
How do Claude 4 prompts handle questions about Claude itself without drifting into confident inaccuracies?
Why do the prompts emphasize that language models aren’t objective or infallible?
What safety boundaries are spelled out for self-harm-adjacent and mental-health-adjacent requests?
How do the prompts manage tone and formatting in everyday conversation?
What do the leaked tool prompts add beyond the published system prompts?
How can wording affect how many tools Claude uses?
Review Questions
- What mechanisms in the system prompts reduce the chance Claude invents product details about itself (like costs or message limits)?
- Which formatting rules govern casual/empathetic chat, and how do they differ from technical/report-style responses?
- How do the leaked tool prompts constrain web search outputs to manage copyright risk?
Key Points
1. Claude 4’s system prompts include identity/date context but direct Claude to avoid confident claims about product-specific details it can’t verify, pointing users to official support instead.
2. Claude is instructed to acknowledge bias and non-objectivity rather than present itself as an infallible, neutral source of truth.
3. Built-in prompting guidance encourages clear, detailed instructions, examples, step-by-step reasoning, and structured XML tags when relevant.
4. Safety rules are extensive: Claude should refuse guidance that enables harm (including minors-related exploitation, nuclear weapons, malware/exploits, and malicious cyber requests) even when framed as educational.
5. Response style is tightly controlled: avoid lists in casual/empathetic chat, use concise answers for simple questions, and tailor tone to conversation type.
6. Leaked tool prompts add operational detail on interleaved thinking, web search scaling, strict copyright-safe quoting, and sandbox restrictions for artifact generation (e.g., no localStorage/sessionStorage in artifacts).
7. Tool usage can scale with request wording—phrases associated with deeper analysis can trigger more tool calls than simpler phrasing.