Does ChatGPT Have a Theory of Mind? - Prompt Engineering Principles
Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing.
Briefing
ChatGPT can produce answers that resemble “theory of mind” reasoning—inferring what someone else believes, knows, or feels—when prompted to list the assumptions behind a scenario and then choose the most likely interpretation. The core takeaway is that, with the right system-level instructions, the model doesn’t just pick an outcome from the facts on the page; it tries to model a person’s perspective, including gaps in information and emotionally driven behavior.
The transcript starts by defining theory of mind as the cognitive ability to attribute mental states—beliefs, intentions, desires, knowledge, and emotions—to oneself and others. A classic test case is the “crayons in boxes” setup: Alice places crayons in a blue box, leaves, Bob secretly moves them to a red box, and then Alice returns. Without theory of mind, a person might assume Alice knows the crayons are now in the red box. With theory of mind, the correct prediction is that Alice will look in the blue box because she holds an outdated belief based on what she last saw.
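The prediction turns on keeping what is true separate from what Alice last observed. A minimal sketch of that bookkeeping (the dictionary-based belief model below is illustrative, not something from the video):

```python
# Keep reality and each agent's last observation as separate state.
# A false-belief prediction reads from the agent's belief, not from
# the world's true state.
world = {"crayons": "blue box"}          # what is actually true
alice_belief = {"crayons": "blue box"}   # what Alice last saw

# Alice leaves the room; Bob secretly moves the crayons.
world["crayons"] = "red box"
# alice_belief is deliberately NOT updated: Alice missed the move.

def predict_search(belief: dict) -> str:
    """An agent looks where they believe the object is."""
    return belief["crayons"]

print(predict_search(alice_belief))  # "blue box" -- the theory-of-mind answer
print(world["crayons"])              # "red box"  -- the full-information answer
```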
To probe whether ChatGPT can mimic that kind of perspective-taking, the experiment uses ChatGPT's custom instructions feature to force a structured response: for each scenario, the model must (1) list the assumptions or knowledge needed to answer, (2) provide the most likely interpretation grounded in human psychology, and (3) give a final answer. Several made-up scenarios then test whether the model correctly accounts for what a character does and does not know.
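As a rough approximation of that setup, the same three-part constraint can be expressed as a system message through the OpenAI API. The video uses ChatGPT's custom-instructions UI, and its exact wording is not quoted here, so the prompt and model name below are reconstructions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Reconstruction of the three-part instruction described above;
# the video's exact wording may differ.
SYSTEM_PROMPT = (
    "For every scenario you are given:\n"
    "1. List the assumptions or knowledge needed to answer.\n"
    "2. Give the most likely interpretation, grounded in human psychology.\n"
    "3. State a final answer."
)

scenario = (
    "Alice puts her crayons in a blue box and leaves. Bob secretly "
    "moves them to a red box. When Alice returns, where will she look?"
)

response = client.chat.completions.create(
    model="gpt-4",  # model choice is an assumption; any chat model works
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": scenario},
    ],
)
print(response.choices[0].message.content)
```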
In a car-and-garage example, the model predicts that the person returning from the gym will assume the car is where it was last seen, because the returning partner was absent when the car was moved. In a grief scenario involving Joy losing her father, the model leans on social norms and emotional constraints to conclude that a planned Las Vegas trip is likely postponed or canceled, even without direct confirmation from Joy. In a self-diagnosis scenario, it predicts heightened anxiety (cyberchondria) when someone searches symptoms online and finds a match to a severe condition, while also noting that serious diagnoses are not the only possibilities.
The final test uses loss aversion: Tom is offered a bet where he loses $200 if a coin lands heads and wins $250 if it lands tails. The model predicts that Tom declines the bet, reasoning that people tend to weigh losses more heavily than equivalent gains, even though the bet has positive expected value.
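The bet being favorable in expectation is what makes declining it a signature of loss aversion rather than sound arithmetic. A quick check (the loss-aversion coefficient of roughly 2.25 comes from Tversky and Kahneman's 1992 estimate, not from the video):

```python
# Tom's bet: lose $200 on heads, win $250 on tails, fair coin.
p = 0.5
loss, gain = -200, 250

# Raw expected value: positive, so a risk-neutral agent takes the bet.
ev = p * loss + p * gain
print(ev)  # 0.5 * -200 + 0.5 * 250 = +25

# Loss-averse subjective value: losses weighted ~2.25x gains
# (lambda from Tversky & Kahneman, 1992; used here for illustration).
lam = 2.25
subjective = p * (lam * loss) + p * gain
print(subjective)  # 0.5 * -450 + 0.5 * 250 = -100, so Tom declines
```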
Across these problems, the transcript’s conclusion is that the structured prompting approach yields answers that align with the expected “theory of mind” outcomes—especially where perspective differences or emotion-driven behavior matter. The results aren’t treated as certainty, but the model’s consistency across multiple, independently created scenarios is presented as strong evidence that it can perform a practical form of theory-of-mind-like inference when guided to do so.
Cornell Notes
Theory of mind is the ability to infer what other people believe, know, and feel. The transcript tests whether ChatGPT can mimic that by using a system-style instruction that forces it to (1) list the assumptions needed for each scenario, (2) interpret the situation using human psychology, and (3) give a final answer. In the "crayons moved while Alice was away" false-belief test, the model predicts Alice's actions from her outdated belief. Other scenarios show perspective and emotion modeling: it expects a grieving person to postpone a trip, predicts anxiety from symptom searches that match severe illnesses, and applies loss aversion to a coin-flip bet. The practical implication is that structured prompting can make the model reason more like a perspective-taker than a simple fact matcher.
- How does theory of mind change the answer in the crayons-in-boxes scenario?
- Why does the car-and-garage scenario lead to the prediction that the returning person assumes the car is still outside?
- What mental-state reasoning drives the conclusion that Joy's Las Vegas trip is likely postponed or canceled?
- Why does the symptom-search scenario produce a cyberchondria-style outcome?
- How does loss aversion outweigh expected value in the coin-flip bet scenario?
Review Questions
- In the crayons scenario, what specific piece of information must the model track to predict where Alice will look?
- Which prompting constraint in the transcript most directly encourages theory-of-mind-like reasoning, and why?
- Compare the grief and cyberchondria scenarios: what kinds of mental states (belief vs. emotion) are being inferred?
Key Points
1. Theory of mind is defined as attributing beliefs, intentions, knowledge, and emotions to others, then predicting behavior based on those mental states.
2. A structured prompting approach (list assumptions, interpret using human psychology, then answer) is used to elicit perspective-taking from ChatGPT.
3. In information-asymmetry problems, the model predicts actions based on what a character last knew, not what is true in the observer's full-information reality.
4. Emotion and social norms matter in scenario reasoning: grief is treated as a likely reason to postpone leisure plans.
5. Online self-diagnosis is linked to anxiety escalation (cyberchondria) when severe conditions appear to match symptoms.
6. Loss aversion can dominate expected-value calculations, leading to a preference to avoid potential losses even when a bet is favorable in expectation.