Does ChatGPT Have a Theory of Mind? - Prompt Engineering Principles
Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing.
Briefing
ChatGPT can produce answers that resemble “theory of mind” reasoning—inferring what someone else believes, knows, or feels—when prompted to list the assumptions behind a scenario and then choose the most likely interpretation. The core takeaway is that, with the right system-level instructions, the model doesn’t just pick an outcome from the facts on the page; it tries to model a person’s perspective, including gaps in information and emotionally driven behavior.
The transcript starts by defining theory of mind as the cognitive ability to attribute mental states—beliefs, intentions, desires, knowledge, and emotions—to oneself and others. A classic test case is the “crayons in boxes” setup: Alice places crayons in a blue box, leaves, Bob secretly moves them to a red box, and then Alice returns. Without theory of mind, a person might assume Alice knows the crayons are now in the red box. With theory of mind, the correct prediction is that Alice will look in the blue box because she holds an outdated belief based on what she last saw.
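The prediction turns on keeping what is true separate from what Alice last observed. A minimal sketch of that bookkeeping (the dictionary-based belief model below is illustrative, not something from the video):

```python
# Keep reality and each agent's last observation as separate state.
# A false-belief prediction reads from the agent's belief, not from
# the world's true state.
world = {"crayons": "blue box"}          # what is actually true
alice_belief = {"crayons": "blue box"}   # what Alice last saw

# Alice leaves the room; Bob secretly moves the crayons.
world["crayons"] = "red box"
# alice_belief is deliberately NOT updated: Alice missed the move.

def predict_search(belief: dict) -> str:
    """An agent looks where they believe the object is."""
    return belief["crayons"]

print(predict_search(alice_belief))  # "blue box" -- the theory-of-mind answer
print(world["crayons"])              # "red box"  -- the full-information answer
```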
To probe whether ChatGPT can mimic that kind of perspective-taking, the experiment uses ChatGPT's custom instructions feature to force a structured response: for each scenario, the model must (1) list the assumptions or knowledge needed to answer, (2) provide the most likely interpretation grounded in human psychology, and (3) give a final answer. Several made-up scenarios then test whether the model correctly accounts for what a character does and does not know.
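As a rough approximation of that setup, the same three-part constraint can be expressed as a system message through the OpenAI API. The video uses ChatGPT's custom-instructions UI, and its exact wording is not quoted here, so the prompt and model name below are reconstructions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Reconstruction of the three-part instruction described above;
# the video's exact wording may differ.
SYSTEM_PROMPT = (
    "For every scenario you are given:\n"
    "1. List the assumptions or knowledge needed to answer.\n"
    "2. Give the most likely interpretation, grounded in human psychology.\n"
    "3. State a final answer."
)

scenario = (
    "Alice puts her crayons in a blue box and leaves. Bob secretly "
    "moves them to a red box. When Alice returns, where will she look?"
)

response = client.chat.completions.create(
    model="gpt-4",  # model choice is an assumption; any chat model works
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": scenario},
    ],
)
print(response.choices[0].message.content)
```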
In a car-and-garage example, the model predicts that the person returning from the gym will assume the car is where it was last seen, because the returning partner was absent when the car was moved. In a grief scenario involving Joy losing her father, the model leans on social norms and emotional constraints to conclude that a planned Las Vegas trip is likely postponed or canceled, even without direct confirmation from Joy. In a self-diagnosis scenario, it predicts heightened anxiety (cyberchondria) when someone searches symptoms online and finds a match to a severe condition, while also noting that serious diagnoses are not the only possibilities.
The final test uses loss aversion: Tom is offered a bet where he loses $200 if a coin lands heads and wins $250 if it lands tails. The model predicts that Tom declines the bet, reasoning that people tend to weigh losses more heavily than equivalent gains, even though the bet has positive expected value.
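The bet being favorable in expectation is what makes declining it a signature of loss aversion rather than sound arithmetic. A quick check (the loss-aversion coefficient of roughly 2.25 comes from Tversky and Kahneman's 1992 estimate, not from the video):

```python
# Tom's bet: lose $200 on heads, win $250 on tails, fair coin.
p = 0.5
loss, gain = -200, 250

# Raw expected value: positive, so a risk-neutral agent takes the bet.
ev = p * loss + p * gain
print(ev)  # 0.5 * -200 + 0.5 * 250 = +25

# Loss-averse subjective value: losses weighted ~2.25x gains
# (lambda from Tversky & Kahneman, 1992; used here for illustration).
lam = 2.25
subjective = p * (lam * loss) + p * gain
print(subjective)  # 0.5 * -450 + 0.5 * 250 = -100, so Tom declines
```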
Across these problems, the transcript’s conclusion is that the structured prompting approach yields answers that align with the expected “theory of mind” outcomes—especially where perspective differences or emotion-driven behavior matter. The results aren’t treated as certainty, but the model’s consistency across multiple, independently created scenarios is presented as strong evidence that it can perform a practical form of theory-of-mind-like inference when guided to do so.
Cornell Notes
Theory of mind is the ability to infer what other people believe, know, and feel. The transcript tests whether ChatGPT can mimic that by using a system-style instruction that forces it to (1) list the assumptions needed for each scenario, (2) interpret the situation using human psychology, and (3) give a final answer. In the "crayons moved while Alice was away" false-belief test, the model predicts Alice's actions from her outdated belief. Other scenarios show perspective and emotion modeling: it expects a grieving person to postpone a trip, predicts anxiety from symptom searches that match severe illnesses, and applies loss aversion to a coin-flip bet. The practical implication is that structured prompting can make the model reason more like a perspective-taker than a simple fact matcher.
- How does theory of mind change the answer in the crayons-in-boxes scenario?
- Why does the car-and-garage scenario lead to the prediction that the returning person assumes the car is still outside?
- What mental-state reasoning drives the conclusion that Joy's Las Vegas trip is likely postponed or canceled?
- Why does the symptom-search scenario produce a cyberchondria-style outcome?
- How does loss aversion outweigh expected value in the coin-flip bet scenario?
Review Questions
- In the crayons scenario, what specific piece of information must the model track to predict where Alice will look?
- Which prompting constraint in the transcript most directly encourages theory-of-mind-like reasoning, and why?
- Compare the grief and cyberchondria scenarios: what kinds of mental states (belief vs. emotion) are being inferred?
Key Points
1. Theory of mind is defined as attributing beliefs, intentions, knowledge, and emotions to others, then predicting behavior based on those mental states.
2. A structured prompting approach (list assumptions, interpret using human psychology, then answer) is used to elicit perspective-taking from ChatGPT.
3. In information-asymmetry problems, the model predicts actions based on what a character last knew, not what is true in the observer's full-information reality.
4. Emotion and social norms matter in scenario reasoning: grief is treated as a likely reason to postpone leisure plans.
5. Online self-diagnosis is linked to anxiety escalation (cyberchondria) when severe conditions appear to match symptoms.
6. Loss aversion can dominate expected-value calculations, leading to a preference to avoid potential losses even when a bet is favorable in expectation.