
ChatGPT Prompt Engineering Principles: Chain of Thought Prompting

All About AI · 5 min read

Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Chain-of-thought prompting works best for problems that decompose naturally into dependent sub-steps rather than single-shot questions.

Briefing

Chain-of-thought prompting—breaking a riddle into a sequence of sub-problems, solving each in order, then combining the results—can dramatically improve an LLM’s accuracy on tasks where answering directly from the riddle’s final sentence fails. The core idea is simple: when a question depends on intermediate facts (location, identity, relationships, hidden constraints), forcing the model to enumerate the needed steps reduces guesswork and helps it avoid jumping to an unsupported conclusion.

The approach is presented as a reusable “principle,” not a universal fix. It’s most effective when the problem naturally decomposes into a chain of reasoning steps. The method starts by instructing the model to list, systematically and in detail, the sub-problems required to reach the final answer. After that checklist is produced, the model is prompted to solve each sub-problem—often using a “highest probability” preference when certainty isn’t available—before wrapping up with the final response.
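The two-stage pattern described above can be sketched as plain prompt templates. The wording below is illustrative, not the transcript’s exact prompts:

```python
def decomposition_prompt(riddle: str) -> str:
    """Stage 1: ask the model to list sub-problems, not to answer yet."""
    return (
        "Do not answer the following riddle yet. Instead, list, "
        "systematically and in detail, every sub-problem that must be "
        "solved to reach the final answer.\n\n"
        f"Riddle: {riddle}"
    )


def solve_prompt(riddle: str, subproblems: list[str]) -> str:
    """Stage 2: solve each sub-problem in order, then combine."""
    steps = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(subproblems))
    return (
        "Solve each sub-problem below in order. If you are not certain, "
        "choose the highest-probability option. Only after all steps are "
        "solved, give the final answer.\n\n"
        f"Riddle: {riddle}\n\nSub-problems:\n{steps}"
    )
```

Splitting the work into two prompts mirrors the transcript’s workflow: the model first commits to a checklist, and only then is asked to fill it in.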

A first example tests the method on a multi-hop riddle about Michael, a 31-year-old American, who is in France looking at a famous museum painting. The riddle then links the painting’s artist to Michael’s favorite childhood cartoon character, and asks for the country of origin of an object the character holds. A straightforward attempt to answer directly fails repeatedly. With chain-of-thought prompting, the model first identifies the needed intermediate pieces: Michael’s location (museum), the most famous painting, the painting’s artist, the cartoon character, and the object the character holds. It then fills them in step-by-step: Michael is inferred to be at the Louvre in France; the painting is the Mona Lisa by Leonardo da Vinci; the cartoon character is guessed as Teenage Mutant Ninja Turtles’ Leonardo (chosen as the most likely match given the Renaissance-artist naming pattern); and the object is a katana. With those components assembled, the final question—country of origin of the katana—lands on Japan. The key takeaway is that the model can reach a correct answer only after being guided through the riddle’s dependency chain.

A second example contrasts chain-of-thought behavior against a zero-shot response. The riddle: a ball is placed into a small box missing the bottom, carried to a postal office, placed into a bigger box, and shipped to a friend in New York; the question asks where the ball ends up. A zero-shot answer claims the ball is in the bigger box, which the narrator rejects because the small box’s missing bottom should allow the ball to fall out. When chain-of-thought prompting is used, the model enumerates additional uncertainties and intervening actions—whether the ball falls out when the box is moved, where it could land during transit, and how the end state is defined. It ultimately concludes the ball most likely fell out of the small box either in the office or on the way to the postal office, and after forcing a single final choice, it selects the office as the most probable location. The result is treated as an improvement because it aligns with the physical constraint introduced by the missing bottom and shows more careful handling of ambiguous steps.

Overall, the transcript frames chain-of-thought prompting as a practical prompt-engineering technique for riddles and multi-step problems: it turns “one-shot guessing” into structured decomposition, which can convert previously unsolvable questions into solvable ones.

Cornell Notes

Chain-of-thought prompting improves LLM performance on multi-step riddles by forcing a decomposition into intermediate sub-problems. Instead of answering directly, the model is instructed to list the steps needed to reach the final result, then solve each step in sequence and combine them. In a museum-and-cartoon riddle, this method produces the correct chain: Louvre → Mona Lisa → Leonardo da Vinci → Teenage Mutant Ninja Turtles’ Leonardo → katana → Japan. In a ball-and-shipping riddle, zero-shot reasoning incorrectly places the ball in the bigger box, while chain-of-thought reasoning accounts for the missing-bottom constraint and concludes the ball most likely fell out before shipping, with the office as the best final guess.

When does chain-of-thought prompting tend to work best, according to the transcript?

It’s framed as a technique that doesn’t fit every problem, but works well when the task naturally breaks into a sequence of dependent steps. The riddle-style examples are used to show that if the final answer depends on intermediate facts—like identifying a museum, then a painting, then an artist, then a cartoon character—direct answering from the last sentence is unreliable. In those cases, prompting the model to list and solve sub-problems in order improves output.

How does the museum/cartoon riddle demonstrate the benefit of step decomposition?

Direct attempts fail, but chain-of-thought prompting yields a structured checklist: (1) determine Michael’s location (museum in France), (2) identify the most famous painting, (3) identify the artist, (4) determine the cartoon character linked to the artist, and (5) identify the object the character holds. Solving those steps leads to: Louvre → Mona Lisa → Leonardo da Vinci → Teenage Mutant Ninja Turtles’ Leonardo (chosen as the most likely match) → katana → Japan.

Why does the transcript mention “highest probability” during intermediate steps?

When the model isn’t 100% certain—especially for parts of the riddle that require a guess—the transcript describes prompting it to choose the highest-probability option to keep progress moving. For example, the cartoon character step doesn’t have a clear deterministic answer from the riddle alone, so the model selects Teenage Mutant Ninja Turtles’ Leonardo as the most likely candidate, based on the Renaissance-artist naming pattern.
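The “highest probability” instruction can be phrased as a one-line addition to a solving prompt. This wording is a hypothetical illustration, not the transcript’s:

```python
# Hedge appended to any sub-problem prompt where certainty is impossible.
HEDGE = (
    "If you cannot determine the answer with certainty, choose the "
    "highest-probability option and briefly note why it is most likely."
)

# Example use on the riddle's most uncertain step.
prompt = (
    "Which cartoon character, named after a Renaissance artist, is most "
    "likely Michael's favorite? " + HEDGE
)
```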

What went wrong with the zero-shot answer in the ball/shipping riddle?

A zero-shot response places the ball in the bigger box shipped to New York. The transcript rejects that because the small box is missing the bottom, so the ball should fall out when the box is moved or handled. The chain-of-thought version explicitly reasons about that physical constraint and the possible points where the ball could escape.

How does chain-of-thought reasoning change the ball/shipping riddle’s conclusion?

With chain-of-thought prompting, the model enumerates additional uncertainties and intervening actions: where the ball is immediately after placement, whether it falls out when the box is lifted or transported, and where it could end up before the bigger box is sealed. The resulting highest-probability outcome is that the ball most likely fell out of the small box either in the office or on the way to the postal office; after being pressed for a single final answer, it selects the office as the most likely location.

What is the practical “workflow” implied by the transcript?

First, prompt the model to list all sub-problems needed to solve the main question. Next, solve each sub-problem sequentially, using highest-probability guesses when certainty is impossible. Finally, verify that all required intermediate facts are present before producing the final answer.
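That three-phase workflow can be sketched as a small driver around any text-completion function. The `llm` callable here is a stand-in for whatever model call you use; the prompt wording is an assumption, not the transcript’s:

```python
from typing import Callable


def chain_of_thought(riddle: str, llm: Callable[[str], str]) -> str:
    """Decompose, solve sub-problems in order, then finalize."""
    # Phase 1: ask for the checklist of sub-problems (one per line).
    checklist = llm(
        "List, one per line, the sub-problems needed to answer: " + riddle
    )
    subproblems = [line for line in checklist.splitlines() if line.strip()]

    # Phase 2: solve each sub-problem, feeding earlier answers forward.
    facts: list[str] = []
    for sub in subproblems:
        answer = llm(
            "Known so far: " + "; ".join(facts) + "\n"
            "Solve (pick the highest-probability option if unsure): " + sub
        )
        facts.append(f"{sub} -> {answer}")

    # Phase 3: produce the final answer only after all facts are in place.
    return llm(
        "Using these facts: " + "; ".join(facts)
        + "\nGive the single final answer to: " + riddle
    )
```

Because `llm` is injected, the same driver works with any backend, and it can be tested with a canned stand-in before being pointed at a real model.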

Review Questions

  1. Give an example of a clue in a riddle that would force you to decompose the problem into sub-questions rather than answering directly.
  2. In the museum/cartoon example, which intermediate step is the biggest source of uncertainty and why does the transcript treat it differently?
  3. In the ball/shipping example, what specific physical detail drives the difference between zero-shot and chain-of-thought outcomes?

Key Points

  1. Chain-of-thought prompting works best for problems that decompose naturally into dependent sub-steps rather than single-shot questions.

  2. Start by instructing the model to list, systematically, every intermediate problem required to reach the final answer.

  3. Solve sub-problems in order, then synthesize the final response only after the intermediate pieces are filled in.

  4. When certainty is impossible, directing the model to choose the highest-probability option can keep the reasoning moving.

  5. Zero-shot answers can miss constraints embedded in the prompt (e.g., a missing-bottom box), while chain-of-thought reasoning tends to account for them.

  6. For ambiguous riddles, chain-of-thought can surface uncertainties and possible intervening actions, leading to a more defensible “most likely” conclusion.

Highlights

Chain-of-thought prompting turns an unsolved riddle into a solvable chain by forcing intermediate identification: museum → painting → artist → cartoon character → object → country.
A zero-shot ball-shipping answer ignored the missing-bottom constraint, while chain-of-thought reasoning accounted for where the ball could fall out during handling.
The method uses “highest probability” guesses to handle uncertain steps without stalling the entire solution.
The technique emphasizes producing a checklist of sub-problems before attempting the final answer.
