
ADVANCED ChatGPT Prompt Engineering: 7+ Chain Prompts in the Tree of Thoughts Principle

All About AI · 5 min read

Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Generate multiple candidate solutions first, then force numeric evaluation and ranking before choosing anything.

Briefing

Chain prompting built on the “tree of thought” idea is presented as a practical way to beat brittle single-shot answers from GPT-style models. The core method is a seven-step loop: start by stating the problem, generate multiple candidate solutions, have the model evaluate and rank them, discard the weakest options, then repeatedly brainstorm new competitors that build on the current best idea. After several rounds of “fierce competition,” the process outputs a refined winner plus a deeper analysis of how that winner could work.

The workflow begins with Prompt 1 defining the problem in plain terms. Prompt 2 asks for three distinct solutions, explicitly factoring in the most important outcome drivers. Prompt 3 evaluates each solution and assigns it a numeric probability score (on a 1–100 scale in the example). Prompt 4 then removes the two lowest-ranked ideas and compresses the remaining best option into a single “winning ID” summary that includes its probability. The loop then restarts: Prompt 5 keeps the winning idea and asks for two new creative alternatives that compete with it, producing three candidates total. Prompt 6 evaluates and ranks the new set, and Prompt 7 repeats the “keep the best, drop the rest” step, cycling this process about five times to search for a stronger overall answer.
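
If one were to script this chain, it might look something like the sketch below, using the OpenAI Python client. The prompt wording, model name, and loop count are assumptions drawn from the description above, not the video's exact prompts.

```python
# Minimal sketch of the seven-prompt chain described above, using the
# OpenAI Python client. Prompt wording, model name, and loop count are
# illustrative assumptions, not the video's exact text.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    """Send one prompt and return the model's text reply."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Prompt 1: state the problem in plain terms.
problem = "I'm 27 and considering ending a six-year relationship that feels stagnant."

# Prompt 2: three distinct candidate solutions.
candidates = ask(
    f"Problem: {problem}\n"
    "Propose three distinct solutions, factoring in the most important outcome drivers."
)

winner = ""
for _ in range(5):  # roughly five loops, as in the example
    # Prompts 3/6: evaluate and score each candidate on a 1-100 scale.
    ranked = ask(
        "Evaluate each solution and assign it a probability of success "
        f"from 1 to 100:\n{candidates}"
    )
    # Prompts 4/7: discard the two lowest-scored options, keep the winner.
    winner = ask(
        "Discard the two lowest-scored options and summarize the single "
        f"best one, including its score:\n{ranked}"
    )
    # Prompt 5: brainstorm two fresh competitors against the winner.
    candidates = ask(
        f"Keep this winning idea:\n{winner}\n"
        "Now propose two new creative alternatives that compete with it."
    )

print(winner)
```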

A relationship scenario demonstrates the mechanics. The problem: a 27-year-old considering breaking up after six years, citing stagnation and growing apart. The first round yields three approaches—direct communication, gradual distance, and mutual decision. After ranking, the direct communication option scores highest (85), so it becomes the anchor for the next loop. In subsequent rounds, the model keeps that winning approach while generating two fresh alternatives to challenge it, aiming to avoid getting stuck in the same narrow set of ideas. After five loops, the direct communication approach remains the top choice at 85, with a close runner-up at 80 (a “therapeutic intervention approach”). The presenter notes a practical limitation: as loops continue, the model can run out of context, which can cause repetition.

The final step adds depth. A “deep in the thought process” prompt produces scenario planning: implementation strategies, potential partnerships and resources, obstacles and mitigations, and possible unexpected outcomes and responses.
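
Continuing the sketch above, this deepening step would be one more call appended after the loop; the prompt wording here is an assumption based on the outputs the video describes.

```python
# Hypothetical final "deep dive" call appended after the loop above;
# the wording is an assumption based on the outputs described.
deep_dive = ask(
    f"Go deep into the thought process for this winning idea:\n{winner}\n"
    "Cover implementation strategies, potential partnerships and resources, "
    "obstacles and how to mitigate them, and unexpected outcomes with responses."
)
print(deep_dive)
```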

To make the technique usable, the transcript shifts from manual prompting to automation: a Python script chains the prompts and runs the loop automatically, wrapped in a simple web UI with a progress bar. The script is then tested on a classic common-sense failure: drying clothes. In the test, GPT-4 is said to answer that because drying five wet items takes five hours, drying 30 items would take 30 hours. Running the same problem through the tree-of-thought chain yields a different result: drying all 30 simultaneously would still take five hours, assuming similar conditions and adequate airflow. The takeaway is that structured exploration (generate, rank, prune, and iterate) can improve reasoning and correct assumptions that a single pass might miss.

Cornell Notes

The transcript presents a chain-prompting method based on “tree of thought” search. It starts by generating multiple candidate solutions, then has the model evaluate and rank them with numeric probability scores, discarding the weakest options. The best remaining idea is fed back into a loop where two new alternatives are brainstormed to compete against it, repeating about five times to find a stronger winner. A final prompt then expands the winning idea into implementation scenarios, resources, obstacles, and unexpected responses. The approach matters because it turns one-shot answers into iterative reasoning, and the examples suggest it can fix common-sense-style errors (like the clothes-drying problem) that a single response may get wrong.

How does the prompting loop decide which ideas survive to the next round?

Each cycle generates three candidate solutions, then assigns each a probability score (on a 1–100 scale in the example). The process keeps only the highest-ranked “winning ID” and explicitly removes the two lowest-ranked options. That winning ID is then carried into the next loop as the anchor for generating two new competing ideas, so the search narrows while still exploring fresh alternatives.
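
The transcript has the model do the discarding itself, but the prune step could also be made deterministic by parsing the scores out of the ranking text. The helper below is a sketch of that idea, assuming the evaluation prompt yields one "Score: NN" line per candidate.

```python
import re

def keep_winner(ranked_text: str) -> str:
    """Keep only the highest-scored candidate from a ranking, pruning
    the rest. Assumes each candidate chunk contains a 'Score: NN' line;
    real model output may need a looser parser."""
    # Split the ranking into per-candidate chunks on blank lines.
    chunks = [c.strip() for c in ranked_text.split("\n\n") if c.strip()]
    scored = []
    for chunk in chunks:
        match = re.search(r"Score:\s*(\d{1,3})", chunk)
        if match:
            scored.append((int(match.group(1)), chunk))
    # The highest probability score survives; the two lowest are dropped.
    return max(scored, key=lambda pair: pair[0])[1]
```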

What does “tree of thought” mean operationally in this workflow?

It’s implemented as repeated branching and pruning. Branching happens when the model brainstorms multiple distinct solutions (three in the example). Pruning happens when the model evaluates and ranks them, then discards the weakest two. The loop repeats several times, each time branching from the current best idea by adding two new alternatives that challenge it.

Why does the relationship example end up favoring “direct communication” at 85?

The first set of three solutions—direct communication, gradual distance, and mutual decision—gets evaluated and ranked. Direct communication receives the top probability score (85), so it becomes the winning ID. In later loops, the model keeps that winning approach while generating two new alternatives to compete with it, and after five loops direct communication remains the highest-scoring option. A close competitor appears at 80 (therapeutic intervention), but it doesn’t surpass 85.

What limitation appears when the loop runs many iterations?

The transcript notes that repetition can occur because the model runs out of context. As the loop continues, the available context window constrains how much fresh information can be retained, which can reduce diversity and cause the process to drift toward similar outputs.
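
One mitigation, not shown in the video, is to carry only the compressed winner forward between loops instead of the full conversation history; a rough sketch:

```python
# Sketch of a context-budgeting guard (an addition, not from the video):
# carry forward only the compressed winning summary between loops, and
# cap its size so late iterations don't exhaust the context window.
MAX_CARRY_CHARS = 2000  # crude stand-in for a real token budget

def compress_for_next_loop(winner_summary: str) -> str:
    """Truncate the carried-forward context so later loops keep room
    for fresh brainstorming instead of repeating earlier output."""
    return winner_summary[:MAX_CARRY_CHARS]
```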

How does the clothes-drying example illustrate the benefit of chain prompting?

A common-sense-style setup states that five clothes take five hours to dry; the naive extrapolation is that 30 clothes would take 30 hours. The tree-of-thought chain produces a different solution: drying all 30 simultaneously would take five hours, assuming similar environmental conditions and adequate airflow. The transcript frames this as a case where structured reasoning and iterative search correct an assumption that a single-shot answer might miss.
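
The two readings of the puzzle differ in a single assumption: whether drying time scales with the number of items (serial) or is fixed per batch (parallel). A two-line check makes the contrast explicit:

```python
items, batch_hours, batch_size = 30, 5, 5

naive_serial = items * batch_hours / batch_size  # 1 hour per item, serially: 30 hours
parallel = batch_hours                           # all items dry at once: still 5 hours

print(naive_serial, parallel)  # 30.0 5
```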

What role does automation play in making this method practical?

Manual chaining is described as time-consuming, so a Python script automates the full process. The script chains the prompts, runs the loop automatically, and is paired with a simple web UI that shows progress. The automation also makes it easy to test the method on new problems by entering a new prompt in the interface.
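
The transcript doesn't show the script itself, so the wrapper below is a guess at its shape, reusing the ask() and keep_winner() helpers from the earlier sketches. Gradio is an assumption here, since the video only mentions a simple web UI with a progress bar.

```python
# Hypothetical web wrapper for the chain; Gradio is an assumption, since
# the video only describes a simple web UI with a progress bar. Reuses
# ask() and keep_winner() from the earlier sketches.
import gradio as gr

def run_chain(problem: str, progress=gr.Progress()) -> str:
    """Run the generate-rank-prune loop and report per-iteration progress."""
    candidates = ask(f"Problem: {problem}\nPropose three distinct solutions.")
    winner = ""
    for _ in progress.tqdm(range(5)):  # five loops, shown as a progress bar
        ranked = ask(f"Score each solution from 1 to 100:\n{candidates}")
        winner = keep_winner(ranked)
        candidates = ask(
            f"Keep this idea:\n{winner}\n"
            "Propose two new creative alternatives that compete with it."
        )
    return winner

gr.Interface(fn=run_chain, inputs="text", outputs="text").launch()
```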

Review Questions

  1. In the described loop, what exact steps correspond to branching, evaluation, and pruning?
  2. Why might repeated loops lead to less variety in candidate solutions?
  3. In the clothes-drying scenario, what assumption is necessary for the “five hours for 30 clothes” answer to hold?

Key Points

  1. Generate multiple candidate solutions first, then force numeric evaluation and ranking before choosing anything.

  2. Prune aggressively: discard the two lowest-ranked ideas and keep only the highest-scoring “winning ID.”

  3. Use the winning idea as an anchor for the next round, then brainstorm two new alternatives to keep exploration alive.

  4. Repeat the generate–rank–prune cycle several times (about five in the example) to search for a stronger final answer.

  5. Add a final refinement step that turns the winning idea into actionable scenarios, resources, obstacles, and contingencies.

  6. Automate the prompt chain with a Python script and a simple UI to reduce manual time and make testing easier.

  7. Expect context-window limits to affect later iterations, potentially increasing repetition.

Highlights

  • The method repeatedly keeps only the top-ranked idea (by probability score) while generating two fresh competitors each loop, turning one-shot answers into iterative search.
  • A relationship scenario demonstrates the loop’s mechanics: direct communication stays on top at 85 after multiple rounds, with therapeutic intervention trailing at 80.
  • The clothes-drying test reframes the problem: with adequate airflow and similar conditions, drying 30 items simultaneously can still take five hours, contradicting a naive 30-hour assumption.
  • Automation via Python plus a web UI is positioned as the practical way to run the full chain without spending time manually re-entering prompts.

Topics

  • Tree of Thought Prompting
  • Chain Prompts
  • Prompt Evaluation
  • Python Automation
  • Common-Sense Reasoning

Mentioned

  • GPT-4