
DEEP Thoughts into OpenAI’s DEEP Research feature

MattVidPro
6 min read

Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Deep Research turns ChatGPT into an autonomous research agent that iterates through search, reading, reflection, and synthesis rather than producing one-shot answers.

Briefing

OpenAI’s “Deep Research” is being positioned as a shift from chat-based answers to autonomous, multi-step research that can browse, read sources, and produce citation-backed reports in minutes—making high-end knowledge work cheaper and faster than traditional human research. The feature sits inside ChatGPT for Pro users via a “Deep research” toggle, and it’s designed to run iterative search-and-synthesis loops: it asks follow-up questions to clarify scope, gathers sources (often from .edu/.gov and research publications), pauses to reflect on gaps or inconsistencies, and then compiles a structured write-up with inline citations.

Technically, OpenAI describes Deep Research as powered by a specialized o3 reasoning model for research-focused browsing, text processing, and data analysis, while the user-selected ChatGPT model may handle the final compilation and writing. In practice, it behaves like an agent: it starts a search, iterates, reads documents (including PDFs and images, with image/data-visualization output expected later), and updates its approach as new information appears. The transcript emphasizes that this “think, search, iterate” behavior can correct inefficient research paths by learning from early findings, and it can backtrack when contradictions or missing pieces show up—capabilities that mirror steps researchers take when drafting real papers.

A major theme is cost and speed. Deep Research is compute-heavy, with response times reported around 5–30 minutes per query (often 7–15 minutes in testing). Even so, the creator argues it can beat the time required for a comparable human literature review and draft—work that might take 20+ hours—while producing outputs that align with academic conventions (including APA/MLA formatting choices). The reports also include detailed logs of the research process and inline citations, which the transcript frames as crucial for trust and verification.

To demonstrate capability, two example prompts are used. One generates a detailed, source-rich paper on medical breakthroughs and AI-related developments, starting from early milestones (like the 1971 INTERNIST-1 diagnosis program), then moving through landmarks such as IBM Watson’s medical research use, Google’s deep learning for medical imaging, and DeepMind’s AlphaFold 2 protein-folding breakthrough. Another prompt asks for research into AI growth trends and whether an “inflection point” is near, producing quantitative claims about compute scaling (including a cited 300,000× training compute increase from 2012–2018), scaling laws, investment trends, publication growth, and potential triggers for acceleration—ranging from major compute breakthroughs to new architectures, recursive self-improvement, and coordinated “AI scientist” scenarios.

The transcript also stresses implications beyond drafting: Deep Research is framed as a stepping stone toward more general, agentic intelligence because it can connect disparate domains, generate novel hypotheses, and apply knowledge rather than merely summarize. At the same time, it’s not treated as fully autonomous—human prompting and oversight remain necessary, and hallucinations are still possible, so re-checking outputs is recommended.

Finally, the creator highlights a free alternative: Perplexity’s “deep research.” It’s offered with a limited number of free prompts per day and is compared on benchmarks (including “Humanity’s Last Exam” and SimpleQA accuracy), where it’s claimed to outperform many competitors and even approach OpenAI’s Deep Research. One test run is described as failing to complete the final answer despite collecting many sources, underscoring that free alternatives may still have reliability gaps. Overall, Deep Research is presented as a practical turning point: autonomous research agents that can browse, cite, and synthesize are arriving now, and they’re likely to reshape how science, industry analysis, and high-level writing get done.

Cornell Notes

Deep Research is framed as a move from one-off Q&A to an autonomous research agent inside ChatGPT that can browse, read sources, iterate, and produce structured, citation-backed reports. It’s designed to ask clarifying follow-up questions, adapt its search strategy based on early findings, and backtrack when gaps or inconsistencies appear—behaviors that resemble how humans write real papers. The transcript highlights compute-heavy performance (often 7–15 minutes) but argues it can still beat the time cost of human literature review and drafting. Example outputs include a medical/AI breakthroughs paper and an analysis of AI growth trends and potential “tipping point” dynamics. A free competitor, Perplexity’s deep research, is also tested and compared on benchmarks, though at least one run fails to finish the final response.

What makes Deep Research different from standard chat responses?

Deep Research is built for multi-step research: it runs an iterative loop of web search, document reading, and synthesis. It typically starts by asking follow-up questions to clarify scope (time range, emphasis areas, and formatting preferences), then gathers sources (often from .edu/.gov and research publications), and finally compiles a structured report with inline citations. It can pause to reflect on what it has found and backtrack to address gaps or inconsistencies, rather than producing a single-pass answer.
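The search–read–reflect–synthesize loop described above can be sketched in miniature. This is a toy illustration of the pattern, not OpenAI's actual system: every helper function here (`web_search`, `read_source`, `find_gaps`, `write_report`) is a hypothetical stand-in.

```python
# A minimal sketch of the "think-search-iterate" loop described above.
# All helper functions are toy stand-ins, not OpenAI's actual system or API.

def web_search(query):
    # Stand-in: pretend every query yields one matching source.
    return [f"source-for:{query}"]

def read_source(source):
    # Stand-in: pretend reading a source produces one cited finding.
    return f"finding-from:{source}"

def find_gaps(notes):
    # Stand-in: declare the research complete once a few findings exist.
    # The real system would look for contradictions and missing pieces.
    return [] if len(notes) >= 3 else ["follow-up question"]

def write_report(question, notes):
    # Stand-in: compile findings into a structured, citation-backed report.
    return {"question": question, "findings": notes}

def deep_research(question, clarifications, max_rounds=10):
    notes, queries = [], [question] + clarifications
    for _ in range(max_rounds):
        sources = [s for q in queries for s in web_search(q)]  # search
        notes += [read_source(s) for s in sources]             # read
        queries = find_gaps(notes)                             # reflect
        if not queries:   # no gaps or inconsistencies remain
            break         # -> stop iterating, move to synthesis
    return write_report(question, notes)

report = deep_research("Is an AI inflection point near?", ["focus on recent trends"])
print(len(report["findings"]))  # prints 3
```

The key structural point is that reflection (`find_gaps`) feeds back into the next round of search, which is what lets the agent correct an inefficient research path instead of committing to a single pass.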

How does Deep Research handle research quality and academic formatting?

The transcript claims Deep Research prioritizes trustworthy sources such as research papers and quotes from established experts, and it often pulls from universities and government domains. It also supports academic-style formatting choices—e.g., prompting for APA vs. MLA—and includes inline citations plus detailed logs of the research process. In testing, it produced a paper-like structure that begins with historical milestones, then organizes content by categories (diagnostics, drug discovery, robotic surgery, etc.).

What evidence is used in the transcript to argue Deep Research is “agentic” and not just summarization?

The transcript points to behaviors like follow-up questions before starting research, ongoing updates to the research approach based on new data, and backtracking when contradictions or missing information appear. It also highlights that Deep Research can draw conclusions and make novel connections—such as analyzing AI growth trends and identifying potential triggers for acceleration—rather than only restating retrieved facts.

What quantitative claims appear in the AI growth “tipping point” example?

The transcript cites a compute scaling claim: from 2012 to 2018 (pre-ChatGPT era), training compute increased by about 300,000× and is described as exponential, with a cited “3.4 month doubling time” that outpaces Moore’s law. It also references foundation models (GPT-3, GPT-4, BERT, PaLM), AlphaFold 2, scaling of investment (e.g., $93.5B in 2021 and generative AI funding growing sharply by 2023), and publication growth (AI publications nearly tripling from 88,000 to 240,000 per year between 2010 and 2022).
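A doubling time and a total growth factor over a fixed window should roughly agree, so the two cited numbers can be sanity-checked against each other. A 3.4-month doubling over 2012–2018 implies vastly more than a 300× increase; it is consistent with the roughly 300,000× figure from OpenAI's compute analysis. A quick check in Python (the exact 72-month window is an assumption for illustration):

```python
import math

# Growth implied by a 3.4-month compute doubling time over ~2012-2018.
doubling_time_months = 3.4
window_months = 72  # assumed 6-year window

doublings = window_months / doubling_time_months  # ~21.2 doublings
growth_factor = 2 ** doublings                    # ~2.4 million x

# Conversely, the doubling time implied by a 300,000x total increase:
implied_doublings = math.log2(300_000)                     # ~18.2
implied_doubling_time = window_months / implied_doublings  # ~4.0 months

print(round(growth_factor), round(implied_doubling_time, 1))
```

An implied doubling time of about 4 months is in the same ballpark as the cited 3.4 months, whereas a mere 300× increase over six years would correspond to a doubling time of over 8 months.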

How does the free alternative (Perplexity deep research) compare, and what reliability issue shows up?

Perplexity’s deep research is described as free with five prompts per day and is compared on benchmarks like “Humanity’s Last Exam” (claimed 21.1% for Perplexity deep research vs. 26.6% for OpenAI’s Deep Research) and the SimpleQA accuracy benchmark (claimed 93.9%). In a test run using the same AI “tipping point” prompt, it gathered many sources (42) but the final answer appears cut off/unfinished, suggesting reliability gaps even when research collection succeeds.

Review Questions

  1. How do follow-up questions, backtracking, and reflection distinguish Deep Research from a single-pass summarizer?
  2. What categories and milestones does the medical/AI breakthroughs example use to structure its report, and why does that structure matter?
  3. Which triggers for AI acceleration are discussed in the “tipping point” analysis, and what assumptions underlie those triggers?

Key Points

  1. Deep Research turns ChatGPT into an autonomous research agent that iterates through search, reading, reflection, and synthesis rather than producing one-shot answers.
  2. The feature is available to ChatGPT Pro users via a “Deep research” toggle and is described as using a specialized o3 reasoning model for research tasks.
  3. Reports are generated with inline citations, detailed research logs, and academic formatting options such as APA or MLA.
  4. Deep Research is compute-intensive, with reported runtimes often in the 7–15 minute range per query, but it’s argued to be faster and cheaper than equivalent human research.
  5. In examples, Deep Research organizes outputs like real papers—starting with historical context, then breaking down by categories (e.g., diagnostics and drug discovery) and ending with conclusions.
  6. The transcript frames Deep Research as a stepping stone toward more agentic, generalizable intelligence because it can connect domains and produce novel analysis, not just summarize sources.
  7. Perplexity’s deep research is presented as a free alternative with benchmark comparisons, but at least one test run fails to complete the final response despite collecting many sources.

Highlights

Deep Research is presented as a “think–search–iterate” agent that can ask clarifying questions, adapt its strategy midstream, and backtrack to close research gaps.
A medical/AI breakthroughs example starts from early diagnostic systems and builds through landmarks like IBM Watson in medicine and DeepMind’s AlphaFold 2, using category-based structure and citations.
The AI growth “tipping point” example ties acceleration to compute scaling, investment and publication growth, and speculative triggers like recursive self-improvement and coordinated “AI scientist” scenarios.
Perplexity’s free deep research is benchmarked against OpenAI’s Deep Research, but one run collects 42 sources and then appears to cut off before finishing the answer.
