Introduction to Deep Research
Based on OpenAI's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing.
Deep research is an agentic web-browsing capability that can run for 5–30 minutes to produce comprehensive, fully cited research reports.
Briefing
OpenAI is rolling out “Deep research,” a new agentic capability that can browse the internet for many minutes, synthesize what it finds, and return a comprehensive, fully cited research report—essentially turning open-ended web searching into analyst-style deliverables. The key shift is that Deep research removes the usual speed/latency expectations placed on models, allowing it to run for roughly 5 to 30 minutes so it can plan, gather evidence across multiple sources, and adapt its approach as new information appears.
The motivation traces back to OpenAI’s o-series reasoning models, including o1, which can “think for a long time” but previously lacked crucial tool access—especially reliable internet browsing. Deep research fills that gap by conducting multi-step research on the web: it discovers relevant content, synthesizes it, and reasons over it, updating its plan as it uncovers more. The output is positioned as something close to what an analyst or domain expert might produce, with citations and structured formatting rather than a quick summary.
The rollout is tied to practical use cases across knowledge work and beyond. For work, Deep research is framed as a way to reduce the manual labor of gathering and reconciling information—such as market research, academic literature review, or building slide-ready content. For personal tasks, it’s pitched as a tool for high-stakes purchases and planning: for example, researching skis for conditions in Japan, then producing recommendations with a table comparing options.
A live walkthrough in ChatGPT shows how Deep research handles ambiguity. When given a complex prompt—like analyzing iOS and Android adoption rates, language-learning interest, and mobile penetration changes across developed versus developing countries—it first asks clarifying questions about assumptions (overall vs. category-specific adoption, how to interpret “mobile penetration,” and whether to focus on general or engaged interest). After requirements are set, it begins browsing and reasoning under the hood, opening pages and extracting information from multiple formats including images, tables, and PDFs. It also uses information from one search step to guide subsequent searches.
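The clarify-then-iterate behavior described above can be sketched as a simple loop. Everything here is an illustrative stand-in (the function names, the stubbed search results, the hard-coded assumptions), not OpenAI's actual implementation; the point is only the control flow: pin down assumptions first, then let each search step's findings shape the next query.

```python
# Hypothetical sketch of the clarify -> search -> extract -> refine loop.
# All names and data are illustrative stand-ins, not a real agent API.

def clarify(prompt: str) -> dict:
    """Stand-in for the up-front clarifying-question step:
    fix assumptions before any browsing happens."""
    return {
        "adoption_scope": "overall",           # vs. category-specific
        "mobile_penetration": "unique users",  # how to interpret the metric
        "interest": "engaged",                 # vs. general interest
    }

def search(query: str) -> list[str]:
    """Stand-in for one web-search step returning page snippets."""
    return [f"snippet about {query[:40]}"]

def extract_findings(snippets: list[str]) -> list[str]:
    """Stand-in for pulling facts out of pages, tables, PDFs, and images."""
    return [s.strip() for s in snippets]

def research(prompt: str, max_steps: int = 3) -> list[str]:
    assumptions = clarify(prompt)  # requirements are set before browsing
    findings: list[str] = []
    query = prompt
    for _ in range(max_steps):
        snippets = search(query)
        findings.extend(extract_findings(snippets))
        # Key behavior: results from one step guide the next search.
        query = f"{prompt} given {findings[-1]}"
    return findings

report = research("iOS vs Android adoption in developing countries")
```

One finding accumulates per step, and each query is rewritten in light of what was just found, which is the adaptive-plan behavior the walkthrough highlights.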
Under the hood, Deep research is powered by a fine-tuned version of the soon-to-be-released o3 reasoning model. Training used end-to-end reinforcement learning on “hard browsing” and other reasoning tasks, teaching the system to plan and execute multi-step trajectories, react to real-time information, and backtrack when needed. The model can browse over user-uploaded files, use a Python tool for calculations and generating plots, embed those plots in its final response, and embed images from websites. Citations are described as sentence- and passage-level.
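The Python tool mentioned above can be pictured as a tool-dispatch step: the model emits a tool name plus an argument, and the runtime executes it and feeds the result back. This is a minimal, hypothetical sketch under that assumption; the real interface is not public, and a production system would run the code in a sandbox rather than `eval` it.

```python
# Minimal sketch of a tool-dispatch step for model-requested calculations.
# Names are illustrative; the real agent's tool interface is not public.

def run_python(code: str) -> str:
    """Execute a calculation the model requested and return the result.
    (eval is for illustration only; a real system would sandbox this.)"""
    return str(eval(code))

TOOLS = {"python": run_python}

def handle_tool_call(name: str, argument: str) -> str:
    return TOOLS[name](argument)

# e.g. the model asks for a growth-rate calculation mid-report:
result = handle_tool_call("python", "round((1.27 / 1.05 - 1) * 100, 1)")
print(result)  # 21.0
```

The same dispatch pattern extends to the other tools described (file browsing, plot generation), with the outputs embedded back into the final report.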
Performance claims include a new high of 26.6% accuracy on the “Humanity’s Last Exam” benchmark from the Center for AI Safety and Scale AI, plus strong results on other evaluations requiring web browsing, multimodal capability, code execution, and reasoning over files. Internal expert evaluations emphasize that task success correlates more with economic value than with raw time-to-complete, and that allowing more tool calls improves performance. Even so, the team warns that hallucinations remain possible, urging users to verify sources.
Deep research launches later today in Pro, with plans to roll out to Plus and Team next, followed by Education and Enterprise. Longer term, OpenAI points to an AGI roadmap where agents can run longer and connect to custom context—such as enterprise data stores—so the same browsing-and-synthesis agent can operate on proprietary knowledge as well as public web content.
Cornell Notes
Deep research is an agentic capability that can browse the internet for 5–30 minutes, then synthesize findings into a comprehensive, fully cited research report. It’s designed to overcome a key limitation of earlier reasoning models: strong “thinking” without reliable access to web tools. In ChatGPT, it can ask clarifying questions up front, then iteratively search, open pages, extract information from multiple formats (including tables, PDFs, and images), and adapt its plan as new evidence appears. Powered by a fine-tuned o3 reasoning model trained with end-to-end reinforcement learning on hard browsing, it can also run calculations via Python and embed plots and images in final outputs. The rollout begins in Pro, with broader availability planned, and the longer-term goal is connecting such agents to custom enterprise context.
- What problem does Deep research solve compared with earlier reasoning models like o1?
- Why does Deep research allow long runtimes (5–30 minutes), and what changes as a result?
- How does Deep research handle ambiguous or underspecified requests?
- What does Deep research do “under the hood” while browsing?
- What tools and capabilities does Deep research have beyond web browsing?
- How is Deep research evaluated, and what cautions come with the performance claims?
Review Questions
- When would you want to use Deep research’s clarifying-question step, and what kinds of assumptions should you specify to get better outputs?
- How do long runtimes (5–30 minutes) change the research workflow compared with typical fast Q&A models?
- What additional capabilities (beyond browsing) does Deep research have for producing reports, calculations, and visualizations?
Key Points
1. Deep research is an agentic web-browsing capability that can run for 5–30 minutes to produce comprehensive, fully cited research reports.
2. The main upgrade over earlier reasoning models is tool access—especially multi-step internet browsing—so the system can gather evidence rather than rely on memory alone.
3. Deep research can ask clarifying questions first, then iteratively search, open pages, extract information from multiple formats, and adapt its plan as it learns more.
4. Deep research is powered by a fine-tuned version of the soon-to-be-released o3 reasoning model trained with end-to-end reinforcement learning on hard browsing and reasoning tasks.
5. The system can browse user-uploaded files, use a Python tool for calculations and plots, and embed both plots and website images in final outputs.
6. Reported benchmark performance includes 26.6% accuracy on Humanity’s Last Exam, with additional gains on evaluations requiring browsing, multimodal understanding, and code execution.
7. OpenAI warns that hallucinations remain possible, so users should verify citations when accuracy matters.