Riverside Take 04 Feb 3 2025 from Nate

5 min read

Based on AI News & Strategy Daily | Nate B Jones's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

OpenAI’s deep research mode is described as spending up to 30 minutes browsing the web and returning a full, cited report for complex questions.

Briefing

A new “deep research” mode tied to OpenAI’s full o3 model is pushing AI research performance sharply higher—while a Japan press conference signals a fast-track push toward artificial general intelligence (AGI) and Japan-specific AI infrastructure. The most concrete product update is that deep research can spend up to 30 minutes browsing the web, then return a full-length report with citations for complex tasks like legal research, scientific questions, mathematics, history, or even language evolution. The pitch is that it behaves like web-browsing research that produces graduate-level outputs, but with noticeably higher quality than earlier “deep research”-named offerings.

At the same time, SoftBank chairman Masayoshi Son brought Sam Altman to Japan to discuss a major OpenAI funding deal separate from “Stargate,” and the public commitments went beyond money. Son said an AGI announcement would happen in Japan in less than two years, and Altman agreed. The event also introduced a Japan-focused “fork” of OpenAI: a special company building a model called “Crystal” that would sit inside Japanese companies’ firewalls and independently review, optimize, and maintain their source code. The described workflow goes further than static code review—Crystal is said to listen to calls and continuously update code, with benefits initially exclusive to Japanese companies.

The transcript frames these moves as more than standard corporate partnership. SoftBank is portrayed as seeking a legacy moment—showing it helped bring AGI to Japan rather than merely funding a U.S. company that achieves AGI elsewhere. That framing matters because it hints at how AI deployment could be regionalized: not just models and APIs, but governance, data access, and operational control inside company environments.

The product and performance claims arrive with a separate benchmark story. Deep research rolled out over the weekend and quickly improved results on “Humanity’s Last Exam,” a test used to gauge how well AI can handle difficult, exam-like tasks. The transcript cites a progression: o1 around 9%, DeepSeek’s R1 around 10–11%, o3 mini-high around 13% on Friday, and then a jump to 25% after deep research kicked in by Sunday/Monday Japan time, using full o3 rather than o3 mini. The implication is that research-mode tooling (time, browsing, and report generation) can materially change benchmark outcomes in days.

Finally, the press conference included an unusual remark from Son about AI not “eating people,” justified with the claim that AI doesn’t need protein for energy. The transcript treats the comment as eccentric but also as part of a broader message: the leadership involved appears to believe AGI is close enough to plan for now, even as the practical details—like what Crystal will do inside firms and how it affects OpenAI’s relationship with Microsoft—remain open questions.

Cornell Notes

OpenAI’s “deep research” mode is presented as a step up from earlier web-scraping research tools: it can browse the web for up to 30 minutes, then produce a full report with accurate citations for complex questions. The transcript ties this capability to the full o3 model (not o3 mini), positioning it as smarter and more capable for tasks that normally require hours of expert work. In parallel, a Japan press conference with Sam Altman and Masayoshi Son signals an AGI timeline—an AGI announcement in Japan in less than two years—and a Japan-focused initiative called “Crystal.” Crystal is described as a firewall-contained model that reviews and optimizes company source code and is initially exclusive to Japanese companies. Benchmark results on “Humanity’s Last Exam” reportedly jumped to 25% after deep research was enabled, suggesting research-mode tooling can quickly move performance.

What does “deep research” actually do, and what kinds of questions is it meant for?

Deep research is described as a mode where the model spends up to 30 minutes carefully browsing the web, then returns a full paper-style output with accurate citations. It’s framed as useful for tasks that would take hours to understand—examples include legal cases, scientific questions, mathematics, history, and even topics like the evolution of Sanskrit.

How is OpenAI’s deep research positioned relative to other similarly named products?

The transcript contrasts it with Google’s “deep research,” described as fast web scraping followed by summaries, and with DeepSeek, described as a separate model product. While all involve browsing and reporting, the claim is that OpenAI’s deep research produces higher-quality results—enough to warrant a side-by-side comparison later.

What commitments about AGI were made at the Japan press conference?

Masayoshi Son said an AGI announcement would be made in Japan in less than two years, and Sam Altman agreed. The transcript links this to a major funding deal for OpenAI separate from “Stargate,” suggesting the parties view AGI as close enough to plan around now.

What is “Crystal,” and how is it supposed to work inside companies?

Crystal is described as a Japan-focused “fork” initiative: a special company building a model for Japanese companies that independently reviews and optimizes source code. It would run inside each company’s firewall, listen to calls, and maintain and update code—benefits initially exclusive to Japanese companies.

How did deep research affect performance on “Humanity’s Last Exam,” according to the transcript?

The transcript gives a timeline of benchmark improvements: o1 around 9%, DeepSeek’s R1 around 10–11%, and o3 mini-high around 13% by Friday. After deep research was enabled over the weekend, full o3 reportedly reached 25% by Sunday/Monday Japan time, jumping from the prior 13% level within about two days.

Why does the transcript treat the Japan deal and Crystal initiative as more than a typical investment?

It argues SoftBank wants a legacy opportunity—demonstrating it helped bring AGI to Japan—rather than simply funding a U.S. company whose AGI achievement happens elsewhere. The Crystal plan also implies regionalized deployment: not just models, but firewall-contained code review and ongoing maintenance tailored to Japanese companies.

Review Questions

  1. What operational steps does deep research perform (including time limits and output format), and how do those steps relate to the quality of citations?
  2. How do the transcript’s benchmark numbers on “Humanity’s Last Exam” change before and after deep research is enabled?
  3. What are the stated goals and constraints of the Crystal initiative, and why might firewall placement matter for adoption?

Key Points

  1. OpenAI’s deep research mode is described as spending up to 30 minutes browsing the web and returning a full, cited report for complex questions.

  2. The deep research capability is tied to the full o3 model (not o3 mini), with claims of higher quality than other “deep research” offerings.

  3. Masayoshi Son and Sam Altman publicly aligned on an AGI announcement in Japan in less than two years.

  4. A Japan-specific initiative called “Crystal” is described as a firewall-contained model that reviews, optimizes, and maintains company source code and can listen to calls.

  5. The transcript links a major SoftBank funding deal for OpenAI to the Japan press conference, separate from “Stargate.”

  6. Benchmark performance on “Humanity’s Last Exam” reportedly jumped to 25% after deep research was enabled, indicating research-mode tooling can quickly move results.

  7. The Crystal plan is framed as initially exclusive to Japanese companies, implying a regional approach to AI deployment and control.

Highlights

Deep research is pitched as up to 30 minutes of web browsing followed by a full paper with accurate citations—aimed at tasks that normally take hours of expert effort.
A Japan press conference set a concrete AGI timeline: an AGI announcement in Japan in less than two years, with Sam Altman agreeing.
“Crystal” is described as a firewall-contained model for Japanese companies that independently reviews and optimizes source code and continuously maintains it.
“Humanity’s Last Exam” reportedly rose from about 13% to 25% within roughly two days after deep research was enabled on full o3.
