Best AI Tools for Deep Research (Ranked by a PhD, Not Hype)
Based on Andy Stapleton's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Score deep-research tools using workflow criteria: recency, citation volume, clarity, multimedia usefulness, and exportability—not just how much text is generated.
Briefing
Deep-research tools for academia are only useful if they deliver recent, well-cited scholarship in a form researchers can actually use. After running the same nanostructured-electrodes prompt through multiple systems and scoring them on recency (past two years), reference volume, clarity, multimedia support, and exportability, Gemini and Manis AI emerged as the top performers—while Storm landed at the bottom.
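To make the rubric concrete, here is a minimal sketch of how an equal-weight score over those five criteria could be computed. Only the criterion names come from the video; the per-criterion numbers, the equal weighting, and the 0–5 scale below are illustrative assumptions, not the video's actual sub-scores.

```python
# Minimal sketch of a five-criterion scoring rubric (equal weighting assumed).
# The example ratings are hypothetical placeholders, not values from the video.

CRITERIA = ["recency", "citation_volume", "clarity", "multimedia", "exportability"]

def rubric_score(ratings: dict[str, float]) -> float:
    """Average the per-criterion ratings into one overall score."""
    missing = set(CRITERIA) - ratings.keys()
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    return round(sum(ratings[c] for c in CRITERIA) / len(CRITERIA), 2)

# Hypothetical example: strong on references and export, weak on multimedia.
example = {
    "recency": 4, "citation_volume": 5, "clarity": 3,
    "multimedia": 1, "exportability": 4,
}
print(rubric_score(example))  # 3.4
```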
ChatGPT produced strong, readable research with clear explanations and visible multimedia elements like figures and tables pulled from real papers. It also generated a large set of sources (reported as 45), and the output included a dedicated section on recent breakthroughs. The weak spot was academic workflow compatibility: export options weren’t “academic friendly,” and the displayed citations didn’t fully match the claimed reference count (the interface showed far fewer than the initial total). That combination left ChatGPT in the middle of the pack at about 3.5 points.
SciSpace (using its deep review mode) leaned heavily into citation retrieval and structure. It returned a “top 20 papers” style synthesis with a compact but useful table and a clear breakdown into sections like current state, introduction, and key materials. It also performed well on recency, showing papers across 2023–2025. The tradeoff was depth and writing density: it offered fewer explanatory paragraphs than the strongest competitors, and the exportable value was mostly in references rather than a fully usable narrative report. Its score landed around 3.
Perplexity delivered a large number of sources (49 shown), with clickable references and a PDF export option. But the citation experience was inconsistent: the PDF export appeared to include only nine references, and the output lacked the same emphasis on “recent breakthroughs” and clear structuring seen elsewhere. With limited multimedia and weaker clarity, it finished around 3 (rounded up).
Gemini stood out for producing a highly referenced, exportable report. It generated an extremely citation-dense document, described as 28 pages of references and hundreds of cited items, where citations appeared to support nearly every sentence. While multimedia wasn’t present, the combination of “lots of references,” strong recency coverage, and export to Docs earned Gemini the highest practical score (rounded to 4).
Manis AI also scored a 4 by separating the work into multiple downloadable files (current state, recent breakthroughs, scalability challenges, key materials), which is useful for researchers who want to reorganize content. Its main drawback was citation usability: references were present but not reliably linked inline, and some sections had broken links. Still, it beat most competitors overall.
Storm, built by Stanford, was the only tool described as free, but it underperformed on academic usability. It provided many references and sentence- and paragraph-level citation markers, yet the references weren’t easy to verify (there was no consolidated external reference view, so each citation had to be hovered over or clicked individually), recency was unclear, and exportability was lacking. It scored about 1.5 and finished last.
Overall ranking: Gemini and Manis AI lead for different reasons—Gemini for highly referenced, exportable reports; Manis for segmented, multi-file deep research. SciSpace is best when the primary goal is exporting references to a reference manager, while ChatGPT and Perplexity sit in the middle due to citation/export mismatches and weaker workflow fit.
Cornell Notes
Running the same academic prompt about nanostructured electrodes in organic solar cells across several deep-research tools highlighted a consistent pattern: citation quality and export/workflow fit matter as much as raw output length. ChatGPT delivered readable explanations and multimedia (figures/tables) but had export limitations and citation-count inconsistencies. SciSpace excelled at returning many references and a structured “top papers” view, with export options focused on reference lists. Perplexity produced many sources and clickable citations, but the PDF export appeared to include far fewer references than the on-screen count. Gemini and Manis AI scored highest: Gemini for extremely citation-dense, exportable Docs reports; Manis for segmented downloadable files (current state, breakthroughs, scalability, materials) despite weaker inline citation linking.
What criteria were used to judge whether a deep-research tool is actually usable for academia?
Why did ChatGPT score well on content quality but not top the leaderboard?
What made SciSpace attractive for researchers even though it didn’t win overall?
What citation/export mismatch hurt Perplexity’s score?
How did Gemini and Manis AI differ in what they do best?
Why did Storm finish last despite being free?
Review Questions
- Which scoring dimensions most directly affect whether a deep-research tool fits an academic workflow (not just whether it produces text)?
- Compare how citation counts behaved across tools when exporting (e.g., Perplexity’s on-screen vs PDF references). What does that imply for trusting outputs?
- Why might a tool with strong segmentation into files (like Manis AI) still underperform if inline citations and link reliability are weak?
Key Points
1. Score deep-research tools using workflow criteria: recency, citation volume, clarity, multimedia usefulness, and exportability—not just how much text is generated.
2. ChatGPT’s strengths were readable synthesis and multimedia, but export limitations and citation-count inconsistencies reduced its academic usability.
3. SciSpace is particularly strong when the goal is collecting and exporting references to a reference manager, even if the narrative depth is lighter.
4. Perplexity’s on-screen citation count may not match what appears in exported documents, so exported reference lists should be checked.
5. Gemini’s advantage is extremely citation-dense, exportable reporting (Docs), making it strong for writing and verification.
6. Manis AI’s advantage is segmented deliverables (separate files for breakthroughs, scalability, materials), but inline citation linking and link integrity can be unreliable.
7. Storm’s free access didn’t compensate for weak reference presentation, unclear recency, and poor exportability for academic work.