I Compared Every Popular AI Literature Review Tool So You Don't Have To

Andy Stapleton · 5 min read

Based on Andy Stapleton's video on YouTube. If you like this content, support the original creator by watching, liking, and subscribing.

TL;DR

Gemini and Scispace produced the most references (36 and 28), while AnswerThis produced only six, making it the weakest for citation-rich reviews.

Briefing

AI literature-review tools can generate usable drafts, but performance varies sharply across the basics: how many relevant citations they pull, how much coherent text they produce, whether the writing sounds like graduate-level academic prose, and—crucially—whether the output can be exported into formats researchers can actually edit.

In side-by-side tests using the same prompt about “self-healing nanocomposite transparent electrodes,” Gemini and Scispace led on reference volume: Gemini produced the most references (36), with Scispace close behind (28). At the other end, AnswerThis delivered only six references, even after being pushed to maximize citations, making it the weakest option for building a citation-rich literature review. Reference count mattered because a literature review is meant to synthesize a field, not just generate a few paragraphs of generic commentary.

Length and density separated the tools further. Thesis AI produced by far the longest output, described as “girthy,” at a scale consistent with a thesis-level literature review (around 23,000 words for the generated section). Scispace and Gemini also produced substantial drafts, while AnswerThis was far shorter and didn’t “try its hardest” to fill out the requested material. The practical takeaway: short outputs may be fine for quick orientation, but they rarely provide enough structure and thematic coverage to serve as the foundation for a real academic write-up.

Readability became the deciding factor for which tool felt most like something a researcher could plausibly adapt. Thesis AI scored best for sounding academically appropriate, even if some sentences ran long. Other tools were criticized for “thesaurus-y” word choices and for terminology that didn’t match how the specific field typically writes; AnswerThis was singled out as the least readable, in part because its phrasing leaned on uncommon or unnatural wording.

Exportability, meaning how easily the draft can be moved into a working document, was another major differentiator. Thesis AI stood out as the most usable: it offered multiple export options, including PDF plus formats that fit common academic editing pipelines such as Overleaf, Word, and notebook-style workflows (DOCX, LaTeX, and Markdown). Scispace was praised for doing well across other categories, but its export options were limited: downloading certain formats required payment, and it lacked the save-into-Word/Overleaf convenience the workflow demands. ChatGPT was considered difficult to extract cleanly for editing, while Manus and Gemini were more workable but still not as seamless as Thesis AI.
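
That save-into-Word/Overleaf step is easy to script once a tool exports Markdown. Below is a minimal sketch, assuming the pandoc binary and its pypandoc wrapper are installed; the filenames are hypothetical, not anything the video names:

```python
# Convert a Markdown draft into Word- and Overleaf-ready files.
# Assumes pandoc is installed and accessible via pypandoc;
# "draft.md" stands in for whatever file an AI tool exported.
import pypandoc

# DOCX for collaborators who review and track changes in Word.
pypandoc.convert_file("draft.md", "docx", outputfile="draft.docx")

# LaTeX source that can be uploaded straight into an Overleaf project.
pypandoc.convert_file("draft.md", "latex", outputfile="draft.tex")
```

Tools that only offer PDF block this kind of pipeline, which is part of why export format weighed so heavily in the ranking.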

Finally, AI-detection results were uniformly poor: every tool’s output was flagged as AI-generated with 100% confidence in the test used. The conclusion wasn’t that these tools are unusable, but that they should be treated as starting points for human rewriting, not submitted as-is. Across the full set of criteria, the strongest overall choice for a researcher building an editable, readable, thesis-ready literature review was Thesis AI, with Scispace and Gemini as the top alternatives for maximizing reference volume and getting a substantial draft quickly.

Cornell Notes

The tests compared six AI literature-review tools on citation coverage, draft length, readability, export/editing options, and AI-detection risk. Gemini and Scispace led on the number of references pulled (36 and 28), while AnswerThis lagged badly with only six. Thesis AI produced the longest, densest draft and scored best on academic readability, using language closest to what a graduate-level writer would actually use. Thesis AI also won on exportability, offering multiple editable formats and Overleaf integration, which made it easier to turn the output into a working document. All tools were flagged as AI-generated by the detection check used, so outputs still require substantial human rewriting.

Which tools delivered the most citations, and why does that matter for a literature review?

Gemini produced 36 references and Scispace produced 28, making them the strongest choices when the goal is a citation-rich synthesis. AnswerThis produced only six references even after being prompted to maximize citations, which undermines the core purpose of a literature review: mapping and synthesizing the field rather than generating a short, thin overview.
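
Reference volume is also easy to audit yourself rather than taking a tool’s word for it. A minimal sketch, assuming the draft uses bracketed numeric citations like [12]; the filename and citation style are assumptions, and author-year citations would need a different pattern:

```python
import re

def count_numeric_citations(path: str) -> int:
    """Count unique bracketed numeric citations, e.g. [3] or [3, 7]."""
    text = open(path, encoding="utf-8").read()
    numbers: set[str] = set()
    # Capture each [ ... ] group of comma-separated reference numbers.
    for group in re.findall(r"\[(\d+(?:\s*,\s*\d+)*)\]", text):
        numbers.update(n.strip() for n in group.split(","))
    return len(numbers)

# "draft.txt" is a hypothetical export of one tool's generated review.
print(count_numeric_citations("draft.txt"))
```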

How did draft length differ, and what does that imply for real academic use?

Thesis AI generated the longest output by far (described as thesis-level length, around 23,000 words for the literature review/introduction). Scispace and Gemini also produced substantial drafts, while AnswerThis was much shorter. Longer outputs tend to provide more themes, structure, and material for researchers to reorganize and rewrite into their own literature review.

Which tool sounded most like graduate-level academic writing, and what was the criticism of others?

Thesis AI scored best on readability, with academic phrasing that stayed understandable even when sentences ran long. Other tools were criticized for “thesaurus-y” wording: unnecessarily long or unnatural terms that don’t match how the specific research area typically writes. AnswerThis was singled out as the least readable, partly due to unusual terminology (e.g., “flexural endurance”).

Why was exportability treated as a deciding criterion?

Researchers need to edit and integrate drafts into their own workflow (Word, LaTeX/Overleaf, Markdown, or notebook formats). Thesis AI offered multiple editable exports and Overleaf integration, making it the most practical end-to-end tool. ChatGPT was criticized for being hard to extract for editing, and Scispace was criticized for download limitations that required payment for certain formats.

What did the AI-detection check reveal, and how should that affect usage?

Every tool’s output was flagged as AI-generated with 100% confidence in the detection test used. That doesn’t mean the tools can’t help, but it does mean they’re risky for direct submission and should be treated as a starting point for heavy human rewriting.

If someone prioritizes different goals—citations, readability, or workflow—what trade-offs emerged?

For maximum references and a broad snapshot, Gemini (36) and Scispace (28) were favored. For the most readable, thesis-style draft that’s also easy to export and edit, Thesis AI was the top pick. ChatGPT was less favored for academic writing mainly due to export friction, while AnswerThis was the weakest overall because it failed on both citation count and readability.

Review Questions

  1. If a researcher’s top priority is citation coverage, which tools should they try first, and what citation counts support that choice?
  2. What combination of factors made Thesis AI the strongest overall option, beyond just producing a long draft?
  3. How should the fact that every output was flagged as AI-generated change how a researcher uses these tools in an academic workflow?

Key Points

  1. Gemini and Scispace produced the most references (36 and 28), while AnswerThis produced only six, making it the weakest for citation-rich reviews.

  2. Thesis AI generated the longest, densest draft (around 23,000 words), aligning with thesis-level literature review expectations.

  3. Thesis AI scored best for readability, using academic language closest to typical graduate writing; other tools were criticized for unnatural, “thesaurus-y” phrasing.

  4. Thesis AI led on exportability, offering multiple editable formats and Overleaf integration, which supports real editing workflows.

  5. ChatGPT was less practical for academic use because extracting its content for editing was difficult.

  6. AI-detection testing flagged all tools’ outputs as AI-generated with 100% confidence, so outputs require substantial human rewriting before submission.

  7. Recommendations split by need: Thesis AI for editable, readable, thesis-style drafts; Scispace and Gemini for reference volume and a substantial draft quickly.

Highlights

Gemini delivered the most references (36) and also provided table-based summaries, making it strong for quick field snapshots.
Thesis AI combined the longest output with the best readability and the most workflow-friendly exports, including Overleaf integration.
AnswerThis underperformed sharply on citations (six references) and readability, even when prompted to maximize citations.
Every tool’s output was flagged as AI-generated with 100% confidence in the detection check used, reinforcing the “rewrite heavily” requirement.

Topics

  • AI Literature Review Tools
  • Citation Coverage
  • Academic Readability
  • Export Workflows
  • AI Detection
