This AI Tool for Research Is So Bad, I’m Afraid of Getting Sued
Based on Andy Stapleton's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
An AI writing tool marketed for academic research repeatedly fails at the basics: it produces a literature review that doesn’t match the genre, mishandles citations, and leaves users stuck in slow or broken workflows. In a test with a PhD-level literature review prompt about organic photovoltaic (OPV) devices, the output initially looks promising: the tool generates a title, drafts an abstract, and builds an outline, and it offers citation-style selection (including IEEE). But the system quickly veers off track, forcing a rigid “introduction, methods, results, discussion, conclusion” structure that doesn’t fit how literature reviews are typically organized, where themes, research threads, and chronological or conceptual groupings matter more than lab-style sections.
The reference and citation features are where the tool most clearly falls apart. Source suggestions pull from databases such as Google Scholar, Semantic Scholar, and arXiv, and the interface offers “source management” and reference searching. Yet the number of sources returned is extremely low (the tester repeatedly sees only a handful, sometimes as few as four), and the generated bibliography often doesn’t appear correctly in the final document. Even when the tool claims it is processing references and generating an IEEE-formatted bibliography, the tester reports missing or disconnected citations: there is no clear way to locate references within the text, and regeneration attempts don’t fix the problem.
Beyond structure and citations, the experience is marked by long waits and an unpolished interface. Step-by-step generation involves frequent loading delays (individual stages are described as taking up to two minutes, but in practice the full workflow takes closer to half an hour of clicking and waiting), and the UI appears to “jump” between states in a way that feels unstable. In the editor, the text reads like generic large-language-model output: mashed together, with repetitive section endings and sentences that don’t make sense in context (including “conclusion” phrasing repeated where it shouldn’t be). Attempts to regenerate, “humanize,” or “detect AI content” either do nothing or stall.
The tool also includes a suite of “productivity” functions (rewriter, paraphraser, advanced summarizer, grader, AI detector, and humanizer), but repeated tests show these features don’t work reliably, often getting stuck during grading or failing to return results. Although the system displays a warning that AI detection should not be used as the sole basis for decisions that could affect someone’s academic standing, the promised detection results never reliably appear.
Overall, the tester’s core conclusion is blunt: despite a smooth onboarding and early outputs that look usable, the tool is not suitable for academic writing because it cannot consistently generate coherent literature-review structure, produce dependable citations, or deliver functioning editing and verification tools. The result is a product that may waste significant time and still leave users without a submission-ready draft.
Cornell Notes
The tested academic AI tool can generate an initial draft (title, abstract, and an outline) and offers citation-style selection such as IEEE. However, it repeatedly fails at what literature-review writing actually depends on: genre-appropriate structure and reliable referencing. Instead of organizing by themes and research threads, it forces a lab-report-style layout (introduction/methods/results/discussion/conclusion) and produces repetitive, sometimes nonsensical section wrap-ups. Source management returns very few references, and the final document often lacks a usable bibliography or has citations that don’t connect to the text. Additional “productivity” features like grading and AI detection frequently stall or return no results, making the workflow unreliable for academic use.
What early features make the tool seem useful, and why do they not hold up?
How does the tool’s generated structure differ from what a literature review typically needs?
What goes wrong with citations and bibliography generation?
Why does the workflow feel unreliable even when generation completes?
What happens to the tool’s extra productivity features (grading, AI detection, humanizing)?
What is the practical takeaway for someone trying to use it for academic writing?
Review Questions
- If you were writing a literature review, what specific structural choices would you expect the tool to make—and how did the tested tool fail those expectations?
- What evidence from the test suggests the tool’s citation workflow is unreliable (consider both source counts and bibliography behavior)?
- Which auxiliary features (grader, AI detector, humanizer) failed in the tester’s experience, and what does that imply about the tool’s overall reliability?
Key Points
1. The tool’s early draft generation (title/abstract/outline) can look credible, but later steps break the academic writing workflow.
2. Literature reviews are theme- and thread-based; the tool instead forces a rigid introduction/methods/results/discussion/conclusion structure.
3. Source management returns very few references in the tester’s runs, despite the expectation of hundreds of papers in active fields.
4. IEEE citation selection does not guarantee a usable bibliography; citations and references often fail to appear or connect to the text.
5. The editor output shows generic, sometimes nonsensical phrasing and repetitive section endings that reduce coherence.
6. Extra productivity features (grading, AI detection, humanizing) frequently stall or return no results, undermining trust in the system.
7. Even with warnings that AI detection should not be decisive, the detection results themselves did not reliably appear.