This AI Tool for Research Is So Bad, I’m Afraid of Getting Sued
Based on Andy Stapleton's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
An AI writing tool marketed for academic research repeatedly fails at the basics: it produces a literature review that doesn’t match the genre, mishandles citations, and leaves users stuck in slow or broken workflows. In a test with a PhD-level literature review prompt about organic photovoltaic (OPV) devices, the output initially looks promising: the tool generates a title, drafts an abstract, and builds an outline, and it offers citation-style selection (including IEEE). But the system quickly veers off track, forcing a rigid “introduction, methods, results, discussion, conclusion” structure that doesn’t fit how literature reviews are typically organized, where themes, research threads, and chronological or conceptual groupings matter more than lab-style sections.
The reference and citation features are where the tool most clearly falls apart. Source suggestions pull from databases such as Google Scholar, Semantic Scholar, and arXiv, and the interface offers “source management” and reference searching. Yet the number of sources returned is extremely low (the tester repeatedly sees only a handful, sometimes as few as four), and the generated bibliography often doesn’t appear correctly in the final document. Even when the tool claims it is processing references and generating an IEEE-formatted bibliography, the tester reports missing or disconnected citations: there is no clear way to locate references within the text, and regeneration attempts don’t fix the problem.
Beyond structure and citations, the experience is marked by long waits and an unpolished interface. Step-by-step generation involves frequent loading delays (individual stages are described as taking up to two minutes, but in practice the full workflow takes closer to half an hour of clicking and waiting), and the UI appears to “jump” between states in a way that feels unstable. In the editor, the text reads like generic large-language-model output: mashed together, with repetitive section endings and sentences that don’t make sense in context (including “conclusion” phrasing repeated where it shouldn’t be). Attempts to regenerate, “humanize,” or “detect AI content” either do nothing or stall.
The tool also includes a suite of “productivity” functions (rewriter, paraphraser, advanced summarizer, grader, AI detector, and humanizer), but repeated tests show these features don’t work reliably, often getting stuck during grading or failing to return results. Although the system displays a warning that AI detection should not be used as the sole basis for decisions that could affect someone’s academic standing, the promised detection results never reliably appear.
Overall, the tester’s core conclusion is blunt: despite a smooth onboarding and early outputs that look usable, the tool is not suitable for academic writing because it cannot consistently generate coherent literature-review structure, produce dependable citations, or deliver functioning editing and verification tools. The result is a product that may waste significant time and still leave users without a submission-ready draft.
Cornell Notes
The tested academic AI tool can generate an initial draft (title, abstract, and an outline) and offers citation-style selection such as IEEE. However, it repeatedly fails at what literature-review writing actually depends on: genre-appropriate structure and reliable referencing. Instead of organizing by themes and research threads, it forces a lab-report-style layout (introduction/methods/results/discussion/conclusion) and produces repetitive, sometimes nonsensical section wrap-ups. Source management returns very few references, and the final document often lacks a usable bibliography or has citations that don’t connect to the text. Additional “productivity” features like grading and AI detection frequently stall or return no results, making the workflow unreliable for academic use.
What early features make the tool seem useful, and why do they not hold up?
How does the tool’s generated structure differ from what a literature review typically needs?
What goes wrong with citations and bibliography generation?
Why does the workflow feel unreliable even when generation completes?
What happens to the tool’s extra productivity features (grading, AI detection, humanizing)?
What is the practical takeaway for someone trying to use it for academic writing?
Review Questions
- If you were writing a literature review, what specific structural choices would you expect the tool to make—and how did the tested tool fail those expectations?
- What evidence from the test suggests the tool’s citation workflow is unreliable (consider both source counts and bibliography behavior)?
- Which auxiliary features (grader, AI detector, humanizer) failed in the tester’s experience, and what does that imply about the tool’s overall reliability?
Key Points
1. The tool’s early draft generation (title/abstract/outline) can look credible, but later steps break the academic writing workflow.
2. Literature reviews are theme- and thread-based; the tool instead forces a rigid introduction/methods/results/discussion/conclusion structure.
3. Source management returns very few references in the tester’s runs, despite the expectation of hundreds of papers in active fields.
4. IEEE citation selection does not guarantee a usable bibliography; citations and references often fail to appear or connect to the text.
5. The editor output shows generic, sometimes nonsensical phrasing and repetitive section endings that reduce coherence.
6. Extra productivity features (grading, AI detection, humanizing) frequently stall or return no results, undermining trust in the system.
7. Even with warnings that AI detection should not be decisive, the detection results themselves did not reliably appear.