
ChatGPT vs Jenni: Best AI for Academic Writing?

4 min read

Based on Research and Analysis's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

A citation audit found that ChatGPT produced a literature-review bibliography with a high failure rate: 9 of 12 references were fake.

Briefing

General-purpose AI can generate literature reviews that look academically polished while quietly fabricating or corrupting the underlying citations—an issue that can undermine academic integrity. In a hands-on check of 12 references produced by ChatGPT for a literature review prompt (“role of AI in hotel industry”), only one citation was fully correct, two were only partially correct, and nine were fake. Even when ChatGPT supplied DOIs, every DOI in the list was incorrect, and verification via Google Scholar and DOI lookups revealed mismatches in publication details such as year, author information, and even article titles.

The verification process highlighted how citation errors can slip past surface-level formatting. After requesting APA-style output, ChatGPT produced in-text citations and a bibliography with APA 7 formatting and DOI numbers. But when each DOI and reference entry was checked one by one, multiple failure modes appeared: some DOI links did not resolve; others resolved to real articles with details that did not match the cited work; and some entries mixed book-style information with journal-style metadata, creating internal inconsistencies (for example, a year that pointed to a journal context where the title and journal name did not align). The result was a bibliography that could appear credible at a glance yet fail authenticity checks.

By contrast, the academic-writing tool Jenni (accessed via jenni.ai) was tested using the same literature review prompt. Jenni offered structured outline options (standard headings, smart headings, or no headings) and then generated draft text with in-text citations. It also provided a references list where each citation could be validated through DOI links. In the checks performed, the referenced articles opened to the original publications with matching details, and the DOIs were described as correct.

The practical takeaway is not that AI writing is inherently unusable, but that citation verification is non-negotiable for academic work. ChatGPT’s strength—producing fluent, academic-looking prose—can come with a high risk of fabricated or inaccurate references, especially when the bibliography and DOI data are treated as authoritative without independent verification. Specialized tools like Jenni, designed for research writing, aim to reduce that risk by tying generated citations to verifiable sources.

Overall, the experiment frames a clear decision rule for students and researchers: if accuracy, credibility, and academic integrity matter, rely on tools that support verified references and still confirm citations when stakes are high. General-purpose models may require extra diligence, because formatting and DOI labels alone do not guarantee that the underlying scholarship is real.

Cornell Notes

A citation audit of a ChatGPT-generated literature review found that most references were unreliable: 9 of 12 were fake, 2 were partially correct, and only 1 matched fully. Even the DOIs provided were wrong in every case, with verification via Google Scholar and DOI lookups revealing mismatched years, authors, titles, and sometimes non-matching journal/book metadata. When the same prompt was used in Jenni (jenni.ai), the tool generated a literature review with in-text citations and a bibliography whose DOI links led to the original articles with matching details. The core implication is that academic writing demands citation authenticity checks, and specialized research-writing tools can materially reduce the risk of fabricated references.

Why does a bibliography that “looks APA” still fail academic standards?

APA-style formatting and in-text citation placement can be produced even when the underlying sources are fabricated or mismatched. In the ChatGPT test, the bibliography was formatted in APA 7 and included DOIs, but DOI verification showed that the DOIs were incorrect and that many entries resolved to the wrong work (different titles, authors, or publication years) or did not align with the cited journal information.

What were the concrete outcomes of checking ChatGPT’s 12 references?

After verifying each citation individually, only one reference was completely correct. Two references were partially correct, while nine were fake. The audit also found that every DOI provided in the list was incorrect, reinforcing that DOI labels alone were not trustworthy without independent lookup.

What kinds of citation errors appeared during verification?

Multiple failure modes showed up: some DOI links failed to open; others led to real journal pages whose details didn’t match the reference text (the title differed from what was cited, and the author details didn’t match). There were also structural inconsistencies, such as book-style entries paired with metadata that didn’t correspond to the journal context implied by the citation details.
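The audit’s three verdicts (correct, partially correct, fake) can be sketched as a simple comparison of a cited reference against the record its DOI actually resolves to. This is an illustrative sketch, not the reviewer’s actual procedure: the field names, the matching rule, and the example records are all assumptions for demonstration.

```python
from typing import Optional


def classify_citation(cited: dict, resolved: Optional[dict]) -> str:
    """Compare a cited reference against the record its DOI resolves to.

    Returns "fake" if the DOI does not resolve at all (or resolves to an
    unrelated work), "correct" if every checked field matches, and
    "partially correct" otherwise.
    """
    if resolved is None:  # the DOI link failed to open
        return "fake"
    fields = ("title", "authors", "year", "journal")
    matches = [cited.get(f) == resolved.get(f) for f in fields]
    if all(matches):
        return "correct"
    if any(matches):
        return "partially correct"
    return "fake"  # resolves, but every detail points to a different work


# Hypothetical example: the DOI resolves, but the title and authors
# differ from what the bibliography claimed.
cited = {"title": "AI in the Hotel Industry", "authors": ["Lee"],
         "year": 2021, "journal": "Tourism Review"}
resolved = {"title": "Service Robots in Hospitality", "authors": ["Kim"],
            "year": 2021, "journal": "Tourism Review"}
print(classify_citation(cited, resolved))  # partially correct
```

In practice the `resolved` record would come from a registry lookup (for example, the Crossref REST API at `https://api.crossref.org/works/{doi}`), which is exactly the one-by-one check the audit describes.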

How did Jenni’s workflow support safer academic writing?

Jenni generated outlines and draft text with in-text citations, then provided a references list where DOI links could be clicked to confirm authenticity. The checks described in the transcript reported that the referenced articles opened to the original publications with matching details, and the DOIs were presented as correct.

What decision rule emerges for students and researchers using AI for academic writing?

Treat AI-generated citations as unverified until checked. If the work depends on citation accuracy and academic integrity, prefer specialized research-writing tools that emphasize verified references (like Jenni) and still validate DOIs and bibliographic details when possible—especially when the bibliography is generated automatically.

Review Questions

  1. If a tool outputs APA 7 citations and DOIs, what verification step is still necessary before submitting academic work?
  2. What evidence from the citation audit distinguishes “partially correct” from “fake” references?
  3. How do the citation-validation capabilities described for Jenni differ from the failure patterns observed with ChatGPT?

Key Points

  1. A citation audit found that ChatGPT produced a literature-review bibliography with a high failure rate: 9 of 12 references were fake.

  2. ChatGPT’s provided DOIs were incorrect in every case, and DOI lookups revealed mismatches in bibliographic details.

  3. Even when a DOI resolves to a real article, the cited metadata (title, authors, year) may still not match the reference entry.

  4. Jenni (jenni.ai) generated literature-review citations with DOI links that, in the described checks, led to original articles with matching details.

  5. Academic writing requires independent citation verification; formatting and DOI labels alone are not proof of authenticity.

  6. Specialized research-writing tools can reduce citation fabrication risk, but accuracy still depends on validation for high-stakes work.

Highlights

Only 1 of 12 ChatGPT-generated references was fully correct; 9 were fake, and every DOI provided was wrong.
DOI verification exposed multiple mismatch types—wrong titles, wrong authors, wrong years, and even incorrect journal/book metadata alignment.
Jenni’s generated citations were validated by clicking DOIs, which opened the original articles with matching details.
The key risk isn’t just bad writing—it’s fabricated or corrupted source attribution that can pass formatting checks while failing authenticity checks.
