Which one is Best for literature review? ChatGPT or Bard or Bing

4 min read

Based on Research and Analysis's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing.

TL;DR

Verify every AI-generated citation in Google Scholar before using it in a literature review.

Briefing

Choosing an AI tool for a literature review comes down to one practical test: whether the citations it generates actually exist in academic databases. In a side-by-side experiment on the topic “the relationship between green HRM and employee innovative behavior,” Microsoft Bing produced references that matched entries in Google Scholar, while ChatGPT and Google Bard each produced citations that failed verification.

The process began with ChatGPT. After it generated a literature review and a bibliography, the references were checked one by one in Google Scholar. Several citations could not be located at all, including entries that “do not even exist anywhere” in Scholar, which suggests the tool sometimes fabricates or otherwise mis-specifies bibliographic details. Only some references were found, leaving the bibliography incomplete and unreliable enough that a researcher would have to treat every entry with caution.

Google Bard was tested next using the same prompt and the same requirement to provide a bibliography. Bard returned fewer references than expected, and when those citations were checked in Google Scholar, one of the two provided references could be found while the other could not. The mismatch between Bard’s bibliography and what Scholar returned reinforced the same concern: the citation list did not consistently correspond to verifiable sources.

Microsoft Bing served as the final comparison. Bing generated a literature review with a bibliography and, crucially, the two provided references were both successfully located in Google Scholar. The verification step worked cleanly for Bing’s citations, making it the most dependable option among the three in this specific test.

The experiment also considered recency. Bing’s citations were described as “latest,” whereas ChatGPT was characterized as more likely to rely on information available before 2022. That distinction matters because literature reviews depend on both accuracy and recency; a tool that returns verifiable, up-to-date references reduces the manual burden of hunting for sources.

The bottom line: for AI-assisted literature reviews, Microsoft Bing outperformed ChatGPT and Google Bard in citation reliability during this check. Even so, the recommendation is not to accept AI-generated bibliographies blindly. The safest workflow is to double-check every citation in Google Scholar (or another database) before using it in a final literature review, regardless of which tool is used.
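For researchers who want to script that double-check, the lookup can be automated. The sketch below is illustrative only and is not from the video: it queries Google Scholar through the third-party scholarly Python package, and the citation titles are hypothetical placeholders. Google Scholar rate-limits automated queries, so this approach suits only short bibliographies.

    # Minimal sketch: verify AI-generated citation titles against Google Scholar.
    # Assumes the third-party `scholarly` package (pip install scholarly).
    # Google Scholar rate-limits and may CAPTCHA-block automated traffic, so
    # expect occasional lookup failures and keep the citation list short.
    from scholarly import scholarly

    # Hypothetical titles, as they might appear in an AI-generated bibliography.
    citations = [
        "Green HRM practices and employee innovative behavior: a mediation model",
        "Sustainable human resource management and workplace innovation",
    ]

    for title in citations:
        try:
            # search_pubs returns a generator of result dicts; take the top hit.
            hit = next(scholarly.search_pubs(title), None)
        except Exception as exc:  # network errors, CAPTCHA blocks, etc.
            print(f"LOOKUP FAILED for {title!r}: {exc}")
            continue
        if hit is None:
            print(f"NOT FOUND: {title!r} (treat as unverified)")
        else:
            bib = hit.get("bib", {})
            print(f"FOUND: {bib.get('title')} ({bib.get('pub_year', 'n.d.')})")

A citation that cannot be matched this way is still not proof of fabrication (titles may be paraphrased), so any miss should trigger a manual search before the reference is discarded.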

Cornell Notes

An accuracy check of AI-generated citations found that Microsoft Bing produced a bibliography that matched entries in Google Scholar for a literature review on “green HRM” and “employee innovative behavior.” ChatGPT generated multiple references that could not be found in Google Scholar, indicating unreliable citation details. Google Bard also showed partial failure: it returned fewer references, and one of its two citations did not appear in Scholar. The practical takeaway is that citation verification is essential even though Bing performed best in this specific test. Recency may also favor Bing, while ChatGPT was described as more likely to draw from older data.

What criterion determined which AI tool was “best” for a literature review?

The deciding factor was whether the bibliography and in-text references produced by each tool could be verified in Google Scholar. Each generated citation was copied and searched in Google Scholar one by one; if a reference could not be found, it was treated as unreliable for literature review use.
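For the manual route described above, a small helper that turns each citation into a Google Scholar search URL can save the copy-and-paste step. This is an illustrative sketch, not part of the original experiment, and the example title is hypothetical.

    # Sketch: build Google Scholar search URLs for one-by-one manual checking.
    # scholar.google.com/scholar?q=... is Scholar's standard search endpoint.
    from urllib.parse import quote_plus

    # Hypothetical citation title to verify.
    titles = ["Green HRM and employee innovative behavior"]

    for title in titles:
        print(f"https://scholar.google.com/scholar?q={quote_plus(title)}")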

Why did ChatGPT perform poorly in the citation verification step?

After ChatGPT generated a literature review and a bibliography, several of its references failed the Google Scholar check. At least two examples were described as not available anywhere in Scholar, implying that the tool produced citations that were either fabricated or incorrectly formatted.

How did Google Bard’s citation output compare to ChatGPT’s?

Bard returned only two references for the same topic and prompt. When those two were checked in Google Scholar, one reference was found while the other was not, showing partial reliability but still an unacceptable level of citation failure for a literature review workflow.

What made Microsoft Bing the most reliable option in this test?

Bing produced a literature review with two references, and both were successfully located in Google Scholar. The verification step worked for both citations, making Bing’s bibliography the most consistent with verifiable academic records among the three tools tested.

What role did “latest references” play in the conclusion?

The experiment noted that Bing’s references appeared to be more recent, while ChatGPT was characterized as more likely to provide information available before 2022. Since literature reviews require both accuracy and up-to-date scholarship, recency was treated as an additional advantage for Bing.

Review Questions

  1. When evaluating AI tools for literature reviews, what specific verification step should be performed before accepting citations?
  2. Which tool produced citations that failed to appear in Google Scholar, and what does that imply about using AI-generated bibliographies?
  3. How did the number of references returned by Google Bard affect the reliability assessment in this experiment?

Key Points

  1. Verify every AI-generated citation in Google Scholar before using it in a literature review.
  2. ChatGPT produced multiple references that could not be found in Google Scholar, indicating unreliable citation accuracy.
  3. Google Bard returned fewer references, and one of its citations did not appear in Google Scholar.
  4. Microsoft Bing produced citations that matched entries in Google Scholar for the tested topic, making it the most reliable in this comparison.
  5. Recency may differ by tool; Bing was described as providing more recent references than ChatGPT.
  6. Even when a tool performs well on citation checks, researchers should still double-check each source to avoid errors.

Highlights

ChatGPT’s bibliography included citations that were not discoverable in Google Scholar, undermining trust in its reference list.
Google Bard’s output was only partially verifiable: one of its two citations appeared in Google Scholar while the other did not.
Microsoft Bing’s two provided references were both found in Google Scholar, making it the strongest performer in this citation-verification test.
The experiment linked Bing’s reliability to both verifiable citations and a tendency toward newer references.
