Battle of the AIs: Can Bing and Bard Beat ChatGPT at Research?
Based on Andy Stapleton's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
The most practical takeaway from the comparison is that no single AI tool dominates research writing end-to-end: ChatGPT (with GPT-4) tends to win when the task is turning a paper’s content into polished language, while Microsoft Bing is strongest when the job is finding starting points—especially references—and navigating the early stages of a literature search.
All three systems were tested on common research workflows: locating papers, summarizing PDFs, and transforming academic material into outputs such as bullet-point summaries, press releases, and blog posts. For paper-finding, the key constraint was that ChatGPT lacked internet access during the tests, so it could not reliably surface the newest literature; that limitation showed up immediately when it was asked for the latest papers on organic photovoltaic materials. Bing, with internet access, produced usable links and references, a clear improvement over earlier attempts with tools lacking strong retrieval. Still, even Bing's paper suggestions sometimes required verification and follow-up expansion with external tools such as Connected Papers, Litmaps, or ResearchRabbit.
When the workflow shifted from retrieval to reading and summarizing, ChatGPT's performance stood out. The transcript describes a "text splitter" approach for feeding paper text into ChatGPT, after which it produced structured five-bullet summaries that captured more of the paper's concrete details (and felt more trustworthy to the tester) than Bard's higher-level summaries, which were lighter on specifics. Bing also summarized, but it didn't consistently follow the requested format (for example, not delivering exactly five bullets) and tended to be less detailed.
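The transcript names the "text splitter" step without detailing it, so the sketch below is only an illustration of that idea: the chunk size, overlap, file name (paper.txt), and prompt wording are assumptions, not the tester's actual setup.

```python
# Rough sketch of a "text splitter" workflow: break a long paper into
# prompt-sized, overlapping chunks before asking ChatGPT for a summary.
# Chunk size, overlap, file name, and prompt wording are assumptions.

def split_text(text: str, max_words: int = 1500, overlap: int = 100) -> list[str]:
    """Split text into overlapping word-based chunks small enough to paste into a chat."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        end = min(start + max_words, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # overlap so sentences aren't lost at chunk boundaries
    return chunks

if __name__ == "__main__":
    with open("paper.txt", encoding="utf-8") as f:  # hypothetical plain-text export of the paper
        paper_text = f.read()
    chunks = split_text(paper_text)
    for i, chunk in enumerate(chunks, start=1):
        print(f"--- Part {i} of {len(chunks)}: paste into the chat ---")
        print(chunk[:200], "...")  # preview only; paste the full chunk in practice
```

Whether the chunks are pasted in by hand or sent through an API client, the final prompt after the last part is where the five-bullet summary request goes.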
The same pattern repeated for rewriting tasks. Asked to convert a paper into a press release, ChatGPT produced a more faithful, publication-ready structure and captured the paper’s essence more accurately. Bard generated a quick draft but introduced factual errors—such as attributing findings to the wrong journal—while Bing’s output was less aligned with strict press-release conventions, though it still offered subheadings and a usable narrative.
For blog-style writing, ChatGPT again matched the brief more closely, producing language appropriate for a science publication audience. Bard’s output skewed toward a more formal, exploratory tone that wouldn’t easily pass editorial standards, and Bing’s responses leaned toward generic “how to write a science blog” guidance rather than generating an actual publishable draft.
Finally, when starting from scratch—like drafting an introduction for a literature review on transparent electrode materials—Bing looked best for providing seed references and field orientation, even though the transcript cautions that reference accuracy still needs checking. Bard also performed reasonably on mapping the topic (e.g., noting indium tin oxide and alternatives like carbon-based materials), but Bing’s reference scaffolding made it the better launchpad.
The conclusion is a division of labor: use Bing for the hardest parts of research discovery (references, initial exploration, and PDF interaction), and use ChatGPT for language-heavy transformation once the source material is in hand. Bard is described as comparatively weak for research-specific nuance and referencing reliability in these tests.
Cornell Notes
The comparison finds a split between “research discovery” and “research writing.” Bing (with internet access) is strongest at finding papers, generating seed references, and helping with early literature-review scaffolding, though outputs still require verification. ChatGPT (GPT-4) performs best when the input is already available—summarizing papers into precise bullet points and rewriting content into press releases and blog drafts with better fidelity to the source. Bard tends to produce higher-level or format-mismatched drafts and shows more risk of factual mistakes in rewriting tasks. The practical workflow is to use Bing to gather and orient, then use ChatGPT to turn that material into publishable language.
Why did ChatGPT struggle with “latest papers,” and how did that affect the results?
Which tool handled paper-to-summary tasks best, and what evidence supports that?
How did the tools perform when rewriting a paper into a press release?
What happened when the task shifted to blog writing for a publication audience?
When no paper was provided and the goal was to start a literature review, which system was most helpful and why?
Review Questions
- In this comparison, what specific tasks separate “research discovery” from “research writing,” and which tool is favored for each?
- What kinds of errors were observed when converting papers into press releases, and how did those errors differ across ChatGPT, Bard, and Bing?
- Why does the transcript repeatedly warn that reference accuracy must be verified, even when a tool provides citations?
Key Points
1. Bing is strongest for research discovery tasks: finding papers, generating seed references, and supporting early literature-review direction.
2. ChatGPT (GPT-4) is strongest for language-heavy transformation of known content, including five-bullet summaries, press releases, and publishable blog drafts.
3. ChatGPT's lack of internet access during testing limited its ability to retrieve the newest papers, making it unreliable for "latest literature" queries.
4. Bard's outputs often skew high-level or miss requested formatting, and it showed a higher risk of factual mistakes when rewriting for press-release style.
5. Bing can interact with PDFs more directly in the workflow described, reducing friction compared with text-pasting approaches.
6. Even when citations are provided, reference accuracy is not guaranteed; verification remains essential.
7. A practical workflow emerges: use Bing to gather and orient, then use ChatGPT to produce polished research communication.