Google Bard - The Full Review. Bard vs Bing [LaMDA vs GPT 4]
Based on AI Explained's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Bard and Bing both struggle when the task is straightforward web search or precise factual recall, but Bing—powered by GPT-4—consistently shows an edge in reasoning quality, writing polish, and “getting” the intent behind prompts. Across dozens of side-by-side tests, the most repeatable pattern is that Bing answers more accurately and more intelligently, while Bard often produces plausible-sounding responses that drift off target.
The clearest early warning sign comes from search-like questions. When asked a simple “within 10 minutes walk” florist query near the British Museum, Bard’s results landed around a half-hour walk away, while Bing’s answer pointed to Hampstead—nowhere near the stated distance. The takeaway is blunt: for users who want quick, reliable factual retrieval, standard Google search still beats both chat systems.
Math and logic tests reinforced that gap. A percentage question produced a misleading explanation from Bard, and even when alternative drafts were offered, the correct result didn’t reliably appear. Algebra showed a similar pattern: Bard failed while Bing got it right. Date arithmetic went poorly for both, including a question about the number of days between the Eiffel Tower’s opening and the Statue of Liberty’s dedication; both systems got it wrong, and Bard later apologized and corrected its own framing.
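Date arithmetic like this is easy to verify deterministically instead of trusting either chatbot. Below is a minimal Python sketch assuming the commonly cited dates for the two events (the Statue of Liberty dedicated on 28 October 1886, the Eiffel Tower opened on 31 March 1889); the video's exact question wording isn't reproduced here, so treat the dates and the resulting 885-day gap as an illustrative check rather than the original test.

```python
from datetime import date

# Assumed, commonly cited dates; not taken from the video itself.
statue_dedicated = date(1886, 10, 28)  # Statue of Liberty dedication
eiffel_opened = date(1889, 3, 31)      # Eiffel Tower public opening

# Subtracting two date objects yields a timedelta; .days is the gap.
print((eiffel_opened - statue_dedicated).days)  # 885
```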
Where Bing’s advantage became more noticeable was in language quality. In GMAT-style sentence correction, Bing selected the best-written option far more often, while Bard’s drafts repeatedly fell short. Bard also underperformed on a creative writing task: a sonnet about modern London life came out dry and bland, with weak rhyming and little social bite. Bing’s, by contrast, read more like a true sonnet and included sharper commentary, such as the rising cost of living, which gave the poem a more pointed voice.
Humor was a more mixed area. Both systems handled some jokes similarly, but Bard sometimes failed to recognize the user’s intent: in one “safety”-style riddle, Bard treated the joke as a real-life scenario and even suggested contacting Social Services, while Bing correctly read it as a joke and expanded the explanation with more sophisticated framing.
Bard did earn a genuine win in prompt generation for Midjourney V5. Bing’s outputs were easier to use because they included more visible links, but Bard’s actual prompt ideas were stronger: more vivid and stylistically specific, with references to recognizable styles like Clint, Attack on Titan, and Marvel.
Overall, the comparison suggests a practical split: Bard may feel faster and can be creatively useful for generating ideas, but Bing’s GPT-4 backing tends to deliver better reasoning, more reliable writing assistance, and a stronger grasp of what a user is really trying to do—especially when the task depends on intent rather than just producing text that sounds right.
Cornell Notes
Side-by-side tests of Google Bard and Bing (powered by GPT-4) find a consistent pattern: Bing is more reliable at reasoning, writing quality, and interpreting user intent, while both systems can be weak at search-style factual retrieval. Bard often produces plausible answers that miss the target in math, algebra, and grammar tasks, and it can misread jokes as real scenarios. Bing’s advantages show up in GMAT sentence correction and in creative writing that needs structure and voice, such as a sonnet with social commentary. Bard does score wins in generating strong Midjourney V5 prompts, but the overall balance favors Bing for accuracy and language polish.
Why does the transcript claim both Bard and Bing are “bad at search,” and what examples support that?
How do the math and algebra tests differentiate Bard from Bing?
What writing-assistance results matter most, and how do they show up in the examples?
Where does Bard outperform Bing, and what does that suggest about Bard’s strengths?
How does the transcript test whether the systems understand jokes versus real intent?
What is the “theory of mind” moment, and why is it framed as a miss for Bard?
Review Questions
- Which categories of tasks show the biggest performance gap between Bard and Bing in the transcript (reasoning, writing, search, creativity), and what specific examples are used for each?
- In the joke-intent test, what distinguishes the final Bard failure from the earlier jokes where both systems performed similarly?
- How does the Midjourney V5 prompt comparison balance “usability” (links) against “quality” (prompt strength), and which system wins each part?
Key Points
1. For simple search-like questions, neither Bard nor Bing reliably meets constraints like distance or time; standard Google search is presented as the safer choice.
2. Bing (GPT-4-backed) shows more consistent accuracy on math and algebra than Bard, which more often produces misleading or incorrect reasoning.
3. Writing assistance favors Bing in GMAT-style sentence correction and in structured creative writing like a sonnet with rhyme and social commentary.
4. Bard can be creatively strong at generating Midjourney V5 prompts with clear stylistic direction, even if Bing’s prompts are easier to navigate due to more visible links.
5. Joke understanding is uneven: Bard can misread a joke as a real-life scenario and respond as if safety or welfare services are needed.
6. Both systems can miss factual date arithmetic, so users should verify time-sensitive or numeric claims rather than trusting either output blindly.
7. Meta prompts that test whether a model understands it’s being tested (theory-of-mind framing) are handled better by Bing than Bard in the transcript’s example.