Google Bard - The Full Review. Bard vs Bing [LaMDA vs GPT 4]
Based on AI Explained's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Bard and Bing both struggle when the task is straightforward web search or precise factual recall, but Bing—powered by GPT-4—consistently shows an edge in reasoning quality, writing polish, and “getting” the intent behind prompts. Across dozens of side-by-side tests, the most repeatable pattern is that Bing answers more accurately and more intelligently, while Bard often produces plausible-sounding responses that drift off target.
The clearest early warning sign comes from search-like questions. When asked a simple “within 10 minutes walk” florist query near the British Museum, Bard’s results landed around a half-hour walk away, while Bing’s answer pointed to Hampstead—nowhere near the stated distance. The takeaway is blunt: for users who want quick, reliable factual retrieval, standard Google search still beats both chat systems.
Math and logic tests reinforced that gap. A percentage question produced a misleading explanation from Bard, and even when alternative drafts were offered, the correct result didn’t reliably appear. Algebra showed a similar pattern: Bard failed while Bing got it right. Date arithmetic went poorly for both, including a question about the number of days between the Eiffel Tower’s opening and the Statue of Liberty’s dedication; both systems got it wrong, and Bard later apologized and corrected its own framing.
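Date arithmetic like this is easy to verify deterministically instead of trusting either chatbot. Below is a minimal Python sketch assuming the commonly cited dates for the two events (the Statue of Liberty dedicated on 28 October 1886, the Eiffel Tower opened on 31 March 1889); the video's exact question wording isn't reproduced here, so treat the dates and the resulting 885-day gap as an illustrative check rather than the original test.

```python
from datetime import date

# Assumed, commonly cited dates; not taken from the video itself.
statue_dedicated = date(1886, 10, 28)  # Statue of Liberty dedication
eiffel_opened = date(1889, 3, 31)      # Eiffel Tower public opening

# Subtracting two date objects yields a timedelta; .days is the gap.
print((eiffel_opened - statue_dedicated).days)  # 885
```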
Where Bing’s advantage became more noticeable was in language quality. In GMAT-style sentence correction, Bing selected the best-written option far more often, while Bard’s drafts repeatedly fell short. Bard also underperformed on a creative writing task: a sonnet about modern London life came out dry and bland, with weak rhyming and little social bite. Bing’s, by contrast, read more like a true sonnet and included sharper commentary, such as the rising cost of living, which gave the poem a more pointed voice.
Humor was a more mixed area. Both systems handled some jokes similarly, but Bard sometimes failed to recognize the user’s intent: in one “safety”-style riddle, Bard treated the joke as a real-life scenario and even suggested contacting Social Services, while Bing correctly read it as a joke and expanded the explanation with more sophisticated framing.
Bard did earn a genuine win in prompt generation for Midjourney V5. Bing’s outputs were easier to use because they included more visible links, but Bard’s actual prompt ideas were stronger: more vivid and stylistically specific, with references to recognizable styles like Clint, Attack on Titan, and Marvel.
Overall, the comparison suggests a practical split: Bard may feel faster and can be creatively useful for generating ideas, but Bing’s GPT-4 backing tends to deliver better reasoning, more reliable writing assistance, and a stronger grasp of what a user is really trying to do—especially when the task depends on intent rather than just producing text that sounds right.
Cornell Notes
Side-by-side tests of Google Bard and Bing (powered by GPT-4) find a consistent pattern: Bing is more reliable at reasoning, writing quality, and interpreting user intent, while both systems can be weak at search-style factual retrieval. Bard often produces plausible answers that miss the target in math, algebra, and grammar tasks, and it can misread jokes as real scenarios. Bing’s advantages show up in GMAT sentence correction and in creative writing that needs structure and voice, such as a sonnet with social commentary. Bard does score wins in generating strong Midjourney V5 prompts, but the overall balance favors Bing for accuracy and language polish.
Why does the transcript claim both Bard and Bing are “bad at search,” and what examples support that?
How do the math and algebra tests differentiate Bard from Bing?
What writing-assistance results matter most, and how do they show up in the examples?
Where does Bard outperform Bing, and what does that suggest about Bard’s strengths?
How does the transcript test whether the systems understand jokes versus real intent?
What is the “theory of mind” moment, and why is it framed as a miss for Bard?
Review Questions
- Which categories of tasks show the biggest performance gap between Bard and Bing in the transcript (reasoning, writing, search, creativity), and what specific examples are used for each?
- In the joke-intent test, what distinguishes the final Bard failure from the earlier jokes where both systems performed similarly?
- How does the Midjourney V5 prompt comparison balance “usability” (links) against “quality” (prompt strength), and which system wins each part?
Key Points
1. For simple search-like questions, neither Bard nor Bing reliably meets constraints like distance or time; standard Google search is presented as the safer choice.
2. Bing (GPT-4-backed) shows more consistent accuracy on math and algebra than Bard, which more often produces misleading or incorrect reasoning.
3. Writing assistance favors Bing in GMAT-style sentence correction and in structured creative writing like a sonnet with rhyme and social commentary.
4. Bard can be creatively strong at generating Midjourney V5 prompts with clear stylistic direction, even if Bing’s prompts are easier to navigate due to more visible links.
5. Joke understanding is uneven: Bard can misread a joke as a real-life scenario and respond as if safety or welfare services are needed.
6. Both systems can miss factual date arithmetic, so users should verify time-sensitive or numeric claims rather than trusting either output blindly.
7. Meta prompts that test whether a model understands it’s being tested (theory-of-mind framing) are handled better by Bing than Bard in the transcript’s example.