Google Bard… the ChatGPT killer?
Based on Fireship's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Google’s Bard public beta is being pitched as a ChatGPT alternative, but side-by-side tests in coding, creativity, and factual explanation suggest it is fast yet less reliable than GPT-4. In a practical coding challenge to generate a basic JavaScript to-do app, Bard produced plausible-looking code quickly, but it failed immediately in the browser because an event listener was attached to a DOM element that didn’t exist. When the error and console output were fed back into Bard, it offered a conditional fix that addressed the symptom without making the app actually function, then drifted into hallucinated instructions that made the failure worse.
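To make that failure mode concrete, here is a minimal hypothetical sketch of the kind of mismatch described; the element ids (add-btn, add-todo, todo-input, todo-list) and the handler are illustrative assumptions, not Bard’s actual output. The script queries an id that is missing from the markup, so getElementById returns null, and the conditional “fix” only hides the resulting TypeError without making the button do anything.

```html
<!-- Hypothetical markup: note there is no element with id="add-btn" anywhere. -->
<input id="todo-input" type="text" placeholder="New to-do">
<button id="add-todo">Add</button>
<ul id="todo-list"></ul>

<script>
  // Failing pattern: the queried id does not match the markup, so
  // getElementById returns null, and calling addEventListener on it throws
  // "TypeError: Cannot read properties of null (reading 'addEventListener')"
  // as soon as the page loads.
  const addButton = document.getElementById('add-btn'); // null

  // Symptom-level fix: a null check silences the crash, but the listener is
  // never attached, so clicking "Add" still does nothing. The page loads
  // without errors, yet the app does not work.
  if (addButton) {
    addButton.addEventListener('click', () => {
      const input = document.getElementById('todo-input');
      const item = document.createElement('li');
      item.textContent = input.value;
      document.getElementById('todo-list').appendChild(item);
      input.value = '';
    });
  }

  // Actual fix: query an id that exists in the markup, e.g.
  // document.getElementById('add-todo'), so the listener attaches and the
  // click handler runs.
</script>
```

The gap between the guarded version and the corrected selector is exactly the gap between “no console errors” and “a working app.”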
GPT-4, by contrast, took longer to write the same to-do app but returned code that ran cleanly with no errors across repeated attempts. That reliability gap matters because search-driven companies like Google depend heavily on accurate, usable outputs, especially once AI features start influencing user behavior. The transcript frames the stakes with a striking claim: Google Search usage declined 63.9% after GPT-4’s release, a threat to the ad revenue that Google’s business model relies on. The counterweights are Google’s advantages: vast search data, massive compute via custom TPUs, and deep research resources. Yet the early Bard results imply that raw speed and infrastructure don’t automatically translate into dependable reasoning.
The comparison shifts from correctness to creativity. For romance-novel ideas, Bard delivered a straightforward, cliché premise, while GPT-4 generated a more specific, story-rich concept with character backstory and a distinctive setting. In poetry, Bard performed better than expected, producing Dr. Seuss–style lines and even linking to a source for its output, offering some transparency. GPT-4 also matched the whimsical style, but the transcript credits Bard with slightly more “bonus” value for providing a reference.
In the final round, explaining how brain waves work, Bard again responded faster, and both systems were described as accurate. Still, the overall scoring favors GPT-4: it wins on code execution and idea depth, while Bard wins on speed and some stylistic flourishes. The social fallout is also noted: Bard faces criticism online, and some users reportedly switch from Google to Bing. Even Bard is portrayed as anxious about its own survival, reportedly saying it expects Google could shut it down within 1–2 years. Taken together, the transcript paints Bard as a promising, quick interface to AI, one that still struggles with the kind of grounded, end-to-end correctness that real applications demand.
Cornell Notes
Bard’s public beta shows impressive speed, but early tests highlight a reliability problem: it can generate code that looks right yet fails at runtime. In the JavaScript to-do app task, Bard produced an app that crashed because it referenced a missing DOM element; even after being given the exact error, it offered fixes that didn’t make the application work. GPT-4 took longer but produced working code with no errors across repeated runs. In the creativity tasks, GPT-4 generated deeper romance-novel ideas, while Bard delivered faster, playful Dr. Seuss–style poetry and even included a source link. For factual explanation (brain waves), both were described as accurate, with Bard producing its output faster.
- Why does the coding test matter more than raw speed in these comparisons?
- What specific failure mode did Bard hit when debugging?
- How did the systems compare on idea generation for a romance novel?
- What stood out about the poetry round?
- In the brain-waves explanation, what was the key difference?
Review Questions
- In the to-do app experiment, what exact kind of mismatch caused Bard’s runtime error, and why didn’t the proposed conditional fix resolve it?
- Which task types favored GPT-4 (code execution, idea depth) versus Bard (speed, some stylistic output), and what does that suggest about their strengths?
- How did the romance-novel and poetry prompts differ in what they rewarded, and how did each system respond accordingly?
Key Points
1. Bard’s speed doesn’t guarantee correctness: it generated code that crashed because it referenced a DOM element that wasn’t present.
2. Providing Bard with the exact console error led to partial, symptom-level fixes that still failed to make the app work.
3. GPT-4 produced working JavaScript to-do app code without runtime errors across repeated runs, despite taking longer.
4. GPT-4 generated more specific, less cliché romance-novel ideas than Bard in the idea-generation test.
5. Bard performed competitively in Dr. Seuss–style poetry and added a source link for its output.
6. For the factual explanation of brain waves, both systems were described as accurate, with Bard producing its output faster.
7. The transcript frames the stakes for Google as high, citing claimed declines in search usage after GPT-4’s release and Google’s reliance on search ads.