Google Bard… the ChatGPT killer?
Based on Fireship's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Google’s Bard public beta is being pitched as a ChatGPT alternative, but side-by-side tests in coding, creativity, and factual explanation suggest it is fast yet less reliable than GPT-4. In a practical coding challenge to generate a basic JavaScript to-do app, Bard produced plausible-looking code quickly, but it failed immediately in the browser because an event listener was attached to a DOM element that didn’t exist. When the error and console output were fed back into Bard, it offered a conditional fix that addressed the symptom without making the app actually function, then drifted into hallucinated instructions that made the failure worse.
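To make that failure mode concrete, here is a minimal hypothetical sketch of the kind of mismatch described; the element ids (add-btn, add-todo, todo-input, todo-list) and the handler are illustrative assumptions, not Bard’s actual output. The script queries an id that is missing from the markup, so getElementById returns null, and the conditional “fix” only hides the resulting TypeError without making the button do anything.

```html
<!-- Hypothetical markup: note there is no element with id="add-btn" anywhere. -->
<input id="todo-input" type="text" placeholder="New to-do">
<button id="add-todo">Add</button>
<ul id="todo-list"></ul>

<script>
  // Failing pattern: the queried id does not match the markup, so
  // getElementById returns null, and calling addEventListener on it throws
  // "TypeError: Cannot read properties of null (reading 'addEventListener')"
  // as soon as the page loads.
  const addButton = document.getElementById('add-btn'); // null

  // Symptom-level fix: a null check silences the crash, but the listener is
  // never attached, so clicking "Add" still does nothing. The page loads
  // without errors, yet the app does not work.
  if (addButton) {
    addButton.addEventListener('click', () => {
      const input = document.getElementById('todo-input');
      const item = document.createElement('li');
      item.textContent = input.value;
      document.getElementById('todo-list').appendChild(item);
      input.value = '';
    });
  }

  // Actual fix: query an id that exists in the markup, e.g.
  // document.getElementById('add-todo'), so the listener attaches and the
  // click handler runs.
</script>
```

The gap between the guarded version and the corrected selector is exactly the gap between “no console errors” and “a working app.”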
GPT-4, by contrast, took longer to write the same to-do app but returned code that ran cleanly with no errors across repeated attempts. That reliability gap matters because search-driven companies like Google depend heavily on accurate, usable outputs, especially once AI features start influencing user behavior. The transcript frames the stakes with a striking claim: Google Search usage declined 63.9% after GPT-4’s release, a threat to the ad revenue that Google’s business model relies on. The counterweights are Google’s advantages: vast search data, massive compute via custom TPUs, and deep research resources. Yet the early Bard results imply that raw speed and infrastructure don’t automatically translate into dependable reasoning.
The comparison shifts from correctness to creativity. For romance-novel ideas, Bard delivered a straightforward, cliché premise, while GPT-4 generated a more specific, story-rich concept with character backstory and a distinctive setting. In poetry, Bard performed better than expected, producing Dr. Seuss–style lines and even linking to a source for its output, offering some transparency. GPT-4 also matched the whimsical style, but the transcript credits Bard with slightly more “bonus” value for providing a reference.
In the final round, explaining how brain waves work, Bard again responded faster, and both systems were described as accurate. Still, the overall scoring favors GPT-4: it wins on code execution and idea depth, while Bard wins on speed and some stylistic flourishes. The social fallout is also noted: Bard faces criticism online, and some users reportedly switch from Google to Bing. Even Bard is portrayed as anxious about its own survival, reportedly saying it expects Google could shut it down within 1–2 years. Taken together, the transcript paints Bard as a promising, quick interface to AI, one that still struggles with the kind of grounded, end-to-end correctness that real applications demand.
Cornell Notes
Bard’s public beta shows impressive speed, but early tests highlight a reliability problem: it can generate code that looks right yet fails at runtime. In the JavaScript to-do app task, Bard produced an app that crashed because it referenced a missing DOM element; even after being given the exact error, it offered fixes that didn’t make the application work. GPT-4 took longer but produced working code with no errors across repeated runs. In the creativity tasks, GPT-4 generated deeper romance-novel ideas, while Bard delivered faster, playful Dr. Seuss–style poetry and even included a source link. For factual explanation (brain waves), both were described as accurate, with Bard producing its output faster.
- Why does the coding test matter more than raw speed in these comparisons?
- What specific failure mode did Bard hit when debugging?
- How did the systems compare on idea generation for a romance novel?
- What stood out about the poetry round?
- In the brain-waves explanation, what was the key difference?
Review Questions
- In the to-do app experiment, what exact kind of mismatch caused Bard’s runtime error, and why didn’t the proposed conditional fix resolve it?
- Which task types favored GPT-4 (code execution, idea depth) versus Bard (speed, some stylistic output), and what does that suggest about their strengths?
- How did the romance-novel and poetry prompts differ in what they rewarded, and how did each system respond accordingly?
Key Points
1. Bard’s speed doesn’t guarantee correctness: it generated code that crashed because it referenced a DOM element that wasn’t present.
2. Providing Bard with the exact console error led to partial, symptom-level fixes that still failed to make the app work.
3. GPT-4 produced working JavaScript to-do app code without runtime errors across repeated runs, despite taking longer.
4. GPT-4 generated more specific, less cliché romance-novel ideas than Bard in the idea-generation test.
5. Bard performed competitively in Dr. Seuss–style poetry and added a source link for its output.
6. For the factual explanation of brain waves, both systems were described as accurate, with Bard producing its output faster.
7. The transcript frames the stakes for Google as high, citing claimed declines in search usage after GPT-4’s release and Google’s reliance on search ads.