Bad AI Predictions: Bard Upgrade, 2 Years to AI Auto-Money, OpenAI Investigation and more
Based on AI Explained's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
AI progress is moving faster than major forecasts from just a few years ago—especially in translation quality, image understanding, and reading comprehension—while predictions about “AI auto-money” and AGI timelines have compressed dramatically. The through-line is that tasks once labeled unsolved are now close enough to be useful, and that shift is happening on a timescale forecasters didn’t anticipate.
The update begins with a 2021 book, “A Brief History of AI,” which lists many capabilities as not yet solved and even admits uncertainty about how to make computers perform certain tasks. That pessimism is contrasted with recent demonstrations using Bard’s underlying PaLM 2 model. In translation, Bard produces more human-sounding text-to-speech even for Swahili, and the claim is that PaLM 2 outperforms Google Translate on quality. In multimodal reasoning, Bard is given a meme and returns an interpretation that recognizes the image as a pizza shaped like the Death Star, connects the toppings to the reference, and even explains the humor by contrasting “death and destruction” with “food and enjoyment.” Bard also incorporates Google Lens for real-world assistance, such as identifying objects on walks, though it can refuse to answer when a human face appears in the frame. A more striking anecdote is that a photo taken in a local park was sometimes recognized as that specific park, even though the location is not widely known.
The transcript then pivots to benchmark-style evidence. A 2021 forecast for a math dataset predicted scores rising from 21 in 2023 to 52 in 2025, with a projected 80 only in 2028. Current performance is described as already near that mark: GPT-4 is said to reach 78% on the dataset without Code Interpreter or Wolfram Alpha, and experiments using GPT-4 with Code Interpreter push results to around 86%. The same pattern appears in language tasks. A 112-page novel generated by GPT-4 is presented as “interesting” even if not human-level, and the claim is that fine-tuning on an author’s work could bring models close to producing convincing full-length stories. Claude 2 is used to refine the vocabulary of the GPT-4 novel, replacing generic phrasing with more vivid terms like “crystalline,” “ethereal,” and “inaugurable.” Reading comprehension is reinforced with a GRE verbal benchmark: GPT-4 is described as scoring at the 99th percentile, and the narrator reports their own practice-test results for comparison.
Finally, the transcript tackles the biggest forecast whiplash: AGI timelines. In 2021, predictions placed AGI in the late 2030s or early 2040s, but estimates have now been pulled forward to 2026. Mustafa Suleyman of Inflection AI is cited as pushing even further, suggesting an AI could be built to make one million dollars in as little as two years, given a strategy, product research, and some human approval. That money-making scenario is framed as potentially transformative for the global economy.
Regulation and deployment risk enter via an FTC investigation into OpenAI, described as a detailed document focused on internal communications about hallucinations and privacy risks. The transcript suggests that if penalties follow, companies may become more cautious about publicly releasing models. It closes with the competitive acceleration of funding (Elon Musk’s xAI reportedly offering up to $200 million in signing bonuses) and a nod to I, Robot’s question of whether machines can create art, now with a more optimistic ending.
Cornell Notes
Recent AI capability gains are outpacing forecasts made in 2021, with improvements showing up in translation, image/meme interpretation, math benchmarks, and reading comprehension. PaLM 2-based Bard is described as producing more human-like text-to-speech for languages like Swahili and as interpreting images in ways that connect visual details to references and humor. On math, a benchmark forecast predicted a score of 80 only in 2028, yet GPT-4 is claimed to already reach 78% (and about 86% with Code Interpreter). Story and comprehension tests also look close: a GPT-4-generated novel is treated as plausibly “interesting,” Claude 2 can make vocabulary less generic, and GPT-4’s GRE verbal performance is cited as near the top percentile. The stakes rise with compressed AGI timelines and “AI auto-money” predictions, alongside regulatory pressure from an FTC investigation into OpenAI.
How does the transcript use translation and speech quality to argue that AI progress is faster than expected?
What evidence is offered that multimodal models can interpret images beyond basic labeling?
How does the transcript challenge a 2021 math forecast with benchmark numbers?
What role do story-generation and vocabulary refinement play in the argument about near-term language capability?
Why does the transcript treat AGI timelines and “AI auto-money” as a major shift from earlier predictions?
How does regulation enter the picture, and what deployment consequence is suggested?
Review Questions
- Which specific benchmark forecast from 2021 is contradicted by current GPT-4 performance, and what numbers are given for both the forecast and the present results?
- What multimodal tasks are demonstrated with Bard/PaLM 2 (translation, image/meme interpretation, Google Lens use), and what limitations are mentioned (e.g., face handling)?
- How does the transcript connect story-generation (novels, vocabulary refinement) to broader claims about reading comprehension and near-term capability gains?
Key Points
- 1. PaLM 2-based Bard is presented as delivering more human-like translation speech quality, including for languages like Swahili, and as outperforming Google Translate on quality.
- 2. Bard’s multimodal capability is illustrated through meme interpretation that links visual details to references, reads embedded text, and explains humor.
- 3. A 2021 math benchmark forecast projected reaching a score of 80 in 2028, but GPT-4 is claimed to already be at 78% and about 86% with Code Interpreter.
- 4. Full-length story capability is framed as approaching usefulness: a GPT-4-generated 112-page novel is treated as “interesting,” and Claude 2 can make the prose less generic by swapping in more vivid vocabulary.
- 5. AGI timelines are described as compressing sharply, from the late 2030s/early 2040s to around 2026, alongside “AI auto-money” predictions tied to Inflection AI’s Mustafa Suleyman.
- 6. Regulatory pressure from an FTC investigation into OpenAI is portrayed as a potential driver of greater caution about public model deployment.
- 7. Competitive momentum is reinforced by xAI’s reported $200 million signing bonuses for AI researchers.