AI - 2024AD: 212-page Report (from this morning) Fully Read w/ Highlights
Based on AI Explained's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Frontier models are increasingly converging in capability because of overlapping pre-training data, shifting competition toward scaling and deployment speed.
Briefing
The seventh annual “State of AI” report from Air Street Capital frames 2024 as the year leading models stopped feeling like separate species and started converging, while costs, compute, and multimodal capability kept accelerating. The report’s headline theme is that major systems increasingly overlap in what they can do because they are trained on similar large-scale data, pushing models such as Claude 3.5 Sonnet, Grok 2, and Gemini 1.5 toward GPT-4-level behavior rather than diverging into fundamentally different approaches. That convergence matters because it shifts the competitive question from “who has the best architecture” to “who can scale reliably, ship safely, and monetize fastest.”
The report also revisits earlier forecasts and measures them against reality. One prediction—about spending more than a billion dollars to train a single frontier model—was judged harshly at the time, but later reporting suggests the scale is now firmly in the billions. OpenAI’s projected 2024 training costs were cited as roughly $3 billion for frontier-model training, excluding additional compute used for research and iteration. The same reporting also pointed to a longer runway for profitability, with internal sources suggesting OpenAI may not turn a profit until 2029. The implication is that the frontier race is not just about breakthroughs; it’s about sustained, expensive compute cycles.
Multimodality is treated as a practical turning point rather than a novelty. Meta’s Movie Gen is highlighted for generating audio alongside video, and the transcript contrasts that paper-based progress with tools that let users create short clips immediately—upload an image, choose an effect, and generate “melt,” “explode,” or “squish” style transformations. The report’s broader point is that models are moving from single-task text generation into systems that can manipulate multiple media streams in one workflow.
Science and medicine are also pulled into the spotlight. The Nobel Prizes in Physics and Chemistry are mentioned in connection with neural-network-driven work (including AlphaFold), reinforcing a narrative that AI is increasingly “eating science” by accelerating discovery. One standout example from the report is “brain LM,” a transformer-based model that can be fine-tuned to predict clinical variables such as age and anxiety disorders from brain activity, and that could potentially support in-silico medication testing by simulating biologically meaningful responses.
Hardware and scaling dynamics get their own section. A chart tracks how quickly Nvidia data center GPUs are released and how teraflops per chip rise over time, while also noting that clustering more GPUs across data centers multiplies the effective compute. The transcript ties this to a belief that the next two years of progress are largely “baked in” by sheer scale.
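The chip-count-times-per-chip-throughput framing can be made concrete with a toy calculation. All numbers below are illustrative assumptions, not figures from the report:

```python
# Toy sketch of the scaling arithmetic the report gestures at:
# effective compute ~ per-chip throughput x chip count x sustained utilization.
# Every number here is hypothetical, chosen only to show the multiplication.

def effective_compute_tflops(tflops_per_gpu: float, num_gpus: int,
                             utilization: float) -> float:
    """Aggregate sustained throughput of a GPU cluster, in teraflops."""
    return tflops_per_gpu * num_gpus * utilization

# Hypothetical cluster: 1,000 TFLOPs per chip, 10,000 GPUs,
# 40% sustained utilization across the data center.
cluster = effective_compute_tflops(1000.0, 10_000, 0.4)
print(f"{cluster:,.0f} TFLOPs sustained")  # 4,000,000 TFLOPs sustained
```

The point of the sketch is that the three factors multiply: a faster chip generation, a larger cluster, or better utilization each scales the whole product, which is why release cadence and clustering together dominate near-term capability forecasts.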
Yet the report doesn’t treat risks as solved. Jailbreaking remains an active problem, with stealth attacks on instruction hierarchies reported to remain effective for hours. A taxonomy of real-world misuse is used to ground concerns in current harms—especially impersonation scams and non-consensual intimate image generation—rather than hypothetical future catastrophes.
Finally, the report’s next-year predictions are presented as a mix of firm and vague claims. The transcript’s own critique focuses on how hard it is to measure “meaningful changes” or “breakout status” without clear metrics. It ends with a forward-looking bet: that the next major leap will come from models that break state-of-the-art performance across multiple modalities at once, alongside continued rapid scaling and valuation growth.
Cornell Notes
The report’s core message is that frontier AI models are converging in capability because they share heavy overlap in pre-training data, making the competitive edge less about novelty and more about scaling, shipping, and cost control. It places multimodality at the center of 2024 progress, highlighting systems that generate or transform multiple media types and pointing to tools that make these capabilities immediately usable. It also connects AI to science and medicine, including “brain LM,” which can predict clinical variables from brain activity and potentially support in-silico testing. Despite momentum, the report stresses that safety work is not “done”: jailbreaking techniques still work, and real-world misuse patterns remain concentrated in impersonation and non-consensual intimate imagery. The outlook for 2025 mixes measurable predictions with vague language, making outcomes harder to verify.
What does “model convergence” mean in practical terms, and why does it matter for competition?
How do the report’s cost figures change the way you think about the frontier race?
Why is multimodality treated as more than a feature in 2024?
What safety message comes through most strongly?
How does the report connect AI to scientific discovery and medical applications?
What does the transcript suggest about measuring next-year predictions?
Review Questions
- Which factors in the transcript are used to explain why leading models are converging in capability?
- What evidence is cited that jailbreaking remains an active, unsolved problem?
- How does “brain LM” differ from typical text-only models, and what medical use cases does it suggest?
Key Points
1. Frontier models are increasingly converging in capability because of overlapping pre-training data, shifting competition toward scaling and deployment speed.
2. OpenAI’s training costs are described as multi-billion-dollar at the frontier level, with additional research compute and iteration costs pushing total spend higher.
3. Multimodality is moving from research novelty to product workflows, with systems generating or transforming audio and video together.
4. AI is being tied to scientific acceleration and medical prediction, including brain LM’s ability to predict clinical variables from brain activity and support in-silico testing.
5. Safety remains unresolved: jailbreaking techniques still work, and real-world misuse is concentrated in impersonation and non-consensual intimate imagery.
6. Hardware release cadence and rising per-chip compute (plus larger GPU clustering) are presented as major drivers of near-term progress.
7. Next-year predictions are harder to verify when they rely on vague terms rather than measurable benchmarks.