
Biggest Week for AI in A WHILE! Meta’s Llama 4 & Apple goes Open Source, & More

MattVidPro · 5 min read

Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

The GPT-4o mini model is highlighted for its 128,000-token context window, 16k output tokens per request, and low API pricing (15¢/1M input tokens; 60¢/1M output tokens).

Briefing

AI’s biggest story this week is a rapid shift toward cheaper, more capable models, paired with a clear push for multimodality and open access. The newly highlighted GPT-4o mini model delivers a 128,000-token context window and 16k output tokens per request at strikingly low API prices: 15 cents per 1 million input tokens and 60 cents per 1 million output tokens. That pricing is framed as a step-change for developers, with the creator arguing it reflects roughly a 99% cost reduction over two years for comparable intelligence, plus native multimodal support (text alongside other modalities) that wasn’t standard at this level just a couple of years ago.
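The quoted rates make per-request cost easy to estimate. A minimal sketch (the helper name is illustrative; only the rates come from the video):

```python
# Sketch: per-request cost at the quoted mini-model API rates.
# Rates are taken from the article; the function name is illustrative.

INPUT_RATE = 0.15 / 1_000_000   # dollars per input token (15 cents / 1M)
OUTPUT_RATE = 0.60 / 1_000_000  # dollars per output token (60 cents / 1M)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single API call at the quoted rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A maxed-out call: full 128k-token context in, 16k tokens out.
cost = request_cost(128_000, 16_000)
print(f"${cost:.4f}")  # roughly 2.9 cents for a maximal request
```

Even a request that fills the entire context window and maxes out the output budget stays under three cents, which is the practical meaning of the "cost inflection" claim.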

The week also brought momentum on autonomous research and planning. OpenAI “Strawberry” is described as a new model approach aimed at letting AI not only answer questions but also plan ahead to navigate the internet and perform deeper, more reliable research than current systems. The transcript links this to a broader pain point: today’s models can struggle with dependable deep research, so a system that can act more autonomously could represent a meaningful upgrade. Timing remains speculative, with GPT-5 expectations placed early next year (Q1–Q2) rather than immediately.

On open-source releases, the pace is accelerating across major players. Meta’s Llama 3.1 405B is reported as added to OpenRouter AI with Hugging Face model weights, signaling an imminent release and a likely fully open-source path. The transcript also points to Meta’s Llama 4 starting training in June, with claims that it will be fully multimodal, including audio, and that stopping Llama 3 70B training was meant to free resources for Llama 4. There’s also a regional release tension: Llama 3.1 405B may still reach the European Union, while Llama 4 could face constraints tied to its training-data use of EU Facebook and Instagram content.

Apple enters the open-source race with a lightweight 7B model that’s positioned as runnable on consumer hardware and even potentially on-device for “Apple Intelligence.” The model is described as fully open sourced and includes the pre-training dataset—an unusually transparent move. The transcript compares it favorably to Mistral 7B and suggests it could be practical for phones and laptops given its small size.
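For a sense of why a 7B model counts as consumer-hardware friendly, a back-of-the-envelope memory estimate helps (the bytes-per-parameter figures are standard fp16 and 4-bit-quantization assumptions, not numbers from the video, and they exclude activation and KV-cache overhead):

```python
# Rough memory footprint of a 7B-parameter model's weights alone.
# Assumes standard precisions; excludes runtime overhead. Estimate only.

PARAMS = 7_000_000_000

def weight_memory_gb(params: int, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

print(weight_memory_gb(PARAMS, 2))    # fp16: ~14 GB, high-end consumer GPU
print(weight_memory_gb(PARAMS, 0.5))  # 4-bit quantized: ~3.5 GB, laptop/phone range
```

At 4-bit precision the weights fit comfortably in a modern laptop's RAM, which is consistent with the transcript's suggestion that the model could run on phones and laptops.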

Beyond core model releases, the week highlighted tools that make AI easier to deploy locally and in real time. Pinokio 2.0 is pitched as a frictionless way to run AI apps as local web services that look like normal websites, including "zero-click launch" and the ability to expose a local machine to others. fal AI’s live portrait demo shows real-time webcam-based face control and avatar transformation, while ElevenLabs’ "Turbo v2.5" emphasizes faster, multilingual text-to-speech generation across 27 languages at lower cost.

Finally, a softer rumor sits at the end: Midjourney may be coming to Grok on X. The transcript treats this as the least certain and least important item, but notes it could matter if image generation becomes integrated into the X experience.

Taken together, the week reads like a coordinated push: lower inference costs, multimodal capability, more open releases, and tooling that brings AI closer to everyday users and developers—suggesting the post-summer slowdown may be giving way to another fast cycle of change.

Cornell Notes

The week’s central shift is toward cheaper, more capable AI models, especially those with long context windows and native multimodal abilities. The highlighted GPT-4o mini model offers 128,000-token context and low API pricing (15¢/1M input tokens; 60¢/1M output tokens), making advanced usage more feasible for developers. OpenAI’s "Strawberry" concept targets more autonomous, reliable deep research by planning ahead and navigating the internet. Meta’s Llama roadmap emphasizes open access (Llama 3.1 405B weights on Hugging Face) and ongoing multimodal training (Llama 4 starting in June, including audio). Apple’s move to open-source a runnable 7B model with its pre-training dataset further accelerates the open ecosystem and on-device possibilities.

What makes GPT-4o mini’s pricing and specs a big deal for developers?

It combines a large 128,000-token context window with 16k output tokens per request and very low API rates: 15 cents per 1 million input tokens and 60 cents per 1 million output tokens. The transcript frames this as a major cost inflection—about a 99% reduction in cost over two years for similar intelligence—while also emphasizing native multimodality, which reduces the need for separate pipelines for different media types.

How does “Strawberry” aim to improve on today’s AI research limits?

“Strawberry” is described as a model approach designed to plan ahead so it can navigate the internet autonomously and perform deeper levels of research more reliably than current systems. The transcript ties this to a key weakness of many existing models: they can’t consistently deliver dependable deep research, even when they can generate plausible answers.

What signals suggest Meta’s Llama 3.1 405B and Llama 4 releases are moving quickly?

Llama 3.1 405B is said to have been added to OpenRouter AI with Hugging Face model weights attached, which the transcript treats as a strong indicator that the release is close and likely fully open source. For Llama 4, the transcript cites claims that training began in June and that Meta paused Llama 3 70B training to start Llama 4, implying deliberate scheduling and fast execution.

Why is Apple’s open-source 7B model notable beyond just being “small”?

The transcript highlights that Apple’s 7B model is lightweight enough to run on consumer-grade GPUs and computers, and it’s fully open sourced with the pre-training dataset included. It’s positioned as better than Mistral 7B and is suggested as a candidate for on-device use (potentially even on iPhones/iPads) to support “Apple Intelligence.”

What does Pinokio 2.0 change about running AI apps locally?

Pinokio 2.0 is presented as making local AI apps indistinguishable from normal websites, with a frictionless "zero-click launch." It also adds the ability to turn a local machine into a public web service so others can access the apps, making it easier to share AI projects without requiring users to install or configure complex setups.

Review Questions

  1. Which combination of context length, output limits, and token pricing makes the GPT-4o mini model especially attractive for API developers?
  2. What problem with current AI deep research does “Strawberry” target, and how does autonomous internet navigation fit that goal?
  3. How do open-source signals (OpenRouter listing, Hugging Face weights, included pre-training datasets) influence expectations for what developers can do next?

Key Points

  1. The GPT-4o mini model is highlighted for 128,000-token context, 16k output tokens per request, and low API pricing (15¢/1M input; 60¢/1M output).

  2. Native multimodality is treated as a major upgrade because it reduces reliance on separate systems for different media types.

  3. OpenAI’s "Strawberry" concept focuses on planning and autonomous internet navigation to deliver more reliable deep research.

  4. Meta’s Llama 3.1 405B is described as nearing release with Hugging Face weights and OpenRouter availability, while Llama 4 training reportedly began in June with audio multimodality.

  5. Apple’s open-source 7B model is positioned as runnable on consumer hardware and includes the pre-training dataset, raising expectations for on-device "Apple Intelligence."

  6. Pinokio 2.0 aims to make local AI apps frictionless by running them like normal websites and enabling public access to local services.

  7. Several demos and model updates (fal AI’s live portrait, ElevenLabs’ Turbo v2.5) emphasize real-time interaction and faster, cheaper multilingual text-to-speech.

Highlights

A 128,000-token context window paired with 15¢/1M input tokens and 60¢/1M output tokens is framed as a practical cost breakthrough for developers.
“Strawberry” is pitched as a route to more reliable deep research by planning ahead and navigating the internet autonomously.
Meta’s Llama 4 is described as starting training in June and targeting full multimodality, including audio.
Apple’s open-source 7B model includes its pre-training dataset and is positioned as feasible on consumer hardware and possibly on-device.