
HUGE OpenAI Announcements: GPT-4 Turbo, GPTs in ChatGPT, Assistants API, new modalities

MattVidPro · 5 min read

Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

GPT-4 Turbo adds up to 128,000 tokens of context, aiming for better accuracy over long passages than earlier GPT-4 context limits.

Briefing

OpenAI’s Dev Day announcements put a clear emphasis on scaling what GPT-4 can do—faster, cheaper, and with far longer context—then packaging those upgrades into new ways for developers and everyday users to build custom AI experiences. The centerpiece is GPT-4 Turbo, positioned as a major step up from existing GPT-4 offerings: it supports up to 128,000 tokens of context (with claims of accuracy improvements over long passages), adds “JSON mode” for reliably structured outputs, improves function calling (including the ability to call multiple functions in one go), and introduces a seed parameter so identical prompts can produce repeatable outputs. OpenAI also pairs the model with retrieval features so applications can pull in knowledge from uploaded documents or external databases, rather than relying only on what fits inside the prompt.

The practical impact is cost and capability. OpenAI says GPT-4 Turbo is dramatically cheaper than GPT-4—about 3x lower for prompt tokens and 2x lower for completion tokens—along with specific pricing examples (1 cent per 1,000 input tokens and 3 cents per 1,000 output tokens). That matters because many real-world products are constrained not by model quality alone, but by inference cost and the engineering overhead of handling long documents, structured outputs, and tool use. OpenAI also updates ChatGPT Plus so GPT-4 Turbo becomes the default model, and it reduces friction in the interface by removing the model picker drop-down.
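At those quoted rates, per-request cost is easy to estimate. A minimal sketch, hardcoding the prices stated in the announcement (actual pricing may have changed since):

```python
# Estimate GPT-4 Turbo request cost at the rates quoted in the announcement:
# $0.01 per 1,000 input (prompt) tokens, $0.03 per 1,000 output (completion) tokens.
INPUT_PRICE_PER_1K = 0.01
OUTPUT_PRICE_PER_1K = 0.03

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at the quoted per-token rates."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K \
         + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

# A request that fills the full 128K context and returns a 1K-token answer
# costs about $1.31 at these rates:
print(round(estimate_cost(128_000, 1_000), 2))
```

The point of the 3x/2x reduction is visible here: the same 128K-token prompt at pre-Turbo GPT-4 rates would have cost several times more, which is often the difference between a viable document-heavy product and an unaffordable one.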

Beyond the model, OpenAI pushes “GPTs” as a new layer inside ChatGPT: tailored versions of ChatGPT built for specific purposes using instructions, expanded knowledge, and actions. These custom GPTs can be created without coding, configured with uploaded files (leveraging retrieval), and connected to tools such as web browsing, image generation (including DALL·E 3), and code interpreter. They can also be published—initially via links and later through a GPT store—opening the door to community-built bots, a leaderboard-style spotlighting system, and potential revenue sharing for popular GPTs.

A major theme is tool-using assistants that feel closer to autonomous agents. Demos show GPTs connecting to services like Google Calendar through Zapier to check schedules, identify conflicts, and even message people when permissions are granted. OpenAI also highlights security controls: GPTs ask for user permission before accessing data or performing actions.

For developers, OpenAI introduces an Assistants API with new “modalities” and a smoother developer experience built around threads and messages. The API is demonstrated with a travel app assistant that can interact with app UI components (including Apple Maps), stream responses, and invoke multiple functions with guaranteed JSON output and no added latency. Retrieval is showcased as a way to ingest long documents—like flight tickets and Airbnb details—without requiring developers to build complex chunking pipelines.

Finally, OpenAI expands its multimodal ecosystem: DALL·E 3 gets an API, GPT-4 Turbo includes vision capabilities, text-to-speech is offered via an API with multiple voices, and Whisper V3 is released for speech-to-text. Taken together, the announcements aim to make high-end AI cheaper to run, easier to integrate, and more usable—whether building full applications with the API or creating custom assistants inside ChatGPT without writing code.

Cornell Notes

OpenAI’s Dev Day announcements center on GPT-4 Turbo: a GPT-4 upgrade with up to 128,000 tokens of context, improved long-context accuracy, JSON mode for valid structured outputs, stronger function calling (including multiple functions per turn), and consistent seed output for repeatability. OpenAI pairs the model with retrieval so apps can ground answers in uploaded documents or external databases, not just prompt text. The new “GPTs” feature lets users build tailored ChatGPT versions using instructions, knowledge files, and actions/tools—then share them via links and later a GPT store. For developers, the Assistants API introduces threads/messages and retrieval/tool integration, demonstrated with assistants that can stream responses and invoke app functions (e.g., updating Apple Maps) using guaranteed JSON outputs. The overall goal is to make long-context, tool-using AI cheaper and easier to deploy.

What makes GPT-4 Turbo a step change for real applications, beyond “faster GPT-4” claims?

GPT-4 Turbo is positioned as both more capable and more deployable: it supports up to 128,000 tokens of context, adds “JSON mode” to ensure responses are valid JSON (useful for API integrations), and improves function calling so multiple functions can be invoked in one go while following instructions more reliably. OpenAI also highlights seeded reproducibility—given the same seed and input prompt, the model can produce the same output—making it easier to test and iterate on prompt-driven behavior.
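A rough sketch of how JSON mode and the seed parameter might be combined in a request via the OpenAI Python SDK. The model id (`gpt-4-1106-preview`) and exact parameter support are assumptions based on the announcement; the helper only assembles request kwargs, so the real network call is shown separately:

```python
# Sketch: a repeatable, JSON-only GPT-4 Turbo request.
# Model id and parameter names are assumptions from the Dev Day announcement.

def build_request(prompt: str, seed: int = 42) -> dict:
    """Assemble chat-completion kwargs for a repeatable, JSON-only response."""
    return {
        "model": "gpt-4-1106-preview",               # assumed GPT-4 Turbo model id
        "response_format": {"type": "json_object"},  # JSON mode: output is valid JSON
        "seed": seed,                                # same seed + prompt -> repeatable output
        "messages": [
            {"role": "system", "content": "Reply only with a JSON object."},
            {"role": "user", "content": prompt},
        ],
    }

# Actual call (requires an API key and network access):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(**build_request("List three colors."))
```

Note that JSON mode guarantees syntactically valid JSON, not a particular schema—the system prompt and function definitions still control the shape of the object.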

How does retrieval change what developers can build with GPT-4 Turbo?

Retrieval lets an assistant pull knowledge from outside sources—uploaded documents or databases—so the system can repeatedly reference that information without stuffing everything into the prompt. OpenAI also updates the knowledge cutoff for GPT-4 Turbo to April 2023 and says it will continue improving it. In demos, assistants ingest PDFs (like flight tickets and Airbnb details) and then use that content to populate the app UI, reducing the need for developers to build custom chunking pipelines.
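Wiring an uploaded document into an assistant via retrieval might look like the following. The tool name (`retrieval`) and `file_ids` field reflect the Assistants API as announced in 2023 and are assumptions here; the helper just builds the assistant definition:

```python
# Sketch: an assistant grounded in uploaded travel documents via retrieval.
# Tool name and file_ids field are assumptions from the 2023 announcement.

def build_assistant_config(file_ids: list) -> dict:
    """Assistant definition that grounds answers in uploaded documents."""
    return {
        "name": "Trip helper",
        "instructions": "Answer using the attached travel documents.",
        "model": "gpt-4-1106-preview",     # assumed GPT-4 Turbo model id
        "tools": [{"type": "retrieval"}],  # server-side chunking + search
        "file_ids": file_ids,              # e.g. an uploaded flight-ticket PDF
    }

# Actual flow (requires an API key):
# from openai import OpenAI
# client = OpenAI()
# f = client.files.create(file=open("ticket.pdf", "rb"), purpose="assistants")
# assistant = client.beta.assistants.create(**build_assistant_config([f.id]))
```

The design point is that chunking, embedding, and search happen server-side: the developer uploads a file and attaches it, rather than building a retrieval pipeline by hand.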

What are “GPTs” in ChatGPT, and what components can they combine?

GPTs are tailored versions of ChatGPT built for a specific purpose using (1) instructions, (2) expanded knowledge (including uploaded files that feed retrieval), and (3) actions/tools. They can be configured to use capabilities like web browsing, DALL·E 3 image generation, and code interpreter, and they can connect to external services via actions (demonstrated with Zapier). GPTs are designed to ask for permission before accessing data or performing actions.

Why does the Assistants API matter for building agent-like experiences?

The Assistants API is built around threads and messages, letting developers maintain conversation state and stream responses back to an app. It also emphasizes tool use: the demo shows function calling that guarantees JSON output with no added latency and supports invoking multiple functions at once. Combined with retrieval, this enables assistants that can interact with app components (like updating Apple Maps pins) while grounding responses in user-provided documents.
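The thread → message → run loop and a function-calling tool could be sketched as follows. Method names follow the openai Python SDK's beta namespace at announcement time; the `add_map_pin` function and its schema are hypothetical stand-ins for the demo's Apple Maps integration:

```python
# Sketch of the Assistants API loop: create a thread, add a user message,
# start a run, then service any requested function call.
# The map-pin function below is hypothetical.

def build_function_tool() -> dict:
    """A function-calling tool schema the assistant can invoke."""
    return {
        "type": "function",
        "function": {
            "name": "add_map_pin",  # hypothetical app function
            "description": "Drop a pin on the in-app map.",
            "parameters": {
                "type": "object",
                "properties": {
                    "lat": {"type": "number"},
                    "lon": {"type": "number"},
                },
                "required": ["lat", "lon"],
            },
        },
    }

# Actual flow (requires an API key):
# from openai import OpenAI
# client = OpenAI()
# thread = client.beta.threads.create()
# client.beta.threads.messages.create(thread_id=thread.id, role="user",
#                                     content="Pin the Eiffel Tower.")
# run = client.beta.threads.runs.create(thread_id=thread.id,
#                                       assistant_id="asst_...")
# When the run status becomes "requires_action", parse the tool call's
# JSON arguments, execute add_map_pin locally, and submit the result back.
```

Because JSON mode guarantees the arguments parse, the app side can call `json.loads` on them and dispatch directly to UI code—this is the "natural language updates the map" behavior shown in the demo.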

How do pricing and context length connect to product feasibility?

Long-context capability and lower cost directly affect whether document-heavy products are practical. OpenAI claims GPT-4 Turbo is considerably cheaper than GPT-4 (3x lower for prompt tokens and 2x lower for completion tokens) and supports up to 128,000 tokens of context. Together, these changes reduce both the cost of running large prompts and the engineering pressure to aggressively summarize or chunk content.

What multimodal and speech updates were announced alongside GPT-4 Turbo and GPTs?

OpenAI announced DALL·E 3 getting its own API, GPT-4 Turbo offering vision capabilities, and text-to-speech via an API with multiple voices. It also released Whisper V3 for speech-to-text, enabling pipelines that transcribe spoken audio, reason over the text with an LLM, and voice the response via text-to-speech.
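A speech round-trip using the new audio APIs might look like this. The model ids (`tts-1`, `whisper-1`) and voice name are assumptions based on the announcement; the helper only builds request kwargs, with the real calls shown in comments:

```python
# Sketch: text-to-speech request kwargs for the announced speech API.
# Model id and voice name are assumptions from the announcement.

def build_tts_request(text: str, voice: str = "alloy") -> dict:
    """Assemble kwargs for a text-to-speech synthesis request."""
    return {
        "model": "tts-1",  # assumed TTS model id
        "voice": voice,    # one of several offered voices
        "input": text,
    }

# Actual round-trip (requires an API key):
# from openai import OpenAI
# client = OpenAI()
# transcript = client.audio.transcriptions.create(
#     model="whisper-1", file=open("note.mp3", "rb"))   # speech -> text
# ...run the transcript through a chat completion...
# speech = client.audio.speech.create(**build_tts_request("Here is my answer."))
```

This is the transcribe → reason → speak loop described above, with each leg handled by a separate API endpoint.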

Review Questions

  1. Which GPT-4 Turbo features most directly improve API integration reliability and tool use (and why)?
  2. How do retrieval and uploaded files differ from simply increasing context length?
  3. What permissions and publishing mechanisms are described for custom GPTs, and how might those affect adoption?

Key Points

  1. GPT-4 Turbo adds up to 128,000 tokens of context, aiming for better accuracy over long passages than earlier GPT-4 context limits.

  2. JSON mode and improved function calling (including multiple functions per turn) are designed to make GPT outputs easier to wire into production systems.

  3. Seeded reproducibility is presented as a way to make prompt-driven behavior more repeatable for testing and iteration.

  4. Retrieval is positioned as a core capability: assistants can ground answers in uploaded documents or external databases rather than relying only on prompt text.

  5. “GPTs” in ChatGPT let users build tailored assistants using instructions, knowledge files, and actions/tools, with permission prompts before data access or actions.

  6. OpenAI’s Assistants API introduces threads/messages and tool integration, demonstrated with assistants that can update app UI components like Apple Maps using guaranteed JSON function calls.

  7. OpenAI expanded multimodal and speech offerings with DALL·E 3 API access, GPT-4 Turbo vision, text-to-speech APIs, and Whisper V3 speech recognition.

Highlights

GPT-4 Turbo’s headline upgrade is 128,000-token context, paired with claims of improved long-context accuracy and developer-friendly output controls.
JSON mode and multi-function calling are framed as practical building blocks for reliable integrations, not just better chat quality.
GPTs turn ChatGPT into a customizable platform: instructions + uploaded knowledge + actions, with permission gating before actions run.
The Assistants API demo shows agent-like behavior where natural language triggers real UI updates (e.g., Apple Maps pins) via function calls with guaranteed JSON output.
Retrieval is repeatedly emphasized as the mechanism for grounding assistants in long documents without forcing developers to hand-roll chunking.

Mentioned

  • API
  • JSON
  • GPT
  • LLM
  • YC