
New Products: A Deep Dive

OpenAI · 6 min read

Based on OpenAI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

GPTs are built from instructions (system message), knowledge (uploaded files with retrieval), and actions/tools that connect to external systems.

Briefing

OpenAI gave a hands-on look at two building blocks for an "agent-like" future: GPTs inside ChatGPT and the new Assistants API for embedding agentic experiences into apps. The core message is that developers can now package instructions, external knowledge, and real-world actions into reusable assistants—then either share them as custom ChatGPTs (GPTs) or wire them directly into their own products (Assistants API). That shift matters because it reduces the glue code developers previously had to write for state, retrieval, tool use, and context management.

For GPTs, the demo centered on a new GPT creation workflow that starts conversationally and then exposes deeper controls. A builder can chat with a GPT-in-progress to iteratively shape its behavior, then switch to a configuration view to inspect and edit the underlying “GPT anatomy”: instructions (system message), knowledge, and custom actions/tools. The UI also includes a testing pane to see how the GPT responds to real user prompts before publishing. A pirate-themed GPT (“Captain Coder,” then “salty” variants) illustrated how instructions can define personality and conversation starters, while a “Tasky” GPT demonstrated how actions connect a GPT to external systems.

Tasky used actions wrapped around the Asana API via Retool, with OAuth and end-user confirmation built into the flow. In practice, the GPT could read a user’s to-dos and then create an actual Asana task after confirming the user’s intent—turning a chat interaction into a concrete workflow. A separate “Danny DevDay” GPT showcased knowledge: instead of relying on pretraining, it was given a PDF (Sam’s keynote script) and could answer questions and summarize content using retrieval over the uploaded file. The demo emphasized that knowledge isn’t just for one-off summaries; the GPT can “talk to” the information as part of an ongoing conversation.

The most ambitious combined demo (“Mood Tunes”) stitched together instructions, knowledge, actions, and multimodal capabilities. It generated a mixtape concept from an image input, used browsing when needed to fill gaps not present in its knowledge set, produced album art via DALL·E, and then used an action connected to the Hue API to change lighting based on the chosen mood. It also offered to play a track on Spotify, illustrating how GPTs can orchestrate multiple external tools in a single experience.

The second half of the session moved from ChatGPT-native GPTs to the Assistants API, designed to let developers build similar assistant experiences inside their own apps. The Assistants API introduces three stateful primitives—Assistant (stored instructions, selected model, and tools), Threads (conversation state), and Messages (user/assistant posts)—plus a Runs primitive that packages one invocation of an assistant. Behind the scenes, runs handle context truncation, tool calls, and saving outputs back into the thread, reducing the need for developers to manage message history and retrieval plumbing themselves.
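A minimal sketch of those primitives with the OpenAI Python SDK, assuming the launch-era `beta` namespace and model name (both have since evolved):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Assistant: stored instructions, model choice, and tools
assistant = client.beta.assistants.create(
    name="Demo assistant",
    instructions="You are a concise, helpful assistant.",
    model="gpt-4-1106-preview",  # launch-era model name; assumption
)

# Thread: conversation state that persists across requests
thread = client.beta.threads.create()

# Message: a user post appended to the thread
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Summarize the Assistants API primitives in one sentence each.",
)

# Run: one invocation of the assistant against the thread; context
# loading, truncation, and output storage happen server-side
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)
```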

Tooling is central. Code Interpreter lets assistants write and run code in a sandbox to analyze data and generate charts. Retrieval provides built-in document augmentation without developers manually computing embeddings or building semantic search. Function calling lets assistants invoke developer-defined functions with structured arguments. Two upgrades were highlighted: JSON mode for guaranteed valid JSON outputs, and parallel function calling so multiple functions (e.g., play music and set volume) can execute in one pass. The session closed with a roadmap: multimodal support by default, bringing your own code execution, and asynchronous real-time integration via WebSockets/Webhooks.
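As a hedged sketch of how the built-in tools attach, assuming the launch-era tool names (`code_interpreter`, `retrieval`) and the `file_ids` parameter, which later API revisions replaced:

```python
from openai import OpenAI

client = OpenAI()

# Upload a document for retrieval; parsing, chunking, and embeddings
# are handled by the API rather than by application code.
keynote = client.files.create(
    file=open("keynote.pdf", "rb"),  # hypothetical local file
    purpose="assistants",
)

# Attach Code Interpreter (sandboxed code execution) and Retrieval
# (document-grounded answers) to a single assistant.
assistant = client.beta.assistants.create(
    name="DevDay helper",
    instructions="Answer questions about the uploaded keynote; use code for math or charts.",
    model="gpt-4-1106-preview",  # launch-era model name; assumption
    tools=[{"type": "code_interpreter"}, {"type": "retrieval"}],
    file_ids=[keynote.id],  # launch-era parameter; later replaced by vector stores
)
```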

Cornell Notes

OpenAI presented GPTs and the Assistants API as two paths to build agent-like systems. GPTs let developers create custom ChatGPTs by combining instructions (system message), knowledge (uploaded files with retrieval), and actions (tool integrations like Asana via Retool). The Assistants API brings similar capabilities to developers’ own apps using stateful primitives: Assistant (instructions/tools/models), Threads (conversation state), Messages (posts), and Runs (one assistant invocation). Runs handle context truncation and saving outputs automatically, while tools like Code Interpreter, Retrieval, and Function Calling provide sandboxed code, built-in document search, and developer-defined function execution. JSON mode and parallel function calling improve reliability and reduce latency when multiple actions are needed.

What are the three core components of a GPT, and how did the demos make each one concrete?

GPTs were described as three parts: (1) instructions (also called the system message) to set personality and conversation behavior; (2) knowledge to ground responses in uploaded documents; and (3) actions/tools to connect the GPT to external systems. The pirate GPT demonstrated instructions by switching tone and identity (“salty pirate skilled in AI”). Tasky demonstrated actions by wrapping the Asana API via Retool, using OAuth and end-user confirmation before creating a real Asana task. Danny DevDay demonstrated knowledge by uploading a PDF (Sam’s keynote script) and then answering questions like summarizing the keynote using retrieval over the file.

How does the GPT builder UI change the workflow from “prompting” to “building”?

The creation UI starts conversationally: builders can chat with a GPT builder to iteratively shape the GPT. Then a configuration tab exposes internals—editing instructions, knowledge, custom actions, and tools. A testing tab lets builders try prompts against the configured GPT before publishing. The demo also showed publishing/sharing so the resulting GPT can be used by others, including via a mobile app experience where GPTs appear in the interface.

What problem does the Assistants API solve compared with earlier chat-style APIs?

Developers previously had to manage limited context windows, repeatedly send instructions, store and truncate message history, and implement retrieval plumbing (embeddings, chunking, semantic search). The Assistants API replaces that with stateful primitives: Assistant stores instructions/models/tools once; Threads store conversation history; Messages record user/assistant posts; and Runs package one invocation. Runs automatically handle loading messages, truncating to fit context, calling tools, and saving resulting messages back to the thread.
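A sketch of that run lifecycle under the same launch-era assumptions as above; the application only creates the run and polls it, while message loading, truncation, and persistence happen server-side:

```python
import time

# Assumes `client`, `thread`, and `assistant` from the earlier sketch.
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# Poll until the run leaves its in-flight states (launch-era pattern,
# before streaming helpers were added to the SDK).
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# The assistant's reply has already been saved back onto the thread.
messages = client.beta.threads.messages.list(thread_id=thread.id, order="asc")
for message in messages.data:
    print(message.role, message.content[0].text.value)  # assumes text content blocks
```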

How do Code Interpreter, Retrieval, and Function Calling differ as tools?

Code Interpreter is a hosted sandbox where the model can write and run code to perform math, process files, analyze data, and generate charts. Retrieval augments the assistant with knowledge from outside the model by letting developers (or end users) upload documents; the system handles parsing, chunking, embeddings, and deciding when to retrieve, including citations/quotes. Function Calling lets developers define custom functions; the model selects which function to call and provides structured arguments, while the developer executes the function and returns outputs to the assistant.
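A sketch of the function-calling round trip in the Assistants API, using a hypothetical `create_task` function (not the demo's actual Asana schema): the model chooses the function and supplies structured JSON arguments, and the developer executes it and returns the output.

```python
import json

# Hypothetical developer-defined function exposed as a tool.
task_tool = {
    "type": "function",
    "function": {
        "name": "create_task",
        "description": "Create a to-do task in the user's task manager.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "due_date": {"type": "string", "description": "ISO 8601 date"},
            },
            "required": ["title"],
        },
    },
}

# ... assistant created with tools=[task_tool]; run started as before ...

# When the run pauses with status "requires_action", execute the call(s)
# and hand the results back so the run can finish.
if run.status == "requires_action":
    outputs = []
    for call in run.required_action.submit_tool_outputs.tool_calls:
        args = json.loads(call.function.arguments)
        result = {"ok": True, "title": args["title"]}  # stand-in for a real API call
        outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
    run = client.beta.threads.runs.submit_tool_outputs(
        thread_id=thread.id, run_id=run.id, tool_outputs=outputs
    )
```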

What do JSON mode and parallel function calling change for developers?

JSON mode constrains model outputs so they always return valid JSON, making it safer to directly execute or parse results in application code. Parallel function calling allows the model to call multiple functions in one go—reducing latency and cost versus making separate API calls. The car assistant demo showed this by running both “play music” and “set audio volume” as separate function calls in parallel based on a single user request.
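A sketch of both features on the Chat Completions endpoint, loosely mirroring the car-assistant example; the `play_music` and `set_volume` function names and the model string are illustrative assumptions, not the demo's actual schema:

```python
import json
from openai import OpenAI

client = OpenAI()

tools = [
    {"type": "function", "function": {
        "name": "play_music",
        "description": "Play music matching a query.",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}},
                       "required": ["query"]}}},
    {"type": "function", "function": {
        "name": "set_volume",
        "description": "Set audio volume from 0 to 10.",
        "parameters": {"type": "object",
                       "properties": {"level": {"type": "integer"}},
                       "required": ["level"]}}},
]

resp = client.chat.completions.create(
    model="gpt-4-1106-preview",  # launch-era model name; assumption
    messages=[{"role": "user", "content": "Play some jazz and turn it up a bit."}],
    tools=tools,
)

# Parallel function calling: a single response can carry several tool calls.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))

# JSON mode: constrain a separate completion to emit valid JSON.
json_resp = client.chat.completions.create(
    model="gpt-4-1106-preview",
    response_format={"type": "json_object"},
    messages=[{"role": "user",
               "content": "Return a JSON object with keys 'genre' and 'volume'."}],
)
print(json.loads(json_resp.choices[0].message.content))
```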

How did the “Mood Tunes” demo illustrate end-to-end orchestration across capabilities?

Mood Tunes combined multiple GPT anatomy elements and tools: it used an image input to infer a mood, generated a mixtape concept, used browsing when a needed band wasn’t in its knowledge set, created album art via DALL·E, and then used an action connected to the Hue API to change lighting to match the mood. It also offered to play a track on Spotify, demonstrating how a single assistant flow can coordinate knowledge, browsing, generation, and external device/app actions.

Review Questions

  1. In the Assistants API, what responsibilities are handled by Runs versus what must developers manage themselves?
  2. Compare how knowledge grounding works in GPTs versus retrieval in the Assistants API—what automation is emphasized in each?
  3. Why do JSON mode and parallel function calling matter for building reliable, low-latency assistant features?

Key Points

  1. GPTs are built from instructions (system message), knowledge (uploaded files with retrieval), and actions/tools that connect to external systems.

  2. A new GPT creation workflow supports iterative building via conversation, then deeper configuration and testing before publishing.

  3. Actions can integrate real services (e.g., Asana via Retool) with OAuth and end-user confirmation before data is sent.

  4. Knowledge grounding in GPTs can come from uploaded documents (like a keynote PDF), enabling retrieval-based answers and file-grounded conversation.

  5. The Assistants API introduces stateful primitives—Assistant, Threads, Messages, and Runs—so developers don’t have to manually manage context truncation or message storage.

  6. Code Interpreter, Retrieval, and Function Calling provide three distinct tool categories: sandboxed code execution, built-in document search, and developer-defined function execution.

  7. Function calling improvements include JSON mode for valid JSON outputs and parallel function calling to execute multiple actions in one invocation.

Highlights

GPTs can be created conversationally, then inspected and edited through a configuration UI that exposes instructions, knowledge, and actions.
Tasky turned chat into a real workflow by confirming intent and creating an actual Asana task through an action integration.
The Assistants API’s Runs primitive packages model invocation, context truncation, tool calls, and saving outputs back into the thread automatically.
Retrieval is positioned as a one-click alternative to building embeddings and semantic search from scratch.
Parallel function calling enables multiple tool/function calls (e.g., play music and set volume) in a single pass, cutting latency.

Topics

  • GPTs Creation UI
  • GPT Anatomy
  • Assistants API Primitives
  • Tooling: Code Interpreter
  • Tooling: Retrieval
  • Tooling: Function Calling
  • JSON Mode
  • Parallel Function Calling

Mentioned

  • API
  • OAuth