How Building AI Products is Different | Notion After Hours
Based on Notion's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
AI product success depends on the harness—permissions, steerability, undoable actions, and UI—not just on model output quality.
Briefing
Building AI products at Notion hinges on a hard, practical tradeoff: deliver real value on messy, real-world inputs without over-constraining the model. The team’s core focus isn’t just getting a model to generate the “right” text—it’s designing the surrounding harness, permissions, steerability, and UI so the system can act safely and usefully inside enterprise workflows.
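The harness idea can be made concrete with a small sketch. This is purely illustrative (none of these names come from Notion's codebase): a model-proposed action is only executed if permissions allow it, and every executed action records its inverse so it can be undone.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Action:
    """A model-proposed action paired with its inverse, so it can be undone."""
    name: str
    apply: Callable[[], None]
    undo: Callable[[], None]

@dataclass
class Harness:
    """Wraps the model: checks permissions before acting, keeps an undo stack."""
    allowed: set
    history: list = field(default_factory=list)

    def execute(self, action: Action) -> bool:
        if action.name not in self.allowed:
            return False             # steerability: refuse out-of-scope actions
        action.apply()
        self.history.append(action)  # undoable: remember how to reverse it
        return True

    def undo_last(self) -> None:
        if self.history:
            self.history.pop().undo()

# Toy workspace where the agent may edit pages but not delete them.
workspace = {"page": "draft"}
harness = Harness(allowed={"edit_page"})

edit = Action("edit_page",
              apply=lambda: workspace.__setitem__("page", "final"),
              undo=lambda: workspace.__setitem__("page", "draft"))
delete = Action("delete_page",
                apply=lambda: workspace.pop("page"),
                undo=lambda: workspace.__setitem__("page", "final"))

assert harness.execute(edit)        # permitted, so it is applied
assert not harness.execute(delete)  # blocked by permissions
harness.undo_last()                 # reverses the edit
assert workspace["page"] == "draft"
```

The point of the sketch is that safety lives in the wrapper, not the model: the same model proposal can be allowed, refused, or rolled back depending on harness configuration.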
Several engineers and AI leaders trace that philosophy to long-horizon, stateful problems. Driving taught them that a model can’t be evaluated only in a controlled demo; what matters is what happens after step two or step three when the environment changes. That same combinatorial explosion shows up in chat: earlier messages reshape what’s possible later. In Notion’s framing, AI agents are essentially “driving” through enterprise knowledge workspaces—where context, prior actions, and permissions determine whether the next step is correct.
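The evaluation consequence of that framing can be sketched in a few lines (names here are illustrative, not from the talk): an agent is rolled forward over a multi-step episode, where each step sees state produced by the steps before it, and success is a property of the whole trajectory rather than of any single turn.

```python
def rollout(agent, state, steps):
    """Run the agent for several steps; earlier actions reshape later state."""
    history = []
    for _ in range(steps):
        action = agent(state, history)  # agent conditions on prior context
        state = state + action          # environment accumulates the effects
        history.append(action)
    return state, history

# Toy agent: tries to reach a target of 10, adding at most 3 per step.
def agent(state, history):
    return min(3, 10 - state)

final, history = rollout(agent, state=0, steps=5)
assert final == 10                 # success is judged on the trajectory
assert history == [3, 3, 3, 1, 0]  # later choices depend on earlier ones
```

A single-turn check would only ever see the first action; failures that appear after step two or three, once the environment has changed, are invisible to it.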
The conversation also draws a clear line between model labs and agent labs. Model capability matters, but the differentiator is expertise in integration and product layers: enterprise permissions, interpretability, undoable actions, and interfaces that meet customer expectations. Early on, Notion had to be explicit about what the system could do because users didn’t know how to use it. Over time, adoption improved as the product shifted from instruction-heavy interactions to goal-based setup—users specify an outcome (like configuring a relational database or generating meeting notes), while scaffolding and setup happen behind the scenes.
Decision-making for what to build follows a portfolio mindset rather than a single bet. Work is split across maintaining and improving what exists, tackling user pain points that demand new capabilities, and pushing capability boundaries. The team emphasizes that surprises are normal—early versions of features like meeting notes were “bad,” and some approaches (such as fine-tuning or tool-calling strategies) didn’t pan out. When assumptions fail, the response is not incremental patching but sometimes tearing down infrastructure and rebuilding.
Speed has become a competitive advantage. Demos are now far easier to produce than in earlier AI eras, but the real challenge is turning a demo into something valuable under real inputs. The team credits faster feedback loops, including coding agents and rapid iteration, with enabling them to reinvent parts of custom agent setup only weeks before launch. That requires low-ego engineering: deleting code, revisiting assumptions, and letting the best idea win regardless of whose it is.
Finally, the group connects product building to everyday use. People describe using Notion AI to coordinate real-life tasks (like drafting a dog-care plan from messy group messages) and to manage personal workflows (credit card rewards tracking). Those stories reinforce the central message: the product’s success depends on harness design that makes models reliable enough to be useful, while still leaving room for the model to explore what it can do.
Cornell Notes
Notion’s AI product work centers on more than model quality: success depends on the harness around the model—permissions, steerability, undoable actions, and UI—that lets agents operate safely in enterprise settings. The team treats agent behavior as a stateful, long-horizon problem where earlier context changes later outcomes, similar to how driving must be evaluated in the real world. Adoption improves as interactions shift from users learning instructions to users stating goals, while setup scaffolding happens automatically. Building decisions follow a portfolio approach (improve current features, address pain points, and expand capabilities), with frequent experimentation and willingness to tear down and rebuild when assumptions fail. Fast iteration and low-ego engineering—especially deleting code—are portrayed as essential to keeping pace.
Why does the team compare agent behavior to driving rather than just “chatting with a model”?
What distinguishes an “agent lab” from a “model lab” in practice?
How does Notion reduce the barrier for users to adopt AI?
What’s the team’s approach to deciding what AI features to build?
What happens when an AI strategy fails—incremental fixes or bigger resets?
Why does “deleting code” show up as a core engineering principle?
Review Questions
- How does the stateful nature of chat change how agents should be evaluated compared with single-turn demos?
- What elements of the harness (permissions, steerability, undo, UI) most directly determine whether an AI agent is safe and useful in enterprise settings?
- Why does a portfolio approach to AI feature development matter when capability boundaries and user needs are both shifting quickly?
Key Points
1. AI product success depends on the harness—permissions, steerability, undoable actions, and UI—not just on model output quality.
2. Agent behavior must be evaluated as a long-horizon, stateful system where earlier context changes later outcomes.
3. Notion’s agent approach emphasizes enterprise integration expertise, treating agents like tools that operate safely inside knowledge workspaces.
4. Adoption improves when users state goals rather than writing detailed instructions, with scaffolding handled automatically.
5. Feature planning uses a portfolio of efforts: improve what exists, address user pain points, and expand capabilities.
6. Rapid iteration and faster feedback loops enable major changes to setup and harness design close to launch.
7. Low-ego engineering—including deleting code and rebuilding infrastructure when assumptions fail—is treated as essential to keep pace with fast-moving model capabilities.