
How Building AI Products is Different | Notion After Hours

Notion · 5 min read

Based on Notion's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

AI product success depends on the harness—permissions, steerability, undoable actions, and UI—not just on model output quality.

Briefing

Building AI products at Notion hinges on a hard, practical tradeoff: delivering real value on messy, real-world inputs without over-constraining the model. The team's core focus isn't just getting a model to generate the "right" text; it's designing the surrounding harness (permissions, steerability, undoable actions, and UI) so the system can act safely and usefully inside enterprise workflows.

Several engineers and AI leaders trace that philosophy to long-horizon, stateful problems like driving: a model can't be evaluated only in a controlled demo, because what matters is what happens after step two or three, when the environment changes. The same combinatorial explosion shows up in chat: earlier messages reshape what's possible later. In Notion's framing, AI agents are essentially "driving" through enterprise knowledge workspaces, where context, prior actions, and permissions determine whether the next step is correct.

The conversation also draws a clear line between model labs and agent labs. Model capability matters, but the differentiator is expertise in integration and product layers: enterprise permissions, interpretability, undoable actions, and interfaces that meet customer expectations. Early on, Notion had to be explicit about what the system could do because users didn’t know how to use it. Over time, adoption improved as the product shifted from instruction-heavy interactions to goal-based setup—users specify an outcome (like configuring a relational database or generating meeting notes), while scaffolding and setup happen behind the scenes.

Decision-making for what to build follows a portfolio mindset rather than a single bet. Work is split across maintaining and improving what exists, tackling user pain points that demand new capabilities, and pushing capability boundaries. The team emphasizes that surprises are normal—early versions of features like meeting notes were “bad,” and some approaches (such as fine-tuning or tool-calling strategies) didn’t pan out. When assumptions fail, the response is not incremental patching but sometimes tearing down infrastructure and rebuilding.

Speed has become a competitive advantage. Demos are now far easier to produce than in earlier AI eras, but the real challenge is turning a demo into something valuable under real inputs. The team credits faster feedback loops (coding agents and rapid iteration) with letting them reinvent parts of custom agent setup only weeks before launch. That requires low-ego engineering: deleting code, revisiting assumptions, and letting the best idea win even when the discarded one is your own.

Finally, the group connects product building to everyday use. People describe using Notion AI to coordinate real-life tasks (like drafting a dog-care plan from messy group messages) and to manage personal workflows (credit card rewards tracking). Those stories reinforce the central message: the product’s success depends on harness design that makes models reliable enough to be useful, while still leaving room for the model to explore what it can do.

Cornell Notes

Notion’s AI product work centers on more than model quality: success depends on the harness around the model—permissions, steerability, undoable actions, and UI—that lets agents operate safely in enterprise settings. The team treats agent behavior as a stateful, long-horizon problem where earlier context changes later outcomes, similar to how driving must be evaluated in the real world. Adoption improves as interactions shift from users learning instructions to users stating goals, while setup scaffolding happens automatically. Building decisions follow a portfolio approach (improve current features, address pain points, and expand capabilities), with frequent experimentation and willingness to tear down and rebuild when assumptions fail. Fast iteration and low-ego engineering—especially deleting code—are portrayed as essential to keeping pace.

Why does the team compare agent behavior to driving rather than just “chatting with a model”?

Driving is treated as a stateful, long-horizon task: where you are and how you interact affects what’s possible next, and evaluation can’t stop after a single step. Chat works the same way because earlier messages reshape later possibilities. That creates a combinatorial explosion of scenarios, so agents must be evaluated after multiple steps and in realistic conditions, not only in a controlled demo.
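The multi-step framing above can be sketched in code. This is a hypothetical illustration, not Notion's evaluation system: every name here (`run_trajectory`, `evaluate`, the agent/environment interfaces) is invented to show why judging an agent after several steps differs from judging a single turn.

```python
# Hypothetical sketch: scoring an agent over a multi-step trajectory
# instead of a single turn. All names are illustrative.

def run_trajectory(agent, env, max_steps=5):
    """Roll the agent forward; earlier actions reshape later state."""
    history = []
    state = env.reset()
    for _ in range(max_steps):
        action = agent(state, history)   # decision depends on full history
        state, done = env.step(action)   # environment changes underneath
        history.append(action)
        if done:
            break
    return history, state

def evaluate(agent, scenarios, check):
    """Judge the final state, not step one: a good first move can
    still lead to a bad outcome two or three steps later."""
    passed = 0
    for env in scenarios:
        _, final_state = run_trajectory(agent, env)
        if check(final_state):
            passed += 1
    return passed / len(scenarios)
```

A single-turn demo only exercises the first iteration of that loop; the combinatorial explosion the speakers describe lives in everything after it.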

What distinguishes an “agent lab” from a “model lab” in practice?

An agent lab still needs modeling expertise, but it places heavier weight on integration and product engineering: enterprise permissions, interpretability, UI standards, and the ability to undo actions. The goal is to make model outputs usable inside real customer workflows, including constraints like access control and safe execution—areas that grow in importance for teams building non-model-specific enterprise products.
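The harness idea described here can be made concrete with a minimal sketch. This is not Notion's implementation; the `Harness` class and its methods are invented to show the shape of the pattern: a model-proposed action passes a permission check before executing, and each executed action records an inverse so it stays undoable.

```python
# Illustrative sketch of a minimal agent "harness": permission checks
# before execution, plus an undo stack of inverse operations.
# All names are invented for illustration.

class Harness:
    def __init__(self, allowed_actions):
        self.allowed = allowed_actions   # per-user permission set
        self.undo_stack = []             # inverse operations, newest last

    def execute(self, action, do, undo):
        """Run a model-proposed action only if permitted; keep its inverse."""
        if action not in self.allowed:
            return f"denied: {action}"   # the model never bypasses access control
        result = do()
        self.undo_stack.append(undo)
        return result

    def undo_last(self):
        """Roll back the most recent executed action, if any."""
        if self.undo_stack:
            self.undo_stack.pop()()
            return True
        return False
```

Even in this toy form, the division of labor matches the "agent lab" framing: the model proposes actions freely, while the harness (not the model) enforces access control and reversibility.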

How does Notion reduce the barrier for users to adopt AI?

The interaction shifts from users needing to learn how to instruct the model to users stating a goal. Instead of expecting detailed prompts, users can describe outcomes (e.g., setting up a relational database or generating meeting notes), and the system handles scaffolding and setup behind the scenes. This lowers entry friction and makes AI feel like a capability the product provides, not a skill the user must master.

What’s the team’s approach to deciding what AI features to build?

Feature selection follows buckets and a portfolio mindset: (1) mandatory work to improve retention and keep AI interesting, (2) new builds driven by user pain points, and (3) capability expansion—pushing what the system can do. Success requires running multiple threads because it’s unclear which effort will mature into the most valuable product direction.

What happens when an AI strategy fails—incremental fixes or bigger resets?

When assumptions break, the team revisits them directly, up to and including tearing apart infrastructure and rebuilding. Examples mentioned include resources spent on approaches, such as a tool-calling setup, that didn't work as intended, and earlier attempts like fine-tuning an agent model in 2024. The key is recognizing when a problem isn't worth revisiting versus when the underlying architecture must change.

Why does “deleting code” show up as a core engineering principle?

As models improve and harnesses evolve, scaffolding that once seemed necessary can become counterproductive. The team emphasizes low ego and rapid iteration: deleting old code, removing unnecessary guidance, and finding the “gentlest strokes” needed to keep critical constraints intact (permissions, steerability, undo). The aim is to avoid over-prompting that “nerfs” the model while still enforcing safety and control.

Review Questions

  1. How does the stateful nature of chat change how agents should be evaluated compared with single-turn demos?
  2. What elements of the harness (permissions, steerability, undo, UI) most directly determine whether an AI agent is safe and useful in enterprise settings?
  3. Why does a portfolio approach to AI feature development matter when capability boundaries and user needs are both shifting quickly?

Key Points

  1. AI product success depends on the harness—permissions, steerability, undoable actions, and UI—not just on model output quality.

  2. Agent behavior must be evaluated as a long-horizon, stateful system where earlier context changes later outcomes.

  3. Notion’s agent approach emphasizes enterprise integration expertise, treating agents like tools that operate safely inside knowledge workspaces.

  4. Adoption improves when users state goals rather than writing detailed instructions, with scaffolding handled automatically.

  5. Feature planning uses a portfolio of efforts: improve what exists, address user pain points, and expand capabilities.

  6. Rapid iteration and faster feedback loops enable major changes to setup and harness design close to launch.

  7. Low-ego engineering—including deleting code and rebuilding infrastructure when assumptions fail—is treated as essential to keep pace with fast-moving model capabilities.

Highlights

Agents are framed as “driving” enterprise knowledge workspaces: state, context, and permissions determine what’s possible next.
The biggest product shift is moving from instruction-heavy interactions to goal-based setup, lowering the barrier to entry.
Demos are easier to build now, but turning them into reliable, valuable systems under real inputs remains the hard part.
When strategies fail, the team sometimes tears down and rebuilds infrastructure rather than patching around broken assumptions.
The harness challenge is to provide just enough guidance to enforce safety and control without nerfing the model’s freedom to explore.

Topics

  • AI Product Building
  • Agent Harness
  • Enterprise Permissions
  • Goal-Based UX
  • Iteration Speed
