Context Engineering & Coding Agents with Cursor
Based on a talk published on OpenAI’s YouTube channel. If you like this content, support the original creators by watching, liking, and subscribing.
Briefing
Cursor’s approach to AI coding hinges on a shift from “autocomplete” to autonomous coding agents—powered less by clever prompting and more by deliberate context engineering. The core claim is that software engineering can move faster when models get the right information at the right time, and when heavy computation (like code retrieval) is pushed offline so runtime stays fast and cheap.
The talk traces that evolution through Cursor’s own product history. Tab began as next-word prediction inspired by GitHub Copilot, then progressed to predicting the next line and ultimately the next action. Tab now handles more than 400 million requests per day, and that scale feeds a feedback loop: accepted suggestions are reinforced, rejected ones are penalized, and the model is updated in near real time using online RL. A key constraint is latency—suggestions slower than about 200 milliseconds disrupt developer flow—so the latest release shows fewer suggestions but with higher confidence.
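To make that feedback loop concrete, here is a minimal sketch, assuming a toy logistic policy over made-up features and a REINFORCE-style per-event update; none of these specifics come from the talk:

```python
import numpy as np

# Toy online feedback loop in the spirit of the talk's description: accepted
# suggestions are reinforced, rejections penalized, updates applied immediately.
# The features, reward scheme, and update rule are illustrative assumptions.

rng = np.random.default_rng(0)
weights = np.zeros(3)  # hypothetical features: [model confidence, edit size, recency]

def show_probability(features: np.ndarray) -> float:
    """Policy: probability of surfacing a suggestion (sigmoid over features)."""
    return 1.0 / (1.0 + np.exp(-features @ weights))

def online_update(features: np.ndarray, accepted: bool, lr: float = 0.05) -> None:
    """REINFORCE-style step: +1 reward for an accept, -1 for a reject,
    applied per event rather than in offline batches."""
    global weights
    reward = 1.0 if accepted else -1.0
    p = show_probability(features)
    weights += lr * reward * (1.0 - p) * features  # gradient of log P(show)

# Simulated stream of suggestion events.
for _ in range(1000):
    feats = rng.random(3)
    if show_probability(feats) >= 0.5:       # surface only confident suggestions
        accepted = rng.random() < feats[0]   # pretend acceptance tracks confidence
        online_update(feats, accepted)
```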
From there, Cursor moves into coding agents. Instead of only generating text, agents can create or update entire code blocks after conversational prompts. Cursor emphasizes adjustable autonomy: early steps included inline diffs that use the current line plus broader file context, followed by Composer for multi-file edits with a more conversational workflow. In 2024, Cursor added a fully autonomous agent that uses more tokens for tool calling and can self-gather context, reducing the need for users to supply everything up front.
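As a rough sketch of what “self-gathering” means mechanically, the loop below lets a policy pick a tool each step (search first, then edit) against a fake two-file repo; the tool names and routing logic are hypothetical stand-ins, not Cursor’s interface:

```python
from dataclasses import dataclass, field

# Toy self-gathering agent loop: the agent retrieves context with tool calls
# before spending tokens on an edit. The tool set, the routing policy, and
# the fake repo are illustrative stand-ins, not Cursor's actual API.

FAKE_REPO = {
    "src/header.tsx": "export const Header = () => <nav>links</nav>;",
    "src/footer.tsx": "export const Footer = () => <footer>legal</footer>;",
}

@dataclass
class State:
    goal: str
    context: list[str] = field(default_factory=list)

def search_code(query: str) -> list[str]:
    """Stand-in retrieval tool: paths whose contents mention the query."""
    return [p for p, src in FAKE_REPO.items() if query in src]

def next_action(state: State) -> tuple[str, str]:
    """Stand-in policy: gather context first, then edit. A real agent lets
    the model pick the next tool call from the conversation so far."""
    if not state.context:
        return ("search", "nav")  # derived from the goal in a real system
    return ("edit", state.context[0])

def run(state: State, max_steps: int = 8) -> None:
    for _ in range(max_steps):
        kind, arg = next_action(state)
        if kind == "search":
            state.context.extend(search_code(arg))
        else:  # "edit": act only once the needed context is in hand
            print(f"editing {arg}: {FAKE_REPO[arg]!r}")
            return

run(State(goal="restyle the top navigation"))
```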
Context engineering becomes the centerpiece. As context windows grow, models still struggle to recall reliably, so the goal is minimal, high-quality context rather than maximal context. Retrieval is treated as fundamental. For codebase search, Cursor compares traditional string search tools (like grep and ripgrep) with semantic search built on embeddings. Semantic search helps the agent find the correct file even when names differ from what the model “expects” (e.g., mapping a request for “top navigation” to header.tsx). Cursor also moved from an off-the-shelf embedding model to a custom one and runs A/B tests; semantic search increased follow-up questions and token usage, but the biggest win is shifting compute and latency to indexing time. The result: faster, cheaper agent responses at runtime without sacrificing acceptance quality.
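That division of labor can be sketched as follows, with a toy trigram counter standing in for a real embedding model (Cursor’s custom model is not public): files are embedded once, offline, and a runtime query reduces to a cheap similarity scan.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: bag of lowercase trigrams. A real system would call an
    embedding model here, once per chunk, at indexing time."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Offline: embed every file once and store the vectors.
repo = {
    "header.tsx": "export const Header = () => <nav className='top'>links</nav>",
    "footer.tsx": "export const Footer = () => <footer>copyright</footer>",
}
index = {path: embed(src) for path, src in repo.items()}

# Runtime: one cheap similarity scan instead of re-reading the codebase.
query = embed("top navigation bar component")
best = max(index, key=lambda p: cosine(query, index[p]))
print(best)  # header.tsx, even though the file name never says "navigation"
```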
The talk then widens from retrieval to agent UX and extensibility. Cursor argues that CLIs are useful but not the end state; agents should be scriptable and available across surfaces—terminal, web, phone, Slack bug reports, or Linear backlog triage. It also highlights specialized agents beyond editing, including Bugbot, an internal tool that reads and reviews code to find logic bugs and reportedly caught issues missed during reviews.
Long-horizon performance depends on planning and research upfront, plus deeper product integration: storing plans, editing files accordingly, and giving agents tools like to-do lists so they don’t lose track or waste tokens. Safety and trust remain central, with a human-in-the-loop model for shell commands via one-time prompts or allow lists.
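A sketch of that human-in-the-loop gate, assuming a simple prefix-based allow list; the matching rule, prompt wording, and persistence are illustrative choices rather than Cursor’s actual behavior:

```python
import shlex
import subprocess

# Hypothetical allow list of pre-approved command prefixes, e.g. shared with a team.
ALLOW_LIST: set[str] = {"git status", "npm test"}

def run_command(cmd: str) -> None:
    """Gate agent-issued shell commands behind a one-time prompt or allow list."""
    prefix = " ".join(shlex.split(cmd)[:2])
    if prefix not in ALLOW_LIST:
        answer = input(f"Agent wants to run '{cmd}'. Allow? [y]es/[a]lways/[n]o: ")
        if answer.lower().startswith("a"):
            ALLOW_LIST.add(prefix)  # trust this command from now on
        elif not answer.lower().startswith("y"):
            print("Command blocked.")
            return
    subprocess.run(cmd, shell=True, check=False)

run_command("git status")  # pre-approved, runs without a prompt
```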
Finally, the vision points beyond today’s interfaces: manage multiple agents in parallel (locally or in cloud sandboxes), explore model “competition” (different reasoning levels or providers), and let agents verify their work by running code and using browser automation. The speaker’s closing outlook frames Cursor’s goal as automating coding by combining model capability, autonomy, and human-computer interaction, freeing engineers from toil so they can focus on hard problems, design, and building what matters.
Cornell Notes
Cursor’s evolution of AI coding centers on context engineering and autonomy: models perform better when they receive intentional, minimal, high-quality context and when retrieval work is done ahead of time. Tab progressed from next-word prediction to next-action suggestions, using large-scale acceptance/rejection data and near-real-time online RL updates. Cursor’s coding agents then expanded from inline diffs and conversational multi-file edits to fully autonomous agents that can self-gather context via tool calling. Semantic search with embeddings (paired with string search) improves how agents retrieve the right code, while indexing shifts compute and latency offline for faster runtime. The product also emphasizes adjustable autonomy, human-in-the-loop safety for shell commands, and longer-horizon planning to raise code quality.
- How did Tab evolve from basic autocomplete to a next-action system, and why does that matter for agent performance?
- What is “context engineering” in this framework, and why isn’t bigger context always better?
- Why does Cursor prefer semantic search (embeddings) alongside string search tools like grep and ripgrep?
- How does Cursor control autonomy so developers stay in charge?
- What enables longer-horizon coding tasks beyond simple prompt changes?
- What does “multiple agents” require, and what trade-offs appear in local vs. cloud execution?
Review Questions
- What specific feedback loop does Tab use to improve next-action predictions, and how quickly are updates applied?
- How does semantic search change the runtime cost profile of coding agents compared with grep and ripgrep alone?
- What product-level mechanisms (beyond prompting) does Cursor use to support longer-horizon agent work and maintain task focus?
Key Points
1. Tab’s next-action model is trained using large-scale acceptance/rejection data and updated in near real time via online RL.
2. Cursor treats context engineering as minimal, high-quality context plus retrieval, because larger context windows can reduce recall accuracy.
3. Semantic search with embeddings improves code retrieval accuracy (e.g., mapping intent to header.tsx) and shifts compute/latency to indexing time for faster runtime.
4. Coding agents are designed with adjustable autonomy, starting from inline diffs and conversational multi-file edits and progressing to fully autonomous tool-using agents.
5. Trust and safety are enforced through human-in-the-loop controls for shell commands, including one-time prompts and allow lists that can be shared with teams.
6. Long-horizon tasks improve when planning, to-do management, and deeper product integration are built into the agent workflow.
7. Multiple-agent execution requires careful isolation, locally via git worktrees and in the cloud via sandbox VMs, each with distinct setup and latency trade-offs.
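For the local side of point 7, a minimal sketch using git worktrees might look like this; the branch names, directory layout, and tasks are hypothetical, and a cloud setup would replace the worktrees with per-agent sandbox VMs:

```python
import subprocess

def spawn_agent_worktree(branch: str) -> str:
    """Give each agent its own working copy so parallel edits cannot collide."""
    path = f"../wt-{branch}"
    # `git worktree add -b <branch> <path>` creates a new branch checked out
    # in a sibling directory, sharing the same underlying repository.
    subprocess.run(["git", "worktree", "add", "-b", branch, path], check=True)
    return path

for i, task in enumerate(["fix flaky test", "update header styles"]):
    path = spawn_agent_worktree(branch=f"agent-task-{i}")
    print(f"agent {i} works on '{task}' in {path}")
```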