
Build Anything with Claude, Here's How

David Ondrej · 6 min read

Based on David Ondrej's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Sonnet 4.5 is framed as a major coding upgrade: five times cheaper than Opus 4.1 while reportedly outperforming it on many benchmarks, including SWE-bench Verified results.

Briefing

Anthropic’s Sonnet 4.5 and the expanding Claude Code toolchain are being positioned as a practical jump in software engineering: cheaper, faster, and better at verified coding, while new agent features (Claude Code 2.0, context editing, and browser control) push Claude from “chat” toward hands-on automation. The pitch is straightforward: Sonnet 4.5 is framed as a top-tier coding model that costs about one-fifth as much as Opus 4.1 while beating it on nearly every benchmark, including SWE-bench Verified software engineering results, with and without reasoning.

Claude Code 2.0 is presented as the biggest upgrade to the agent workflow, adding checkpoints (rewind to a prior conversation state), a revamped terminal interface with a mascot, context editing, and an IDE extension. Those changes matter because they address two recurring failure modes in agent coding: losing track of earlier decisions and running out of usable context. The transcript introduces “context anxiety,” a behavior tied to the model’s awareness of its context window: when Claude nears the limit, it may summarize progress to save tokens but also rush toward completion, potentially taking shortcuts. Context editing is described as a mechanism that compresses older, less relevant tool outputs and conversation history so the model can keep more room for what matters most.
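
The checkpoint feature described above can be pictured as snapshots of the conversation history that the session can later restore. This is a hypothetical sketch of that idea, not the actual Claude Code implementation; the class and method names are illustrative.

```python
from copy import deepcopy

class CheckpointedSession:
    """Illustrative model of checkpoint/rewind for an agent conversation.
    Names and structure are assumptions, not Claude Code internals."""

    def __init__(self):
        self.messages = []      # running conversation history
        self.checkpoints = {}   # checkpoint name -> snapshot of history

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})

    def checkpoint(self, name):
        # Snapshot the full history so we can rewind to this exact state.
        self.checkpoints[name] = deepcopy(self.messages)

    def rewind(self, name):
        # Restore the snapshot, discarding everything added after it.
        self.messages = deepcopy(self.checkpoints[name])

session = CheckpointedSession()
session.add("user", "Build the login page")
session.checkpoint("before-refactor")
session.add("assistant", "Refactored auth module (buggy)")
session.rewind("before-refactor")
print(len(session.messages))  # → 1
```

The point of the sketch is the failure mode it prevents: instead of asking the model to "undo" a bad change through more conversation, the session simply returns to a known-good state.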

A major demonstration shows Sonnet 4.5 generating a working JavaScript app from a single prompt, with no corrections and no errors, then iterating on physics behaviors like repulsion. The same theme continues with “Claude Imagine,” described as a new way to build software from pre-built components that generate UI procedurally, like an interactive mini operating system. Presets generate interfaces in real time as the user explores, though the transcript flags a downside: the output may be hard to reproduce as stable, deterministic code.

The most consequential shift for day-to-day productivity is “browser use” via an official Claude browser extension. The transcript claims this is a pilot limited to 1,000 users on the Max plan and demonstrates Claude taking control of a browser window: reading page content, taking screenshots, locating buttons via the DOM, and clicking through an email inbox demo (including archiving specific messages). It then moves to a Gmail scenario with custom instructions (archive declined Google Meet emails), emphasizing that the agent still needs clear prompts and occasional handholding—yet it can execute multi-step actions reliably.

For developers, the transcript highlights an “agent SDK” behind Claude Code: a feedback loop that gathers context, takes actions with tools, verifies results, and repeats. Concrete guidance is given for building reliable agents: organize context in files (often Markdown/text) rather than dumping everything into the context window, load only relevant sections to avoid context overload, use purpose-built tools instead of random ones, and add verification steps (tests, screenshots, DOM checks, database read-backs) so automation can be trusted in production.
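
The gather→act→verify→repeat loop described above can be sketched in a few lines. This is an illustrative skeleton under assumed function names (`gather_context`, `act`, `verify` are placeholders), not the real agent SDK API.

```python
def run_agent(task, gather_context, act, verify, max_iters=5):
    """Hypothetical sketch of the feedback loop the transcript attributes
    to the agent SDK; all callables here are illustrative placeholders."""
    for _ in range(max_iters):
        context = gather_context(task)   # load only the relevant files/sections
        result = act(task, context)      # take an action with a purpose-built tool
        ok, feedback = verify(result)    # tests, screenshots, DB read-backs
        if ok:
            return result
        # Feed the verifier's findings back in, rather than retrying blindly.
        task = f"{task}\nVerifier feedback: {feedback}"
    raise RuntimeError("verification never passed")

# Toy usage: the "action" succeeds only after a few iterations.
state = {"n": 0}
def gather(_task): return {}
def toy_act(_task, _ctx):
    state["n"] += 1
    return state["n"]
def toy_verify(result): return (result >= 3, "needs more iterations")

print(run_agent("demo", gather, toy_act, toy_verify))  # → 3
```

The structural takeaway matches the transcript's guidance: verification is inside the loop, so a failed check produces another attempt with feedback instead of a silently wrong result.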

Finally, the transcript argues for a workflow that combines Claude’s strengths with Codex’s strengths. Claude Code is framed as faster and more pleasant for targeted, safe changes and UI iteration, while Codex (notably “Codex High,” GPT-5-Codex at a high reasoning setting) is framed as more thorough for risky refactors and deep bug fixes. The practical takeaway: use both, understand what each does well, and keep improving as a software engineer rather than relying on “vibe coding.” The transcript ends with a hands-on project, optimizing time to first token via preloading based on typing cadence, built using Sonnet 4.5 and Claude’s coding tools to prove the concept.
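
The closing project is only described at a high level, but one plausible reading of "preloading based on typing cadence" is: when the gap since the last keystroke exceeds a threshold, assume the user has paused and pre-warm a request with the current draft. The threshold, class, and callback below are assumptions for illustration, not the video's exact design.

```python
import time

class CadencePreloader:
    """Illustrative sketch: fire a preload when the user pauses typing.
    Threshold and callback are assumptions, not the video's design."""

    def __init__(self, pause_threshold_s=0.8, preload=lambda draft: None):
        self.pause_threshold_s = pause_threshold_s
        self.preload = preload          # e.g. warm the model with the draft
        self.last_keystroke = None
        self.draft = ""

    def on_keystroke(self, char, now=None):
        self.last_keystroke = time.monotonic() if now is None else now
        self.draft += char

    def tick(self, now=None):
        # Call periodically; preloads once the typing pause is long enough.
        now = time.monotonic() if now is None else now
        if self.last_keystroke and now - self.last_keystroke >= self.pause_threshold_s:
            self.preload(self.draft)
            self.last_keystroke = None  # fire at most once per pause

fired = []
p = CadencePreloader(preload=fired.append)
p.on_keystroke("h", now=0.0)
p.on_keystroke("i", now=0.1)
p.tick(now=0.5)   # 0.4s pause, too short → no preload
p.tick(now=1.0)   # 0.9s pause → preload fires with the draft
print(fired)      # → ['hi']
```

The win comes from overlapping model warm-up with the user's natural pauses, so the first token arrives sooner once they actually submit.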

Cornell Notes

Sonnet 4.5 is presented as a major coding upgrade: it’s described as five times cheaper than Opus 4.1 while outperforming it on many benchmarks, including SWE-bench Verified results. Claude Code 2.0 adds agent capabilities like checkpoints, context editing, and a revamped terminal experience, aimed at making long-running coding sessions more reliable. A key new concept is “context anxiety,” where the model’s awareness of remaining context can cause it to rush when the window is nearly full; context editing mitigates this by compressing older, less relevant history. The transcript also highlights browser-use automation via an official Claude extension, plus an agent SDK built around a context→action→verification loop. The overall message: combine Claude’s agent tooling with disciplined software engineering and verification to move from demos to dependable software.

Why does Sonnet 4.5’s cost/performance claim matter for building software with AI agents?

The transcript ties Sonnet 4.5’s value to both economics and capability: it’s described as “five times cheaper than Opus 4.1” while beating Opus 4.1 on nearly every benchmark. That combination is important because agent workflows (tool use, browser automation, iterative coding) can consume many calls; lower per-call cost makes it more feasible to run longer sessions and more verification steps without hitting budget limits as quickly. The transcript also cites SWE-bench Verified software engineering performance as a concrete benchmark where Sonnet 4.5 is said to be best-in-class both with reasoning disabled and enabled.
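
A quick bit of arithmetic shows why the 5x price gap compounds across an agent session. The per-token price and session sizes below are illustrative assumptions, not figures from the transcript; only the 5x ratio is from the source.

```python
# Illustrative arithmetic only: the per-million-token price and session
# sizes are assumptions; the 5x ratio is the transcript's claim.
opus_price_per_mtok = 15.0                         # hypothetical $/M input tokens
sonnet_price_per_mtok = opus_price_per_mtok / 5    # "five times cheaper"

calls_per_session = 200      # agent loops can make many model calls
tokens_per_call = 20_000     # context-heavy calls add up fast

def session_cost(price_per_mtok):
    return calls_per_session * tokens_per_call * price_per_mtok / 1_000_000

print(f"Opus-class:   ${session_cost(opus_price_per_mtok):.2f}")    # → $60.00
print(f"Sonnet-class: ${session_cost(sonnet_price_per_mtok):.2f}")  # → $12.00
```

At agent scale the difference is not a rounding error: the same budget buys roughly five times as many verification-heavy iterations.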

What is “context anxiety,” and how does context editing reduce its impact?

“Context anxiety” is described as a tendency to rush tasks as the model approaches its context window limit. Since Sonnet 4.5 is framed as aware of how many tokens it has used and how much is left, it may summarize progress to save tokens—but that can also lead to shortcuts if it thinks only ~15% of the context remains. Context editing is then explained as a compaction step: older tool outputs and early conversation content are compressed into essentials, while the most recent messages remain fully available, increasing usable context for the current task.
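
The compaction behavior described above can be sketched as a pure function over the message history: keep recent messages verbatim, squeeze older bulky tool outputs into summaries. This is a hypothetical illustration of the idea; the function names, message shape, and threshold are assumptions, not Anthropic's implementation.

```python
def compact_history(messages, max_tokens, count_tokens, summarize, keep_recent=5):
    """Illustrative sketch of context editing: compress older tool outputs
    while keeping the most recent messages fully available. All parameters
    are assumptions, not the real API."""
    total = sum(count_tokens(m) for m in messages)
    if total <= max_tokens:
        return messages                      # plenty of room; no edit needed
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    compacted = [
        {"role": m["role"], "content": summarize(m["content"])}
        if m["role"] == "tool" else m        # only squeeze bulky tool outputs
        for m in old
    ]
    return compacted + recent

# Toy usage: three 100-char tool outputs plus one recent user message.
msgs = [{"role": "tool", "content": "x" * 100} for _ in range(3)]
msgs.append({"role": "user", "content": "hi"})
out = compact_history(
    msgs, max_tokens=50,
    count_tokens=lambda m: len(m["content"]),  # crude 1-char-per-token stand-in
    summarize=lambda c: c[:10] + "…",
    keep_recent=1,
)
print([len(m["content"]) for m in out])  # → [11, 11, 11, 2]
```

The key property is that the newest context, where the model is actively working, is never degraded; only stale tool output pays the compression cost.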

How does Claude Imagine differ from typical “vibe coding” workflows?

Claude Imagine is described as generating an app procedurally using pre-built components, so the interface appears as the user explores rather than waiting for a large block of code to be written at once. The transcript frames it like a mini operating system: components are added one by one, and the user can interact with what’s already generated while the rest continues to build. The tradeoff noted is reproducibility—because it’s generating tokens for everything, the output may not translate into stable, deterministic software you can easily version and maintain.

What makes the official Claude browser extension a step toward real automation?

The transcript’s demo emphasizes agent control of a real browser: Claude requests permission to read page content, takes screenshots, then uses the HTML DOM to find exact buttons and click them. In the Gmail example, it filters messages matching a criterion (“declined Google Meet calls”), selects relevant items, and archives them—while still requiring clear instructions and sometimes stopping when the user intervenes (e.g., “maybe not all”). The pilot limitation (1,000 users on Max) is also highlighted, implying the feature is early but designed for practical workflows like inbox triage.
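
The DOM-location step in the demo (find the exact button before clicking it) can be illustrated with a toy parser. A real browser agent drives a live DOM and then issues a click; this stdlib sketch only shows the matching half, and the HTML, labels, and attributes are invented for illustration.

```python
from html.parser import HTMLParser

class ButtonFinder(HTMLParser):
    """Toy illustration of locating an actionable element by its accessible
    label, as in the inbox demo. Not how the Claude extension works
    internally; it only parses static HTML."""

    def __init__(self, label):
        super().__init__()
        self.label = label
        self.matches = []   # attribute dicts of matching buttons

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            attrs = dict(attrs)
            if attrs.get("aria-label") == self.label:
                self.matches.append(attrs)

# Hypothetical toolbar markup standing in for a mail client's DOM.
html = """
<div class="toolbar">
  <button aria-label="Archive" data-id="42">A</button>
  <button aria-label="Delete" data-id="43">D</button>
</div>
"""
finder = ButtonFinder("Archive")
finder.feed(html)
print(finder.matches[0]["data-id"])  # → 42
```

Matching on structure and labels rather than pixel positions is what makes multi-step actions like "archive these specific messages" repeatable.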

What does the agent SDK’s feedback loop add beyond “generate code” alone?

The transcript describes a four-step loop: gather context, take action with tools, verify the work, and repeat. Verification is the key differentiator for production reliability: after code changes, the agent can run tests; after UI actions, it can take screenshots and inspect the DOM; after database writes, it can perform read queries to confirm the expected row exists. It also stresses tool discipline—use purpose-built tools for the task rather than unrelated tools that distract the agent.
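
The database read-back check mentioned above is the easiest of these verifications to show concretely. Here is a minimal sketch using an in-memory SQLite database; the table and helper names are invented for illustration.

```python
import sqlite3

# Sketch of "read back after write": don't trust that the write succeeded,
# run a read query and confirm the expected row exists.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

def create_user(email):
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    conn.commit()

def verify_user_exists(email):
    # The verification step of the loop: confirm the write actually landed.
    row = conn.execute(
        "SELECT id FROM users WHERE email = ?", (email,)
    ).fetchone()
    return row is not None

create_user("ada@example.com")
assert verify_user_exists("ada@example.com"), "write was not persisted"
print("verified")  # → verified
```

The same pattern generalizes to the other checks the transcript lists: run the tests after a code change, take a screenshot or inspect the DOM after a UI action, and treat a failed check as a signal to loop again rather than ship.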

Why does the transcript recommend using both Claude Code and Codex instead of choosing one?

Claude Code is portrayed as faster and more pleasant for targeted, safe changes, especially quick UI tweaks, because it iterates quickly even when that means doing “the dumbest thing” first. Codex (especially Codex High) is portrayed as more thorough and less error-prone for risky work like deep refactors or hard bugs, but sometimes it overthinks and takes longer. The suggested workflow is to use each where it fits: Claude Code for fast iteration and Codex for higher-stakes correctness, then cross-check by having one model evaluate the other’s plan.

Review Questions

  1. How does awareness of the context window lead to “context anxiety,” and what mechanism in Claude Code 2.0 is meant to counter it?
  2. In the agent SDK loop (context→action→verification→repeat), which verification methods are mentioned for code, UI, and databases?
  3. What criteria does the transcript use to decide when to prefer Claude Code versus Codex High for a given programming task?

Key Points

  1. Sonnet 4.5 is framed as a major coding upgrade: five times cheaper than Opus 4.1 while reportedly outperforming it on many benchmarks, including SWE-bench Verified results.

  2. Claude Code 2.0 adds checkpoints, context editing, and a revamped terminal experience to make agent sessions more controllable and less prone to context loss.

  3. “Context anxiety” describes rushing behavior near the context limit; context editing mitigates it by compressing older, less relevant history into essentials.

  4. The official Claude browser extension demonstrates agent control via DOM inspection and click automation, with Gmail-style inbox actions as a practical use case.

  5. The agent SDK emphasizes a production-grade loop: gather context, take tool-based actions, verify outcomes (tests/screenshots/DB reads), then iterate.

  6. A recommended workflow uses both Claude Code and Codex: Claude Code for fast, safe iteration and UI work; Codex High for risky refactors and deep bug fixes.

  7. The transcript argues that long-term leverage comes from improving software engineering fundamentals (terminal, git, architecture) rather than relying solely on AI “vibe coding.”

Highlights

Sonnet 4.5 is positioned as both cheaper and stronger than Opus 4.1, with SWE-bench Verified performance cited as a standout proof point.
Context editing is presented as a direct countermeasure to “context anxiety,” compressing older history so the model can focus on the latest task details.
The browser-use extension is demonstrated taking control of a real inbox: reading the page, locating buttons through the DOM, and executing multi-step actions.
The agent SDK’s verification step is treated as the difference between demos and deployable automation.
The transcript’s practical coding workflow pairs Claude Code’s speed with Codex High’s thoroughness, depending on risk level.

Topics

  • Claude Sonnet 4.5
  • Claude Code 2.0
  • Context Editing
  • Browser Use Extension
  • Agent SDK Feedback Loop
