Claude 4.5 Built Slack in 30 Hours Straight—Here's My Take After Testing
Based on AI News & Strategy Daily | Nate B Jones's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Claude Sonnet 4.5 is emerging as a practical “work-builder” model—especially for Excel and PowerPoint—backed by a track record of long, careful coding runs (including a reported 30-hour rebuild of Slack). The significance isn’t just raw capability; it’s the model’s apparent preference for taking time, checking its work, and producing outputs that are usable in real office workflows. That matters because the path to more autonomous software development depends less on demos and more on whether models can reliably operate inside the constraints of specific task environments. Early testing described here suggests the model performs well when prompts are specific and well-structured; vague or poorly framed instructions lead to outputs that are hard to build or even unreadable. In other words, the advantage comes from intent: if users can define the work clearly and place the model in a framework that matches the task, the system becomes “superpowered.”
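The "intent" point above can be made concrete with a small sketch: turning a vague request into a structured, checkable task spec before handing it to a model. This is a hypothetical illustration of the idea, not Anthropic's API; every field name and helper here is invented.

```python
# Hypothetical sketch: converting vague intent into a specific, checkable prompt.
# Field names (goal, inputs, output_format, checks) are illustrative only.

def build_task_prompt(goal: str, inputs: list[str], output_format: str,
                      checks: list[str]) -> str:
    """Assemble a well-framed prompt from explicit intent fields."""
    lines = [
        f"Goal: {goal}",
        "Inputs: " + "; ".join(inputs),
        f"Deliverable: {output_format}",
        "Before finishing, verify: " + "; ".join(checks),
    ]
    return "\n".join(lines)

# A vague instruction like "make me a sales deck" becomes:
specific = build_task_prompt(
    goal="Summarize Q3 sales by region for the exec review",
    inputs=["q3_sales.xlsx (columns: region, rep, revenue)"],
    output_format="a 5-slide PowerPoint outline",
    checks=["every region appears exactly once", "totals match the source file"],
)
print(specific)
```

The point is not the template itself but the shift it forces: the user states the deliverable and the acceptance checks up front, which is exactly the framing the briefing credits for "superpowered" results.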
That same theme—agents and models becoming operational at scale—shows up in Walmart’s deployment of a “super agent” spanning 200+ AI tools, with a reported 95% autofix rate on bugs. The takeaway is less about any single agent and more about orchestration: federated agent workflows can be stitched across complex development ecosystems today, delivering measurable value. The message for builders is direct: agents are already functioning inside large enterprises, and scaling them is primarily a matter of workflow design, intent definition, and orchestration rather than waiting for a future breakthrough.
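The orchestration idea above can be sketched in miniature: a router that dispatches tasks by intent to registered agent handlers, with escalation when no agent fits. This is a toy model of federated agent workflows, not Walmart's system; all names and the fix logic are invented for illustration.

```python
# Hypothetical sketch of agent orchestration: route tasks by intent,
# escalate when no handler matches or the handler cannot resolve the task.

from typing import Callable

class AgentRouter:
    def __init__(self) -> None:
        self.handlers: dict[str, Callable[[dict], dict]] = {}

    def register(self, intent: str, handler: Callable[[dict], dict]) -> None:
        self.handlers[intent] = handler

    def dispatch(self, task: dict) -> dict:
        handler = self.handlers.get(task["intent"])
        if handler is None:
            return {"status": "escalate", "reason": "no agent for intent"}
        return handler(task)

def lint_fix_agent(task: dict) -> dict:
    # Toy autofix: succeeds only for bug classes this agent recognizes.
    known = {"unused-import", "missing-semicolon"}
    fixed = task["bug"] in known
    return {"status": "fixed" if fixed else "escalate", "bug": task["bug"]}

router = AgentRouter()
router.register("autofix", lint_fix_agent)

print(router.dispatch({"intent": "autofix", "bug": "unused-import"}))
print(router.dispatch({"intent": "deploy"}))
```

Even at this scale, the design decision the briefing highlights is visible: value comes from the routing and escalation rules (workflow design), not from any single handler being smarter.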
OpenAI’s product moves reinforce the idea that model readiness and distribution are being treated as a combined strategy. ChatGPT Pulse and Sora are framed as new advertising surfaces, signaling a push toward ads as a major business line—potentially extending beyond consumer use into B2B placements. The launch pattern also hints at how OpenAI will roll out future models: assess readiness, then attach the model to a new surface quickly. Sora’s evolution from earlier “not quite ready” timing to a consumer product launch is presented as evidence of that approach.
AWS is also leaning into agent infrastructure with an “agent core” MCP server, positioning it as open-source runtime/gateway/identity/memory plumbing that helps developers build production-ready agents that can securely call external tools and maintain context across sessions. The strategic angle is cloud economics: open-source can help preserve AWS developer mindshare and revenue by making AWS the default place to build.
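For context on what "securely call external tools" means in MCP terms: MCP clients invoke server-side tools over JSON-RPC 2.0 using a `tools/call` request. The envelope shape below follows the MCP specification; the tool name and arguments are invented examples, and this sketch builds only the message, not a full client.

```python
# Minimal sketch of an MCP-style tool invocation message (JSON-RPC 2.0,
# method "tools/call" per the MCP spec). Tool name/arguments are hypothetical.
import json

def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize a tools/call request an MCP client would send to a server."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

req = mcp_tool_call(1, "search_docs", {"query": "agent runtime"})
print(req)
```

The runtime/gateway/identity/memory layers the briefing mentions sit around messages like this one: the gateway decides which servers a client may reach, and identity decides which tools a given caller is allowed to invoke.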
Microsoft Copilot is moving in the same direction—expanding beyond a single model by enabling Copilot to work with Claude models and supporting a multi-agent enterprise strategy. The underlying concern is distribution: Microsoft wants to keep Copilot as the front door for office productivity, even if that requires integrating competing models. That sets up a new competitive dynamic where customers may negotiate less vendor lock-in as major platforms offer model choice inside familiar interfaces.
Finally, Salesforce’s “Agentforce” push aims to bring natural-language coding into enterprise environments with security governance and compliance controls, connecting agents to Salesforce org data. The pitch is clear—reduce shadow IT and provide a governed “vibe coding” path—but the real test will be adoption: selling to CTOs is easier than persuading product managers, marketers, and CS leaders to switch if the tool doesn’t deliver day-to-day value. Across all these updates, the through-line is operationalization: models and agents are moving from novelty to integrated, governed work systems—if the prompts, workflows, and distribution channels are right.
Cornell Notes
Claude Sonnet 4.5 is highlighted as a work-oriented model that can build and edit Excel files and create PowerPoint decks, with a coding style that favors longer, careful runs and “check your work” behavior. The practical lesson is that results depend heavily on prompt quality and the fit between the model and the task framework—good intent yields usable artifacts, while vague instructions produce outputs that are hard to build or unreadable. Walmart’s deployment of a super agent across 200+ AI tools (with a reported 95% autofix rate) reinforces that agent orchestration is already delivering value in large enterprises. OpenAI’s ChatGPT Pulse and Sora are framed as new advertising surfaces, while AWS’s agent core MCP server and Microsoft Copilot’s multi-model support point to an emerging market where developers can build and choose models inside major platforms. Salesforce’s Agentforce adds enterprise-governed “vibe coding,” with adoption depending on real usefulness beyond security promises.
Why does Claude Sonnet 4.5’s Excel/PowerPoint strength matter more than another general-purpose coding claim?
What’s the key constraint for getting strong results from Sonnet 4.5?
How does Walmart’s “super agent” deployment change the view of AI agents?
What do ChatGPT Pulse and Sora suggest about OpenAI’s go-to-market strategy?
Why is AWS’s agent core MCP server positioned as strategically important?
What competitive shift is implied by Microsoft Copilot working with other models like Claude?
Review Questions
- What specific capabilities of Claude Sonnet 4.5 are emphasized as enabling office-work automation, and why do they matter for autonomous development?
- How do prompt quality and task framing influence the reliability of model outputs in the examples given?
- Which developments suggest a move away from AI vendor lock-in, and what role does distribution inside enterprise interfaces play?
Key Points
1. Claude Sonnet 4.5 is positioned as a work-oriented model for Excel and PowerPoint, not just general coding assistance.
2. Longer, careful runs and a “check your work” behavior are presented as reliability advantages for building real artifacts.
3. Agent success at scale depends on workflow orchestration and clear intent, not only on model quality—Walmart’s 200+ tool deployment is cited as evidence.
4. OpenAI’s ChatGPT Pulse and Sora are framed as new advertising surfaces, indicating a distribution-first strategy tied to model readiness.
5. AWS’s agent core MCP server aims to make AWS the default infrastructure for production AI agents through open-source tooling and broad client integration.
6. Microsoft Copilot’s ability to work with other models (including Claude) signals a shift toward multi-model enterprise experiences that may weaken vendor lock-in.
7. Salesforce’s Agentforce targets enterprise “vibe coding” with governance, but adoption will hinge on whether it delivers day-to-day value beyond security compliance.