
Manus AI Agent TESTED | First Impression

All About AI · 5 min read

Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Manus can chain multi-step tasks across a live browser session, a sandboxed file system, and code execution to produce publishable outputs.

Briefing

Manus AI Agent demonstrates that it can run end-to-end “agentic” workflows in a browser and a sandbox: it logs into X.com, researches a concept, drafts and ultimately posts a tweet, then builds a research-backed presentation with images, and finally turns an uploaded CSV of API pricing links into a report with generated graphs. The most consequential takeaway is not just that it can answer questions; it can operate tools (a live browser session, a file system, and code execution) to produce publishable outputs.

In the first test, Manus is given an explicit goal: log into x.com, research “vibe coding,” and post a tweet. It connects to a containerized/virtualized browser environment and asks for takeover to proceed. After logging into an experimental account without issue, it performs web research across multiple sources, compiles a short factual summary, and drafts tweet text. Manus then presents options for the user to choose from rather than posting immediately. When instructed again to post a selected option, it navigates the X interface, pastes the drafted text into the “What’s happening” box, and successfully posts. The result is functional but not flawless—posting required a workaround after earlier behavior suggested it might avoid direct posting due to built-in safety limits.
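
The video doesn't reveal how Manus drives the browser internally. As a rough illustration of the kind of "log in, compose, post" automation involved, a minimal sketch using Playwright (an assumption for illustration, not Manus's actual tooling) might look like the following; the URLs and selectors are hypothetical and will differ from X.com's real interface.

```python
# Illustrative only: Manus's browser tooling is not public.
# This sketch shows the general shape of a scripted "log in, compose, post" flow
# using Playwright's sync API. All selectors below are assumptions.
from playwright.sync_api import sync_playwright

def post_tweet(username: str, password: str, text: str) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto("https://x.com/login")
        page.fill("input[name='text']", username)         # assumed username field
        page.keyboard.press("Enter")
        page.fill("input[name='password']", password)      # assumed password field
        page.keyboard.press("Enter")
        page.goto("https://x.com/compose/post")
        page.fill("div[role='textbox']", text)              # "What's happening" box (assumed)
        page.click("button[data-testid='tweetButton']")     # assumed post button
        browser.close()
```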

The second test shifts from social posting to document generation. Manus is tasked with creating a presentation on “Anthropic MCP servers,” including what they are, how to use them, and why they matter, plus images. It spins up a working directory, pulls documentation and reference material from Anthropic’s pages, and writes a set of research notes. From there, it generates a slide deck in MDX format (“Presentation.mdx”), assembling content sections such as key components, architecture, tool resources, prompts, and a conclusion/Q&A structure. Image handling is the main friction point: it briefly gets stuck searching for images (including an architecture diagram), but the run continues and produces a usable deliverable. The final output is described as “not perfect,” yet coherent and easy to follow, with a clear to-do-like progression during creation.

The final test evaluates file-driven analysis. Manus receives an uploaded CSV containing URLs to API pricing pages across providers (including Google, OpenAI, DeepSeek, and Anthropic). It browses the linked pages, extracts token pricing for multiple models, and writes a combined dataset. Some links are missed—DeepSeek pricing doesn’t get captured during the first pass—and OpenAI pricing encounters a Cloudflare verification barrier, limiting access to the newest figures. Even with those gaps, Manus proceeds by generating a Python script, installing missing plotting libraries in the sandbox (e.g., seaborn), and producing graphs and a report with embedded visuals. The deliverable includes input and output token comparisons across selected models (e.g., Claude 3.5 Sonnet, Claude 3.5 Haiku, Gemini 2.0 Flash, GPT-4 Turbo, and GPT-3.5 Turbo), with the report delivered via a generated URL.
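
The generated script itself isn't shown in the video. A minimal sketch of such a CSV-driven extraction pass, assuming a CSV with provider and url columns and naive regex price matching (both assumptions), could look like this; blocked pages such as the Cloudflare-protected OpenAI page would simply be skipped and leave a gap in the dataset.

```python
# Sketch of the CSV-driven extraction step described above.
# Assumptions: the CSV has "provider" and "url" columns, and dollar prices can be
# pulled with a naive regex; unreachable pages (e.g. Cloudflare blocks) are skipped.
import csv
import re
import requests

def collect_pricing(csv_path: str) -> list[dict]:
    rows = []
    with open(csv_path, newline="") as f:
        for entry in csv.DictReader(f):
            try:
                resp = requests.get(entry["url"], timeout=15)
                resp.raise_for_status()
            except requests.RequestException:
                # e.g. Cloudflare verification or a dead link: note the gap and move on
                print(f"skipped {entry['provider']}: page not reachable")
                continue
            prices = re.findall(r"\$([0-9]+\.[0-9]+)", resp.text)
            rows.append({"provider": entry["provider"], "prices": prices})
    return rows
```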

Across three sessions, Manus looks capable of chaining browsing, research, structured writing, and code-based visualization into practical outputs. The performance is still uneven—access restrictions and image/pricing extraction issues show where agent tooling remains brittle—but the overall workflow readiness is strong enough to feel genuinely useful for iterative, real-world tasks.

Cornell Notes

Manus AI Agent was tested on three practical workflows: social posting, presentation creation, and pricing-data analysis. It successfully logs into X.com inside a containerized browser, researches “vibe coding,” drafts tweet options, and—after a second instruction—posts a selected tweet. It then builds an Anthropic MCP servers presentation by collecting documentation, writing research notes, and generating an MDX slide deck with images (though image search can stall). Finally, it ingests an uploaded CSV of pricing URLs, extracts token pricing for multiple models, and generates graphs and a report via sandboxed Python—despite missing at least one provider link and hitting Cloudflare blocks for OpenAI’s latest pricing. The key value: it can execute multi-step tasks, not just generate text.

How did Manus handle the “vibe coding” task from research to posting on X.com?

Manus opened a containerized browser session and requested takeover to proceed. After logging into an experimental X account, it researched “vibe coding” by visiting sources in a live browsing flow, then compiled factual notes (including attribution to Andrej Karpathy in early 2025). It drafted tweet text and presented multiple options for selection rather than posting immediately. When instructed to post a chosen option, it navigated back to the X “What’s happening” composer, pasted the drafted tweet, and posted it successfully—though the tester noted it wasn’t “super effective” on the first attempt and seemed to avoid direct posting unless explicitly pushed again.

What did the Anthropic MCP servers presentation workflow look like end-to-end?

Manus created a working directory (named for the MCP presentation), then performed research by visiting Anthropic documentation pages related to Model Context Protocol (MCP). It produced a large research-notes file, generated an outline, and moved into slide creation. It attempted to gather images (including an MCP architecture diagram) and stored them in an images directory, but it briefly got stuck searching for images. The run continued and generated “Presentation.mdx” in MDX format, embedding images/logos from the saved directories and producing sections like “What are servers,” “Key components,” “Architecture,” “Tool resources,” “Prompts,” and a conclusion/Q&A structure.
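
The video shows the finished Presentation.mdx rather than the code that assembled it. A hypothetical sketch of stitching an MDX deck together from research notes and a saved images directory might look like the following; the file names, paths, and the exact slide titles used here are assumptions for illustration.

```python
# Hypothetical sketch: assemble "Presentation.mdx" from research notes and saved images.
# Section titles mirror those described in the run; paths and structure are assumptions.
from pathlib import Path

SECTIONS = [
    "What are MCP servers",
    "Key components",
    "Architecture",
    "Tool resources",
    "Prompts",
    "Conclusion / Q&A",
]

def build_deck(notes_dir: str = "research_notes", images_dir: str = "images") -> None:
    slides = ["# Anthropic MCP Servers\n"]
    for title in SECTIONS:
        slides.append(f"---\n\n## {title}\n\n<!-- content drawn from {notes_dir} -->\n")
    # Embed any collected images (e.g. the architecture diagram) on closing slides.
    for img in sorted(Path(images_dir).glob("*.png")):
        slides.append(f"---\n\n![{img.stem}]({img})\n")
    Path("Presentation.mdx").write_text("\n".join(slides))
```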

What were the biggest obstacles when extracting pricing data from the uploaded CSV?

Manus browsed the pricing URLs listed in the CSV and extracted token pricing into a combined dataset, but it missed at least one DeepSeek entry during the run. OpenAI pricing also ran into a Cloudflare verification/captcha issue (“application error” / client-side verification), preventing access to the newest figures. Despite these gaps, it still gathered pricing for other providers (including Anthropic/Claude and Gemini) and proceeded to generate a report using the partial dataset.

How did Manus generate graphs and the final pricing report?

After collecting pricing data, Manus generated and executed a Python script inside the sandbox to produce visualizations. When required plotting libraries weren’t available, it installed dependencies (the run mentions installing seaborn). It then created an MDX report that embedded generated images using HTML, and delivered the result via a link for viewing. The graphs compared input and output token costs across selected models (e.g., Claude 3.5 Sonnet/Haiku, Gemini 2.0 Flash, GPT-4 Turbo, GPT-3.5 Turbo).
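
The exact plotting script isn't shown in the video. A minimal sketch of the kind of seaborn chart described, assuming the combined dataset was saved as combined_pricing.csv with model, direction, and usd_per_million_tokens columns (all assumed names), could be:

```python
# Sketch of the plotting step described above: a grouped bar chart comparing
# input vs. output token prices per model, saved as an image for embedding in the report.
# Assumes a combined dataset "combined_pricing.csv" with columns
# model, direction ("input"/"output"), and usd_per_million_tokens.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("combined_pricing.csv")

plt.figure(figsize=(10, 5))
sns.barplot(data=df, x="model", y="usd_per_million_tokens", hue="direction")
plt.xticks(rotation=30, ha="right")
plt.title("API token pricing by model")
plt.tight_layout()
plt.savefig("token_price_comparison.png")
```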

What does the test suggest about Manus’s “agent” maturity?

The workflow chaining is strong: browser automation, research synthesis, structured document generation, and code-driven charting all worked in sequence. However, reliability depends on external constraints—social posting may be gated by safety behavior, image retrieval can stall, and pricing extraction can fail due to missing links or anti-bot protections like Cloudflare. The overall impression is promising but still early, with incremental improvements needed for robustness.

Review Questions

  1. In the X.com test, what step caused posting to succeed, and what earlier behavior suggested posting might be restricted?
  2. Which parts of the MCP presentation workflow were most affected by failures or stalls, and how did Manus recover?
  3. When pricing extraction hit Cloudflare for OpenAI, what downstream steps still worked to produce graphs and a report?

Key Points

  1. Manus can chain multi-step tasks across a live browser session, a sandboxed file system, and code execution to produce publishable outputs.
  2. A containerized browser environment enables actions like logging into X.com, navigating pages, and posting content after research and drafting.
  3. Manus drafted multiple tweet options after researching “vibe coding,” and posting required an additional instruction to proceed with direct publication.
  4. Presentation generation worked by collecting documentation, writing research notes, and producing an MDX slide deck with images—though image search can loop or stall.
  5. CSV-driven workflows can extract token pricing from linked pages, but missing links and anti-bot protections (e.g., Cloudflare) can create data gaps.
  6. Even with partial data, Manus can generate a Python-based analysis pipeline that produces graphs and embeds them into a final report.
  7. The overall performance looks strong for early access, but reliability issues remain around external site access, media retrieval, and safety constraints.

Highlights

Manus researched “vibe coding,” drafted tweet options, and—after a second push—successfully posted the selected tweet on X.com from within an automated browser session.
The Anthropic MCP servers presentation was generated as an MDX deliverable (“Presentation.mdx”), built from documentation research plus an image directory workflow.
Pricing analysis turned an uploaded CSV of provider links into a combined dataset and generated input/output token comparison graphs via sandboxed Python.
External restrictions mattered: DeepSeek pricing was missed, and OpenAI pricing was blocked by Cloudflare verification, forcing the report to rely on partial data.
