Manus AI Agent TESTED | First Impression
Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Manus AI Agent demonstrates end-to-end “agentic” workflows in a browser and a sandbox: it logs into X.com, researches a concept, drafts and ultimately posts a tweet, then builds a research-backed presentation with images, and finally turns an uploaded CSV of API pricing links into a report with generated graphs. The most consequential takeaway is not just that it can answer questions: it can operate tools (a live browser session, a file system, and code execution) to produce publishable outputs.
In the first test, Manus is given an explicit goal: log into X.com, research “vibe coding,” and post a tweet. It connects to a containerized, virtualized browser environment and requests a user takeover of the session to complete the login. After logging into an experimental account without issue, it performs web research across multiple sources, compiles a short factual summary, and drafts tweet text. Rather than posting immediately, Manus presents several options for the user to choose from. When instructed again to post a selected option, it navigates the X interface, pastes the drafted text into the “What’s happening” box, and successfully posts. The result is functional but not flawless: posting required a workaround after earlier behavior suggested built-in safety limits might prevent direct posting.
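Manus's draft-then-confirm behavior boils down to a simple pattern: generate several candidates, surface them to the user, and publish only after an explicit selection. A minimal sketch of that pattern, with hypothetical function names and stubbed-out I/O (the real agent drives a live browser session, not these callbacks):

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Draft:
    text: str

def propose_and_post(drafts: list[Draft],
                     choose: Callable[[list[Draft]], Optional[int]],
                     post: Callable[[str], bool]) -> Optional[str]:
    """Present drafted tweets and publish only the one the user selects.

    Returns the posted text, or None if the user declines every option.
    """
    choice = choose(drafts)          # e.g. an index picked in a chat turn
    if choice is None:
        return None                  # safety default: never post unprompted
    posted = post(drafts[choice].text)
    return drafts[choice].text if posted else None

# Usage with stubbed callbacks standing in for the chat UI and the browser:
drafts = [Draft("Vibe coding: describe what you want, let the model write it."),
          Draft("Tried 'vibe coding' today - prompts in, working app out.")]
result = propose_and_post(drafts, choose=lambda ds: 0, post=lambda t: True)
```

The `choice is None` branch mirrors the safety behavior seen in the test: without a second, explicit instruction, nothing gets published.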
The second test shifts from social posting to document generation. Manus is tasked with creating a presentation on “Anthropic MCP servers,” covering what they are, how to use them, and why they matter, plus images. It spins up a working directory, pulls documentation and reference material from Anthropic’s pages, and writes a set of research notes. From there, it generates a slide deck in MDX format (“Presentation.mdx”), assembling sections such as key components, architecture, tools, resources, prompts, and a conclusion/Q&A. Image handling is the main friction point: it briefly gets stuck searching for images (including an architecture diagram), but the run continues and produces a usable deliverable. The final output is described as “not perfect,” yet coherent and easy to follow, with a visible to-do-style task list tracking progress during creation.
The final test evaluates file-driven analysis. Manus receives an uploaded CSV containing URLs to API pricing pages across providers (including Google, OpenAI, DeepSeek, and Anthropic). It browses the linked pages, extracts token pricing for multiple models, and writes a combined dataset. Some links are missed—DeepSeek pricing doesn’t get captured during the first pass—and OpenAI pricing encounters a Cloudflare verification barrier, limiting access to the newest figures. Even with those gaps, Manus proceeds by generating a Python script, installing missing plotting libraries in the sandbox (e.g., seaborn), and producing graphs and a report with embedded visuals. The deliverable includes input and output token comparisons across selected models (e.g., Claude 3.5 Sonnet, Claude 3.5 Haiku, Gemini 2.0 Flash, GPT-4 Turbo, and GPT-3.5 Turbo), with the report delivered via a generated URL.
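The back end of that pipeline is ordinary dataframe-and-plotting work. A sketch of the kind of script the agent generated, using pandas and matplotlib (the run installed seaborn, but plain matplotlib suffices here); the per-million-token prices below are illustrative placeholders, not figures extracted in the video:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")                 # headless sandbox: render without a display
import matplotlib.pyplot as plt

# Illustrative placeholder prices (USD per 1M tokens) - not verified figures.
rows = [
    ("Claude 3.5 Sonnet",  3.00, 15.00),
    ("Claude 3.5 Haiku",   0.80,  4.00),
    ("Gemini 2.0 Flash",   0.10,  0.40),
    ("GPT-4 Turbo",       10.00, 30.00),
    ("GPT-3.5 Turbo",      0.50,  1.50),
]
df = pd.DataFrame(rows, columns=["model", "input_usd", "output_usd"])

# Grouped bar chart comparing input vs. output token pricing per model.
ax = df.set_index("model")[["input_usd", "output_usd"]].plot.bar(rot=30)
ax.set_ylabel("USD per 1M tokens")
ax.set_title("API pricing comparison (illustrative data)")
plt.tight_layout()
plt.savefig("pricing_comparison.png")  # image embedded into the generated report
```

In the actual run, the rows came from scraped pricing pages rather than a hardcoded list, which is exactly why the Cloudflare block and the missed DeepSeek link left gaps in the chart.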
Across three sessions, Manus looks capable of chaining browsing, research, structured writing, and code-based visualization into practical outputs. The performance is still uneven—access restrictions and image/pricing extraction issues show where agent tooling remains brittle—but the overall workflow readiness is strong enough to feel genuinely useful for iterative, real-world tasks.
Cornell Notes
Manus AI Agent was tested on three practical workflows: social posting, presentation creation, and pricing-data analysis. It successfully logs into X.com inside a containerized browser, researches “vibe coding,” drafts tweet options, and—after a second instruction—posts a selected tweet. It then builds an Anthropic MCP servers presentation by collecting documentation, writing research notes, and generating an MDX slide deck with images (though image search can stall). Finally, it ingests an uploaded CSV of pricing URLs, extracts token pricing for multiple models, and generates graphs and a report via sandboxed Python—despite missing at least one provider link and hitting Cloudflare blocks for OpenAI’s latest pricing. The key value: it can execute multi-step tasks, not just generate text.
How did Manus handle the “vibe coding” task from research to posting on X.com?
What did the Anthropic MCP servers presentation workflow look like end-to-end?
What were the biggest obstacles when extracting pricing data from the uploaded CSV?
How did Manus generate graphs and the final pricing report?
What does the test suggest about Manus’s “agent” maturity?
Review Questions
- In the X.com test, what step caused posting to succeed, and what earlier behavior suggested posting might be restricted?
- Which parts of the MCP presentation workflow were most affected by failures or stalls, and how did Manus recover?
- When pricing extraction hit Cloudflare for OpenAI, what downstream steps still worked to produce graphs and a report?
Key Points
1. Manus can chain multi-step tasks across a live browser session, a sandboxed file system, and code execution to produce publishable outputs.
2. A containerized browser environment enables actions like logging into X.com, navigating pages, and posting content after research and drafting.
3. Manus drafted multiple tweet options after researching “vibe coding,” and posting required an additional instruction to proceed with direct publication.
4. Presentation generation worked by collecting documentation, writing research notes, and producing an MDX slide deck with images, though image search can loop or stall.
5. CSV-driven workflows can extract token pricing from linked pages, but missing links and anti-bot protections (e.g., Cloudflare) can create data gaps.
6. Even with partial data, Manus can generate a Python-based analysis pipeline that produces graphs and embeds them into a final report.
7. The overall performance looks strong for early access, but reliability issues remain around external site access, media retrieval, and safety constraints.