Project Mariner (Google AI Agent) - First 5 Tests and Impression
Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Google’s Project Mariner browser agent delivers a mixed but promising first impression: it can reliably navigate the web, complete straightforward search-and-find tasks, and even execute simple code via an online Python runner—while running into hard limits around actions like sending emails and some interactive chat flows.
In the first test, Mariner successfully searched YouTube for the creator’s “Google Flow all about AI” video and returned a concrete metric: 25,000 views. The workflow looked like a typical agent loop—prepare a session, browse to the relevant site, run the query, then confirm completion—ending with an explicit “task complete” status and the view count displayed.
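Mariner’s internals aren’t shown in the video, but the observable workflow maps onto a standard observe-act-confirm agent loop. A minimal, self-contained sketch of that loop, where every function name is a hypothetical stand-in and the browser is simulated with stubs:

```python
# Illustrative sketch only: Mariner's internals are not public. Every name
# here is a hypothetical stand-in, and the "browser" is simulated with stubs
# so the loop structure (prepare, browse, query, confirm) stands out.

def prepare_session() -> dict:
    """Stand-in for opening a fresh browser context."""
    return {"history": []}

def browse(session: dict, url: str) -> dict:
    """Stand-in for navigating to the relevant site."""
    session["history"].append(url)
    return {"url": url}

def run_query(page: dict, query: str) -> str:
    """Stand-in for running the query and reading the result off the page."""
    return "25,000 views"  # the concrete metric the first test returned

def run_agent_task(url: str, query: str) -> str:
    session = prepare_session()
    page = browse(session, url)
    result = run_query(page, query)
    # A real agent would keep acting until a completion check passes; the
    # video shows an explicit "task complete" status at this point.
    return f"task complete: {result}"

print(run_agent_task("https://www.youtube.com", "Google Flow all about AI"))
```

The stubbed run_query hard-codes the view count only to keep the sketch runnable; the real agent reads it from the live page.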
The second test targeted Gmail automation. Mariner could gather information from the web about Anthropic’s latest Claude livestream events, but it stalled when asked to send an email. Even after the user manually logged into Gmail via the takeover feature, the agent refused to complete the send step, returning a “cannot do that” style refusal. The result was a partial win: it collected the livestream details, but it couldn’t perform the final outbound action.
The third test focused on DeepMind’s diffusion model page and joining a waitlist. Mariner found the correct page, clicked through to the “join the waitlist” area, and handled cookie acceptance. It then reached a sign-in-gated form and successfully updated a field, changing a “profession” value to “engineer.” That demonstrated the agent’s ability to interact with multi-step web forms, though the overall experience was described as “far from perfect,” suggesting friction or brittleness in real-world flows.
The fourth test was the most impressive: Mariner was asked to find a way to test a small Python snippet online. It identified W3Schools as a place to run code, initially crashed, then recovered after retry instructions. On the next attempt, it ran the code and produced the expected output: “The sum of seven and five is 12.” The agent required additional guidance (like excluding comments), but it still managed to locate an execution environment and complete the task end-to-end.
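The exact snippet isn’t shown on screen, so the code below is a plausible reconstruction that matches the reported output rather than the video’s verbatim program:

```python
# Hypothetical reconstruction: the video confirms only the printed output,
# not the code itself.
a = 7
b = 5
print(f"The sum of seven and five is {a + b}")
# -> The sum of seven and five is 12
```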
The final test attempted to converse with ChatGPT about the future of software engineers. Mariner struggled with the interaction layer: it navigated toward ChatGPT-related pages, but responses didn’t load, and an “internal error has occurred” message appeared when trying to run the prompt. The user ultimately abandoned the chat portion.
Overall, Mariner’s early strengths center on web navigation, search, and form completion, plus the ability to execute simple code through external tools. Its early weaknesses show up when the task requires privileged actions (like sending emails) or reliable interactive chat behavior. The takeaway is clear: the agent can be useful for research and browsing tasks, but it still needs guardrails and better reliability for action-heavy and conversational workflows.
Cornell Notes
Project Mariner performed well on web navigation tasks and delivered a clear win on code execution. It found a YouTube video and reported 25,000 views, then located DeepMind’s diffusion model waitlist page, accepted cookies, and updated a sign-in form field to “engineer.” The agent also searched for an online Python runner, recovered after a crash, and successfully executed a simple script to produce the result 12. Email sending failed due to an apparent action blocker, and the attempt to hold a live conversation with ChatGPT ran into loading and internal errors. The pattern suggests strong browsing and form skills, with limitations on privileged actions and interactive chat reliability.
What was the most reliable early capability demonstrated by Project Mariner?
Why did the Gmail test end without a successful email being sent?
How did Mariner handle the DeepMind waitlist signup flow?
What made the Python code execution test stand out?
What went wrong when trying to converse with ChatGPT?
Review Questions
- Which tasks did Mariner complete end-to-end successfully, and which steps failed due to blockers or errors?
- What evidence suggests Mariner can recover from failures during web-based code execution?
- How do the Gmail and ChatGPT failures differ in nature (permissions vs. interaction reliability)?
Key Points
1. Project Mariner successfully completed a YouTube search and returned a concrete view count of 25,000 for “Google Flow all about AI.”
2. It could gather information about Anthropic’s Claude livestreams but failed to send emails due to an action/permission blocker.
3. It navigated to DeepMind’s diffusion model waitlist page, accepted cookies, and updated a sign-in form field to “engineer.”
4. Mariner found an online Python execution site (W3Schools), recovered after a crash, and executed a simple script to produce 12.
5. Interactive chat with ChatGPT was unreliable, with prompts failing to load and an “internal error has occurred” message appearing.
6. Early performance is strongest for browsing, searching, and form interaction, while action-heavy and conversational tasks need improvement.