Build Anything With HYBRID AI AGENTS: Here's How
Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Hybrid AI agents are emerging as a practical workaround for data sources that don't offer APIs: a browser automation agent extracts information from websites, and that collected text or structured data is then fed into an LLM to generate code or analysis. The workflow demonstrated here uses Browser Use (a Y Combinator-backed open-source browser agent) to navigate documentation pages, capture relevant content, and save it for downstream use, turning static web pages into usable context for building real applications.
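The core loop can be sketched as two injected callables: one playing the browser agent (page → text) and one playing the LLM (text → artifact). This is a minimal, hypothetical sketch of the pattern only; in the video these roles are filled by the Browser Use agent and Gemini, whose real APIs are not reproduced here.

```python
from pathlib import Path

def hybrid_step(extract, generate, context_path: str, output_path: str) -> str:
    """One pass of the hybrid pattern: browser step -> saved context -> LLM step.

    `extract` stands in for the browser agent (returns extracted page text);
    `generate` stands in for the LLM (turns that text into code or a report).
    Both are assumptions injected for illustration, not real library calls.
    """
    context = extract()                      # browser agent: page -> text
    Path(context_path).write_text(context)   # persist context (e.g. a .md file)
    artifact = generate(context)             # LLM: context -> code/analysis
    Path(output_path).write_text(artifact)   # persist the generated artifact
    return artifact
```

Keeping both stages behind plain callables makes the loop easy to test and to swap: the same scaffold serves the documentation-to-chatbot build and the price-tracking build described below.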
The first build starts with Anthropic's API documentation. Instead of relying on an API or manual copy-paste, the agent opens the Anthropic docs site, scrolls to load the needed sections, and returns the relevant Python guidance as text, which is saved to a markdown file (cla_docs.md). A second step then uses Gemini (via a Gemini API key) to generate a Python chatbot script that calls the Anthropic API, targeting a Claude 3.5 Sonnet model. After cleaning up formatting issues (such as stray backticks around the generated code), the chatbot runs and handles a test request, confirming the pipeline: browser agent → documentation capture → LLM-assisted code generation → working API client.
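The "stray backticks" cleanup mentioned above is a common chore when an LLM returns code wrapped in a markdown fence. A small helper like the following (an illustrative assumption, not the script from the video) strips a leading ```` ```python ```` line and a trailing ```` ``` ```` so the output can be saved directly as a runnable .py file:

```python
def strip_code_fences(llm_output: str) -> str:
    """Remove a surrounding markdown code fence from LLM-generated code, if any."""
    text = llm_output.strip()
    if text.startswith("```"):
        # Drop the opening fence line (``` or ```python).
        text = text.split("\n", 1)[1] if "\n" in text else ""
    if text.rstrip().endswith("```"):
        # Drop the closing fence.
        text = text.rstrip()[:-3].rstrip()
    return text + "\n"
```

Code without fences passes through unchanged, so the helper is safe to run on every generation.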
A second example shows the same hybrid pattern applied to price tracking. The agent visits PriceCharting for “Zelda: Breath of the Wild Master Edition,” extracts the “new” price, and stores fields like title, URL, and price in a JSON file. When the initial attempt can’t reliably identify the “new price,” the workflow adapts by clicking into “new sold listings” and pulling a set of recent sales (e.g., the last eight). The agent then extracts sale dates, titles, and prices—producing a small dataset that can be analyzed later.
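The transcript says the agent captured sale dates, titles, and prices into JSON; the exact schema isn't shown, so the record shape below is an assumption that mirrors those fields. A small dataclass plus save/load helpers keeps the dataset consistent for the reporting step:

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical record shape for the sold-listings data; field names mirror
# what the transcript says was captured (sale date, title, price).
@dataclass
class SoldListing:
    date: str         # sale date as shown on the listings page
    title: str        # listing title
    price_usd: float  # sale price in USD

def save_listings(listings, path):
    """Persist structured listings as JSON for downstream analysis."""
    with open(path, "w") as f:
        json.dump([asdict(item) for item in listings], f, indent=2)

def load_listings(path):
    """Load listings back into typed records."""
    with open(path) as f:
        return [SoldListing(**row) for row in json.load(f)]
```

Round-tripping through JSON this way means the reporting script never has to re-parse free-form agent output.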
That dataset becomes input for a separate reporting step using Gemini again. A Python script loads the collected sales data and generates a markdown report (e.g., Zelda report) summarizing price trends and key insights such as variability and downward movement, plus observations about market behavior (including bundle-related effects). The result is a lightweight “agentic ETL” loop: scrape and structure data from websites without APIs, then use an LLM to turn it into code and human-readable analysis.
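In the video, Gemini writes the report; as a rough sketch of what that reporting step computes, the helper below derives trend and variability figures from the collected sales and renders them as markdown. The dict keys and report layout are assumptions for illustration, not the video's actual script:

```python
from statistics import mean, pstdev

def summarize_prices(listings):
    """Compute simple trend/variability stats over sale records.

    `listings` is assumed to be a list of dicts with a "price_usd" key,
    ordered oldest sale first.
    """
    prices = [row["price_usd"] for row in listings]
    direction = ("down" if prices[-1] < prices[0]
                 else "up" if prices[-1] > prices[0] else "flat")
    return {
        "count": len(prices),
        "mean": round(mean(prices), 2),
        "stdev": round(pstdev(prices), 2),  # population std dev as variability
        "trend": direction,
    }

def to_markdown(summary, heading="Price Report"):
    """Render the summary dict as a small markdown report."""
    lines = [f"# {heading}", ""]
    lines += [f"- **{key}**: {value}" for key, value in summary.items()]
    return "\n".join(lines)
```

An LLM can layer qualitative observations (such as the bundle-related effects noted in the video) on top of hard numbers like these, which is exactly what makes the loop "hybrid."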
The transcript also gestures at broader use cases: logging into sites via browser automation (including authenticated flows like LinkedIn), collecting content from platforms like Reddit, and producing quick analyses. With Nvidia’s GTC 2025 approaching, the creator frames this as an early look at where agent stacks are heading—toward more common hybrid systems that stitch together browser agents, LLMs, and conventional Python tooling to automate real-world research and reporting.
Cornell Notes
Hybrid AI agents combine browser automation with LLM reasoning to extract information from websites that lack APIs, then use that extracted content to generate code or analysis. The workflow demonstrated first scrapes Anthropic’s API documentation using a browser agent, saves the results to a markdown file, and then uses Gemini to generate a Python chatbot that calls the Anthropic API (Claude 3.5 Sonnet). After fixing minor formatting issues, the chatbot runs successfully. A second workflow scrapes PriceCharting for Zelda “Master Edition” pricing and recent sold listings, stores structured results in JSON, and then uses Gemini to produce a markdown report on price trends and variability. The approach turns “web pages” into reusable datasets and development inputs.
How does the system handle websites that don’t provide an API?
What’s the end-to-end pipeline for generating a working chatbot from documentation?
Why did the PriceCharting workflow switch from “new price” to “new sold listings”?
How is the scraped pricing data turned into a report?
What makes these examples “hybrid” rather than purely browser automation?
Review Questions
- In the Anthropic documentation workflow, what are the two distinct roles played by the browser agent and by Gemini?
- What specific adaptation improved extraction reliability on PriceCharting, and what fields were ultimately captured for analysis?
- How does the final markdown report generation depend on the structure of the JSON data collected by the browser agent?
Key Points
1. Use a browser agent to extract content from sites without APIs, then save the results (e.g., markdown or JSON) for later processing.
2. A two-step pipeline can convert documentation into working code: scrape docs → generate Python client code with an LLM → run and test.
3. When a targeted field (like “new price”) fails to extract, adjust the navigation path (e.g., switch to “new sold listings”) and re-collect from a more reliable page section.
4. Store extracted data in structured formats (markdown for text context, JSON for datasets) so downstream scripts can analyze it consistently.
5. Use an LLM to transform structured sales data into human-readable reports, including trend and variability observations.
6. Hybrid agent stacks are especially useful for authenticated or interactive sites where manual scraping or API access is impractical.