
Build Anything With HYBRID AI AGENTS: Here's How

All About AI · 5 min read

Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Use a browser agent to extract content from sites without APIs, then save the results (e.g., markdown or JSON) for later processing.

Briefing

Hybrid AI agents are emerging as a practical workaround for data sources that don’t offer APIs: use a browser automation agent to extract information from websites, then feed that collected text or structured data into an LLM to generate code or analysis. The workflow demonstrated here uses Browser Use (an open-source, Y Combinator-backed browser agent) to navigate documentation pages, capture relevant content, and save it for downstream use—turning static web pages into usable context for building real applications.
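
As a rough sketch of how such an extraction step might look (the browser-use package's Agent API and the LangChain chat model shown here are assumptions based on the project's public interface, not code taken from the video):

```python
# Sketch only: assumes the open-source browser-use package and a LangChain-compatible
# chat model as the driving LLM; exact class and method names may differ by version.
import asyncio

from browser_use import Agent            # browser automation agent
from langchain_openai import ChatOpenAI  # any supported LLM backend could be used

async def collect_docs() -> None:
    agent = Agent(
        task=(
            "Open the Anthropic API documentation, scroll until the Python "
            "getting-started section is loaded, and return that section as text."
        ),
        llm=ChatOpenAI(model="gpt-4o"),  # the model that decides browser actions
    )
    history = await agent.run()
    extracted = history.final_result()   # final text the agent returned

    # Save the captured documentation for the later code-generation step.
    with open("cla_docs.md", "w", encoding="utf-8") as f:
        f.write(extracted or "")

if __name__ == "__main__":
    asyncio.run(collect_docs())
```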

The first build starts with Anthropic’s API documentation. Instead of relying on an API or manual copy-paste, the agent opens the Anthropic docs site, actively scrolls to load the needed sections, and returns the relevant Python guidance as text. That output is saved to a markdown file (cla_docs.md). Next, a second step uses Gemini (via a Gemini API key) to generate a Python chatbot script that calls the Anthropic API—targeting a Claude 3.5 Sonnet model. After cleaning up formatting issues (like stray backticks), the resulting chatbot is run and successfully handles a test request, confirming the pipeline: browser agent → documentation capture → LLM-assisted code generation → working API client.
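
The generated script itself isn't reproduced in the summary, but a minimal chatbot along the lines described, using the official anthropic Python SDK, would look roughly like this (the model identifier and loop structure are assumptions):

```python
# Minimal sketch of the kind of chatbot the code-generation step produces.
# Assumes the official anthropic SDK and an ANTHROPIC_API_KEY environment variable.
import anthropic

client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY from the environment

def chat() -> None:
    history: list[dict] = []
    while True:
        user = input("You: ")
        if user.lower() in {"quit", "exit"}:
            break
        history.append({"role": "user", "content": user})
        reply = client.messages.create(
            model="claude-3-5-sonnet-20241022",  # a Claude 3.5 Sonnet id; assumed
            max_tokens=512,
            messages=history,
        )
        text = reply.content[0].text
        history.append({"role": "assistant", "content": text})
        print("Claude:", text)

if __name__ == "__main__":
    chat()
```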

A second example shows the same hybrid pattern applied to price tracking. The agent visits PriceCharting for “Zelda: Breath of the Wild Master Edition,” extracts the “new” price, and stores fields like title, URL, and price in a JSON file. When the initial attempt can’t reliably identify the “new price,” the workflow adapts by clicking into “new sold listings” and pulling a set of recent sales (e.g., the last eight). The agent then extracts sale dates, titles, and prices—producing a small dataset that can be analyzed later.
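
The summary only names a few fields (title, URL, price, and the sold-listing entries), so the structure below is illustrative; the placeholder values are not prices from the video:

```python
# Illustrative shape for the scraped PriceCharting data; field names and values
# are placeholders, chosen so the later reporting step has something consistent to read.
import json

record = {
    "title": "Zelda: Breath of the Wild Master Edition",
    "url": "https://www.pricecharting.com/",  # page URL captured by the agent
    "new_price": None,                        # left empty when detection fails
    "sold_listings": [                        # e.g., the last eight "new sold" sales
        {"date": "2025-01-15", "title": "Master Edition (sealed)", "price": 0.0},
        {"date": "2025-01-02", "title": "Master Edition (bundle)", "price": 0.0},
    ],
}

with open("zelda_prices.json", "w", encoding="utf-8") as f:
    json.dump(record, f, indent=2)
```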

That dataset becomes input for a separate reporting step using Gemini again. A Python script loads the collected sales data and generates a markdown report (e.g., Zelda report) summarizing price trends and key insights such as variability and downward movement, plus observations about market behavior (including bundle-related effects). The result is a lightweight “agentic ETL” loop: scrape and structure data from websites without APIs, then use an LLM to turn it into code and human-readable analysis.
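
A reporting step like the one described could be sketched with the google-generativeai SDK; the model name and prompt wording here are assumptions:

```python
# Sketch of the reporting step: load the scraped JSON and ask Gemini for a markdown report.
# Assumes the google-generativeai package and a GEMINI_API_KEY environment variable.
import json
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")  # model name is illustrative

with open("zelda_prices.json", encoding="utf-8") as f:
    sales = json.load(f)

prompt = (
    "Here are recent sold listings for a video game. Write a short markdown report "
    "covering price trends, variability, and any notable observations.\n\n"
    + json.dumps(sales, indent=2)
)

report = model.generate_content(prompt)

with open("zelda_report.md", "w", encoding="utf-8") as f:
    f.write(report.text)
```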

The transcript also gestures at broader use cases: logging into sites via browser automation (including authenticated flows like LinkedIn), collecting content from platforms like Reddit, and producing quick analyses. With Nvidia’s GTC 2025 approaching, the creator frames this as an early look at where agent stacks are heading—toward more common hybrid systems that stitch together browser agents, LLMs, and conventional Python tooling to automate real-world research and reporting.

Cornell Notes

Hybrid AI agents combine browser automation with LLM reasoning to extract information from websites that lack APIs, then use that extracted content to generate code or analysis. The workflow demonstrated first scrapes Anthropic’s API documentation using a browser agent, saves the results to a markdown file, and then uses Gemini to generate a Python chatbot that calls the Anthropic API (Claude 3.5 Sonnet). After fixing minor formatting issues, the chatbot runs successfully. A second workflow scrapes PriceCharting for Zelda “Master Edition” pricing and recent sold listings, stores structured results in JSON, and then uses Gemini to produce a markdown report on price trends and variability. The approach turns “web pages” into reusable datasets and development inputs.

How does the system handle websites that don’t provide an API?

It uses a browser agent to navigate and extract the needed content directly from the site. In the Anthropic example, the agent opens the documentation page, scrolls to load sections, and returns the relevant Python API setup details as text, which is saved to cla_docs.md. In the PriceCharting example, the agent searches for the game page and—when the “new price” isn’t reliably detected—clicks into “new sold listings” to extract sale dates and prices. The extracted outputs then feed into downstream LLM steps.

What’s the end-to-end pipeline for generating a working chatbot from documentation?

First, the browser agent collects Anthropic API documentation and saves it as markdown (cla_docs.md). Second, Gemini is used to generate Python code for a chatbot that calls the Anthropic API, using the collected documentation as context. The script is then run (with an Anthropic API key provided), and a test prompt confirms the chatbot can make an API request. Minor formatting cleanup (removing backticks) is needed before execution.
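
The backtick cleanup is worth a concrete illustration: LLMs often wrap generated code in markdown fences, so a small post-processing pass (illustrative, not shown in the video) keeps the pipeline from writing invalid Python to disk:

```python
# Illustrative cleanup for LLM-generated code: strip stray markdown fences
# before saving the script.
def strip_code_fences(text: str) -> str:
    lines = text.strip().splitlines()
    if lines and lines[0].startswith("```"):   # drop an opening fence line
        lines = lines[1:]
    if lines and lines[-1].strip() == "```":   # drop a trailing fence line
        lines = lines[:-1]
    return "\n".join(lines)

generated = "```python\nprint('hello from the generated chatbot')\n```"
with open("chatbot.py", "w", encoding="utf-8") as f:
    f.write(strip_code_fences(generated))
```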

Why did the PriceCharting workflow switch from “new price” to “new sold listings”?

The initial extraction attempt didn’t identify the “new price” reliably. The workaround was to click into “new sold listings,” then extract a sequence of recent sales (the transcript mentions selecting the last eight). That change produced a usable dataset including sale date, title, and price, which the later reporting step could analyze.
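
With the sales in JSON, a few lines of standard-library Python are enough to surface the variability and direction that the later report describes (field names follow the illustrative structure sketched earlier):

```python
# Quick sanity checks on the scraped sales before handing them to the reporting step.
import json
from statistics import mean, pstdev

with open("zelda_prices.json", encoding="utf-8") as f:
    listings = json.load(f)["sold_listings"]

prices = [entry["price"] for entry in listings]
print(f"sales: {len(prices)}")
print(f"average price: {mean(prices):.2f}")
print(f"price spread (std dev): {pstdev(prices):.2f}")
# Assuming the agent saved listings newest-first, compare the ends for direction.
print("direction:", "down" if prices[0] < prices[-1] else "up or flat")
```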

How is the scraped pricing data turned into a report?

A separate Python script loads the stored JSON dataset and uses Gemini to generate a markdown report. The report highlights trends and insights such as price variability and directional movement, and it includes a conclusion based on the observed sales history. The output is saved as a markdown file (e.g., Zelda report).

What makes these examples “hybrid” rather than purely browser automation?

Browser automation handles retrieval and extraction from web pages, but LLMs handle interpretation and generation afterward. The browser agent gathers raw documentation or structured sales records; Gemini then generates code (for the chatbot) or narrative analysis (for the price report). Conventional Python scripts glue the steps together by running the agent, saving outputs, and formatting inputs/outputs.

Review Questions

  1. In the Anthropic documentation workflow, what are the two distinct roles played by the browser agent and by Gemini?
  2. What specific adaptation improved extraction reliability on PriceCharting, and what fields were ultimately captured for analysis?
  3. How does the final markdown report generation depend on the structure of the JSON data collected by the browser agent?

Key Points

  1. Use a browser agent to extract content from sites without APIs, then save the results (e.g., markdown or JSON) for later processing.

  2. A two-step pipeline can convert documentation into working code: scrape docs → generate Python client code with an LLM → run and test.

  3. When a targeted field (like “new price”) fails to extract, adjust the navigation path (e.g., switch to “new sold listings”) and re-collect from a more reliable page section.

  4. Store extracted data in structured formats (markdown for text context, JSON for datasets) so downstream scripts can analyze it consistently.

  5. Use an LLM to transform structured sales data into human-readable reports, including trend and variability observations.

  6. Hybrid agent stacks are especially useful for authenticated or interactive sites where manual scraping or API access is impractical.

Highlights

The workflow scrapes Anthropic’s API documentation with a browser agent, then uses Gemini to generate a Python chatbot that successfully calls the Anthropic API (Claude 3.5 Sonnet).
Price tracking becomes feasible without an API by navigating PriceCharting pages and extracting sale histories from “new sold listings” when direct price detection fails.
A simple ETL loop emerges: browser extraction → structured storage (JSON/MD) → Gemini-generated markdown report on price trends and variability.
The examples point toward broader agent use cases like authenticated browsing and content collection from platforms such as Reddit.

Topics

  • Hybrid AI Agents
  • Browser Automation
  • API Documentation Scraping
  • Price Tracking
  • LLM Code Generation
