
Could a Swarm of Autonomous AI Agents be the Ultimate Business Asset? - Stage 1

All About AI · 5 min read

Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Stage 1 builds a goal-ready data pipeline by scraping Google results into JSON and converting YouTube videos into Whisper transcripts saved as JSON.

Briefing

A practical “swarm” workflow for business automation is taking shape: a master AI agent sets an overarching goal, then dispatches specialized worker agents to gather data, process it, and return structured outputs that can feed directly into tasks like blog writing and outreach. The core value is not just generating text, but building a repeatable pipeline that collects sources (Google and YouTube), converts them into machine-readable JSON, and then routes that data through role-based agents—summary, evaluation, writing, and outreach—until the final deliverable matches the original business objective.

Stage 1 focuses on building the data “brain” and the backend mechanics that make agent work possible. A custom UI accepts a research topic (example: “GPT Vision use cases”). From there, two data-collection agents run in the background. One agent searches Google via SerpAPI, scrapes the resulting websites, and saves the extracted content into JSON. The other agent searches YouTube, downloads the top results (configurable, shown as three), converts the videos to MP3, and uses Whisper to transcribe the audio. Because transcription is heavy, the audio is chunked into 10-minute segments, each segment is transcribed, and the combined text is saved as JSON. Both JSON outputs are uploaded to Azure blob storage (the transcript is stored under a transcript.json naming pattern).
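The Google side of that pipeline can be sketched in a few functions. The SerpAPI endpoint and the `organic_results` key are real, but the `strip_tags` helper and the exact shape of the article entries are assumptions reconstructed from the description above:

```python
import json
import re
from urllib.request import urlopen, Request

SERPAPI_KEY = "YOUR_SERPAPI_KEY"  # assumption: the real project loads this from config

def google_organic_results(query: str) -> list[dict]:
    """Query SerpAPI's Google engine and return the organic results list."""
    url = ("https://serpapi.com/search.json?engine=google"
           f"&q={query.replace(' ', '+')}&api_key={SERPAPI_KEY}")
    with urlopen(Request(url)) as resp:
        return json.load(resp).get("organic_results", [])

def strip_tags(html: str) -> str:
    """Very rough text extraction: drop scripts, styles, and tags, collapse whitespace."""
    text = re.sub(r"<script.*?</script>|<style.*?</style>", " ", html, flags=re.S)
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def build_news_entries(results: list[dict]) -> list[dict]:
    """Map organic results to the article entries described above (URL + extracted text)."""
    articles = []
    for r in results:
        url = r.get("link")
        if not url:
            continue
        try:
            with urlopen(Request(url, headers={"User-Agent": "Mozilla/5.0"})) as resp:
                html = resp.read().decode("utf-8", errors="ignore")
        except OSError:
            continue  # skip unreachable sites rather than failing the whole run
        articles.append({"url": url, "text": strip_tags(html)})
    return articles
```

Dumping `build_news_entries(...)` with `json.dump` would produce the kind of news JSON inspected later in the video.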

After the run, the JSON structure is inspected to confirm the pipeline worked end-to-end. The Google-scraped news JSON includes article entries with URLs and extracted text, including examples referencing multimodal ChatGPT voice/vision content. The YouTube transcript JSON contains a large block of transcribed words from the scraped video(s). The takeaway is that the system now has “dumb” but usable data files—exactly what a later master agent can download and distribute to worker agents.

The second half of the transcript demonstrates an early, working example of the outreach side of the swarm. Instead of a single agent doing everything, the setup uses two cooperating roles: a leader agent and a data agent. The leader agent coordinates and issues instructions, while only the data agent executes tool functions via ChatGPT function calling. The data agent is tasked with online research and communication, using functions like get organic results, scrape URLs, and save/open files. The leader agent targets contact discovery for marketing companies in Oslo, Norway, using search queries designed to avoid low-quality “top list” pages (an example given is “marketing compass in Oslo site.no”). If marketing companies dry up, the leader agent can pivot to related categories like law firms to keep the pipeline moving.
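The leader/data split hinges on which agent is given tools. A minimal sketch of the data agent's tool schema, in the OpenAI function-calling format, might look like this; the tool names mirror those mentioned in the video, but the parameter schemas are assumptions:

```python
# Only the data agent's API request includes these tools; the leader agent's
# request includes none, so the leader can instruct but never execute.
DATA_AGENT_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_organic_results",
            "description": "Search Google via SerpAPI and return organic results.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "scrape_url",
            "description": "Download a web page and return its visible text.",
            "parameters": {
                "type": "object",
                "properties": {"url": {"type": "string"}},
                "required": ["url"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "save_file",
            "description": "Append extracted results (e.g. email addresses) to a local file.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"},
                },
                "required": ["path", "content"],
            },
        },
    },
]
```

Withholding the tools list from the leader agent is what enforces the division of labor at the API level rather than relying on prompt instructions alone.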

In a live test, the leader agent assigns the data agent a search query, then the data agent loops through scraped websites, extracts email addresses from each page, and saves them to a text file. The run produces multiple email addresses, demonstrating that the swarm concept can translate into concrete business leads. Future work is aimed at building the “master brain” decision prompts—so the system can automatically decide what agents to run, pass along the right data, and iterate toward an overarching goal like producing a blog post or launching outreach.

Cornell Notes

The project builds a two-stage “swarm” system for business automation. Stage 1 creates a data pipeline: a UI takes a topic, then a Google agent scrapes search results into JSON and a YouTube agent downloads videos, converts them to MP3, transcribes with Whisper (chunked into 10-minute segments), and saves transcripts as JSON in Azure blob storage. This produces structured inputs a later master agent can distribute to specialized workers. A working outreach example uses a leader agent to coordinate and a data agent to execute function calls, scrape websites, and extract/save email addresses for targets in Oslo, Norway. The approach matters because it turns agent output into repeatable, goal-driven workflows rather than one-off text generation.

How does Stage 1 turn a business topic into usable inputs for later agents?

A UI collects a research topic (example: “GPT Vision use cases”). A Google/SerpAPI-based workflow scrapes organic results and saves extracted content into JSON. In parallel, a YouTube workflow searches for videos, downloads the top results (shown as three), converts them to MP3, transcribes audio with Whisper by chunking into 10-minute segments, and saves the transcript as JSON. Both JSON files are uploaded to Azure blob storage, so the next “master brain” can download and distribute the data to worker agents.

What specific steps does the YouTube data agent perform before the transcript becomes JSON?

It searches YouTube for the query, downloads the selected videos, and converts each video to an MP3 file. Because transcribing a full video at once is heavy, it chunks the audio into 10-minute segments, transcribes each with Whisper, converts the combined text into JSON, and uploads it to Azure blob storage under a transcript.json naming pattern.
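Those steps can be sketched as follows. The segmentation math is straightforward; the ffmpeg/ffprobe commands and the openai-whisper calls are real tools, but the video does not show the exact invocation, so treat `transcribe_mp3` as an assumed shape rather than the author's code:

```python
def chunk_bounds(duration_s: float, chunk_s: int = 600) -> list[tuple[int, int]]:
    """Split a duration into consecutive (start, end) windows of at most chunk_s seconds."""
    bounds = []
    start = 0
    while start < duration_s:
        end = min(start + chunk_s, int(duration_s))
        bounds.append((start, end))
        start = end
    return bounds

def transcribe_mp3(path: str) -> str:
    """Cut the MP3 into 10-minute pieces, transcribe each with Whisper, join the text."""
    import subprocess
    import whisper  # assumes openai-whisper and ffmpeg are installed

    model = whisper.load_model("base")
    duration = float(subprocess.check_output(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path]))
    parts = []
    for start, end in chunk_bounds(duration):
        piece = f"{path}.{start}.mp3"
        subprocess.run(["ffmpeg", "-y", "-i", path, "-ss", str(start),
                        "-to", str(end), "-c", "copy", piece], check=True)
        parts.append(model.transcribe(piece)["text"])
    return " ".join(parts)
```

The joined text would then be wrapped in a dict and serialized to transcript.json before upload.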

What is the division of labor in the outreach swarm example?

Two roles cooperate: a leader agent and a data agent. The leader agent coordinates and issues instructions but does not execute functions. The data agent follows the leader’s instructions and is the only one allowed to execute tool functions via ChatGPT function calling. The data agent uses functions to search, scrape URLs, extract contact information, and save results (email addresses) to a text file.

How does the leader agent improve the quality of outreach targets?

It uses search queries designed to avoid annoying “top lists” of companies. The transcript gives an example query like “marketing compass in Oslo site.no,” which tends to surface more relevant company pages. It also includes a fallback: if marketing companies are no longer found, it can pivot to a related business category such as law firms in Oslo to keep finding email addresses.
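The query strategy reduces to a small helper. The `site:.no` restriction follows the transcript's example; the exclusion terms and the exact template are assumptions added for illustration:

```python
def build_outreach_query(category: str, city: str = "Oslo") -> str:
    """Restrict results to Norwegian company sites and steer away from
    aggregator 'top list' pages (exclusion terms are illustrative)."""
    return f'{category} in {city} site:.no -"top 10" -"best"'

# Fallback order the leader agent could walk through when a category runs dry.
CATEGORIES = ["marketing companies", "law firms"]
```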

What does the outreach test demonstrate about the swarm’s operational loop?

The leader agent assigns a search query (marketing companies in Oslo, Norway). The data agent then iterates through scraped websites: for each site, it extracts email addresses, saves them, and moves to the next link. The run ends with multiple saved email addresses, showing the system can repeatedly scrape, extract, and store contact data rather than producing a single result.
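That extract-and-save loop maps to only a few lines of Python. The regex and the output-file layout are assumptions, since the video only shows the resulting text file of addresses:

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(page_text: str) -> set[str]:
    """Pull every email-looking token out of a scraped page."""
    return set(EMAIL_RE.findall(page_text))

def harvest(pages: dict[str, str], out_path: str = "emails.txt") -> set[str]:
    """Iterate over scraped pages (url -> text), collect unique addresses,
    and append them to a text file, mirroring the loop described above."""
    found: set[str] = set()
    for url, text in pages.items():
        found |= extract_emails(text)
    with open(out_path, "a", encoding="utf-8") as f:
        for email in sorted(found):
            f.write(email + "\n")
    return found
```

Deduplicating into a set before writing keeps repeat scrapes of the same site from flooding the output file.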

Review Questions

  1. What artifacts (file types and storage locations) does Stage 1 produce, and how do they map to later agent tasks?
  2. Why does the YouTube workflow chunk audio into 10-minute segments before transcription, and what is the output format at the end?
  3. In the outreach setup, what responsibilities belong to the leader agent versus the data agent, and how does function calling enforce that split?

Key Points

  1. Stage 1 builds a goal-ready data pipeline by scraping Google results into JSON and converting YouTube videos into Whisper transcripts saved as JSON.
  2. Azure blob storage serves as the shared workspace where both news.json and transcript.json are uploaded for later agent use.
  3. The YouTube workflow downloads top results, converts to MP3, and transcribes with Whisper using 10-minute chunking to manage transcription workload.
  4. A master-agent concept is introduced: an overarching goal (e.g., creating a blog post) would drive dispatch to specialized worker agents like summary, evaluation, and writing.
  5. Outreach is implemented as a two-agent system where a leader agent coordinates while a data agent executes function calls to search, scrape, extract emails, and save them.
  6. Search query design matters for outreach quality; the leader agent uses targeted queries (e.g., site.no) and can pivot categories when results run out.

Highlights

Stage 1 produces structured JSON inputs by combining SerpAPI-based website scraping with Whisper-based YouTube transcription, then storing everything in Azure blob storage.
Whisper transcription is handled via 10-minute chunking, then converted into JSON so downstream agents can consume it reliably.
The outreach swarm uses a strict leader/data split: the leader issues instructions, while only the data agent runs function calls to scrape and extract emails.
Search strategy is treated as a control lever—queries are chosen to avoid low-quality “top list” pages and improve contact discovery results.
