Could a Swarm of Autonomous AI Agents be the Ultimate Business Asset? - Stage 1
Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Stage 1 builds a goal-ready data pipeline by scraping Google results into JSON and converting YouTube videos into Whisper transcripts saved as JSON.
Briefing
A practical “swarm” workflow for business automation is taking shape: a master AI agent sets an overarching goal, then dispatches specialized worker agents to gather data, process it, and return structured outputs that can feed directly into tasks like blog writing and outreach. The core value is not just generating text, but building a repeatable pipeline that collects sources (Google and YouTube), converts them into machine-readable JSON, and then routes that data through role-based agents—summary, evaluation, writing, and outreach—until the final deliverable matches the original business objective.
Stage 1 focuses on building the data “brain” and the backend mechanics that make agent work possible. A custom UI accepts a research topic (example: “GPT Vision use cases”). From there, two data-collection agents run in the background. One agent searches Google via SerpAPI, scrapes the resulting websites, and saves the extracted content as JSON. The other agent searches YouTube, downloads the top results (configurable; three in the demo), converts the videos to MP3, and transcribes the audio with Whisper. Because transcription is compute-heavy, the audio is split into 10-minute chunks before transcription, and the combined text is then saved as JSON. Both JSON outputs are uploaded to Azure Blob Storage (the transcript is stored under a transcript.json naming pattern).
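The Google-side collection could be sketched roughly as below. This is a minimal stdlib-only illustration, not the video's actual code: the result URLs are assumed to come from a SerpAPI organic-results call (omitted here), and the visible-text extractor and `news.json` field names are assumptions.

```python
import json
import urllib.request
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible page text, skipping script/style blocks."""
    def __init__(self):
        super().__init__()
        self._skip = 0
        self.parts = []
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def extract_text(html: str) -> str:
    """Reduce raw HTML to a plain-text blob for the JSON file."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

def scrape_to_news_json(urls, out_path="news.json"):
    """Fetch each search-result URL and save {url, text} entries.

    In the pipeline described, `urls` would be the organic results
    returned by a SerpAPI query for the research topic.
    """
    entries = []
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                html = resp.read().decode("utf-8", errors="replace")
            entries.append({"url": url, "text": extract_text(html)})
        except OSError:
            continue  # skip unreachable or blocked pages
    with open(out_path, "w") as f:
        json.dump(entries, f, indent=2)
    return entries
```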
After the run, the JSON structure is inspected to confirm the pipeline worked end-to-end. The Google-scraped news JSON includes article entries with URLs and extracted text, including examples referencing multimodal ChatGPT voice/vision content. The YouTube transcript JSON contains a large block of transcribed words from the scraped video(s). The takeaway is that the system now has “dumb” but usable data files—exactly what a later master agent can download and distribute to worker agents.
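To make the inspection step concrete, the two artifacts might look roughly like the shapes below. The field names here are assumptions for illustration; the source only confirms that the news file holds URL-plus-text entries and the transcript file holds one large block of transcribed words.

```python
import json

# Hypothetical shape of the Google-scraped news file.
news = [
    {
        "url": "https://example.com/chatgpt-voice-vision",
        "text": "Extracted article text about multimodal ChatGPT ...",
    },
]

# Hypothetical shape of the YouTube transcript file.
transcript = {
    "video_title": "GPT Vision use cases",
    "transcript": "one large block of Whisper-transcribed words ...",
}

news_json = json.dumps(news, indent=2)
transcript_json = json.dumps(transcript, indent=2)
```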
The second half of the transcript demonstrates an early, working example of the outreach side of the swarm. Instead of a single agent doing everything, the setup uses two cooperating roles: a leader agent and a data agent. The leader agent coordinates and issues instructions, while only the data agent executes tool functions via ChatGPT function calling. The data agent is tasked with online research and communication, using functions like get organic results, scrape URLs, and save/open files. The leader agent targets contact discovery for marketing companies in Oslo, Norway, using search queries designed to avoid low-quality “top list” pages (an example given is “marketing companies in Oslo site:.no”, using the site: operator to restrict results to Norwegian domains). If marketing-company results dry up, the leader agent can pivot to related categories like law firms to keep the pipeline moving.
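The leader/data split can be sketched with the Chat Completions function-calling tool format: the leader's instructions become user/system messages, while only the data agent's conversation carries tool schemas. The snake_case names below are assumptions paraphrasing the functions mentioned ("get organic results", "scrape URLs", "save/open files"), and the stub bodies stand in for the real SerpAPI, scraping, and file implementations.

```python
import json

# Tool schemas in the OpenAI function-calling format; only the data
# agent's API calls would include these.
TOOLS = [
    {"type": "function", "function": {
        "name": "get_organic_results",
        "description": "Search Google and return organic result URLs.",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"}},
                       "required": ["query"]}}},
    {"type": "function", "function": {
        "name": "scrape_website",
        "description": "Fetch a URL and return its visible text.",
        "parameters": {"type": "object",
                       "properties": {"url": {"type": "string"}},
                       "required": ["url"]}}},
    {"type": "function", "function": {
        "name": "save_file",
        "description": "Append text to a local results file.",
        "parameters": {"type": "object",
                       "properties": {"path": {"type": "string"},
                                      "text": {"type": "string"}},
                       "required": ["path", "text"]}}},
]

# Stubs standing in for the real implementations.
def get_organic_results(query): return []
def scrape_website(url): return ""
def save_file(path, text): return "ok"

DISPATCH = {t["function"]["name"]: globals()[t["function"]["name"]]
            for t in TOOLS}

def handle_tool_call(name: str, arguments_json: str):
    """Route one model-issued tool call to its Python implementation."""
    return DISPATCH[name](**json.loads(arguments_json))
```

This enforces the split structurally: the leader agent can only talk, because no tool schemas are attached to its requests, while every concrete action must pass through the data agent's dispatch table.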
In a live test, the leader agent assigns the data agent a search query, then the data agent loops through scraped websites, extracts email addresses from each page, and saves them to a text file. The run produces multiple email addresses, demonstrating that the swarm concept can translate into concrete business leads. Future work is aimed at building the “master brain” decision prompts—so the system can automatically decide what agents to run, pass along the right data, and iterate toward an overarching goal like producing a blog post or launching outreach.
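The extract-and-save loop could be sketched as below; the regex and the de-duplication policy are assumptions, since the source only says emails are extracted from each page and saved to a text file.

```python
import re

# A common email-matching pattern; real pages also need filtering of
# asset names like "logo@2x.png" that this pattern can match.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(page_text: str) -> list:
    """Return unique email addresses in first-seen order."""
    seen, out = set(), []
    for match in EMAIL_RE.findall(page_text):
        if match.lower() not in seen:
            seen.add(match.lower())
            out.append(match)
    return out

def harvest(pages, out_path="emails.txt"):
    """Loop over scraped page texts and write found emails to a file."""
    emails = []
    for text in pages:
        emails.extend(e for e in extract_emails(text) if e not in emails)
    with open(out_path, "w") as f:
        f.write("\n".join(emails))
    return emails
```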
Cornell Notes
The project builds a two-stage “swarm” system for business automation. Stage 1 creates a data pipeline: a UI takes a topic, then a Google agent scrapes search results into JSON and a YouTube agent downloads videos, converts them to MP3, transcribes with Whisper (audio chunked into 10-minute segments), and saves transcripts as JSON in Azure Blob Storage. This produces structured inputs a later master agent can distribute to specialized workers. A working outreach example uses a leader agent to coordinate and a data agent to execute function calls, scrape websites, and extract/save email addresses for targets in Oslo, Norway. The approach matters because it turns agent output into repeatable, goal-driven workflows rather than one-off text generation.
- How does Stage 1 turn a business topic into usable inputs for later agents?
- What specific steps does the YouTube data agent perform before the transcript becomes JSON?
- What is the division of labor in the outreach swarm example?
- How does the leader agent improve the quality of outreach targets?
- What does the outreach test demonstrate about the swarm’s operational loop?
Review Questions
- What artifacts (file types and storage locations) does Stage 1 produce, and how do they map to later agent tasks?
- Why does the YouTube workflow chunk audio into 10-minute segments before transcription, and what is the output format at the end?
- In the outreach setup, what responsibilities belong to the leader agent versus the data agent, and how does function calling enforce that split?
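On the chunking question above: the 10-minute split is plain arithmetic over the audio duration. The helper below is a hypothetical sketch; the actual cutting of the MP3 and the per-chunk Whisper calls are only noted as comments.

```python
def chunk_spans(duration_s: float, chunk_s: int = 600):
    """Split an audio duration into 10-minute (600 s) windows.

    Returns (start, end) pairs in seconds; the last chunk may be
    shorter. Each span would then be cut from the MP3 (e.g. with
    ffmpeg), sent to Whisper, and the chunk transcripts concatenated
    into one string before saving transcript.json.
    """
    spans = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        start = end
    return spans
```

Chunking keeps each transcription request small and bounded, so one long video cannot stall or fail the whole run.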
Key Points
1. Stage 1 builds a goal-ready data pipeline by scraping Google results into JSON and converting YouTube videos into Whisper transcripts saved as JSON.
2. Azure Blob Storage serves as the shared workspace where both news.json and transcript.json are uploaded for later agent use.
3. The YouTube workflow downloads top results, converts to MP3, and transcribes with Whisper using 10-minute chunking to manage transcription workload.
4. A master-agent concept is introduced: an overarching goal (e.g., creating a blog post) would drive dispatch to specialized worker agents like summary, evaluation, and writing.
5. Outreach is implemented as a two-agent system where a leader agent coordinates while a data agent executes function calls to search, scrape, extract emails, and save them.
6. Search query design matters for outreach quality; the leader agent uses targeted queries (e.g., site:.no) and can pivot categories when results run out.