Build Anything with CrewAI, Here’s How
Based on David Ondrej's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
CrewAI is presented as a fast way to build multi-step AI “agents” that can generate lead-targeting search queries, search the web, scrape relevant pages, and extract contact details—without requiring the viewer to write everything from scratch. The practical payoff is a lead-generation pipeline that can be adapted to any niche, location, and lead volume, turning basic inputs (industry + city + number of leads) into structured contact outputs like email and social profiles.
The build starts in Google Colab, where Python runs in notebook cells. After installing CrewAI and connecting to Anthropic’s Claude models via an API key, the workflow uses Claude 3.5 Sonnet (with notes about model/token settings and cheaper alternatives). Two Claude client instances are created: one “consistent” (temperature set low) for predictable outputs, and one “creative” (higher temperature) for generating varied phrasing. This split matters because the pipeline needs both reliable formatting and diverse search-query ideas.
The first CrewAI agent, dubbed a “variation agent,” takes a niche and location and produces 10 distinct, concise search queries intended to surface business leads. A second step wraps that agent and its task into a Crew, then kicks off execution so the generated queries appear as a clean list. The transcript emphasizes output control—no extra text, no quotation marks, and one query per line—so downstream steps can parse results reliably.
Next, the pipeline adds a web-search agent using Serper (via CrewAI tools). The search agent receives the generated queries and returns a list of websites. The build then highlights a key limitation: asking the model to hit a large lead count in one go can cause early stopping or counting errors, since LLMs struggle with strict numeric targets. The suggested fix is structural: use loops outside the agent (and deduplicate) so the system keeps searching until the desired number of qualifying leads is reached.
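The loop-based fix can be shown with the counting logic in plain Python. The search function is injected as a parameter (in practice a wrapper around Serper or CrewAI's SerperDevTool), so the function names and shapes here are assumptions; the point is that counting and deduplication happen in code, not in the LLM.

```python
# Sketch of code-driven looping and deduplication for exact lead counts.
from collections.abc import Callable, Iterable
from urllib.parse import urlparse

def collect_leads(
    queries: Iterable[str],
    search: Callable[[str], list[str]],  # e.g. a Serper wrapper returning result URLs
    target: int,
) -> list[str]:
    """Keep searching until `target` unique domains are found (or queries run out)."""
    seen: set[str] = set()
    leads: list[str] = []
    for query in queries:
        for url in search(query):
            domain = urlparse(url).netloc.lower()
            if domain and domain not in seen:  # dedupe by domain
                seen.add(domain)
                leads.append(url)
                if len(leads) >= target:
                    return leads
    return leads  # fewer than target qualifying leads were found
```

Because the loop stops exactly at `target` and skips duplicate domains, the strict numeric requirement never depends on the model's counting.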
For scraping, the workflow installs Firecrawl and uses it to convert websites into LLM-ready markdown. After selecting a target site from the search results, Firecrawl scrapes the homepage content (truncated for readability), then a new CrewAI agent analyzes that homepage to find URLs for “about” and “contact” pages. Those pages are scraped as well, and the combined scraped text is fed into a final Anthropic call with a tight prompt instructing the model to extract email, Twitter, and LinkedIn—outputting “none” when fields aren’t found.
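The final extraction call can be sketched as a tight prompt plus a line-based parser. This assumes Firecrawl has already returned the combined markdown for the homepage, about, and contact pages; the prompt wording and the `parse_contacts` helper are illustrative, not the video's exact code.

```python
# Sketch of the contact-extraction prompt and its strict-format parser.
EXTRACTION_PROMPT = """Extract the contact details from the text below.
Output exactly three lines in this order, with no other text:
email: <email or none>
twitter: <handle or none>
linkedin: <url or none>

Text:
{scraped_markdown}
"""

def parse_contacts(response: str) -> dict[str, str]:
    """Turn the strictly formatted reply into a dict, defaulting to 'none'."""
    contacts = {"email": "none", "twitter": "none", "linkedin": "none"}
    for line in response.splitlines():
        key, _, value = line.partition(":")  # split on the first colon only
        key = key.strip().lower()
        if key in contacts and value.strip():
            contacts[key] = value.strip()
    return contacts
```

Defaulting every field to `"none"` means a malformed or partial model reply still yields a well-shaped record.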
The result is a working end-to-end lead extractor: search → scrape → page discovery → contact extraction. The transcript closes by outlining concrete upgrades for production quality: save results to CSV, add personalized unique fields per business, run search and scraping in loops to reach exact lead counts, avoid counting leads with missing contact info, deduplicate websites, and refactor into functions so the whole process can run in a single cell. The overall message is that CrewAI plus purpose-built tools (Serper, Firecrawl, Anthropic) can assemble a usable lead engine quickly, while code-based loops and validation handle the parts where LLMs are least reliable.
Cornell Notes
CrewAI is used to build an end-to-end lead-generation pipeline from simple inputs: niche, location, and desired lead count. A “variation agent” generates 10 optimized search queries, then a “web search agent” uses Serper to find candidate websites. Firecrawl scrapes the homepage, and another agent locates “about” and “contact” page URLs so the scraper can pull the right content. Finally, an Anthropic Claude call extracts email, Twitter, and LinkedIn from the scraped text, using strict formatting rules and “none” when fields are missing. The approach matters because it turns unstructured web content into structured lead data, while also showing why loops and deduplication in code are needed for accurate counting.
How does the pipeline turn a niche and location into search queries that are actually useful for lead finding?
Why create both “consistent” and “creative” Claude clients in the same project?
What role do Serper and CrewAI tools play after the search-query agent finishes?
Why does the transcript warn that asking for large lead counts inside a single agent run can fail?
How does Firecrawl fit into the scraping and contact-extraction workflow?
What prompt constraints make the final contact extraction more reliable?
Review Questions
- Where in the pipeline are output-format constraints most critical, and what specific constraints are used to keep results parseable?
- What failure mode appears when trying to reach a large lead count in one agent run, and how does the proposed loop-based fix address it?
- How do the “about/contact URL discovery” and “contact field extraction” steps differ in responsibilities and prompting?
Key Points
1. CrewAI can orchestrate a multi-step lead pipeline: generate search queries, search the web, scrape pages, discover about/contact URLs, and extract contact fields.
2. Splitting Claude usage into “consistent” (temperature 0) and “creative” (higher temperature) helps balance formatting reliability with query diversity.
3. Tool-using agents (via CrewAI tools like SerperDev) can retrieve real web results rather than only generating text.
4. LLMs are unreliable at strict counting; reaching exact lead targets works better with code-driven loops, deduplication, and validation outside the agent.
5. Firecrawl converts websites into LLM-ready markdown, enabling reliable downstream extraction from homepage, about, and contact pages.
6. A tight extraction prompt (fixed field order, “none” for missing values, no extra text) improves the chance of structured outputs like email/Twitter/LinkedIn.
7. Turning the notebook into functions and CSV outputs is the next step toward making the pipeline reusable for real client delivery.