
Vibecode a CUSTOM Research Agent & Open Sourced it!

MattVidPro · 5 min read

Based on MattVidPro's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

The agent combines Bright Data web access with Gemini reasoning to produce source-backed research answers.

Briefing

An open-source “autonomous research agent” pairs a web-search/scraping backend with a Gemini-powered reasoning layer and a Lemon-themed React interface, backing every claim with a clickable source link and saving results automatically. The practical punchline: instead of using a generic research chatbot, users can run a customizable agent locally that performs multi-step discovery (including Reddit and X.com) while producing traceable, source-backed answers.

The project is built around two core services. Bright Data supplies the web access—handling search-result extraction and scraping in JSON—while Google’s Gemini models handle the synthesis and reasoning. The agent offers two modes: a fast “basic” mode that performs a SERP-style search plus synthesis for quick, fact-oriented answers, and a slower “deep discovery” mode that runs advanced multi-step research using “deep search agents,” scrapers, and social search. In both cases, the system automatically stores research outputs for later review and attaches verifiable links to back each claim.
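
The two-mode split can be pictured as a simple dispatcher. This is a hypothetical sketch—the function name, mode labels, and step names are illustrative, not taken from the repository:

```python
# Hypothetical sketch of the agent's two-mode dispatch.
# Names here ("serp_search", "deep", etc.) are assumptions for illustration.

def run_research(question: str, mode: str = "basic") -> dict:
    """Route a question to the fast SERP mode or the deep discovery pipeline."""
    if mode == "basic":
        # One SERP search, then a single Gemini synthesis pass.
        steps = ["serp_search", "synthesize"]
    elif mode == "deep":
        # Multi-step: SERP search, page scraping, social search, then synthesis.
        steps = ["serp_search", "scrape_pages", "social_search", "synthesize"]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return {"question": question, "steps": steps}
```

The design point is that "deep discovery" is the same pipeline with more gathering steps inserted before the synthesis pass, which is why it runs slower and returns more sources.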

Setup is designed to be approachable even for non-experts, though it still requires standard developer prerequisites: Python 3.10+ for the backend and Node.js with npm for the GUI. Users clone the GitHub repository, run a Python environment setup command, install the React frontend dependencies, and create a .env file by copying .env.example to .env. This .env file is where API keys live—specifically a Gemini API key and a Bright Data key. The walkthrough also recommends using Google Antigravity as a “portal” for accessing and modifying the codebase, including Gemini model access.

Bright Data configuration is detailed and specific. Users create an API instance (the walkthrough focuses on the SERP API for search-engine results pages), choose a format (the agent currently expects light JSON), and then copy the direct API key into the .env file. On the Google side, users generate a Gemini API key via Google AI Studio, with the reminder that free credits eventually require pay-as-you-go billing.
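
Since the keys live in a plain env file, loading them needs nothing beyond the standard library. A minimal sketch, assuming key names like GEMINI_API_KEY and BRIGHTDATA_API_KEY (the actual names in the repository may differ):

```python
# Minimal .env parser using only the Python standard library.
# Key names used by the real project are an assumption here.
from pathlib import Path

def load_env(path: str = ".env") -> dict[str, str]:
    """Read KEY=VALUE pairs from a .env file, skipping blanks and comments."""
    env: dict[str, str] = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env
```

In practice many projects use a helper like python-dotenv for this; the hand-rolled version just makes the file format explicit.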

Once running, the interface provides a simple switch between modes and a history area (with one noted gap: past searches may not be viewable yet, even though they are saved). In a live example about whether the U.S. will stop minting the penny, the basic mode returns an executive summary quickly and then expands into a more comprehensive analysis, including timelines, discrepancies across sources, and implications for commerce and consumers. The results include a list of sources such as Wikipedia, USA Today, and usmint.gov, each clickable for verification.

Switching to deep discovery increases both depth and source volume. A Ford reliability question yields a structured reliability profile, mileage and cost framing, and generation-specific breakdowns, with many citations drawn from automotive sites and social sources. The walkthrough emphasizes that key behaviors are adjustable: number of sources gathered, the exact Bright Data API used, how information is distributed, and which LLM model handles reasoning. Outputs land in an outputs folder as saved markdown files (answer.md plus sources), making the agent’s work auditable and reusable.
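
The auditable-output pattern described above—each run leaving answer.md plus a sources file in an outputs folder—can be sketched in a few lines. The exact filenames and layout are assumptions inferred from the walkthrough, not verified against the repo:

```python
# Sketch of the save-every-run pattern: answer.md plus a sources list
# written into an outputs folder. File layout is an assumption.
from pathlib import Path

def save_run(answer_md: str, sources: list[str], out_dir: str = "outputs") -> Path:
    """Persist one research run as auditable markdown files."""
    run_dir = Path(out_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "answer.md").write_text(answer_md)
    # One clickable source per line, markdown-list style.
    (run_dir / "sources.md").write_text("\n".join(f"- {url}" for url in sources))
    return run_dir
```

Because the outputs are plain markdown, prior runs can be diffed, searched, or fed back into other tools.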

Overall, the project’s value isn’t just “better research.” It’s control: a locally running, source-grounded research workflow that can be modified, extended, and learned from—turning web research into something users can inspect, tweak, and build upon rather than treat as a black box.

Cornell Notes

The project delivers an open-source, locally running AI research agent that performs web searches and scraping, then uses Gemini to synthesize answers with verifiable citations. It runs in two modes: a fast “basic” mode for SERP-style search plus synthesis, and a slower “deep discovery” mode that performs multi-step research using scrapers and social sources like Reddit (and sometimes X.com). Bright Data powers the web access and returns data in JSON (the agent is set up for “light JSON”). The setup requires Python 3.10+ and Node.js/npm for the backend and React GUI, plus API keys stored in a .env file. The agent saves outputs automatically, including an answer markdown file and a sources list, making results auditable and customizable.

What makes this research agent different from using a generic AI search/chat tool?

It’s designed for customization and transparency. Bright Data handles web search and scraping, while Gemini produces the reasoning and synthesis. The interface provides two research modes and, crucially, every claim is backed by clickable source links. Results are also saved automatically into an outputs folder (with separate answer.md and sources for basic vs deep discovery), so users can inspect what the agent used rather than relying on an opaque response.

How do the “basic fast mode” and “deep discovery” modes differ in practice?

Basic fast mode performs a classic SERP search plus synthesis, aiming for quick executive summaries with relatively few steps. Deep discovery runs advanced multi-step research using deep search agents, scrapers, and social search. The walkthrough notes deep discovery can take longer (around a minute or more) and returns a much larger set of sources and a more detailed reliability/timeline-style analysis.

What role does Bright Data play, and which Bright Data API is emphasized?

Bright Data is the “engine” for gathering and scraping sources from the web. The walkthrough focuses on creating a SERP API instance that returns HTML or JSON from search-engine results. The agent is configured to accept “light JSON,” and the chosen API name must match what’s set in the .env file (e.g., SERP API 1).
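
How a SERP-style request might be assembled before it is sent is sketched below. Everything here—the zone name "serp_api1", the field names, and the target-URL pattern—is an assumption for illustration; consult Bright Data's documentation for the real request schema. No network call is made:

```python
# Illustrative payload construction for a Bright Data SERP-style request.
# Zone/field names are assumptions, not the project's or Bright Data's actual schema.

def build_serp_request(query: str, zone: str = "serp_api1") -> dict:
    """Assemble a request payload; the zone must match the API name in the .env file."""
    return {
        "zone": zone,
        # Target a search-results page for the query.
        "url": f"https://www.google.com/search?q={query.replace(' ', '+')}",
        # The agent expects lightweight JSON rather than raw HTML.
        "format": "json",
    }
```

This is also where the "API name must match the env file" requirement bites: if the zone string in the payload and the configured name diverge, the request fails even with a valid key.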

What does the setup process require before the agent can run locally?

Users need Python 3.10+ for the backend and Node.js plus npm for the Lemon Agent GUI. They clone the GitHub repository, run commands to set up the Python environment and install the React frontend, then create .env by copying .env.example. The .env file stores API keys (Gemini API key and Bright Data API key) and related configuration like the Bright Data API name and data format.
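
A plausible shape for that file, with every key name and value here being a placeholder guess rather than the repository's actual variables:

```
# Hypothetical .env layout — variable names are illustrative only.
GEMINI_API_KEY=your-gemini-key-here
BRIGHTDATA_API_KEY=your-brightdata-key-here
BRIGHTDATA_API_NAME=serp_api1
DATA_FORMAT=json
```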

How does the agent handle citations and saved research outputs?

The UI presents sources at the end of each answer, with links that take users directly to the referenced pages (including examples like Wikipedia, USA Today, and usmint.gov). Saved outputs are written to an outputs folder as markdown files—answer.md plus sources—so users can revisit prior research runs. The walkthrough also notes a history-view feature may still be incomplete for viewing past research in the UI, even though saving works.

What kinds of customization are explicitly called out as adjustable?

The walkthrough highlights that users can adjust how many sources are gathered at once, which Bright Data API is called, how information is distributed, and which LLM model performs reasoning (it mentions Gemini 3 Flash preview and suggests swapping to Gemini 3 Pro or other prompting strategies). It also notes that the data format could be modified (e.g., using raw HTML or screenshots) if the code and UI are updated accordingly.
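
Those knobs naturally collect into one configuration object. A minimal sketch, assuming field names and defaults that are illustrative rather than taken from the project:

```python
# Hypothetical config gathering the knobs the walkthrough calls adjustable.
# Field names and defaults are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class AgentConfig:
    max_sources: int = 10                   # how many sources to gather per run
    brightdata_api: str = "serp_api1"       # which Bright Data API/zone to call
    data_format: str = "json"               # "json" today; raw HTML would need code changes
    model: str = "gemini-3-flash-preview"   # swap for a stronger model if desired

# Example: widen the source pool and upgrade the reasoning model.
cfg = AgentConfig(max_sources=25, model="gemini-3-pro")
```

Centralizing these values is what makes the "modify and extend" pitch practical: swapping models or source counts becomes a one-line change instead of a code hunt.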

Review Questions

  1. What specific components (APIs and model layer) power the agent’s web research and reasoning, and how do they connect?
  2. Compare the expected output structure and citation behavior between basic fast mode and deep discovery mode.
  3. Why is the env file central to running the agent, and what kinds of values must it contain?

Key Points

  1. The agent combines Bright Data web access with Gemini reasoning to produce source-backed research answers.

  2. Two modes are built in: a fast SERP-search-and-synthesis mode and a slower multi-step deep discovery mode using scrapers and social sources.

  3. Running locally requires Python 3.10+ plus Node.js/npm for the React GUI, along with cloning the GitHub repo and running backend/frontend setup commands.

  4. API keys and configuration live in a .env file created from .env.example, including the Gemini API key and Bright Data key (and the Bright Data API name/data format).

  5. Bright Data configuration in the walkthrough emphasizes the SERP API with “light JSON,” which the agent currently expects.

  6. Outputs are automatically saved to an outputs folder as markdown files (answer.md and sources), and the UI provides clickable citations for verification.

  7. Key behaviors—source count, which Bright Data API is used, information distribution, and the LLM/prompting approach—are designed to be customizable.

Highlights

Every generated claim is paired with a verifiable, clickable source link, and results are saved automatically for later review.
Deep discovery is built for multi-step research (including social sources like Reddit), not just quick summarization.
Bright Data’s SERP API configuration (including “light JSON” and matching API names in .env) is a central requirement for the agent to scrape and search effectively.
The project is intentionally designed to be modifiable—users can change models, prompting, and how much source material gets pulled in.

Topics

  • Open-Source Research Agent
  • Local AI Setup
  • Bright Data Scraping
  • Gemini Reasoning
  • API Key Configuration
