SQL AI Agents: Analyze Relational Databases with Natural Language using Llama 3 (LLM) and CrewAI
Based on Venelin Valkov's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
AI agents can turn natural-language questions into SQL queries, pull results from a relational database, and then generate a readable analysis and executive summary—using Llama 3 (via Groq) plus CrewAI. The core payoff is practical: companies already store valuable information in SQL tables, so an agent team can query that “private” data directly instead of relying on static documents or manual dashboards.
The workflow starts with a database developer agent that’s equipped with SQL-specific tools. Those tools let the agent (1) list available tables, (2) fetch table schemas and sample rows, (3) run arbitrary SQL queries against the database, and (4) repair malformed SQL using an LLM-powered query checker. Because the SQL execution tool can be risky in production, the setup emphasizes caution—especially around limiting query types to avoid harmful operations.
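As a concrete illustration, those tools can be thin wrappers around LangChain's SQL database toolkit, exposed to CrewAI through its @tool decorator. This is a minimal sketch rather than the video's exact code: the sqlite:///salaries.db URI is an assumption, and the llm object used by the query checker comes from the Groq setup shown later.

```python
# Minimal sketch: wrap LangChain's SQL tools so CrewAI agents can call them.
# Assumes a SQLite file `salaries.db` and an `llm` object (see the Groq setup below).
from langchain_community.utilities.sql_database import SQLDatabase
from langchain_community.tools.sql_database.tool import (
    ListSQLDatabaseTool,
    InfoSQLDatabaseTool,
    QuerySQLDataBaseTool,
    QuerySQLCheckerTool,
)
from crewai_tools import tool

db = SQLDatabase.from_uri("sqlite:///salaries.db")

@tool("list_tables")
def list_tables() -> str:
    """List every table available in the database."""
    return ListSQLDatabaseTool(db=db).invoke("")

@tool("tables_schema")
def tables_schema(tables: str) -> str:
    """Return the schema and sample rows for the given comma-separated tables."""
    return InfoSQLDatabaseTool(db=db).invoke(tables)

@tool("execute_sql")
def execute_sql(sql_query: str) -> str:
    """Run a SQL query against the database and return the rows."""
    return QuerySQLDataBaseTool(db=db).invoke(sql_query)

@tool("check_sql")
def check_sql(sql_query: str) -> str:
    """Ask the LLM to validate and, if needed, repair a SQL query before execution."""
    return QuerySQLCheckerTool(db=db, llm=llm).invoke({"query": sql_query})
```

A natural place for the safety guardrails mentioned above is inside execute_sql, for example rejecting anything that is not a plain SELECT before the query ever reaches the database.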
Once the database developer produces a correct SQL query and retrieves data, the results feed a data analyst agent. The analyst's job is to interpret the returned rows in the context of the user's question and write a detailed markdown report (the transcript notes these reports can run quite long). A third agent, the senior report editor, then compresses that analysis into a short executive summary (kept under 100 words in the example), focusing on the most important findings.
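In that spirit, the three agents might be declared as follows. The role and backstory text is paraphrased, not the video's exact wording; llm and the tool functions come from the surrounding snippets.

```python
from crewai import Agent

# Generates, validates, and executes SQL via the wrapped tools.
sql_dev = Agent(
    role="Senior Database Developer",
    goal="Construct and execute SQL queries that answer the user's question",
    backstory="An expert who inspects schemas and validates every query before running it.",
    llm=llm,
    tools=[list_tables, tables_schema, execute_sql, check_sql],
)

# Interprets the returned rows in the context of the user's question.
data_analyst = Agent(
    role="Senior Data Analyst",
    goal="Analyze the rows returned by the database developer",
    backstory="Writes detailed, well-structured markdown analyses.",
    llm=llm,
)

# Compresses the analysis into a decision-ready summary.
report_editor = Agent(
    role="Senior Report Editor",
    goal="Condense the analysis into an executive summary of fewer than 100 words",
    backstory="Known for crisp, decision-ready summaries.",
    llm=llm,
)
```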
Implementation-wise, the demo uses a Google Colab notebook. It installs LangChain components (including the Groq integration) and CrewAI, sets a Groq API key, and loads a CSV dataset from Hugging Face (called "demos series" in the transcript, likely a garbled transcription of the dataset's actual name). The CSV contains 376 salary examples with fields like experience level, employment type, job title, company location, remote ratio, and salary in USD. The data is converted into a SQLite database using SQLAlchemy, creating a table named salaries.
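The conversion step itself is a few lines of pandas plus SQLAlchemy; the file name below is a placeholder for whatever the downloaded CSV is actually called.

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder file name; substitute the actual CSV downloaded from Hugging Face.
df = pd.read_csv("salaries.csv")  # 376 rows in the demo

# Materialize the DataFrame as a SQLite table named `salaries`.
engine = create_engine("sqlite:///salaries.db")
df.to_sql("salaries", engine, index=False, if_exists="replace")
```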
For the LLM layer, the setup initializes a Groq chat model running Llama 3. The transcript's garbled "1.370 billion parameter" figure almost certainly refers to the 70-billion-parameter variant that Groq serves as llama3-70b-8192. A callback handler is also configured to capture start/end events for visibility into what's happening during agent runs.
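A minimal version of that setup, assuming the model id is Groq's llama3-70b-8192 and using a bare-bones callback handler:

```python
import os
from langchain_groq import ChatGroq
from langchain_core.callbacks import BaseCallbackHandler

class LLMCallbackHandler(BaseCallbackHandler):
    """Print start/end events so each LLM call is visible during agent runs."""

    def on_llm_start(self, serialized, prompts, **kwargs):
        print(f"LLM call started with {len(prompts)} prompt(s)")

    def on_llm_end(self, response, **kwargs):
        print("LLM call finished")

llm = ChatGroq(
    model="llama3-70b-8192",  # assumed model id; check Groq's current model list
    temperature=0,
    api_key=os.environ["GROQ_API_KEY"],
    callbacks=[LLMCallbackHandler()],
)
```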
The crew runs sequentially (with memory disabled) and answers two example questions. For "effects on salary in USD based on company location, company size, and employee experience," the database developer generates a query that computes average salaries grouped by those dimensions. The analyst then produces a detailed report concluding that the United States tends to have the highest average salaries, that large companies and executive-level experience correlate with higher pay, and that entry-level roles and small companies trend lower. For "how is the machine learning engineer salary in USD affected by remote positions," the system generates a simpler query filtered to the machine learning engineer job title, then compares salary in USD across remote and non-remote categories. The analyst reports that remote roles show higher average salaries and notes a moderate positive correlation.
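Wired together, the tasks chain each agent's output into the next, and the crew executes them in order. The task prompts here are paraphrased, not the video's exact wording:

```python
from crewai import Task, Crew, Process

extract_data = Task(
    description="Extract the data needed to answer: {query}.",
    expected_output="The result rows returned by the database.",
    agent=sql_dev,
)

analyze_data = Task(
    description="Analyze the extracted data in the context of: {query}.",
    expected_output="A detailed markdown analysis report.",
    agent=data_analyst,
    context=[extract_data],  # receives the SQL results
)

write_summary = Task(
    description="Write an executive summary of the analysis report.",
    expected_output="A markdown summary of fewer than 100 words.",
    agent=report_editor,
    context=[analyze_data],  # receives the analyst's report
)

crew = Crew(
    agents=[sql_dev, data_analyst, report_editor],
    tasks=[extract_data, analyze_data, write_summary],
    process=Process.sequential,
    memory=False,  # the demo runs without memory
)

result = crew.kickoff(inputs={
    "query": "Effects on salary (in USD) of company location, company size and employee experience"
})
print(result)
```

For the first question, the developer agent would plausibly emit something like SELECT company_location, company_size, experience_level, AVG(salary_in_usd) FROM salaries GROUP BY company_location, company_size, experience_level, though the exact SQL an LLM generates can vary from run to run.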
Overall, the approach demonstrates a repeatable pattern: natural-language → SQL generation/validation → database retrieval → analysis → executive summary, all orchestrated by a small, purpose-built CrewAI team tied directly to relational data.
Cornell Notes
A CrewAI team can answer natural-language questions by querying a relational SQL database and then producing both an analysis report and an executive summary. The database developer agent uses LangChain-wrapped SQL tools to list tables, inspect schemas, run SQL, and fix invalid queries with an LLM-based query checker. Retrieved results feed a data analyst agent, which writes a detailed markdown report tied to the user question. A senior report editor then condenses that report into a short summary (under 100 words in the demo). This matters because it connects LLM reasoning directly to private, structured data stored in SQL rather than relying on prewritten text or manual analysis.
How does the system translate a user’s natural-language question into a working SQL query?
Why is a query checker tool important in an agent-driven SQL workflow?
What role does the data analyst agent play after SQL results are returned?
How does the senior report editor change the output format and length?
What dataset and database setup does the demo use to make the SQL queries concrete?
What safety concern is raised about letting agents execute arbitrary SQL?
Review Questions
- What specific tools does the database developer agent use, and how do they work together to produce valid SQL results?
- How do the analyst and report editor roles differ in output content and length?
- What kinds of salary factors does the demo test in its first example question, and how are those factors reflected in the SQL query logic?
Key Points
1. Use a multi-agent pipeline where one agent generates/validates SQL, another interprets results, and a third produces an executive summary.
2. Wrap SQL capabilities into agent-friendly tools: table listing, schema inspection, SQL execution, and LLM-based query correction.
3. Connect LLMs to SQL through a relational database layer (the demo uses SQLite via SQLAlchemy) so agents can query structured data directly.
4. Treat unrestricted SQL execution as a production risk; add guardrails to limit query types and scope.
5. Feed SQL results into a dedicated analysis agent to produce a detailed markdown report grounded in the retrieved rows.
6. Compress long analyses with a separate summarization agent to produce short, decision-ready outputs.
7. Run the crew sequentially for predictable data flow: SQL → analysis → summary.