
100% Local Super Easy Private Email RAG Setup | Ollama - Gmail ++

All About AI · 5 min read

Based on All About AI's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

A local email RAG workflow can ingest Gmail and Outlook messages, embed them locally with Ollama, and answer questions offline.

Briefing

A fully local “email RAG” workflow lets users pull selected messages from Gmail and Outlook, embed them on their own machine with Ollama, and then ask questions or generate summaries offline—without sending email content to a third-party service. The core value is speed and privacy: emails are downloaded, cleaned, chunked, embedded into a local vault, and then queried through a local retrieval pipeline powered by a Llama model.

The setup supports two main ways to choose what gets ingested. First, users can pass a keyword filter to retrieve all emails whose bodies contain that term; the system reports how many matches were found in Gmail and Outlook, then embeds the resulting text into the local vault. Second, users can define a date range (example: May 1 to May 10) to capture recent conversations, again embedding everything that falls within the window. After ingestion, the vault is reloaded so the system can answer questions like “summarize what me and [someone] talked about” for a specific thread topic, producing a compact recap drawn from the embedded messages.
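The two selection modes map naturally onto IMAP's SEARCH criteria. As a rough sketch (the function name and exact query construction here are illustrative, not taken from the video's script), a helper might build the criteria string like this:

```python
from datetime import date

def build_imap_criteria(keyword=None, start=None, end=None):
    """Build an IMAP SEARCH criteria string for keyword and/or
    date-range selection (hypothetical helper)."""
    parts = []
    if keyword:
        # BODY matches messages whose body text contains the keyword
        parts.append(f'BODY "{keyword}"')
    if start:
        # SINCE is inclusive of the given day
        parts.append(f'SINCE {start.strftime("%d-%b-%Y")}')
    if end:
        # BEFORE is exclusive: pass the day after the last day you want
        parts.append(f'BEFORE {end.strftime("%d-%b-%Y")}')
    return " ".join(parts) if parts else "ALL"
```

Combining both modes, as the transcript demonstrates with "Nvidia" over May 1–10, would correspond to a criteria string like `BODY "Nvidia" SINCE 01-May-2024 BEFORE 11-May-2024`.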

A key operational detail is caching embeddings to avoid recomputation every time the user starts a new session. The workflow includes a “clear cache” step: when embeddings already exist, the system loads them and begins answering immediately; when the vault changes or the user clears the cache, it regenerates embeddings using the configured embedding model. This makes iterative searching practical—users can wipe prior results, ingest a new keyword set (for example, “Taylor Swift” or “Nvidia”), clear the cache, and then query for concrete answers such as the concert venue and dates.

Under the hood, the pipeline is a two-part process. One part collects emails, strips out messy HTML and stray characters, and converts content into clean plain text. The transcript notes HTML-to-text extraction and cleaning steps (including removal of unwanted character sequences, whitespace normalization, and an lxml-based approach) before chunking the text for embedding. The second part runs Ollama locally with a configurable model (defaulting to Llama 3) and a RAG configuration that points to local files such as the vault text, embeddings storage, and model settings.
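The cleaning-and-chunking half can be sketched with the standard library alone (the video mentions an lxml-based approach; this stdlib version is a stand-in, and the character filter and chunk size are illustrative choices):

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <style>/<script> blocks."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0
    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def clean_email_html(raw_html):
    """Convert an HTML email body to normalized plain text."""
    p = _TextExtractor()
    p.feed(raw_html)
    text = " ".join(p.parts)
    text = re.sub(r"[^\x20-\x7E\n]", " ", text)  # drop stray characters
    return re.sub(r"\s+", " ", text).strip()     # normalize whitespace

def chunk(text, size=1000):
    """Naive fixed-size chunking before embedding (illustrative)."""
    return [text[i:i + size] for i in range(0, len(text), size)]
```

The goal is simply that only readable prose, not markup or layout debris, reaches the embedding model.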

For Gmail access, the workflow requires Google “app passwords” rather than a standard password, created for a specific app name and used by the script to authenticate. Outlook access is described as simpler, using an Outlook server plus username and password, with the option to select which mailbox scope to ingest (the transcript mentions inbox).
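In Python's stdlib this maps onto `imaplib`. The host names below are the providers' documented IMAP servers, but everything else (function names, mailbox choice) is an illustrative sketch rather than the video's actual script:

```python
import imaplib

def connect_gmail(user, app_password):
    """Log in to Gmail over IMAP using an app password
    (not the normal account password)."""
    # Google displays app passwords in groups of four characters;
    # spaces can be dropped before use
    pwd = app_password.replace(" ", "")
    conn = imaplib.IMAP4_SSL("imap.gmail.com")
    conn.login(user, pwd)
    conn.select("INBOX")  # the transcript mentions ingesting the inbox
    return conn

def connect_outlook(user, password, server="outlook.office365.com"):
    """Outlook is described as simpler: a server plus normal credentials."""
    conn = imaplib.IMAP4_SSL(server)
    conn.login(user, password)
    conn.select("INBOX")
    return conn
```

The key practical point survives the sketch: Gmail will reject the regular account password here, so an app password must be generated for a specific app name first.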

The creator also flags a security uncertainty: while the system is designed for local processing, it’s not presented as a guaranteed secure design, and users are encouraged to report concerns if email-handling libraries or authentication steps introduce risk. The session ends with a demonstration of the same local email RAG pattern on new searches, plus a separate member-repo highlight: an “LLM text analyzer” that runs fully locally to summarize uploaded text files via a simple web interface using Ollama.

Cornell Notes

The workflow builds a 100% local email RAG system using Ollama. It downloads selected Gmail and Outlook emails, cleans and chunks the text, embeds it into a local vault, and then answers questions or produces summaries offline. Users can ingest by keyword (searching email bodies) or by date range (e.g., May 1–May 10), and the system reports how many matches were found in each provider. Embeddings are cached to speed up repeated queries; clearing the cache forces regeneration when the vault changes. Gmail requires Google app passwords for authentication, while Outlook uses standard credentials with an Outlook server setting.

How does the system decide which emails to ingest from Gmail and Outlook?

It supports two selection modes. One mode takes a keyword argument and retrieves emails whose bodies contain that keyword, then embeds the matching messages (with counts reported separately for Gmail and Outlook). Another mode uses a start date and end date to ingest emails within a time window (example given: May 1 to May 10), again embedding everything found in that range. The transcript also demonstrates combining keyword and date-range ingestion (e.g., May 1–May 10 with keyword “Nvidia”).

What makes repeated querying fast instead of slow every time?

Embeddings are cached. After ingestion, the system stores embeddings locally (in an embeddings file) and reloads them on subsequent runs, so it can start retrieval and answering without re-embedding the same email text. A “clear cache” step is used when switching to a new set of emails or when the vault changes, forcing regeneration of embeddings.

What does the email text processing pipeline do before embedding?

It cleans and normalizes email content. The transcript describes stripping stray characters and HTML artifacts, converting HTML to text, and using functions that remove unwanted character sequences and collapse extra whitespace. It also mentions using lxml to remove unwanted structures, aiming to produce cleaner plain text before chunking for embedding.

How are Gmail credentials handled compared with Outlook credentials?

Gmail requires Google “app passwords.” The workflow logs into the Gmail account, creates an app password for a specific app name, and uses the resulting ~16-character password in the script. Outlook is described as easier: the script uses an Outlook server plus username and password, and the transcript mentions selecting inbox as the ingested scope.

Which local model and configuration knobs control the RAG behavior?

The default local model is Llama 3, with an option to override it via a “--model” flag. The configuration includes a vault text path, embeddings file path, model name, and a “top K” parameter (the transcript suggests top K can be adjusted, with an example value of 7). A system message, base URL, and API key appear in the config as well.
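The retrieval side of that configuration reduces to scoring every vault chunk against the query embedding and keeping the top K. A minimal sketch, using plain cosine similarity (the scoring method is an assumption; k=7 matches the example value from the transcript):

```python
import math

def top_k(query_vec, vault_vecs, texts, k=7):
    """Return the k vault chunks most similar to the query embedding,
    ranked by cosine similarity."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0
    scored = sorted(zip(vault_vecs, texts),
                    key=lambda p: cos(query_vec, p[0]), reverse=True)
    return [t for _, t in scored[:k]]
```

The selected chunks are then placed into the prompt (along with the configured system message) before the Llama model generates its answer, which is why raising or lowering top K trades context breadth against prompt size.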

What kinds of questions does the system answer after ingestion?

It supports both summarization and factual retrieval from the embedded email text. Examples include summarizing a conversation topic (e.g., a discussion about “scrum V2 beta” and related items) and answering specific questions like the venue and dates for a concert mentioned in emails (e.g., “Friends Arena in Stockholm” with May 17–19).

What security caveat is raised about the approach?

The transcript includes uncertainty about security guarantees. The creator notes they are not a cybersecurity expert and asks viewers to flag potential security issues—especially around email handling libraries and authentication—so the system can be improved or removed if it’s not safe.

Review Questions

  1. When would you need to clear the embedding cache, and what benefit does caching provide during repeated searches?
  2. Describe the two email ingestion modes and give one example of each from the transcript.
  3. What preprocessing steps are applied to email content before chunking and embedding?

Key Points

  1. A local email RAG workflow can ingest Gmail and Outlook messages, embed them locally with Ollama, and answer questions offline.
  2. Email selection can be done by keyword (body search) or by date range, with separate Gmail/Outlook match counts.
  3. Embedding caching speeds up repeated queries; clearing the cache forces re-embedding after vault changes.
  4. Email ingestion includes cleaning and HTML-to-text conversion, followed by chunking and normalization before embedding.
  5. Gmail authentication uses Google app passwords created for a specific app name, while Outlook uses an Outlook server plus username/password.
  6. RAG behavior is configurable via local Ollama settings such as the model (default Llama 3) and retrieval parameters like top K.
  7. The approach is presented as local and private, but security is not guaranteed; users are encouraged to report risks.

Highlights

The system turns private emails into a local searchable vault: download → clean → chunk → embed → query offline.
A “clear cache” mechanism prevents re-embedding on every run, making iterative keyword/date searches practical.
Gmail access relies on Google app passwords, while Outlook uses an Outlook server credential flow.
After ingesting “Taylor Swift” emails, the system answers with a specific venue and date range pulled from message text.
A separate member demo shows a local “LLM text analyzer” that summarizes uploaded files via a simple interface using Ollama.

Topics

  • Local Email RAG
  • Ollama Embeddings
  • Gmail App Passwords
  • Outlook Ingestion
  • Llama 3 Retrieval

Mentioned

  • RAG