
Smart Second Brain for Obsidian (Free & Offline)

Prakash Joshi Pax · 5 min read

Based on Prakash Joshi Pax's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Smart Second Brain can run offline by pairing the Obsidian plugin with the local Ollama LLM runner.

Briefing

A privacy-first “second brain” for Obsidian can run entirely offline by pairing the Obsidian plugin Smart Second Brain with locally hosted large language models (LLMs). The setup turns a personal Obsidian vault into a searchable knowledge base: users can chat with their notes, ask questions, generate summaries, and get suggestions with links back to the exact notes that informed the answers—without sending content to the cloud.

The workflow starts by installing an offline LLM runner called Ollama (from ollama.com). Ollama runs LLMs locally on Mac, Linux, and Windows, including models such as Llama 2, Gemma, and Dolphin Mixtral (as listed in the transcript). After Ollama is installed and running, the next step is adding the Obsidian plugin Smart Second Brain. Once enabled, the plugin can auto-start with Obsidian and lets users exclude specific files or folders from indexing to avoid long waits, which matters because indexing time scales with vault size.
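Ollama exposes a local HTTP API (by default on port 11434) that clients such as Smart Second Brain talk to. As a rough illustration of what a request to that API looks like, here is a minimal sketch using the `/api/generate` endpoint; the helper names are my own, and it assumes a model like `llama2` has already been pulled:

```python
import json
import urllib.request

# Ollama listens on localhost:11434 by default; local clients
# (including Obsidian plugins) talk to this same API.
OLLAMA_URL = "http://localhost:11434"

def build_generate_request(model: str, prompt: str):
    """Build a (url, body) pair for Ollama's /api/generate endpoint."""
    body = json.dumps({
        "model": model,    # e.g. "llama2" or "mixtral"
        "prompt": prompt,
        "stream": False,   # ask for one complete JSON response
    }).encode("utf-8")
    return f"{OLLAMA_URL}/api/generate", body

def ask(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama instance."""
    url, body = build_generate_request(model, prompt)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Requires `ollama serve` running and the model pulled locally.
    print(ask("llama2", "Summarize the Zettelkasten method in one line."))
```

Nothing here leaves the machine: the request goes to localhost, which is the whole privacy argument of the setup.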

Smart Second Brain then connects to Ollama for two key model roles. First is the embedding model, which indexes the vault so the system can retrieve relevant notes when a question is asked. The transcript highlights mxbai-embed-large as the recommended embedding choice. Second is the chat (language) model, which generates the final response. The plugin also includes retrieval controls such as similarity and “creativity,” plus an option to use third-party services (notably OpenAI), though the demonstration focuses on local Ollama for offline privacy.
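The two roles above form a standard retrieval-augmented pipeline: embed once at indexing time, then score notes against each question and hand the best matches to the chat model. A minimal sketch of that flow, using toy bag-of-words vectors in place of a real embedding model such as mxbai-embed-large (function names and the threshold are illustrative, not the plugin's actual code):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector. A real setup would
    call an embedding model (e.g. mxbai-embed-large via Ollama)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def index_vault(notes: dict) -> dict:
    """Indexing step: embed every note once, up front."""
    return {title: embed(body) for title, body in notes.items()}

def retrieve(query: str, index: dict, similarity: float = 0.3):
    """Return (title, score) pairs above the similarity threshold,
    best match first: the notes the chat model would be shown."""
    q = embed(query)
    scored = [(t, cosine(q, v)) for t, v in index.items()]
    return sorted(
        [(t, s) for t, s in scored if s >= similarity],
        key=lambda ts: -ts[1],
    )

vault = {
    "How to Read a Book": "read books actively take notes while you read",
    "Morning Routine": "wake early exercise then plan the day",
}
index = index_vault(vault)
print(retrieve("how to read", index))
```

This also shows why indexing time scales with vault size: every note must be embedded before any question can be answered, which is why the plugin lets you exclude folders.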

After installing the chosen models, Smart Second Brain initializes and indexes the vault. In the demo, a vault with roughly 250 notes indexed in minutes, but larger vaults can take longer. Once indexing completes, the “octopus” toggle determines whether answers draw from the vault’s indexed notes. With the toggle off, the assistant replies using its general training. With the toggle on, it retrieves relevant notes and can create clickable references back into the vault.

The transcript also surfaces limitations and tuning behavior. When retrieval fails, the assistant may produce answers that don’t match the vault or may generate links to notes that don’t exist, an issue the creator hopes to see fixed. Users can adjust similarity and creativity to improve retrieval quality. Switching chat models (e.g., from Gemma to Mixtral) forces reindexing and changes answer accuracy; the demo reports better alignment after reindexing with Mixtral.

Model choice affects both quality and hardware demands. The transcript compares Mixtral’s performance to GPT-3.5-class results, but notes it is resource-heavy (around 25GB), requiring a powerful PC. For the best balance, the demonstrator settles on Llama 2 as the chat model paired with mxbai-embed-large as the embedding model, citing improved answers and correct references.

Beyond Q&A, the plugin supports saving chats back into the vault as notes, and deleting chats when needed. Overall, the core value is a conversational Obsidian experience grounded in personal notes—fast enough to iterate, configurable enough to tune, and private enough to keep everything local.

Cornell Notes

Smart Second Brain turns an Obsidian vault into a local, offline Q&A assistant by indexing notes with an embedding model and answering with a locally hosted LLM via Ollama. The key privacy feature is that vault content stays on-device, with answers optionally grounded in retrieved notes and linked back to the exact sources. Setup requires installing Ollama, installing the Smart Second Brain plugin, selecting a chat model and an embedding model, then initializing and indexing the vault. Retrieval quality depends on the similarity and creativity settings, and switching chat models triggers reindexing. The demo reports that Llama 2 + mxbai-embed-large works well, while Mixtral can improve results but demands much more compute and storage.

How does Smart Second Brain keep answers grounded in a user’s own Obsidian notes?

It relies on two model roles. An embedding model indexes the vault so the system can retrieve relevant notes for each question. A chat model then generates the response using the retrieved context. When the “octopus” toggle is enabled, the assistant uses the indexed vault context; when disabled, it answers from general training data. The transcript also notes that answers can include clickable references that take the user to the specific notes used.

Why do similarity and creativity settings matter when asking questions?

Retrieval can fail if the system can’t find sufficiently relevant notes. The transcript shows this when asking “how to read,” where the assistant initially returns no retrieved notes. Lowering similarity and adjusting creativity changes how many notes are retrieved and how the response is formed. Later, the user tunes similarity (e.g., lowering to around 30, then increasing) to get the assistant to retrieve the right notes and produce answers closer to the vault content.

What happens when switching chat models, like Gemma to Mixtral?

Switching the chat model requires reinitializing and reindexing the vault. The transcript reports that indexing ~250 notes took a couple of minutes with one setup, but after switching to Mixtral it took about 6–7 minutes. After reindexing, the assistant’s answers to questions like “under hour rule” improved and included references to the correct vault note.

What are the main limitations observed in the vault-grounded mode?

Two issues appear in the transcript. First, retrieval isn’t perfect—sometimes the assistant can’t retrieve the expected note, leading to less accurate answers. Second, the assistant may create links to notes that don’t exist in the vault. The demonstrator flags this as a current limitation and expects improvements from future plugin updates.

How do hardware requirements influence which local model to choose?

The transcript contrasts model quality with resource needs. Mixtral is described as performing on par with GPT-3.5-class results, but it is large (about 25GB) and requires a powerful PC. Smaller local options like Llama 2 and Mistral are presented as more practical, and the demonstrator ultimately recommends Llama 2 as the chat model paired with mxbai-embed-large for embedding.

What additional workflow features does Smart Second Brain offer beyond chat?

The assistant can save chats into the vault as notes, so the conversation becomes part of the user’s Obsidian workspace. It also supports deleting chats. Additionally, users can toggle whether the assistant uses vault context at all, letting them choose between vault-grounded answers and purely model-based answers.
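Because an Obsidian vault is just a folder of markdown files, saving a chat amounts to writing a new `.md` file into it. A minimal sketch of the idea (the file name and transcript layout are assumptions, not the plugin's actual format):

```python
from datetime import date
from pathlib import Path

def save_chat_as_note(vault_dir: str, chat: list) -> Path:
    """Write a chat transcript into the vault as a markdown note.
    `chat` is a list of (speaker, text) pairs."""
    vault = Path(vault_dir)
    vault.mkdir(parents=True, exist_ok=True)
    note = vault / f"Chat {date.today().isoformat()}.md"
    # One paragraph per turn, with the speaker in bold.
    lines = [f"**{speaker}:** {text}" for speaker, text in chat]
    note.write_text("\n\n".join(lines), encoding="utf-8")
    return note

path = save_chat_as_note(
    "vault/Chats",
    [("You", "Summarize my reading notes."),
     ("Assistant", "Your notes emphasize active reading and note-taking.")],
)
print(path)
```

Once written, the note is indexed like any other on the next pass, so past conversations themselves become retrievable context.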

Review Questions

  1. What is the difference between the embedding model and the chat model in Smart Second Brain’s architecture?
  2. How would you troubleshoot a question that returns no relevant notes—what settings would you adjust first?
  3. Why does changing the chat model require reindexing, and how does that affect iteration time?

Key Points

  1. Smart Second Brain can run offline by pairing the Obsidian plugin with the local Ollama LLM runner.
  2. Vault privacy is maintained by indexing and answering locally, with an option to avoid using vault context entirely.
  3. Embedding models index note content for retrieval; chat models generate responses using retrieved context.
  4. Indexing time scales with vault size, so excluding unneeded folders/files can significantly speed setup.
  5. Retrieval quality depends on similarity and creativity; tuning these can turn “no notes retrieved” into accurate, referenced answers.
  6. Switching chat models forces reinitialization and reindexing, changing both latency and answer quality.
  7. Mixtral may deliver strong results but is resource-heavy (around 25GB), while Llama 2 + mxbai-embed-large is presented as a practical high-performing pairing.

Highlights

Smart Second Brain can chat with an Obsidian vault using locally hosted LLMs, keeping content off the cloud.
The “octopus” toggle controls whether answers are grounded in retrieved vault notes or generated from general model training.
Similarity and creativity settings directly affect whether the system retrieves the right notes and how closely answers match the vault.
Switching chat models (e.g., to Mixtral) triggers reindexing and can materially improve reference accuracy.
Mixtral’s performance comes with steep hardware demands, while Llama 2 + mxbai-embed-large is framed as a balanced default.
