Smart Second Brain for Obsidian (Free & Offline)
Based on Prakash Joshi Pax's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
A privacy-first “second brain” for Obsidian can run entirely offline by pairing the Obsidian plugin Smart Second Brain with locally hosted large language models (LLMs). The setup turns a personal Obsidian vault into a searchable knowledge base: users can chat with their notes, ask questions, generate summaries, and get suggestions with links back to the exact notes that informed the answers—without sending content to the cloud.
The workflow starts by installing an offline LLM runner called Ollama (from ollama.com). Ollama runs LLMs locally on Mac, Linux, and Windows, including models such as Llama 2, Gemma, and Dolphin Mixtral (as listed in the transcript). After Ollama is installed and running, the next step is adding the Obsidian plugin Smart Second Brain. Once enabled, the plugin can auto-start with Obsidian and lets users exclude specific files or folders from indexing to avoid long waits, which matters because indexing time scales with vault size.
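To sanity-check this step, a minimal Python sketch is shown below; it is not from the video, and it assumes Ollama's default local address (http://localhost:11434) and its /api/tags route, which lists the models already pulled.

```python
# Minimal sketch: confirm a local Ollama server is up and list pulled models.
# Assumes Ollama's default address (http://localhost:11434) and /api/tags route.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"

def list_local_models() -> list[str]:
    """Return the names of models already pulled into the local Ollama store."""
    with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags") as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]

if __name__ == "__main__":
    print("Local Ollama models:", list_local_models() or "none pulled yet")
```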
Smart Second Brain then connects to Ollama for two key model roles. First is the embedding model, which indexes the vault so the system can retrieve relevant notes when a question is asked; the transcript recommends mxbai-embed-large for this role. Second is the chat (language) model, which generates the final response. The plugin also exposes retrieval controls such as similarity and “creativity,” plus an option to use third-party services (notably OpenAI), though the demonstration sticks with local Ollama for offline privacy.
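As a rough picture of how the two roles divide the work, the sketch below calls Ollama's /api/embeddings route for the embedding model and /api/generate for the chat model. The routes and payload fields come from Ollama's public HTTP API; the wiring itself is an assumption for illustration, not the plugin's actual code.

```python
# Sketch of the two model roles against Ollama's HTTP API. The model tags and
# the overall wiring are assumptions, not code from the Smart Second Brain plugin.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"

def _post(route: str, payload: dict) -> dict:
    req = urllib.request.Request(
        f"{OLLAMA_URL}{route}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def embed(text: str) -> list[float]:
    # Embedding role: turn note (or question) text into a vector for retrieval.
    return _post("/api/embeddings",
                 {"model": "mxbai-embed-large", "prompt": text})["embedding"]

def generate(prompt: str, temperature: float = 0.3) -> str:
    # Chat role: turn the final prompt into the answer the user sees.
    return _post("/api/generate",
                 {"model": "llama2", "prompt": prompt, "stream": False,
                  "options": {"temperature": temperature}})["response"]
```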
After installing the chosen models, Smart Second Brain initializes and indexes the vault. In the demo, a vault of roughly 250 notes indexed in minutes, but larger vaults can take longer. Once indexing completes, the “octopus” toggle determines whether answers draw on the vault’s indexed notes. With the toggle off, the assistant replies from its general training. With the toggle on, it retrieves relevant notes and can create clickable references back into the vault.
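The toggle's two modes can be made concrete with a short retrieval sketch. It builds on embed() and generate() from the previous sketch, and its index layout (note path mapped to an embedding vector and the note's text) and prompt wording are illustrative rather than the plugin's own.

```python
# Sketch of the "octopus" toggle's two modes, reusing embed() and generate()
# from the previous sketch. `index` maps a note's vault path to a tuple of
# (embedding_vector, note_text); this layout is invented for illustration.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def answer(question: str, index: dict, use_vault: bool, k: int = 3) -> str:
    if not use_vault:                      # toggle off: general training only
        return generate(question)
    q_vec = embed(question)                # toggle on: retrieve, then answer
    ranked = sorted(index.items(),
                    key=lambda item: cosine(q_vec, item[1][0]), reverse=True)
    context = "\n\n".join(
        f"[[{path}]]\n{text}" for path, (_, text) in ranked[:k])
    return generate(
        "Answer from these notes and cite them with [[wikilinks]]:\n"
        f"{context}\n\nQuestion: {question}")
```

With this shape, a [[path]] reference renders as a clickable link back into the vault once the answer is viewed in Obsidian.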
The transcript also surfaces limitations and tuning behavior. When retrieval fails, the assistant may produce answers that don’t match the vault, or may generate links to notes that don’t exist, an issue the creator hopes to see fixed. Users can adjust similarity and creativity to improve retrieval quality. Switching chat models (e.g., from Gemma to Mixtral) forces reindexing and changes answer accuracy; the demo reports better alignment after reindexing with Mixtral.
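One plausible reading of these two knobs, continuing the sketch above: similarity acts as a cutoff on retrieval scores (set too high, nothing is retrieved), while creativity maps to the sampling temperature passed to the chat model. The threshold value and the mapping are assumptions for illustration.

```python
# Hypothetical mapping of the plugin's knobs onto the retrieval sketch above:
# a similarity cutoff decides which notes count as relevant, and "creativity"
# is treated as the temperature forwarded to generate().
SIMILARITY_THRESHOLD = 0.5  # lower this if questions keep retrieving no notes
CREATIVITY = 0.3            # raise for looser prose, lower for literal answers

def retrieve(q_vec: list[float], index: dict,
             threshold: float = SIMILARITY_THRESHOLD) -> list[str]:
    """Return paths of notes whose similarity to the question clears the bar."""
    scored = [(cosine(q_vec, vec), path) for path, (vec, _) in index.items()]
    return [path for score, path in sorted(scored, reverse=True)
            if score >= threshold]
```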
Model choice affects both quality and hardware demands. The transcript compares Mixtral’s performance to GPT-3.5-class results, but notes it is resource-heavy (around 25 GB), requiring a powerful PC. For the best balance, the demonstrator settles on Llama 2 as the chat model paired with mxbai-embed-large as the embedding model, citing improved answers and correct references.
Beyond Q&A, the plugin supports saving chats back into the vault as notes and deleting chats when no longer needed. Overall, the core value is a conversational Obsidian experience grounded in personal notes: fast enough to iterate, configurable enough to tune, and private enough to keep everything local.
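Saving a chat amounts to writing the transcript as a Markdown note inside the vault. The sketch below uses an invented Chats folder and timestamped filename, since the transcript does not describe the plugin's exact layout.

```python
# Sketch of saving a chat back into the vault as a plain Markdown note.
# The "Chats" folder and filename scheme are invented for illustration;
# the plugin's own naming may differ.
from datetime import datetime
from pathlib import Path

def save_chat(vault: Path, messages: list[tuple[str, str]]) -> Path:
    note = vault / "Chats" / f"{datetime.now():%Y-%m-%d %H-%M} chat.md"
    note.parent.mkdir(parents=True, exist_ok=True)
    note.write_text(
        "\n\n".join(f"**{role}:** {text}" for role, text in messages),
        encoding="utf-8",
    )
    return note
```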
Cornell Notes
Smart Second Brain turns an Obsidian vault into a local, offline Q&A assistant by indexing notes with an embedding model and answering with a locally hosted LLM via Ollama. The key privacy feature is that vault content stays on-device, with answers optionally grounded in retrieved notes and linked back to the exact sources. Setup requires installing Ollama, installing the Smart Second Brain plugin, selecting a chat model and an embedding model, then initializing and indexing the vault. Retrieval quality depends on similarity and creativity settings, and switching chat models triggers reindexing. The demo reports that Llama 2 + mxbai-embed-large works well, while Mixtral can improve results but demands much more compute and storage.
- How does Smart Second Brain keep answers grounded in a user’s own Obsidian notes?
- Why do similarity and creativity settings matter when asking questions?
- What happens when switching chat models, such as from Gemma to Mixtral?
- What are the main limitations observed in the vault-grounded mode?
- How do hardware requirements influence which local model to choose?
- What additional workflow features does Smart Second Brain offer beyond chat?
Review Questions
- What is the difference between the embedding model and the chat model in Smart Second Brain’s architecture?
- How would you troubleshoot a question that returns no relevant notes—what settings would you adjust first?
- Why does changing the chat model require reindexing, and how does that affect iteration time?
Key Points
1. Smart Second Brain can run offline by pairing the Obsidian plugin with the local Ollama LLM runner.
2. Vault privacy is maintained by indexing and answering locally, with an option to avoid using vault context entirely.
3. Embedding models index note content for retrieval; chat models generate responses using retrieved context.
4. Indexing time scales with vault size, so excluding unneeded folders/files can significantly speed setup.
5. Retrieval quality depends on similarity and creativity; tuning these can turn “no notes retrieved” into accurate, referenced answers.
6. Switching chat models forces reinitialization and reindexing, changing both latency and answer quality.
7. Mixtral may deliver strong results but is resource-heavy (around 25 GB), while Llama 2 + mxbai-embed-large is presented as a practical high-performing pairing.