What Your Vault Knows — Talks & Discussion
Based on the Obsidian Community Talks video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Briefing
A three-part discussion on extracting more value from Obsidian vaults, centered on a practical question: how computational methods can turn existing notes into better discovery, understanding, and assistance without forcing users to manually connect everything themselves. The strongest throughline was that "relatedness" can be computed rather than merely guessed, that vaults can be represented as graphs and embeddings, and that language models can be wired into Obsidian workflows via reusable "skills." Together, the ideas sketch a path from search and linking to tutoring, summarization, and even deterministic lookups.
Ben’s segment began with a classic document-similarity approach: represent each note as a weighted vector of terms using TF-IDF (term frequency–inverse document frequency). Words that appear frequently in one note but rarely across the vault get higher weights, producing an “encoding” of the note. Similarity then becomes a geometric problem: compare two vectors using cosine similarity, which measures the angle between them rather than raw distance—helpful when notes differ in length. The method is intentionally simple and intuitive, and it’s framed as a way to power “given a note, show me similar notes” inside Obsidian. Limitations surfaced quickly: it’s unlikely to work well for very short texts, and it’s a baseline that more advanced models could improve.
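The TF-IDF-plus-cosine pipeline Ben described can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the talk; the function names and the toy "notes" are placeholders, and real vaults would need proper tokenization (lowercasing, stop-word removal, markdown stripping):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF weight vectors."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        # High weight = frequent in this note, rare across the vault.
        vecs.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(a, b):
    """Angle-based similarity between two sparse vectors (length-insensitive)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy "vault": two related notes and one unrelated one.
notes = [
    "obsidian vault notes linking graph".split(),
    "obsidian plugin notes similarity search".split(),
    "cooking pasta recipe tomato sauce".split(),
]
vecs = tfidf_vectors(notes)
print(cosine(vecs[0], vecs[1]))  # shared terms -> positive score
print(cosine(vecs[0], vecs[2]))  # no shared terms -> 0.0
```

Because cosine compares direction rather than magnitude, a long note and a short note about the same topic can still score high, which is exactly the length-robustness point made in the segment.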
Emil shifted from similarity to knowledge-graph representation and visualization. The goal is a compact, navigable view of a vault’s structure—less scrolling through long walls of text, more understanding of what concepts connect and why. He described “yellow” as a graph-view system where users can define visual rules (node shapes, arrow types) and even annotate why links exist, improving intuition about the underlying graph. From there, Emil argued that language technology could automate the heavy lifting: named entity extraction, entity linking, and relation extraction. Instead of manually annotating every sentence, systems could infer entities and connections automatically, then optionally connect them to external sources like Wikipedia or Wikidata. He also referenced tools such as Codex (a web-based “operating system” for annotated writing), InfraNodus (automatic knowledge-graph construction from text), and Neo4j-based network analysis, positioning them as building blocks for an Obsidian-friendly “browse your vault as a graph” experience.
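Before any NLP automation, a vault already encodes a graph through its manual `[[wikilinks]]`—the baseline that entity and relation extraction would augment. A minimal sketch of harvesting those edges (the regex and function are illustrative assumptions, not from the talk):

```python
import re
from collections import defaultdict

# Capture the link target of [[Target]], [[Target|alias]], or [[Target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def vault_graph(notes):
    """Build an adjacency map: note name -> set of note names it links to."""
    graph = defaultdict(set)
    for name, text in notes.items():
        for target in WIKILINK.findall(text):
            graph[name].add(target.strip())
    return graph

# Toy vault: note name -> markdown body.
vault = {
    "TF-IDF": "Weights terms by rarity; compare with [[Cosine Similarity]].",
    "Cosine Similarity": "Angle between vectors; see [[TF-IDF]] and [[Vector Space Model]].",
}
print(dict(vault_graph(vault)))
```

Entity linking and relation extraction, as Emil described them, would add edges this simple pass misses: mentions of "cosine similarity" in running prose, or links out to Wikipedia/Wikidata entities, without the user typing any brackets.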
Paul’s contribution brought the discussion into hands-on automation with Duo, a virtual assistant for knowledge work integrated into Obsidian. Duo’s core mechanism is a chat interface powered by a skill system: users create markdown “skill files” that define patterns for what the assistant should do, often using placeholders that get filled from the user’s prompt or from context pulled from the vault. Examples included generating research questions, finding related concepts, creating quizzes/flash-card-style prompts from notes, and producing summaries or key points. Duo also supports deterministic actions via code blocks and external data sources—such as querying Wikidata through SPARQL-like requests—so not every task depends on free-form generation. A live demo showed skills computing expressions with JavaScript, retrieving related notes, and generating paragraphs based on vault context. The discussion also addressed risks: language models can confabulate, and fine-tuning on personal notes can bias style and content, so safety filters and careful deployment matter.
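The skill-file mechanism—a markdown template with placeholders filled from the prompt or from vault context—can be approximated in a few lines. Duo's actual skill format was not specified in detail, so the `{placeholder}` syntax and function names below are hypothetical:

```python
import re

def fill_skill(template, **context):
    """Fill {placeholders} from user prompt / vault context; leave unknown slots visible."""
    def sub(match):
        # Unresolved placeholders stay as-is, so missing context is easy to spot.
        return str(context.get(match.group(1), match.group(0)))
    return re.sub(r"\{(\w+)\}", sub, template)

# Hypothetical quiz-generation skill, as it might appear in a skill markdown file.
quiz_skill = """Create {count} quiz questions about {topic} using these notes:
{related_notes}"""

# Context assembled upstream, e.g. by a related-notes retrieval step.
prompt = fill_skill(
    quiz_skill,
    count=3,
    topic="TF-IDF",
    related_notes="- Cosine Similarity\n- Vector Space Model",
)
print(prompt)
```

The composability Paul demonstrated follows naturally: one skill (find related notes) produces the `related_notes` value that another skill (generate a quiz) consumes before anything is sent to the language model.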
Across all three segments, the practical message was consistent: vault intelligence emerges when similarity metrics, graph representations, and language-model-driven skills are combined—turning scattered notes into navigable knowledge, and eventually into an assistant that can reason over what’s already been written.
Cornell Notes
The discussion focused on turning an Obsidian vault into something more “computable”: similar-note discovery, graph-based navigation, and an assistant that can act on vault content. Ben described a baseline similarity method using TF-IDF vectors and cosine similarity to rank notes by topic overlap, with the caveat that short texts often fail. Emil argued that knowledge-graph views can make vaults easier to browse, and that NLP can automate entity extraction, entity linking, and relation extraction so users don’t have to annotate everything manually. Paul demonstrated Duo, an Obsidian-integrated virtual assistant where markdown “skills” define reusable behaviors, including context-building from related notes and deterministic lookups via code and Wikidata queries. Together, the approaches show a pipeline from representation → retrieval → assistance.
How does TF-IDF plus cosine similarity turn two notes into a similarity score?
Why is cosine similarity preferred over raw distance for note similarity?
What’s the difference between manually annotating entities in a note and automating it with NLP?
How does Duo’s “skill file” approach make the assistant’s behavior controllable?
What role does deterministic querying (e.g., Wikidata) play alongside generative text?
Review Questions
- If you had two notes with the same key terms but one is much longer, which similarity measure from the discussion would likely handle that better and why?
- Explain how entity extraction, entity linking, and relation extraction differ, and which one is most directly responsible for turning text into graph edges.
- In Duo’s skill system, what are placeholders used for, and how can skills be composed to build context before generating an answer?
Key Points
1. TF-IDF converts notes into weighted term vectors by combining term frequency with inverse document frequency to emphasize distinctive words.
2. Cosine similarity compares note vectors by angle, reducing sensitivity to note length and focusing on term distribution overlap.
3. Graph-based vault views aim to make relationships navigable by showing nodes, edges, and optionally “why” explanations for connections.
4. NLP automation can replace manual entity annotation by performing entity extraction, entity linking, and relation extraction to infer graph structure from text.
5. Duo’s behavior is controlled through markdown “skill files” that define patterns with placeholders filled from user prompts and/or vault context.
6. Duo can mix generative responses with deterministic actions, including code execution and structured queries to sources like Wikidata.
7. Safety and reliability remain central concerns because language models can confabulate and personal fine-tuning can introduce bias in style and content.
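The deterministic-lookup idea—answering from structured data rather than free-form generation—can be illustrated by constructing a SPARQL request against Wikidata's public query endpoint. This is a sketch, not the query demoed in the talk; the request is built but deliberately not sent:

```python
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"  # Wikidata's public endpoint

def build_wikidata_request(label):
    """Construct (but do not send) a request finding items with a given English label."""
    query = f'SELECT ?item WHERE {{ ?item rdfs:label "{label}"@en . }} LIMIT 5'
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    # Wikidata asks API clients to identify themselves; this UA string is illustrative.
    return urllib.request.Request(
        f"{SPARQL_ENDPOINT}?{params}",
        headers={"User-Agent": "vault-skill-sketch/0.1"},
    )

req = build_wikidata_request("Obsidian (software)")
print(req.full_url)
```

Because the answer comes back as structured JSON rather than generated text, a skill built this way cannot confabulate the facts it returns—the safety property Paul highlighted for deterministic actions.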