An Introduction to Dataview - Part 2

Based on Obsidian Community Talks's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

“Contains” checks substrings on single strings, but on lists it only matches exact elements/objects—substring expectations often break for multi-valued fields like authors.

Briefing

Dataview’s “contains” and related functions let Obsidian users filter notes by matching substrings, list membership, and even patterns in filenames and titles—turning everyday metadata into powerful, query-driven indexes. The core takeaway is that “contains” behaves differently depending on whether it’s applied to a single value or a list: on strings it checks for substring presence, but on arrays it only returns true when an exact element/object matches. That distinction matters because it changes what queries will work when metadata fields are multi-valued (like an authors list).

The session starts with practical examples of “contains.” A string query like “yes contains e” returns true, while “no contains e” returns false. The same idea extends to lists: a list [1,2,3] “contains 2” is true, while “contains 4” is false. From there, the discussion moves into real note filtering: building lists of daily notes by checking whether the filename contains a prefix/suffix (e.g., “dn” in the filename), or finding source notes by testing whether an authors field includes a specific author name (e.g., notes whose authors list contains “robert lamb”). A key caveat emerges when someone tries to “vectorize” contains across arrays: even if an authors field includes a substring match, the query won’t behave like substring search across list elements; it only succeeds when the exact object/value is present.
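The string-versus-list distinction can be sketched in a few lines of Python. This is only a model of the behaviour described in the session, not Dataview's implementation; the `contains` helper and the second author name are hypothetical, introduced purely for illustration.

```python
def contains(container, value):
    """Model of the behaviour described in the session (not Dataview's source)."""
    if isinstance(container, str):
        # Single string: substring test.
        return str(value) in container
    if isinstance(container, (list, tuple)):
        # List: exact element equality only, no substring search inside elements.
        return any(element == value for element in container)
    return False


# "jane doe" is a made-up second author used only for this example.
authors = ["robert lamb", "jane doe"]

print(contains("yes", "e"))               # string: substring present
print(contains("no", "e"))                # string: substring absent
print(contains([1, 2, 3], 2))             # list: exact element present
print(contains([1, 2, 3], 4))             # list: exact element absent
print(contains(authors, "robert lamb"))   # list: full name is an element
print(contains(authors, "lamb"))          # list: partial name is not an element
```

Under this model the last call is the one that surprises people: "lamb" is a substring of one element, but it is not itself an element, so the check fails.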

To handle date-like selection in titles, the conversation pivots to regular-expression-based matching. A common approach is Dataview’s “regexmatch” function applied to file names or titles, with the regex written to match the entire string. Participants note that you can still target substrings by shaping the regex (for example, using patterns that allow digits to appear in the right position). Selecting notes whose titles include a date can be done by matching a filename pattern with a four-digit year and two-digit month/day structure, or by using a field like file.day when available, where null results naturally fail the where condition.
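The whole-string requirement is easy to see with Python's `re.fullmatch`, used here as a stand-in for the matching behaviour described above. The `YYYY-MM-DD` layout with hyphens and the sample file names are assumptions for illustration; the session only specifies a four-digit year and two-digit month and day.

```python
import re

# Matches only when the entire name is a date (assumed YYYY-MM-DD layout).
DATE_ONLY = re.compile(r"\d{4}-\d{2}-\d{2}")

# Still a whole-string match, but shaped to allow text around the date.
DATE_ANYWHERE = re.compile(r".*\d{4}-\d{2}-\d{2}.*")


def matches_whole(pattern, text):
    """True only if the pattern covers the full string."""
    return pattern.fullmatch(text) is not None


# Hypothetical file names.
names = ["2021-06-14", "2021-06-14 meeting notes", "meeting notes"]

strict = [n for n in names if matches_whole(DATE_ONLY, n)]
loose = [n for n in names if matches_whole(DATE_ANYWHERE, n)]
print(strict)
print(loose)
```

The second pattern shows the "shape the regex" workaround: the match is still anchored to the whole string, but the leading and trailing `.*` let the date sit anywhere inside it.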

The session then broadens into other aggregation and filtering functions. “Length” supports queries like “notes whose filename length is at least 20 characters” and “notes whose tags list length equals zero” to find untagged notes. “Sum” can add numeric lists, enabling workflows like totaling study time across repeated entries stored as list items. Questions about counting embedded items (such as block embeds) highlight current limitations: Dataview can’t reliably access rendered embed content in live/preview contexts, and counting embedded elements may require workarounds such as link-based queries or regex detection, with some functionality only visible in edit mode.

Beyond functions, the discussion turns to data hygiene and maintainability. Participants compare manual curation versus dynamic “maps of content” (MOCs) generated from tags and templates. One attendee describes using a template (e.g., dataview MOC) to generate an index/map without cluttering graph views with real links, and sorting it by recently modified items. The group repeatedly returns to the same practical reality: Dataview’s power depends on consistent YAML metadata, and templates can reduce errors and repetitive maintenance. Finally, the session closes with guidance on contributing (using GitHub issues for feature requests and bugs, forking repositories, and updating documentation) so the plugin can evolve alongside user needs.

Cornell Notes

Dataview filtering hinges on how functions behave on different data types. “Contains” checks substrings on single strings, but on lists it only returns true for exact element/object matches, so substring expectations often fail for multi-valued metadata like authors. For date-like selection in filenames or titles, regex-based matching (via regexmatch) can target strict formats, while field-based approaches like file.day can cleanly filter notes by returning null when a date isn’t present. Additional functions like length and sum enable practical tasks such as finding untagged notes or totaling numeric lists. The session also emphasizes that reliable results depend on consistent YAML metadata and that templates can reduce maintenance overhead.

Why does “contains” work for substring searches on strings but not for substring searches inside list-valued metadata?

“Contains” behaves differently by input type. On a single string, it checks whether the string includes a substring (e.g., “yes” contains “e” → true; “no” contains “e” → false). On a list/array, it checks whether the list contains an exact element/value (e.g., [1,2,3] contains 2 → true; contains 4 → false). In the authors example, even if an author name field includes the substring “lam,” a query against a list won’t treat it as substring matching across elements; it only returns true when the exact object/value matches.

How can Dataview select notes whose titles or filenames include dates?

One approach uses regex matching against file names/titles, with a regex crafted to match the required date format. Participants note the regex often needs to match the entire string for the query to succeed, but it can be shaped to effectively target substrings by using patterns that allow digits in the right positions. Another approach uses structured fields like file.day: if file.day returns null for notes without a date, the where condition fails automatically, making the filter clean and avoiding brittle regex.
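The field-based approach can be modelled as a filter in which a missing date simply evaluates as false. The sketch below is an assumption-laden illustration: the `where` helper, the `day` key standing in for file.day, and the sample notes are all hypothetical.

```python
from datetime import date

# Hypothetical notes; "day" stands in for file.day and is None when no date exists.
notes = [
    {"name": "2021-06-14", "day": date(2021, 6, 14)},
    {"name": "Reading list", "day": None},
    {"name": "2021-06-15 standup", "day": date(2021, 6, 15)},
]


def where(rows, condition):
    """Keep rows whose condition is truthy; a None result drops the row."""
    return [row for row in rows if condition(row)]


# No regex needed: notes without a date fall out because None is falsy.
dated = where(notes, lambda note: note["day"])
print([note["name"] for note in dated])
```

The design point is that the filter never inspects the file name at all, so a change in naming convention does not break it as long as the date field is still populated.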

What are practical uses of the length function in Dataview queries?

Length supports threshold and existence-style filtering. Examples include: selecting notes where the filename length is at least 10 or 20 characters (to show only notes with longer names), and selecting notes where the tags list length equals 0 to find untagged notes. Because tags are represented as a list, length can act as a direct proxy for “has tags vs. has none.”
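Both length-based filters reduce to a comparison on a count. A minimal Python sketch, with hypothetical notes and the 20-character threshold taken from the examples above:

```python
# Hypothetical notes with a name and a list of tags.
notes = [
    {"name": "Inbox", "tags": []},
    {"name": "An Introduction to Dataview", "tags": ["#obsidian"]},
    {"name": "Unsorted thoughts on regular expressions", "tags": []},
]

# Existence-style filter: an empty tags list means the note is untagged.
untagged = [n["name"] for n in notes if len(n["tags"]) == 0]

# Threshold filter: keep notes whose name has at least 20 characters.
long_names = [n["name"] for n in notes if len(n["name"]) >= 20]

print(untagged)
print(long_names)
```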

How does sum enable aggregation, and what limitation came up when trying to count embedded items?

Sum can add numeric values from a list, such as totaling repeated study durations stored as list items (e.g., [15,20,...] → total minutes). Counting embedded elements such as block embeds proved harder: Dataview may not access rendered embed content in preview/live contexts, and some visibility differences appear between edit mode and preview mode. Workarounds discussed included querying links or using regex-based detection, but reliable embed-content counting wasn’t presented as straightforward.
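The aggregation half of this answer can be sketched as a per-note sum over a list-valued field. The field name `study_minutes`, the note names, and the durations are made up for illustration.

```python
# Hypothetical notes, each storing repeated study durations (minutes) as a list.
notes = [
    {"name": "Monday", "study_minutes": [15, 20]},
    {"name": "Tuesday", "study_minutes": [30]},
    {"name": "Wednesday", "study_minutes": []},
]

# Total per note; an empty list sums to 0 rather than raising an error.
per_note = {n["name"]: sum(n["study_minutes"]) for n in notes}

# Total across all notes.
grand_total = sum(per_note.values())

print(per_note)
print(grand_total)
```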

Why do templates and consistent YAML metadata matter for Dataview results?

Dataview queries depend on metadata fields being present and correctly formatted. Adding metadata to everything can be an upfront investment, but once established, maintenance becomes the ongoing cost. Templates help by auto-filling YAML for new notes (reducing typos and format drift) and by keeping dynamic fields consistent. Participants also noted limitations around YAML variables for avoiding repeated manual date entry, which affects how easily dynamic due dates can be reused across multiple fields.

What contribution paths were suggested for improving Dataview?

Users were directed to the plugin’s GitHub page: report bugs or request features via the Issues tab, write descriptive titles, and use markdown in issue comments. For deeper involvement, the repository can be forked and code contributions can be made. Documentation contributions were also encouraged—fork the docs and update them for beginner clarity—and community support was mentioned via Discord (e.g., messaging “blacksmith”).

Review Questions

  1. When would you expect “contains” to fail for substring matching, and what data type difference causes that?
  2. Compare regex-based date filtering with field-based filtering using file.day: what makes each approach robust or fragile?
  3. What kinds of tasks are good fits for length and sum, and what kinds of counting tasks may run into embed-content limitations?

Key Points

  1. “Contains” checks substrings on single strings, but on lists it only matches exact elements/objects, so substring expectations often break for multi-valued fields like authors.
  2. Filename/title date filtering can be done with regex matching, but regexes may need to match the full string format unless carefully constructed for substring targeting.
  3. Field-based filtering like file.day can be cleaner than regex because null values naturally exclude notes lacking the date.
  4. Length supports practical filters such as finding untagged notes (tags list length equals zero) or selecting notes by minimum filename size.
  5. Sum enables numeric aggregation across list-valued metadata, such as totaling repeated study durations.
  6. Counting embedded elements is constrained by Dataview’s access to rendered content in preview/live contexts, pushing users toward workarounds like link queries or regex detection.
  7. Consistent YAML metadata and templates reduce query breakage over time; maintenance remains a core requirement for reliable Dataview results.

Highlights

  • “Contains” is substring-aware for single strings, but list queries require exact element matches; vectorizing substring logic across arrays doesn’t behave the way many users expect.
  • Selecting date-bearing notes can be done either with regex against filenames/titles or with structured fields like file.day, where null automatically filters out non-date notes.
  • Length turns metadata into actionable filters: a tags length of zero quickly surfaces untagged notes.
  • Sum can total numeric lists for workflows like aggregating study time, but counting embed instances runs into preview/live access limits.
  • Templates can generate dynamic “maps of content” from tags while avoiding graph clutter from real links, shifting maintenance from manual curation to metadata discipline.

Topics

  • Dataview contains
  • Regex Date Filtering
  • Length and Sum Functions
  • Embed Counting Limits
  • YAML Templates and Maintenance