Cleaning up my Logseq Graph (sound fix)
Based on Tools on Tech's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Use declutter cleanup queries (orphan pages, broken references, tasks without tags, empty files) to target problems directly instead of manually scanning a slow graph view.
Briefing
A graph cleanup spree is underway to prepare a Logseq knowledge base for an upcoming database-oriented version. The biggest practical lesson is that large graphs slow down visual tools and make “manual” cleanup painful, so targeted queries and careful file hygiene matter more than chasing every node in the graph view. With roughly 3,900 pages (and rising as links and topics get rebuilt), the work starts by removing orphan pages (pages that “go nowhere”) but quickly turns into a broader audit of broken references, empty files, and automation-generated clutter.
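One way to approximate an orphan-page check outside Logseq is to scan the markdown files for wiki-links and flag pages that nothing links to. A minimal sketch, assuming pages connect only via `[[...]]` syntax (real graphs also link through tags, properties, and block refs, which this ignores); the example graph is invented:

```python
import re

# Matches Logseq-style wiki links such as [[Some Page]]
LINK_RE = re.compile(r"\[\[([^\]]+)\]\]")

def find_orphans(pages: dict[str, str]) -> set[str]:
    """Return page names no other page links to.

    `pages` maps page name -> raw markdown content. Only
    [[wiki-link]] references are counted in this sketch.
    """
    linked = set()
    for name, content in pages.items():
        for target in LINK_RE.findall(content):
            if target != name:  # self-links don't rescue a page
                linked.add(target)
    return set(pages) - linked

graph = {
    "Journal": "Met with [[Alice]] about [[Project X]].",
    "Alice": "Works on [[Project X]].",
    "Project X": "",
    "Stray Note": "An idea that links to [[Alice]].",
}
# "Journal" and "Stray Note" have no inbound links
print(sorted(find_orphans(graph)))
```

A list like this is a starting point for review, not a delete list: as the briefing notes, some unlinked pages are deliberate placeholders.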
The cleanup process leans heavily on Logseq’s “declutter” and cleanup queries rather than relying on graph navigation. Orphan pages are removed, but the transcript notes that some “empty” pages aren’t truly harmful (for example, pages that exist only as placeholders or are created by earlier workflows). More time goes into broken references: when a referenced page or block is deleted, embeds and block references can become dead ends. A key example involves embed blocks that no longer resolve; attempts to “go back” after deleting blocks can confuse Logseq’s history, forcing the user to reopen whole pages and delete blocks in a safer order.
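On markdown graphs, Logseq writes block references as `((uuid))` and marks the referenced block with an `id:: uuid` property, so a dangling reference is a `((uuid))` with no matching `id::` anywhere. A rough file-level check under that assumption (the UUID pattern and example values here are illustrative):

```python
import re

# ((uuid)) block references and id:: uuid declarations
BLOCK_REF_RE = re.compile(r"\(\(([0-9a-f-]{36})\)\)")
BLOCK_ID_RE = re.compile(r"id::\s*([0-9a-f-]{36})")

def dangling_block_refs(contents: list[str]) -> set[str]:
    """Return referenced UUIDs that no block declares via id::."""
    refs, ids = set(), set()
    for text in contents:
        refs.update(BLOCK_REF_RE.findall(text))
        ids.update(BLOCK_ID_RE.findall(text))
    return refs - ids

files = [
    "- a block\n  id:: 123e4567-e89b-12d3-a456-426614174000",
    "- see ((123e4567-e89b-12d3-a456-426614174000)) "
    "and ((123e4567-e89b-12d3-a456-426614174001))",
]
print(dangling_block_refs(files))
```

Finding the dangling IDs up front makes it possible to clean them page by page, rather than deleting blocks one at a time and fighting the history confusion described above.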
Empty pages are treated as a spectrum. Some are safe to delete because they function as reference shells—especially in the context of “change logs,” where the page may exist only to collect backlinks from blocks. But other empties reveal workflow gaps: journal-style pages that were created but never filled become “half pages” that need follow-up. The transcript also highlights a subtle behavior: certain empty reference pages may not appear in cleanup lists until they contain a minimal amount of content, meaning cleanup results can depend on how pages were created.
Graph view is described as useful for spotting problems at the edges—small “constellations” of lightly connected notes—but it’s also slow and cumbersome at this scale. The inability to shift-click and open pages in the background while keeping the graph stable is framed as a major usability bottleneck. Even with a capable computer, the graph’s constant loading, linking, and rendering competes with other CPU-intensive tasks, especially during streaming/recording.
A major source of mess comes from automated imports, particularly Readwise-style highlight syncing. Highlight pages can multiply into large clusters that look informative but often don’t contain the exact context the user wants; they can also bloat the graph with near-duplicate highlight artifacts. After making backups, the cleanup includes deleting many small highlight-related markdown files, with the explicit acceptance that some data loss is preferable to indefinite clutter.
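The backup-first deletion of highlight files can be mechanized by moving matching files aside instead of deleting them outright. A sketch under the assumption that highlight pages share a filename pattern; `Readwise___*.md` is a hypothetical convention, since actual naming depends on sync settings:

```python
import shutil
from pathlib import Path

def archive_files(src: Path, backup: Path, pattern: str) -> list[Path]:
    """Move files matching `pattern` out of `src` into `backup`.

    Moving (rather than deleting) keeps a recoverable copy,
    matching the backup-first approach in the cleanup.
    """
    backup.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(src.glob(pattern)):
        dest = backup / path.name
        shutil.move(str(path), str(dest))
        moved.append(dest)
    return moved
```

Running this on a copy of the graph first, and checking what `moved` contains before touching the live pages folder, keeps the "some data loss is acceptable" trade-off a deliberate choice rather than an accident.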
Finally, the cleanup is positioned as migration preparation. The user plans to migrate toward a database backend and expects that some cleanup is easier on the markdown side using file-manipulation tools, while the database phase may require different tooling. The plan is to keep iterating in weekly chunks, use Logseq Merge Pages to dedupe similar pages (preserving aliases), and tighten tagging conventions so the database version can classify pages reliably. The overall message: clean enough now to reduce migration pain, but don’t aim for perfection: expect mess, test changes, and keep backups when deleting aggressively.
Cornell Notes
The cleanup focuses on making a large Logseq graph manageable and migration-ready for a database-oriented future. The most effective tactics are targeted cleanup queries (orphan pages, broken references, empty files) and careful deletion practices, because graph view becomes slow and awkward at thousands of nodes. Automation—especially Readwise-style highlight syncing—creates huge clusters of low-value pages, so backups and selective removal are used to reduce clutter. The work also improves structure by merging duplicates with Logseq Merge Pages and shifting toward tagging so page types can be inferred reliably later. The guiding principle is pragmatic: accept some data loss, test changes safely, and clean in chunks rather than trying to fix everything in one pass.
- Why does the cleanup rely more on queries than on graph view when the graph has thousands of pages?
- What kinds of broken references show up, and why are embeds especially tricky?
- When is it safe to delete an “empty” page, and when is it a sign of workflow problems?
- How does the transcript treat automated highlight clutter from Readwise-style syncing?
- What is the role of Logseq Merge Pages in the cleanup strategy?
- How does tagging change the cleanup and future migration plan?
Review Questions
- What specific failure modes make graph view less practical than cleanup queries at ~3,900 pages?
- Describe how broken embed references can break navigation and why whole-page cleanup may be safer than block-level edits.
- Why does the transcript treat automated highlight pages as a special cleanup category, and what safeguards are used before deleting them?
Key Points
1. Use declutter cleanup queries (orphan pages, broken references, tasks without tags, empty files) to target problems directly instead of manually scanning a slow graph view.
2. Treat broken references—especially embed blocks—as navigation hazards: delete broken embed blocks from the whole page to avoid history confusion.
3. Delete empty pages pragmatically: reference-shell empties (like some change log pages) may be safe, while journal “half pages” often indicate missed work.
4. Automated highlight syncing can overwhelm a knowledge base; make backups and selectively remove low-value highlight markdown files to prevent the graph from becoming dominated by automation.
5. Reduce duplicates with Logseq Merge Pages, keeping one canonical page and using aliases to preserve connections.
6. Shift toward consistent tagging so the database backend can classify pages reliably during migration.
7. Clean in stages (e.g., weekly chunks) and expect some data loss; test migration readiness rather than aiming for a pristine graph.