Adding and Organizing Archival Documents in DEVONthink

TL;DR

Create a clear folder/group structure in DEVONthink before importing documents so later tagging and searching stay consistent.

Briefing

Organizing archival PDFs in DEVONthink pays off most when citation data is permanently tied to each document, so that later writing can pull footnotes and bibliographies automatically instead of forcing researchers to backtrack through sources. In a walkthrough built around Abraham Work Dots’s papers from the Archives of American Art, the process starts with creating a folder structure, importing the first batch of PDFs, and immediately converting them into fully searchable text so key dates and names can be found with DEVONthink’s search tools.

The first concrete step is practical triage: the researcher creates a subgroup for “Folder 1,” drops in seven PDFs, then renames each file with a meaningful identifier. One PDF is renamed “Dots CV” and annotated with “Circa 1961,” while a tag such as “biography” is applied to support later chronological or thematic searching. A key issue appears right away: although the file is labeled “PDF plus text” (OCR output), it still isn’t searchable inside DEVONthink. The solution is to re-OCR the document so that its text can be highlighted and searched. A search for “1889” demonstrates the difference: the date becomes findable only after re-OCR.
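The "verify, don't trust the label" check can be sketched in a few lines of Python. This is only an illustration of the idea, not part of the walkthrough: it assumes the document's text layer has already been copied or exported into a string, and the function names and sample strings are hypothetical.

```python
def is_searchable(extracted_text: str, probe: str) -> bool:
    """Return True if the probe string occurs in the text layer."""
    # Collapse whitespace so line breaks inside the text layer do not hide a match.
    normalized = " ".join(extracted_text.split())
    return probe.lower() in normalized.lower()


def needs_reocr(extracted_text: str, probes: list[str]) -> bool:
    """Flag a document when none of the terms known to be on the page are found."""
    return not any(is_searchable(extracted_text, p) for p in probes)


# Hypothetical text layers: an empty layer before re-OCR and a
# placeholder line containing the probe after re-OCR.
text_before = ""
text_after = "placeholder line mentioning 1889"

print(needs_reocr(text_before, ["1889"]))
print(needs_reocr(text_after, ["1889"]))
```

The probe should be something visibly present on the page, such as a date or surname, so that a failed search points to the text layer rather than to the choice of search term.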

Next comes the citation backbone. The workflow uses a custom “super annotation” file—created as part of the DEVONthink for Historians super user materials—that stores the citation metadata needed for Chicago-style references to archival documents. The annotation captures author and title, the archive name and location, and the relevant date (with collection number skipped when the archive doesn’t provide one). It also includes personal research notes, such as a reminder to check specific event dates later. After saving, the researcher exports the super annotation to Bookends, a reference manager, using a provided script that transfers the citation record so it can be used in Microsoft Word.
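To make the shape of such a citation record concrete, here is a minimal Python sketch. It is not the super annotation template itself: the class name, field names, and the comma-separated note layout are assumptions chosen for illustration, and bracketed values are placeholders for details the walkthrough summary does not give.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArchivalCitation:
    """Hypothetical record mirroring the fields described above."""
    author: str
    title: str
    date: str
    archive: str
    location: str
    collection_number: Optional[str] = None  # skipped when the archive gives none
    notes: str = ""  # research reminders; never printed in the citation

    def footnote(self) -> str:
        """Join the citation fields into one note, omitting a missing collection number."""
        parts = [self.author, self.title, self.date]
        if self.collection_number:
            parts.append(self.collection_number)
        parts.extend([self.archive, self.location])
        return ", ".join(parts) + "."


record = ArchivalCitation(
    author="[author]",
    title="Dots CV",
    date="circa 1961",
    archive="Archives of American Art",
    location="[archive location]",
    notes="Check specific event dates later.",
)
print(record.footnote())
```

Keeping the research notes in the same record but out of `footnote()` reflects the split described above: the notes travel with the document, while only the citation fields reach the reference manager's output. The exact field order for a Chicago-style note should be checked against the style guide.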

To make writing faster and more reliable, the workflow adds the Bookends reference ID number directly into each document’s filename. That way, when drafting in Word, citations can be inserted via the Bookends integration using the “squiggly line” citation workflow. The payoff arrives at the end of writing: once claims are linked to the correct archival documents, Bookends can generate complete footnotes and bibliographies, including automatic handling of repeated citations in short form.
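The filename convention can be sketched as follows. The bracketed `[#ID]` suffix, the helper names, and the ID 12345 are all hypothetical, and the curly-brace temporary citation format is an assumption about how the Bookends setup is configured; check the delimiters in your own Bookends settings before relying on it.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed convention: the reference ID sits in a "[#12345]" suffix before the extension.
ID_PATTERN = re.compile(r"\[#(\d+)\]")


def add_reference_id(filename: str, ref_id: int) -> str:
    """Append the reference ID to the file's stem, keeping the extension."""
    path = Path(filename)
    return f"{path.stem} [#{ref_id}]{path.suffix}"


def extract_reference_id(filename: str) -> Optional[int]:
    """Recover the reference ID from a filename, or None if it has none."""
    match = ID_PATTERN.search(filename)
    return int(match.group(1)) if match else None


def temporary_citation(ref_id: int) -> str:
    """Build a placeholder citation for the draft (assumed curly-brace format)."""
    return "{#" + str(ref_id) + "}"


renamed = add_reference_id("Dots CV.pdf", 12345)  # 12345 is a made-up ID
print(renamed)
print(temporary_citation(extract_reference_id(renamed)))
```

Because the ID is read straight off the filename, the placeholder can be typed while drafting without switching to the reference manager.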

Beyond citations, the method is framed as a structured first review of the materials. Renaming, tagging, and writing targeted comments forces a slower look at what each item is and how it might be used later. The result is a database that supports both retrieval (searchable text, tags, and annotations) and writing (citation records that survive file moves, accidental deletions, or assembling documents into smart groups). The presenter notes that while edge cases exist for tricky citations, the super annotation format is positioned as sufficient for most archival documents, with the DEVONthink for Historians super user guide detailing how to set up the custom citation record for other reference managers such as Zotero.

Cornell Notes

The workflow for archival PDFs in DEVONthink centers on one idea: every document should carry its citation metadata with it. After the PDFs are imported into a structured group and renamed, any file whose text is not actually searchable is re-OCR’d. A custom “super annotation” file is then created for each document to store Chicago-style archival citation fields (author/title, archive name/location, document date, and notes), and that annotation is exported to Bookends via a script. Finally, the Bookends reference ID is added to the document filename so Word citations can be inserted quickly and later auto-formatted into footnotes and bibliographies, including short-form repeat citations. This upfront organization also functions as an initial review, helping researchers decide how each item will be used later.

Why does re-OCR matter even when a PDF already shows “PDF plus text” in DEVONthink?

In the walkthrough, the initial OCR output wasn’t actually searchable inside DEVONthink. After re-OCR, the document became highlightable and searchable—demonstrated by searching for “1889,” which only worked after the re-OCR step. The practical takeaway is to verify searchability, not just rely on the file’s OCR label.

What is stored in the “super annotation” and how does it support Chicago-style citations for archival documents?

The super annotation is a custom annotation file that collects citation fields needed for Chicago-style references to archival materials. It includes the author and title, the archive name and location, and the document date (e.g., “Circa 1961”). If the archive doesn’t provide a collection number, that field is skipped. It also holds researcher comments—like reminders to check specific event dates—so citation metadata and research context stay linked to the document.

How does exporting to Bookends reduce citation work during writing in Word?

After saving the super annotation, the workflow runs a script to export the annotation to Bookends. During drafting in Microsoft Word, citations can be inserted using the Bookends integration (via the reference ID embedded in the filename). At the end, Bookends can generate the bibliography and footnotes automatically, including short-form formatting for repeated citations.
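The short-form behavior for repeated citations can be illustrated with a small Python sketch. This is not how Bookends is implemented; it only shows the rule that the first citation of a source gets the full note and later ones get the short form. The function name, reference IDs, and bracketed note texts are hypothetical placeholders.

```python
def number_notes(cited_ids: list[int],
                 full_notes: dict[int, str],
                 short_notes: dict[int, str]) -> list[str]:
    """Number the notes in order, using the short form once a source has been cited."""
    seen: set[int] = set()
    numbered = []
    for number, ref_id in enumerate(cited_ids, start=1):
        # First appearance gets the full note; repeats get the short form.
        text = short_notes[ref_id] if ref_id in seen else full_notes[ref_id]
        seen.add(ref_id)
        numbered.append(f"{number}. {text}")
    return numbered


# Placeholder note texts keyed by made-up reference IDs.
full = {101: "[full note A]", 102: "[full note B]"}
short = {101: "[short note A]", 102: "[short note B]"}

for line in number_notes([101, 102, 101], full, short):
    print(line)
```

The value of having the reference manager apply this rule is that reordering paragraphs in the draft never requires re-deciding by hand which note is the first mention.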

What’s the purpose of adding the Bookends reference ID to each document’s filename?

The ID acts as a durable bridge between the archival PDF and its citation record. If files are moved, misplaced, or removed from their expected location, the attached citation information can still be consulted to restore the correct source. It also speeds up citation insertion later because the ID is immediately available while writing.

How does the organization workflow double as a first-pass review of archival material?

Renaming, tagging (like “biography”), and writing targeted comments forces the researcher to slow down and assess each item’s value and likely future use. Those decisions prime later retrieval: when searching later or encountering documents again, the researcher already knows what to look for and how the item fits into the broader project.

What limitation is acknowledged, and what workaround is offered?

The presenter acknowledges that tricky citations remain edge cases, but presents the super annotation format as sufficient for most archival documents. For reference managers other than Bookends, the DEVONthink for Historians super user guide describes how to set up the custom archival citation record, with Zotero named as an example.

Review Questions

When should a researcher re-OCR a PDF in DEVONthink, and what quick test confirms the OCR is usable?
How do the super annotation and Bookends reference ID work together to streamline footnotes and bibliographies in Microsoft Word?
What kinds of metadata and notes are most important to include in the super annotation for archival documents?

Key Points

1. Create a clear folder/group structure in DEVONthink before importing documents so later tagging and searching stay consistent.
2. Verify that OCR output is truly searchable inside DEVONthink; re-OCR when highlighting/search fails.
3. Rename PDFs with meaningful labels (e.g., document type and circa date) and apply tags that reflect how the material will be retrieved later.
4. Use a custom “super annotation” to store Chicago-style archival citation fields plus research reminders, then save it for each document.
5. Export super annotations to Bookends using the provided script so citation records are ready for writing.
6. Embed the Bookends reference ID into each document filename to speed citation insertion and reduce backtracking after file moves.
7. Treat tagging and annotation as an initial review step: deciding how each item will be used later improves retrieval and writing efficiency.

Highlights

Re-OCR isn’t optional when DEVONthink can’t actually search the text; the workflow demonstrates a before/after using a date search (“1889”).

A “super annotation” file becomes the citation engine: it stores Chicago-style archival metadata and personal research notes in one place.

Exporting to Bookends plus adding the reference ID to filenames turns citation insertion in Word into a fast, repeatable workflow.

The method reduces risk from accidental file loss or database reorganization by keeping citation context attached to each document.

Upfront organization doubles as a structured first review, making later searching and drafting faster and less error-prone.

Topics

DEVONthink Organization
Archival OCR
Super Annotations
Bookends Integration
Chicago-Style Citations

Mentioned

DEVONthink
Bookends
Microsoft Word
Zotero
Avi Gail Oren
Ada Bartlett
Abraham Work Dots
Aaron Goodelman
OCR