Systematic reviews in Elicit | Screening & extraction
Based on Elicit's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their channel.
Briefing
Elicit’s PDF-based workflow is built to speed up the two most time-consuming parts of evidence synthesis—screening studies for inclusion and extracting structured data—while keeping a human-checkable audit trail. The core promise is that AI can pull key information out of papers (including tables), surface the most relevant supporting quotes, and flag low-confidence answers so reviewers can focus their time on verification and interpretation rather than copy-and-paste work.
The walkthrough centers on the “Extract data from PDFs” workflow. Users can upload multiple PDFs at once into their private library, then open each paper to view extracted text and tables. Elicit’s table extraction is highlighted as a differentiator: it pulls data directly from tables rather than being limited to narrative text. For screening, the system supports column-based extraction aligned to review criteria. Reviewers can start with broad, open-ended fields such as “population characteristics,” then iteratively narrow the focus by adding more specific columns, like participant age or region, based on what the initial pass reveals.
A key operational feature is traceability. When Elicit fills a column, users can click into answers to see the underlying sources and the most relevant quotes, and they can open the paper to confirm context. When Elicit is uncertain, it marks the response with a confidence flag, prompting reviewers to double-check rather than accept everything at face value. The workflow also supports custom columns for criteria not covered by predefined fields. For example, instead of relying on a generic “region” field, a reviewer can create a “continent where the study took place” column with formatting instructions, then filter papers by keyword matches in that column (e.g., include Africa and Asia). This makes it possible to operationalize inclusion/exclusion rules directly inside the screening interface.
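To make that filtering step concrete outside the app, here is a minimal sketch in Python with pandas, assuming a hypothetical export file named elicit_export.csv and a custom column headed “Continent where the study took place” (both placeholders, not Elicit’s actual export schema):

```python
import pandas as pd

# Load a hypothetical Elicit CSV export; the file name and column header
# below are illustrative placeholders, not Elicit's actual schema.
df = pd.read_csv("elicit_export.csv")

# Inclusion rule from the example: keep studies conducted in Africa or Asia.
include_keywords = ["africa", "asia"]
continent = df["Continent where the study took place"].fillna("").str.lower()

# A paper passes screening if its continent cell matches any keyword.
mask = continent.apply(lambda cell: any(kw in cell for kw in include_keywords))
included = df[mask]

print(f"{len(included)} of {len(df)} papers pass the continent filter")
```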
After screening and extraction, results can be exported as a CSV for use in spreadsheets like Excel. The export includes extracted metadata (title, authors, and the chosen columns), supporting quotes, and reasoning where applicable—often marking entries as “not applicable” or “not mentioned” with an explanation. Reviewers can then add their own tracking columns such as “reviewed” or “in progress,” using spreadsheet tools for sorting, filtering, and ongoing edits.
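Continuing with the same hypothetical export, a reviewer could script the tracking setup before moving into a spreadsheet; the file and column names here are again illustrative assumptions rather than Elicit’s documented format:

```python
import pandas as pd

df = pd.read_csv("elicit_export.csv")  # hypothetical file name

# Add reviewer-managed tracking columns alongside the extracted fields.
df["review_status"] = "in progress"   # later set to "reviewed" per row
df["reviewer_notes"] = ""

# Flag rows where an extracted field came back as "not applicable" or
# "not mentioned"; the column name is an assumption for illustration.
field = df["Population characteristics"].fillna("not mentioned").str.lower()
needs_check = df[field.str.contains("not applicable|not mentioned", regex=True)]

df.to_csv("screening_tracker.csv", index=False)
print(f"{len(needs_check)} rows need manual follow-up")
```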
Accuracy controls matter. Elicit offers a “high accuracy mode,” described as reducing error by about half compared with regular mode, but at higher cost. The guidance is to use a rougher first pass for large-scale screening, then switch to high accuracy for later-stage data extraction—especially when detailed information is likely to live in tables (e.g., effect sizes).
Finally, the transcript cites internal testing against manual work. For screening roughly 5,000 papers, Elicit reportedly retrieved over 96% of studies deemed relevant by a team, compared with about 92% for trained human research assistants. For data extraction, the system is described as reaching around 98% accuracy versus about 72% for trained team members in one comparison, with disagreements often resolving in Elicit’s favor after re-checking. Work is saved in the user’s sidebar for continuity, and uploaded papers remain private to the user rather than being shared publicly.
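For context on what the 96% figure measures: recall is the share of gold-standard relevant studies that a screening pass retains. A minimal sketch with made-up study IDs:

```python
def screening_recall(retained_ids, relevant_ids):
    """Fraction of gold-standard relevant studies kept by a screening pass."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(relevant & set(retained_ids)) / len(relevant)

# Toy example with made-up IDs: 3 of 4 relevant studies retained -> 0.75.
print(screening_recall({"s1", "s2", "s3", "s9"}, {"s1", "s2", "s3", "s4"}))
```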
Cornell Notes
Elicit’s “Extract data from PDFs” workflow helps teams screen and extract data for systematic reviews by turning PDFs into structured columns with supporting quotes. Reviewers can start with broad criteria (like population characteristics), then add more specific columns (age, region) and even create custom fields (like continent) with formatting instructions. Answers include sources and quotes for verification, and low-confidence outputs are flagged so reviewers know what to double-check. Exporting to CSV supports spreadsheet-based tracking and editing. High accuracy mode reduces error by about half but costs more, so it’s recommended for later extraction steps, especially when key details are in tables.
- How does Elicit support screening decisions without losing the ability to verify what it pulled from a paper?
- What’s the practical difference between using predefined columns and creating custom columns during screening?
- Why does “high accuracy mode” matter, and when should it be turned on?
- How does table extraction change what kinds of data can be extracted for meta-analysis inputs?
- What does the CSV export include, and how does it fit into a reviewer’s workflow?
- What performance comparisons were cited for screening and extraction accuracy?
Review Questions
- When would a reviewer rely on a predefined column versus create a custom column with formatting instructions?
- How do confidence flags and quote-level traceability change the way reviewers allocate time during screening?
- What trade-off does high accuracy mode introduce, and how does that trade-off influence the screening-to-extraction workflow?
Key Points
1. Elicit’s PDF workflow supports both screening and structured data extraction using criteria-aligned columns, including custom fields.
2. Extracted answers come with source traceability via supporting quotes, and low-confidence outputs are flagged for targeted review.
3. Table extraction captures the quantitative details commonly needed as meta-analysis inputs, not just narrative text.
4. Custom columns can be filtered using keyword matching on extracted cell contents, helping operationalize inclusion/exclusion rules.
5. CSV export packages extracted fields, quotes, and reasoning for spreadsheet-based tracking and iterative corrections.
6. High accuracy mode reduces extraction error but costs more, so it’s best reserved for later-stage extraction, especially when table details drive outcomes.
7. Internal comparisons cited higher recall for screening and higher accuracy for extraction than trained human research assistants achieved in the reported tests.