Systematic reviews in Elicit | Screening & extraction
Based on Elicit's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their channel.
Briefing
Elicit’s PDF-based workflow is built to speed up the two most time-consuming parts of evidence synthesis—screening studies for inclusion and extracting structured data—while keeping a human-checkable audit trail. The core promise is that AI can pull key information out of papers (including tables), surface the most relevant supporting quotes, and flag low-confidence answers so reviewers can focus their time on verification and interpretation rather than copy-and-paste work.
The walkthrough centers on the “Extract data from PDFs” workflow. Users can upload multiple PDFs at once into their private library, then open each paper to view extracted text and tables. Elicit’s table extraction is highlighted as a differentiator: it pulls data directly from tables rather than being limited to narrative text. For screening, the system supports column-based extraction aligned to review criteria. Reviewers can start with broad, open-ended fields such as “population characteristics,” then iteratively narrow the focus by adding more specific columns, like participant age or region, based on what the initial pass reveals.
A key operational feature is traceability. When Elicit fills a column, users can click into answers to see the underlying sources and the most relevant quotes, and they can open the paper to confirm context. When Elicit is uncertain, it marks the response with a confidence flag, prompting reviewers to double-check rather than accept everything at face value. The workflow also supports custom columns for criteria not covered by predefined fields. For example, instead of relying on a generic “region” field, a reviewer can create a “continent where the study took place” column with formatting instructions, then filter papers by keyword matches in that column (e.g., include Africa and Asia). This makes it possible to operationalize inclusion/exclusion rules directly inside the screening interface.
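To make that filtering step concrete outside the app, here is a minimal sketch in Python with pandas, assuming a hypothetical export file named elicit_export.csv and a custom column headed “Continent where the study took place” (both placeholders, not Elicit’s actual export schema):

```python
import pandas as pd

# Load a hypothetical Elicit CSV export; the file name and column header
# below are illustrative placeholders, not Elicit's actual schema.
df = pd.read_csv("elicit_export.csv")

# Inclusion rule from the example: keep studies conducted in Africa or Asia.
include_keywords = ["africa", "asia"]
continent = df["Continent where the study took place"].fillna("").str.lower()

# A paper passes screening if its continent cell matches any keyword.
mask = continent.apply(lambda cell: any(kw in cell for kw in include_keywords))
included = df[mask]

print(f"{len(included)} of {len(df)} papers pass the continent filter")
```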
After screening and extraction, results can be exported as a CSV for use in spreadsheets like Excel. The export includes extracted metadata (title, authors, and the chosen columns), supporting quotes, and reasoning where applicable—often marking entries as “not applicable” or “not mentioned” with an explanation. Reviewers can then add their own tracking columns such as “reviewed” or “in progress,” using spreadsheet tools for sorting, filtering, and ongoing edits.
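Continuing with the same hypothetical export, a reviewer could script the tracking setup before moving into a spreadsheet; the file and column names here are again illustrative assumptions rather than Elicit’s documented format:

```python
import pandas as pd

df = pd.read_csv("elicit_export.csv")  # hypothetical file name

# Add reviewer-managed tracking columns alongside the extracted fields.
df["review_status"] = "in progress"   # later set to "reviewed" per row
df["reviewer_notes"] = ""

# Flag rows where an extracted field came back as "not applicable" or
# "not mentioned"; the column name is an assumption for illustration.
field = df["Population characteristics"].fillna("not mentioned").str.lower()
needs_check = df[field.str.contains("not applicable|not mentioned", regex=True)]

df.to_csv("screening_tracker.csv", index=False)
print(f"{len(needs_check)} rows need manual follow-up")
```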
Accuracy controls matter. Elicit offers a “high accuracy mode,” described as reducing error by about half compared with regular mode, but at higher cost. The guidance is to use a rougher first pass for large-scale screening, then switch to high accuracy for later-stage data extraction—especially when detailed information is likely to live in tables (e.g., effect sizes).
Finally, the transcript cites internal testing against manual work. For screening roughly 5,000 papers, Elicit reportedly retrieved over 96% of studies deemed relevant by a team, compared with about 92% for trained human research assistants. For data extraction, the system is described as reaching around 98% accuracy versus about 72% for trained team members in one comparison, with disagreements often resolving in Elicit’s favor after re-checking. Work is saved in the user’s sidebar for continuity, and uploaded papers remain private to the user rather than being shared publicly.
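For context on what the 96% figure measures: recall is the share of gold-standard relevant studies that a screening pass retains. A minimal sketch with made-up study IDs:

```python
def screening_recall(retained_ids, relevant_ids):
    """Fraction of gold-standard relevant studies kept by a screening pass."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(relevant & set(retained_ids)) / len(relevant)

# Toy example with made-up IDs: 3 of 4 relevant studies retained -> 0.75.
print(screening_recall({"s1", "s2", "s3", "s9"}, {"s1", "s2", "s3", "s4"}))
```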
Cornell Notes
Elicit’s “Extract data from PDFs” workflow helps teams screen and extract data for systematic reviews by turning PDFs into structured columns with supporting quotes. Reviewers can start with broad criteria (like population characteristics), then add more specific columns (age, region) and even create custom fields (like continent) with formatting instructions. Answers include sources and quotes for verification, and low-confidence outputs are flagged so reviewers know what to double-check. Exporting to CSV supports spreadsheet-based tracking and editing. High accuracy mode reduces error by about half but costs more, so it’s recommended for later extraction steps, especially when key details are in tables.
- How does Elicit support screening decisions without losing the ability to verify what it pulled from a paper?
- What’s the practical difference between using predefined columns and creating custom columns during screening?
- Why does “high accuracy mode” matter, and when should it be turned on?
- How does table extraction change what kinds of data can be extracted for meta-analysis inputs?
- What does the CSV export include, and how does it fit into a reviewer’s workflow?
- What performance comparisons were cited for screening and extraction accuracy?
Review Questions
- When would a reviewer rely on a predefined column versus create a custom column with formatting instructions?
- How do confidence flags and quote-level traceability change the way reviewers allocate time during screening?
- What trade-off does high accuracy mode introduce, and how does that trade-off influence the screening-to-extraction workflow?
Key Points
1. Elicit’s PDF workflow supports both screening and structured data extraction using criteria-aligned columns, including custom fields.
2. Extracted answers come with source traceability via supporting quotes, and low-confidence outputs are flagged for targeted review.
3. Table extraction captures the quantitative details commonly needed as meta-analysis inputs, not just narrative text.
4. Custom columns can be filtered using keyword matching on extracted cell contents, helping operationalize inclusion/exclusion rules.
5. CSV export packages extracted fields, quotes, and reasoning for spreadsheet-based tracking and iterative corrections.
6. High accuracy mode reduces extraction error but costs more, so it’s best reserved for later-stage extraction, especially when table details drive outcomes.
7. Internal comparisons cited higher recall for screening and higher accuracy for extraction than trained human research assistants achieved in the reported tests.