New! Automate your Systematic Review with Elicit

Elicit · 5 min read

Based on Elicit's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Elicit automates systematic reviews end-to-end—question refinement, paper gathering, screening, extraction, and report drafting—while keeping AI outputs traceable to source text.

Briefing

Elicit’s new systematic review workflow aims to automate nearly the entire pipeline—refining the research question, collecting candidate papers, screening them against custom criteria, extracting structured data, and generating a first-draft report—while keeping every AI decision traceable to the underlying text. The practical payoff is speed without a “black box”: users can review why each paper was included or excluded, override judgments, and revise screening criteria or extraction fields before running on the full corpus.

The process starts with a research question. Users can begin broad (the example centers on microplastics and pregnancy), and Elicit provides hover-based suggestions that tighten key assumptions—such as specifying the population, exposure measurement approach, and outcome definition. Once the question is selected, the workflow can auto-populate the next steps with AI-generated recommendations, though users can uncheck automation to work manually.

Paper collection happens in a “Gather” step with three main routes. First, Elicit Search can pull up to 500 papers at once using semantic search over more than 126 million papers from the Semantic Scholar database, ranking relevance based on title and abstract meaning rather than keyword matching alone. Second, users can add papers from their Elicit Library using tags. Third, users can upload PDFs directly. Elicit positions this as complementary to traditional Boolean or database-specific searching (including workflows still common for regulatory or publication standards): keyword search can remain the baseline, while semantic search helps reduce the risk of missing relevant studies.
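
To make the contrast with keyword matching concrete, here is a minimal sketch of embedding-based relevance ranking. It illustrates the general technique only; the model choice and example abstracts are assumptions, and this is not Elicit's actual retrieval stack.

```python
# Illustrative sketch of semantic (meaning-based) ranking, not Elicit's implementation.
# Assumes the sentence-transformers package; the model name is a hypothetical choice.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

question = "Does microplastic exposure during pregnancy affect birth outcomes?"
abstracts = [
    "We measured polystyrene particles in maternal serum and recorded birth weight and gestational age.",
    "A survey of Boolean search strategies for environmental epidemiology databases.",
    "Plastic debris concentrations in coastal waters of the North Atlantic.",
]

# Embed the question and each title/abstract, then rank by cosine similarity.
q_emb = model.encode(question, convert_to_tensor=True)
a_emb = model.encode(abstracts, convert_to_tensor=True)
scores = util.cos_sim(q_emb, a_emb)[0]

for score, text in sorted(zip(scores.tolist(), abstracts), reverse=True):
    print(f"{score:.2f}  {text[:70]}")
```

Note that the first abstract never uses the word "microplastic"; a strict keyword query could miss it, while meaning-based ranking can still surface it, which is the gap semantic search is meant to cover.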

Next comes screening criteria definition. Elicit generates custom screening columns tailored to the research question—questions about exposure assessment (e.g., direct maternal exposure vs environmental levels), quantitative measurement, participant type (humans vs animals), study design (excluding case reports, for instance), and whether birth outcomes are reported. Users can toggle suggested criteria on or off, add their own, and iterate quickly using a representative sample of 100 papers before committing to the full set. Each screening decision is presented with explicit yes/no/maybe judgments plus explanations tied to the paper’s text, and papers can be ranked by likelihood of inclusion for faster review. Importantly, screening is not treated as one-and-done: criteria can be adjusted later, and overrides are allowed at the paper level.
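
A rough mental model of this screening step is sketched below: criteria are yes/no/maybe questions, each paper gets a judgment plus an explanation per criterion, and papers are ranked by how many criteria they clearly satisfy. The field names and scoring rule are illustrative assumptions, not Elicit's internal format.

```python
# Hypothetical data shapes for screening criteria and per-paper judgments.
from dataclasses import dataclass, field

@dataclass
class Criterion:
    question: str              # e.g. "Are participants pregnant humans?"

@dataclass
class Judgment:
    criterion: Criterion
    answer: str                # "yes" | "no" | "maybe"
    explanation: str           # reason tied to the paper's abstract

@dataclass
class Paper:
    title: str
    judgments: list[Judgment] = field(default_factory=list)

    def inclusion_score(self) -> float:
        """Crude likelihood-of-inclusion proxy: share of criteria answered 'yes'."""
        if not self.judgments:
            return 0.0
        return sum(j.answer == "yes" for j in self.judgments) / len(self.judgments)

humans = Criterion("Are participants pregnant humans (not animal models)?")
outcomes = Criterion("Are birth outcomes reported?")

sample = [
    Paper("Maternal microplastic exposure and birth weight",
          [Judgment(humans, "yes", "Cohort of 312 pregnant women"),
           Judgment(outcomes, "yes", "Birth weight and gestational age reported")]),
    Paper("Polystyrene particles in a murine pregnancy model",
          [Judgment(humans, "no", "Mouse study"),
           Judgment(outcomes, "maybe", "Litter size reported, no human birth outcomes")]),
]

# Rank the sample by likelihood of inclusion so likely includes surface first for review.
for paper in sorted(sample, key=Paper.inclusion_score, reverse=True):
    print(f"{paper.inclusion_score():.2f}  {paper.title}")
```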

After screening, extraction follows a similar pattern: Elicit auto-suggests extraction fields in a “PICO-like” structure (customized to the question) and provides detailed instructions for what to pull from methods and results. The system supports both free-form responses and structured options such as yes/no/maybe, and it can extract quantitative data from tables. Transparency is built in via one-click links to quotes from the source text (including table citations) whenever AI fills a field.
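
One way to picture the extraction setup is as a small schema: each field carries instructions for the model plus a response type (free text, yes/no/maybe, or a fixed option list). The field names and options below are hypothetical; in practice they would come from Elicit's PICO-like suggestions for the specific question.

```python
# Hypothetical, PICO-like extraction schema for illustration only.
extraction_fields = {
    "population": {
        "instructions": "Describe the study population (sample size, trimester, country).",
        "response_type": "free_text",
    },
    "exposure_measurement": {
        "instructions": "How was microplastic exposure quantified (sample matrix, units, method)?",
        "response_type": "free_text",
    },
    "reports_birth_outcomes": {
        "instructions": "Does the paper report birth outcomes such as weight or gestational age?",
        "response_type": "yes_no_maybe",
    },
    "study_design": {
        "instructions": "Classify the study design.",
        "response_type": "options",
        "options": ["cohort", "case-control", "cross-sectional", "animal experiment"],
    },
}
```

Structured response types keep the resulting table filterable and comparable across studies, while free-text fields capture methods detail.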

Finally, Elicit generates a report designed as a usable first draft rather than a final publication-ready synthesis. The report includes a one-line summary, a PRISMA-inspired diagram of how studies were evaluated across screening criteria, summaries of included studies, and thematic discussion. Users can chat with the report and underlying data, export results at each stage, download the report as a PDF, and collaborate in real time on supported plans. The workflow is also framed as enabling “living reviews,” where adding new papers can flow through existing criteria and fields rather than restarting from scratch.
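
To see what a PRISMA-inspired flow summary reduces to, the sketch below tallies how many papers each screening criterion excluded; running the same tally over newly added papers, against the existing criteria, is essentially what a living-review update amounts to. The counting rule and example data are assumptions, not Elicit's report logic.

```python
# Hypothetical PRISMA-style tally: count exclusions per screening criterion.
from collections import Counter

# paper title -> {criterion name: "yes" | "no" | "maybe"}
screening_results = {
    "Maternal microplastic exposure and birth weight":   {"humans": "yes", "birth_outcomes": "yes"},
    "Polystyrene particles in a murine pregnancy model": {"humans": "no",  "birth_outcomes": "maybe"},
    "Microplastics in drinking water: a review":         {"humans": "no",  "birth_outcomes": "no"},
}

excluded_by = Counter()
included = []
for title, answers in screening_results.items():
    failed = [name for name, answer in answers.items() if answer == "no"]
    if failed:
        excluded_by.update(failed)  # a paper can fail more than one criterion
    else:
        included.append(title)

print(f"Identified: {len(screening_results)}")
for name, n in excluded_by.items():
    print(f"Excluded by '{name}': {n}")
print(f"Included: {len(included)}")

# A living-review update is the same loop run over newly added papers,
# reusing the existing criteria rather than redefining them.
```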

Cornell Notes

Elicit’s systematic review workflow automates the full pipeline—question refinement, paper collection, screening, data extraction, and report drafting—while keeping AI outputs transparent and editable. Users can start with a broad research question and receive suggestions that specify population, exposure measurement, and outcomes. Screening criteria are generated as custom yes/no/maybe questions, tested first on a 100-paper sample for fast iteration, then applied to the full set with ranked recommendations and per-paper explanations tied to quotes from the source text. Data extraction similarly uses AI-suggested fields (including structured options and quantitative table extraction) with one-click links to supporting text. The resulting report acts as a first draft and can be updated as new papers are added, supporting living reviews.

How does Elicit help tighten an under-specified research question before any papers are screened?

Users enter a broad question (e.g., microplastics during pregnancy). Elicit then provides feedback suggestions that clarify assumptions, such as specifying the population more precisely, how outcomes are measured, and what exposure definition is being used. Hovering over suggestions shows alternative phrasings, and selecting one updates downstream steps because the research question drives the custom screening and extraction fields.

What are the main ways papers enter an Elicit systematic review, and why does the workflow treat semantic search as complementary?

Papers can be added via Elicit Search (up to 500 papers at once), which uses semantic search over 126+ million Semantic Scholar records based on title/abstract meaning; by importing tagged papers from the Elicit Library; or by uploading PDFs directly. Elicit positions semantic search as a supplement to traditional Boolean/database searches, useful for improving comprehensiveness, especially when regulatory or publication workflows still rely on non-AI search methods.

How does Elicit make screening criteria both customizable and efficient?

Elicit generates custom screening columns tailored to the research question (e.g., whether exposure assessment is direct maternal exposure, whether quantitative measurement is included, whether participants are humans, and whether birth outcomes are reported). Users can toggle criteria on/off, add their own, and iterate quickly using a representative sample of 100 papers. After criteria look right, the workflow runs the criteria on the full set and ranks papers by likelihood of inclusion.

What does “transparency” mean in screening and extraction inside Elicit?

For each AI-generated decision, Elicit provides explanations and direct links to the underlying text. In screening, each paper gets yes/no/maybe judgments for every criterion with reasons that can be checked against the abstract. In extraction, fields include one-click access to quotes (and table citations when relevant), so users can verify whether the AI pulled the right information and correct misunderstandings.
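
A simple way to think about this traceability is that every AI-filled value carries a pointer back to its evidence. The record below shows one possible shape for such a link; the field names and example are assumptions, not Elicit's data model.

```python
# Hypothetical record linking an AI-filled extraction value to its supporting quote.
from dataclasses import dataclass

@dataclass
class ExtractedValue:
    field: str       # e.g. "exposure_measurement"
    value: str       # what the AI filled in
    quote: str       # verbatim supporting text from the paper
    location: str    # where the quote came from, e.g. "Table 2" or "Methods, 2.3"

cell = ExtractedValue(
    field="exposure_measurement",
    value="Polystyrene particles in maternal serum (ng/mL)",
    quote="Serum polystyrene concentrations were quantified by Py-GC/MS and reported in ng/mL.",
    location="Methods, section 2.3",
)

# A reviewer checks the quote against the value and overrides the cell if the AI misread it.
print(f"{cell.field}: {cell.value}")
print(f"  evidence ({cell.location}): {cell.quote}")
```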

How does Elicit handle data extraction from tables and structured fields?

Extraction fields can be free-form or structured (including predefined options and yes/no/maybe). Elicit can extract detailed quantitative data from tables and supports searching within large tables for specific values (e.g., standard deviation) or authors. The system also supports custom columns beyond the auto-suggested set.
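
As a rough analogue of searching a large extracted table for a particular statistic, the pandas snippet below pulls every column whose header mentions a standard deviation and looks up a specific author's row. It is a generic illustration with made-up values, not how Elicit implements table search.

```python
# Generic illustration of querying an extracted results table; values are made up.
import pandas as pd

table = pd.DataFrame({
    "Author":                ["Smith 2021", "Lee 2022"],
    "Birth weight mean (g)": [3310, 3275],
    "Birth weight SD (g)":   [412, 398],
    "Exposure mean (ng/mL)": [1.8, 2.4],
})

# Columns whose header mentions an SD, shown per study.
sd_cols = [c for c in table.columns if "sd" in c.lower()]
print(table[["Author"] + sd_cols])

# Or look up one author's row directly.
print(table[table["Author"].str.contains("Lee")])
```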

What does the generated report do, and how is it meant to fit into a researcher’s workflow?

The report summarizes included studies and synthesizes the evaluation process. It includes a one-liner, a detailed abstract-style summary, a PRISMA-inspired diagram of screening across criteria, and expandable summaries of study designs and extracted columns. It’s designed as a first draft or synthesis input—not a standalone final systematic review—and can be exported (PDF) or shared, with collaboration and chat-based follow-up on the underlying data.

Review Questions

  1. When and why should a user iterate screening criteria on a 100-paper sample before running on the full set?
  2. How does Elicit Search’s semantic approach differ from Boolean keyword searching, and how does the workflow recommend using both together?
  3. What mechanisms let users verify and correct AI-generated screening/extraction decisions inside Elicit?

Key Points

  1. Elicit automates systematic reviews end-to-end—question refinement, paper gathering, screening, extraction, and report drafting—while keeping AI outputs traceable to source text.
  2. Research-question feedback helps specify population, exposure measurement, and outcomes so downstream screening and extraction fields match the actual intent.
  3. Elicit Search can import up to 500 papers at once using semantic search over 126+ million Semantic Scholar records, and it’s positioned as a supplement to traditional keyword/Boolean workflows.
  4. Screening criteria are custom-generated, editable, and first tested on a 100-paper sample to speed iteration before applying to the full set.
  5. Screening and extraction decisions include per-criterion explanations and one-click links to quotes or table evidence, enabling manual overrides.
  6. Data extraction supports structured fields and quantitative table extraction, with exportable results (CSV) and limits described as up to ~3,000 cells per extraction run.
  7. The report functions as a first-draft synthesis (with PRISMA-inspired flow and study summaries) and supports collaboration and living-review updates by reusing prior criteria and fields.

Highlights

Elicit’s workflow is “automation first” but built for auditability: every AI-generated screening or extraction output links back to quotes from the underlying text.
Semantic search can pull up to 500 relevant papers at once from 126+ million Semantic Scholar records, reducing the chance of missing studies that keyword searches overlook.
Screening criteria are generated as custom yes/no/maybe questions tailored to the research question, then iterated on a 100-paper sample before running across the full set.
Extraction can pull detailed quantitative values from tables, with one-click access to the exact supporting evidence.
The generated report is designed as a usable first draft, complete with a PRISMA-inspired diagram and expandable study summaries, and can be updated as new papers arrive.
