
Finally, AI agents that actually work for advanced research.

Andy Stapleton · 5 min read

Based on Andy Stapleton's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

AI agents that can retrieve and synthesize across academic sources can produce more useful literature overviews than offline ChatGPT-style systems.

Briefing

AI agents are starting to deliver genuinely useful literature overviews for academic research—especially when they can browse and synthesize across multiple sources—while still falling short of replacing a researcher’s judgment.

The contrast is stark when using ChatGPT-style systems without reliable internet access. A prompt like “find the current state of organic photovoltaic devices in the academic literature” produces generic, non-actionable output and relies on a knowledge cutoff rather than pulling the newest papers. Even when such systems can generate summaries, they tend to miss the practical goal of academic work: locating up-to-date studies, extracting key metrics, and organizing findings in a way that supports further reading.

That gap is where newer “AI agent” tools come in. One service, Silatus (spelled ambiguously in the transcript), offers an interface with options labeled “General” and “academic,” plus “fast” and “precise.” When the same organic photovoltaic prompt is run, the agent searches multiple paper sources and returns a structured summary that includes factors, strategies, and a rundown of individual sources. However, the results still skew older in places (the example includes papers from 2019), and the overall output is judged insufficient for serious academic use despite the “academic research” branding.

A second tool, Omni (with the transcript describing it as an “autonomous market research” style agent), performs better on the same task. It produces a more current, literature-grounded overview and breaks the work into subtasks such as identifying the latest advancements, current challenges, efficiency rates, materials, device architecture, stability, lifetime, and manufacturing/commercialization considerations. A notable feature is the visibility into the agent’s internal reasoning steps (“wheels turning”), which makes the workflow feel more like an organized research pass than a single-shot summary.

The transcript then highlights Cognosis, another agent-based research service with a free tier. Using the same organic photovoltaic prompt, it reportedly retrieves papers from multiple publishers rather than sticking to one database ecosystem. In the example, it surfaces a 2023 review article—“Advances in organic photovoltaic sales are comprehensive review” (as quoted)—and provides an up-to-date review-style rundown aligned with the prompt’s intent. Even with only two retrieved articles, the agent is credited with meeting the core requirement: pulling recent, relevant literature and summarizing it in a structured way.

Across all three tools, the message is consistent: these systems can mine and synthesize literature faster than a human can manually scan everything, but they cannot replace the critical thinking and domain intuition that come from reading, evaluating quality, and deciding what truly fits a research direction. The agents are positioned as a “first touch point” and time-saver, useful for onboarding into a new field or quickly getting a feel for the literature, while researchers still need to verify claims, assess novelty, and apply judgment when writing and citing work.

Cornell Notes

AI agents are increasingly able to produce structured, literature-based research summaries when they can retrieve papers across sources—something ChatGPT-like systems struggle with when they lack internet access. In the transcript’s tests on organic photovoltaic devices, Silatus returns a structured overview but can lag on recency (e.g., examples from 2019). Omni performs better by organizing the task into research subtasks and producing more up-to-date efficiency, materials, architecture, stability, and commercialization coverage, with visible step-by-step reasoning. Cognosis is highlighted for pulling from multiple publishers and surfacing a 2023 review article, delivering an up-to-date review-style summary. Despite the gains, the tools are not a substitute for researcher intuition, critical evaluation, and thesis-level judgment.

Why does a ChatGPT-style approach without internet access fall short for academic research tasks?

When a system can’t reliably browse, it can’t pull the newest papers needed for “current state” questions. In the transcript’s example, the output leans on a knowledge cutoff and produces broad, generic trends rather than a literature-grounded, up-to-date set of citations and metrics. Academic work depends on retrieving recent studies, not just generating plausible summaries.

What does Silatus do with an “academic research” prompt, and where does it disappoint?

Silatus offers “General” vs “academic” modes and “fast” vs “precise.” For the organic photovoltaic prompt, it searches multiple sources (including ACS/RSC-style publisher pages) and returns a structured summary with factors, strategies, and per-source rundowns. The disappointment is recency and usefulness: the example includes papers from 2019 and the overall output is judged not strong enough to justify paying for serious academic research.

How does Omni’s output differ in quality and structure from Silatus in the transcript’s test?

Omni is described as understanding the research brief more effectively and returning a more current set of findings. It breaks the task into explicit subtasks—latest advancements, current challenges, efficiency rates, materials, device architecture, stability/lifetime, and manufacturing/commercialization—and then presents key takeaways plus sections resembling a review. The transcript also highlights that Omni shows an “internal monologue” style breakdown of what it’s doing.

What makes Cognosis stand out for the same organic photovoltaic query?

Cognosis is credited with retrieving papers from multiple sources rather than only one ecosystem. In the example, it returns a 2023 review article and provides an up-to-date review-style summary aligned with the prompt’s goal. Even though the transcript notes it used only two articles, the retrieved content is positioned as directly relevant and recent.

What limitation remains across these AI agent tools, even when they produce good summaries?

They can’t replace critical thinking and domain intuition. The transcript emphasizes that researchers must still evaluate whether a paper truly fits the field’s leading edge, judge novelty and quality, and decide what to cite in a thesis. Agents can accelerate discovery and summarization, but they can’t perform the human-level judgment required for rigorous academic writing.

Review Questions

  1. When an AI tool lacks internet access, what specific failure mode appears for “current state of X” research prompts?
  2. Compare how Silatus, Omni, and Cognosis handle recency and structure in the organic photovoltaic example.
  3. Why does the transcript argue that AI summaries still can’t replace a researcher’s intuition when deciding what to cite?

Key Points

  1. AI agents that can retrieve and synthesize across academic sources can produce more useful literature overviews than offline ChatGPT-style systems.

  2. ChatGPT-like systems without internet access tend to generate generic, cutoff-based summaries that don’t meet the “current state” requirement.

  3. Silatus provides structured academic-style outputs, but recency and depth may fall short for serious research needs.

  4. Omni organizes literature review tasks into clear research subtasks (efficiency, materials, architecture, stability, commercialization) and appears more current in the example.

  5. Cognosis highlights multi-publisher retrieval and can surface recent review papers (e.g., a 2023 review) aligned with the prompt.

  6. Even strong agent summaries require human critical evaluation, intuition, and judgment for thesis writing and citation decisions.

Highlights

A prompt like “current state of organic photovoltaic devices” fails to deliver academically useful results when the system can’t browse and instead relies on knowledge cutoffs.
Omni’s strength is structured, review-like coverage across efficiency, materials, device architecture, stability/lifetime, and commercialization, with visible step-by-step reasoning.
Cognosis is praised for pulling from multiple publishers and returning an up-to-date review paper (including a 2023 article) that matches the prompt’s intent.
Across tools, the bottleneck isn’t finding papers—it’s the human work of evaluating fit, novelty, and quality for rigorous academic claims.
