
Claude 3.5 Sonnet for Research - is it any good?

Andy Stapleton · 4 min read

Based on Andy Stapleton's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Claude 3.5 Sonnet fails to retrieve actual peer-reviewed papers when asked, offering generic search guidance instead.

Briefing

Claude 3.5 Sonnet is positioned as a research assistant with “graduate-level reasoning,” improved vision, and new “artifacts,” but its performance across common academic workflows lands closer to a C+ than to a replacement for established tools. The biggest gap is reliability on research-specific tasks: when asked to find the most relevant peer-reviewed papers on transparent electrodes, it fails to retrieve any actual papers and instead offers generic guidance, an outcome that makes it a poor fit against search-first competitors like Perplexity.

Where Claude 3.5 does help, it tends to deliver scaffolding rather than depth. For a literature-review outline on OPV (organic photovoltaic) devices, it produces a usable structure, but the bullet points stay thin compared with what ChatGPT can generate when asked to expand each section. In academic editing, Claude performs better: it offers supportive, professional feedback on an abstract draft and then tightens the writing while preserving key information and emphasizing significance, which is useful for revision even if it still feels more like editing help than research-grade synthesis.

Vision is another mixed bag. Claude promises improved state-of-the-art vision, and when given five research figures, it can interpret what’s in the images and propose an ordering that follows a narrative arc (e.g., fabrication process, microscopy, then electrical performance plots). It even explains why the sequence makes sense based on captions and content. Still, it misorders at least one figure relative to the user’s intended labeling, and the overall experience doesn’t match the convenience and depth users get from ChatGPT or Perplexity’s vision/search workflows.

Claude also handles paper comprehension and “next steps” reasonably well. After uploading a peer-reviewed paper, it provides a simplified explanation and can suggest follow-on research directions such as optimizing cooling rates, scaling up studies, and assessing long-term stability—exactly the kind of gap-finding researchers need. But the summary quality is described as merely “okay,” not transformative.

The “artifacts” angle—marketed with examples that look like image generation—turns into a frustration point. Claude refuses to create or manipulate images in the way the user expects, even when asked for a poster layout or an SVG-based graphic. It can output a rough pseudo-SVG template, but it doesn’t deliver a functional, researcher-ready artifact. The bottom line: Claude 3.5 Sonnet is useful for writing and light editing, and it can interpret figures, but it still falls short as a full research copilot compared with tools that can reliably search literature and produce richer, more actionable outputs.

Cornell Notes

Claude 3.5 Sonnet earns its best marks for writing support and basic academic editing. It can generate a workable literature-review outline for OPV devices, tighten an abstract while preserving key information, and offer a simplified explanation of a peer-reviewed paper plus plausible next research steps (e.g., cooling-rate optimization, scale-up, and long-term stability). Its vision can interpret uploaded figures and propose a logical narrative order, but it may misplace figure sequencing and doesn’t match the depth or workflow smoothness of ChatGPT or Perplexity. The “artifacts” promise disappoints when image generation or proper SVG output isn’t delivered, limiting its usefulness for poster-style deliverables.

Why does Claude 3.5 Sonnet struggle with the “find peer-reviewed papers” task?

When prompted to “find me the most relevant peer reviewed papers on transparent electrodes,” it doesn’t retrieve or list actual papers. Instead, it returns generic instructions about how to search and suggests broad areas of interest. That makes it a poor substitute for tools like Perplexity that can search scientific databases directly.
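For contrast, a search-first workflow queries a scholarly index directly and comes back with concrete papers. Here is a minimal sketch of that pattern, assuming Python with the requests library and Crossref's public REST API; the query terms and result handling are illustrative, not anything shown in the video:

```python
import requests

# Query Crossref's public works endpoint for the topic Claude was given.
resp = requests.get(
    "https://api.crossref.org/works",
    params={
        "query": "transparent electrodes",
        "filter": "type:journal-article",  # journal articles only (peer-reviewed venues)
        "rows": 5,
        "sort": "relevance",
    },
    timeout=30,
)
resp.raise_for_status()

# Each result carries a real title and DOI, i.e. the concrete output
# the prompt asked Claude for and never received.
for item in resp.json()["message"]["items"]:
    title = (item.get("title") or ["(untitled)"])[0]
    print(f"{title} - https://doi.org/{item['DOI']}")
```

Tools like Perplexity effectively run this kind of retrieval behind the prompt, which is why the gap against search-capable competitors is so visible here.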

What does Claude 3.5 do well in literature-review drafting, and what’s missing?

For an outline of a literature review on OPV devices, Claude correctly identifies OPV devices as the target area and produces a structured outline. However, the bullet points under each section are brief and don’t expand into the deeper, more detailed content that ChatGPT can generate when asked to flesh out each subsection.

How effective is Claude 3.5 for academic editing of an abstract?

Claude provides feedback that stays professional and constructive, keeping the criticism from becoming personal. It then revises the abstract by tightening wording and improving clarity while keeping the key information and emphasizing the significance of the findings. The result is described as a good use of Claude for writing and revision.

How does Claude 3.5 perform when organizing uploaded research figures?

After uploading five figures, Claude proposes an ordering based on what it can read from the images and captions. It explains a narrative sequence: fabrication process first, then SEM/AFM micrographs, then electrical performance plots (current density versus voltage). Still, it doesn’t perfectly follow the user’s intended figure numbering and produces an order that differs from the labels the user expected.

What limitations appear around “artifacts” and image/SVG creation?

Claude refuses to generate or manipulate images when asked for a poster layout or an actual image-based deliverable. Even when asked for an SVG, it returns a basic pseudo-SVG template rather than a functional, properly rendered output. The mismatch between the marketing-style examples and the actual capabilities limits its usefulness for researcher-ready visuals.
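To make “functional, properly rendered” concrete: a researcher-ready SVG is well-formed XML with the correct namespace, so a browser or a vector editor like Inkscape can open it. Below is a minimal sketch of writing such a poster skeleton from Python; the A0 canvas, panel, and placeholder text are illustrative assumptions, not something Claude produced:

```python
# A minimal but valid SVG poster skeleton: correct XML namespace, explicit
# canvas size (A0 dimensions in mm used as user units), and positioned elements.
# A "pseudo-SVG" that omits the namespace or leaves tags unclosed won't render.
svg = """<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" width="841" height="1189"
     viewBox="0 0 841 1189">
  <rect width="841" height="1189" fill="#ffffff"/>
  <text x="420" y="80" text-anchor="middle" font-size="36">Poster Title (placeholder)</text>
  <rect x="40" y="140" width="360" height="400" fill="none" stroke="#333333"/>
  <text x="60" y="170" font-size="18">Methods panel (placeholder)</text>
</svg>
"""

with open("poster_skeleton.svg", "w", encoding="utf-8") as f:
    f.write(svg)  # opens cleanly in any browser or vector editor
```

The contrast is the point: a file like this renders immediately, which is the bar the template output described in the video fails to meet.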

Review Questions

  1. In which tasks does Claude 3.5 Sonnet act more like an editor than a research assistant, and why does that matter for researchers?
  2. What evidence from the figure-ordering test suggests Claude’s vision is helpful but not fully reliable for publication workflows?
  3. How do the “artifacts” limitations affect Claude’s usefulness for poster or graphic deliverables compared with text-based tasks?

Key Points

  1. Claude 3.5 Sonnet fails to retrieve actual peer-reviewed papers when asked, offering generic search guidance instead.
  2. It produces usable literature-review outlines for OPV devices, but its section details are thinner than what ChatGPT can generate.
  3. Claude’s abstract editing is a strong point: it provides constructive feedback and improves clarity while preserving key information.
  4. Claude can interpret uploaded figures and propose a logical narrative order, but figure sequencing may not match user intent.
  5. Paper explanations and “next steps” suggestions are generally adequate for identifying research gaps, though not described as exceptional.
  6. “Artifacts” and image/SVG generation expectations don’t match the delivered capabilities, limiting poster-style output.
  7. Overall, Claude 3.5 is best treated as a writing/editing aid rather than a full research copilot.

Highlights

Claude 3.5’s literature search attempt doesn’t return papers—just instructions—making it a weak replacement for search-capable tools.
In abstract revision, Claude gives professional, constructive feedback and tightens wording while keeping the core message intact.
Uploaded-figure ordering works to a point: Claude can read content and justify a narrative sequence, but it can still mis-sequence figures.
The “artifacts” promise runs into hard limits when image generation or proper SVG deliverables aren’t produced.
