EPIC! ChatGPT vision does INSANE things for research!

TL;DR

Upload schematics, graphs, and photo figures to extract process steps and key trends, then convert those into paragraph-ready text for manuscripts.

Briefing Cornell Notes

Briefing

ChatGPT’s image understanding is becoming a practical shortcut for science writing: upload figures, ask it to interpret what’s happening, then iterate until the key trends and details match the paper—turning hours of figure comprehension and draft text into a fast first pass. The most striking results come from feeding it peer-reviewed-style schematics, complex multi-axis graphs, and photo panels, often without providing captions. In those tests, it reliably identified the overall process steps (including the order of operations), extracted meaningful relationships from cluttered plots, and summarized what the figure is communicating—then converted that understanding into paragraph-ready text suitable for a thesis or manuscript.

A clear example involves a schematic describing a transparent electrode made from carbon nanotubes and silver nanowires. Without the figure caption, ChatGPT still recognized the process structure: combining two solutions, passing them through filter paper, and describing what remains on the filter paper (including thickness/pattern formation). It made at least one mistake—stacking filter papers rather than the intended setup—but the error was described as fixable with a short follow-up correction. The same pattern showed up with a highly complex graph featuring many arrows, shared axes, and multiple data elements. Even without the caption, it produced an explanation that matched the major trends, such as the relationship between weight fraction of single-wall carbon nanotubes and properties like sheet resistance and average spectral transparency. While minor inaccuracies appeared, the “important trends” were treated as correct, which is exactly what researchers need when drafting figure narratives.

Photographic figures also worked well, especially when captions were included. For a structural stability experiment using acetone exposure across different film compositions, ChatGPT generated a paragraph that captured degradation behavior across columns—rapid degradation for the pure silver nanowire film and more limited structural changes for films with 20% single-wall carbon nanotube weight fraction. The transcript emphasizes that writing or supplying accurate figure captions boosts performance, likely because captions encode the experimental intent and column labels that the model might otherwise infer imperfectly.

Tables were the weak spot. A low-resolution table with mixed text and numbers was summarized correctly at a high level, but it struggled with specific numeric comparisons—such as which RS or RP values increased versus decreased. The workaround proposed is iterative prompting: correct the model by pointing out what’s wrong, then ask for a revised paragraph. Because the workflow is conversational rather than one-shot, the model can be guided toward the correct interpretation.

Finally, the transcript draws a comparison between paid and free tiers. ChatGPT Plus handled the same schematic prompts far better than a free/open model (“lava”), which missed key details in the schematic—details that matter most for research accuracy. The practical takeaway is a workflow: (1) screenshot someone else’s confusing figure and ask for an explanation, (2) upload your own figures and request trend extraction, (3) ask for paragraph-ready text, then (4) correct any mistakes and re-run the request. Used this way, image-capable ChatGPT becomes a self-editing tool for figure clarity and a time-saver for the “boring parts” of drafting scientific writing.

Cornell Notes

Image-capable ChatGPT can speed up scientific writing by interpreting figures and turning them into manuscript-ready text. In tests with peer-reviewed-style schematics, complex graphs, and photo panels, it often captured the right process steps and the correct overall trends—even when captions were omitted. Accuracy improves when figure captions and column labels are provided, especially for photographic experiments. Tables are harder when resolution is low, but iterative back-and-forth prompting can correct numeric mistakes. Paid ChatGPT Plus outperformed a free/open model on schematic detail extraction, suggesting tier differences matter for research-grade fidelity.

How does ChatGPT’s figure understanding translate into faster paper writing?

The workflow described is: upload figures (schematics, graphs, photos), ask for an explanation of what’s happening and the key trends, then request a paragraph suitable for a peer-reviewed publication. The model produces a first draft narrative based on the visual content, which can then be corrected with a short follow-up prompt when it misses details.

What happens when the model is given a schematic without its caption?

For a schematic involving carbon nanotubes and silver nanowires used to create a transparent electrode, ChatGPT still recognized the process structure: combining two solutions, passing through filter paper, and describing material/pattern formation on the filter paper. It made at least one setup-related error (filter paper stacking), but the transcript frames that as quickly fixable with a brief correction prompt.

Can it handle complex graphs with many arrows and shared axes?

Yes, it can still produce an explanation that matches major trends. In the example with a cluttered multi-element graph, it described the relationship between the weight fraction of single-wall carbon nanotubes and properties such as sheet resistance and average spectral transparency. It was not perfect, but it correctly identified the important directional relationships.

Why do captions matter for photographic figures?

Captions provide experimental context and column labeling that the model can use to avoid guessing. In the acetone structural integrity experiment, including captions enabled ChatGPT to generate a column-by-column narrative: rapid degradation for the pure silver nanowire film and more limited structural changes for films with 20% single-wall carbon nanotube weight fraction.

What are the limitations with tables, and how are they addressed?

Low-resolution tables caused numeric interpretation errors—such as which RS/RP values increased or decreased. The proposed fix is conversational correction: point out the specific discrepancy, then ask for a revised paragraph so the narrative matches the correct numeric comparisons.

How did ChatGPT Plus compare to a free/open model on the same schematic task?

ChatGPT Plus produced a more detailed and accurate schematic interpretation, while the free/open model (“lava”) missed important schematic details. The transcript attributes the gap to the need for fine-grained detail extraction, which is crucial for research accuracy.

Review Questions

When would you omit a figure caption, and what kinds of errors might you expect if you do?
What iterative prompting steps would you use to correct a mistaken numeric comparison in a table?
Which figure types (schematics, graphs, photos, tables) seem most reliable for trend extraction, and why might that differ?

Key Points

1
Upload schematics, graphs, and photo figures to extract process steps and key trends, then convert those into paragraph-ready text for manuscripts.
2
Expect minor mistakes even when major trends are correct; treat the workflow as iterative rather than one-shot.
3
Including accurate figure captions and column labels improves performance, particularly for photographic multi-panel experiments.
4
Complex graphs with shared axes and many annotations can still be summarized effectively, with the most important relationships often captured correctly.
5
Tables are more error-prone when resolution is low; correct numeric discrepancies through follow-up prompts and request a revised paragraph.
6
Paid ChatGPT Plus outperformed a free/open model on schematic detail extraction, suggesting tier choice affects research-grade fidelity.
7
Use figure interpretation as a self-editing tool: if the model can’t parse a figure, readers may struggle too.

Highlights

ChatGPT can interpret peer-reviewed-style schematics and multi-axis graphs well enough to draft the narrative around the correct trends—even without captions.

Supplying figure captions and column labels markedly improves accuracy for photographic experimental panels.

Low-resolution tables are where numeric mistakes show up most often, but conversational correction can fix them quickly.

ChatGPT Plus handled schematic detail extraction far better than a free/open model (“lava”), which missed key details.

A practical workflow emerges: explain → extract trends → draft paragraph → correct → regenerate.

Topics

ChatGPT Vision
Peer-Reviewed Figures
Scientific Writing
Figure Captioning
Model Accuracy