What I've Learned Testing 100+ AI Tools For Research
Based on Andy Stapleton's video on YouTube. If you like this content, support the original creator by watching, liking, and subscribing.
Large language models can handle most research tasks effectively when prompts include context, a precise request, constraints, and an explicit output format.
Briefing
Testing more than 100 AI tools for research leads to a blunt takeaway: large language models (LLMs) now handle most research tasks well—writing, editing, summarizing, and general assistance—provided the prompts are built correctly. The practical edge isn’t the tool’s brand or interface; it’s the prompt structure. A strong prompt includes (1) context (e.g., “I’m an expert researcher” or “I’m doing a literature review on X”), (2) a clear request (what to produce), (3) constraints (what to avoid, including formatting rules), and (4) an output format plus an audience expectation (for example, “write this for a peer-reviewed panel of experts”). With those four elements in place, LLM-based tools like ChatGPT, Perplexity, Claude, and Bing can deliver useful results across many steps of the research workflow.
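As one concrete illustration, here is a minimal sketch of that four-part recipe as a reusable template. The function and parameter names are hypothetical (not from the video); the assembled string can be pasted into any chat tool.

```python
# Minimal sketch of the four-part prompt recipe.
# build_prompt and its parameters are hypothetical names for illustration,
# not an API from any tool mentioned above.

def build_prompt(context: str, request: str, constraints: list[str],
                 output_format: str) -> str:
    """Assemble a research prompt from the four recommended elements."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"Context: {context}\n\n"
        f"Task: {request}\n\n"
        f"Constraints:\n{constraint_lines}\n\n"
        f"Output format: {output_format}"
    )

prompt = build_prompt(
    context="I'm an expert researcher doing a literature review on topic X.",
    request="Summarize the key findings and open questions in this area.",
    constraints=["Do not use tables.", "Avoid speculation beyond cited work."],
    output_format="Three paragraphs written for a peer-reviewed panel of experts.",
)
print(prompt)  # paste into ChatGPT, Claude, Perplexity, Bing, etc.
```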
Even so, research isn’t one single tool. The ecosystem is fragmented: different tools excel at different sub-tasks, from literature review and semantic search to data handling, writing support, and gauging consensus across a field. Examples mentioned include Scispace for literature reviews, ChatPDF as an earlier starting point for paper-focused workflows, Julius AI for data-related work, and Consensus for identifying agreement across papers. The current reality is “wild west” tooling—interfaces, models, and prompts can change quickly, meaning a favorite tool can become less useful overnight. That volatility is why researchers should build their own AI toolkit by chaining multiple tools together rather than betting everything on one platform.
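To make “chaining” concrete, the sketch below wires stand-ins for specialized steps into one pipeline. Every function here is a hypothetical stub, not a real API of Scispace, ChatPDF, Julius AI, or Consensus; the point is only that one tool’s output becomes the next tool’s input.

```python
# Hypothetical toolchain: each stub stands in for a specialized tool.
# None of these functions are real APIs of the platforms named in the text.

def find_papers(topic: str) -> list[str]:
    # stand-in for a literature-search tool (the Scispace/Consensus niche)
    return [f"Paper {i} on {topic}" for i in (1, 2, 3)]

def summarize_paper(paper: str) -> str:
    # stand-in for a paper-focused assistant (the ChatPDF niche)
    return f"Summary of {paper}"

def draft_review(summaries: list[str]) -> str:
    # stand-in for an LLM writing assistant (ChatGPT, Claude, ...)
    return "Draft literature review:\n" + "\n".join(summaries)

# Chain the steps: the output of one tool feeds the next.
papers = find_papers("prompting LLMs for research")
review = draft_review([summarize_paper(p) for p in papers])
print(review)
```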
The workflow also requires iteration. AI output often needs back-and-forth refinement: if the response misses the mark, the user should provide additional detail, tighten constraints, and request a revised output until it matches the intended use. Over time, the “meat brain” still matters—humans add judgment, polish, and improvements on top of AI drafts.
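The iteration habit can also be written down as a loop: generate, judge the result against the intended use, tighten the prompt, and retry. Both helper functions below are hypothetical placeholders; in practice the “good enough” check is usually the researcher’s own judgment.

```python
# Sketch of the iterate-until-it-fits workflow. call_llm() and good_enough()
# are hypothetical placeholders; the real check is typically human review.

def call_llm(prompt: str) -> str:
    # stand-in for ChatGPT, Claude, Perplexity, etc.
    return f"(model response to a {len(prompt)}-character prompt)"

def good_enough(response: str) -> bool:
    # stand-in for the human check: does this match the intended use?
    return False  # placeholder: always ask for another round

def refine(prompt: str, max_rounds: int = 3) -> str:
    response = call_llm(prompt)
    for _ in range(max_rounds):
        if good_enough(response):
            break
        # Tighten the prompt: add the detail and constraints the last
        # answer missed, then request a revised output.
        prompt += "\nRevise: be more specific and respect all constraints above."
        response = call_llm(prompt)
    return response  # the human still adds final judgment and polish

print(refine("Context: ...\nTask: ...\nConstraints: ..."))
```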
There’s also a behavioral lesson: avoid both overspending and overwhelm. Using every tool all the time is impractical—time and cost add up, and “AI fatigue” (the burnout from constantly trying new tools) is normal. Instead, once a tool works for a specific research need, stick with it and swap only when it stops serving that purpose. The goal is to resist FOMO-driven switching unless a new option offers a truly large leap.
Finally, compliance with publication rules remains essential. Major publishers have shifted from strict bans on AI toward more permissive policies they can’t reliably enforce, but they still expect authors to follow specific guidelines. The advice is to check the rules of the target publisher (an example given is Nature Publishing Group) and adapt the workflow accordingly. In the near term, researchers should treat AI as a collaborative system—guided by strong prompts, curated toolchains, and ongoing rule-checking—while expecting consolidation into larger “mega platforms” later.
Cornell Notes
The core finding is that modern large language models can perform most research assistance tasks effectively, but results depend heavily on prompt design. A useful prompt includes context (who the user is and what they’re doing), a precise request with constraints (what to do and what to avoid), and an explicit output format and audience. Because AI tools change rapidly and each tool tends to specialize in certain steps, researchers should build a flexible toolkit by chaining different tools rather than relying on one. Iteration is normal—responses often need refinement through follow-up prompts. Finally, researchers must follow publisher-specific AI policies and avoid tool-chasing burnout (AI fatigue and FOMO).
- Why does the transcript claim LLMs can handle “almost everything” in research, and what condition makes that true?
- What four-part prompt recipe is given for getting better research outputs?
- How does the transcript justify using multiple AI tools instead of one platform?
- What workflow behavior is recommended when an AI response is wrong or incomplete?
- What risks come with relying on a favorite AI tool long-term?
- How should researchers handle publication rules and AI usage?
Review Questions
- What elements must a prompt include to reliably produce research-ready outputs, and how do constraints affect the result?
- Why does the transcript recommend building a chain of AI tools rather than using a single platform for the entire research process?
- What practical steps does the transcript suggest for staying compliant with publisher AI policies while avoiding AI fatigue and FOMO?
Key Points
1. Large language models can handle most research tasks effectively when prompts include context, a precise request, constraints, and an explicit output format.
2. Prompting is treated as a skill: specifying what to avoid (e.g., no tables, no off-limits topics) materially changes output quality.
3. Research workflows benefit from chaining specialized tools, because different platforms excel at different steps such as literature review, data work, writing, and consensus finding.
4. AI outputs often require iterative back-and-forth; refining constraints and adding detail is normal rather than a sign of failure.
5. AI tools can change rapidly (models, prompts, interfaces), so personal toolkits should stay flexible and be updated when performance drops.
6. Avoid overspending and overwhelm by not using every tool all the time; stick with what works unless a new tool offers a major improvement.
7. Publication success depends on following publisher-specific AI rules, which have generally loosened but still require compliance (e.g., Nature Publishing Group guidelines).