What I've Learned Testing 100+ AI Tools For Research
Based on Andy Stapleton's video on YouTube. If you like this content, support the original creator by watching, liking, and subscribing.
Large language models can handle most research tasks effectively when prompts include context, a precise request, constraints, and an explicit output format.
Briefing
Testing more than 100 AI tools for research leads to a blunt takeaway: large language models (LLMs) now handle most research tasks well—writing, editing, summarizing, and general assistance—provided the prompts are built correctly. The practical edge isn’t the tool’s brand or interface; it’s the prompt structure. A strong prompt includes (1) context (e.g., “I’m an expert researcher” or “I’m doing a literature review on X”), (2) a clear request (what to produce), (3) constraints (what to avoid, including formatting rules), and (4) an output format plus an audience expectation (for example, “write this for a peer-reviewed panel of experts”). With those four elements in place, LLM-based tools like ChatGPT, Perplexity, Claude, and Bing can deliver useful results across many steps of the research workflow.
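As one concrete illustration, here is a minimal sketch of that four-part recipe as a reusable template. The function and parameter names are hypothetical (not from the video); the assembled string can be pasted into any chat tool.

```python
# Minimal sketch of the four-part prompt recipe.
# build_prompt and its parameters are hypothetical names for illustration,
# not an API from any tool mentioned above.

def build_prompt(context: str, request: str, constraints: list[str],
                 output_format: str) -> str:
    """Assemble a research prompt from the four recommended elements."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"Context: {context}\n\n"
        f"Task: {request}\n\n"
        f"Constraints:\n{constraint_lines}\n\n"
        f"Output format: {output_format}"
    )

prompt = build_prompt(
    context="I'm an expert researcher doing a literature review on topic X.",
    request="Summarize the key findings and open questions in this area.",
    constraints=["Do not use tables.", "Avoid speculation beyond cited work."],
    output_format="Three paragraphs written for a peer-reviewed panel of experts.",
)
print(prompt)  # paste into ChatGPT, Claude, Perplexity, Bing, etc.
```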
Even so, research isn’t one single tool. The ecosystem is fragmented: different tools excel at different sub-tasks, from literature review and semantic search to data handling, writing support, and gauging consensus across a field. Examples mentioned include Scispace for literature reviews, ChatPDF as an earlier starting point for paper-focused workflows, Julius AI for data-related work, and Consensus for identifying agreement across papers. The current reality is “wild west” tooling—interfaces, models, and prompts can change quickly, meaning a favorite tool can become less useful overnight. That volatility is why researchers should build their own AI toolkit by chaining multiple tools together rather than betting everything on one platform.
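To make “chaining” concrete, the sketch below wires stand-ins for specialized steps into one pipeline. Every function here is a hypothetical stub, not a real API of Scispace, ChatPDF, Julius AI, or Consensus; the point is only that one tool’s output becomes the next tool’s input.

```python
# Hypothetical toolchain: each stub stands in for a specialized tool.
# None of these functions are real APIs of the platforms named in the text.

def find_papers(topic: str) -> list[str]:
    # stand-in for a literature-search tool (the Scispace/Consensus niche)
    return [f"Paper {i} on {topic}" for i in (1, 2, 3)]

def summarize_paper(paper: str) -> str:
    # stand-in for a paper-focused assistant (the ChatPDF niche)
    return f"Summary of {paper}"

def draft_review(summaries: list[str]) -> str:
    # stand-in for an LLM writing assistant (ChatGPT, Claude, ...)
    return "Draft literature review:\n" + "\n".join(summaries)

# Chain the steps: the output of one tool feeds the next.
papers = find_papers("prompting LLMs for research")
review = draft_review([summarize_paper(p) for p in papers])
print(review)
```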
The workflow also requires iteration. AI output often needs back-and-forth refinement: if the response misses the mark, the user should provide additional detail, tighten constraints, and request a revised output until it matches the intended use. Over time, the “meat brain” still matters—humans add judgment, polish, and improvements on top of AI drafts.
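The iteration habit can also be written down as a loop: generate, judge the result against the intended use, tighten the prompt, and retry. Both helper functions below are hypothetical placeholders; in practice the “good enough” check is usually the researcher’s own judgment.

```python
# Sketch of the iterate-until-it-fits workflow. call_llm() and good_enough()
# are hypothetical placeholders; the real check is typically human review.

def call_llm(prompt: str) -> str:
    # stand-in for ChatGPT, Claude, Perplexity, etc.
    return f"(model response to a {len(prompt)}-character prompt)"

def good_enough(response: str) -> bool:
    # stand-in for the human check: does this match the intended use?
    return False  # placeholder: always ask for another round

def refine(prompt: str, max_rounds: int = 3) -> str:
    response = call_llm(prompt)
    for _ in range(max_rounds):
        if good_enough(response):
            break
        # Tighten the prompt: add the detail and constraints the last
        # answer missed, then request a revised output.
        prompt += "\nRevise: be more specific and respect all constraints above."
        response = call_llm(prompt)
    return response  # the human still adds final judgment and polish

print(refine("Context: ...\nTask: ...\nConstraints: ..."))
```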
There’s also a behavioral lesson: avoid both overspending and overwhelm. Using every tool all the time is impractical—time and cost add up, and “AI fatigue” (the burnout from constantly trying new tools) is normal. Instead, once a tool works for a specific research need, stick with it and swap only when it stops serving that purpose. The goal is to resist FOMO-driven switching unless a new option offers a truly large leap.
Finally, compliance with publication rules remains essential. Major publishers have shifted from strict bans on AI toward more permissive policies they can’t reliably enforce, but they still expect authors to follow specific guidelines. The advice is to check the rules of the target publisher (an example given is Nature Publishing Group) and adapt the workflow accordingly. In the near term, researchers should treat AI as a collaborative system—guided by strong prompts, curated toolchains, and ongoing rule-checking—while expecting consolidation into larger “mega platforms” later.
Cornell Notes
The core finding is that modern large language models can perform most research assistance tasks effectively, but results depend heavily on prompt design. A useful prompt includes context (who the user is and what they’re doing), a precise request with constraints (what to do and what to avoid), and an explicit output format and audience. Because AI tools change rapidly and each tool tends to specialize in certain steps, researchers should build a flexible toolkit by chaining different tools rather than relying on one. Iteration is normal—responses often need refinement through follow-up prompts. Finally, researchers must follow publisher-specific AI policies and avoid tool-chasing burnout (AI fatigue and FOMO).
- Why does the transcript claim LLMs can handle “almost everything” in research, and what condition makes that true?
- What four-part prompt recipe is given for getting better research outputs?
- How does the transcript justify using multiple AI tools instead of one platform?
- What workflow behavior is recommended when an AI response is wrong or incomplete?
- What risks come with relying on a favorite AI tool long-term?
- How should researchers handle publication rules and AI usage?
Review Questions
- What elements must a prompt include to reliably produce research-ready outputs, and how do constraints affect the result?
- Why does the transcript recommend building a chain of AI tools rather than using a single platform for the entire research process?
- What practical steps does the transcript suggest for staying compliant with publisher AI policies while avoiding AI fatigue and FOMO?
Key Points
1. Large language models can handle most research tasks effectively when prompts include context, a precise request, constraints, and an explicit output format.
2. Prompting is treated as a skill: specifying what to avoid (e.g., no tables, no off-limits topics) materially changes output quality.
3. Research workflows benefit from chaining specialized tools, because different platforms excel at different steps such as literature review, data work, writing, and consensus finding.
4. AI outputs often require iterative back-and-forth; refining constraints and adding detail is normal rather than a sign of failure.
5. AI tools can change rapidly (models, prompts, interfaces), so personal toolkits should stay flexible and be updated when performance drops.
6. Avoid overspending and overwhelm by not using every tool all the time; stick with what works unless a new tool offers a major improvement.
7. Publication success depends on following publisher-specific AI rules, which have generally loosened but still require compliance (e.g., Nature Publishing Group guidelines).