LLM Agents That Run Their Own Experiments

The line between "AI writes about science" and "AI does science" is blurring fast. A new class of LLM agents goes beyond summarizing papers or writing code: these agents design experiments, control lab equipment, and iterate on results with minimal human intervention.

AI-Mandel: The AI Physicist

Built by Sören Arlt, Xuemei Gu, and Mario Krenn[1], AI-Mandel is an LLM agent that generates and implements ideas in quantum physics. Here's the workflow:

Reads literature — Formulates novel research ideas from existing papers
Uses PyTheus — A graph-based quantum experiment design framework that translates concepts into concrete lab setups
Generates Python code — Produces experiment-ready programs

Its ideas hold up, too: two of them have already led to independent follow-up papers by human researchers.

Specific ideas it came up with include new variations of quantum teleportation, quantum network primitives using indefinite causal orders, and novel geometric phases based on closed loops of quantum information transfer.
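
To make the graph-based representation concrete: in the graph picture of quantum optics experiments that PyTheus builds on, vertices stand for photon paths, edges stand for photon-pair sources, and each perfect matching of the graph contributes one term to the state observed when every path detects exactly one photon. The sketch below is plain Python written for illustration only. It does not use PyTheus's actual API, and the function names and the (vertex, vertex, mode, mode) tuple layout for edges are assumptions.

```python
def perfect_matchings(vertices, edges):
    """Yield every list of edges that covers each vertex exactly once.

    Each edge is a tuple (u, v, mode_u, mode_v). This layout is an
    assumption made for this sketch, not PyTheus's data format.
    """
    if not vertices:
        yield []
        return
    first = vertices[0]
    for edge in edges:
        u, v = edge[0], edge[1]
        if first not in (u, v):
            continue
        # Remove both endpoints, then match the remaining vertices recursively.
        remaining = [x for x in vertices if x not in (u, v)]
        usable = [e for e in edges if e[0] in remaining and e[1] in remaining]
        for rest in perfect_matchings(remaining, usable):
            yield [edge] + rest


def matching_to_ket(matching, vertices):
    """Read off the mode number carried by the photon in each path."""
    modes = {}
    for u, v, mode_u, mode_v in matching:
        modes[u] = mode_u
        modes[v] = mode_v
    return "".join(str(modes[x]) for x in vertices)


# Four paths joined in a square whose edges alternate between mode 0 and mode 1.
vertices = ["a", "b", "c", "d"]
edges = [
    ("a", "b", 0, 0),
    ("c", "d", 0, 0),
    ("a", "c", 1, 1),
    ("b", "d", 1, 1),
]

kets = sorted(matching_to_ket(m, vertices) for m in perfect_matchings(vertices, edges))
print(kets)  # ['0000', '1111']
```

The two perfect matchings give the terms |0000> and |1111>, the two components of a four-photon GHZ state. Searching over such graphs is one way to turn an abstract target state into a candidate setup.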

Coscientist: The Chemistry Robot

From Carnegie Mellon University[2], Coscientist (powered by GPT-4) operates in actual chemistry labs. It has:

Executed palladium-catalyzed cross-coupling reactions (Suzuki–Miyaura and Sonogashira)
Designed experiments faster and more accurately than humans working alone
Relied on a modular architecture in which a central "planner" issues GOOGLE, PYTHON, DOCUMENTATION, and EXPERIMENT commands to dedicated modules

It bridges LLM reasoning with physical lab equipment — not just simulation.
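
The planner-and-modules design described above amounts to command dispatch. Here is a minimal sketch, assuming each command is a line of the form "NAME argument". The four command names come from the text; the handler functions, their canned return strings, and the fixed plan are placeholders invented for illustration. In the real system the planner is an LLM that picks each next command based on earlier results, and this is not Coscientist's code.

```python
from typing import Callable, Dict, List, Tuple


# Placeholder handlers: each returns a canned string instead of doing real work.
def google(query: str) -> str:
    return f"[search results for: {query}]"


def python(code: str) -> str:
    return f"[stdout of: {code}]"


def documentation(query: str) -> str:
    return f"[documentation excerpt for: {query}]"


def experiment(protocol: str) -> str:
    return f"[instrument log for: {protocol}]"


MODULES: Dict[str, Callable[[str], str]] = {
    "GOOGLE": google,
    "PYTHON": python,
    "DOCUMENTATION": documentation,
    "EXPERIMENT": experiment,
}


def dispatch(command_line: str) -> str:
    """Route one 'NAME argument' line to the matching module."""
    name, _, argument = command_line.partition(" ")
    handler = MODULES.get(name)
    if handler is None:
        raise ValueError(f"unknown command: {name!r}")
    return handler(argument.strip())


def run_plan(plan: List[str]) -> List[Tuple[str, str]]:
    """Execute commands in order and keep a transcript of (command, output)."""
    return [(line, dispatch(line)) for line in plan]


transcript = run_plan([
    "GOOGLE Suzuki coupling conditions",
    "DOCUMENTATION liquid handler transfer command",
    "PYTHON compute reagent volumes",
    "EXPERIMENT run plate protocol",
])
for command, output in transcript:
    print(command, "->", output)
```

Keeping the modules behind a single dispatch table means the planner only needs to emit text, and unknown commands fail loudly instead of reaching the hardware.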

The AI Scientist: Full Pipeline Automation

Sakana AI's The AI Scientist[3] (August 2024) aims for the full research lifecycle:

Brainstorms novel research ideas
Writes and executes code
Runs experiments
Generates complete scientific manuscripts
Performs automated peer review on its own output

It produces papers for about $15 each. A paper generated by a later version of the system passed peer review at an ICLR 2025 workshop.
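
The five stages listed above can be sketched as a linear pipeline that passes one shared record from stage to stage. Everything in this block is hypothetical: the Project fields, the stage function names, and the placeholder outputs are assumptions for illustration, not Sakana AI's implementation, and a real system would call an LLM and a code sandbox inside each stage.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Project:
    """Shared record that accumulates the artifacts of one research run."""
    topic: str
    idea: Optional[str] = None
    code: Optional[str] = None
    results: Optional[dict] = None
    manuscript: Optional[str] = None
    review: Optional[str] = None
    log: List[str] = field(default_factory=list)


# Each stage fills in one field with a placeholder value.
def brainstorm(p: Project) -> Project:
    p.idea = f"placeholder idea about {p.topic}"
    return p


def write_code(p: Project) -> Project:
    p.code = f"# placeholder code testing: {p.idea}"
    return p


def run_experiments(p: Project) -> Project:
    p.results = {"status": "placeholder", "source": p.code}
    return p


def write_manuscript(p: Project) -> Project:
    p.manuscript = f"Draft on {p.topic} reporting {p.results['status']} results"
    return p


def peer_review(p: Project) -> Project:
    p.review = f"placeholder review of: {p.manuscript}"
    return p


STAGES = [brainstorm, write_code, run_experiments, write_manuscript, peer_review]


def run_pipeline(topic: str) -> Project:
    """Run every stage in order, logging the name of each completed stage."""
    project = Project(topic)
    for stage in STAGES:
        project = stage(project)
        project.log.append(stage.__name__)
    return project


project = run_pipeline("learning-rate schedules")
print(project.log)
```

Because every stage reads and writes the same record, the review stage sees exactly what the earlier stages produced, which is what lets the system critique its own output.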

Agent Laboratory: The Research Team

Agent Laboratory[4] uses multiple specialized LLM agents working together:

PhD agents — Literature review via arXiv
MLE-solver — Technical implementation
Paper-solver agents — Report writing

There's also AgentRxiv[5] (March 2025), where autonomous agents upload, retrieve, and build on each other's research — cumulative scientific discovery without human intermediaries.
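
The upload, retrieve, and build-on cycle can be pictured as a shared store that agents read from and write to. Below is a minimal in-memory sketch under stated assumptions: the class name SharedArchive, its methods, the keyword search, and the agent names are all invented for illustration and are not AgentRxiv's actual interface.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Preprint:
    paper_id: int
    author: str
    title: str
    abstract: str
    builds_on: List[int] = field(default_factory=list)


class SharedArchive:
    """Hypothetical in-memory preprint store shared by several agents."""

    def __init__(self) -> None:
        self._papers: List[Preprint] = []

    def upload(self, author: str, title: str, abstract: str,
               builds_on: Iterable[int] = ()) -> int:
        """Store a preprint and return its id."""
        paper = Preprint(len(self._papers), author, title, abstract, list(builds_on))
        self._papers.append(paper)
        return paper.paper_id

    def get(self, paper_id: int) -> Preprint:
        return self._papers[paper_id]

    def retrieve(self, query: str) -> List[Preprint]:
        """Return papers whose title or abstract contains any query word."""
        words = query.lower().split()
        return [
            p for p in self._papers
            if any(w in f"{p.title} {p.abstract}".lower() for w in words)
        ]


archive = SharedArchive()

# Agent A publishes a first result.
first_id = archive.upload(
    "agent-a", "Baseline prompting study",
    "Measures accuracy of a simple prompting strategy.",
)

# Agent B searches the archive, then publishes work that cites what it found.
related = archive.retrieve("prompting")
second_id = archive.upload(
    "agent-b", "Improved prompting",
    "Extends the baseline with self-checks.",
    builds_on=[p.paper_id for p in related],
)

print(archive.get(second_id).builds_on)  # [0]
```

The builds_on list is what makes the process cumulative: each new preprint records which earlier results it started from.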

The Three Levels of Autonomy

One proposed taxonomy sorts LLM scientific agents into three tiers:

Level 1 (Tool LLMs): Assist with single tasks, such as summarizing a paper or writing a code snippet. A human directs everything.
Level 2 (Analyst LLMs): Chain subtasks, perform statistical analysis, and synthesize multiple documents, with reduced human intervention.
Level 3 (Scientist LLMs): Operate with full autonomy, from hypothesis generation and experimental design through execution and paper drafting, with minimal human oversight.
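
As a rough illustration, the three tiers can be encoded as an ordered enumeration. The two yes/no criteria used below are a deliberate simplification chosen for this sketch, and the names AutonomyLevel and classify are assumptions, not part of any published taxonomy.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    TOOL = 1       # single tasks, human directs everything
    ANALYST = 2    # chains subtasks with reduced human intervention
    SCIENTIST = 3  # runs the full cycle with minimal oversight


def classify(chains_subtasks: bool, runs_full_cycle: bool) -> AutonomyLevel:
    """Map two simplified capability flags onto a tier."""
    if runs_full_cycle:
        return AutonomyLevel.SCIENTIST
    if chains_subtasks:
        return AutonomyLevel.ANALYST
    return AutonomyLevel.TOOL


print(classify(chains_subtasks=True, runs_full_cycle=False).name)  # ANALYST
```

Using an IntEnum keeps the tiers comparable, so a check such as `level >= AutonomyLevel.ANALYST` reads naturally.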

Why This Matters

We're not at "AI replaces scientists" yet. But we're at "AI is a genuine co-investigator" — one that works 24/7, reads everything, never forgets a paper, and can iterate through a thousand failed ideas before breakfast. The bottleneck is shifting from "who has the idea" to "who validates it."

References

[1] Arlt, S., Gu, X., & Krenn, M. (2025). AI-Mandel: Towards autonomous quantum physics research using LLM agents. [arXiv:2511.11752]
[2] Boiko, D. A., et al. (2023). Autonomous chemical research with large language models. Nature 624, 570–578. [DOI]
[3] Lu, C., et al. (2024). The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery. [arXiv:2408.06292]
[4] Schmidgall, S., et al. (2025). Agent Laboratory: Using LLM Agents as Research Assistants. [arXiv:2501.04227]
[5] Schmidgall, S., & Moor, M. (2025). AgentRxiv: Towards Collaborative Autonomous Research. [arXiv:2503.18102]