LLMs & Agentic AI for Scientific Discovery

Last updated: January 2026

Large language models are reshaping how we approach early-phase discovery. This guide covers practical applications in synthesis planning, molecular design, clinical workflows, and autonomous experimentation.

Why This Matters

LLMs unlock four capabilities for discovery teams:

Low-data modeling — Pre-trained representations enable predictions with limited experimental data
Multi-endpoint prediction — One model handles ADME, toxicity, and activity across modalities
Workflow orchestration — Natural language interfaces to complex multi-step procedures
Literature-scale reasoning — Query and synthesize across millions of papers

The field is moving from chatbots to agents that plan, execute, and iterate autonomously.

Year-end reviews

Agents for Early-Phase Discovery

Autonomous Synthesis

Agent	What It Does	Links
Coscientist	Plans and executes organic syntheses via lab robotics	Nature ・ Code
LLM-RDF	End-to-end reaction development platform	Nature Comms ・ Code
ChemAgent	Multi-agent robotic chemist for on-demand synthesis	JACS
SynAsk	ReAct agent with yield prediction and retrosynthesis	Chem Sci

Molecular Design & Optimization

Agent	What It Does	Links
ChemCrow	Tool-augmented LLM for similarity search, format conversion, route planning	arXiv ・ Code
dZiner	Inverse design of materials with AI agents	arXiv ・ Code
CACTUS	Chemistry agent connecting tool usage to science	ACS Omega ・ Code
Llamole	Multimodal LLM for inverse design with retrosynthesis	arXiv
SynCraft	LLM given access to molecule edits to make designs more synthesizable	arXiv

Simulation & Dynamics

Agent	What It Does	Links
DynaMate	Autonomous protein-ligand MD simulations	arXiv ・ Code
MDCrow	Automates molecular dynamics workflows	arXiv ・ Code
El-Agente	Autonomous quantum chemistry calculations	arXiv
DREAMS	DFT-based agentic materials simulation	arXiv

Biology & Protein Engineering

Agent	What It Does	Links
Virtual Lab	Multi-agent nanobody design with experimental validation	bioRxiv ・ Code
ProtAgent	Protein discovery via multi-agent collaboration	arXiv ・ Code
CRISPR-GPT	Automated gene-editing experiment design	arXiv
SpatialAgent	Autonomous spatial biology analysis	bioRxiv
BioDiscoveryAgent	Designs genetic perturbation experiments (Stanford SNAP)	Code

Healthcare Applications

Clinical Reasoning & Decision Support

System	Application	Links
TxAgent	Therapeutic reasoning across biomedical tools via Tool-RAG	arXiv ・ Code
Personal Health Agent	Anatomy of a personal health assistant	arXiv
Med-Gemini	Google’s medical multimodal foundation model	arXiv
G-Mode	Healthcare AI platform	Website

Clinical Text & Documentation

System	Application	Links
Clinical Summarization	LLM summaries rated superior to expert-written ones in 36% of cases	Nature Med ・ Code
CLEAR	Clinical entity augmented retrieval for extraction	npj Digital Med

Evaluation

Benchmark	Focus	Link
MedHELM	121 clinical tasks across 35 benchmarks	Website
The Optimization Paradox	Multi-agent clinical systems analysis	arXiv

Foundation Models for Discovery

Chemistry & Small Molecules

Model	Strength	Links
Tx-LLM	Multi-endpoint ADME prediction, positive transfer learning	arXiv
NatureLM	Cross-domain: molecules, proteins, materials	arXiv
LlaSMol	Instruction-tuned for chemistry tasks	Website
BioT5+	IUPAC integration, numerical tokenization	arXiv
Chemformer	Pre-trained transformer for computational chemistry	Code
ether0	Scientific reasoning model for chemistry (GRPO-trained)	Code

Proteins & Genomics

Model	Strength	Links
Evo2	Genomic foundation model	Arc Institute
ProCyon	Multimodal protein phenotype prediction	bioRxiv
EvoDiff	Diffusion-based protein generation in sequence space	Code
BioEmu-1	Protein dynamics and conformational changes	Microsoft

Literature & Knowledge Systems

RAG for Science

System	Application	Links
PaperQA	RAG agent designed to ground answers in retrieved papers and avoid hallucinated citations	arXiv
STORM	Wikipedia-style article generation from sources	Code
WikiCrow	Automated scientific knowledge synthesis	Website
MERMaid	Multimodal extraction from PDFs using VLMs	ChemRxiv
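
One simple guard against hallucinated citations in a RAG pipeline is to check that every citation key in a generated answer points at a passage that was actually retrieved. The sketch below illustrates that idea only and is not how PaperQA or any other system in this table is implemented. The `Passage` type, the `[key]` citation format, the word-overlap retriever, and all function names are assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    key: str   # hypothetical citation key, e.g. "smith2023"
    text: str


def retrieve(query: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    """Rank passages by naive word overlap with the query (stand-in for a real retriever)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda p: len(query_words & set(p.text.lower().split())),
        reverse=True,
    )
    return ranked[:k]


# Assumed citation format: a key in square brackets, e.g. "[smith2023]".
CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def ungrounded_citations(answer: str, retrieved: list[Passage]) -> set[str]:
    """Return citation keys in the answer that match no retrieved passage."""
    allowed = {p.key for p in retrieved}
    return set(CITATION.findall(answer)) - allowed


corpus = [
    Passage("smith2023", "palladium catalysts improve suzuki coupling yield"),
    Passage("lee2024", "nanobody binders designed against viral spike protein"),
    Passage("chen2022", "molecular dynamics of protein ligand binding"),
]
hits = retrieve("suzuki coupling yield", corpus, k=1)
answer = "Yield improves with Pd catalysts [smith2023] [doe2021]."
print(ungrounded_citations(answer, hits))  # keys the answer cites without retrieved support
```

An answer with a non-empty result can be rejected or regenerated before it reaches the user. A production system would also check that the cited passage supports the claim, which this key-matching check does not attempt.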

Data Curation

From text to insight — Tutorial review on LLMs for chemical data extraction (Jablonka group)
LLM organic synthesis extraction — Fine-tuned GPT for parsing synthesis procedures

Building Agents

Frameworks

SDK	Best For	Link
LangGraph	Graph-based workflows, multi-agent	Code
PydanticAI	Type-safe agent development	Code
Smolagents	Lightweight HuggingFace ecosystem	Code
CrewAI	Multi-agent orchestration	Website

Engineering Patterns

Claude Think Tool — Dedicated reasoning step for complex tool chains
AvaTaR — Contrastive reasoning for tool-use optimization (DSPy integrated)
ADAS — Automated design of agentic systems
Aviary — Training language agents on scientific tasks framed as (partially observable) MDPs
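
Two of the patterns above, a dedicated think step and the MDP framing of agent tasks, can be shown together in a toy rollout loop. This is a minimal sketch under stated assumptions, not the Aviary API or Anthropic's think tool. The `TitrationEnv` environment, the `think` and `add_base` tool names, and the scripted policy standing in for an LLM are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TitrationEnv:
    """Toy environment: reach a target pH by adding base in discrete steps."""
    target: float = 7.0
    ph: float = 4.0
    steps: int = 0
    max_steps: int = 10

    def reset(self) -> str:
        self.ph, self.steps = 4.0, 0
        return f"pH={self.ph:.1f}"

    def step(self, tool: str, arg: str) -> tuple[str, float, bool]:
        """Apply one tool call and return (observation, reward, done)."""
        self.steps += 1
        if tool == "think":
            # A think step records reasoning only; the environment state is unchanged.
            return "noted", 0.0, self.steps >= self.max_steps
        if tool == "add_base":
            self.ph += float(arg)
        solved = abs(self.ph - self.target) < 0.25
        done = solved or self.steps >= self.max_steps
        return f"pH={self.ph:.1f}", (1.0 if solved else 0.0), done


def scripted_policy(observation: str, history: list) -> tuple[str, str]:
    """Stand-in for an LLM: think once, then add base on every later turn."""
    if not history:
        return "think", "pH is below target, so add base in 1.0 increments"
    return "add_base", "1.0"


def rollout(env: TitrationEnv, policy) -> tuple[list, float]:
    """Run one episode and return the trajectory and the total reward."""
    obs, history, total, done = env.reset(), [], 0.0, False
    while not done:
        tool, arg = policy(obs, history)
        obs, reward, done = env.step(tool, arg)
        history.append((tool, arg, obs, reward))
        total += reward
    return history, total


trajectory, total_reward = rollout(TitrationEnv(), scripted_policy)
for tool, arg, obs, reward in trajectory:
    print(tool, arg, obs, reward)
```

Keeping the environment behind `reset` and `step` means the same loop can collect trajectories for evaluation or for training, and the think step costs one turn without touching the simulated experiment.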

Infrastructure

Category	Tools
Code Sandbox	E2B, Arrakis
Browser	browser-use
Documents	Reducto, Firecrawl
MCP Servers	UniProt, Benchling

Benchmarks

Chemistry

Benchmark	Focus	Link
ChemIQ	Chemical intelligence beyond MCQs	Code
ChemBench	General chemistry understanding	Website
MaCBench	Materials chemistry	HuggingFace
FGBench	Molecule functional group-level reasoning using LLMs	arXiv

Biology & Healthcare

Benchmark	Focus	Link
LAB-Bench	Practical biology research tasks (literature, databases, sequences, protocols)	arXiv
BixBench	Bioinformatics tasks	arXiv
Bio-ML	Biology ML evaluation	bioRxiv

Agent Reliability

SWE-bench — Software engineering
HAL — General agent reliability

Current Limitations

Reliability: Non-deterministic outputs; query phrasing significantly affects results.

Tool Selection: Performance degrades as available tools increase.

Memory: Long workflow context management remains challenging.

Bottom line: Agents demonstrate impressive capabilities but lack production reliability. Build evaluation metrics that capture both capability and reliability.
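
One way to capture both is to run each task several times, including paraphrased prompts, and report whether it was solved at least once alongside whether it was solved every time. The sketch below assumes a flat list of (task, passed) trial records. The metric names `any_pass` and `all_pass` and the task IDs are hypothetical labels chosen for the example, not an established benchmark API.

```python
from collections import defaultdict


def summarize(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Aggregate (task_id, passed) trial records, including repeats and paraphrases."""
    by_task: dict[str, list[bool]] = defaultdict(list)
    for task_id, passed in results:
        by_task[task_id].append(passed)
    if not by_task:
        raise ValueError("no trial results to summarize")
    n = len(by_task)
    trials = list(by_task.values())
    return {
        # Average per-task success rate across trials.
        "mean_pass_rate": sum(sum(t) / len(t) for t in trials) / n,
        # Capability: fraction of tasks solved in at least one trial.
        "any_pass": sum(any(t) for t in trials) / n,
        # Reliability: fraction of tasks solved in every trial.
        "all_pass": sum(all(t) for t in trials) / n,
    }


# Hypothetical results: four trials per task.
results = (
    [("retrosynthesis", True)] * 4
    + [("admet", True), ("admet", False), ("admet", True), ("admet", False)]
    + [("docking", False)] * 4
)
print(summarize(results))
```

A large gap between `any_pass` and `all_pass` indicates an agent that can do the task but not dependably, which is the failure mode the limitations above describe.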

Key Reading

Reviews

LLMs and Autonomous Agents in Chemistry — White lab survey
Empowering Biomedical Discovery with AI Agents — Zitnik lab (Cell)
Foundation Models for Materials Discovery — npj Computational Materials

Engineering

Building Effective Agents — Anthropic
12-Factor Agents — Human Layer
Small Agents Position Paper — NVIDIA

Comprehensive Agent Timeline

Date	Agent	Description	Links
2025.12	DynaMate	Protein-ligand MD automation	Paper ・ Code
2025.09	Personal Health Agent	Health assistant architecture	Paper
2025.07	DREAMS	DFT-based materials simulation	Paper
2025.05	ROBIN	Multi-agent scientific discovery	Paper ・ Code
2025.05	El-Agente	Quantum chemistry automation	Paper
2025.04	SpatialAgent	Spatial biology	Paper
2025.03	TxAgent	Therapeutic reasoning	Paper ・ Code
2025.01	DataAnalysisCrow	Jupyter notebook agent	Code
2024.12	Aviary	Agent training framework	Paper ・ Code
2024.11	Virtual Lab	Multi-agent nanobody design	Paper ・ Code
2024.11	DrugAgent	Drug discovery automation	Paper
2024.11	LLM-RDF	Chemical synthesis platform	Paper ・ Code
2024.10	dZiner	Inverse materials design	Paper ・ Code
2024.10	CACTUS	Chemistry tool usage	Paper ・ Code
2024.06	LLaMP	Materials knowledge retrieval	Paper ・ Code
2024.04	CRISPR-GPT	Gene-editing design	Paper
2024.02	TAIS	Gene expression discovery	Paper
2024.02	WikiCrow	Scientific knowledge synthesis	Website
2024.02	STORM	Wikipedia-style generation	Paper ・ Code
2024.02	SciAgent	Tool-augmented scientific reasoning	Paper
2024.01	ProtAgent	Protein discovery multi-agent	Paper ・ Code
2023.12	PaperQA	Scientific literature RAG	Paper
2023.12	Coscientist	Autonomous chemical research	Paper ・ Code
2023.12	Eunomia	Materials data from literature	Paper ・ Code
2023.11	eXpertAI	Structure-property XAI	Paper ・ Code
2023.10	BioPlanner	Protocol planning evaluation	Paper ・ Code
2023.09	IBM ChemChat	Molecular discovery	Paper
2023.08	ChatMOF	MOF prediction and generation	Paper ・ Code
2023.06	AmadeusGPT	Animal behavior analysis	Paper ・ Code
2023.04	ChemCrow	Chemistry tools for LLMs	Paper ・ Code

Industry & Products

Lab Automation: Tetsuwan, Artificial AI, Lila Sciences, Mithri

Research Assistants: LOWE (Recursion), Mara (Nanome), Edison, SuperBio, Sam (ScienceMachine)

Platforms: FutureHouse, G-Mode, Biomni