Last updated: January 2026
Large language models are reshaping how we approach early-phase discovery. This guide covers practical applications in synthesis planning, molecular design, clinical workflows, and autonomous experimentation.
Why This Matters
LLMs unlock four capabilities for discovery teams:
- Low-data modeling — Pre-trained representations enable predictions with limited experimental data
- Multi-endpoint prediction — One model handles ADME, tox, and activity across modalities
- Workflow orchestration — Natural language interfaces to complex multi-step procedures
- Literature-scale reasoning — Query and synthesize across millions of papers
The field is moving from chatbots to agents that plan, execute, and iterate autonomously.
Agents for Early-Phase Discovery
Autonomous Synthesis
| Name | Description | Links |
| --- | --- | --- |
| Coscientist | Plans and executes organic syntheses via lab robotics | Nature ・ Code |
| LLM-RDF | End-to-end reaction development platform | Nature Comms ・ Code |
| ChemAgent | Multiagent robotic chemist for on-demand synthesis | JACS |
| SynAsk | ReAct agent with yield prediction and retrosynthesis | Chem Sci |
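SynAsk is listed as a ReAct agent. The ReAct pattern interleaves a reasoning step, a tool call, and the tool's observation, and repeats until the model commits to an answer. Below is a minimal Python sketch of that loop. It assumes a hypothetical `scripted_llm` that replays canned steps in place of a real model call, and a hypothetical toy `predict_yield` tool. Neither reflects SynAsk's actual tools or prompts.

```python
from typing import Callable

# Hypothetical toy tool: returns a fixed "yield" string for illustration only.
def predict_yield(reaction: str) -> str:
    return f"predicted yield for {reaction}: 72%"

TOOLS: dict[str, Callable[[str], str]] = {"predict_yield": predict_yield}

def scripted_llm(transcript: list[str]) -> dict:
    """Hypothetical stand-in for an LLM call that replays a fixed plan.

    A real agent would send the transcript to a model and parse its reply.
    """
    if not any(line.startswith("Observation:") for line in transcript):
        return {"thought": "I should estimate the yield first.",
                "action": "predict_yield", "input": "Suzuki coupling"}
    return {"thought": "The yield looks acceptable.",
            "final": "Proceed with the Suzuki coupling."}

def react(question: str, llm=scripted_llm, max_steps: int = 5) -> str:
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = llm(transcript)                      # reason
        transcript.append(f"Thought: {step['thought']}")
        if "final" in step:                         # model commits to an answer
            return step["final"]
        tool = TOOLS.get(step["action"])            # act
        observation = tool(step["input"]) if tool else f"unknown tool {step['action']}"
        transcript.append(f"Action: {step['action']}[{step['input']}]")
        transcript.append(f"Observation: {observation}")  # observe
    return "Stopped: step budget exhausted."

print(react("Should we run this coupling?"))
```

The `max_steps` budget matters in practice. Without it, a model that never emits a final answer would loop indefinitely.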
Molecular Design & Optimization
| Name | Description | Links |
| --- | --- | --- |
| ChemCrow | Tool-augmented LLM for similarity search, format conversion, route planning | arXiv ・ Code |
| dZiner | Inverse design of materials with AI agents | arXiv ・ Code |
| CACTUS | Chemistry agent connecting tool usage to science | ACS Omega ・ Code |
| Llamole | Multimodal LLM for inverse design with retrosynthesis | arXiv |
| SynCraft | LLM given access to molecule edits to make designs more synthesizable | arXiv |
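ChemCrow's similarity search is an example of the narrow, deterministic tools an agent calls. The sketch below computes Tanimoto (Jaccard) similarity over fingerprints represented as sets of on-bit indices. The fingerprints are made-up toy values. A real tool would compute them with a cheminformatics library such as RDKit instead of writing them by hand.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto similarity |A ∩ B| / |A ∪ B| for binary fingerprints."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # convention for two empty fingerprints (an assumption)
    return len(fp_a & fp_b) / len(union)

def most_similar(query: set[int], library: dict[str, set[int]], k: int = 2):
    """Rank library entries by Tanimoto similarity to the query, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy fingerprints: hypothetical bit indices, not real molecules.
library = {
    "mol_A": {1, 4, 7, 9},
    "mol_B": {1, 4, 8},
    "mol_C": {2, 3, 5},
}
print(most_similar({1, 4, 7}, library))  # [('mol_A', 0.75), ('mol_B', 0.5)]
```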
Simulation & Dynamics
| Name | Description | Links |
| --- | --- | --- |
| DynaMate | Autonomous protein-ligand MD simulations | arXiv ・ Code |
| MDCrow | Automates molecular dynamics workflows | arXiv ・ Code |
| El-Agente | Autonomous quantum chemistry calculations | arXiv |
| DREAMS | DFT-based agentic materials simulation | arXiv |
Biology & Protein Engineering
| Name | Description | Links |
| --- | --- | --- |
| Virtual Lab | Multi-agent nanobody design with experimental validation | bioRxiv ・ Code |
| ProtAgent | Protein discovery via multi-agent collaboration | arXiv ・ Code |
| CRISPR-GPT | Automated gene-editing experiment design | arXiv |
| SpatialAgent | Autonomous spatial biology analysis | bioRxiv |
| BioDiscoveryAgent | Biomedical discovery from Stanford SNAP | Code |
Healthcare Applications
Clinical Reasoning & Decision Support
| Name | Description | Links |
| --- | --- | --- |
| TxAgent | Therapeutic reasoning across biomedical tools via Tool-RAG | arXiv ・ Code |
| Personal Health Agent | Anatomy of a personal health assistant | arXiv |
| Med-Gemini | Google’s medical multimodal foundation model | arXiv |
| G-Mode | Healthcare AI platform | Website |
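TxAgent is described as selecting among biomedical tools via Tool-RAG. The limitations section also notes that performance degrades as the number of available tools grows. One common mitigation is to retrieve only the most relevant tool descriptions before prompting the model. The sketch below ranks hypothetical tool descriptions by bag-of-words cosine similarity. This scoring is a simple stand-in; TxAgent's actual retriever is not reproduced here.

```python
import math
import re
from collections import Counter

# Hypothetical tool registry: name -> natural-language description.
TOOLS = {
    "drug_interactions": "look up known interactions between two drugs",
    "dosage_lookup": "find recommended dosage of a drug for adults",
    "protein_structure": "fetch predicted protein structure for a gene",
    "adverse_events": "list reported adverse events and side effects of a drug",
}

def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (lowercased alphabetic tokens)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve_tools(query: str, k: int = 2) -> list[str]:
    """Return the names of the k tools whose descriptions best match the query."""
    q = vectorize(query)
    ranked = sorted(TOOLS, key=lambda name: cosine(q, vectorize(TOOLS[name])),
                    reverse=True)
    return ranked[:k]

print(retrieve_tools("what side effects does this drug have"))
# ['adverse_events', 'dosage_lookup']
```

Only the retrieved subset would then be placed in the model's tool list, which keeps the choice small even when the registry is large. Production systems typically swap the bag-of-words scoring for learned embeddings.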
Clinical Text & Documentation
| Name | Description | Links |
| --- | --- | --- |
| Clinical Summarization | Adapted LLM summaries judged superior to those of medical experts in 36% of cases | Nature Med ・ Code |
| CLEAR | Clinical entity augmented retrieval for extraction | npj Digital Med |
Evaluation
| Name | Description | Links |
| --- | --- | --- |
| MedHELM | 121 clinical tasks across 35 benchmarks | Website |
| The Optimization Paradox | Multi-agent clinical systems analysis | arXiv |
Foundation Models for Discovery
Chemistry & Small Molecules
| Name | Description | Links |
| --- | --- | --- |
| Tx-LLM | Multi-endpoint ADME prediction, positive transfer learning | arXiv |
| NatureLM | Cross-domain: molecules, proteins, materials | arXiv |
| LlaSMol | Instruction-tuned for chemistry tasks | Website |
| BioT5+ | IUPAC integration, numerical tokenization | arXiv |
| Chemformer | Pre-trained transformer for computational chemistry | Code |
| ether0 | Scientific reasoning model for chemistry (GRPO-trained) | Code |
Proteins & Genomics
| Name | Description | Links |
| --- | --- | --- |
| Evo2 | Genomic foundation model | Arc Institute |
| ProCyon | Multimodal protein phenotype prediction | bioRxiv |
| EvoDiff | Diffusion-based protein generation in sequence space | Code |
| BioEmu-1 | Protein dynamics and conformational changes | Microsoft |
Literature & Knowledge Systems
RAG for Science
| Name | Description | Links |
| --- | --- | --- |
| PaperQA | RAG agent for scientific literature, designed to avoid hallucinated citations | arXiv |
| STORM | Wikipedia-style article generation from sources | Code |
| WikiCrow | Automated scientific knowledge synthesis | Website |
| MERMaid | Multimodal extraction from PDFs using VLMs | ChemRxiv |
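For any literature RAG system, a cheap safeguard against fabricated references is to check that every citation key in a generated answer points to a passage that was actually retrieved. Here is a minimal sketch, assuming a hypothetical `[key]` citation format and made-up keys. It is an illustrative post-hoc check, not PaperQA's implementation.

```python
import re

# Hypothetical citation format: keys in square brackets, e.g. [smith2023].
CITATION = re.compile(r"\[([A-Za-z0-9_:-]+)\]")

def unsupported_citations(answer: str, retrieved_keys: set[str]) -> list[str]:
    """Return citation keys in the answer that were not in the retrieved context."""
    cited = CITATION.findall(answer)
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return [key for key in dict.fromkeys(cited) if key not in retrieved_keys]

retrieved = {"smith2023", "lee2024"}
answer = ("Pd catalysts improve yields [smith2023], "
          "and ligand choice matters [lee2024][doe2022].")
print(unsupported_citations(answer, retrieved))  # ['doe2022']
```

A non-empty result can trigger a retry or strip the unsupported claim before the answer reaches the user.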
Building Agents
Frameworks
| Name | Description | Links |
| --- | --- | --- |
| LangGraph | Graph-based workflows, multi-agent | Code |
| PydanticAI | Type-safe agent development | Code |
| Smolagents | Lightweight agent library in the Hugging Face ecosystem | Code |
| CrewAI | Multi-agent orchestration | Website |
Engineering Patterns
- Claude Think Tool — Dedicated reasoning step for complex tool chains
- AvaTaR — Contrastive learning for tool-use optimization (DSPy integrated)
- ADAS — Automated design of agentic systems
- Aviary — Training language agents as MDPs on scientific tasks
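The Claude think tool is a tool definition whose only effect is to give the model a place to write intermediate reasoning between tool calls. The sketch below shows such a definition in the shape Anthropic's Messages API uses for custom tools (`name`, `description`, `input_schema` as JSON Schema), plus a client-side handler. The description wording and the `handle_tool_call` dispatcher are illustrative assumptions, not a verbatim copy of Anthropic's published example.

```python
# Tool definition in the shape the Messages API expects for custom tools.
# The description text is paraphrased (an assumption), not Anthropic's exact wording.
THINK_TOOL = {
    "name": "think",
    "description": (
        "Use this tool to think about something. It does not fetch new "
        "information or change any state; it only records the thought."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {"type": "string", "description": "A thought to think about."}
        },
        "required": ["thought"],
    },
}

thought_log: list[str] = []

def handle_tool_call(name: str, tool_input: dict) -> str:
    """Hypothetical client-side dispatcher for tool calls returned by the model."""
    if name == "think":
        thought_log.append(tool_input["thought"])
        return "ok"  # no side effects beyond logging the thought
    raise ValueError(f"unknown tool: {name}")

print(THINK_TOOL["name"])
print(handle_tool_call("think", {"thought": "Check solvent compatibility first."}))
```

Because the handler does nothing but log, the tool costs little to add. Its value comes entirely from prompting the model to pause and reason mid-chain.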
Benchmarks
Chemistry
| Name | Description | Links |
| --- | --- | --- |
| ChemIQ | Chemical intelligence beyond MCQs | Code |
| ChemBench | General chemistry understanding | Website |
| MaCBench | Materials chemistry | HuggingFace |
| FGBench | Molecule functional group-level reasoning using LLMs | arXiv |
Biology & Healthcare
| Name | Description | Links |
| --- | --- | --- |
| LAB-Bench | Laboratory protocol execution | arXiv |
| BixBench | Bioinformatics tasks | arXiv |
| Bio-ML | Biology ML evaluation | bioRxiv |
Agent Reliability
- SWE-bench — Software engineering
- HAL — General agent reliability
Current Limitations
Reliability: Non-deterministic outputs; query phrasing significantly affects results.
Tool Selection: Performance degrades as available tools increase.
Memory: Long workflow context management remains challenging.
Bottom line: Agents demonstrate impressive capabilities but lack production reliability. Build evaluation metrics that capture both capability and reliability.
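One way to build metrics that capture both capability and reliability is to run each task several times and report two numbers. The first is pass@k: the probability that at least one of k sampled attempts succeeds, computed with the unbiased estimator from the Codex/HumanEval work. The second is pass^k: the probability that all k attempts succeed, the consistency metric used in τ-bench. Both are estimated from n trials with c successes. The task names and success counts below are made-up illustrative numbers.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """P(at least one of k attempts succeeds), unbiased estimate from n trials, c successes."""
    if n - c < k:
        return 1.0  # every size-k subset contains a success
    return 1.0 - comb(n - c, k) / comb(n, k)

def pass_hat_k(n: int, c: int, k: int) -> float:
    """P(all k attempts succeed), estimated as C(c, k) / C(n, k)."""
    return comb(c, k) / comb(n, k)  # comb(c, k) is 0 when k > c

# Hypothetical results: 8 repeated runs per task and the number that succeeded.
n = 8
successes = {"retrosynthesis": 6, "md_setup": 8, "literature_qa": 3}
for task, c in successes.items():
    print(f"{task}: pass@4={pass_at_k(n, c, 4):.3f} pass^4={pass_hat_k(n, c, 4):.3f}")
# retrosynthesis: pass@4=1.000 pass^4=0.214
# md_setup: pass@4=1.000 pass^4=1.000
# literature_qa: pass@4=0.929 pass^4=0.000
```

The gap between the two numbers is the point. `literature_qa` looks strong on pass@4, yet in this toy data it never succeeds four times in a row.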
Comprehensive Agent Timeline
| Date | Name | Description | Links |
| --- | --- | --- | --- |
| 2025.12 | DynaMate | Protein-ligand MD automation | Paper ・ Code |
| 2025.09 | Personal Health Agent | Health assistant architecture | Paper |
| 2025.07 | DREAMS | DFT-based materials simulation | Paper |
| 2025.05 | ROBIN | Multi-agent scientific discovery | Paper ・ Code |
| 2025.05 | El-Agente | Quantum chemistry automation | Paper |
| 2025.04 | SpatialAgent | Spatial biology | Paper |
| 2025.03 | TxAgent | Therapeutic reasoning | Paper ・ Code |
| 2025.01 | DataAnalysisCrow | Jupyter notebook agent | Code |
| 2024.12 | Aviary | Agent training framework | Paper ・ Code |
| 2024.11 | Virtual Lab | Multi-agent nanobody design | Paper ・ Code |
| 2024.11 | DrugAgent | Drug discovery automation | Paper |
| 2024.11 | LLM-RDF | Chemical synthesis platform | Paper ・ Code |
| 2024.10 | dZiner | Inverse materials design | Paper ・ Code |
| 2024.10 | CACTUS | Chemistry tool usage | Paper ・ Code |
| 2024.06 | LLaMP | Materials knowledge retrieval | Paper ・ Code |
| 2024.04 | CRISPR-GPT | Gene-editing design | Paper |
| 2024.02 | TAIS | Gene expression discovery | Paper |
| 2024.02 | WikiCrow | Scientific knowledge synthesis | Website |
| 2024.02 | STORM | Wikipedia-style generation | Paper ・ Code |
| 2024.02 | SciAgent | Tool-augmented scientific reasoning | Paper |
| 2024.01 | ProtAgent | Protein discovery multi-agent | Paper ・ Code |
| 2023.12 | PaperQA | Scientific literature RAG | Paper |
| 2023.12 | Coscientist | Autonomous chemical research | Paper ・ Code |
| 2023.12 | Eunomia | Materials data from literature | Paper ・ Code |
| 2023.11 | eXpertAI | Structure-property XAI | Paper ・ Code |
| 2023.10 | BioPlanner | Protocol planning evaluation | Paper ・ Code |
| 2023.09 | IBM ChemChat | Molecular discovery | Paper |
| 2023.08 | ChatMOF | MOF prediction and generation | Paper ・ Code |
| 2023.06 | AmadeusGPT | Animal behavior analysis | Paper ・ Code |
| 2023.04 | ChemCrow | Chemistry tools for LLMs | Paper ・ Code |
Industry & Products
Lab Automation: Tetsuwan, Artificial AI, Lila Science, Mithri
Research Assistants: LOWE (Recursion), Mara (Nanome), Edison, SuperBio, Sam (ScienceMachine)
Platforms: FutureHouse, G-Mode, Biomni