LLMs & Agentic AI for Scientific Discovery

Curated resources on LLM agents for drug discovery, synthesis planning, and healthcare applications
Categories: chemical-science, machine-learning, resources, AI, healthcare

Published: December 15, 2025

Last updated: January 2026

Large language models are reshaping how we approach early-phase discovery. This guide covers practical applications in synthesis planning, molecular design, clinical workflows, and autonomous experimentation.

Why This Matters

LLMs unlock four capabilities for discovery teams:

  1. Low-data modeling — Pre-trained representations enable predictions with limited experimental data
  2. Multi-endpoint prediction — One model handles ADME, tox, and activity across modalities
  3. Workflow orchestration — Natural language interfaces to complex multi-step procedures
  4. Literature-scale reasoning — Query and synthesize across millions of papers

The field is moving from chatbots to agents that plan, execute, and iterate autonomously.

Year-end reviews

Agents for Early-Phase Discovery

Autonomous Synthesis

Agent | What It Does | Links
Coscientist | Plans and executes organic syntheses via lab robotics | Nature, Code
LLM-RDF | End-to-end reaction development platform | Nature Comms, Code
ChemAgent | Multi-agent robotic chemist for on-demand synthesis | JACS
SynAsk | ReAct agent with yield prediction and retrosynthesis | Chem Sci
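SynAsk is listed above as a ReAct agent. To show what that pattern means in general (this is not SynAsk's code), the sketch below alternates a Thought, an Action (tool call), and an Observation until the policy answers. The `scripted_policy` function stands in for an LLM, and the `lookup_yield` tool and its yield table are hypothetical placeholders for a real yield-prediction model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical tool: a lookup table standing in for a real yield-prediction model.
YIELD_TABLE = {"suzuki_coupling": 0.82, "amide_coupling": 0.91}


def lookup_yield(reaction: str) -> str:
    value = YIELD_TABLE.get(reaction)
    return f"predicted yield {value:.2f}" if value is not None else "unknown reaction"


TOOLS: dict[str, Callable[[str], str]] = {"lookup_yield": lookup_yield}


@dataclass
class Step:
    thought: str
    action: str | None = None  # tool name; None means the agent is answering
    action_input: str = ""
    answer: str = ""


def scripted_policy(history: list[str]) -> Step:
    """Stand-in for an LLM call: choose the next step from the transcript so far."""
    if not any(line.startswith("Observation:") for line in history):
        return Step("I need a yield estimate before choosing a route.",
                    action="lookup_yield", action_input="suzuki_coupling")
    return Step("The estimate is acceptable.", answer="Proceed with Suzuki coupling.")


def react(question: str, policy: Callable[[list[str]], Step] = scripted_policy,
          max_steps: int = 5) -> tuple[str, list[str]]:
    """Alternate Thought -> Action -> Observation until the policy answers."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = policy(history)
        history.append(f"Thought: {step.thought}")
        if step.action is None:
            history.append(f"Answer: {step.answer}")
            return step.answer, history
        history.append(f"Action: {step.action}[{step.action_input}]")
        tool = TOOLS.get(step.action)
        observation = tool(step.action_input) if tool else f"unknown tool {step.action}"
        history.append(f"Observation: {observation}")
    return "No answer within step budget.", history


answer, transcript = react("Which coupling should we run first?")
print("\n".join(transcript))
```

Replacing `scripted_policy` with a function that prompts a model and parses its reply is where most of the real engineering lies; the `max_steps` cap keeps a confused policy from looping forever.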

Molecular Design & Optimization

Agent | What It Does | Links
ChemCrow | Tool-augmented LLM for similarity search, format conversion, and route planning | arXiv, Code
dZiner | Inverse design of materials with AI agents | arXiv, Code
CACTUS | Chemistry agent connecting tool usage to science | ACS Omega, Code
Llamole | Multimodal LLM for inverse design with retrosynthesis | arXiv
SynCraft | LLM given access to molecule edits to make designs more synthesizable | arXiv

Simulation & Dynamics

Agent | What It Does | Links
DynaMate | Autonomous protein-ligand MD simulations | arXiv, Code
MDCrow | Automates molecular dynamics workflows | arXiv, Code
El-Agente | Autonomous quantum chemistry calculations | arXiv
DREAMS | DFT-based agentic materials simulation | arXiv

Biology & Protein Engineering

Agent | What It Does | Links
Virtual Lab | Multi-agent nanobody design with experimental validation | bioRxiv, Code
ProtAgent | Protein discovery via multi-agent collaboration | arXiv, Code
CRISPR-GPT | Automated gene-editing experiment design | arXiv
SpatialAgent | Autonomous spatial biology analysis | bioRxiv
BioDiscoveryAgent | Genetic perturbation experiment design (Stanford SNAP) | Code

Healthcare Applications

Clinical Reasoning & Decision Support

System | Application | Links
TxAgent | Therapeutic reasoning across biomedical tools via Tool-RAG | arXiv, Code
Personal Health Agent | Anatomy of a personal health assistant | arXiv
Med-Gemini | Google’s medical multimodal foundation model | arXiv
G-Mode | Healthcare AI platform | Website

Clinical Text & Documentation

System | Application | Links
Clinical Summarization | LLMs outperforming experts in 36% of cases | Nature Med, Code
CLEAR | Clinical entity augmented retrieval for extraction | npj Digital Med

Evaluation

Benchmark | Focus | Link
MedHELM | 121 clinical tasks across 35 benchmarks | Website
The Optimization Paradox | Multi-agent clinical systems analysis | arXiv

Foundation Models for Discovery

Chemistry & Small Molecules

Model | Strength | Links
Tx-LLM | Multi-endpoint ADME prediction, positive transfer learning | arXiv
NatureLM | Cross-domain: molecules, proteins, materials | arXiv
LlaSMol | Instruction-tuned for chemistry tasks | Website
BioT5+ | IUPAC integration, numerical tokenization | arXiv
Chemformer | Pre-trained transformer for computational chemistry | Code
ether0 | Scientific reasoning model for chemistry (GRPO-trained) | Code

Proteins & Genomics

Model | Strength | Links
Evo2 | Genomic foundation model | Arc Institute
ProCyon | Multimodal protein phenotype prediction | bioRxiv
EvoDiff | Diffusion-based protein generation in sequence space | Code
BioEmu-1 | Protein dynamics and conformational changes | Microsoft

Literature & Knowledge Systems

RAG for Science

System | Application | Links
PaperQA | RAG agent designed to ground every citation in retrieved papers | arXiv
STORM | Wikipedia-style article generation from sources | Code
WikiCrow | Automated scientific knowledge synthesis | Website
MERMaid | Multimodal extraction from PDFs using VLMs | ChemRxiv
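A cheap guard against hallucinated citations in any RAG pipeline (a generic check, not PaperQA's internal mechanism) is to confirm that every citation key in a generated answer refers to a passage that was actually retrieved. The sketch assumes a citation style of parenthesized keys such as `(Smith2023)` or `(Lee2021, Ng2020)`; the passages and answer are made-up examples.

```python
import re

# Assumed citation style: parenthesized author-year keys, optionally comma-separated.
_KEY = r"[A-Za-z][\w-]*\d{4}[a-z]?"
CITATION = re.compile(rf"\(({_KEY}(?:\s*,\s*{_KEY})*)\)")


def cited_keys(answer: str) -> list[str]:
    """Extract every citation key that appears in the answer, in order."""
    keys: list[str] = []
    for group in CITATION.findall(answer):
        keys.extend(k.strip() for k in group.split(","))
    return keys


def unsupported_citations(answer: str, retrieved: dict[str, str]) -> list[str]:
    """Return cited keys that do not match any retrieved passage."""
    return [k for k in cited_keys(answer) if k not in retrieved]


retrieved = {
    "Smith2023": "Pd catalysts improved coupling yields ...",
    "Lee2021": "Ligand bite angle affects selectivity ...",
}
answer = "Pd catalysts improve yield (Smith2023). Ligand choice matters (Lee2021, Ng2020)."
print(unsupported_citations(answer, retrieved))
```

Here `Ng2020` is flagged because no retrieved passage carries that key; a pipeline could then drop the claim or re-retrieve before returning the answer.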

Data Curation

Building Agents

Frameworks

SDK | Best For | Link
LangGraph | Graph-based workflows, multi-agent | Code
PydanticAI | Type-safe agent development | Code
Smolagents | Lightweight agents in the Hugging Face ecosystem | Code
CrewAI | Multi-agent orchestration | Website
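Graph-based frameworks such as LangGraph model an agent as nodes that update a shared state, joined by edges that can branch on that state. The toy executor below illustrates the idea in plain Python; it is not LangGraph's API, and the `propose` and `check` nodes (with a validator that simply accepts the second proposal) are hypothetical.

```python
from typing import Callable

State = dict
Node = Callable[[State], State]


def propose(state: State) -> State:
    """Hypothetical node: draft a new candidate route."""
    attempts = state["attempts"] + 1
    return {**state, "attempts": attempts, "candidate": f"route_v{attempts}"}


def check(state: State) -> State:
    """Hypothetical validator: accept once at least two drafts exist."""
    return {**state, "ok": state["attempts"] >= 2}


NODES: dict[str, Node] = {"propose": propose, "check": check}
# Each edge function inspects the state and names the next node (or "END").
EDGES: dict[str, Callable[[State], str]] = {
    "propose": lambda s: "check",
    "check": lambda s: "END" if s["ok"] else "propose",
}


def run(start: str, state: State, max_steps: int = 10) -> State:
    node = start
    for _ in range(max_steps):
        if node == "END":
            return state
        state = NODES[node](state)
        node = EDGES[node](state)
    raise RuntimeError("step budget exhausted")


final = run("propose", {"attempts": 0})
print(final)
```

The conditional edge out of `check` is what makes this a loop rather than a fixed pipeline; the step budget plays the same safety role as a recursion limit.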

Engineering Patterns

  • Claude Think Tool — Dedicated reasoning step for complex tool chains
  • AvaTaR — Contrastive learning for tool-use optimization (DSPy integrated)
  • ADAS — Automated design of agentic systems
  • Aviary — Training language agents as MDPs on scientific tasks
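Anthropic's think tool is an ordinary tool whose only effect is to give the model a place to write intermediate reasoning partway through a tool chain. The schema below follows the `name`/`description`/`input_schema` shape of Anthropic's tool-use API, with a description paraphrased from Anthropic's published example (check the original post for exact wording); the `handle_tool_call` dispatcher is a hypothetical local handler, not part of any SDK.

```python
from __future__ import annotations

# Tool definition in the name/description/input_schema shape; description paraphrased.
THINK_TOOL = {
    "name": "think",
    "description": (
        "Use the tool to think about something. It will not obtain new information "
        "or change any state, but just append the thought to the log."
    ),
    "input_schema": {
        "type": "object",
        "properties": {"thought": {"type": "string", "description": "A thought to think about."}},
        "required": ["thought"],
    },
}


def handle_tool_call(name: str, tool_input: dict, log: list[str]) -> str:
    """Hypothetical dispatcher: 'think' has no side effect beyond the log."""
    if name == "think":
        log.append(tool_input["thought"])
        return "Thought recorded."
    # Real tools (search, simulation, lab APIs) would be dispatched here.
    raise ValueError(f"no handler for tool {name!r}")


log: list[str] = []
result = handle_tool_call("think", {"thought": "Check stereochemistry before ordering reagents."}, log)
print(result, log)
```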

Infrastructure

Category | Tools
Code Sandbox | E2B, Arrakis
Browser | browser-use
Documents | Reducto, Firecrawl
MCP Servers | UniProt, Benchling
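MCP servers such as the UniProt and Benchling ones listed above speak JSON-RPC 2.0. After the initialize handshake, a client discovers tools with `tools/list` and invokes one with `tools/call`, passing the tool `name` and its `arguments`. The sketch below only builds those messages as single-line JSON (the stdio transport delimits messages by newlines); the tool name `search_proteins` and its arguments are hypothetical, since each server defines its own tools.

```python
from __future__ import annotations

import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique per session


def jsonrpc_request(method: str, params: dict | None = None) -> str:
    """Serialize a JSON-RPC 2.0 request as one line with no embedded newlines."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message, separators=(",", ":"))


# Discover the server's tools, then call one. "search_proteins" is hypothetical;
# real tool names and argument schemas come from the tools/list response.
list_msg = jsonrpc_request("tools/list")
call_msg = jsonrpc_request(
    "tools/call",
    {"name": "search_proteins", "arguments": {"query": "EGFR human", "limit": 5}},
)
print(list_msg)
print(call_msg)
```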

Benchmarks

Chemistry

Benchmark | Focus | Link
ChemIQ | Chemical intelligence beyond multiple-choice questions | Code
ChemBench | General chemistry understanding | Website
MaCBench | Materials chemistry | HuggingFace
FGBench | Functional-group-level molecular reasoning with LLMs | arXiv

Biology & Healthcare

Benchmark | Focus | Link
LAB-Bench | Practical biology research tasks (literature, protocols, sequences, figures) | arXiv
BixBench | Bioinformatics data-analysis tasks | arXiv
Bio-ML | Biology ML evaluation | bioRxiv

Agent Reliability

  • SWE-bench — Software engineering
  • HAL — General agent reliability

Current Limitations

Reliability: Outputs are non-deterministic, and small changes in query phrasing can significantly change results.

Tool Selection: Performance degrades as the number of available tools grows.
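A common mitigation, in the spirit of TxAgent's Tool-RAG, is to retrieve only the few tools most relevant to the current request instead of exposing the whole catalog. The sketch ranks tool descriptions by bag-of-words cosine similarity; production systems typically use learned embeddings instead, and the catalog below is hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_tools(query: str, catalog: dict[str, str], k: int = 2) -> list[str]:
    """Rank tools by similarity between the query and each tool's description."""
    q = bag_of_words(query)
    ranked = sorted(catalog, key=lambda name: cosine(q, bag_of_words(catalog[name])), reverse=True)
    return ranked[:k]


# Hypothetical tool catalog: name -> natural-language description.
CATALOG = {
    "predict_solubility": "predict aqueous solubility of a molecule from SMILES",
    "search_literature": "search papers and return abstracts for a topic",
    "plan_retrosynthesis": "propose retrosynthesis routes for a target molecule",
    "run_docking": "dock a ligand into a protein binding site",
}
print(top_k_tools("propose retrosynthesis routes for this target", CATALOG, k=1))
```

Only the selected tools' schemas are then placed in the model's context, which keeps the choice small even as the catalog grows.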

Memory: Managing context across long, multi-step workflows remains challenging.

Bottom line: Agents show impressive capabilities but do not yet offer production-grade reliability. Build evaluation metrics that capture both capability and reliability.
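One way to measure capability and reliability separately is to run each task n times and count the c successful runs. The unbiased pass@k estimator used in the Codex/HumanEval evaluation gives the chance that at least one of k attempts succeeds; pass^k, introduced with τ-bench, gives the chance that all k attempts succeed. A minimal sketch of both, with made-up trial counts in the example:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Capability: chance that at least one of k runs succeeds (n runs, c successes)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts k-subsets with no success; it is 0 when k > n - c.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Reliability: chance that all k runs succeed (n runs, c successes)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(c, k) counts k-subsets made only of successes; it is 0 when k > c.
    return comb(c, k) / comb(n, k)


print(round(pass_at_k(10, 6, 3), 3), round(pass_hat_k(10, 6, 3), 3))
```

With 6 successes in 10 runs, pass@3 is about 0.97 while pass^3 is about 0.17: the agent usually can solve the task but rarely does so three times in a row.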

Key Reading

Reviews

  • LLMs and Autonomous Agents in Chemistry (White lab survey)
  • Empowering Biomedical Discovery with AI Agents (Zitnik lab, Cell)
  • Foundation Models for Materials Discovery (npj Computational Materials)

Engineering

  • Building Effective Agents (Anthropic)
  • 12-Factor Agents (HumanLayer)
  • Small Agents Position Paper (NVIDIA)

Comprehensive Agent Timeline

Date | Agent | Description | Links
2025.12 | DynaMate | Protein-ligand MD automation | Paper, Code
2025.09 | Personal Health Agent | Health assistant architecture | Paper
2025.07 | DREAMS | DFT-based materials simulation | Paper
2025.05 | ROBIN | Multi-agent scientific discovery | Paper, Code
2025.05 | El-Agente | Quantum chemistry automation | Paper
2025.04 | SpatialAgent | Spatial biology | Paper
2025.03 | TxAgent | Therapeutic reasoning | Paper, Code
2025.01 | DataAnalysisCrow | Jupyter notebook agent | Code
2024.12 | Aviary | Agent training framework | Paper, Code
2024.11 | Virtual Lab | Multi-agent nanobody design | Paper, Code
2024.11 | DrugAgent | Drug discovery automation | Paper
2024.11 | LLM-RDF | Chemical synthesis platform | Paper, Code
2024.10 | dZiner | Inverse materials design | Paper, Code
2024.10 | CACTUS | Chemistry tool usage | Paper, Code
2024.06 | LLaMP | Materials knowledge retrieval | Paper, Code
2024.04 | CRISPR-GPT | Gene-editing design | Paper
2024.02 | TAIS | Gene expression discovery | Paper
2024.02 | WikiCrow | Scientific knowledge synthesis | Website
2024.02 | STORM | Wikipedia-style generation | Paper, Code
2024.02 | SciAgent | Tool-augmented scientific reasoning | Paper
2024.01 | ProtAgent | Multi-agent protein discovery | Paper, Code
2023.12 | PaperQA | Scientific literature RAG | Paper
2023.12 | Coscientist | Autonomous chemical research | Paper, Code
2023.12 | Eunomia | Materials data from literature | Paper, Code
2023.11 | eXpertAI | Structure-property XAI | Paper, Code
2023.10 | BioPlanner | Protocol planning evaluation | Paper, Code
2023.09 | IBM ChemChat | Molecular discovery | Paper
2023.08 | ChatMOF | MOF prediction and generation | Paper, Code
2023.06 | AmadeusGPT | Animal behavior analysis | Paper, Code
2023.04 | ChemCrow | Chemistry tools for LLMs | Paper, Code

Industry & Products

Lab Automation: Tetsuwan, Artificial AI, Lila Science, Mithri

Research Assistants: LOWE (Recursion), Mara (Nanome), Edison, SuperBio, Sam (ScienceMachine)

Platforms: FutureHouse, G-Mode, Biomni