Last updated: March 2026
Large language models are reshaping how we approach early-phase discovery. This guide covers practical applications in synthesis planning, molecular design, clinical workflows, and autonomous experimentation.
Why This Matters
LLMs unlock four capabilities for discovery teams:
- Low-data modeling — Pre-trained representations enable predictions with limited experimental data
- Multi-endpoint prediction — One model handles ADME, tox, and activity across modalities
- Workflow orchestration — Natural language interfaces to complex multi-step procedures
- Literature-scale reasoning — Query and synthesize across millions of papers
The field is moving from chatbots to agents that plan, execute, and iterate autonomously.
Year-end reviews
Local Language Models
hf-agents: Hugging Face’s CLI extension for detecting the best LLMs to run on local hardware
State of OSS LLMs from Nathan Lambert
General agentic capabilities
New evolving fields
The authors introduce self-referential agents that integrate a task agent and a meta agent (a modifier agent) into an editable program.
A new training policy lets agents learn scientific ‘taste’, that is, a sense of what good science is, from the community’s perception of what ‘good’ looks like. The authors call it ‘Reinforcement Learning from Community Feedback’ and use two models: a ‘Scientific Judge’ that rates the quality of an idea, and a separate ‘Scientific Thinker’ that proposes research ideas with high potential.
Agents for Early-Phase Discovery
Autonomous Synthesis
| Agent | What It Does | Links |
|---|---|---|
| Coscientist | Plans and executes organic syntheses via lab robotics | Nature ・ Code |
| LLM-RDF | End-to-end reaction development platform | Nature Comms ・ Code |
| ChemAgent | Multiagent robotic chemist for on-demand synthesis | JACS |
| SynAsk | ReAct agent with yield prediction and retrosynthesis | Chem Sci |
Molecular Design & Optimization
| Agent | What It Does | Links |
|---|---|---|
| ChemCrow | Tool-augmented LLM for similarity search, format conversion, route planning | arXiv ・ Code |
| dZiner | Inverse design of materials with AI agents | arXiv ・ Code |
| CACTUS | Chemistry agent connecting tool usage to science | ACS Omega ・ Code |
| Llamole | Multimodal LLM for inverse design with retrosynthesis | arXiv |
| SynCraft | LLM given access to molecule edits to make designs more synthesizable | arXiv |
Simulation & Dynamics
| Agent | What It Does | Links |
|---|---|---|
| DynaMate | Autonomous protein-ligand MD simulations | arXiv ・ Code |
| MDCrow | Automates molecular dynamics workflows | arXiv ・ Code |
| El-Agente | Autonomous quantum chemistry calculations | arXiv |
| DREAMS | DFT-based agentic materials simulation | arXiv |
Biology & Protein Engineering
| Agent | What It Does | Links |
|---|---|---|
| Virtual Lab | Multi-agent nanobody design with experimental validation | bioRxiv ・ Code |
| ProtAgent | Protein discovery via multi-agent collaboration | arXiv ・ Code |
| CRISPR-GPT | Automated gene-editing experiment design | arXiv |
| SpatialAgent | Autonomous spatial biology analysis | bioRxiv |
| BioDiscoveryAgent | Biomedical discovery from Stanford SNAP | Code |
Healthcare Applications
Clinical Reasoning & Decision Support
| System | Application | Links |
|---|---|---|
| TxAgent | Therapeutic reasoning across biomedical tools via Tool-RAG | arXiv ・ Code |
| Personal Health Agent | Anatomy of a personal health assistant | arXiv |
| Med-Gemini | Google’s medical multimodal foundation model | arXiv |
| G-Mode | Healthcare AI platform | Website |
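Tool-RAG, named in the TxAgent row above, broadly means retrieving a small set of relevant tools before the model picks one. The sketch below illustrates that general idea only and is not TxAgent's implementation: the tool names, their descriptions, and the `shortlist_tools` helper are hypothetical, and plain token overlap stands in for the learned retriever a real system would likely use.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def shortlist_tools(query: str, tools: dict[str, str], k: int = 2) -> list[str]:
    """Return the k tool names whose descriptions best overlap the query.

    Scoring is Jaccard similarity over tokens, a stand-in for embeddings.
    Ties are broken alphabetically so the result is deterministic.
    """
    query_tokens = tokenize(query)

    def score(name: str) -> float:
        desc_tokens = tokenize(tools[name])
        union = query_tokens | desc_tokens
        return len(query_tokens & desc_tokens) / len(union) if union else 0.0

    ranked = sorted(tools, key=lambda name: (-score(name), name))
    return ranked[:k]


# Hypothetical tool registry, for illustration only.
TOOLS = {
    "drug_interactions": "look up known interactions between two drugs",
    "dosage_lookup": "return the recommended dosage for a drug and patient age",
    "gene_ontology": "fetch gene ontology terms for a gene symbol",
}

selected = shortlist_tools("are there interactions between these two drugs", TOOLS, k=1)
print(selected)
```

Only the shortlisted tool definitions would then be passed to the model, which keeps the tool list short even when the full registry is large.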
Clinical Text & Documentation
| System | Application | Links |
|---|---|---|
| Clinical Summarization | LLMs outperforming experts in 36% of cases | Nature Med ・ Code |
| CLEAR | Clinical entity augmented retrieval for extraction | npj Digital Med |
Evaluation
| Benchmark | Focus | Link |
|---|---|---|
| MedHELM | 121 clinical tasks across 35 benchmarks | Website |
| The Optimization Paradox | Multi-agent clinical systems analysis | arXiv |
Foundation Models for Discovery
Chemistry & Small Molecules
| Model | Strength | Links |
|---|---|---|
| Tx-LLM | Multi-endpoint ADME prediction, positive transfer learning | arXiv |
| NatureLM | Cross-domain: molecules, proteins, materials | arXiv |
| LlaSMol | Instruction-tuned for chemistry tasks | Website |
| BioT5+ | IUPAC integration, numerical tokenization | arXiv |
| Chemformer | Pre-trained transformer for computational chemistry | Code |
| ether0 | Scientific reasoning model for chemistry (GRPO-trained) | Code |
Proteins & Genomics
| Model | Strength | Links |
|---|---|---|
| Evo2 | Genomic foundation model | Arc Institute |
| ProCyon | Multimodal protein phenotype prediction | bioRxiv |
| EvoDiff | Diffusion-based protein generation in sequence space | Code |
| BioEmu-1 | Protein dynamics and conformational changes | Microsoft |
Literature & Knowledge Systems
RAG for Science
| System | Application | Links |
|---|---|---|
| PaperQA | RAG agent that doesn’t hallucinate citations | arXiv |
| STORM | Wikipedia-style article generation from sources | Code |
| WikiCrow | Automated scientific knowledge synthesis | Website |
| MERMaid | Multimodal extraction from PDFs using VLMs | ChemRxiv |
Data Curation
- From text to insight — Tutorial review on LLMs for chemical data extraction (Jablonka group)
- LLM organic synthesis extraction — Fine-tuned GPT for parsing synthesis procedures
Building Agents
Frameworks
| SDK | Best For | Link |
|---|---|---|
| LangGraph | Graph-based workflows, multi-agent | Code |
| PydanticAI | Type-safe agent development | Code |
| Smolagents | Lightweight HuggingFace ecosystem | Code |
| CrewAI | Multi-agent orchestration | Website |
Engineering Patterns
- Claude Think Tool — Dedicated reasoning step for complex tool chains
- AvaTaR — Contrastive learning for tool-use optimization (DSPy integrated)
- ADAS — Automated design of agentic systems
- Aviary — Training language agents as MDPs on scientific tasks
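The think-tool pattern from the first bullet can be sketched as a tool with no side effects whose only job is to give the model a place to write down a reasoning step between other tool calls. This is a minimal sketch under assumptions, not Anthropic's implementation: the definition follows the common name / description / JSON Schema layout for tool definitions, and `THINK_TOOL`, `handle_tool_call`, and `scratchpad` are hypothetical names.

```python
from typing import Any

# Hypothetical tool definition: one free-text field, no side effects.
THINK_TOOL: dict[str, Any] = {
    "name": "think",
    "description": (
        "Record a reasoning step before choosing the next tool. "
        "Does not fetch data or change any state."
    ),
    "input_schema": {
        "type": "object",
        "properties": {"thought": {"type": "string"}},
        "required": ["thought"],
    },
}


def handle_tool_call(name: str, arguments: dict[str, Any], scratchpad: list[str]) -> str:
    """Dispatch one tool call; the think tool only appends to the scratchpad."""
    if name == "think":
        thought = arguments.get("thought")
        if not isinstance(thought, str) or not thought.strip():
            raise ValueError("think requires a non-empty 'thought' string")
        scratchpad.append(thought)
        return "ok"
    raise KeyError(f"unknown tool: {name}")


scratchpad: list[str] = []
result = handle_tool_call(
    "think", {"thought": "Check solubility before planning the route."}, scratchpad
)
print(result, scratchpad)
```

Keeping the handler free of side effects is the point of the design: the call adds a logged reasoning step to the transcript without touching lab or data systems.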
Infrastructure
| Category | Tools |
|---|---|
| Code Sandbox | E2B, Arrakis |
| Browser | browser-use |
| Documents | Reducto, Firecrawl |
| MCP Servers | UniProt, Benchling |
Benchmarks
Chemistry
| Benchmark | Focus | Link |
|---|---|---|
| ChemIQ | Chemical intelligence beyond MCQs | Code |
| ChemBench | General chemistry understanding | Website |
| MaCBench | Materials chemistry | HuggingFace |
| FGBench | Molecule functional group-level reasoning using LLMs | arXiv |
Biology & Healthcare
| Benchmark | Focus | Link |
|---|---|---|
| LAB-Bench | Laboratory protocol execution | arXiv |
| BixBench | Bioinformatics tasks | arXiv |
| Bio-ML | Biology ML evaluation | bioRxiv |
Agent Reliability
Current Limitations
Reliability: Non-deterministic outputs; query phrasing significantly affects results.
Tool Selection: Performance degrades as available tools increase.
Memory: Long workflow context management remains challenging.
Bottom line: Agents demonstrate impressive capabilities but lack production reliability. Build metrics that capture both capability and reliability.
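One way to report capability and reliability side by side is to run each task several times (repeated samples, or paraphrases of the same query) and compare how often a task is solved at least once with how often it is solved every time. The sketch below assumes pass/fail outcomes have already been collected; `summarize_runs`, the metric names, and the task ids are hypothetical.

```python
def summarize_runs(outcomes: dict[str, list[bool]]) -> dict[str, float]:
    """Summarize repeated pass/fail runs per task.

    mean_success: average per-task success rate.
    solved_any:   fraction of tasks solved in at least one run (capability).
    solved_all:   fraction of tasks solved in every run (reliability).
    """
    if not outcomes or any(len(runs) == 0 for runs in outcomes.values()):
        raise ValueError("every task needs at least one run")
    per_task = list(outcomes.values())
    n_tasks = len(per_task)
    return {
        "mean_success": sum(sum(runs) / len(runs) for runs in per_task) / n_tasks,
        "solved_any": sum(any(runs) for runs in per_task) / n_tasks,
        "solved_all": sum(all(runs) for runs in per_task) / n_tasks,
    }


# Hypothetical outcomes: four runs per task.
outcomes = {
    "retrosynthesis": [True, True, True, True],
    "admet_lookup": [True, False, True, False],
    "md_setup": [False, False, False, False],
}
report = summarize_runs(outcomes)
print(report)
```

A large gap between `solved_any` and `solved_all` points to an agent that can do the task but cannot be relied on to do it on any given run.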
Key Reading
Reviews
- LLMs and Autonomous Agents in Chemistry — White lab survey
- Empowering Biomedical Discovery with AI Agents — Zitnik lab (Cell)
- Foundation Models for Materials Discovery — npj Computational Materials
Engineering
- Building Effective Agents — Anthropic
- 12-Factor Agents — Human Layer
- Small Agents Position Paper — NVIDIA
Comprehensive Agent Timeline
| Date | Agent | Description | Links |
|---|---|---|---|
| 2025.12 | DynaMate | Protein-ligand MD automation | Paper ・ Code |
| 2025.09 | Personal Health Agent | Health assistant architecture | Paper |
| 2025.07 | DREAMS | DFT-based materials simulation | Paper |
| 2025.05 | ROBIN | Multi-agent scientific discovery | Paper ・ Code |
| 2025.05 | El-Agente | Quantum chemistry automation | Paper |
| 2025.04 | SpatialAgent | Spatial biology | Paper |
| 2025.03 | TxAgent | Therapeutic reasoning | Paper ・ Code |
| 2025.01 | DataAnalysisCrow | Jupyter notebook agent | Code |
| 2024.12 | Aviary | Agent training framework | Paper ・ Code |
| 2024.11 | Virtual Lab | Multi-agent nanobody design | Paper ・ Code |
| 2024.11 | DrugAgent | Drug discovery automation | Paper |
| 2024.11 | LLM-RDF | Chemical synthesis platform | Paper ・ Code |
| 2024.10 | dZiner | Inverse materials design | Paper ・ Code |
| 2024.10 | CACTUS | Chemistry tool usage | Paper ・ Code |
| 2024.06 | LLaMP | Materials knowledge retrieval | Paper ・ Code |
| 2024.04 | CRISPR-GPT | Gene-editing design | Paper |
| 2024.02 | TAIS | Gene expression discovery | Paper |
| 2024.02 | WikiCrow | Scientific knowledge synthesis | Website |
| 2024.02 | STORM | Wikipedia-style generation | Paper ・ Code |
| 2024.02 | SciAgent | Tool-augmented scientific reasoning | Paper |
| 2024.01 | ProtAgent | Protein discovery multi-agent | Paper ・ Code |
| 2023.12 | PaperQA | Scientific literature RAG | Paper |
| 2023.12 | Coscientist | Autonomous chemical research | Paper ・ Code |
| 2023.12 | Eunomia | Materials data from literature | Paper ・ Code |
| 2023.11 | eXpertAI | Structure-property XAI | Paper ・ Code |
| 2023.10 | BioPlanner | Protocol planning evaluation | Paper ・ Code |
| 2023.09 | IBM ChemChat | Molecular discovery | Paper |
| 2023.08 | ChatMOF | MOF prediction and generation | Paper ・ Code |
| 2023.06 | AmadeusGPT | Animal behavior analysis | Paper ・ Code |
| 2023.04 | ChemCrow | Chemistry tools for LLMs | Paper ・ Code |
Industry & Products
Lab Automation: Tetsuwan, Artificial AI, Lila Science, Mithri
Research Assistants: LOWE (Recursion), Mara (Nanome), Edison, SuperBio, Sam (ScienceMachine)
Platforms: FutureHouse, G-Mode, Biomni