Cheminformatics basics - Why the SMIRKS?

Walkthrough about what are SMIRKS, and how they can be used for reaction filtering and molecule enumeration
Published

August 31, 2022

The code and discussion in this notebook is inspired and borrowed from: * Efficient Bits

The Reaction SMARTS or SMIRKS way to query chemical reactions

SMIRKS as per the Daylight definition are used to describe a transform (or reaction) to modify molecules. They are rules to make new molecules but also be used a ‘Reaction SMARTS’ to search for reactions smiles which match that transformation.

RDKit treats these slightly differently - it has Reaction SMARTS which are used for substructure matching alone.

reaction  ::=   reactants ">>" products
reactants ::=  molecules
products  ::=   molecules
molecules ::=  molecule
               molecules "." molecule
molecule  ::=   a valid SMARTS string without "." characters
                "(" a valid SMARTS string without "." characters ")"
<tr>
  <td align="center">C&gt;&gt;C</td>
  <td align="center">CC&gt;&gt;CC</td>
  <td align="center">4</td>
  <td>No maps, normal match.</td>
</tr>     
<tr>
  <td align="center">C&gt;&gt;C</td>
  <td align="center">[CH3:7][CH3:8]&gt;&gt; [CH3:7][CH3:8]</td>
  <td align="center">4</td>
  <td>No maps in query, maps in target are ignored.</td>
</tr>
<tr>
  <td align="center">[C:1]&gt;&gt;C</td>
  <td align="center">[CH3:7][CH3:8]&gt;&gt; [CH3:7][CH3:8]</td>
  <td align="center">4</td>
  <td>Unpaired map in query ignored.</td>
</tr>
<tr>
  <td align="center">[C:1]&gt;&gt;[C:1]</td>
  <td align="center">CC&gt;&gt;CC</td>
  <td align="center">0</td>
  <td>No maps in target, hence no matches.</td>
</tr>
<tr>
  <td align="center">[C:?1]&gt;&gt;[C:?1]</td>
  <td align="center">CC&gt;&gt;CC</td>
  <td align="center">4</td>
  <td>Query says mapped as shown or not present.</td>
</tr>
<tr>
  <td align="center">[C:1]&gt;&gt;[C:1]</td>
  <td align="center">[CH3:7][CH3:8]&gt;&gt;[CH3:7][CH3:8]</td>
  <td align="center">2</td>
  <td>Matches for target 7,7 and 8,8 atom pairs.</td>
</tr>
<tr>
  <td align="center">[C:1]&gt;&gt;[C:2]</td>
  <td align="center">[CH3:7][CH3:8]&gt;&gt; [CH3:7][CH3:8]</td>
  <td align="center">4</td>
  <td>When a query class is not found on both<br>sides of the query, it is
  ignored;<br>this query does NOT say that the atoms<br>are in different
  classes. </td>
</tr>
<tr>
  <td align="center">[C:1][C:1]&gt;&gt;[C:1]</td>
  <td align="center">[CH3:7][CH3:7]&gt;&gt; [CH3:7][CH3:7]</td>
  <td align="center">4</td>
  <td>Atom maps match with "or" logic.  All atoms<br>get bound to
  class 7.</td>
</tr>
<tr>
  <td align="center">[C:1][C:1]&gt;&gt;[C:1]</td>
  <td align="center">[CH3:7][CH3:8]&gt;&gt; [CH3:7][CH3:8]</td>
  <td align="center">4</td>
  <td>The reactant atoms are bound to classes 7<br>and 8. Note that having
  the first query atom<br>bound to class 7 does not preclude<br>binding the
  second atom. Next, the product<br>atom can bind to classes 7 or 8.</td>
</tr>
<tr>
  <td align="center">[C:1][C:1]&gt;&gt;[C:1]</td>
  <td align="center">[CH3:7][CH3:7]&gt;&gt; [CH3:7][CH3:8]</td>
  <td align="center">2</td>
  <td>The reactants are bound to class 7.  The<br>product atom can bind to
  class 7 only.</td>
</tr>
Example Reaction SMARTS:
Query: Target: Matches: Comment:
# collapse_output
# Install requirements for the tutorial
!pip install pandas rdkit-pypi mols2grid matplotlib scikit-learn ipywidgets
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: pandas in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (1.1.5)
Requirement already satisfied: rdkit-pypi in /home/l017301/.local/lib/python3.8/site-packages (2021.9.4)
Requirement already satisfied: mols2grid in /home/l017301/.local/lib/python3.8/site-packages (0.2.1)
Requirement already satisfied: matplotlib in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (3.3.0)
Requirement already satisfied: scikit-learn in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (0.23.2)
Requirement already satisfied: ipywidgets in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (7.5.1)
Requirement already satisfied: numpy>=1.15.4 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from pandas) (1.18.5)
Requirement already satisfied: python-dateutil>=2.7.3 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from pandas) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from pandas) (2020.1)
Requirement already satisfied: Pillow in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from rdkit-pypi) (7.2.0)
Requirement already satisfied: jinja2>=2.11.0 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from mols2grid) (2.11.2)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from matplotlib) (2.4.7)
Requirement already satisfied: cycler>=0.10 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from matplotlib) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from matplotlib) (1.2.0)
Requirement already satisfied: joblib>=0.11 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from scikit-learn) (0.16.0)
Requirement already satisfied: threadpoolctl>=2.0.0 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from scikit-learn) (2.1.0)
Requirement already satisfied: scipy>=0.19.1 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from scikit-learn) (1.4.1)
Requirement already satisfied: traitlets>=4.3.1 in /home/l017301/.local/lib/python3.8/site-packages (from ipywidgets) (5.1.1)
Requirement already satisfied: nbformat>=4.2.0 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from ipywidgets) (5.0.7)
Requirement already satisfied: widgetsnbextension~=3.5.0 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from ipywidgets) (3.5.1)
Requirement already satisfied: ipykernel>=4.5.1 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from ipywidgets) (5.3.4)
Requirement already satisfied: ipython>=4.0.0; python_version >= "3.3" in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from ipywidgets) (7.17.0)
Requirement already satisfied: six>=1.5 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas) (1.15.0)
Requirement already satisfied: MarkupSafe>=0.23 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from jinja2>=2.11.0->mols2grid) (1.1.1)
Requirement already satisfied: ipython-genutils in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from nbformat>=4.2.0->ipywidgets) (0.2.0)
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from nbformat>=4.2.0->ipywidgets) (3.2.0)
Requirement already satisfied: jupyter-core in /home/l017301/.local/lib/python3.8/site-packages (from nbformat>=4.2.0->ipywidgets) (4.9.1)
Requirement already satisfied: notebook>=4.4.1 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from widgetsnbextension~=3.5.0->ipywidgets) (6.1.1)
Requirement already satisfied: tornado>=4.2 in /home/l017301/.local/lib/python3.8/site-packages (from ipykernel>=4.5.1->ipywidgets) (6.1)
Requirement already satisfied: jupyter-client in /home/l017301/.local/lib/python3.8/site-packages (from ipykernel>=4.5.1->ipywidgets) (7.1.2)
Requirement already satisfied: backcall in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets) (0.2.0)
Requirement already satisfied: jedi>=0.10 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets) (0.17.1)
Requirement already satisfied: pygments in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets) (2.6.1)
Requirement already satisfied: pexpect; sys_platform != "win32" in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets) (4.8.0)
Requirement already satisfied: pickleshare in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets) (0.7.5)
Requirement already satisfied: decorator in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets) (4.4.2)
Requirement already satisfied: setuptools>=18.5 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets) (47.1.0)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /home/l017301/.local/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets) (3.0.24)
Requirement already satisfied: pyrsistent>=0.14.0 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets) (0.16.0)
Requirement already satisfied: attrs>=17.4.0 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets) (19.3.0)
Requirement already satisfied: argon2-cffi in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (20.1.0)
Requirement already satisfied: nbconvert in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (5.6.1)
Requirement already satisfied: Send2Trash in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (1.5.0)
Requirement already satisfied: terminado>=0.8.3 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (0.8.3)
Requirement already satisfied: prometheus-client in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (0.8.0)
Requirement already satisfied: pyzmq>=17 in /home/l017301/.local/lib/python3.8/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (22.3.0)
Requirement already satisfied: nest-asyncio>=1.5 in /home/l017301/.local/lib/python3.8/site-packages (from jupyter-client->ipykernel>=4.5.1->ipywidgets) (1.5.4)
Requirement already satisfied: entrypoints in /home/l017301/.local/lib/python3.8/site-packages (from jupyter-client->ipykernel>=4.5.1->ipywidgets) (0.3)
Requirement already satisfied: parso<0.8.0,>=0.7.0 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from jedi>=0.10->ipython>=4.0.0; python_version >= "3.3"->ipywidgets) (0.7.0)
Requirement already satisfied: ptyprocess>=0.5 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from pexpect; sys_platform != "win32"->ipython>=4.0.0; python_version >= "3.3"->ipywidgets) (0.6.0)
Requirement already satisfied: wcwidth in /home/l017301/.local/lib/python3.8/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython>=4.0.0; python_version >= "3.3"->ipywidgets) (0.2.5)
Requirement already satisfied: cffi>=1.0.0 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (1.14.1)
Requirement already satisfied: bleach in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (3.1.5)
Requirement already satisfied: defusedxml in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (0.6.0)
Requirement already satisfied: mistune<2,>=0.8.1 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (0.8.4)
Requirement already satisfied: testpath in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (0.4.4)
Requirement already satisfied: pandocfilters>=1.4.1 in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (1.4.2)
Requirement already satisfied: pycparser in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from cffi>=1.0.0->argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (2.20)
Requirement already satisfied: webencodings in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (0.5.1)
Requirement already satisfied: packaging in /lrlhps/apps/python/python-3.8.5/lib/python3.8/site-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (20.4)
WARNING: You are using pip version 20.2.2; however, version 22.2.2 is available.
You should consider upgrading via the '/lrlhps/apps/python/python-3.8.5/bin/python -m pip install --upgrade pip' command.
import os 
import pandas as pd
import numpy as np 

# RDkit imports
import rdkit
from rdkit import Chem #This gives us most of RDkits's functionality
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole #Needed to show molecules
IPythonConsole.ipython_useSVG=True  #SVG's tend to look nicer than the png counterparts
print(rdkit.__version__)

## Enumerator 
from rdkit.Chem.Draw import rdDepictor
from rdkit.Chem import rdMolEnumerator

# Mute all errors except critical
Chem.WrapLogs()
lg = rdkit.RDLogger.logger() 
lg.setLevel(rdkit.RDLogger.CRITICAL)

from collections import defaultdict
from rdkit.Chem.Draw import rdMolDraw2D
from IPython.display import SVG
2021.09.3
#----- PLOTTING PARAMS ----# 
import matplotlib.pyplot as plt
from matplotlib.pyplot import cm

# High DPI rendering for mac
%config InlineBackend.figure_format = 'retina'

# Plot matplotlib plots with white background: 
%config InlineBackend.print_figure_kwargs={'facecolor' : "w"}

plot_params = {
'font.size' : 15,
'axes.titlesize' : 15,
'axes.labelsize' : 15,
'axes.labelweight' : 'bold',
'xtick.labelsize' : 12,
'ytick.labelsize' : 12,
}
 
plt.rcParams.update(plot_params)
rxn = AllChem.ReactionFromSmarts('[C:1]=[O,N:2]>>[C:1][*:2]')
rxn

[Chem.MolToSmiles(x,1) for x in rxn.RunReactants((Chem.MolFromSmiles('CC=O'),))[0]]
['CCO']
rxn = AllChem.ReactionFromSmarts('[C:1]~[O,N:2]>>*[C:1]~[*:2]')
rxn

[Chem.MolToSmiles(x,1) for x in rxn.RunReactants((Chem.MolFromSmiles('C#N'),))[0]]
['*C#N']
rxn = AllChem.ReactionFromSmarts("([C:1]=[C;H2].[C:2]=[C;H2])>>[*:1]=[*:2]")
rxn

m1 = Chem.MolFromSmiles('C=CCOCC=C')
m1
ps = rxn.RunReactants((m1,))
ps[0][0]

Generate positional varying molecules in RDkit

#Sanitise generated regio isomers
from rdkit.Chem import rdFMCS
def align_bundle_coords(bndl):
    ps = rdFMCS.MCSParameters()
    for m in bndl:
        Chem.SanitizeMol(m)
    mcs = rdFMCS.FindMCS(bndl,completeRingsOnly=True)
    q = Chem.MolFromSmarts(mcs.smartsString)
    rdDepictor.Compute2DCoords(q)
    for m in bndl:
        rdDepictor.GenerateDepictionMatching2DStructure(m,q)
pv1 = Chem.MolFromMolBlock('''
  Mrv2007 06232015292D          

  0  0  0     0  0            999 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 9 8 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C -1.7083 2.415 0 0
M  V30 2 C -3.042 1.645 0 0
M  V30 3 C -3.042 0.105 0 0
M  V30 4 N -1.7083 -0.665 0 0
M  V30 5 C -0.3747 0.105 0 0
M  V30 6 C -0.3747 1.645 0 0
M  V30 7 * -0.8192 1.3883 0 0
M  V30 8 O -0.8192 3.6983 0 0
M  V30 9 C 0.5145 4.4683 0 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 2 2 2 3
M  V30 3 1 3 4
M  V30 4 2 4 5
M  V30 5 1 5 6
M  V30 6 2 1 6
M  V30 7 1 7 8 ENDPTS=(3 1 5 6) ATTACH=ANY
M  V30 8 1 8 9
M  V30 END BOND
M  V30 END CTAB
M  END''')
pv1
pv1_bundle = rdMolEnumerator.Enumerate(pv1)
align_bundle_coords(pv1_bundle)
Draw.MolsToGridImage(pv1_bundle)

Set up the reactants and reaction definitions

from rdkit.Chem import rdmolfiles as rdfiles

lib_alcohols = rdfiles.SmilesMolSupplier('./molecule_enumerations/alcohol.smi') #Read input file with all reactants
lib_alkynes = rdfiles.SmilesMolSupplier('./molecule_enumerations/alkynes.smi')
reaction_library = pd.read_csv('./molecule_enumerations/rxn_definitions.txt', sep='\t', index_col = 'ReactionName')
reaction_library
ReactionSMARTS
ReactionName
Primary alc ox [CH2:1][OD1] >> [C:1]=[OD1]
Amide coupling [C:1](=[O:2])O.[N:3] >> [C:1](=[O:2])[N:3]
Diels-Alder '[C:1]=[C:2]-[C:3]=[C:4].[C:5]=[C:6] >> [C:5]-...
4-Click [N:1]=[N+1:2]=[N-1:3].[CH:4]#[C:5]>>[N:1]-1-[C...

Single molecule reaction

def unimolecular(molecule, reaction_smirk):
  rxn = AllChem.ReactionFromSmarts(reaction_smirk)
  product = rxn.RunReactants([molecule,])
  
  final_smiles_list = []
  try: 
    for i in range(len(product)):
      final_smiles = Chem.MolToSmiles(product[i][0])
      final_smiles_list.append(final_smiles)
  except:
    pass
  return final_smiles_list
Draw.MolsToGridImage(lib_alcohols)

reaction_library.ReactionSMARTS['Primary alc ox']
'[CH2:1][OD1] >> [C:1]=[OD1]'
unimolecular(lib_alcohols[0], reaction_library.ReactionSMARTS['Primary alc ox'])
['CC=O']
products = []
for alcohol_mol in lib_alcohols:
  prod = unimolecular(alcohol_mol, reaction_library.ReactionSMARTS['Primary alc ox'])
  for mol in prod:
    product = Chem.MolFromSmiles(mol)
    products.append(product)
align_bundle_coords(products)
Draw.MolsToGridImage(products)

Reacting two compounds

def bimolecular_rxn(mol1, mol2, reaction_smirk):
  rxn = AllChem.ReactionFromSmarts(reaction_smirk)
  product = rxn.RunReactants([mol1, mol2])
  
  final_smiles_list = []
  try: 
    for i in range(len(product)):
      final_smiles = Chem.MolToSmiles(product[i][0])
      final_smiles_list.append(final_smiles)
  except:
    pass
  return final_smiles_list
reaction_smirks = reaction_library.ReactionSMARTS["4-Click"] #Define the reaction type from the reaction library
reaction_smirks
'[N:1]=[N+1:2]=[N-1:3].[CH:4]#[C:5]>>[N:1]-1-[CH:4]=[C:5]-[N-0:3]=[N+0:2]-1'
product_bimolecular = [] #List of enumerated molecules
Core = Chem.MolFromSmiles('O[C@@H]1[C@@H](O)[C@H](OCC2=CC=CC=C2)[C@H](C[C@H]1N=[N+]=[N-])C([O-])=O') #Define the molecular core
Core
for i in lib_alkynes: #Read the .smi-file of reagents to be used for enumeration
    resulting_smiles = bimolecular_rxn(Core, i, reaction_smirks)
    
    for i in resulting_smiles:
        prod1 = Chem.MolFromSmiles(i)
        product_bimolecular.append(prod1)
align_bundle_coords(product_bimolecular)
Draw.MolsToGridImage(product_bimolecular)