The code in this notebook is inspired from:

Pattern matching

In molecules

The SMART way

SMARTS (SMiles ARbitrary Target Specification) is a lanuguage used to search, select, and match a substructure pattern in a molecule SMILE. The idea of SMARTS is reminiscent of regular expressions (regex) for texts. Following some pre-defined rules for searching, SMARTS offer a powerful way to systematically search though a large corpus of molecules for a particular chemical phenotype.

More details on the SMARTS and its 'grammar' can be found on the Daylight's official page

Few words of caution:

  • SMILES represent whole molecule (graph), SMARTS identify a substructure (subgraph).
  • All SMILES are valid SMARTS
  • It is better to be precise with your query than be general (you dont know what it might hit if not meticulous)

    For instance, the SMILES O means an aliphatic oxygen with zero charge and two hydrogens, i.e. water. In SMARTS, the same expression means any aliphatic oxygen regardless of charge, hydrogen count, etc, e.g. it will match the oxygen in water, but also those in ethanol, acetone, molecular oxygen, hydroxy and hydronium ions, etc. Specifying [OH2] limits the pattern to match only water (this is also the fully specified SMILES for water).

SMARTS Atomic Primitives
Symbol Symbol name Atomic property requirements Default
* wildcard any atom (no default)
a aromatic aromatic (no default)
A aliphatic aliphatic (no default)
D<n> degree <n> explicit connections exactly one
H<n> total-H-count <n> attached hydrogens exactly one1
h<n> implicit-H-count <n> implicit hydrogens at least one
R<n> ring membership in <n> SSSR rings any ring atom
r<n> ring size in smallest SSSR ring of size <n> any ring atom2
v<n> valence total bond order <n> exactly one2
X<n> connectivity <n> total connections exactly one2
x<n> ring connectivity <n> total ring connections at least one2
- <n> negative charge -<n> charge -1 charge (-- is -2, etc)
+<n> positive charge +<n> formal charge +1 charge (++ is +2, etc)
#n atomic number atomic number <n> (no default)2
@ chirality anticlockwise anticlockwise, default class2
@@ chirality clockwise clockwise, default class2
@<c><n> chirality chiral class <c> chirality <n> (nodefault)
@<c><n>? chiral or unspec chirality <c><n> or unspecified (no default)
<n> atomic mass explicit atomic mass unspecified mass
Examples1:
C aliphatic carbon atom
c aromatic carbon atom
a aromatic atom
[#6] carbon atom
[Ca] calcium atom
[++] atom with a +2 charge
[R] atom in any ring
[D3] atom with 3 explicit bonds (implicit H's don't count)
[X3] atom with 3 total bonds (includes implicit H's)
[v3] atom with bond orders totaling 3 (includes implicit H's)
C[C@H](F)O match chirality (H-F-O anticlockwise viewed from C)
C[C@?H](F)O matches if chirality is as specified or is not specified
cc any pair of attached aromatic carbons
c:c aromatic carbons joined by an aromatic bond
c-c aromatic carbons joined by a single bond (e.g. biphenyl).
O any aliphatic oxygen
[O;H1] simple hydroxy oxygen
[O;D1] 1-connected (hydroxy or hydroxide) oxygen
[O;D2] 2-connected (etheric) oxygen
[C,c] any carbon
F,Cl,Br,I] the 1st four halogens.
[N;R] must be aliphatic nitrogen AND in a ring
[!C;R] ( NOTaliphatic carbon ) AND in a ring
[n;H1] H-pyrrole nitrogen
[n&H1] same as above
[c,n&H1] any arom carbon OR H-pyrrole nitrogen
[c,n;H1] (arom carbon OR arom nitrogen) and exactly one H
*!@* two atoms connected by a non-ringbond
*@;!:* two atoms connected by a non-aromatic ringbond
[C,c]=,#[C,c] two carbons connected by a double or triple bond

Query files

Greg Landrum of Rdkit on query files

Pre-defined filters

In chemical reactions

The Reaction SMARTS way

SMIRKS

Example Reaction SMARTS:
Query: Target: Matches: Comment:
C>>C CC>>CC 4 No maps, normal match.
C>>C [CH3:7][CH3:8]>> [CH3:7][CH3:8] 4 No maps in query, maps in target are ignored.
[C:1]>>C [CH3:7][CH3:8]>> [CH3:7][CH3:8] 4 Unpaired map in query ignored.
[C:1]>>[C:1] CC>>CC 0 No maps in target, hence no matches.
[C:?1]>>[C:?1] CC>>CC 4 Query says mapped as shown or not present.
[C:1]>>[C:1] [CH3:7][CH3:8]>>[CH3:7][CH3:8] 2 Matches for target 7,7 and 8,8 atom pairs.
[C:1]>>[C:2] [CH3:7][CH3:8]>> [CH3:7][CH3:8] 4 When a query class is not found on both
sides of the query, it is ignored;
this query does NOT say that the atoms
are in different classes.
[C:1][C:1]>>[C:1] [CH3:7][CH3:7]>> [CH3:7][CH3:7] 4 Atom maps match with "or" logic. All atoms
get bound to class 7.
[C:1][C:1]>>[C:1] [CH3:7][CH3:8]>> [CH3:7][CH3:8] 4 The reactant atoms are bound to classes 7
and 8. Note that having the first query atom
bound to class 7 does not preclude
binding the second atom. Next, the product
atom can bind to classes 7 or 8.
[C:1][C:1]>>[C:1] [CH3:7][CH3:7]>> [CH3:7][CH3:8] 2 The reactants are bound to class 7. The
product atom can bind to class 7 only.

Install necessary modules

# Install requirements for the tutorial
!pip install pandas rdkit-pypi mols2grid matplotlib scikit-learn ipywidgets

Requirement already satisfied: pandas in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (1.3.4)
Requirement already satisfied: rdkit-pypi in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (2021.9.2.1)
Requirement already satisfied: mols2grid in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (0.1.0)
Requirement already satisfied: matplotlib in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (3.5.1)
Requirement already satisfied: scikit-learn in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (1.0.2)
Requirement already satisfied: ipywidgets in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (7.6.5)
Requirement already satisfied: python-dateutil>=2.7.3 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from pandas) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from pandas) (2021.3)
Requirement already satisfied: numpy>=1.21.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from pandas) (1.21.4)
Requirement already satisfied: jinja2 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from mols2grid) (3.0.3)
Requirement already satisfied: cycler>=0.10 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from matplotlib) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from matplotlib) (4.28.3)
Requirement already satisfied: pillow>=6.2.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from matplotlib) (8.4.0)
Requirement already satisfied: pyparsing>=2.2.1 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from matplotlib) (3.0.6)
Requirement already satisfied: kiwisolver>=1.0.1 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from matplotlib) (1.3.2)
Requirement already satisfied: packaging>=20.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from matplotlib) (21.3)
Requirement already satisfied: joblib>=0.11 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from scikit-learn) (1.1.0)
Requirement already satisfied: scipy>=1.1.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from scikit-learn) (1.7.3)
Requirement already satisfied: threadpoolctl>=2.0.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from scikit-learn) (3.1.0)
Requirement already satisfied: ipython-genutils~=0.2.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipywidgets) (0.2.0)
Requirement already satisfied: nbformat>=4.2.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipywidgets) (5.1.3)
Requirement already satisfied: traitlets>=4.3.1 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipywidgets) (5.1.1)
Requirement already satisfied: ipython>=4.0.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipywidgets) (7.30.1)
Requirement already satisfied: jupyterlab-widgets>=1.0.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipywidgets) (1.0.2)
Requirement already satisfied: ipykernel>=4.5.1 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipywidgets) (6.6.0)
Requirement already satisfied: widgetsnbextension~=3.5.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipywidgets) (3.5.2)
Requirement already satisfied: tornado<7.0,>=4.2 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipykernel>=4.5.1->ipywidgets) (6.1)
Requirement already satisfied: jupyter-client<8.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipykernel>=4.5.1->ipywidgets) (7.1.0)
Requirement already satisfied: matplotlib-inline<0.2.0,>=0.1.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipykernel>=4.5.1->ipywidgets) (0.1.3)
Requirement already satisfied: debugpy<2.0,>=1.0.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipykernel>=4.5.1->ipywidgets) (1.5.1)
Requirement already satisfied: appnope in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipykernel>=4.5.1->ipywidgets) (0.1.2)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipython>=4.0.0->ipywidgets) (3.0.24)
Requirement already satisfied: pygments in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipython>=4.0.0->ipywidgets) (2.10.0)
Requirement already satisfied: setuptools>=18.5 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipython>=4.0.0->ipywidgets) (59.4.0)
Requirement already satisfied: decorator in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipython>=4.0.0->ipywidgets) (5.1.0)
Requirement already satisfied: pickleshare in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipython>=4.0.0->ipywidgets) (0.7.5)
Requirement already satisfied: backcall in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipython>=4.0.0->ipywidgets) (0.2.0)
Requirement already satisfied: jedi>=0.16 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipython>=4.0.0->ipywidgets) (0.18.1)
Requirement already satisfied: pexpect>4.3 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from ipython>=4.0.0->ipywidgets) (4.8.0)
Requirement already satisfied: jupyter-core in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from nbformat>=4.2.0->ipywidgets) (4.9.1)
Requirement already satisfied: jsonschema!=2.5.0,>=2.4 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from nbformat>=4.2.0->ipywidgets) (4.2.1)
Requirement already satisfied: six>=1.5 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from python-dateutil>=2.7.3->pandas) (1.16.0)
Requirement already satisfied: notebook>=4.4.1 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from widgetsnbextension~=3.5.0->ipywidgets) (6.4.6)
Requirement already satisfied: MarkupSafe>=2.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from jinja2->mols2grid) (2.0.1)
Requirement already satisfied: parso<0.9.0,>=0.8.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from jedi>=0.16->ipython>=4.0.0->ipywidgets) (0.8.3)
Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets) (0.18.0)
Requirement already satisfied: attrs>=17.4.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat>=4.2.0->ipywidgets) (21.2.0)
Requirement already satisfied: pyzmq>=13 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from jupyter-client<8.0->ipykernel>=4.5.1->ipywidgets) (22.3.0)
Requirement already satisfied: entrypoints in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from jupyter-client<8.0->ipykernel>=4.5.1->ipywidgets) (0.3)
Requirement already satisfied: nest-asyncio>=1.5 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from jupyter-client<8.0->ipykernel>=4.5.1->ipywidgets) (1.5.4)
Requirement already satisfied: argon2-cffi in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (21.1.0)
Requirement already satisfied: nbconvert in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (6.3.0)
Requirement already satisfied: terminado>=0.8.3 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (0.12.1)
Requirement already satisfied: Send2Trash>=1.8.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (1.8.0)
Requirement already satisfied: prometheus-client in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (0.12.0)
Requirement already satisfied: ptyprocess>=0.5 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from pexpect>4.3->ipython>=4.0.0->ipywidgets) (0.7.0)
Requirement already satisfied: wcwidth in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython>=4.0.0->ipywidgets) (0.2.5)
Requirement already satisfied: cffi>=1.0.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (1.15.0)
Requirement already satisfied: bleach in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (4.1.0)
Requirement already satisfied: jupyterlab-pygments in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (0.1.2)
Requirement already satisfied: nbclient<0.6.0,>=0.5.0 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (0.5.9)
Requirement already satisfied: testpath in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (0.5.0)
Requirement already satisfied: defusedxml in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (0.7.1)
Requirement already satisfied: pandocfilters>=1.4.1 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (1.5.0)
Requirement already satisfied: mistune<2,>=0.8.1 in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (0.8.4)
Requirement already satisfied: pycparser in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from cffi>=1.0.0->argon2-cffi->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (2.21)
Requirement already satisfied: webencodings in /Users/pghaneka/miniconda3/envs/small_mols/lib/python3.10/site-packages (from bleach->nbconvert->notebook>=4.4.1->widgetsnbextension~=3.5.0->ipywidgets) (0.5.1)
import os 
import pandas as pd
import numpy as np 

The majority of the basic molecular functionality is found in module rdkit.Chem

import rdkit
from rdkit import Chem #This gives us most of RDkits's functionality
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole #Needed to show molecules
IPythonConsole.ipython_useSVG=True  #SVG's tend to look nicer than the png counterparts
print(rdkit.__version__)

# Mute all errors except critical
Chem.WrapLogs()
lg = rdkit.RDLogger.logger() 
lg.setLevel(rdkit.RDLogger.CRITICAL)
2021.09.2
from collections import defaultdict
from rdkit.Chem.Draw import rdMolDraw2D
from IPython.display import SVG
import matplotlib.pyplot as plt
from matplotlib.pyplot import cm

# High DPI rendering for mac
%config InlineBackend.figure_format = 'retina'

# Plot matplotlib plots with white background: 
%config InlineBackend.print_figure_kwargs={'facecolor' : "w"}

plot_params = {
'font.size' : 15,
'axes.titlesize' : 15,
'axes.labelsize' : 15,
'axes.labelweight' : 'bold',
'xtick.labelsize' : 12,
'ytick.labelsize' : 12,
}
 
plt.rcParams.update(plot_params)
diclofenac = Chem.MolFromSmiles('O=C(O)Cc1ccccc1Nc1c(Cl)cccc1Cl')
diclofenac
sub_pattern = Chem.MolFromSmarts('O=CCccN')
hit_ats = list(diclofenac.GetSubstructMatch(sub_pattern))
hit_bonds = []

for bond in sub_pattern.GetBonds():
    aid1 = hit_ats[bond.GetBeginAtomIdx()]
    aid2 = hit_ats[bond.GetEndAtomIdx()]
    
    hit_bonds.append( diclofenac.GetBondBetweenAtoms(aid1, aid2).GetIdx() )
d2d = rdMolDraw2D.MolDraw2DSVG(400, 400) # or MolDraw2DCairo to get PNGs
rdMolDraw2D.PrepareAndDrawMolecule(d2d, diclofenac, highlightAtoms=hit_ats,  highlightBonds=hit_bonds)
d2d.FinishDrawing()
SVG(d2d.GetDrawingText())

Defining a function to make it easy and reproducible:

def viz_substruct(main_smile, substructure_smarts):
    
    mol_file = Chem.MolFromSmiles(main_smile)
    sub_pattern = Chem.MolFromSmarts(substructure_smarts)
    
    hit_ats = list(mol_file.GetSubstructMatch(sub_pattern)) # Returns the indices of the molecule’s atoms that match a substructure query
    hit_bonds = []

    for bond in sub_pattern.GetBonds():
        aid1 = hit_ats[bond.GetBeginAtomIdx()]
        aid2 = hit_ats[bond.GetEndAtomIdx()]

        hit_bonds.append( mol_file.GetBondBetweenAtoms(aid1, aid2).GetIdx() )

    d2d = rdMolDraw2D.MolDraw2DSVG(400, 400) # or MolDraw2DCairo to get PNGs
    rdMolDraw2D.PrepareAndDrawMolecule(d2d, mol_file, highlightAtoms=hit_ats,  highlightBonds=hit_bonds)
    d2d.FinishDrawing()
    return SVG(d2d.GetDrawingText())
viz_substruct('C1=NC(=NC(=C1)C)C2=CC=CC=N2','[ar]!@[ar]')

Online resource to visualize the SMARTS.

This code is inspired from Pen Taka's blogpost

SMARTS.plus is a nice web GUI to visualize different SMART queries real-time.

from rdkit import Chem
from IPython.display import Image
import requests
import urllib
from time import time 
baseurl = "https://smarts.plus/smartsview/download_rest?"
def get_img(query):
    url = baseurl+query
    start = time()
    res = requests.get(url)
    _img = Image(res.content, embed=True, retina=True)
    print('Time taken: {0:0.2f} secs'.format(time() - start))
    return _img
im0 = get_img("smarts=[ar]!@[ar]")
im0
Time taken: 1.40 secs
im1 = get_img("smarts=[CX3](=[OX1])[OX2][CX3](=[OX1])")
im1
Time taken: 1.32 secs