Ben Fulton

I571 Fall 2013 - Assignment 1


Mutagens are chemicals that have the potential to change or damage DNA. A mutagenic chemical is less likely to be a viable drug, so a screening test that can detect mutagenicity in silico would be a useful tool for identifying marketable drugs.

Kazius et al. identified a series of molecular substructures that can be responsible for mutagenicity. These substructures, or toxicophores, can be represented as SMARTS strings, which allows each substructure to be matched against target molecules.

Kazius' list of toxicophores, which includes substructures such as alkyl nitrites and triazenes, was used to search the compounds in PubChem BioAssay 504848, a screen for delayed death inhibitors of the malarial parasite plastid. The goal was to determine how many of the compounds in that assay contained toxicophores.

Results: Many compounds in the assay contained toxicophores, including more than 150 with the aromatic nitro structure.


BioAssay 504848 was downloaded from the PubChem website. The assay included 2,306 chemical structures, identified by CID in the PubChem database. The PubChemPy package was used to download the SMILES strings for these structures. Kazius et al. listed 29 toxicophores that could be responsible for mutagenicity in a drug, along with the SMARTS strings for each one. Twenty-eight of the toxicophores were each described by a single SMARTS string. The 29th, a polycyclic planar system, could be described by any one of nine SMARTS strings. Therefore, 28 + 9 = 37 SMARTS strings were available to be compared to the 2,306 SMILES strings.
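The comparison of SMARTS patterns against SMILES strings can be sketched as follows. This is a minimal illustration, not the code from the supplemental files: the function names `rdkit_matches` and `screen`, and the dictionary layout for compounds and toxicophores, are assumptions made for the example. The matcher uses RDKit's `Chem.MolFromSmiles`, `Chem.MolFromSmarts`, and `HasSubstructMatch`, and is passed in as a parameter so that the screening loop itself does not depend on any one toolkit.

```python
from collections import defaultdict


def rdkit_matches(smiles, smarts):
    """Return True if the SMARTS pattern occurs in the molecule (uses RDKit)."""
    from rdkit import Chem  # imported lazily; only needed when this matcher is used

    mol = Chem.MolFromSmiles(smiles)
    patt = Chem.MolFromSmarts(smarts)
    if mol is None or patt is None:
        # RDKit returns None for input it cannot parse
        raise ValueError(f"could not parse {smiles!r} or {smarts!r}")
    return mol.HasSubstructMatch(patt)


def screen(compounds, toxicophores, matches=rdkit_matches):
    """Map each toxicophore name to the set of CIDs it matches.

    compounds:    dict of CID -> SMILES string (hypothetical layout)
    toxicophores: dict of name -> list of SMARTS strings, one or more per name
    matches:      callable(smiles, smarts) -> bool
    """
    hits = defaultdict(set)
    for cid, smiles in compounds.items():
        for name, patterns in toxicophores.items():
            # A toxicophore with several SMARTS strings is present
            # if any one of them matches.
            if any(matches(smiles, smarts) for smarts in patterns):
                hits[name].add(cid)
    return dict(hits)
```

Because hits are stored as sets of CIDs, a compound that matches more than one SMARTS string of the same toxicophore is counted once for that toxicophore. That is a design choice of this sketch and may differ from a tally kept per SMARTS string.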

The RDKit package was used to do the comparisons. One toxicophore, sulphonate-bonded carbon (alkyl alkane sulphonate or dialkyl sulphate), was represented in the Kazius et al. paper by the following SMARTS string:


The RDKit parser was unable to parse this string correctly. After discussion with the RDKit developer, the string was modified to


Supplemental Files: Source code and supporting files may be downloaded from


About 2.6 times as many compounds (173) contained the specific aromatic nitro toxicophore as contained the next most common toxicophore (Fig. 1). Aromatic amine was the next most common, occurring in 67 compounds, while aliphatic halide occurred in 52. No other single SMARTS pattern occurred in more than 25 compounds. However, five of the nine SMARTS patterns for the polycyclic planar system did occur, with counts summing to 92 compounds. If these patterns were considered as a single toxicophore, the polycyclic planar system would be the second most common toxicophore in the assay.

Fig. 1. Toxicophores found in the assay
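The ranking described in the results can be checked with a few lines of Python. The sketch below uses only the counts stated in the text (173, 67, 52, and the summed 92 for the polycyclic planar system). Counts for the remaining patterns are not given in the text, so they are left out. The variable and function names are illustrative.

```python
# Counts reported in the results paragraph (compounds per toxicophore).
counts = {
    "specific aromatic nitro": 173,
    "aromatic amine": 67,
    "aliphatic halide": 52,
}
# Sum over the five polycyclic planar system patterns that occurred.
planar_total = 92


def rank(table):
    """Return (name, count) pairs sorted from most to least common."""
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


separate = rank(counts)
merged = rank({**counts, "polycyclic planar system": planar_total})

# Ratio of the top toxicophore to the runner-up in each view.
ratio_separate = separate[0][1] / separate[1][1]
ratio_merged = merged[0][1] / merged[1][1]

for name, n in merged:
    print(f"{name:26s} {n}")
print(f"ratio (patterns separate): {ratio_separate:.2f}")
print(f"ratio (planar merged):     {ratio_merged:.2f}")
```

With the planar patterns merged, the polycyclic planar system moves ahead of aromatic amine, and the lead of the specific aromatic nitro over the runner-up shrinks accordingly.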


Jeroen Kazius, Ross McGuire, and Roberta Bursi. Derivation and Validation of Toxicophores for Mutagenicity Prediction. J. Med. Chem., 2005, 48 (1), pp 312–320. DOI: 10.1021/jm040835a

National Center for Biotechnology Information. PubChem BioAssay Database; AID=504848, Source=Scripps Research Institute Molecular Screening Center, (accessed Sep. 22, 2013).

RDKit: Open-source cheminformatics;