Assignment 1.2

Mutagenicity is one of the most important adverse effects that can prevent a compound from becoming a marketable drug. A seminal paper by Kazius et al. identified many substructures (parts of a molecule) that are responsible for mutagenicity. Such substructures are called toxicophores. The paper identified 29 toxicophores containing new substructures responsible for mutagenicity.

In this assignment, all active compounds from the PubChem malaria bioassay datasets (AID: 449704, AID: 449703, AID: 504848, AID: 504850) were downloaded and analysed for the presence of any of the 29 toxicophores.

A KNIME workflow was created to read all the SDF files downloaded from PubChem. RDKit nodes were used for substructure filtering, and a CSV file was generated listing all active compounds and the toxicophores present in each.

This CSV file was analysed using TIBCO Spotfire to create several visualizations.

Note: the KNIME RDKit node rejected the SMARTS pattern of one of the toxicophores, so that toxicophore was removed from the analysis (sulphonateBondedCarbonAlkylAlkaneSulphoneateOrDialkylSulphate [$([C,c]OS((=O)=O)O!@[c,C]),$([c,C]S((=O)=O)O!@[c,C])]).
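The substructure filtering step can also be sketched directly with the RDKit Python API, outside KNIME. This is only an illustrative sketch, not the workflow used here: the two SMARTS patterns are simplified stand-ins rather than the published Kazius definitions, and the function names (`compile_patterns`, `flag_molecule`, `flag_sdf`) are hypothetical. Patterns that RDKit cannot parse are collected separately, which mirrors the rejected SMARTS mentioned in the note.

```python
from rdkit import Chem

# Simplified stand-in patterns for illustration only; these are NOT the
# published Kazius SMARTS definitions.
PATTERNS = {
    "aromatic_nitro": "c[N+](=O)[O-]",
    "epoxide": "C1OC1",
}


def compile_patterns(smarts_by_name):
    """Compile SMARTS strings; return (compiled patterns, rejected names)."""
    compiled, rejected = {}, []
    for name, smarts in smarts_by_name.items():
        pattern = Chem.MolFromSmarts(smarts)  # None if the SMARTS cannot be parsed
        if pattern is None:
            rejected.append(name)
        else:
            compiled[name] = pattern
    return compiled, rejected


def flag_molecule(mol, compiled):
    """Return {toxicophore name: 1 or 0} for a single molecule."""
    return {name: int(mol.HasSubstructMatch(p)) for name, p in compiled.items()}


def flag_sdf(sdf_path, compiled):
    """Yield (cid, flags) for every readable record of a PubChem SDF file."""
    for mol in Chem.SDMolSupplier(sdf_path):
        if mol is None:  # record that RDKit could not read
            continue
        cid = ""
        if mol.HasProp("PUBCHEM_COMPOUND_CID"):
            cid = mol.GetProp("PUBCHEM_COMPOUND_CID")
        yield cid, flag_molecule(mol, compiled)
```

Writing the pairs yielded by `flag_sdf` to a file with one row per compound and one 0/1 column per toxicophore would give a CSV comparable to the one described above.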

Figure 1 shows the active compounds downloaded from PubChem for each bioassay.

Figure 1:


Figure 2 shows, for each bioassay, the distribution of compounds with at least one toxicophore present. Red indicates the presence and green the absence of a toxicophore. Sector size corresponds to the number and percentage of unique compound IDs in each bioassay.

Figure 2:


It is interesting to observe that the percentage of compounds containing a toxicophore is similar in all sets (around 17-18%). This suggests that additional clustering/diversity analysis would be useful to understand the chemical space of each subset.
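The per-bioassay percentage can be recomputed from the CSV with a few lines of standard-library Python. The sketch below assumes a hypothetical layout (the actual column names are not given here): an `aid` column, a `cid` column, and one column per toxicophore holding `1` when it is present. Each compound is counted once per bioassay.

```python
import csv
from collections import defaultdict


def toxicophore_share_by_assay(rows, id_cols=("aid", "cid")):
    """Return {aid: (n_flagged, n_total, percent)} over unique compound IDs.

    `rows` is any iterable of dicts, e.g. a csv.DictReader. Every column not
    listed in `id_cols` is treated as a toxicophore flag ("1" = present).
    """
    all_cids = defaultdict(set)
    flagged_cids = defaultdict(set)
    for row in rows:
        aid, cid = row[id_cols[0]], row[id_cols[1]]
        all_cids[aid].add(cid)
        if any(v == "1" for k, v in row.items() if k not in id_cols):
            flagged_cids[aid].add(cid)
    result = {}
    for aid, cids in all_cids.items():
        n_flagged = len(flagged_cids[aid])
        result[aid] = (n_flagged, len(cids), 100.0 * n_flagged / len(cids))
    return result


def share_from_csv(path):
    """Convenience wrapper: read the CSV at `path` and compute the shares."""
    with open(path, newline="") as handle:
        return toxicophore_share_by_assay(csv.DictReader(handle))
```

Counting over sets of compound IDs rather than rows keeps duplicate records within a bioassay from inflating either the numerator or the denominator.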

Finally, Figure 3.1 shows the frequency of the studied toxicophores across all compounds (the number of unique compounds in which each toxicophore is present), Figure 3.2 shows that frequency by study, and Figure 3.3 shows the absolute frequency across all the datasets (the number of times each toxicophore is present).

Figure 3.1:


Figure 3.2:


Figure 3.3:
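The three frequency measures behind Figures 3.1 to 3.3 could be derived from the same CSV as sketched below. This sketch reuses the hypothetical layout of `aid`, `cid`, and one 0/1 column per toxicophore. It also assumes that the "absolute frequency" counts every flagged row, so a compound that is active in two bioassays contributes twice. That reading is an assumption, not something the text confirms.

```python
from collections import Counter, defaultdict


def toxicophore_frequencies(rows, id_cols=("aid", "cid")):
    """Return three dicts: unique compounds per toxicophore (Figure 3.1),
    unique compounds per (aid, toxicophore) pair (Figure 3.2), and flagged
    rows per toxicophore over all datasets (Figure 3.3, assumed meaning)."""
    cids_per_tox = defaultdict(set)
    cids_per_assay_tox = defaultdict(set)
    occurrences = Counter()
    for row in rows:
        aid, cid = row[id_cols[0]], row[id_cols[1]]
        for name, value in row.items():
            if name in id_cols or value != "1":
                continue
            cids_per_tox[name].add(cid)
            cids_per_assay_tox[(aid, name)].add(cid)
            occurrences[name] += 1
    unique_overall = {name: len(c) for name, c in cids_per_tox.items()}
    unique_by_assay = {key: len(c) for key, c in cids_per_assay_tox.items()}
    return unique_overall, unique_by_assay, dict(occurrences)
```

The function accepts any iterable of dicts, so a `csv.DictReader` over the exported file can be passed in directly.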