Toxicophores in a Malarial Bioassay
Mutagenicity is one of the most important adverse effects that can prevent a compound from becoming a marketable drug. A seminal paper by Kazius et al. identified many substructures (parts of a molecule) responsible for mutagenicity. Such substructures are called toxicophores.
The paper identified 29 toxicophores, including new substructures responsible for mutagenicity.
In this assignment, all active compounds from the PubChem malaria bioassay datasets (AIDs 449704, 449703, 504848 and 504850) were downloaded and analysed for the presence of any of the 29 toxicophores.
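The page does not say how the files were downloaded. One way to script this step, assuming PubChem's PUG REST service is used, is to request the list of active CIDs for each assay and then the SDF records for those CIDs. The minimal Python sketch below only builds the request URLs. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# PubChem PUG REST base URL.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Malaria bioassays analysed on this page.
AIDS = [449704, 449703, 504848, 504850]

# Query string that restricts an assay's CID list to actives.
ACTIVE_QUERY = urlencode({"cids_type": "active"})


def active_cids_url(aid: int) -> str:
    """URL returning the CIDs reported active in one assay, one per line."""
    return f"{PUG_REST}/assay/aid/{aid}/cids/TXT?{ACTIVE_QUERY}"


def sdf_url(cids: list[int]) -> str:
    """URL returning the records of the given CIDs as a single SDF file."""
    cid_list = ",".join(str(c) for c in cids)
    return f"{PUG_REST}/compound/cid/{cid_list}/SDF"
```

Each URL can be fetched with urllib.request.urlopen. Very long CID lists may exceed URL length limits, so splitting the CIDs into smaller batches is advisable.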
A KNIME workflow was created to read all SDF files downloaded from PubChem. RDKit nodes were used for substructure filtering, and a CSV file was generated listing all active compounds and the toxicophores present in each.
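The KNIME workflow is not reproduced here. The code below is a rough plain-Python sketch of the same data flow, under stated assumptions. It reads SDF records and takes each compound's CID from the PUBCHEM_COMPOUND_CID data item that PubChem writes into its SDF files. It then produces CSV text with one row per compound and one 0/1 column per toxicophore.

The substructure test is supplied by the caller as a hypothetical matches(molblock, toxicophore) function. With RDKit, that function could wrap Chem.MolFromMolBlock, Chem.MolFromSmarts and HasSubstructMatch. The column layout (AID, CID, one column per toxicophore) is an assumption and may differ from the original CSV.

```python
import csv
import io
import re
from typing import Callable, Iterable, Iterator

# Header line of the SDF data item holding the PubChem compound ID,
# e.g. "> <PUBCHEM_COMPOUND_CID>".
CID_HEADER = re.compile(r"^>.*<PUBCHEM_COMPOUND_CID>")


def _split_record(lines: list[str]) -> tuple[str, str]:
    """Split one SDF record into (cid, molblock)."""
    # The molblock ends at the "M  END" line; data items follow it.
    end = next((i for i, line in enumerate(lines) if line.startswith("M  END")),
               len(lines) - 1)
    molblock = "\n".join(lines[: end + 1])
    cid = ""
    for i in range(end + 1, len(lines) - 1):
        if CID_HEADER.match(lines[i]):
            cid = lines[i + 1].strip()  # the value sits on the next line
            break
    return cid, molblock


def read_sdf_records(text: str) -> Iterator[tuple[str, str]]:
    """Yield (cid, molblock) for every record terminated by a '$$$$' line."""
    record: list[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            yield _split_record(record)
            record = []
        else:
            record.append(line)


def presence_csv(rows: Iterable[tuple[str, str, str]],
                 toxicophores: list[str],
                 matches: Callable[[str, str], bool]) -> str:
    """Return CSV text: AID, CID and a 0/1 presence column per toxicophore.

    rows yields (aid, cid, molblock); matches is a hypothetical,
    caller-supplied substructure test.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["AID", "CID", *toxicophores])
    for aid, cid, molblock in rows:
        flags = [int(bool(matches(molblock, name))) for name in toxicophores]
        writer.writerow([aid, cid, *flags])
    return buf.getvalue()
```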
This CSV file was analysed in TIBCO Spotfire to create several visualizations.
Note: the KNIME RDKit node discarded the SMARTS pattern of one toxicophore, sulphonateBondedCarbonAlkylAlkaneSulphoneateOrDialkylSulphate ([$([C,c]OS((=O)=O)O!@[c,C]),$([c,C]S((=O)=O)O!@[c,C])]). That toxicophore was therefore removed from the analysis, so 28 of the 29 toxicophores were screened.
Figure 1 summarizes the active compounds downloaded from PubChem for each bioassay.
Figure 2 shows, for each bioassay, the distribution of compounds containing at least one toxicophore. Red indicates that at least one toxicophore is present, and green indicates that none is present. Sector size corresponds to the number and percentage of unique compound IDs in each bioassay.
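The aggregation behind Figure 2 was done in Spotfire. The following is an equivalent calculation in Python, assuming a CSV with columns AID, CID and one 0/1 column per toxicophore (a hypothetical layout). For each bioassay, it counts the unique CIDs and the subset of CIDs with at least one toxicophore.

```python
import csv
import io
from collections import defaultdict


def presence_by_assay(csv_text: str) -> dict[str, tuple[int, int, float]]:
    """Return {aid: (cids_with_toxicophore, unique_cids, percent_with)}.

    Assumes columns AID, CID and one 0/1 column per toxicophore.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    tox_columns = [c for c in reader.fieldnames if c not in ("AID", "CID")]
    all_cids = defaultdict(set)  # aid -> every unique CID in that assay
    flagged = defaultdict(set)   # aid -> CIDs with at least one toxicophore
    for row in reader:
        all_cids[row["AID"]].add(row["CID"])
        if any(row[c] == "1" for c in tox_columns):
            flagged[row["AID"]].add(row["CID"])
    summary = {}
    for aid, cids in all_cids.items():
        n_flagged = len(flagged[aid])
        summary[aid] = (n_flagged, len(cids), 100.0 * n_flagged / len(cids))
    return summary
```

The "absent" sector for each assay is then unique_cids minus cids_with_toxicophore. The percentage is the figure compared across sets in the next paragraph.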
Interestingly, the percentage of compounds containing a toxicophore is similar across all sets (around 17–18%). This suggests that an additional clustering or diversity analysis would help to understand the chemical space of each subset.
Finally, Figure 3.1 shows the frequency of each studied toxicophore across all compounds (the number of unique compounds in which it is present). Figure 3.2 shows the same frequency broken down by study. Figure 3.3 shows the absolute frequency across all datasets (the number of times each toxicophore is present).
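The three frequency tables can be computed in Python, assuming a hypothetical CSV with columns AID, CID and one 0/1 column per toxicophore.

The page does not define "absolute frequency" precisely. This sketch assumes it counts (AID, CID) pairs, so a compound active in several bioassays is counted once per bioassay. Counting individual substructure matches within each molecule would be another possible reading.

```python
import csv
import io
from collections import Counter, defaultdict


def toxicophore_frequencies(csv_text: str):
    """Return (per_compound, per_assay, absolute) frequency tables.

    per_compound[t]   -> unique CIDs containing t over all assays
    per_assay[aid][t] -> unique CIDs containing t within one assay
    absolute[t]       -> (AID, CID) pairs containing t (assumed definition)
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    tox_columns = [c for c in reader.fieldnames if c not in ("AID", "CID")]
    cids = defaultdict(set)   # toxicophore -> CIDs across all assays
    pairs = defaultdict(set)  # toxicophore -> (AID, CID) pairs
    for row in reader:
        for t in tox_columns:
            if row[t] == "1":
                cids[t].add(row["CID"])
                pairs[t].add((row["AID"], row["CID"]))
    per_compound = Counter({t: len(s) for t, s in cids.items()})
    absolute = Counter({t: len(s) for t, s in pairs.items()})
    per_assay = defaultdict(Counter)
    for t, s in pairs.items():
        for aid, _cid in s:
            per_assay[aid][t] += 1
    return per_compound, dict(per_assay), absolute
```

Calling per_compound.most_common() returns the toxicophores sorted by frequency, which is convenient for a sorted bar chart.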