Assignment 6 Aliaksandr Krukau
In this assignment, we need to create a ligand-based pharmacophore model for PKNB kinase. I downloaded a set of 73 inhibitors of PKNB kinase from PubChem (bioassay with AID
624753). I excluded the compound with SID
from further analysis because its structure was not available. Compounds with a Kd of
10000 nM or more were considered inactive, and the remaining compounds active. The pharmacophore model was created using version 18.104.22.168 of the LigandScout software from Inteligand GmbH.
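The activity cutoff described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the procedure actually used in LigandScout; the compound IDs and Kd values are made up, and the names `label_activity` and `KD_INACTIVE_THRESHOLD_NM` are hypothetical.

```python
# Threshold from the text: a Kd of 10000 nM or more counts as inactive.
KD_INACTIVE_THRESHOLD_NM = 10000.0


def label_activity(kd_nm):
    """Return 'inactive' if Kd is at or above the threshold, else 'active'."""
    return "inactive" if kd_nm >= KD_INACTIVE_THRESHOLD_NM else "active"


# Made-up compound IDs and Kd values (nM), for illustration only.
example_kd_nm = {"cpd_A": 35.0, "cpd_B": 9999.0, "cpd_C": 10000.0, "cpd_D": 25000.0}

labels = {cid: label_activity(kd) for cid, kd in example_kd_nm.items()}
for cid, label in labels.items():
    print(cid, label)
```

Note that a compound with a Kd of exactly 10000 nM falls on the inactive side, matching the "10000 nM or more" wording.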
Because pharmacophore generation is computationally expensive, I used only a subset of the original bioassay. I selected the 14 compounds with the lowest Kd values as actives and added 3 inactive compounds. For the test set, I used 4 active compounds with CID 11667893, 9809715, 44551653, and 3 inactive compounds with CID 5329102, 11213558, 16725726. For the training set, I used 10 compounds with CID 44259, 16722836, 11427553, 9977819, 10138259, 11409972, 11338033, 5291, 123631, and 151194. For all compounds in the training and test sets, I generated conformers using the BEST settings (with 500 conformers). I then generated merged-feature pharmacophores using default settings. The three pharmacophore models with the highest scores are shown below.
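Picking the most potent compounds amounts to sorting by Kd and keeping the first few entries. The sketch below assumes the Kd values are held in a plain dictionary; the IDs and values are invented, and `pick_lowest_kd` is a hypothetical helper, not part of LigandScout or PubChem.

```python
def pick_lowest_kd(kd_by_id, n):
    """Return the n compound IDs with the smallest Kd, most potent first."""
    return sorted(kd_by_id, key=kd_by_id.get)[:n]


# Made-up compound IDs and Kd values (nM), for illustration only.
example_kd = {
    "cpd_A": 12.0,
    "cpd_B": 450.0,
    "cpd_C": 3.5,
    "cpd_D": 10000.0,
    "cpd_E": 88.0,
}

top3 = pick_lowest_kd(example_kd, 3)
print(top3)
```

With the real bioassay data, the same call with `n=14` would give the candidates to divide between the training and test sets.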
Pharmacophore with the highest test score
Pharmacophore with the second highest test score
Pharmacophore with the third highest test score
Our pharmacophore is much more compact than the pharmacophores in the recent paper by Abhik Seal et al. (Journal of Cheminformatics). It appears that ligand-based pharmacophores for this problem are more compact than docking-based pharmacophores.
For screening, I used the test set provided by Abhik Seal, with 36 actives and 999 decoys.
The performance of the pharmacophore with the highest score on the test set was rather poor. Therefore, I first show the results for the pharmacophore with the second highest score.
I obtained the following receiver operating characteristic (ROC) curve:
ROC curve for the pharmacophore with the second highest test score.
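The area under a ROC curve can be read as the probability that a randomly chosen active gets a higher fit score than a randomly chosen decoy. The sketch below computes it that way from a list of scores and labels. It is a generic calculation, not LigandScout's own routine, and the scores shown are made up.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (active, decoy) pairs ranked correctly.

    Ties between an active and a decoy count as half a correct pair.
    """
    actives = [s for s, y in zip(scores, labels) if y]
    decoys = [s for s, y in zip(scores, labels) if not y]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    correct = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                correct += 1.0
            elif a == d:
                correct += 0.5
    return correct / (len(actives) * len(decoys))


# Made-up fit scores; label 1 marks an active, 0 a decoy.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
example_labels = [1, 0, 1, 0, 0]
auc = roc_auc(example_scores, example_labels)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of molecules, which is acceptable for a screening set of about a thousand compounds.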
The enrichment factor is the share of actives among the molecules with the highest fit scores, divided by the share of actives in the whole screening set. For the top 1%, top 5%, and top 10% of hits, the enrichment factor is 14.4, 7.3, and 6.1, respectively. The area under the ROC curve is 0.77. Out of 36 active compounds, the screening method classified 31 as active, so the classifier sensitivity is rather high, 86%. The classifier specificity is much lower, 38%, because of the large number of false positives. The classifier precision, i.e. the share of true positives among all hits, is also low, around 5.5%. For comparison, in the paper by Abhik Seal et al., the sensitivity for pharmacophore I is 68%, and the specificity is 71%.
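These metrics are simple ratios, sketched below in Python. The enrichment factor is written in its usual form, the hit rate in the top-ranked subset divided by the hit rate in the whole set. The sensitivity example uses the 31 of 36 actives reported above. The enrichment example assumes 5 actives among the top 10 of 1035 molecules; that count is a guess that happens to be consistent with the reported 14.4 and is not stated in the text. The function names are hypothetical.

```python
def enrichment_factor(actives_in_top, n_top, actives_total, n_total):
    """Hit rate in the top-ranked subset divided by the overall hit rate."""
    return (actives_in_top / n_top) / (actives_total / n_total)


def sensitivity(tp, fn):
    """Share of actives that the screen retrieved."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of decoys that the screen rejected."""
    return tn / (tn + fp)


def precision(tp, fp):
    """Share of true actives among all hits."""
    return tp / (tp + fp)


# From the text: 31 of 36 actives were classified as active.
print(round(sensitivity(31, 5), 2))

# Assumed count: 5 actives in the top 10 of 36 + 999 = 1035 molecules.
print(round(enrichment_factor(5, 10, 36, 1035), 1))
```

Because only 36 of the 1035 molecules are active, precision stays low even with high sensitivity unless the number of false positives is small.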
The pharmacophore with the highest score on the test set had rather poor enrichment factors.
For the top 1%, top 5%, and top 10% of hits, the enrichment factor is 0.0, 1.1, and 2.5, respectively. Investigation in LigandScout showed that the first and second pharmacophores are rather similar, but they place an H-bond donor in different positions.
Out of 36 active compounds, the screening method classified 31 as active, so the classifier sensitivity is still high, 81%. The ROC curve is:
ROC curve for the pharmacophore with the highest test score.