In this assignment, we need to create a ligand-based pharmacophore model for the PknB kinase. I downloaded a data set of 73 PknB kinase inhibitors from PubChem (BioAssay AID 624753). I excluded the compound with SID 136935277 from further analysis because its structure was not available. All compounds with a Kd of 10,000 nM or more were considered inactive, and the remaining compounds active. The pharmacophore models were created with LigandScout version 3.12.0.0 from Inte:Ligand GmbH.

Because pharmacophore generation is computationally expensive, I used only a subset of the original bioassay: the 14 compounds with the lowest Kd values as actives, plus 3 inactive compounds. For the test set, I used 4 active compounds with CID 11667893, 9809715, 44551653, and 3 inactive compounds with CID 5329102, 11213558, 16725726. For the training set, I used 10 compounds with CID 44259, 16722836, 11427553, 9977819, 10138259, 11409972, 11338033, 5291, 123631, and 151194. For all compounds in the training and test sets, I generated conformers using the BEST setting (500 conformers). I then generated merged-feature pharmacophores using the default settings. The three pharmacophore models with the highest scores are shown below.
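
The activity labelling and the choice of the most potent actives are simple enough to script. The Python sketch below is only an illustration, not what was actually run: it assumes the bioassay table has already been reduced to (CID, Kd in nM) pairs, and the CIDs and Kd values in the example are made up.

```python
from typing import List, Tuple

# Threshold from the text: Kd >= 10,000 nM means inactive.
KD_INACTIVE_THRESHOLD_NM = 10_000.0

Record = Tuple[int, float]  # (CID, Kd in nM); hypothetical input format


def label_activity(kd_nm: float, threshold_nm: float = KD_INACTIVE_THRESHOLD_NM) -> str:
    """Return 'inactive' if Kd is at or above the threshold, else 'active'."""
    return "inactive" if kd_nm >= threshold_nm else "active"


def select_most_potent(records: List[Record], n_actives: int) -> List[Record]:
    """Return the n active records with the lowest Kd, most potent first."""
    actives = [r for r in records if label_activity(r[1]) == "active"]
    return sorted(actives, key=lambda r: r[1])[:n_actives]


if __name__ == "__main__":
    # Invented (CID, Kd) pairs, not data from AID 624753.
    records = [(1001, 12.0), (1002, 25000.0), (1003, 3.5), (1004, 10000.0), (1005, 480.0)]
    for cid, kd in records:
        print(cid, kd, label_activity(kd))
    print(select_most_potent(records, 2))  # [(1003, 3.5), (1001, 12.0)]
```

With the real data, the same function with n_actives=14 would pick the actives described above; which of them go to the training set and which to the test set was a manual choice in this assignment.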

NovyPharmacophore1.png
Pharmacophore with the highest test score

NovyPharmacophore2.png
Pharmacophore with the second highest test score

NovyPharmacophore3.png
Pharmacophore with the third highest test score

Our pharmacophores are much more compact than those in the recent paper by Abhik Seal et al. (Journal of Cheminformatics 2013, 5:2). This suggests that, for this target, ligand-based pharmacophores may be more compact than docking-based ones.

For screening, I used the test set provided by Abhik Seal et al., which contains 36 actives and 999 decoys.
The pharmacophore with the highest test score performed rather poorly on this set, so I first show the results for the pharmacophore with the second highest score. I obtained the following receiver operating characteristic (ROC) curve:
ROC-graph-2.PNG
ROC curve for the pharmacophore with the second highest test score.
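
For readers who want to reproduce such a curve outside LigandScout, here is a minimal Python sketch of how ROC points and the area under the curve follow from the fit scores of the screened actives and decoys. It assumes that molecules which do not match the pharmacophore receive a placeholder score below every hit (0.0 here); the scores in the example are invented. The two AUC functions compute the same quantity in two ways, as a check.

```python
from typing import List, Sequence, Tuple


def roc_points(active_scores: Sequence[float], decoy_scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by lowering the fit-score threshold one distinct score at a time."""
    labelled = [(s, True) for s in active_scores] + [(s, False) for s in decoy_scores]
    labelled.sort(key=lambda pair: pair[0], reverse=True)  # best fit score first
    n_act, n_dec = len(active_scores), len(decoy_scores)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(labelled):
        threshold = labelled[i][0]
        # Molecules tied at the same score enter the hit list together.
        while i < len(labelled) and labelled[i][0] == threshold:
            if labelled[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_dec, tp / n_act))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


def auc_rank(active_scores: Sequence[float], decoy_scores: Sequence[float]) -> float:
    """Probability that a random active outscores a random decoy; ties count 1/2."""
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            wins += 1.0 if a > d else 0.5 if a == d else 0.0
    return wins / (len(active_scores) * len(decoy_scores))


if __name__ == "__main__":
    actives = [0.9, 0.4, 0.7]
    decoys = [0.8, 0.0, 0.4]  # 0.0 = decoy that did not match (placeholder)
    print(auc_trapezoid(roc_points(actives, decoys)), auc_rank(actives, decoys))  # both about 0.722 (13/18)
```

Grouping tied scores matters: without it, the curve would depend on the arbitrary order of tied molecules, and the trapezoidal area would no longer equal the pairwise statistic.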


The enrichment factor (EF) at x% is the fraction of actives among the top x% of molecules ranked by fit score, divided by the fraction of actives in the whole screening database; it tells how many times better than random selection the ranking is. For the top 1%, 5%, and 10% of hits, the enrichment factors are 14.4, 7.3, and 6.1, respectively. The area under the ROC curve (AUC) is 0.77. Of the 36 active compounds, the screen classified 31 as active, so the sensitivity is rather high (86%). The specificity is much lower (38%) because of the large number of false positives. The precision, i.e. the share of true positives among all hits, is also low, around 5.5%. For comparison, Abhik Seal et al. report a sensitivity of 68% and a specificity of 71% for their pharmacophore I.
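
The enrichment factor and the three classifier metrics above can be written down in a few lines. In this hedged Python sketch, enrichment_factor ranks molecules by fit score and truncates x% of the database size to a whole number of molecules; that rounding convention is an assumption, and screening tools may round or break ties differently. Only the 31 retrieved actives out of 36 and the set sizes come from the text.

```python
import math
from typing import Sequence


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool], top_percent: int) -> float:
    """EF(x%) = (actives in top x% / molecules in top x%) / (all actives / all molecules)."""
    n_total = len(scores)
    n_actives = sum(is_active)
    # Assumed convention: floor(x% of N), but never fewer than one molecule.
    n_top = max(1, n_total * top_percent // 100)
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(1 for _, active in ranked[:n_top] if active)
    return (actives_in_top / n_top) / (n_actives / n_total)


def sensitivity(tp: int, fn: int) -> float:
    """Share of actives that the screen retrieves as hits."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of decoys that the screen correctly rejects."""
    return tn / (tn + fp)


def precision(tp: int, fp: int) -> float:
    """Share of true positives among all hits."""
    return tp / (tp + fp)


if __name__ == "__main__":
    # 31 of the 36 actives were retrieved as hits (numbers from the text).
    print(f"sensitivity = {sensitivity(31, 36 - 31):.2f}")  # 0.86
    # Upper bound on EF(1%) for 36 actives among 1035 molecules:
    # with the floor convention the top 1% holds 10 molecules, all of them active.
    print(f"max EF(1%) = {(10 / 10) / (36 / 1035):.2f}")  # 28.75
    print(math.isclose(sensitivity(31, 5), 31 / 36))  # True
```

The upper bound helps to read the results: an EF(1%) of 14.4 is about half of the best value this screening set allows.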

The pharmacophore with the highest test score performed poorly in terms of enrichment.
For the top 1%, 5%, and 10% of hits, the enrichment factors are 0.0, 1.1, and 2.5, respectively. Inspection in LigandScout showed that the first and second pharmacophores are quite similar but place an H-bond donor at different positions.
The sensitivity of this model is still high: it classified 81% of the 36 active compounds as active. Its ROC curve is:
ROC-graph-1.PNG
ROC curve for the pharmacophore with the highest test score.