Toxicophores in a Malarial Bioassay
A group of confirmatory PubChem assays targeting Plasmodium falciparum dihydroorotate dehydrogenase (PfDHODH) was used for fingerprint generation and similarity searching. The assay data used in this analysis demonstrate an association between PfDHODH inhibition and parasite toxicity. RStudio was used along with the rcdk and gplots packages. The active and inactive compounds were downloaded using the rpubchem package in R.
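Separating the downloaded compounds into active and inactive sets can be sketched in base R. This is a minimal illustration, not the rpubchem workflow used here: it assumes the assay results are already in a data frame with hypothetical columns `smiles` and `outcome`, where `outcome` holds PubChem's activity labels (Active, Inactive, Inconclusive, Unspecified). The SMILES values are placeholders, not compounds from the assay.

```r
# Hypothetical assay table (column names and rows are illustrative only)
assay <- data.frame(
  smiles  = c("c1ccccc1O", "c1ccccc1N", "CCO", "CCN", "CC(=O)O"),
  outcome = c("Active", "Active", "Inactive", "Inconclusive", "Unspecified"),
  stringsAsFactors = FALSE
)

# Keep only compounds explicitly labelled Active or Inactive;
# Inconclusive and Unspecified compounds are left out of both sets
ActiveSmiles   <- assay$smiles[assay$outcome == "Active"]
InactiveSmiles <- assay$smiles[assay$outcome == "Inactive"]

# Count how many compounds carry each activity label
outcome_counts <- table(assay$outcome)
print(outcome_counts)
```

Counting the labels first makes it obvious early on when one set (here, the inactives) is too small to cluster meaningfully.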
Once the active compounds and their canonical SMILES were downloaded, the SMILES strings were collected into a vector and parsed into molecules, and a binary fingerprint was generated for each molecule; each component of a fingerprint represents one bit position. Before clustering, a distance matrix had to be computed; in this example the Tanimoto metric was used. This step can be performed with the R fingerprint package, which is designed to handle fingerprint data and provides fp.sim.matrix() to compute the pairwise Tanimoto similarity matrix; the Tanimoto distance is then 1 minus the similarity. Thus we have:
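> library(rcdk); library(fingerprint)
> ## parse the canonical SMILES into CDK molecule objects
> mols <- parse.smiles(ActiveSmiles)
> ## generate one fingerprint per molecule (rcdk's default "standard" type)
> cmp.fp <- lapply(mols, get.fingerprint)
> ## pairwise Tanimoto similarity matrix and the matching distance matrix
> fp.sim <- fp.sim.matrix(cmp.fp, method = "tanimoto")
> fp.dist <- 1 - fp.sim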
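The Tanimoto coefficient behind fp.sim.matrix(..., method = "tanimoto") compares two bit vectors: if a and b are the numbers of bits set in fingerprints A and B, and c is the number set in both, then T(A, B) = c / (a + b - c), and the distance is 1 - T. The sketch below is a hypothetical base-R re-implementation for illustration (not the fingerprint package's own code), assuming the fingerprints have been expanded into a 0/1 matrix with one row per compound. The helper name `tanimoto_matrix` is made up for this example.

```r
# Hypothetical helper: Tanimoto similarity for every pair of rows in a
# binary fingerprint matrix (rows = compounds, columns = bit positions)
tanimoto_matrix <- function(fp) {
  fp <- fp != 0                                   # treat any nonzero entry as a set bit
  common  <- fp %*% t(fp)                         # c: bits set in both fingerprints
  on_bits <- rowSums(fp)                          # a, b: bits set in each fingerprint
  union   <- outer(on_bits, on_bits, "+") - common  # a + b - c
  sim <- common / union
  sim[union == 0] <- 0                            # two empty fingerprints: define as 0
  sim
}

# Toy 4-bit fingerprints for three compounds
fps <- rbind(
  A = c(1, 1, 0, 1),
  B = c(1, 0, 0, 1),
  C = c(0, 0, 1, 0)
)
fp.sim  <- tanimoto_matrix(fps)
fp.dist <- 1 - fp.sim
print(round(fp.sim, 3))
```

For A and B, c = 2, a = 3 and b = 2, so T = 2 / (3 + 2 - 2) = 2/3 and the distance is 1/3; C shares no bits with either, so its similarity to both is 0.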
Now that the similarity and distance matrices have been generated, the structures can be clustered. A heat map of the Tanimoto similarity matrix was generated with gplots. The same process was repeated for the inactive compounds in the PfDHODH datasets.
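The clustering step can be sketched with base R's hclust() on the Tanimoto distance matrix. This is a minimal illustration under stated assumptions: the 4 × 4 similarity matrix is made-up toy data, average linkage and the 0.5 cut height are arbitrary choices (the section does not say which linkage or threshold was used), and base R's heatmap() stands in for the gplots heat map described above.

```r
# Hypothetical Tanimoto similarity matrix for four compounds (toy data)
ids <- paste0("cmp", 1:4)
fp.sim <- matrix(c(1.0, 0.9, 0.2, 0.1,
                   0.9, 1.0, 0.3, 0.2,
                   0.2, 0.3, 1.0, 0.8,
                   0.1, 0.2, 0.8, 1.0),
                 nrow = 4, byrow = TRUE, dimnames = list(ids, ids))
fp.dist <- 1 - fp.sim

# Average-linkage hierarchical clustering on the Tanimoto distances
hc <- hclust(as.dist(fp.dist), method = "average")

# Cut the tree at distance 0.5 (similarity 0.5) to assign cluster labels
clusters <- cutree(hc, h = 0.5)
print(clusters)
```

Here cmp1/cmp2 and cmp3/cmp4 end up in two separate clusters. In an interactive session, `heatmap(fp.sim, Rowv = as.dendrogram(hc), symm = TRUE, scale = "none")` draws the similarity matrix with rows and columns ordered by the same dendrogram, so that similar compounds form bright blocks along the diagonal.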
Observations: The active compounds showed high mutual similarity, represented by the yellow and white coloring in the heat map (Figure 1). Little useful information could be inferred from the inactive-compound heat map (Figure 2), because only 3 compounds in the dataset were tagged as inactive; most compounds not labeled Active or Inactive were tagged ‘Inconclusive’ or ‘Unspecified’. The 3 inactive compounds were all structurally dissimilar from one another (Tanimoto similarity < 0.4).
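A statement such as "all pairwise similarities are below 0.4" can be checked directly from the similarity matrix by looking only at its off-diagonal entries. The sketch below uses a hypothetical 3 × 3 matrix; the values are invented for illustration and are not the actual similarities of the three inactive compounds.

```r
# Hypothetical Tanimoto similarity matrix for three inactive compounds
inactive.sim <- matrix(c(1.00, 0.25, 0.31,
                         0.25, 1.00, 0.18,
                         0.31, 0.18, 1.00),
                       nrow = 3, byrow = TRUE)

# Upper-triangle entries = similarities between distinct compounds
# (the diagonal is always 1 and is excluded)
pairwise <- inactive.sim[upper.tri(inactive.sim)]
max_sim <- max(pairwise)
all_dissimilar <- all(pairwise < 0.4)
cat("max pairwise Tanimoto:", max_sim, " all < 0.4:", all_dissimilar, "\n")
```

With only three compounds there are just three pairs, which is why a heat map adds little over simply reporting the maximum pairwise similarity.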
Figure 1. Active Compound Heatmap
Figure 2. Inactive Compound Heatmap