Assignment 2:
A group of confirmatory assays targeting Plasmodium falciparum dihydroorotate dehydrogenase (PfDHODH) was retrieved from PubChem for fingerprint generation and similarity searching. The assay data used in this analysis demonstrate an association between PfDHODH inhibition and parasite toxicity. RStudio was used along with the rcdk and gplots packages. The active and inactive compound data were downloaded using the rpubchem package in R (

Once the active compounds were downloaded along with their canonical SMILES, the SMILES were parsed and a fingerprint was generated for each compound. Each fingerprint is a bit vector in which each component represents a bit position. Before performing the clustering, it was necessary to generate a distance matrix. In this example the Tanimoto metric was used. This step can be performed using the fingerprint package that accompanies rcdk. This package is designed to handle fingerprint data and provides a function to evaluate a similarity matrix using the Tanimoto metric, from which the distance matrix is obtained as one minus the similarity. Thus we have:

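> ## generate fingerprints (loop body reconstructed; the fingerprint type is an assumption)
> library(rcdk); library(fingerprint)
> cmp.fp <- vector("list", length(ActiveSmiles))
> for (i in seq_along(ActiveSmiles)) {
+   mol <- parse.smiles(ActiveSmiles[i])[[1]]
+   cmp.fp[[i]] <- get.fingerprint(mol, type = "standard")
+ }

> fp.sim <- fp.sim.matrix(cmp.fp, method = "tanimoto")
> fp.dist <- 1 - fp.sim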
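
As a minimal illustration of what the Tanimoto metric computes, the sketch below works in base R on small made-up bit vectors rather than on real PfDHODH fingerprints. The helper names tanimoto and sim_matrix are hypothetical and are not part of rcdk or the fingerprint package. For two fingerprints, the similarity is the number of bits set in both divided by the number of bits set in either.

```r
# Toy fingerprints: each vector lists the positions of the bits set to 1.
# These values are made up for illustration only.
fps <- list(a = c(1, 3, 5, 7), b = c(1, 3, 5, 8), c = c(2, 4, 6))

# Tanimoto similarity: (bits set in both) / (bits set in either)
tanimoto <- function(x, y) {
  length(intersect(x, y)) / length(union(x, y))
}

# Pairwise similarity matrix (hypothetical helper, not a library function)
sim_matrix <- function(fplist) {
  n <- length(fplist)
  m <- matrix(0, n, n, dimnames = list(names(fplist), names(fplist)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      m[i, j] <- tanimoto(fplist[[i]], fplist[[j]])
    }
  }
  m
}

toy.sim <- sim_matrix(fps)
toy.dist <- 1 - toy.sim   # same conversion as fp.dist above
print(round(toy.sim, 2))
```

Fingerprints a and b share three of their five distinct set bits, so their similarity is 3/5, while a and c share none and are maximally distant.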

Now that the similarity and distance matrices are generated, the structures can be clustered. A heatmap of the Tanimoto similarity matrix was generated. The same process was repeated for the inactive compounds found in the datasets for the biological target dihydroorotate dehydrogenase.
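
The clustering and heatmap step can be sketched as follows. This is an illustrative example under stated assumptions, not the code used for the figures: the 4 x 4 similarity matrix is made up, average-linkage hierarchical clustering is one common choice that the text does not specify, and the base R heatmap function stands in for the gplots plotting function.

```r
# Made-up Tanimoto similarity matrix for four hypothetical compounds
toy.sim <- matrix(c(1.0, 0.8, 0.2, 0.1,
                    0.8, 1.0, 0.3, 0.2,
                    0.2, 0.3, 1.0, 0.7,
                    0.1, 0.2, 0.7, 1.0),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("cmp", 1:4), paste0("cmp", 1:4)))

# Convert similarity to distance and cluster (linkage method is an assumption)
toy.dist <- as.dist(1 - toy.sim)
hc <- hclust(toy.dist, method = "average")
groups <- cutree(hc, k = 2)
print(groups)

# Heatmap of the similarity matrix, ordered by the same dendrogram.
# pdf(NULL) discards the plot so the sketch runs without a display;
# remove the pdf(NULL) and dev.off() lines to draw on screen.
pdf(NULL)
heatmap(toy.sim, Rowv = as.dendrogram(hc), Colv = as.dendrogram(hc),
        symm = TRUE, scale = "none", col = heat.colors(64))
dev.off()
```

With the heat.colors palette, low values are drawn in red and high values in yellow to white, so blocks of similar compounds appear as light squares along the diagonal.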

Observations: There was high similarity among the active compounds, represented by the yellow and white coloring (Figure 1). It was very difficult to infer useful information from the inactive compound heatmap (Figure 2) because only 3 compounds in the dataset were tagged as inactive. Most compounds that were not labeled as Active or Inactive were tagged as ‘Inconclusive’ or ‘Unspecified’. The 3 inactive compounds were all very dissimilar to one another (pairwise Tanimoto similarity < 0.4).
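
The dissimilarity threshold quoted above refers to the pairwise (off-diagonal) values of the similarity matrix. A minimal sketch of how such a summary could be extracted, using a made-up 3 x 3 matrix rather than the actual inactive-compound values:

```r
# Made-up similarity matrix for three hypothetical compounds;
# these are NOT the values behind Figure 2.
toy.sim <- matrix(c(1.00, 0.25, 0.10,
                    0.25, 1.00, 0.35,
                    0.10, 0.35, 1.00),
                  nrow = 3, byrow = TRUE)

# Keep each pair once: the upper triangle, excluding the diagonal
pairwise <- toy.sim[upper.tri(toy.sim)]

print(pairwise)
print(max(pairwise))         # largest pairwise similarity
print(all(pairwise < 0.4))   # TRUE when every pair is below the threshold
```

The diagonal is excluded because every compound has a similarity of 1 with itself, which would otherwise mask how dissimilar the distinct pairs are.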


Figure 1. Active Compound Heatmap

Figure 2. Inactive Compound Heatmap