Open Babel: An open chemical toolbox
Noel M O'Boyle, Michael Banck, Craig A James, Chris Morley, Tim Vandermeersch and Geoffrey R Hutchison
Journal of Cheminformatics 2011, 3:33 doi:10.1186/1758-2946-3-33


This paper describes Open Babel (OB) in great detail. OB is one of the open-source packages in the chemistry field. In 2001, OpenEye decided to build an in-house library and abandon OELib, so the OELib code base became the starting point for a new package, Open Babel. Today, OB is open source and freely available not only to users but also to software developers. It incorporates many different features such as file format conversion, fingerprint generation, searching, atom typing, forcefields and so on. I will describe those features in the following paragraphs.

A very basic problem always comes up when we try to read information from a file. Chemistry is an active research field with a huge number of different software packages and file formats. Some of these formats are well documented, but some are not. It is impractical to write our own converters to deal with hundreds of formats. OB currently supports 111 chemical file formats, including structure formats, the formats of common chemistry software, and 2D or 3D files. Some of the formats are not directly interchangeable, but OB tries to gather as much information as it can to recover the chemical information.

OB can generate fingerprints for chemical compounds. Usually, we store chemical information in a database, and these fingerprints help us to search and index the compounds. OB uses a path-based fingerprint, which enumerates substructures (paths) and applies a hash function to produce a 1024-bit fingerprint. We can use these fingerprints for structure matching or substructure searching. OB also provides a fastsearch index to speed up searching.
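
To make the idea concrete, here is a minimal sketch of a path-based fingerprint in Python. It is not Open Babel's implementation: the graph representation, the function names, the use of CRC32 as the hash, the 7-atom path limit, and the choice to ignore bond orders are all my own assumptions for illustration.

```python
import zlib


def enumerate_paths(atoms, bonds, max_atoms=7):
    """Yield every simple linear path of 1..max_atoms atoms as a tuple of atom indices."""
    neighbours = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    def extend(path):
        yield tuple(path)
        if len(path) < max_atoms:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:  # keep the path simple (no atom visited twice)
                    yield from extend(path + [nxt])

    for start in range(len(atoms)):
        yield from extend([start])


def fingerprint(atoms, bonds, n_bits=1024, max_atoms=7):
    """Hash each path to one bit of an n_bits-wide fingerprint (stored as an int)."""
    bits = 0
    for path in enumerate_paths(atoms, bonds, max_atoms):
        symbols = [atoms[i] for i in path]
        # A path and its reverse describe the same fragment, so pick one spelling.
        key = "-".join(min(symbols, symbols[::-1]))
        bits |= 1 << (zlib.crc32(key.encode()) % n_bits)
    return bits


def tanimoto(fp_a, fp_b):
    """Similarity: shared bits divided by bits set in either fingerprint."""
    total = bin(fp_a | fp_b).count("1")
    return bin(fp_a & fp_b).count("1") / total if total else 1.0


def may_contain(fp_molecule, fp_query):
    """Substructure screen: every query bit must also be set in the molecule."""
    return fp_molecule & fp_query == fp_query


# Hypothetical toy molecules (heavy atoms only): ethanol and 1-propanol.
ethanol = fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
propanol = fingerprint(["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
```

Every path in a substructure is also a path in the full molecule, so `may_contain(propanol, ethanol)` passes the screen. The screen can give false positives (for example through hash collisions), so it only narrows down the candidates for a full substructure match.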

OB can handle more than a hundred formats, but some of them lack certain information, such as the connection table. In this situation, OB can automatically assign bonds from the interatomic distances to construct the atom connectivity. Furthermore, OB can determine bond orders if the user requests it; the orders are based on bond angles and geometries. Finally, atom types are also assigned automatically by the algorithm.
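
The distance-based bond assignment can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not OB's code: two atoms are treated as bonded when their distance is at most the sum of their covalent radii plus a tolerance. The radii table, the 0.45 Å tolerance and the function name are values I chose for the example.

```python
import math
from itertools import combinations

# Approximate covalent radii in angstroms (illustrative values, small subset).
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def perceive_bonds(atoms, tolerance=0.45):
    """Return index pairs (i, j) of atoms that are close enough to be bonded.

    atoms is a list of (element_symbol, (x, y, z)) with coordinates in angstroms.
    """
    bonds = []
    for (i, (sym_i, pos_i)), (j, (sym_j, pos_j)) in combinations(enumerate(atoms), 2):
        cutoff = COVALENT_RADIUS[sym_i] + COVALENT_RADIUS[sym_j] + tolerance
        if math.dist(pos_i, pos_j) <= cutoff:
            bonds.append((i, j))
    return bonds


# Hypothetical water geometry: the two O-H pairs are bonded, the H-H pair is not.
water = [
    ("O", (0.00, 0.00, 0.00)),
    ("H", (0.96, 0.00, 0.00)),
    ("H", (-0.24, 0.93, 0.00)),
]
water_bonds = perceive_bonds(water)
```

Here `water_bonds` is `[(0, 1), (0, 2)]`. Bond orders and atom types would need a further step that looks at angles and geometry, which this sketch leaves out.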

In the section above, I mentioned fingerprints, which are one way to identify duplicates or substructures. The other way is to generate a canonical representation. For file formats without built-in coordinates, non-unique representation is a common problem. OB uses an algorithm similar to the Morgan algorithm to generate a canonical representation. OB can also generate 2D or 3D coordinates for molecules from 0D formats.
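
For readers who have not met the Morgan algorithm, here is a rough Python sketch of its core refinement step. It is the textbook idea, not OB's actual canonicalisation code, and the function names are my own. Each atom starts with its number of neighbours as its value, and each value is then repeatedly replaced by the sum of the neighbours' values for as long as that increases the number of distinct values.

```python
def morgan_invariants(atoms, bonds):
    """Return one extended-connectivity value per atom (Morgan-style refinement)."""
    neighbours = [[] for _ in atoms]
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)

    values = [len(n) for n in neighbours]  # start from the atom degrees
    n_classes = len(set(values))
    while True:
        new_values = [sum(values[j] for j in neighbours[i]) for i in range(len(atoms))]
        new_classes = len(set(new_values))
        if new_classes <= n_classes:  # no further discrimination: keep the old values
            break
        values, n_classes = new_values, new_classes
    return values


def signature(atoms, bonds):
    """Numbering-independent summary: sorted (element, invariant) pairs."""
    return sorted(zip(atoms, morgan_invariants(atoms, bonds)))


# Ethanol written with two different atom numberings (heavy atoms only).
ethanol_a = signature(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_b = signature(["O", "C", "C"], [(0, 1), (1, 2)])
```

Both numberings give the same signature, which is the property a canonical representation needs. A real canonical labelling goes further: it also has to break ties between atoms with equal values and take bond orders and stereochemistry into account, so this signature alone cannot prove that two molecules are identical.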

Two recently added features are stereochemistry and forcefields. Stereochemistry is a recent focus of OB: it can handle cis/trans double-bond stereochemistry, tetrahedral stereochemistry and, in part, square-planar stereochemistry. The forcefield features include an implementation of GAFF, and several conformer searching methods, which are based on the torsion-driving approach, make use of the forcefields as well.

OB is implemented in standard C++, which makes it platform independent. The design goals of the class architecture are extensibility and separability. File format conversion is designed around plug-ins, and each format can be a separate plug-in. This gives developers the flexibility to include the formats they want and exclude the ones they do not. Furthermore, it allows third parties to build their own plug-ins for OB.
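
The plug-in idea can be illustrated with a small registry sketch. OB itself does this in C++, and none of the names below (`FORMATS`, `register_format`, `convert`, the toy `csv` format) come from OB; they are hypothetical and only show how each format can register itself without the core code knowing about it.

```python
FORMATS = {}  # maps a format name to the plug-in object that handles it


def register_format(name):
    """Class decorator: instantiate the plug-in and add it to the registry."""
    def decorator(cls):
        FORMATS[name] = cls()
        return cls
    return decorator


@register_format("xyz")
class XYZFormat:
    def read(self, text):
        lines = text.strip().splitlines()
        n_atoms = int(lines[0])  # line 1: atom count, line 2: comment
        atoms = []
        for line in lines[2:2 + n_atoms]:
            symbol, x, y, z = line.split()
            atoms.append((symbol, (float(x), float(y), float(z))))
        return atoms

    def write(self, atoms):
        body = "\n".join(f"{s} {x} {y} {z}" for s, (x, y, z) in atoms)
        return f"{len(atoms)}\n\n{body}"


@register_format("csv")
class CSVFormat:
    def read(self, text):
        atoms = []
        for line in text.strip().splitlines():
            symbol, x, y, z = line.split(",")
            atoms.append((symbol, (float(x), float(y), float(z))))
        return atoms

    def write(self, atoms):
        return "\n".join(f"{s},{x},{y},{z}" for s, (x, y, z) in atoms)


def convert(text, in_format, out_format):
    """Read with one plug-in and write with another; the core knows no format details."""
    return FORMATS[out_format].write(FORMATS[in_format].read(text))


h2_xyz = "2\nhydrogen molecule\nH 0.0 0.0 0.0\nH 0.74 0.0 0.0\n"
h2_csv = convert(h2_xyz, "xyz", "csv")
```

Adding a new format means adding one more decorated class, and leaving a format out means not loading its class. `convert` never has to change, which is the flexibility the plug-in design is after.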

Software validation is also a major point in this version of OB. The developers run unit tests on nightly builds. It is good practice to write the test cases before writing the software itself. OB uses 18,084 3D structures from PubChem3D as a test set. The number of errors on this set keeps going down from the early versions to the latest version. OB also achieves a 99.99992% success rate on the eMolecules catalogue.

OB provides not only a programming library but also a GUI and a command-line interface. Providing several interfaces is a good way to maximize the number of users. Because it is platform independent, OB runs on Windows, Linux and OS X. It also works with various programming languages such as C++, C# and Java, and even scripting languages such as Perl, Python and Ruby.

Open Babel is a great tool that helps users, researchers and developers achieve their goals rapidly. The tool is still being maintained, and more features may be added in future builds.