Sunday, August 30, 2015

Generating 2D Images of Molecules from MOL Files using Open Babel

Open Babel is a toolkit for working with molecular data: converting between file formats, analyzing molecules, molecular modeling, and so on. It can also render MOL files as 2D images in SVG or PNG format.

Install Open Babel on a Debian-based Linux as follows, or see the Open Babel website for other operating systems:

sudo apt-get install openbabel

Open Babel uses the same command to generate SVG and PNG images; it recognizes the output format from the extension of the filename given to the output option (-O). For SVG, the option "-xb" with the value "none" produces a transparent background. This doesn't work for PNGs. There are other options as well, one of which is "--title", which writes the name of the molecule into the image. Leave its value blank if you don't want to see any title.

An example of generating a PNG from a MOL file of the benzene molecule:

gungor@gungors-mint ~/Desktop $ obabel benzene.mol -O benzene.png --title Benzene
1 molecule converted
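
If you want to drive the same command from a script, the options described above can be assembled in Python. This is only a minimal sketch: build_image_command and render_molecule are hypothetical helper names of my own, not part of Open Babel, and the sketch assumes the obabel executable is on your PATH. It uses only the options mentioned in this post (-O, -xb none, --title).

```python
import subprocess


def build_image_command(mol_path, image_path, title="", transparent=False):
    """Build the obabel argument list for a 2D image (hypothetical helper)."""
    # obabel picks the output format from the extension of the -O filename.
    command = ["obabel", mol_path, "-O", image_path]
    # As noted above, "-xb none" is only useful for SVG output.
    if transparent and image_path.lower().endswith(".svg"):
        command += ["-xb", "none"]
    # An empty title keeps the molecule name out of the image.
    command += ["--title", title]
    return command


def render_molecule(mol_path, image_path, **options):
    """Run obabel and return its exit code; needs Open Babel installed."""
    return subprocess.call(build_image_command(mol_path, image_path, **options))
```

For example, render_molecule("benzene.mol", "benzene.svg", title="Benzene", transparent=True) would run the equivalent of obabel benzene.mol -O benzene.svg -xb none --title Benzene.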

To see all the other options for the available image formats, see the Image formats page of the Open Babel documentation.

Simple Way of Python's subprocess.Popen with a Timeout Option

The subprocess module in Python provides a variety of ways to start a process from a Python script. We can use them to run external commands or programs, collect their output, and manage them. An example use is as follows:

from subprocess import Popen, PIPE 
p = Popen(['ls', '-l'], stdout=PIPE, stderr=PIPE)
stdout, stderr = p.communicate()
print stdout, stderr

These lines run the ls -l command and collect its standard output and standard error in the stdout and stderr variables, using the communicate method of the process object.

However, if the external command never ends, communicate never returns and we never get any output, because the external program just hangs. This sometimes happens, e.g. when trying to generate a 3D molecule from a 2D MOL file using Open Babel when the MOL file is malformed. Because Open Babel just tries to generate the 3D structure and doesn't check whether the MOL file is valid, we never get any output from that run. So both the program and our script hang.

Python has several ways to solve this problem, such as the multiprocessing, threading, or signal modules. Here I'll describe a very simple method that worked fine for me. It was tested on Linux with Python 2.7.

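from time import sleep
from subprocess import Popen, PIPE
def popen_timeout(command, timeout):
    p = Popen(command, stdout=PIPE, stderr=PIPE)
    # Check once per second whether the process has finished.
    for t in xrange(timeout):
        sleep(1)
        if p.poll() is not None:
            return p.communicate()
    # Still running after `timeout` seconds: kill it and report failure.
    p.kill()
    return False
print popen_timeout(['python',
                    '/home/gungor/Desktop/'], 25)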

import time
for i in xrange(10):
    print "Gungor", i
    # Ten iterations of two seconds each: about 20 seconds in total.
    time.sleep(2)
Example run 1 where the external program doesn't exceed the timeout threshold

gungor@gungors-mint ~/Desktop $ python
('Gungor 0\nGungor 1\nGungor 2\nGungor 3\nGungor 4\nGungor 5\nGungor 6\nGungor 7\nGungor 8\nGungor 9\n', '')

In this example, I ran the script with a 25-second timeout on an external program that runs for about 20 seconds and writes lines of text to standard output, which are collected with the communicate method.

Example run 2 where the external program would take longer than the timeout threshold

gungor@gungors-mint ~/Desktop $ python

This time the function just returns False, because the external program was still running when the timeout was reached, and it was killed afterwards.

You can use this method to control the execution of external programs in Python.
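
The polling approach above was written for Python 2.7, where communicate has no timeout argument. From Python 3.3 onwards, communicate accepts a timeout and raises subprocess.TimeoutExpired when it expires, so the same idea can be sketched without a manual loop. This is a Python 3 sketch rather than the method tested in this post; run_with_timeout is a hypothetical name, and the two child commands are stand-ins for a real external program.

```python
import sys
from subprocess import PIPE, Popen, TimeoutExpired


def run_with_timeout(command, timeout):
    """Return (stdout, stderr) as bytes, or False if the timeout expires."""
    p = Popen(command, stdout=PIPE, stderr=PIPE)
    try:
        # communicate() keeps reading both pipes while it waits.
        return p.communicate(timeout=timeout)
    except TimeoutExpired:
        # Same outcome as popen_timeout: kill the process and report failure.
        p.kill()
        # Reap the killed child and discard whatever it had written.
        p.communicate()
        return False


if __name__ == "__main__":
    # Stand-in children: one finishes immediately, one sleeps past the timeout.
    quick = [sys.executable, "-c", "print('done')"]
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(quick, 10))
    print(run_with_timeout(slow, 1))
```

One practical difference from the polling loop is that communicate reads the pipes while it waits, so a program that writes a lot of output does not stall on a full pipe before the timeout is reached.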