Saturday, June 27, 2015

Convert XLS/XLSX to CSV in Bash

In most of the modern Linux distributions, Libre Office is available and it can be used to convert XLS or XLSX file(s) to CSV file(s) in bash.

For XLS file(s):

for i in *.xls; do libreoffice --headless --convert-to csv "$i"; done

For XLSX file(s):

for i in *.xlsx; do libreoffice --headless --convert-to csv "$i"; done

You may get following warning but it still works fine:

javaldx: Could not find a Java Runtime Environment!
Warning: failed to read path from javaldx

You'll see a standard output similar to following when the conversion is successful:

convert /home/gbudak/Downloads/cmap/cmap_instances_02.xls -> /home/gbudak/Downloads/cmap/cmap_instances_02.csv using Text - txt - csv (StarCalc)

Sunday, June 7, 2015

Obtaining Molecule Description using Open Babel / PyBel

Open Babel is a great tool to analyze and investigate molecular data (.MOL, .SDF files). Its Python API is particularly very nice if you are familiar with Python already. In this post, I'll demonstrate how you can obtain molecule description such as molecular weight, HBA, HBD, logP, formula, number of chiral centers using PyBel.

Installation

$ sudo apt-get install openbabel python-openbabel

Usage for MW, HBA, HBD, logP

After reading .MOL file, we need to use calcdesc method with descnames argument for getting the descriptions. Getting formula is different, though.

gungor@gungors-mint ~/Desktop $ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from pybel import *
>>> mol = readfile('mol', 'benzene.mol').next()
>>> desc = mol.calcdesc(descnames=['MW', 'logP', 'HBA1', 'HBD'])
>>> desc['formula'] = mol.formula
>>> print desc
{'HBA1': 0.0, 'HBD': 0.0, 'MW': 78.11184, 'logP': 1.6865999999999999, 'formula': 'C6H6'}

Usage for Number of Chiral Centers

This is not included in PyBel, or I couldn't find it but Open Babel comes with obchiral terminal command which returns the chiral atoms found in the molecules. We can parse this output to get the number of chiral centers.

>>> from subprocess import Popen, PIPE
>>> p = Popen(['obchiral', 'brg_flap_1.mol'], stdout=PIPE, stderr=PIPE)
>>> stdout, stderr = p.communicate()
>>> chiral_num = stdout.count('Is Chiral')
>>> print chiral_num
1

So you can use above codes to get molecule description including molecular weight, HBA, HBD, logP, formula, number of chiral centers.