Thursday, February 27, 2014

An Exon of Length 2 Appeared in Ensembl

I want to share an interesting finding from our research on exon/intron structure across human evolutionary history.

I had the sets of genes that emerged at each point in human history, and I was using the Ensembl API to retrieve the exons and introns of these genes for further analyses.

One gene, HHLA3 (HERV-H LTR-associating 3, ENSG00000197568), held a surprise: one of its transcripts (ENST00000432224) had an exon (ENSE00001707577) only 2 bases long. At first I didn't notice anything odd, but in later group discussions it became obvious that an exon of only 2 bases should not exist.
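The post doesn't show the analysis code, and the original work used the Ensembl API (presumably the Perl API). As a minimal sketch of how such a short exon could be flagged automatically, the Python below uses the Ensembl REST `lookup` endpoint instead; that is a swapped-in alternative, not the author's method. It assumes that looking up a transcript with `expand=1` returns an `Exon` list whose entries carry `id`, `start` and `end`. The 10-base cutoff and the function names are hypothetical, and because annotations change between releases, a lookup today may not return the 2-base exon at all.

```python
import json
import urllib.request

# Ensembl REST lookup endpoint; expand=1 asks for a transcript's exons to be included.
ENSEMBL_LOOKUP = "https://rest.ensembl.org/lookup/id/{}?expand=1"


def fetch_exons(transcript_id):
    """Return the exon records (dicts with "id", "start", "end") of one Ensembl transcript.

    Raises KeyError if the ID is not a transcript (no "Exon" list in the response).
    """
    request = urllib.request.Request(
        ENSEMBL_LOOKUP.format(transcript_id),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        transcript = json.load(response)
    return transcript["Exon"]


def exon_length(exon):
    # Ensembl coordinates are 1-based and inclusive at both ends.
    return exon["end"] - exon["start"] + 1


def short_exons(exons, min_length=10):
    """Return (exon ID, length) for every exon shorter than min_length bases."""
    return [(e["id"], exon_length(e)) for e in exons if exon_length(e) < min_length]
```

With network access, `short_exons(fetch_exons("ENST00000432224"))` lists any exon of that transcript below the cutoff; running `short_exons` over every transcript of every gene would surface cases like this one before they reach downstream analyses.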

We then checked other databases (NCBI and the UCSC Genome Browser) for the same gene and found that the exon was absent: their gene-finding algorithms placed those 2 bases inside an intron, so their version of the transcript has one exon fewer than the Ensembl version.
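Checking another source comes down to comparing exon intervals for the same transcript. A minimal sketch, assuming each source's exons have already been exported as (start, end) genomic coordinates on the same assembly; the coordinates in the usage lines are made up purely to mirror the situation in this post, where one source has an extra 2-base exon that the other treats as intronic. Function names are hypothetical.

```python
def exon_differences(exons_a, exons_b):
    """Compare two annotations of one transcript given as (start, end) exon intervals.

    Returns the exons found only in source A and those found only in source B,
    each sorted by start coordinate.
    """
    set_a, set_b = set(exons_a), set(exons_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


def introns(exons):
    """Derive introns as the gaps between consecutive exons (1-based, inclusive).

    Genomic start < end holds on both strands, so sorting by start orders the exons.
    """
    ordered = sorted(exons)
    return [(left_end + 1, right_start - 1)
            for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])]


# Hypothetical coordinates: source A has an extra 2-base exon at 5000-5001.
source_a = [(1000, 1200), (5000, 5001), (8000, 8300)]
source_b = [(1000, 1200), (8000, 8300)]
only_a, only_b = exon_differences(source_a, source_b)
print(only_a)             # [(5000, 5001)]
print(only_b)             # []
print(introns(source_a))  # [(1201, 4999), (5002, 7999)]
print(introns(source_b))  # [(1201, 7999)]
```

The last two lines show the effect described above: source B's single intron spans both of source A's introns plus the 2 bases that source A calls an exon.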

This shows that gene-finding algorithms are still far from perfect, and that several sources should be checked before drawing conclusions about exons and introns.

Monday, February 24, 2014

How to Convert PLINK Binary Formats into Non-binary Formats

PLINK is a whole-genome association analysis toolset. To save time and disk space, you convert your data files to PLINK's binary format (BED, BIM, FAM), but when you need to look at the files you have to convert them back to the non-binary format (PED, MAP) so you can open them in a text editor such as Notepad on Windows.

The conversion is really easy. All it takes is PLINK and the following command, typed into a DOS window (Run -> type cmd; hit ENTER) in the PLINK directory:

plink --bfile YOUR_BINARY_FILE --recode --out YOUR_NON-BINARY_FILE
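If you have many datasets to convert, the same command can be scripted. A small Python sketch, assuming Python 3 is installed and that `plink` (or the full path to plink.exe) can be launched; it only wraps the three flags shown above, `--bfile`, `--recode` and `--out`. The function names and file names are hypothetical.

```python
import subprocess


def recode_command(binary_prefix, output_prefix, plink="plink"):
    """Build the PLINK argument list that converts BED/BIM/FAM into PED/MAP."""
    return [plink, "--bfile", binary_prefix, "--recode", "--out", output_prefix]


def recode(binary_prefix, output_prefix, plink="plink"):
    """Run the conversion; raises CalledProcessError if PLINK returns a non-zero exit status."""
    subprocess.run(recode_command(binary_prefix, output_prefix, plink), check=True)
```

For example, `recode("files/mydata", "files/mydata_text", plink=r"C:\plink-1.07-dos\plink.exe")`, run from C:\plink-1.07-dos so that the relative `files/` paths resolve. Building the argument list as a Python list (rather than one command string) avoids quoting problems with spaces in file names.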

First, install PLINK if you don't have it already.

Note that this tutorial is for Windows.

Go to the Download section of the PLINK website and download the correct version for your system. For Windows, that is the MS-DOS version.

Then extract it to your "C:" drive and make sure that plink.exe is in the extracted folder. That's it.

To convert your files, open a new DOS window and navigate to your PLINK directory, "C:\plink-1.07-dos", by typing:

cd c:\plink-1.07-dos

Once you are in the PLINK directory, you are ready to start the conversion.

To keep things tidy, create a folder inside "C:\plink-1.07-dos", say "files", and move your BED, BIM and FAM files into it. Then you can convert them to the non-binary format with the command below:

plink --bfile files/YOUR_BINARY_FILE_NAME --recode --out files/YOUR_NON-BINARY_FILE_NAME

Replace "YOUR_BINARY_FILE_NAME" with the name of your files without the extension (all three share the same name and differ only in extension), and replace "YOUR_NON-BINARY_FILE_NAME" with any output name you like.
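Because `--bfile` takes that shared name and PLINK needs all three files, a quick check before running can save a failed attempt. This is a hypothetical helper sketched in Python, not something PLINK provides:

```python
from pathlib import Path

# The three files PLINK reads for a given --bfile name.
BINARY_EXTENSIONS = (".bed", ".bim", ".fam")


def binary_paths(prefix):
    """Return the BED, BIM and FAM paths for a --bfile prefix such as "files/mydata"."""
    return [Path(f"{prefix}{ext}") for ext in BINARY_EXTENSIONS]


def missing_binary_files(prefix):
    """Return the paths among BED/BIM/FAM that do not exist yet."""
    return [path for path in binary_paths(prefix) if not path.exists()]
```

An empty list from `missing_binary_files("files/YOUR_BINARY_FILE_NAME")` means all three files are in place. Appending the extension to the string, instead of using `Path.with_suffix`, keeps names that already contain dots intact.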

Next, hit ENTER and wait for PLINK to finish. When it's done you'll see:

"Analysis finished: CURRENT DATE"

Navigate to your files folder (C:\plink-1.07-dos\files) and you will find the non-binary PED and MAP files.
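PED and MAP are plain whitespace-separated text, so besides opening them in Notepad you can summarize them with a few lines of code. A minimal Python sketch, assuming the standard layout: each MAP line has four columns (chromosome, SNP ID, genetic distance, base-pair position), and each PED line has six leading columns followed by two allele columns per SNP. The function names are hypothetical.

```python
def parse_map(lines):
    """Parse PLINK .map lines into (chromosome, SNP ID, genetic distance, bp position).

    Assumes the four-column layout written by --recode; blank lines are skipped.
    """
    snps = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        chrom, snp_id, distance, position = fields
        snps.append((chrom, snp_id, float(distance), int(position)))
    return snps


def ped_summary(lines):
    """Return (number of individuals, number of SNPs) for PLINK .ped lines.

    Each line has 6 leading columns (family ID, individual ID, paternal ID,
    maternal ID, sex, phenotype) followed by two allele columns per SNP.
    """
    individuals = 0
    snp_counts = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        individuals += 1
        snp_counts.add((len(fields) - 6) // 2)
    if len(snp_counts) > 1:
        raise ValueError("PED lines have different numbers of genotype columns")
    snps = snp_counts.pop() if snp_counts else 0
    return individuals, snps
```

Both functions accept any iterable of lines, so an open file works directly, e.g. `with open(r"C:\plink-1.07-dos\files\YOUR_NON-BINARY_FILE_NAME.map") as handle: snps = parse_map(handle)`. As a sanity check, the SNP count from `ped_summary` should equal `len(snps)` from the matching MAP file.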

More about PLINK, including instructions for other operating systems, can be found on the PLINK website.