Wednesday, May 25, 2016

Computing Significance of Overlap between Two Sets using Hypergeometric Test

There are many cases where we have two sets of things such as transcripts, genes or proteins (e.g. obtained under two different conditions) and we want to compute the significance of the overlap between them. The hypergeometric test is a very simple and widely used option for such cases.

I'll use the phyper function in R but you can use the same idea in SciPy (Python).

Let's say that, out of a total of 200 genes (A), you have:

  • 10 genes common or overlapping (set B ∩ set C)
  • 25 genes in set B
  • 50 genes in set C
  • 135 genes not in set B or set C

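To compute the significance of the overlap, call phyper with the overlap size, the size of set C, the number of genes not in set C, and the size of set B: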

phyper(10, 50, 200 - 50, 25, lower.tail = FALSE)
[1] 0.0214406

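This value is below a p-value threshold of 0.05 (or 5%). Note, however, that with lower.tail = FALSE phyper returns P(X > 10), the probability of strictly more than 10 overlapping genes. The usual overlap test asks for P(X ≥ 10), which includes the observed overlap itself; for that, pass 10 - 1 as the first argument. This gives a larger p-value (about 0.058 here), so with the same threshold the overlap would not be called significant.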
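If R isn't at hand, the same tail probability can be sketched in plain JavaScript. This is only an illustration under my own assumptions: the helper names (choose, hypergeomPmf, hypergeomUpperTail) are made up for this example and are not part of jStat or any other library. It sums the hypergeometric probabilities directly, which is fine for small sets like these.

```javascript
// Binomial coefficient C(n, k) via the multiplicative formula.
// Floating point is accurate enough for the small sizes used here.
function choose(n, k) {
    if (k < 0 || k > n) return 0;
    k = Math.min(k, n - k);
    var result = 1;
    for (var i = 1; i <= k; i++) {
        result = result * (n - k + i) / i;
    }
    return result;
}

// P(X = k) when drawing `drawn` items from a population that contains
// `successes` marked items and `failures` unmarked items.
function hypergeomPmf(k, successes, failures, drawn) {
    return choose(successes, k) * choose(failures, drawn - k) /
           choose(successes + failures, drawn);
}

// Upper tail P(X > q), the same quantity as
// phyper(q, m, n, k, lower.tail = FALSE) in R.
function hypergeomUpperTail(q, successes, failures, drawn) {
    var p = 0;
    for (var k = q + 1; k <= Math.min(successes, drawn); k++) {
        p += hypergeomPmf(k, successes, failures, drawn);
    }
    return p;
}

// 10 overlapping genes, 50 genes in set C, 150 genes not in set C, 25 genes in set B.
var pMoreThan10 = hypergeomUpperTail(10, 50, 200 - 50, 25);     // P(X > 10)
var pAtLeast10 = hypergeomUpperTail(10 - 1, 50, 200 - 50, 25);  // P(X >= 10)
console.log(pMoreThan10, pAtLeast10);
```

The first value should reproduce the phyper call above, which is P(X > 10). The second one includes the observed overlap itself, P(X ≥ 10), and is therefore larger.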

Tuesday, May 10, 2016

20th Anniversary Event of the ODTÜ Informatics Institute

The ODTÜ Informatics Institute is celebrating the 20th anniversary of its founding with a science festival. Everyone is invited to the festival, which will take place on May 16, 2016 at the ODTÜ Culture and Convention Center!

The following keynote speakers will take part in this festival, where science will be accompanied by art and music:

  • Prof. Dr. Jennifer Hayes
    Managing director and co-founder of Microsoft Research New England and Microsoft Research New York
  • Assoc. Prof. Claudio Ferretti
    University of Milano-Bicocca, Computer Science, Systems and Communication
  • Dr. Christian Borgs
    Researcher; deputy managing director and co-founder of Microsoft Research New England

Event program

Saturday, January 16, 2016

Mann Whitney U Test (Wilcoxon Rank-Sum Test) JavaScript Implementation

Currently, JavaScript is really poor in statistical methods compared to Python (SciPy) and R. There are several efforts to fill this gap, most notably jStat. However, many functions, distributions and tests are still missing from this library. In one of my projects, I had to implement a JavaScript version of the Mann Whitney U test (also called the Wilcoxon rank-sum test). Here, I'm giving a link to its source code and describing how it works.


The Mann Whitney U test is a nonparametric test whose null hypothesis is that two samples come from the same population. Suppose you have two groups of numbers that don't follow any known distribution and you want to test whether they differ. In such cases, you'd use the Mann Whitney U test.

The plots above show that minimal CD4 levels are significantly different between the two groups (A/A and G/G, G/A) (Kobayashi et al., Jpn. J. Infect. Dis., 55, 131-133, 2002). The significance is shown at the top of each plot as a p-value.

This implementation is adapted from the SciPy and R source code and has been tested against both with several datasets.

GitHub Gist for Mann Whitney U test Javascript implementation (mannwhitneyu.js).

How to use

Just download the mannwhitneyu.js file, add a script tag to your HTML that points to the downloaded file, and call the test with your datasets. For example, create a file called index.html, paste the following into it, and save it. Make sure you place mannwhitneyu.js next to it.

    <title>Mann Whitney U test</title>
    <script type="text/javascript" src="mannwhitneyu.js"></script>
    <script type="text/javascript">
        var x = [2, 4, 6, 2, 3, 7, 5, 1],
            y = [8, 10, 11, 14, 20, 18, 19, 9];
        // The third argument is the alternative hypothesis.
        var t = mannwhitneyu.test(x, y, 'less');
        console.log(t); // print the result to the browser console
    </script>

Open index.html in your browser and look at the Console (Ctrl + Shift + J). You'll see the result:

Object {U: 0, p: 0.0004654861357875073}

This result shows that the numbers in x are significantly smaller than the ones in y. The alternative argument can also be 'greater', which is again a one-sided test, checking whether the first group has significantly greater numbers. You may also give 'two-sided' as the alternative, which will compute a two-sided test.
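The U: 0 in the output can be checked by hand. As a rough sketch (this is the textbook pair-counting definition of U, not necessarily how mannwhitneyu.js computes it, and uStatistic is a name I made up for this example), U for the first sample counts the pairs in which its value is the larger one, with ties counted as half:

```javascript
// U statistic for sample `a` versus sample `b`: the number of pairs
// (a[i], b[j]) in which a[i] is larger, counting ties as half a pair.
function uStatistic(a, b) {
    var u = 0;
    for (var i = 0; i < a.length; i++) {
        for (var j = 0; j < b.length; j++) {
            if (a[i] > b[j]) {
                u += 1;
            } else if (a[i] === b[j]) {
                u += 0.5;
            }
        }
    }
    return u;
}

var x = [2, 4, 6, 2, 3, 7, 5, 1],
    y = [8, 10, 11, 14, 20, 18, 19, 9];

var uX = uStatistic(x, y); // no value in x exceeds any value in y
var uY = uStatistic(y, x); // every value in y exceeds every value in x
console.log(uX, uY, uX + uY === x.length * y.length);
```

Since every number in x is smaller than every number in y, U for x is 0, the most extreme value possible, and the two U values always add up to the number of pairs (8 × 8 = 64 here).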

Soon, I'll send this code to jStat and hopefully it'll be available there. I'm also considering packaging it as a Node module so that developers can easily include it in their Node projects.