Thursday, June 27, 2013

Retrieving Data with AJAX using jQuery, PHP and MySQL

Last semester, I took a course from the Informatics Institute at METU called "Biological Databases and Data Analysis Tools". First, we learned what a database is and how to query it, along with the technology behind databases. Then, we covered many of the available biological databases and data analysis tools, including gene, protein and pathway databases and tools for creating databases.

As a final project, we were asked to create an online tool that searches a database and displays the results in any web browser. We were given a table and, using that table and some given conditions, we retrieved another table from Biomart Ensembl and created the database. Then, we built a user interface for searching and displaying the data.

We used a MySQL database, and PHP was our programming language of choice because it is well suited to web programming.

For our task, we implemented AJAX using jQuery (a JavaScript library). The purpose was to make the search process easy and fast. The search was triggered as soon as the first three letters of the query were entered, and the result was displayed on the page without a page refresh.

The project is available online on this website.

To do this, as I said, we used AJAX calls. AJAX works on the client side and makes asynchronous calls to the server without disturbing the current state of the page. That is, to get the data, we don't have to stop viewing the page, fetch the data and then reload the page with it. In this way, any data can be fetched from the database without a page refresh.

The approach relies on the jQuery method "ajax". This method takes the necessary information from the user, sends it to a script running on the server (written in PHP, for example) and, at the end, receives the result from that script and shows it to the user.

In the function below, the script reads the value of the text field with id "query" and the values of the checked check boxes named "options" (which determine which columns to get from the database) and stores them in an object called "data". Then, when the query is longer than two characters, it calls the "ajax" method with the submission type set to "post", the script that will interact with the database set to "/process.php", the data, and a callback function that inserts the result into the div with id "results".

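function getResults() {
    // Collect the search term and the selected column options.
    var data = {};
    data['query'] = $("#query").val().trim();
    var boxes = $("input[name=options]:checked");

    // Add the value of each checked box under its index (0, 1, 2, ...).
    $.each(boxes, function(key, value) {
        data[key] = $(value).val();
    });

    // Query the server only when at least three characters were entered.
    if (data['query'].length > 2) {
        $.ajax({
            url: '/process.php',
            type: 'post',
            data: data,
            // Put the HTML returned by process.php into the results div.
            success: function(response) {
                $("#results").html(response);
            }
        });
    }
}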


In the process.php file, the data coming from the AJAX call is received and used to query the database. While the data is being retrieved, the HTML for the results is generated, and it is echoed out at the end. This is what jQuery receives and displays.

This JavaScript function can get the results, but something has to execute it. There are many possibilities for this. Typing letters into the text field, pasting something into it, pressing Enter, and changing the search options are the ones I implemented. jQuery has great and really simple methods for these. We used the change(), bind(), on() and keypress() methods.

For example, below you can see how the ENTER key (key code 13) is used to trigger the function. Note that here we prevent the key from submitting the form by returning false.

$("#query").keypress(function(action) {
    if (action.which == 13) {
        getResults();
        return false;
    }
});
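
The other triggers mentioned above (typing, pasting and changing the search options) could be wired up in a similar way. What follows is only a sketch, not the project's actual code: the `debounce` helper and the 300 ms delay are my own assumptions, added so that fast typing does not send a request on every keystroke. The selectors are the ones used in getResults(), and the jQuery part is guarded so that the snippet does nothing when jQuery or getResults() is missing.

```javascript
// Hypothetical helper: postpone calls to fn until `wait` ms have passed
// without a new call.
function debounce(fn, wait) {
    var timer = null;
    return function () {
        var args = arguments, self = this;
        clearTimeout(timer);
        timer = setTimeout(function () {
            fn.apply(self, args);
        }, wait);
    };
}

// Wire things up only when jQuery and getResults() are available.
if (typeof $ !== 'undefined' && typeof getResults === 'function') {
    var delayedSearch = debounce(getResults, 300);

    // "input" fires while typing; "paste" is listed too for older browsers.
    $("#query").on("input paste", delayedSearch);

    // Changing a check box re-runs the search immediately.
    $("input[name=options]").change(getResults);
}
```

getResults() still decides by itself whether the query is long enough, so the handlers do not need to repeat that check.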


The use of the others can be found in the jQuery documentation.

If you have any questions about this post, please leave a comment below.

Using Online Tools for Teaching Bioinformatics

I attended one of the science cafe meetings of the BiGCaT group today, and we discussed the use of online tools for teaching bioinformatics.

Andra Waagmeester (a PhD student from BiGCaT) introduced the Rosalind Project as a teaching tool. This project mainly focuses on bioinformatics solutions. The website poses various bioinformatics questions. These are problems that can come up in any bioinformatics research, and solving them helps you learn bioinformatics.

On the website, it is possible to start a class (with a faculty member account) and generate a curriculum with the desired content from the project. It is also possible to post new problems. There is also a discussion section where one can ask questions about problems and look for help. The replies can be upvoted or downvoted, which helps the useful ones stand out.

On Rosalind, problems can be solved using any programming language, but they said it is optimized for Python, so Python should be the better choice. One of the problem sets is about learning Python, which is good. There are also two sets, Bioinformatics Stronghold and Bioinformatics Armory, where you build up bioinformatics knowledge.

There is also Codecademy, where both learning and teaching programming are possible. It doesn't focus on bioinformatics specifically, but it might be used to teach bioinformatics, so it's worth checking out.

The use of online tools as a study platform in classes is really novel, and I think this work should count towards grading (which might raise students' interest in it). However, there are also some issues, which were discussed in the meeting. One is the stability and availability of the tools for the whole duration of the course. The second is the availability of solutions on the web, such as on GitHub.

Network Inference DREAM Breast Cancer Challenge

A causal edge is defined by the change seen on a node after an intervention on another node. If the curves obtained over time (with and without the intervention) overlap, then there is no relation. Otherwise, we can draw an edge between those nodes, and depending on whether the level goes up or down, the edge will be activating or inhibiting. These causal edges are context-specific, so data from different cell lines may give different relations.

Also, edge confidence scores should be obtained. Right now, I have no idea how to get them, but we will discuss it.
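
To make the idea concrete, here is a minimal sketch of such a comparison. It is not the method we will submit. The mean difference as the measure of "overlap", the `threshold` parameter and the example numbers are all my assumptions. I also assume the intervention is an inhibitor, so a drop in the measured node is read as an activating edge from the inhibited node.

```javascript
// Mean difference between two curves sampled at the same time points.
function meanDifference(treated, control) {
    if (treated.length !== control.length || treated.length === 0) {
        throw new Error("curves must be non-empty and of equal length");
    }
    var sum = 0;
    for (var i = 0; i < treated.length; i++) {
        sum += treated[i] - control[i];
    }
    return sum / treated.length;
}

// Returns 1 (activating), -1 (inhibiting) or 0 (no edge) for the edge
// from the inhibited node to the measured node.
// `threshold` is a hypothetical cut-off for "the curves overlap".
function inferEdge(controlCurve, inhibitedCurve, threshold) {
    var diff = meanDifference(inhibitedCurve, controlCurve);
    if (Math.abs(diff) <= threshold) {
        return 0; // curves overlap: no relation
    }
    // Assumption: the intervention is an inhibitor, so a drop in the
    // measured node means the inhibited node normally activates it.
    return diff < 0 ? 1 : -1;
}

// Made-up numbers, for illustration only.
var control = [1.0, 1.8, 2.5, 2.9];
var inhibited = [1.0, 1.1, 1.2, 1.2];
console.log(inferEdge(control, inhibited, 0.3)); // prints 1
```

A real analysis would also have to deal with noise and replicates, which this sketch ignores.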

The relations and scores will be stored in SIF and EDA files and submitted to the competition.
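
As a rough illustration of the two formats, the sketch below writes a small edge list as Cytoscape-style SIF and EDA text. The node names, scores, interaction labels ("activates", "inhibits") and the attribute name "EdgeScore" are all invented here. The challenge may prescribe its own labels, so this only shows the general shape.

```javascript
// Each edge: source, target, sign (1 = activating, -1 = inhibiting)
// and a confidence score. All values are invented.
var edges = [
    { source: "A", target: "B", sign: 1, score: 0.9 },
    { source: "B", target: "C", sign: -1, score: 0.4 }
];

// Hypothetical interaction labels.
function interaction(sign) {
    return sign > 0 ? "activates" : "inhibits";
}

// SIF: one "source <tab> interaction <tab> target" line per edge.
function toSif(edges) {
    return edges.map(function (e) {
        return [e.source, interaction(e.sign), e.target].join("\t");
    }).join("\n");
}

// EDA: the attribute name on the first line, then one
// "source (interaction) target = value" line per edge.
function toEda(edges, attributeName) {
    var lines = edges.map(function (e) {
        return e.source + " (" + interaction(e.sign) + ") " +
            e.target + " = " + e.score;
    });
    return [attributeName].concat(lines).join("\n");
}

console.log(toSif(edges));
console.log(toEda(edges, "EdgeScore"));
```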

This can be done by writing scripts specific to the task. However, before that, I looked for existing tools and found some. There is an R package called RPPAnalyzer, which is designed to read RPPA results, compare the samples and plot a graph at the end, but this is not exactly what we need in this challenge (See its CRAN page). Another R package, called ddepn (Dynamic Deterministic Effects Propagation Networks), is written specifically for constructing signaling networks. It infers signaling networks from time-course RPPA data (See its R-Forge page).

So I started with ddepn. I installed the package (R version 3.0.1). Before using our data, I tried the example described in its vignette. A similar example is also present in its documentation. However, I got an error before plotting the network. When I attempted to run the ddepn function to apply the genetic algorithm, it gave "Error in get("envDDEPN") : object 'envDDEPN' not found". So I have to find a way to solve this before I can move on to doing the same for our data.

It looks like the network inferred in this step will be necessary for the next step, together with the data files, because the CellNOptR package (See its official page) is suggested for the next step, and it needs both the network and the data to make predictions.

DREAM Breast Cancer Sub-challenges

I have been going over the sub-challenges before attempting to solve them. As I mentioned, there are three sub-challenges and somehow they are connected.

First, the causal signaling network of the phosphoproteins should be inferred using the given data and other possible data sources such as pathway databases. There are 4 cell lines and 8 stimuli, so they make 32 networks in total. Nodes are phosphoproteins, and edges should be directed and causal (activating or inhibiting).
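
Just to make the bookkeeping explicit, the 32 networks can be thought of as one empty edge list per (cell line, stimulus) pair. The names below are placeholders I made up. The real cell line and stimulus names are in the challenge data description.

```javascript
// Placeholder names: 4 cell lines and 8 stimuli.
var cellLines = [];
for (var c = 1; c <= 4; c++) { cellLines.push("cellLine" + c); }
var stimuli = [];
for (var s = 1; s <= 8; s++) { stimuli.push("stimulus" + s); }

// One (initially empty) edge list per cell line / stimulus pair.
var networks = {};
cellLines.forEach(function (cellLine) {
    stimuli.forEach(function (stimulus) {
        networks[cellLine + "/" + stimulus] = [];
    });
});

console.log(Object.keys(networks).length); // prints 32
```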

Four different treatments are applied to the samples before stimulation. These are inhibitor treatments, one of which is the vehicle control (DMSO). After that, the samples are stimulated, and the protein levels are measured at different time points.

This sub-challenge also has another part, where in silico data is provided and only one network has to be inferred from it. The characteristics of this data are different. The training dataset has time courses for 20 phosphoproteins under various stimuli and inhibition of nodes (See Sub-challenge 1: Network Inference for more).

Second, phosphoprotein trajectories should be predicted. We are also asked to propose a model that generalizes beyond this data (the breast cancer proteomics and in silico datasets) (See Sub-challenge 2: Time-course Prediction for more).

Third, the data should be visualized so that it can be interpreted in meaningful ways. This is only for the breast cancer proteomics dataset (See Sub-challenge 3: Visualization for more).

Wednesday, June 26, 2013

HPN-DREAM Breast Cancer Network Inference Challenge

Understanding signaling networks might bring more insight into cancer treatment, because cells respond to their environment by activating these networks, and phosphorylation reactions play important roles in them.

The goal of this challenge is to advance our ability and knowledge in signaling network inference and in the prediction of protein phosphorylation dynamics. Also, we are asked to develop a visualization method for the data.

The dataset provided is extensive and is the result of RPPA (reverse-phase protein array) experiments. It covers four breast cancer cell lines, each with proteomics data obtained under 3 different inhibitors and one control (DMSO), with 8 different stimuli, over 7 time points. Each contains the levels of about 45 phosphoproteins. There is also an additional dataset with all proteins measured (phosphorylated forms and total proteins) at later time points. Moreover, there is an in silico dataset with similar characteristics (See Data Description on Synapse).

RPPA is a method to quantitate protein levels in lysates from cells or tissues. A video about this technique can be watched at this link.

Using this data, we are asked to complete three sub-challenges.

(1) Network Inference: Modeling causal signaling networks from training data
(2) Time-course Prediction: Prediction of trajectories of protein levels following inhibitor perturbation(s) not seen in the training data
(3) Visualization: Designing a visualization strategy for high-dimensional molecular time-course data sets such as the ones used in this challenge

More information can be found on their official website.

More about the sub-challenges and how we approach them is coming soon.

Dream Challenge

This year, the 8th Dream Challenge takes place, and I will be working on this project as my internship at BiGCaT, Bioinformatics, UM. The challenge brings scientists together to catalyze the interaction between experiment and theory in the area of cellular network inference and quantitative model building in systems biology (as said on their webpage).

In this competition, I will work on a specific challenge about network modeling, dynamic response predictions and data visualization. The name of this specific challenge is "HPN-DREAM breast cancer network inference challenge" and the information can be found on this Synapse page.

In this blog, I will write about the progress of the project, try to explain the steps, tools and methods I use, and explain more about the challenge.