Hadoop for Bioinformatics

Bioinformatics applications currently require both processing of huge amounts of data and heavy computation. Fulfilling these requirements calls for simple ways to implement parallel computing. MapReduce is a general-purpose parallelization model that seems particularly well-suited to this task and for which an open source implementation (Hadoop) is available. Here we report on its application to three relevant algorithms: BLAST, GSEA and GRAMMAR. The first is characterized by relatively low-weight computation on large data sets, while the second requires heavy processing of relatively small data sets. The third one can be considered as containing a mixture of these two computational flavors. Our results are encouraging and indicate that the framework could have a wide range of bioinformatics applications while maintaining good computational efficiency, scalability and ease of maintenance.
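To make the MapReduce model mentioned above a little more concrete, here is a minimal single-process sketch in plain Python. It is not the implementation used for BLAST, GSEA or GRAMMAR, and it does not use Hadoop itself; it only mimics the map, shuffle and reduce phases on a toy k-mer counting task. The function names (`map_kmers`, `shuffle`, `reduce_counts`, `run_job`) and the sample reads are hypothetical, chosen purely for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k=3):
    """Map step: emit a (k-mer, 1) pair for every k-length window in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Group values by key, standing in for what the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for one k-mer."""
    return key, sum(values)


def run_job(reads, k=3):
    """Run map, shuffle and reduce in sequence on a list of reads (illustrative only)."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    return dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())


# Hypothetical toy input: two short reads.
reads = ["ACGTAC", "GTACGT"]
counts = run_job(reads)
print(counts)
```

In a real Hadoop job the map calls would run in parallel on different nodes, each over its own slice of the input, and the framework would handle the grouping step and the rerunning of failed tasks. Here everything runs in one process so the data flow is easy to follow.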


Hadoop for Bioinformatics - Deepak Singh from Cloudera on Vimeo.

This is a roughly 23-minute video of a presentation by Deepak Singh of Amazon Web Services, courtesy of Hadoop World. It covers many of the challenges facing this community and shows how Hadoop and Elastic Computing can be game changers.
About 15 minutes into the video, there's a 3D visualization of a running Hadoop job that shows processor nodes as cubes in a spinning pyramid: green nodes are working normally, while a node that turns black and falls to the bottom signals a failed job on that processor. I thought it was an interesting visualization, and I found the presentation interesting overall, since I studied molecular biology in grad school and have an interest in bioinformatics. Beyond that, I have lately taken an interest in all things related to scalability. (Let that be a hint of things to come in future blog posts!)
