Hadoop for Bioinformatics
Bioinformatics applications currently require both processing of huge amounts of data and heavy computation. Fulfilling these requirements calls for simple ways to implement parallel computing. MapReduce is a general-purpose parallelization model that seems particularly well-suited to this task and for which an open source implementation (Hadoop) is available. Here we report on its application to three relevant algorithms: BLAST, GSEA and GRAMMAR. The first is characterized by relatively low-weight computation on large data sets, while the second requires heavy processing of relatively small data sets. The third one can be considered as containing a mixture of these two computational flavors. Our results are encouraging and indicate that the framework could have a wide range of bioinformatics applications while maintaining good computational efficiency, scalability and ease of maintenance.
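To make the MapReduce model mentioned above a bit more concrete, here is a minimal single-process sketch in plain Python. It is not taken from the paper and does not use Hadoop; the task (counting k-mers in short reads) and the function names `map_kmers`, `shuffle`, `reduce_counts` and `run_job` are illustrative assumptions, chosen only to show the map, group-by-key and reduce phases that a framework like Hadoop would distribute across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_kmers(read, k=3):
    """Map step (hypothetical example): emit (k-mer, 1) for each window in a read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for one k-mer."""
    return key, sum(values)


def run_job(reads, k=3):
    """Run map, shuffle and reduce in sequence in a single process."""
    mapped = chain.from_iterable(map_kmers(read, k) for read in reads)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Two made-up reads, used only for illustration.
counts = run_job(["GATTACA", "TTACAGA"])
print(counts)
```

Because each read is mapped independently and each key is reduced independently, both phases could in principle run in parallel on separate machines, which is the property the abstract relies on.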
Hadoop for Bioinfomatics - Deepak Singh from Cloudera on Vimeo.
This is a roughly 23-minute video of a presentation by Deepak Singh of Amazon Web Services, courtesy of Hadoop World. It covers many of the challenges facing the bioinformatics community and shows how Hadoop and elastic computing can be game changers.
About 15 minutes into the video, there's a 3D visualization of a running Hadoop job that shows processor nodes as cubes in a spinning pyramid: green nodes are working normally, while a node that turns black and falls to the bottom signals a failed job on that processor. I found the presentation interesting overall too, since I studied molecular biology in grad school and have an interest in bioinformatics. Beyond that, I've lately been interested in all things related to scalability. (Let that be a hint of things to come in future blog posts!)