Some interesting Big Data Links

Here I share some interesting big data links with you.

Closing the gap between big data and people who need it

According to Stefan Groschupf, the CEO of Datameer, there is “a gap between the data and the people that want to get close to it”, particularly for those who want to harness the power of Hadoop. He suggests three important steps for working with big data (a rough code sketch of the first two steps follows the list):
  • Getting the data into your computation system (a huge part of the challenge).
  • Defining the analytics you want to do.
  • Visualizing the data to make it accessible.
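To make the first two steps concrete, here is a minimal sketch of the classic Hadoop MapReduce word-count job in Java, assuming a standard Hadoop installation; the input and output paths and class names are hypothetical placeholders I chose for illustration. Step one (getting data into the computation system) is done beforehand with something like "hdfs dfs -put local_logs/ /user/alice/weblogs", the job below is step two (defining the analytics), and step three (visualization) is left to whatever tool reads the job’s output directory.

// Minimal Hadoop MapReduce job: count occurrences of each word in text files
// that have already been copied into HDFS.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: tokenize each input line and emit (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // args[0]: HDFS input directory (step 1 already staged the data there)
    // args[1]: HDFS output directory for step 3 tools to pick up
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Compiled into a jar, it would be run along the lines of "hadoop jar wordcount.jar WordCount /user/alice/weblogs /user/alice/wordcounts".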

Facebook’s Approach to Big Data

Facebook’s design team manager talks about using data to understand how people use different features. It seems Facebook’s “data informed” approach is working better than Google’s “data driven” strategy. In addition, in a detailed post Dhruba Borthakur highlights how Facebook uses its Hadoop clusters to handle and analyse the big data generated by Facebook users:
Apache Hadoop is being used in three broad types of systems: as a warehouse for web analytics, as storage for a distributed database and for MySQL database backups.
So how large are these Hadoop clusters? Well, according to the article,
Data warehousing clusters are largely batch processing systems where the emphasis is on scalability and throughput. We have enhanced the NameNode’s locking model to scale to 30 petabytes, possibly the largest single Hadoop cluster in the world. We have multiple clusters at separate data centers and our largest warehouse cluster currently spans three thousand machines. Our jobs scan around 2 petabytes per day in this warehouse and more than 300 people throughout the company query this warehouse every month.
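Taking the quoted figures at face value, a rough back-of-envelope estimate (mine, not the article’s) of the average per-machine scan load looks like this:

\[
\frac{2\ \text{PB/day}}{3000\ \text{machines}} \approx 667\ \text{GB per machine per day}
\approx \frac{667\ \text{GB}}{86{,}400\ \text{s}} \approx 7.7\ \text{MB/s per machine.}
\]

That sustained average is modest next to the sequential throughput of even a single disk, which fits the article’s point that these clusters are tuned for batch scalability and aggregate throughput rather than per-node speed.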

