Harness the power of Hadoop
Some interesting Big Data Links
Closing the gap between big data and the people who need it
According to Stefan Groschupf, the CEO of Datameer, there is “a gap between the data and the people that want to get close to it”, particularly those who want to harness the power of Hadoop. He suggests three important steps for working with big data:
- Getting the data into your computation system (a huge part of the challenge).
- Defining the analytics you want to do.
- Visualizing the data to make it accessible.
Facebook’s Approach to Big Data
Facebook’s design team manager talks about using data to understand how people use different features. It seems that Facebook’s data-informed approach is working better than Google’s data-driven strategy. In addition, in a detailed post, Dhruba Borthakur highlights how Facebook uses its Hadoop clusters to handle and analyse the big data generated by Facebook users:
Apache Hadoop is being used in three broad types of systems: as a warehouse for web analytics, as storage for a distributed database and for MySQL database backups.
So how large are these Hadoop clusters? According to the article:
Data warehousing clusters are largely batch processing systems where the emphasis is on scalability and throughput. We have enhanced the NameNode’s locking model to scale to 30 petabytes, possibly the largest single Hadoop cluster in the world. We have multiple clusters at separate data centers and our largest warehouse cluster currently spans three thousand machines. Our jobs scan around 2 petabytes per day in this warehouse and more than 300 people throughout the company query this warehouse every month.
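To put those figures in perspective, here is a rough back-of-envelope calculation. It is only a sketch: it assumes the 30 petabytes and the three thousand machines describe the same warehouse cluster, and that a petabyte means 10^15 bytes, neither of which the quoted article states explicitly. Under those assumptions the cluster averages about 10 TB per machine.

```python
# Back-of-envelope arithmetic using only the numbers in the quoted passage.
PB = 10**15  # assumption: decimal petabyte (the article does not specify)
TB = 10**12
GB = 10**9

cluster_bytes = 30 * PB    # "scale to 30 petabytes"
machines = 3000            # "spans three thousand machines"
scanned_per_day = 2 * PB   # "scan around 2 petabytes per day"

# Assumption: the 30 PB and the 3,000 machines refer to the same cluster.
per_machine_tb = cluster_bytes / machines / TB

# Share of the stored data that the daily jobs read.
daily_fraction = scanned_per_day / cluster_bytes

# Average scan rate across the whole cluster, spread over 24 hours.
scan_rate_gb_s = scanned_per_day / 86_400 / GB

print(f"{per_machine_tb:.0f} TB per machine on average")
print(f"{daily_fraction:.1%} of the stored data scanned per day")
print(f"{scan_rate_gb_s:.1f} GB/s average aggregate scan rate")
```

These are averages only; real jobs are bursty, and the stored figure may or may not include replication.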