The Cube - Hadoop Summit 2012 - David Mariani, Klout, with John Furrier and Jeff Kelly
Everybody wants to improve their Klout score, a value that represents a user’s influence across their social network. And while Klout tells its users how the number is calculated, few people understand how the platform behind the score really works. To shed some light on it, Dave Mariani, vice president of engineering at Klout, joined John Furrier and Jeff Kelly at The Cube, broadcasting from Hadoop Summit 2012 in San Jose, Calif.
Mariani explained how Hadoop’s distributed file system has enabled his start-up not only to process data but also to store it cheaply. Hadoop is horizontally scalable: if an organization wants more storage capacity or faster processing, it can add machines to its Hadoop cluster without changing anything in the underlying software.
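To make the horizontal-scaling point concrete, here is a minimal Python sketch (not Hadoop code) of how a large file is cut into fixed-size blocks and spread across a cluster. The 128 MiB block size and the round-robin placement are simplifying assumptions for illustration; real HDFS also replicates each block and applies its own placement policy. The point is that adding nodes shrinks each node's share of the data without any change to the code.

```python
from collections import Counter

# Assumed block size for this sketch; real clusters configure their own.
BLOCK_SIZE = 128 * 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list:
    """Return the sizes of the fixed-size blocks a file would be cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_blocks(num_blocks: int, num_nodes: int) -> Counter:
    """Assign blocks to nodes round-robin (a simplification) and count blocks per node."""
    return Counter(i % num_nodes for i in range(num_blocks))


if __name__ == "__main__":
    blocks = split_into_blocks(10 * 1024**3)  # a hypothetical 10 GiB file
    for nodes in (4, 8, 16):
        per_node = place_blocks(len(blocks), nodes)
        print(f"{nodes} nodes: at most {max(per_node.values())} blocks per node")
```

A 10 GiB file yields 80 blocks under these assumptions, so each node holds 20 blocks on a 4-node cluster and 5 on a 16-node cluster. The same code handles both; only the machine count changes.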
Hadoop lets small companies wrestle with huge amounts of data. Klout prefers to run Hadoop in its own hosted data center, but organizations lacking the resources Klout has at its disposal can run Hadoop on top of Amazon EC2. “It’s very inexpensive and very easy out of the gate to get scale,” Mariani said. “We can’t do what we’re doing without Hadoop. We’re out of business without that infrastructure.”
But Mariani was also candid about what he sees as Hadoop’s current limitations and what he would like to see from the open-source framework going forward. In a nutshell, platforms like Hadoop (or HBase and Hive, for that matter) lack robust business intelligence capabilities. “You still need schemas on the unstructured data to get the most out of it,” Mariani said.
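Mariani's point about schemas corresponds to what is often called schema-on-read: raw data is stored as-is, and a structure is imposed only when the data is queried, which is the approach Hive takes when it defines tables over files in HDFS. The Python sketch below illustrates the idea in miniature; the record format, field names, and sample lines are all hypothetical and are not Klout's actual data.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# Hypothetical raw "signal" lines as they might sit in a log file:
# loosely structured, with fields that vary from line to line.
RAW_LINES = [
    '{"user": "alice", "network": "twitter", "action": "retweet"}',
    '{"user": "bob", "network": "facebook", "action": "like", "extra": 1}',
    "not json at all",
    '{"user": "carol", "action": "mention"}',
]


@dataclass
class Signal:
    """Schema imposed at read time; the stored data never had to match it."""
    user: str
    network: Optional[str]
    action: str


def read_with_schema(lines: Iterable[str]) -> Iterator[Signal]:
    """Parse each raw line and project it onto the Signal schema, skipping misfits."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable rows are dropped instead of failing the query
        if not isinstance(record, dict) or "user" not in record or "action" not in record:
            continue  # rows missing required fields don't fit the schema
        yield Signal(record["user"], record.get("network"), record["action"])


if __name__ == "__main__":
    for signal in read_with_schema(RAW_LINES):
        print(signal)
```

The storage layer stays cheap and flexible because nothing is validated on write; the cost is that every query has to carry the schema and decide what to do with rows that don't fit it.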
A company like Klout, which collects a billion “signals” from its registered users every day, craves real-time business intelligence to develop better social media analytics that will ultimately lead to more satisfied customers and larger profits for the company. The problem with Hadoop is that it is a batch processing system that struggles in the “real-time world,” Mariani said. As a result, he is waiting for developers to build analytical engines that run on top of Hadoop and let it handle interactive queries.
In the meantime, Klout turns to SQL Server Analysis Services to conduct that sought-after business intelligence. But Mariani would love to see this functionality available in Hadoop. “If you think about what makes Hadoop so great, when you store a piece of data — let’s just say it’s a file — it appears virtually to you as a file…but that actually is distributed across as many nodes as you have in the cluster…So when I do a query…it’s a massive parallel table scan across all these individual hard disks that are out there that I get to take advantage of…So that’s what I want to do with [business intelligence]…versus trying to pipe it and load it into something else.”
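The “massive parallel table scan” Mariani describes can be sketched in a few lines of Python: each node scans only the blocks stored on its own disks and returns a partial result, and those partial results are combined into one answer. This is an illustrative sketch, not Hadoop's API. Threads stand in for cluster nodes here (in CPython they won't give true CPU parallelism), and the `(user, score)` rows are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def scan_block(block, predicate):
    """'Map' step: scan one local block and return a partial count of matching rows."""
    return sum(1 for row in block if predicate(row))


def parallel_count(blocks, predicate, workers=4):
    """Scan all blocks concurrently, then 'reduce' the partial counts into one total."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial(scan_block, predicate=predicate), blocks)
        return sum(partials)


if __name__ == "__main__":
    # Hypothetical table of (user, score) rows, already split into blocks
    # as if each block lived on a different node's disk.
    blocks = [
        [("alice", 40), ("bob", 75)],
        [("carol", 62), ("dave", 18)],
        [("erin", 91)],
    ]
    print(parallel_count(blocks, lambda row: row[1] > 50))  # prints 3
```

The design choice that matters is that the query travels to the data: each scan reads only local blocks, so adding nodes adds disks scanning in parallel, rather than piping everything into a separate system first.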