Quantcast's Jim Kelly joins Wikibon's Dave Vellante and SiliconAngle's John Furrier inside theCUBE at Strata Conference 2013. Quantcast has had Big Data in its sights for years.
Jim Kelly is the Vice President of Research and Development at Quantcast.
Quantcast, by its own admission, has been dealing in Big Data since 2006, before it was cool. Jim Kelly, VP of Research and Development at Quantcast, stopped by theCUBE during Strata last month to give some background on the Quantcast File System (QFS). QFS is an alternative to the Hadoop Distributed File System (HDFS) and is free to the open source community; with it, Quantcast hopes to deliver better cost efficiencies at large scale to anyone who adopts it.
QFS started five to six years ago, when Quantcast began developing many technologies internally to handle the data volume it was receiving. Released in September 2012, it is a direct alternative to HDFS. The problem QFS is trying to fix is that Big Data sets tend to grow and carry high operating costs: power and computing can quickly become a six- to seven-figure monthly operating expense. A goal of QFS, then, was to build a more efficient file system that makes better use of space.
QFS effectively doubles the storage capacity of a Hadoop cluster compared to stock HDFS.
The #1 challenge in designing a distributed file system is fault tolerance: the software needs to tolerate pieces of your data going missing. HDFS makes three copies of the data. QFS instead uses Reed-Solomon encoding (the same technique used in CDs and DVDs). The space savings are big: QFS stores the equivalent of 1.5 copies, half of what HDFS stores.
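The arithmetic behind the "1.5 copies" and "doubles capacity" figures can be sketched as follows. This is a minimal illustration, not Quantcast code: the function names and the 1,000 TB raw capacity are assumptions made up for the example, and three-way replication is modeled as one data unit plus two redundant copies.

```python
def storage_overhead(data_units: int, redundancy_units: int) -> float:
    """Raw bytes stored per byte of user data."""
    return (data_units + redundancy_units) / data_units


def usable_capacity(raw_tb: float, overhead: float) -> float:
    """User data that fits in a cluster with the given raw capacity."""
    return raw_tb / overhead


# Three-way replication: 1 original + 2 extra copies.
replication_overhead = storage_overhead(1, 2)
# Reed-Solomon as described in the article: 6 data slices + 3 parity slices.
reed_solomon_overhead = storage_overhead(6, 3)

RAW_TB = 1000.0  # hypothetical cluster size, for illustration only

replicated_tb = usable_capacity(RAW_TB, replication_overhead)
encoded_tb = usable_capacity(RAW_TB, reed_solomon_overhead)

print(f"replication overhead:  {replication_overhead:.1f}x")
print(f"Reed-Solomon overhead: {reed_solomon_overhead:.1f}x")
print(f"usable capacity ratio: {encoded_tb / replicated_tb:.1f}")
```

On the same raw disks, the 6+3 layout holds twice as much user data as three full copies, which is where the claim of doubled capacity comes from.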
QFS splits data into data slices and parity slices (six data slices and three parity slices) and by default writes them to nine separate places. If QFS can read any six, it can reconstruct the data, so it can lose up to three slices. With HDFS you can lose only two copies, so QFS has better fault tolerance too.
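The "any six of nine" property can be demonstrated with a toy erasure code. The sketch below is an assumption-laden illustration, not the QFS implementation: it treats six data symbols as points on a polynomial over the small prime field GF(257), derives three parity symbols as further points on the same polynomial, and recovers the data by Lagrange interpolation. Production Reed-Solomon codecs typically work over GF(2^8) so that every symbol fits in a byte; the field, function names, and sample data here are chosen only to keep the example short.

```python
from itertools import combinations

P = 257  # small prime field, chosen for illustration only


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    passing through the given (xi, yi) points, with arithmetic mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data):
    """Turn six data symbols into nine slices: the data plus three parity symbols."""
    assert len(data) == 6
    points = list(enumerate(data))  # slice i holds the polynomial's value at x = i
    parity = [lagrange_eval(points, x) for x in range(6, 9)]
    return list(data) + parity


def reconstruct(surviving):
    """Rebuild the six data symbols from any six (index, value) pairs."""
    assert len(surviving) >= 6
    points = surviving[:6]
    return [lagrange_eval(points, x) for x in range(6)]


data = [72, 97, 100, 111, 111, 112]  # arbitrary sample symbols below 257
slices = encode(data)

# Try every way of losing three of the nine slices.
all_recovered = all(
    reconstruct([(i, slices[i]) for i in kept]) == data
    for kept in combinations(range(9), 6)
)
print("recovered from every 6-of-9 subset:", all_recovered)
```

Because six points determine a polynomial of degree five, any six surviving slices pin down the same polynomial and therefore the original data, whichever three slices were lost.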
Here are some interesting figures about Quantcast that show host Dave Vellante got Kelly to confirm during the interview:
50 terabytes of data in the door per day
More than 20 petabytes processed on an average day
1,000 machines (reasonably modest commodity hardware)
While he remained vague, Kelly said that Quantcast would measure success by the number of high-quality collaborators who help extend the product. File systems are an especially critical piece of the infrastructure puzzle. QFS stands to benefit from the scrutiny of open source, and Hadoop will benefit from having a file system that its framework can run on. Giving QFS back to open source is a win-win for all.