Jim Kelly | Strata Data Conference 2013
Quantcast's Jim Kelly joins Wikibon's Dave Vellante and SiliconAngle's John Furrier inside theCUBE at Strata Conference 2013.

Quantcast has had Big Data in its sights for years. By its own admission, the company has been dealing in Big Data since 2006, before it was cool. Jim Kelly, Vice President of Research and Development at Quantcast, stopped by theCUBE during Strata last month to give some background on the Quantcast File System (QFS). With QFS, an alternative to HDFS (the Hadoop Distributed File System) that is free to the open source community, Quantcast hopes to deliver better cost efficiency at large scale to anyone who adopts it.

QFS started 5-6 years ago, when Quantcast began developing a lot of technologies internally to handle the volume of data it was receiving. Released in September 2012, it is a direct alternative to HDFS. The problem QFS is trying to fix is that Big Data sets tend to grow and carry high operating costs. Power and computing costs can quickly become a six- to seven-figure monthly operating expense. So a goal with QFS was to build a more efficient file system that makes better use of space. QFS effectively doubles the storage capacity of a Hadoop cluster compared to stock HDFS.

The #1 challenge in designing a distributed file system is fault tolerance: the software needs to tolerate bits of your data going missing. HDFS makes three copies of the data. QFS instead uses Reed-Solomon encoding (the same technique used in CDs and DVDs). The space savings are big: QFS stores the equivalent of 1.5 copies, which is half of what HDFS needs. By default, QFS writes six data slices and three parity slices to nine separate places. If QFS can read any six of the nine, it can reconstruct the data. With HDFS you can lose only two copies, whereas QFS can lose any three slices, so QFS has better fault tolerance too.

Here are some interesting Quantcast numbers that show host Dave Vellante got Kelly to confirm during the interview: an average of 50 terabytes of data in the door per day; over 20 petabytes processed per day; and 1,000 machines (reasonably modest commodity hardware).

While he remained vague, Kelly said that Quantcast would measure success by the number of high-quality collaborators who help extend the product together. File systems are an especially critical piece of the infrastructure puzzle. QFS stands to benefit from the scrutiny of open source, and Hadoop will benefit from having an alternative file system that works with its framework. Giving QFS back to open source is a win-win for all.
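
The storage and fault-tolerance figures quoted from the interview (three copies for HDFS, six data slices plus three parity slices for QFS) can be checked with some quick arithmetic. The sketch below is only an illustration of that arithmetic, not QFS or HDFS code; the function names `replication_scheme` and `erasure_scheme` are made up for this example.

```python
from fractions import Fraction


def replication_scheme(copies):
    """Return (raw storage per byte of data, losses tolerated) for N-way replication."""
    # With N full copies, N - 1 of them can be lost before the data is gone.
    return Fraction(copies), copies - 1


def erasure_scheme(data_slices, parity_slices):
    """Return (raw storage per byte of data, losses tolerated) for a layout
    where any `data_slices` of the stored slices are enough to rebuild the data."""
    total_slices = data_slices + parity_slices
    # Every slice is the same size, so overhead is total slices over data slices.
    return Fraction(total_slices, data_slices), parity_slices


# Figures from the interview: 3 copies for HDFS, 6 data + 3 parity slices for QFS.
hdfs_overhead, hdfs_losses = replication_scheme(3)
qfs_overhead, qfs_losses = erasure_scheme(6, 3)

# How much more data fits on the same raw disk space with the QFS layout.
capacity_gain = hdfs_overhead / qfs_overhead

print(f"HDFS: {float(hdfs_overhead)}x raw storage, tolerates {hdfs_losses} losses")
print(f"QFS:  {float(qfs_overhead)}x raw storage, tolerates {qfs_losses} losses")
print(f"Capacity gain on the same hardware: {float(capacity_gain)}x")
```

Under these assumptions the 6+3 layout needs 1.5 bytes of raw storage per byte of data instead of 3, which is where the "doubles storage capacity" claim comes from, and it tolerates the loss of three slices instead of two copies.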