Doug Cutting, Hadoop Creator - Hadoop World 2010 - theCUBE
Doug Cutting of Cloudera joins SiliconAngle's John Furrier and Wikibon's Dave Vellante at Hadoop World 2010. Doug brought with him Hadoop, his son's elephant doll and the inspiration for the Hadoop name.
John Furrier and Dave Vellante had the opportunity to sit down with Hadoop creator Doug Cutting for an episode of Inside the Cube at Hadoop World 2010 to discuss the Hadoop movement.
Furrier asks Cutting to describe how it all started. "It started a long time ago, I was working on [another project] and trying to build an open source equivalent of Google... that indexes the entire web." That couldn't be done on one machine. They had figured the web was maybe a couple billion pages, but by the time they really got moving it was tens of billions of pages that needed to be refreshed every few weeks. They got it running on four machines and some 100 million pages, and then Google published a paper on the Google File System describing how it was tackling the very same problem. Cutting got to thinking about how to do something like that, and about a year later Google published another paper on how to process the data. It was exactly what they needed: the algorithms they were already using, supported by a framework and fully automated, eliminating the person who had to run from machine to machine to keep everything running smoothly. So he and Mike Cafarella got to work on adding this, eventually getting it running on twenty machines, and "it became clear that we weren't going to get to running on thousands of machines without a lot of work..." Around that time Yahoo! found them and they met to talk.
Yahoo! was interested in the technology that Cutting had developed and had a team of people, which Cutting needed, who were ready and able to work. Yahoo! didn't want the search-specific parts, and they didn't want to replace their own system, but they did want a better distributed file system and MapReduce system. That became Hadoop, named after a yellow toy elephant beloved by Cutting's son, now the mascot. Now the movement has, in the words of Furrier, "gone to the extreme."
Cutting wanted to create a better web search solution, and saw this as a general-purpose tool for that application. Hadoop is a general technology that can be adapted to solve a user's particular problem. Once users get that problem solved, they start to find other problems they might not even have recognized, and can then use Hadoop to build a solution for those as well.
So many people use open source software that the adoption itself becomes your success. When software isn't open, it doesn't attract the developers and followers that allow the sort of explosive growth Hadoop has seen. Code is like a baby, and when it dies with a bankrupt company (as Cutting has experienced) it's a sad loss; open source can help prevent that kind of painful death. Apache, which lets people do what they want with the software, free of charge, changed the field, creating a community that collaborates in its own interest. The one area where they run into problems is the trademark. To address it, the license says that you cannot use the name Hadoop, but the code itself is not tied to the name. If the community were to stagnate, Cutting's hope is that someone would take what they've created, build something bigger and better, give it a new name, and invite the community in to improve it.