Omer Trajman - Intel Developer Forum 2012 - theCUBE
John Furrier speaking with Omer Trajman, Customer Solutions for Cloudera at Intel Developer Forum, 2012.
At IDF 2012 John Furrier interviews Cloudera's Omer Trajman in the developer zone. The two shed some light on the evolution of storage and computing, Cloudera, Hadoop, datacenters, and Intel's role in carrying our data forward. Furrier opens the interview by touching on fourth-generation processors, their effect on open source, and how that impacts Cloudera.
Trajman responds by elaborating on the connection between the demands of improved processors and Cloudera's role with Intel. As more types and varieties of data are created, greater pressure is placed on the classic data center model. This drives demand for industry standard hardware and industry standard components. Although there will continue to be variations, such as spinning disk or the flash that will take over as the next generation, we're now seeing more types of storage-heavy or compute-heavy applications.
Intel is moving from being a component player to more of a data center player. Furrier asks Omer whether, and where, he sees an architectural change that impacts flash and solid state given Intel's change of focus. Flash and solid state are still moving forward, states Trajman. Disk is still in there to some extent, but the move is away from separating storage and computing and toward using standardized components to bring them together. By putting the intelligent software with the servers, you'll need a lot less separate storage.
Furrier then mentions the first Hadoop World, where Abhi Mehta coined the term data factories. He asks how that idea has evolved and how people are organizing their data today. "Instead of data flowing freely throughout the enterprise... it's getting centralized on joint storage and compute architectures," Trajman responds. We're seeing data being put in the same place that you compute, and that becomes your data hub, data factory, or data reservoir where you have pristine data.
The interview switches gears as the two discuss the difference between Hadoop and Mongo in data analytics. Mongo is very useful; it's a very important specialized engine. It solves a lot of interesting problems, primarily in the document and application serving space. Mongo typically gets seen on the front end serving an application. "Hadoop's philosophy is that there isn't any single engine that will solve all problems." When you generate a lot of data, you need to compute over that data and get it back into your application, and that part would happen on a Hadoop system.
Furrier then asks for an update on Cloudera's growth since its inception, as well as its work on Apache projects in the open source arena. Trajman was happy to elaborate, stating, "We are growing fast; we're close to around 300 employees, which for a four-year company is pretty wild growth." A lot of the investment has been on the open source side. Over one third of the company is focused solely on building great open source software that people can use. We spend a lot of our time contributing to the Apache open source community. CDH 4 now includes HDFS high availability.
Lastly, Trajman talks about what separates Cloudera from its peers such as MapR, EMC Greenplum, and Hortonworks. In addition, Furrier asks Trajman how relevant HBase, part of the holy trinity of Hadoop, remains despite the criticism regarding its scalability and versatility. It's very critical, replies Trajman. HBase is modeled on what Google created with Bigtable. Today Bigtable powers a lot of Google's applications and infrastructure, and we're seeing HBase power equivalent things outside of Google, for example, Facebook's Messages. HBase touches and impacts a lot more people, in more ways, than they realize. It is great for real-time "atomic" access to data. While it differs from classic relational systems, HBase is a lot more focused on discrete access.
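The Bigtable model Trajman refers to can be pictured as a sorted map from row key to column family to qualifier, with each cell keeping timestamped versions, and reads and writes atomic at the row level. A minimal in-memory sketch of that data model in Python (purely illustrative, all class and method names hypothetical, not HBase's actual client API):

```python
import threading
import time
from collections import defaultdict


class MiniHBaseTable:
    """Toy model of the Bigtable/HBase data model:
    row key -> column family -> qualifier -> [(timestamp, value), ...].
    A single lock stands in for HBase's per-row atomicity guarantee."""

    def __init__(self, families):
        self.families = set(families)
        # rows[row_key][family] is a dict of qualifier -> version list (newest first)
        self.rows = defaultdict(lambda: defaultdict(dict))
        self.lock = threading.Lock()

    def put(self, row_key, family, qualifier, value, ts=None):
        """Write one cell; column families are fixed at table creation, as in HBase."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = ts if ts is not None else time.time()
        with self.lock:  # writes to a row are atomic
            versions = self.rows[row_key][family].setdefault(qualifier, [])
            versions.insert(0, (ts, value))  # keep newest version first

    def get(self, row_key, family, qualifier):
        """Return the newest value for one cell, or None if absent."""
        with self.lock:
            versions = self.rows.get(row_key, {}).get(family, {}).get(qualifier)
            return versions[0][1] if versions else None


# Usage: a Messages-style record keyed by user, newest version winning on read
table = MiniHBaseTable(families=["msg"])
table.put("user42", "msg", "subject", "hello")
table.put("user42", "msg", "subject", "hello again")
print(table.get("user42", "msg", "subject"))  # newest version is returned
```

The sketch only illustrates why this model suits discrete, real-time access: a read or write touches one row directly by key, rather than joining across tables as a relational system would.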