Matei Zaharia, Databricks | Spark Summit East 2016
01. Matei Zaharia, Databricks, visits #theCUBE!. (00:19) 02. The Difference Betwen Streaming and Real-Time. (01:11) 03. Practical Applications of the Spark Engine Operating in Batches. (03:10) 04. Spark Response in IOT Situations. (05:02) 05. The Importance of Application in Evaluating Real-Time Situations. (06:05) 06. Spark SQL Optimizer Use Cases. (07:36) 07. Scale: Traditional Databases vs. Spark. (09:41) 08. The Addition of Dataframes. (12:03) 09. Spark Possibilties in Machine Learning. (14:03) 10. How Fast is Spark Growing?. (16:40) 11. Engaging a New Class of Users with a Spark-Powered Dashboard. (18:04) Track List created with http://www.vinjavideo.com. https://siliconangle.com/2016/02/17/spark-solutions-making-life-easier-for-data-scientists-sparksummiteast/  --- --- Spark solutions: Making life easier for data scientists | #SparkSummit by Betsy Amy-Vogt | Feb 17, 2016 “We are trying to make it easy to plug these models into the same interface …” “We have focused on making it easy for people to contribute to Spark …” “Just compute the thing you need and get out of there …” There’s a common theme to theCUBE’s conversation with Matei Zaharia, CTO of Databricks, Inc. and creator of Apache Spark: making life easier for data scientists. But rather than adding to the pool of existing solutions, Zaharia is focused on more complex processing to solve problems that do not have existing solutions. In an interview at Spark Summit East 2016 at the New York Hilton Midtown in NYC, Zaharia talked with Jeff Frick and George Gilbert, cohosts of theCUBE , from the SiliconANGLE Media team. Realtime vs. streaming “Everyone wants real-time,” stated Gilbert, as he invited Zaharia to clarify the confusion between real-time and streaming. Zaharia described the differences in detail, before the topic moved specifically to Spark streaming and the internal operations of the engine, which is designed for high-volume batch operations. “We design our roadmap and engine based on what people want,” Zaharia said, and most use cases do not require both huge volume and true real-time. The latency of Spark – a few hundreds of milliseconds – is considered real time for most operations. Geeking out with the rockstar of Spark Summit In this in depth discussion, Zaharia – the rockstar of Spark Summit East – shared his knowledge and opinions on Spark SQL [Spark’s module for working with structured data], Catalyst [a query optimization framework for Spark], DataFrames [a distributed collection of data organized into named columns], and machine learning systems, as well as the exponential growth of the Spark community, new solutions from Databricks, and the future of Big Data. @Databricks @SiliconANGLE theCUBE #theCUBE @theCUBE #SparkSummit - #SparkSummit East 2016 - #theCUBE