Name: Michael Armbrust, Databricks | Spark Summit 2016
Uploaded: 2016-06-08T18:52:00.000Z
Duration: 19 min 4 s

01. Michael Armbrust, Databricks, visits #theCUBE! (00:17)
02. Databricks' Demo on Data Analytics with Spark. (01:30)
03. Structured API and Structured Streaming. (02:15)
04. The Shift to Streaming and Continuous Apps. (03:25)
05. The Vision for Computational Documents. (05:41)
06. Communicating the Value of Streaming to Clients. (06:48)
07. Areas of Focus: Structured Streaming. (08:12)
08. How Structured Streaming Will Change Applications. (10:12)
09. Use Case Examples with Streaming API. (13:07)
10. Factoring Open Source into Future Planning. (15:02)
11. The Mixing of Data; Bringing it All Together. (17:42)

Beyond batch with Spark 2.0: The new continuous data application | #SparkSummit
by R. Danes | Jun 8, 2016

Building the perfect data application is tricky business. Long hours are spent figuring out what data to use, wrangling and aggregating, and writing code. Then new, perhaps contradictory, data arrives and upsets the model at its foundation. The fluctuating nature of data requires applications that are similarly changeable.

Michael Armbrust, software engineer and lead developer of the Spark SQL project at Databricks, Inc., said this very problem led to the development of Spark 2.0. During Spark Summit 2016, he told John Walls and George Gilbert (@ggilbert41), cohosts of theCUBE, from the SiliconANGLE Media team, about a common problem he’d run into with customers.

“As soon as they get it working in batch mode, you immediately have the question, ‘Wait, but new data arrived. What’s the answer now?’ And typically, this was starting from scratch,” he said.

Armbrust said that batch should be looked at as a “sandbox” where you experiment and figure out what type of application you need. Then, using the exact same code, you make that application streaming and continuous with Spark’s new tools.
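
To make the batch-versus-continuous idea concrete, here is a minimal sketch in plain Python. It is not Spark code and does not use any Spark API. The record shape, the function `batch_totals`, and the class `IncrementalTotals` are all hypothetical names chosen for illustration. It only shows the general principle: the same aggregation logic can be applied to newly arrived data and merged into running state, rather than recomputing everything from scratch.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Event = Tuple[str, int]  # (key, value); a hypothetical record shape


def batch_totals(events: Iterable[Event]) -> Counter:
    """Batch mode: compute per-key totals over whatever data is passed in."""
    totals: Counter = Counter()
    for key, value in events:
        totals[key] += value
    return totals


class IncrementalTotals:
    """Continuous mode: keep running state and fold in only the new events."""

    def __init__(self) -> None:
        self.state: Counter = Counter()

    def update(self, new_events: Iterable[Event]) -> Counter:
        # Reuse the batch logic on just the new slice, then add it to the state.
        self.state.update(batch_totals(new_events))
        return self.state


# Hypothetical data arriving in two chunks.
first: List[Event] = [("clicks", 3), ("views", 10)]
second: List[Event] = [("clicks", 2), ("views", 5), ("signups", 1)]

running = IncrementalTotals()
running.update(first)
incremental_answer = running.update(second)

# Starting from scratch over all the data gives the same answer.
from_scratch_answer = batch_totals(first + second)
print(incremental_answer == from_scratch_answer)  # True
```

In this sketch the incremental step is written by hand. The point Armbrust makes is that in Spark 2.0 the user should not have to write it: the same query is meant to run in either mode.
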
“The Spark optimizer, this thing we call Catalyst, should be able to figure out how to do that incrementalization,” he said.

The open-source win-win

Armbrust spoke enthusiastically about Databricks’ Community Edition, a new, free, cloud-based, open-source big data platform.

“Anybody can use this for free. You sign up. You get six-gigabyte clusters. All you need is an email address,” he said.

He stated that open source has always been a core value for Spark and Databricks. He said that opening their software to the community allows users to give back by saying, “Hey, you’re missing this optimization,” and adding it.

“That is the power of open source. I think that alone is going to give us a velocity that’s hard to match in closed-source software,” he said.

#SparkSummit #theCUBE
