01. Michael Armbrust, Databricks, visits #theCUBE! (00:17)
02. Databricks' Demo on Data Analytics with Spark (01:30)
03. Structured API and Structured Streaming (02:15)
04. The Shift to Streaming and Continuous Apps (03:25)
05. The Vision for Computational Documents (05:41)
06. Communicating the Value of Streaming to Clients (06:48)
07. Areas of Focus: Structured Streaming (08:12)
08. How Structured Streaming Will Change Applications (10:12)
09. Use Case Examples with Streaming API (13:07)
10. Factoring Open Source into Future Planning (15:02)
11. The Mixing of Data: Bringing It All Together (17:42)
Track List created with http://www.vinjavideo.com.
--- ---
Beyond batch with Spark 2.0: The new continuous data application | #SparkSummit
by R. Danes | Jun 8, 2016
Building the perfect data application is tricky business. Long hours go into deciding what data to use, wrangling and aggregating it, and writing code. Then new, perhaps contradictory, data arrives and upsets the model at its foundation. Because data keeps changing, the applications built on it need to be just as adaptable.
Michael Armbrust, software engineer and lead developer of the Spark SQL project at Databricks, Inc., said this very problem led to the development of Spark 2.0. Speaking with John Walls and George Gilbert (@ggilbert41), cohosts of theCUBE from the SiliconANGLE Media team, during Spark Summit 2016, he described a common problem he’d run into with customers.
“As soon as they get it working in batch mode, you immediately have the question, ‘Wait, but new data arrived. What’s the answer now?’ And typically, this was starting from scratch,” he said.
Armbrust said batch mode should be treated as a “sandbox” where you experiment and figure out what kind of application you need. Then, using the exact same code, you can make that application streaming and continuous with Spark’s new tools. “The Spark optimizer, this thing we call Catalyst, should be able to figure out how to do that incrementalization,” he said. In other words, the optimizer, rather than the developer, should work out how to update a query’s result as new data arrives instead of recomputing it from scratch.
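The idea behind incrementalization can be illustrated without Spark at all. The sketch below is plain Python, not Spark’s API and not how Catalyst actually works. It assumes a hypothetical page-view counting query (the names batch_page_counts and IncrementalPageCounts and the sample events are invented for illustration) and shows that folding each newly arrived micro-batch into running state gives the same answer as rerunning the batch query over all data seen so far. That equivalence is the kind of transformation Armbrust says Catalyst should be able to derive from the batch query on its own.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical record type: (user, page) page-view events. Illustrative only.
Event = Tuple[str, str]


def batch_page_counts(events: Iterable[Event]) -> Counter:
    """Batch query: count views per page by scanning every event seen so far."""
    return Counter(page for _user, page in events)


class IncrementalPageCounts:
    """Incremental form of the same query: keep running counts as state and
    fold in only the newly arrived events instead of rescanning all history."""

    def __init__(self) -> None:
        self._state: Counter = Counter()

    def update(self, new_events: Iterable[Event]) -> Counter:
        # Same per-record logic as the batch query, applied only to the delta.
        self._state.update(page for _user, page in new_events)
        # Return a copy so callers cannot mutate the internal state.
        return Counter(self._state)


if __name__ == "__main__":
    micro_batches: List[List[Event]] = [
        [("ann", "home"), ("bob", "docs")],
        [("ann", "docs"), ("cat", "home")],
        [("bob", "home")],
    ]
    history: List[Event] = []
    running = IncrementalPageCounts()
    for batch in micro_batches:
        history.extend(batch)
        # "Wait, but new data arrived. What's the answer now?"
        from_scratch = batch_page_counts(history)  # recompute over all data
        incremental = running.update(batch)        # touch only the new data
        assert from_scratch == incremental
    print(dict(incremental))  # prints {'home': 3, 'docs': 2}
```

In Spark itself, the analogous switch from batch to streaming is reading the input with spark.readStream instead of spark.read and starting the query with writeStream, leaving the query logic unchanged. Counting updates cleanly because partial counts can simply be added together; not every query decomposes this easily.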
The open-source win-win
Armbrust spoke enthusiastically about Databricks’ Community Edition, a new, free, cloud-based version of the company’s big data platform built on open-source Spark. “Anybody can use this for free. You sign up. You get six-gigabyte clusters. All you need is an email address,” he said.
Open source has always been a core value for Spark and Databricks, he said. Opening the software to the community lets users give back by saying, “Hey, you’re missing this optimization,” and then contributing it. “That is the power of open source. I think that alone is going to give us a velocity that’s hard to match in closed-source software,” he said.
#SparkSummit
#theCUBE