01. Matei Zaharia, Databricks, visits #theCUBE!. (00:19)
02. The Difference Betwen Streaming and Real-Time. (01:11)
03. Practical Applications of the Spark Engine Operating in Batches. (03:10)
04. Spark Response in IOT Situations. (05:02)
05. The Importance of Application in Evaluating Real-Time Situations. (06:05)
06. Spark SQL Optimizer Use Cases. (07:36)
07. Scale: Traditional Databases vs. Spark. (09:41)
08. The Addition of Dataframes. (12:03)
09. Spark Possibilties in Machine Learning. (14:03)
10. How Fast is Spark Growing?. (16:40)
11. Engaging a New Class of Users with a Spark-Powered Dashboard. (18:04)
Track List created with http://www.vinjavideo.com.
https://siliconangle.com/2016/02/17/spark-solutions-making-life-easier-for-data-scientists-sparksummiteast/
--- ---
Spark solutions: Making life easier for data scientists | #SparkSummit
by Betsy Amy-Vogt | Feb 17, 2016
“We are trying to make it easy to plug these models into the same interface …”
“We have focused on making it easy for people to contribute to Spark …”
“Just compute the thing you need and get out of there …”
There’s a common theme to theCUBE’s conversation with Matei Zaharia, CTO of Databricks, Inc. and creator of Apache Spark: making life easier for data scientists. But rather than adding to the pool of existing solutions, Zaharia is focused on more complex processing to solve problems that do not have existing solutions.
In an interview at Spark Summit East 2016 at the New York Hilton Midtown in NYC, Zaharia talked with Jeff Frick and George Gilbert, cohosts of theCUBE , from the SiliconANGLE Media team.
Realtime vs. streaming
“Everyone wants real-time,” stated Gilbert, as he invited Zaharia to clarify the confusion between real-time and streaming. Zaharia described the differences in detail, before the topic moved specifically to Spark streaming and the internal operations of the engine, which is designed for high-volume batch operations.
“We design our roadmap and engine based on what people want,” Zaharia said, and most use cases do not require both huge volume and true real-time. The latency of Spark – a few hundreds of milliseconds – is considered real time for most operations.
Geeking out with the rockstar of Spark Summit
In this in depth discussion, Zaharia – the rockstar of Spark Summit East – shared his knowledge and opinions on Spark SQL [Spark’s module for working with structured data], Catalyst [a query optimization framework for Spark], DataFrames [a distributed collection of data organized into named columns], and machine learning systems, as well as the exponential growth of the Spark community, new solutions from Databricks, and the future of Big Data.
@Databricks @SiliconANGLE theCUBE #theCUBE @theCUBE
#SparkSummit - #SparkSummit East 2016 - #theCUBE
Forgot Password
Almost there!
We just sent you a verification email. Please verify your account to gain access to
Spark Summit East 2016 | New York. If you don’t think you received an email check your
spam folder.
In order to sign in, enter the email address you used to registered for the event. Once completed, you will receive an email with a verification link. Open this link to automatically sign into the site.
Register For Spark Summit East 2016 | New York
Please fill out the information below. You will recieve an email with a verification link confirming your registration. Click the link to automatically sign into the site.
You’re almost there!
We just sent you a verification email. Please click the verification button in the email. Once your email address is verified, you will have full access to all event content for Spark Summit East 2016 | New York.
I want my badge and interests to be visible to all attendees.
Checking this box will display your presense on the attendees list, view your profile and allow other attendees to contact you via 1-1 chat. Read the Privacy Policy. At any time, you can choose to disable this preference.
Select your Interests!
add
Upload your photo
Uploading..
OR
Connect via Twitter
Connect via Linkedin
EDIT PASSWORD
Share
Forgot Password
Almost there!
We just sent you a verification email. Please verify your account to gain access to
Spark Summit East 2016 | New York. If you don’t think you received an email check your
spam folder.
In order to sign in, enter the email address you used to registered for the event. Once completed, you will receive an email with a verification link. Open this link to automatically sign into the site.
Sign in to gain access to Spark Summit East 2016 | New York
Please sign in with LinkedIn to continue to Spark Summit East 2016 | New York. Signing in with LinkedIn ensures a professional environment.
Are you sure you want to remove access rights for this user?
Details
Manage Access
email address
Community Invitation
Matei Zaharia, Databricks | Spark Summit East 2016
01. Matei Zaharia, Databricks, visits #theCUBE!. (00:19)
02. The Difference Betwen Streaming and Real-Time. (01:11)
03. Practical Applications of the Spark Engine Operating in Batches. (03:10)
04. Spark Response in IOT Situations. (05:02)
05. The Importance of Application in Evaluating Real-Time Situations. (06:05)
06. Spark SQL Optimizer Use Cases. (07:36)
07. Scale: Traditional Databases vs. Spark. (09:41)
08. The Addition of Dataframes. (12:03)
09. Spark Possibilties in Machine Learning. (14:03)
10. How Fast is Spark Growing?. (16:40)
11. Engaging a New Class of Users with a Spark-Powered Dashboard. (18:04)
Track List created with http://www.vinjavideo.com.
https://siliconangle.com/2016/02/17/spark-solutions-making-life-easier-for-data-scientists-sparksummiteast/
--- ---
Spark solutions: Making life easier for data scientists | #SparkSummit
by Betsy Amy-Vogt | Feb 17, 2016
“We are trying to make it easy to plug these models into the same interface …”
“We have focused on making it easy for people to contribute to Spark …”
“Just compute the thing you need and get out of there …”
There’s a common theme to theCUBE’s conversation with Matei Zaharia, CTO of Databricks, Inc. and creator of Apache Spark: making life easier for data scientists. But rather than adding to the pool of existing solutions, Zaharia is focused on more complex processing to solve problems that do not have existing solutions.
In an interview at Spark Summit East 2016 at the New York Hilton Midtown in NYC, Zaharia talked with Jeff Frick and George Gilbert, cohosts of theCUBE , from the SiliconANGLE Media team.
Realtime vs. streaming
“Everyone wants real-time,” stated Gilbert, as he invited Zaharia to clarify the confusion between real-time and streaming. Zaharia described the differences in detail, before the topic moved specifically to Spark streaming and the internal operations of the engine, which is designed for high-volume batch operations.
“We design our roadmap and engine based on what people want,” Zaharia said, and most use cases do not require both huge volume and true real-time. The latency of Spark – a few hundreds of milliseconds – is considered real time for most operations.
Geeking out with the rockstar of Spark Summit
In this in depth discussion, Zaharia – the rockstar of Spark Summit East – shared his knowledge and opinions on Spark SQL [Spark’s module for working with structured data], Catalyst [a query optimization framework for Spark], DataFrames [a distributed collection of data organized into named columns], and machine learning systems, as well as the exponential growth of the Spark community, new solutions from Databricks, and the future of Big Data.
@Databricks @SiliconANGLE theCUBE #theCUBE @theCUBE
#SparkSummit - #SparkSummit East 2016 - #theCUBE