Holden Karau, IBM | BigDataNYC 2016
01. Holden Karau, IBM, Visits #theCUBE!. (00:22)
02. What's New In Your World With Spark. (01:07)
03. What Is A Junky API. (02:10)
04. How Does Spark Address Complexity. (03:51)
05. Is The Open Source Community Finding A Way To Collaborate. (05:47)
06. Is The Open Source Community Working To Find Its Way Into The Tools. (07:34)
07. How Much Of A Data Scientist Do You Have To Be. (08:37)
08. What Is Your Perspective On Machine Learning. (10:01)
09. Is Sampling Dead. (12:17)
10. Are We Going To Start Turning Big Data On The Problem Of Big Data. (13:05)
Track List created with http://www.vinjavideo.com.
--- ---
Collaboration and machine learning with Spark | #BigDataNYC
by Brittany Greaner | Sep 29, 2016
Open-source technology is paving the way to the future of affordable and flexible IT, and the Apache Spark open-source processing engine is no exception. There is even a Spark components page where users can share useful tools and technology.
“Even vendors are sharing,” said Holden Karau, principal software engineer at IBM. This enables much wider collaboration throughout the community, as well as narrower collaboration between friends and colleagues. “If you have a notebook and share it with your friend, you can work together more collaboratively. A lot of companies are building notebook solutions,” added Karau.
Karau was interviewed by Dave Vellante (@dvellante) and Peter Burris (@plburris), hosts of theCUBE, from the SiliconANGLE Media team, during BigDataNYC 2016 in New York, NY.
Spark’s range of complexity
Another standout feature of Spark is its range of complexity. It lets users with little knowledge of Python or Java build what they need without deep coding ability.
At the same time, users who do know how to code can apply that knowledge to build exactly what they want. “I think Spark does a good job of being user-friendly. With Spark it’s much simpler and exposed in ways people are already used to working with their data,” said Karau.
Machine learning
Another area Spark and other platforms are beginning to dive into is machine learning. The traditional method is to down-sample, but that is neither the most efficient nor the most thorough approach. Machine learning over the full dataset allows a broader, more agile way of working.
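The down-sampling Karau contrasts against can be sketched in plain Python. This is a toy illustration of the traditional workaround, not Spark code; the function name and parameters are assumptions for the example:

```python
import random

def down_sample(records, fraction, seed=0):
    """Keep roughly `fraction` of records so analysis fits on one machine.

    This is the traditional workaround: any signal confined to the
    discarded rows is simply lost, which is why training on the full
    dataset with an engine like Spark can be more thorough.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [r for r in records if rng.random() < fraction]

data = list(range(100_000))
sample = down_sample(data, 0.01)
print(len(sample))  # roughly 1,000 of the 100,000 rows survive
```

A rare pattern present in only a handful of the original rows has a good chance of vanishing from a 1 percent sample, which is the fidelity cost the article alludes to.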
“When you move people to a laptop, you can train an algorithm to recommend datasets to people,” said Karau. “The combination of notebooks and Spark means data scientists can directly apply data during the exploration phase.” It speeds up the process by eliminating the need to consult coworkers for their data sets or do manual searches. And that is very powerful, with strong implications for the future, Karau concluded.
#BigDataNYC
#theCUBE