Holden Karau, IBM | BigDataNYC 2016
01. Holden Karau, IBM, Visits #theCUBE!. (00:22)
02. What's New In Your World With Spark. (01:07)
03. What Is A Junky API. (02:10)
04. How Does Spark Address Complexity. (03:51)
05. Is The Open Source Community Finding A Way To Collaborate. (05:47)
06. Is The Open Source Community Working To Find Its Way Into The Tools. (07:34)
07. How Much Of A Data Scientist Do You Have To Be. (08:37)
08. What Is Your Perspective On Machine Learning. (10:01)
09. Is Sampling Dead. (12:17)
10. Are We Going To Start Turning Big Data On The Problem Of Big Data. (13:05)
Track List created with http://www.vinjavideo.com.
--- ---
Collaboration and machine learning with Spark | #BigDataNYC
by Brittany Greaner | Sep 29, 2016
Open-source technology is paving the way to the future of affordable and flexible IT, and the Apache Spark open-source processing engine is no exception. There is even a Spark components page where users can share useful tools and technology.
“Even vendors are sharing,” said Holden Karau, principal software engineer at IBM. This enables much wider collaboration throughout the community, as well as narrower collaboration between friends and colleagues. “If you have a notebook and share it with your friend, you can work together more collaboratively. A lot of companies are building notebook solutions,” added Karau.
Karau was interviewed by Dave Vellante (@dvellante) and Peter Burris (@plburris), hosts of theCUBE, from the SiliconANGLE Media team, during BigDataNYC 2016 in New York, NY.
Spark’s range of complexity
Another standout feature of Spark is its range of complexity. It lets users with little knowledge of Python or Java build what they need without deep coding ability.
At the same time, users who do know how to code can apply that knowledge to build exactly what they want. “I think Spark does a good job of being user-friendly. With Spark it’s much simpler and exposed in ways people are already used to working with their data,” said Karau.
Machine learning
Another area Spark and other platforms are beginning to dive into is machine learning. The traditional method is to down-sample, but that is neither the most efficient nor the most thorough approach. Machine learning over the full dataset allows a broader, more agile way of working.
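The down-sampling Karau contrasts against can be sketched in plain Python. This is a toy illustration of the traditional workaround, not Spark code; the function name and parameters are assumptions for the example:

```python
import random

def down_sample(records, fraction, seed=0):
    """Keep roughly `fraction` of records so analysis fits on one machine.

    This is the traditional workaround: any signal confined to the
    discarded rows is simply lost, which is why training on the full
    dataset with an engine like Spark can be more thorough.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [r for r in records if rng.random() < fraction]

data = list(range(100_000))
sample = down_sample(data, 0.01)
print(len(sample))  # roughly 1,000 of the 100,000 rows survive
```

A rare pattern present in only a handful of the original rows has a good chance of vanishing from a 1 percent sample, which is the fidelity cost the article alludes to.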
“When you move people to a laptop, you can train an algorithm to recommend datasets to people,” said Karau. “The combination of notebooks and Spark means data scientists can directly apply data during the exploration phase.” It speeds up the process by eliminating the need to consult coworkers for their data sets or do manual searches. And that is very powerful, with strong implications for the future, Karau concluded.
#BigDataNYC
#theCUBE