Holden Karau, IBM - #BigDataSV 2016 - #theCUBE
01. Holden Karau, IBM, Visits #theCUBE! (00:21)
02. Give Us An Update On Spark (00:43)
03. Do The Hardcore Spark Developers Have To Main Stream It (01:48)
04. There's A Lot Of Integration What Are Your Thoughts On That (03:22)
05. Is Spark A Comparable Investment To Lynx (04:32)
06. Give Me An Example Of The Magnitude Of Spark (06:11)
07. Can You Give Us Examples Of Products That Are Moving To Spark (07:24)
08. Who Is Policing The Algorithms (08:26)
09. Where Are We In Machine Learning Put On The Process Of The Design And Run Time (11:03)
10. Do We See Big Packet Apps Emerging For This Class Of Apps (15:32)
11. What Is Your Take On The Status Of Machine Learning (17:32)
12. Do You Have Another Book On The Horizon (19:24)

Track List created with http://www.vinjavideo.com.

Machine learning on machine learning software: It’s closer than you think | #BigDataSV
by Amber Johnson | Mar 31, 2016

As the tech world pivots on game-changing applications, data scientists rise to the occasion. Such is the case with Holden Karau, principal software engineer of Big Data at IBM and coauthor of Learning Spark.

When asked about the current renovations within Spark, Karau said she sees this time as an “opportunity to get rid of dead weight” by streamlining certain processes. For example, she cited getting functional and relational queries to talk to each other within Spark. Two areas of expansion include sequencing and machine learning.

Karau noted another “massive expansion” in getting other applications to run on top of Spark. She made the remarks in an interview with John Furrier (@furrier) and George Gilbert (@ggilbert41), cohosts of theCUBE from the SiliconANGLE Media team, at the BigDataSV 2016 event in San Jose, California, where theCUBE is celebrating #BigDataWeek, including news and events from the #StrataHadoop conference.

The three self-described tech geeks discussed the advances in Spark since the bandwagon effect kicked in.
Karau predicted that machine learning on machine learning software will arrive sooner than Gilbert’s conservative five-year estimate. While she didn’t give a specific time frame, Karau stated emphatically that it is “closer than five years.”

How data science is changing software dynamics

Karau conferred with Furrier and Gilbert about several aspects of data science and how it is changing software dynamics. One side project in particular stood out: Karau is working on a Spark validator that will help with “policing quality” of the algorithms within pipeline models. Pipeline models make it challenging to work at large scale while still being able to work with Big Data interactively.

When asked about getting data science to work on data science, Karau said the tech was “there-ish.”

In addition, Karau is working with her coauthor, Rachel Warren, on a new book called High Performance Spark. Karau spoke eloquently and candidly about the frustrations of working with Spark pipelines, saying, “How do I save this damn thing?” However, when it comes to Spark, Karau literally wrote the book.

@theCUBE #BigDataSV #StrataHadoop