Name: Fernando Perez. Lawrence Berkeley Laboratory | IBM Spark Summit 2015
Uploaded: 2015-06-16T02:51:00.000Z
Duration: 22 min 42 s


01. Fernando Perez, Lawrence Berkeley Laboratory, Visits theCUBE. (00:21)
02. Perez's Background and Role at Berkeley. (00:59)
03. How the Tools for Data Scientists Have Evolved. (03:46)
04. The Spirit of Collaboration in Open Source. (10:35)
05. Using the Proper Tools for Extracting Data. (14:57)

#theCUBE #SparkSummit #Spark #IBM #SiliconANGLE

---

One scientist’s perspective on Spark | #SparkInsight
by Amber Johnson | Jun 19, 2015

Fernando Perez is a scientist at Lawrence Berkeley National Laboratory and a founding investigator of the Berkeley Institute for Data Science. Perez, a particle physicist by training, also worked on the IPython project that led to the Jupyter Project, which is used within the Spark ecosystem.

“The Jupyter environment is precisely about building an environment where you can build code and narrative together,” Perez told Jeff Frick and George Gilbert of theCUBE at IBM Spark Summit 2015. Spark can be used from Jupyter to bring code, data and narrative together in a live environment.

“Now in the last few years, the folks at the AMPLab have built PySpark, which is the Python layer on top, [which] allows you to call Spark with a Python API … and then once you have run all your large-scale analytics in Spark, then you can import all of these Python libraries that these physical scientists have been writing for the last 10, 15 years … and use those … with the interactive facilities we have been building,” Perez said.

Contributions led to the current Spark program innovations

When asked how he started working with Python, Perez replied, “I realized that I was probably spending more time switching between coding languages rather than doing any work.” He learned about Python while in graduate school. “We were all able to interact very quickly” with data, he said, and according to Perez, in the early 2000s multiple laboratories and institutions began contributing to Python. This trend of contributions led to the current innovation around Spark.

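The workflow Perez describes in the PySpark quote (run the large-scale step in Spark through its Python API, then hand the result to ordinary Python libraries) might look roughly like the sketch below. It assumes a local Spark installation with the `pyspark` package available. The sample readings, function names and application name are hypothetical and do not come from the interview, and the standard-library `statistics` module stands in here for the scientific Python libraries (NumPy, SciPy and others) that would usually take over once the data is back in Python.

```python
from statistics import mean, stdev


def aggregate_with_spark(readings):
    """Sum (sensor, value) pairs per sensor in Spark; return a plain dict."""
    # Lazy import: only this step needs a Spark installation.
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "pyspark-sketch")  # hypothetical app name
    try:
        rdd = sc.parallelize(readings)
        # reduceByKey combines values per key across partitions;
        # collectAsMap returns the (small) aggregated result to the driver.
        return rdd.reduceByKey(lambda a, b: a + b).collectAsMap()
    finally:
        sc.stop()


def analyze_locally(totals):
    """Summarize per-sensor totals with ordinary Python tooling.

    Needs at least two sensors, because stdev is undefined for one value.
    """
    values = list(totals.values())
    return {"mean": mean(values), "stdev": stdev(values)}


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    readings = [("a", 1.0), ("b", 2.5), ("a", 3.0), ("b", 0.5)]
    totals = aggregate_with_spark(readings)
    print(analyze_locally(totals))
```

Keeping the Spark step and the local analysis in separate functions mirrors the handoff in the quote: only the aggregated result, which is assumed to be small, is brought back to the Python process.
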
Around 2002, Perez “realized there was a value to seeing these things as open source projects, and many of us realized that we should actually work and try to get these things funded.” Today, many government organizations are funding such academic projects. Perez commented that DARPA partially funded Spark.

“I think what Spark has brought to the game is an additional layer of enterprise-level analytics,” he said. “It’s not so much for everyday numerical computing workloads that many people in the physical sciences were using … Spark made a real killing in that space.”

Watch the full interview below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of IBM Spark Summit 2015.

@theCUBE #SparkInsight
