Name: Bryan Duxbury, StreamSets | Spark Summit East 2017
Uploaded: 2017-02-09T19:28:00.000Z
Duration: 19 min 1 s


Bryan Duxbury, Vice President of Engineering at StreamSets, sits down with Dave Vellante & George Gilbert at Spark Summit East 2017 at the Hynes Convention Center in Boston, Massachusetts. #SparkSummit #theCUBE https://siliconangle.com/2017/02/09/data-move-getting-value-numbers-sparksummit/

Data on the move: getting the most value out of the numbers | #SparkSummit

While most organizations understand the inherent value of big data (the more data, the better), there can be issues around managing and moving that data. The true value comes from the analysis of the data, not from the static data itself. Many are leaning on Apache Spark (an open-source cluster computing framework) to reduce data management complexity, according to Bryan Duxbury (pictured), vice president of engineering at StreamSets Inc.

“We’re seeing a lot of interest in the Spark arena. People want to add their complex event processing or their aggregation and analysis, like Spark SQL [Apache Spark’s module for working with structured data],” Duxbury said.

He explained that these customers are looking for continuous workloads and moving away from batch processing. They want analytics to run almost simultaneously with ingest, he said. To help with that, StreamSets is building integration via its Spark processor, making it possible to ingest data and capture real-time analytics along the way.

Duxbury recently joined Dave Vellante (@dvellante) and George Gilbert (@ggilbert41), co-hosts of theCUBE, SiliconANGLE Media’s mobile live streaming studio, during Spark Summit East 2017, held in Boston, MA. (*Disclosure below.) Topics of discussion included how data movement software maximizes the value of data, including the use of Spark, and why Duxbury believes it’s better for organizations to buy solutions than to build them.

Building a data pipeline without code

While many companies build their own internal tools to move their data, turning the work into a science project of sorts, there are better ways to allocate time and resources. “It’s not their job to build a world-class data movement tool; it’s their job to make the data valuable,” said Duxbury.

One of the advantages of StreamSets’ Data Collector software, according to Duxbury, is that it allows users to build a data pipeline without code, through a graphical user interface (GUI). The software is heavy-duty and open source, made to integrate easily with other products, including Apache Kafka (an open-source stream processing platform) and Spark.

StreamSets’ Data Collector can be deployed on-prem, in the cloud or on the edge of clusters. It focuses on the initial movement and ingestion of the data and then lets analytical tools, such as Spark, take over and extract business value from the data. For large-scale deployments, the company offers StreamSets Dataflow Performance Manager as a way to manage dozens or hundreds of Data Collectors, including a live data map of the data flow topologies and enforcement of Data SLAs.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of Spark Summit East 2017. (*Disclosure: TheCUBE is a media partner at the conference. Neither Databricks nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)
