Bryan Duxbury, Vice President of Engineering at StreamSets, sits down with Dave Vellante & George Gilbert at Spark Summit East 2017 at the Hynes Convention Center in Boston, Massachusetts.
#SparkSummit #theCUBE
https://siliconangle.com/2017/02/09/data-move-getting-value-numbers-sparksummit/
Data on the move: getting the most value out of the numbers | #SparkSummit
While most organizations understand the inherent value of big data — the more data, the better — there can be issues around managing and moving that data. The true value comes from the analysis of the data, not from static data itself. Many are leaning on Apache Spark (an open-source cluster computing framework) to reduce data management complexity, according to Bryan Duxbury (pictured), vice president of engineering at StreamSets Inc.
“We’re seeing a lot of interest in the Spark arena. People want to add their complex event processing or their aggregation and analysis, like Spark SQL [Apache Spark’s module for working with structured data],” Duxbury said.
He explained that these customers are looking for continuous workloads and moving away from batch. They want analytics to run almost at the moment of ingest, he said. To help with that, StreamSets is building integration via its Spark processor, making it possible to ingest data and capture real-time analytics along the way.
Duxbury recently joined Dave Vellante (@dvellante) and George Gilbert (@ggilbert41), co-hosts of theCUBE, SiliconANGLE Media’s mobile live streaming studio, during Spark Summit East 2017, held in Boston, MA. (*Disclosure below.)
Topics of discussion included how data movement software maximizes the value of data, including the use of Spark, and why Duxbury believes it’s better for organizations to buy solutions than to build them.
Building a data pipeline without code
While many companies build their own internal tools to move their data, and make it a science project of sorts, there are better ways to allocate time and resources. “It’s not their job to build a world-class data movement tool; it’s their job to make the data valuable,” said Duxbury.
One of the advantages of StreamSets’ Data Collector software, according to Duxbury, is that it allows users to build a data pipeline without code, through a graphical user interface (GUI). The software is heavy-duty and open source, made to integrate easily with other products, including Apache Kafka (an open-source stream processing platform) and Spark.
StreamSets’ Data Collector can be deployed on-premises, in the cloud or on the edge of clusters. It focuses on the initial movement and ingestion of the data and then lets analytical tools, such as Spark, take over and extract the business value from the data. For large-scale deployments, the company offers StreamSets Dataflow Performance Manager as a way to manage dozens or hundreds of Data Collectors, including a live data map of the data flow topologies and enforcement of Data SLAs.
Watch the complete video interview below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of Spark Summit East 2017. (*Disclosure: TheCUBE is a media partner at the conference. Neither Databricks nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)