Holden Karau, Principal Software Engineer, IBM, talks with theCUBE's Jeff Frick & George Gilbert at Big Data SV 2017 at the historic Pagoda Lounge in San Jose, Ca.
Spark ML: getting closer to the edge to improve latency
https://siliconangle.com/2017/03/15/spark-ml-getting-closer-edge-improve-latency-bigdatasv/
Apache Spark, the open-source analytics engine, is going mainstream in the data-driven enterprise. As major industries turn to Internet of Things markets and machine learning technologies to capitalize on data, Spark ML (which provides a uniform set of high-level application programming interfaces that help users create and tune practical machine learning pipelines) offers companies the ability to build real-time streaming solutions that deliver fast, advanced analytics and business insights.
“We are going to be focused on how to use structured streaming for machine learning. I think that is really interesting, because stream learning is something that people want to do but aren’t yet doing in production. So it’s always fun to talk to people before they’ve built their systems,” said Holden Karau (pictured), principal software engineer at IBM Corp.
Karau, who is a “Spark Committer” and noted authority on the platform, met with Jeff Frick (@JeffFrick) and George Gilbert (@ggilbert41), co-hosts of theCUBE, SiliconANGLE Media’s mobile live streaming studio, during the BigData SV event in San Jose, CA. (*Disclosure below.)
Machine learning: What is happening at the edge?
IoT and machine learning are commanding the technology industry’s attention, and Apache Spark’s Structured Streaming is making an impact in both areas. Karau noted, however, that certain aspects of Spark are not meant to be pushed out to the edge.
“Structured streaming for today, latency wise, is probably not something I would use [for IoT and real-time streaming]. It’s in the sub-second range, which is nice, but it’s not what you want for live surveying of decisions — like for your car. It’s just not going to be feasible,” Karau said.
She maintained that Spark has the potential to become faster and spoke about renewed interest in MLlib local, a part of Spark’s machine learning library intended to take models trained in Spark, push them out to the edge and apply them on edge devices.
“I think for these IoT devices, it makes a lot more sense to do the predictions on the device itself,” Karau said.
Karau explained that the models are only megabytes in size and do not need a cluster to make predictions, so training the models on the cluster and pushing prediction out to the edge node is a reasonable use case. Rather than using Spark to push the model, she recommends trying other tools.
“Spark is not very well suited to large amounts of internet traffic, but it is well-suited to the training. With MLlib local, it will be able to provide both sides, and the copy part is left to whoever is doing the work,” Karau advised.
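To make the split concrete, here is a minimal sketch of the prediction side, assuming the cluster has trained a logistic regression model and its parameters have been copied to the device as JSON. It does not use Spark or MLlib local. The payload format, the parameter values and the function names are all hypothetical and chosen only for illustration.

```python
import json
import math

# Hypothetical payload: in practice the training job on the cluster would
# write this out, and some other tool would copy it to the device.
EXPORTED_MODEL = json.dumps({
    "intercept": -1.0,
    "coefficients": [2.0, -0.5],
})


def load_model(payload):
    """Parse the exported parameters of a logistic regression model."""
    params = json.loads(payload)
    return params["intercept"], params["coefficients"]


def predict_probability(model, features):
    """Score one feature vector on the device; no cluster is involved."""
    intercept, coefficients = model
    if len(features) != len(coefficients):
        raise ValueError("feature vector length does not match the model")
    # Linear margin followed by the logistic (sigmoid) function.
    margin = intercept + sum(w * x for w, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-margin))


model = load_model(EXPORTED_MODEL)
probability = predict_probability(model, [1.0, 2.0])
```

Because the model is just a handful of numbers, scoring is a few multiplications and one exponential, which is why a device can do it without a round trip to the cluster.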
The reason for moving the models to the edge is to improve latency. The question that many people are asking is: Will there be a different programming model at the edge?
“I don’t think the answer is finished yet, but I think the work is being done to make it look the same. … Spark has done a really good job of making things look very similar on single node cases to multi-node cases, and I think we can bring the same things to machine learning,” she said.
At IBM, open-source work on Spark is underway to simplify and improve how programming languages interoperate with the platform. Karau pointed out that Java is easy to use with Spark, but the aim of the project is to make the experience more comfortable and so increase adoption.
Karau predicted that the tools of the future will resemble the tools we have today, but with more options and a simpler experience.
“The main thing that we are lacking right now is good documentation — and of good books and good resources for people to figure out how to use these tools,” she said.
(*Disclosure: Some segments on SiliconANGLE Media’s theCUBE are sponsored. Sponsors have no editorial control over content on theCUBE or SiliconANGLE.)
#BigDataSV @IBM #IBM @SiliconANGLE theCUBE @theCUBE #theCUBE #BigData