01. Chuck Yarbrough, Pentaho, Visits #theCUBE! (00:19)
02. Talk About The Changes Since The Merger With Hitachi. (00:40)
03. What Is The Next Chapter For Pentaho? (02:01)
04. What Does The Solution Look Like For Pipeline Analytics? (03:33)
05. Explain What "Fill In The Data Lake" Means. (05:17)
06. What About New Data Types That Don't Have A Process? (09:21)
07. What About Getting Data Out? (10:47)
08. Is Data Cleanliness A Big Part Of This? (11:30)
09. Is The Lake Size An Issue? (16:22)
10. Explain What You're Working On At Pentaho. (17:24)
Track List created with http://www.vinjavideo.com.
--- ---
Don’t let your data lake turn into a data swamp | #HS16SJ
by Nelson Williams | Jun 28, 2016
Data does not move easily. This truth has plagued the world of Big Data for some time and will continue to do so. In the end, the laws of physics dictate a speed limit, no matter what else is done. However, somewhere between data at rest and the speed of light, there are many processes that must be performed to make data mobile and useful. Integrating data and managing a data pipeline are two of these necessary tasks.
To shed some light on the world of data preparation, John Furrier (@furrier) and George Gilbert (@ggilbert41), cohosts of theCUBE, from the SiliconANGLE Media team, visited the Hadoop Summit US 2016 event in San Jose, California. There, they sat down with Chuck Yarbrough, senior director of Solutions Marketing and Management at Pentaho (A Hitachi Group Company).
Managing the data pipeline
The discussion started with a look at Pentaho and what the company does. Yarbrough walked the hosts through its history, saying that early on the founders considered what data analytics was all about and what it would become. Their idea was to do data integration, and do it right, to prepare data for the analytic process; their vision was to manage the entire data pipeline for an analytic purpose.
Yarbrough then explained the solution, stating that Pentaho enables large-scale, complex use cases that require the entire pipeline. The data involved can be highly varied, coming in from many different sources. Blending and processing that varied data on the fly is the key, and that is where Pentaho delivers value.
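To make that blending step concrete, here is a minimal Python sketch of combining two differently shaped feeds into one schema before analysis. It is a toy analogue, not Pentaho's implementation; the field names and sample records are hypothetical.

```python
# A minimal sketch of blending two differently shaped sources
# into one schema before analysis. Field names are hypothetical.
import io
import json
import pandas as pd

# Source 1: a relational-style CSV export.
csv_feed = io.StringIO("customer_id,amount,ts\n42,19.99,2016-06-20\n7,5.00,2016-06-21\n")
orders = pd.read_csv(csv_feed, parse_dates=["ts"])

# Source 2: semi-structured JSON events with different field names.
events = pd.DataFrame(json.loads(
    '[{"cust": 42, "value": 3.50, "when": "2016-06-22"},'
    ' {"cust": 9, "value": 12.00, "when": "2016-06-23"}]'
))

# Normalize the JSON feed to the CSV feed's schema, then blend.
events = events.rename(columns={"cust": "customer_id", "value": "amount", "when": "ts"})
events["ts"] = pd.to_datetime(events["ts"])
blended = pd.concat([orders, events], ignore_index=True)

# Downstream analytics now sees a single, uniform source.
print(blended.groupby("customer_id")["amount"].sum())
```

The point of the sketch is the normalization step: once both feeds share one schema, everything downstream can treat the blended set as a single source.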
Keeping the data lake clean
Throwing a bunch of data into one place creates a data lake, but if that information isn't managed, the lake becomes a swamp. Yarbrough asked how a company manages that data at scale: one load is simple, but 6,000 loads is something else entirely. He described how Pentaho manages that data by leveraging metadata injection and making processes dynamic.
“Manage what you’re doing,” he said.
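As a rough illustration of the metadata-injection idea, the sketch below drives one generic load routine from a table of metadata records, so a single template stands in for thousands of hand-built jobs. This is a toy Python analogue, not Pentaho's implementation; the metadata entries and helper names are hypothetical.

```python
# A minimal sketch of metadata injection: one generic load routine,
# parameterized at runtime by metadata records, replaces thousands
# of hand-built jobs. Entries and names are hypothetical.
import csv
import io

# One metadata record per source feed; in practice this table
# could hold 6,000 rows, one per load.
LOAD_METADATA = [
    {"name": "orders", "delimiter": ",", "columns": {"id": "order_id", "amt": "amount"}},
    {"name": "clicks", "delimiter": "|", "columns": {"uid": "user_id", "url": "page"}},
]

def run_load(meta, raw_text):
    """Generic load: parse with the injected delimiter and rename
    source columns to target columns per the injected mapping."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=meta["delimiter"])
    return [{meta["columns"][k]: v for k, v in row.items() if k in meta["columns"]}
            for row in reader]

# The same routine handles both feeds; only the metadata differs.
print(run_load(LOAD_METADATA[0], "id,amt\n1,9.99\n"))
print(run_load(LOAD_METADATA[1], "uid|url\n42|/home\n"))
```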
Yarbrough then stressed that it always comes down to use cases: what a company is trying to do with its data. Customers want to take data from their lakes and reshape it into a different format. The blueprint Pentaho produced does just that, simplifying the process and enabling data movement at scale.
#HS16SJ
#theCUBE