01. Chuck Yarbrough, Pentaho, visits #theCUBE! (00:19)
02. The Pentaho Big Data Enhancement Release. (00:51)
03. Prioritizing New Solutions. (02:34)
04. The Pentaho Choice for Customers. (04:08)
05. Securing the Data Pipeline. (05:54)
06. The Challenge of Making Sense of Data. (07:11)
07. Is it a Data Lake or a Data Swamp? (10:38)
08. Curating the Data Lake and Conforming the Data. (13:12)
09. How Pentaho Thinks About Solutions. (14:40)
Track List created with http://www.vinjavideo.com.
--- ---
How not to drown in your data lake | #BigDataNYC
by Bev Terrell | Sep 27, 2016
One of Hadoop’s most well-known concepts is the data lake, a storage repository that holds a large amount of raw data until it is needed. The challenge for companies is how best to access and analyze that data. If you’re not careful, your data lake can become a data swamp, where your data is underutilized or mismanaged.
Chuck Yarbrough, senior director of Solutions Marketing and Management at Pentaho, A Hitachi Group Company, joined Dave Vellante (@dvellante) and George Gilbert (@ggilbert41), cohosts of theCUBE, from the SiliconANGLE Media team, during BigDataNYC 2016, held at the Mercantile Annex in New York, NY. Yarbrough talked about Pentaho’s new tools for helping Hadoop users safely access and navigate the waters of their data lakes.
New release for Pentaho
Vellante kicked off the discussion by asking Yarbrough about what’s new at Pentaho.
“We just announced a big data enhancement release … [it] includes a whole bunch of things added into the platform to make Hadoop easier … additional SQL/Spark enhancement [to enable] that data pipeline,” said Yarbrough. The release’s metadata injection capability, which onboards multiple data types faster, allows data engineers to dynamically generate PDI transformations at runtime.
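The interview doesn’t show PDI’s actual metadata injection API, but the underlying idea — generating a transformation at runtime from a metadata description instead of hand-coding one per file layout — can be sketched in plain Python. The field names and layouts below are hypothetical examples, not Pentaho code:

```python
# Metadata-driven transformation sketch: one generic template plus a
# metadata record per source layout, rather than one hand-built
# transformation per file. Layouts here are invented for illustration.

def build_transform(metadata):
    """Return a row-transform function generated from a metadata spec.

    `metadata` maps source column positions to (target_name, converter).
    """
    def transform(row):
        return {name: conv(row[pos]) for pos, (name, conv) in metadata.items()}
    return transform

# Two different incoming layouts handled by the same template:
trades_meta = {0: ("account_id", str), 1: ("amount", float)}
fx_meta = {0: ("currency", str), 1: ("rate", float), 2: ("account_id", str)}

to_trades = build_transform(trades_meta)
to_fx = build_transform(fx_meta)

print(to_trades(["A-100", "250.75"]))   # {'account_id': 'A-100', 'amount': 250.75}
print(to_fx(["EUR", "1.08", "A-100"]))  # {'currency': 'EUR', 'rate': 1.08, 'account_id': 'A-100'}
```

The payoff is the one Yarbrough describes: onboarding a new file type means writing a new metadata record, not a new transformation.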
Because Hadoop can pose security challenges, Pentaho is expanding its Hadoop data security integration to promote better Big Data governance and protect clusters from intruders. The enhancements include expanded Kerberos (network authentication protocol) integration for secure multi-user authentication and Apache Sentry integration to enforce rules that control access to specific Hadoop data assets.
Hydrating and managing the data lake
Gilbert noted that many clients look at their pipelines to see what data is being consumed and brought into (or “hydrating”) the data lake. They then want to analyze and operationalize that data; how can they do that?
“Getting data into the data lake; that’s easy. … What people are asking for is, ‘Help me reduce the insanity; get a handle on what we do.’ [For example,] in the financial services space, [a client] had a problem where they needed to onboard data into the lake, quickly and efficiently. … You would have to create a data transformation for every file. … With metadata, it will apply to the transformation, and land the data in exactly the form you need in Hadoop,” explained Yarbrough.
Further, regarding managing the data lake, Yarbrough stated: “My team looks at how customers use our products, and how our products fit into the entire ecosystem. … Then we look for what’s repeatable, and what we can deliver as a solution, quicker, faster, cheaper than a client building it themselves.”