Robert Maybin, Principal Architect at Dremio, sits down with Lisa Martin for the AWS Startup Showcase: Innovations with CloudData & CloudOps
#CubeOnCloudStartups #CloudInnovation #theCUBE
https://siliconangle.com/2021/03/24/dremio-offers-alternative-to-data-copies-and-data-warehouses-through-robust-lake-architecture-cubeoncloudawsstartups/
Dremio offers alternative to data copies and data warehouses through robust lake architecture
BY MARK ALBERTSON
Copies of large datasets tend to proliferate in many organizations because they reside in a data lake that lacks the ability to perform the work necessary to turn that information into business insight. Dremio Corp. has built a unicorn-level business by breaking through this barrier, allowing enterprises to run live, interactive queries against petabyte-scale data in lake storage.
Dremio’s value proposition is that eliminating data copies is a good thing. Bringing compute to the data eliminates unnecessary work by operationalizing data lake storage and accelerating analytics processing.
“What’s wrong with copies?” asked Robert Maybin, principal architect at Dremio. “Maybe they land in cloud storage, but before they can be queried, somebody has to go in and reformat those datasets, transform them in ways that make them more useful and more performant. Copies are a natural thing to do, but they come at a real cost.”
Maybin spoke with Lisa Martin, host of theCUBE, SiliconANGLE Media’s livestreaming studio, during the AWS Startup Showcase Event: Innovators in Cloud Data. They discussed Dremio’s vision for next-generation data lake architecture, recent key technology changes that significantly boosted the software’s capabilities, and why hybrid-focused enterprises are embracing the firm’s data lake solution. (* Disclosure below.)
Shift in thinking
Dremio’s growing acceptance presents an interesting scenario for the data warehouse industry. Since its release last fall, the company has allowed users to query a cloud object store through a business intelligence tool such as Tableau or Looker, with the same performance characteristics as if the information resided in a data warehouse.
“The real approach, and this is available today with the rise of cloud technologies, is we can shift our thinking,” Maybin said. “How can we take some of these features and capabilities that one would expect in a data warehouse environment and bring that directly to the data? It requires new technology to do this. That’s what we call the next generation data lake architecture.”
That next generation is based on the ability to separate compute from storage and scale each independently. Running production business intelligence directly on cloud data lake storage diminishes the need to move the data to a data warehouse.
“We didn’t have the flexibility to scale compute and storage independently or the kind of networking we have today,” Maybin explained. “What we’ve got with some of the new cloud technology is to basically do away with that requirement. Now we can have very large, provisioned pools of data that can grow and grow without the limitations of nodes of hardware.”
One of the key architectural changes Dremio announced last fall was caching data in the Apache Arrow format. Apache Arrow is a language-agnostic software framework for developing data analytics applications.
Dremio had been using Reflections, an internally managed persistence of data, to accelerate queries. But Reflections first had to be created as Apache Parquet files, which slowed down the process. Now Dremio uses the Arrow format directly, accelerating query response times by as much as 10x, according to company officials.
“We can accelerate certain query patterns by creating Reflections,” Maybin said. “That’s the edge piece that gives us BI acceleration without having to use additional tools. The ability to create Reflections is certainly a differentiator.”
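The idea of accelerating a query pattern with a persisted, precomputed representation can be illustrated with a small Python sketch. It is not Dremio's implementation or API: the `sales` rows and the `build_reflection`, `total_by_region_scan`, and `total_by_region_accelerated` helpers are hypothetical names chosen for illustration, and a plain dictionary stands in for the managed persistence the article describes.

```python
from collections import defaultdict

# Hypothetical raw dataset: (region, amount) rows standing in for lake data.
sales = [
    ("east", 120.0),
    ("west", 80.0),
    ("east", 30.0),
    ("north", 50.0),
    ("west", 20.0),
]


def total_by_region_scan(rows, region):
    """Answer the query by scanning every raw row."""
    return sum(amount for r, amount in rows if r == region)


def build_reflection(rows):
    """Precompute totals per region once, so later queries can skip the scan."""
    totals = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


def total_by_region_accelerated(reflection, region):
    """Answer the same query from the precomputed summary (assumed up to date)."""
    return reflection.get(region, 0.0)


# Built once, then reused by every query that matches this pattern.
reflection = build_reflection(sales)
```

Both functions return the same answer for a given region. The accelerated path does a single lookup instead of a full scan, which is the trade the article attributes to Reflections: extra persisted data in exchange for faster queries on specific patterns.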
Querying multiple data sources
Another key adjustment made by Dremio was to offer scale-out query planning. This improved the platform's concurrency by letting the number of query coordinators scale to match the number of executors, allowing the software to support thousands of users.
“We’re in the business of building technology that allows users to query large data sets in a scale-out performant way directly on the data where it lives,” Maybin said. “We can also query not just one source of data, but multiple sources of data and join those together in the context of the same query.”
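Joining multiple sources in the context of one query can be sketched with Python's standard `sqlite3` module. This is only an analogy and does not use Dremio: two separate in-memory SQLite databases stand in for two data sources, and the `orders` and `customers` tables, the `crm` alias, and the `join_across_sources` function are hypothetical.

```python
import sqlite3


def join_across_sources():
    """Join tables that live in two separate databases with a single query."""
    conn = sqlite3.connect(":memory:")
    # Attach a second, independent in-memory database as another "source".
    conn.execute("ATTACH DATABASE ':memory:' AS crm")

    conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO main.orders VALUES (?, ?)",
        [(1, 40.0), (2, 15.0), (1, 10.0)],
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )

    # One query spans both sources; the schema prefix identifies each one.
    rows = conn.execute(
        """
        SELECT c.name, SUM(o.amount) AS total
        FROM main.orders AS o
        JOIN crm.customers AS c ON c.id = o.customer_id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()
    conn.close()
    return rows


result = join_across_sources()
```

Here `result` holds one total per customer name, produced without first copying either table into the other database.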
...
“All of these organizations either have a toe in the water or they’re halfway down the path of exploring how to take all of this on-premises data and processing and get into AWS,” Maybin said. “We provide a really good path to solve some of their on-prem problems today and then give them a clear path as they migrate to the cloud. We’re ideally positioned for that story.”