Name: Robert Maybin, Dremio | AWS Startup Showcase: Innovations with CloudData & CloudOps
Uploaded: 2021-03-18T15:50:00.000Z
Duration: 25 min 45 s

Robert Maybin, Principal Architect at Dremio, sits down with Lisa Martin for the AWS Startup Showcase: Innovations with CloudData & CloudOps.

#CubeOnCloudStartups #CloudInnovation #theCUBE

https://siliconangle.com/2021/03/24/dremio-offers-alternative-to-data-copies-and-data-warehouses-through-robust-lake-architecture-cubeoncloudawsstartups/

Dremio offers alternative to data copies and data warehouses through robust lake architecture

BY MARK ALBERTSON

Copies of large datasets tend to proliferate in many organizations because they reside in a data lake that lacks the ability to perform the work necessary to turn that information into business insight. Dremio Corp. has built a unicorn-level business by breaking through this barrier, allowing enterprises to run live, interactive queries against petabyte-scale data in lake storage.

Dremio’s value proposition is that eliminating data copies is a good thing. Bringing compute to the data eliminates unnecessary work by operationalizing data lake storage and accelerating analytics processing.

“What’s wrong with copies?” asked Robert Maybin (pictured), principal architect at Dremio. “Maybe they land in cloud storage, but before they can be queried, somebody has to go in and reformat those datasets, transform them in ways that make them more useful and more performant. Copies are a natural thing to do, but they come at a real cost.”

Maybin spoke with Lisa Martin, host of theCUBE, SiliconANGLE Media’s livestreaming studio, during the AWS Startup Showcase Event: Innovators in Cloud Data. They discussed Dremio’s vision for next-generation data lake architecture, recent key technology changes that significantly boosted the software’s capabilities, and why hybrid-focused enterprises are embracing the firm’s data lake solution. (* Disclosure below.)

Shift in thinking

Dremio’s growing acceptance presents an interesting scenario for the data warehouse industry. With its release last fall, the company began allowing users to query a cloud object store through a business intelligence tool such as Tableau or Looker, with the same performance characteristics as if the information resided in a data warehouse.

“The real approach, and this is available today with the rise of cloud technologies, is we can shift our thinking,” Maybin said. “How can we take some of these features and capabilities that one would expect in a data warehouse environment and bring that directly to the data? It requires new technology to do this. That’s what we call the next generation data lake architecture.”

That next generation is based on an ability to separate and scale compute from storage. Running production business intelligence directly on cloud data lake storage diminishes the need to move the data to a data warehouse.

“We didn’t have the flexibility to scale compute and storage independently or the kind of networking we have today,” Maybin explained. “What we’ve got with some of the new cloud technology is to basically do away with that requirement. Now we can have very large, provisioned pools of data that can grow and grow without the limitations of nodes of hardware.”

One of the key new architectural elements announced by Dremio last fall was caching data in the Apache Arrow format, a language-agnostic software framework for developing data analytics applications. Dremio had been using its Reflections tool, an internally managed persistence of data, to accelerate queries. But Reflections had to be created in Apache Parquet files first, which slowed down the process. Now Dremio is using the Arrow format directly, accelerating query response times by as much as 10x, according to company officials.

“We can accelerate certain query patterns by creating Reflections,” Maybin said. “That’s the edge piece that gives us BI acceleration without having to use additional tools. The ability to create Reflections is certainly a differentiator.”

Querying multiple data sources

Another key adjustment made by Dremio was to offer scale-out query planning. This gave the platform concurrency, the ability to match the number of query coordinators with executors, allowing the software to support thousands of users.

“We’re in the business of building technology that allows users to query large data sets in a scale-out performant way directly on the data where it lives,” Maybin said. “We can also query not just one source of data, but multiple sources of data and join those together in the context of the same query.”

...

“All of these organizations either have a toe in the water or they’re halfway down the path of exploring how to take all of this on-premises data and processing and get into AWS,” Maybin said. “We provide a really good path to solve some of their on-prem problems today and then give them a clear path as they migrate to the cloud. We’re ideally positioned for that story.”