>> Hey, welcome back to Snowflake Summit, everybody. My name is Dave Vellante, and we're here and really excited to welcome you to the great Iceberg debate with George Gilbert and Sanjeev Mohan. Guys, thanks for coming back on theCUBE. I'm going to set it up and then we'll get into it. So, yesterday, as you know, we had an extremely enlightening conversation. This is something I posted on LinkedIn with a little bit more detail. We had Christian Kleinerman and Snowflake's visionary founder, Benoit Dageville. And what I called somewhat of a convenient truth is that it's really hard to actually create standards across all the compute engines from a governance standpoint and a catalog standpoint. So, what Snowflake is doing is they're open sourcing Polaris, which is really just the technical metadata. So, the somewhat inconvenient or the convenient truth for Snowflake is that the standards for governing open table formats, like Apache Iceberg, are not only lacking, but they're extremely challenging, because you got to herd the cats of all the various compute engine players, get them to agree, and then align. Why would Snowflake take on something like that? So, Snowflake, they're open sourcing Polaris, but if you want the governed catalog, you got to go to Horizon, you got to come back into the Snowflake playground. Seems like a reasonable strategy. Yeah, well-thought-out. What's your take on this?
>> So, if you look at the Iceberg table spec, that's what it is: it's a table format. It does not specify certain things like permissions. There are no permissions, there's no security. Security has to be applied above, in a different catalog. So, Snowflake actually provides this Polaris catalog, which is a technical metadata catalog, that's it. But if you want to do row-level security, column-level security, you want to do that kind of stuff, you need Horizon. That does not come.
If you don't have Horizon, then every single compute engine, like if it's Spark or Dremio or Trino or Starburst, has to figure out how to apply data access governance onto Iceberg. So, that's why Horizon is so important.
>> So, when Sanjeev talks about technical metadata, George, explain what that is and what's the value of that?
>> Okay, so when you update a table, there's data that says, "These are all the Parquet files that are in the table. These are the columns in that table." It's just really basic. It's a way to say all this collection of row groups forms a table, that's it. And as Sanjeev said, if you want to know permissions, that's somewhere else, and the same goes for lineage. But the kernel that says the source of truth, what is the state of the table when you update it, when you write to that table? That's the technical metadata. And so, what I found out today was the reason why Polaris was open sourced is so that for Iceberg tables, you can read and write independent of a catalog, whereas in Unity, which is trying to support Delta and Iceberg and Hudi, they have not only the technical metadata, but all the other... The lineage, the permissions, all the things that are in Horizon, but they're bundling it. So, in other words, if you wanted to read and write to the open table formats, you had to take the entire Databricks catalog. Now, by getting everyone to agree that Polaris or some other catalog is enough, they can break that link. They can break that bundle. And then you're back to: are you in the Horizon family for all the richness of governance, or are you in Unity for all the richness?
>> This is why I feel like it was a well-thought-out strategy. It's like let's compete on the basis of our product and let the customers decide. Now, we should mention that just as Benoit Dageville was getting on stage today, Databricks dropped the press release that they were going to acquire Tabular; the Wall Street Journal reported it.
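Here is a minimal Python sketch that makes George's description of technical metadata concrete. It uses a heavily simplified, hypothetical model. The class and field names are illustrative and do not come from the Iceberg spec. In this model, each write commits a new immutable snapshot that lists the table's columns and its Parquet data files. Notice what has no field at all: permissions, lineage, ownership. That gap is exactly where both guests say a catalog like Horizon or Unity comes in.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFile:
    # Hypothetical simplified entry: where a Parquet file lives and how many rows it holds.
    path: str
    record_count: int


@dataclass(frozen=True)
class Snapshot:
    # One immutable version of the table: its columns plus the data files that make it up.
    snapshot_id: int
    columns: tuple[str, ...]
    data_files: tuple[DataFile, ...]


@dataclass
class TableMetadata:
    # No place for permissions, lineage, or owners here: that lives in another catalog.
    snapshots: list[Snapshot] = field(default_factory=list)

    def current(self) -> Snapshot | None:
        """The latest committed snapshot is the table's current state."""
        return self.snapshots[-1] if self.snapshots else None

    def commit(self, columns, added=(), removed=()) -> Snapshot:
        """Record a write as a new snapshot; earlier snapshots are left untouched."""
        prev = self.current()
        removed_paths = set(removed)
        files = [] if prev is None else [f for f in prev.data_files if f.path not in removed_paths]
        files.extend(added)
        snap = Snapshot(
            snapshot_id=0 if prev is None else prev.snapshot_id + 1,
            columns=tuple(columns),
            data_files=tuple(files),
        )
        self.snapshots.append(snap)
        return snap


meta = TableMetadata()
meta.commit(["id", "amount"], added=[DataFile("a.parquet", 100)])
meta.commit(["id", "amount"], added=[DataFile("b.parquet", 50)])
print([f.path for f in meta.current().data_files])  # ['a.parquet', 'b.parquet']
```

Any engine that can read this structure knows the table's current state. Deciding who may read it is a separate question that this metadata does not answer.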
It was positioned as an AI announcement. We can talk about whether or not that's the case; we don't think it is. But the timing was not coincidental. What was your takeaway from that announcement in terms of what it means for the customer, but specifically, for the battle between Databricks and Snowflake?
>> So, I don't think this is a good move for the customers. I, in fact, had lunch with some very large companies, financial services, and they said, "We put our eggs in the Iceberg basket because we were getting an open standard file format. And now Hudi is actually not really as prevalent. There are only two, Delta and Iceberg, and they're both owned by the same company." And so, that's how the customers feel. Now, the reality is that with Iceberg, the bus has already left the station. It's open source. Everybody is doing Iceberg at this point. So, the question that begs is, well, how much impact does this have on Iceberg if Iceberg has already been implemented?
>> So, George, your thinking on this is that what we sometimes call the sixth data platform is not just about separating compute from storage, which is what Snowflake popularized with cloud data warehouses. It's really about separating compute from the data. So, any compute engine can operate on any data, and that's the vision that you've put forth. And so, how does that apply here?
>> Okay, so on Breaking Analysis, we had Ryan from Tabular, which is the target of this acquisition. And we did another show with Ryan and one of the architects of Starburst just recently on Road to .
>> Is that Ryan Blue?
>> Ryan Blue. So, on the value they're trying to add now: I think it might be fairly straightforward, or will be soon, to be able to just read and write, whether it's in Delta or Iceberg.
The basic abstraction is the same, whereas all the work that Ryan and the Tabular group were doing was adding on the permissions. To go beyond just authentication and role-based access control, they were trying to add a full policy engine, which is tag... This is the really advanced stuff, where you control access to data based on its attributes. That's the full-blown stuff. That's what they're building. And the whole point is you can't separate compute from storage, as you guys were saying, until you put the security in. But with security now, it's no longer enough just to say, "Are you in this role?" The next level is: can you apply this policy? And so, I think what Databricks is buying is not just the table interoperability, but someone who's building a policy engine, because they tried to buy Immuta.
>> Okay, but to simplify it, you're basically saying to get that governance with Snowflake, you got to go to Horizon. Whereas, you're saying in theory with Unity, you can bring that governance to the data, even though they're both proprietary. It's not a matter of open source or not. But isn't Unity built on top of Hive? So, the question then becomes how robust is that capability?
>> So, talking about Hive, it just seems to me that we have regressed. We've gone-
>> Our team?
>> Yes-
>> Is it really on Hive?
>> Yes, it's built on Hive.
>> Well, even if it is, they can port that. That's-
>> Of course. But that's the core, am I right?
>> So, they took inspiration from Hive, but the situation that we are in now is a combination of Hive and Ranger.
>> It reminds me of Redshift.
>> So, Hive and Ranger, by the way, just to clarify, there are a lot of moving parts that are going on in parallel, and they're not talking to each other. For example, on stage today at the keynote, Microsoft announced this bidirectional connection with Snowflake. They already announced it at Microsoft Build-
>> Interesting. That's a shot across Databricks' bow.
>> Right.
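George contrasts role-based access control with a policy engine that decides access from attributes and tags. The sketch below illustrates that contrast. Everything in it is hypothetical: the column names, the `PII` and `restricted` tags, and the user attributes are made up for illustration, and real policy engines define their own models. A role grant answers one question: "was this role given this column?" The tag-based check works differently. It derives the decision from the column's tags and the user's attributes, so a single rule covers every column tagged `PII`. That is also why a harmonized tag vocabulary, as discussed later in the conversation, would let different engines enforce the same policy.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical tag assignments. A shared, harmonized vocabulary like this is what
# would let several policy engines enforce the same rules.
COLUMN_TAGS: dict[str, set[str]] = {
    "customers.email": {"PII"},
    "customers.region": set(),
    "customers.ssn": {"PII", "restricted"},
}


@dataclass
class User:
    name: str
    roles: set[str] = field(default_factory=set)
    attributes: dict[str, str] = field(default_factory=dict)


def rbac_allows(user: User, column: str, grants: dict[str, set[str]]) -> bool:
    """Role-based: allowed if any of the user's roles was explicitly granted the column."""
    return any(column in grants.get(role, set()) for role in user.roles)


def tag_policy_allows(user: User, column: str) -> bool:
    """Attribute/tag-based: decided by the column's tags and the user's attributes.

    Untagged or unknown columns are allowed in this sketch.
    """
    tags = COLUMN_TAGS.get(column, set())
    if "restricted" in tags:
        return user.attributes.get("clearance") == "high"
    if "PII" in tags:
        return user.attributes.get("purpose") in {"support", "compliance"}
    return True


analyst = User("ana", roles={"analyst"}, attributes={"purpose": "marketing"})
grants = {"analyst": {"customers.email", "customers.region"}}
print(rbac_allows(analyst, "customers.email", grants))  # True: the role was granted
print(tag_policy_allows(analyst, "customers.email"))    # False: PII, purpose not allowed
```

In this sketch the two checks disagree on the same request. That disagreement is the gap between "Are you in this role?" and "can you apply this policy?"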
But the question is how is Microsoft doing this, and what's the role of Polaris? But this is what I'm saying. It's a separate project, and that project is XTable. So, what happens is Microsoft Fabric is built on OneLake, which uses Delta, that's it. But when you write something into Delta, then using Onehouse's open source Apache XTable, they're converting the catalog entries into Iceberg. And that then-
>> That's the table abstraction, so that everyone can read and write the table. But the next step is to take the security part and attach it to the table. And that's what the Tabular guys were building. That's what the Starburst guys are building, and that's what today is in Unity and Horizon. But basically I think Polaris was trying to break apart the technical metadata from Unity, but then Tabular... And the next effort is to apply security directly to the technical metadata, to grow it out of the technical metadata. And then, Horizon and Unity would be value-add metadata catalogs.
>> But normally, you would think this type of competition is good for customers. But in this case, I think you're right. It's maybe not so good for customers because there are all these competing standards. We remember well Hortonworks and Cloudera. And finally, the market said, "Well, just bring them together," but it was too late. The cloud had already disrupted them. How do these guys avoid that happening? But go ahead, make your point.
>> Yeah, so I was just going to say, by the way, there's also the AWS Glue catalog, which has also been adding features. For example, the catalog doesn't just have the technical metadata. It also knows the statistics, so you can do query planning.
>> But then, there's other metadata in DataZone, so that's not unified.
>> Which is like Horizon. So, DataZone is like Horizon, Glue catalog is like Polaris, and Unity is sort of in between.
>> But you guys are all talking about the value add.
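The XTable approach Sanjeev describes converts catalog entries, not data. The rough sketch below illustrates that idea under a toy log format. Each commit is newline-delimited JSON with `add` and `remove` actions that carry a `path`, loosely modeled on the Delta Lake transaction log. Replaying the log yields the live file set, which is then emitted as a hypothetical Iceberg-style snapshot record. The function names and record keys are illustrative only. Real Iceberg metadata points to manifest lists and manifests rather than listing files inline, and XTable's real conversion handles much more than the file list. What the sketch does show is why no Parquet file has to be rewritten.

```python
from __future__ import annotations

import json


def replay_log(commits: list[str]) -> list[str]:
    """Replay add/remove actions in commit order to find the table's live data files."""
    live: dict[str, None] = {}  # a dict preserves insertion order, unlike a set
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue  # tolerate blank lines in the toy log
            action = json.loads(line)
            if "add" in action:
                live[action["add"]["path"]] = None
            elif "remove" in action:
                live.pop(action["remove"]["path"], None)
    return list(live)


def to_snapshot_record(commits: list[str], snapshot_id: int) -> dict:
    """Emit a toy Iceberg-style snapshot that points at the same Parquet files.

    Only metadata is produced; no data file is copied or rewritten.
    """
    return {"snapshot_id": snapshot_id, "data_files": replay_log(commits)}


delta_style_log = [
    '{"add": {"path": "part-000.parquet"}}\n{"add": {"path": "part-001.parquet"}}',
    '{"remove": {"path": "part-000.parquet"}}\n{"add": {"path": "part-002.parquet"}}',
]
print(to_snapshot_record(delta_style_log, snapshot_id=1))
# {'snapshot_id': 1, 'data_files': ['part-001.parquet', 'part-002.parquet']}
```

Both formats end up describing the same files. This is the "table abstraction" George refers to, and it says nothing yet about security.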
We're trying to solve that first, so that it doesn't matter what the table format is, whether it's XTable or whether Ryan and the Delta guys fix UniForm, which Databricks announced. We're going to fix the table interoperability. So, it doesn't matter whether it's Iceberg or Delta; then it's what else are you adding onto that?
>> Right. So, I think there's a basic problem. The basic problem is: do you get interoperability with vendor lock-in or not?
>> But it depends what type of interoperability. If security is attached to the data, then you have interoperability for the data policy and the governance. But then, you go to the value add for all the lineage and the observability and the quality. So, I think what's happening is there's going to be a base core that is growing all the time in capability, and that core is a common-
>> But is that at Polaris or is that at Horizon? It cannot be at Polaris.
>> Horizon is going to be the bigger value add. And what we don't know, Polaris-
>> So, Polaris cannot have security?
>> No. Well, it's not that it can't; it does not now. But what I'm trying to say is that Databricks is trying to solve it so that you can read and write Iceberg and Delta, and then the next thing that they can add is a policy engine for security.
>> So, these are two orthogonal projects is what I'm saying.
>> Yes.
>> The XTable cross-connection is one project. Polaris is a separate project, and right now, nobody knows how these two projects will operate together.
>> What Benoit or Christian said is you can't ignore... Because I asked the question, "Irrespective of the technology challenges, is it your intent to bring over that security capability, the governance capabilities, to Polaris?" And he said, "You can't ignore the technical challenges. You got to get all the compute engines to agree, and that's virtually impossible." So, why do you think that Databricks is going to be able to succeed at getting that agreement?
Because it's all integrated in there?
>> They're trying to add a policy engine. Starburst is adding a policy engine. Salesforce is adding a policy engine. That's the next thing beyond role-based access control-
>> Right. So, it's more stovepipes?
>> Actually, no, because if you have a policy engine, then the thing you want to standardize on, just like metrics in the semantic layer, is the tags on the data that say this is PII, this is-
>> Harmonize that?
>> Yes.
>> Okay, yep.
>> If you harmonize the tags, the policy engines respect that. So, then it's who's going to add policy?
>> I want to share something else. I'm really curious to see how this plays out over time. Customers want it all. They want openness. They want no lock-in. They want an integrated experience. They want their cake, they want to eat it too, and they don't want to gain weight, that's what they want.
>> And they want it to cure cancer.
>> And they want it to cure cancer. So, in my decades of following these markets, you may recall Unix used to be considered open. It was synonymous with open source. And so, I feel like this discussion is just advanced, but-
>> So, Dave, I think this was another nail in the coffin of open source. That's my... Because just this year, how many open source things, Redis, Terraform, there are a couple of new ones I'm missing. So, you just see this all the time, that these open source-
>> They get absorbed, right?
>> Yeah. Oh, actually, no, I know which one. I'm thinking it was this Kafka Connect, which became Benthos, and then Redpanda bought it just last week. Took it out of-
>> Hats off to IBM. They've done a good job with Red Hat. I would give them credit, but what happens to Onehouse? Are they going to get acquired by Snowflake?
>> Actually, I talked to Onehouse today.
>> Really?
>> Yeah, I had a call with them and they don't know. I don't think they knew this was coming.
>> No, this is not...
The Onehouse is-
>> A lot of people knew this was coming, but apparently-
>> Apparently Snowflake was in the bidding.
>> Oh, Snowflake was in the bidding, Coalesce was in the bidding-
>> There was a rumor that Snowflake was going to buy Tabular...
>> Google Cloud.
>> Oh, they all were?
>> They were all in the bidding.
>> Another nail in the coffin for open source. Go ahead, George. Give us the last word.
>> I wouldn't lament the loss of open source too much, because what we've seen from Amazon for 15 years is that they expropriate open source projects, and the real value is in making the open source run as a service. And so this-
>> As a managed service, for sure.
>> Yes, this is the same thing here. But what we're trying to do is just make sure that the APIs are standardized: the APIs for how to tag data, the APIs for how to define a policy for your data, and that's just growing. We're going to agree on the data table format, then we're going to agree on how to tag policies, and then we're on to the value-add metadata. So-
>> We're getting thrown out of the park here, so we got to go. Thanks, you guys. Really appreciate it so much.
>> Thank you so much.
>> To be continued. This is a really interesting conversation. All right, keep it right there. For more coverage, go to thecube.net tomorrow, siliconangle.com for all the news, and thecuberesearch.com. My name is Dave Vellante, for Rebecca Knight, we're out. We'll see you tomorrow at Snowflake Summit 2024 from Moscone.
>> All right. Thank you.