In this #SnowflakeSummit 2025 segment, Amit Sangani, who leads Meta’s AI Platforms Engineering team, and Dwarak Rajagopal of Snowflake AI Research join theCUBE’s Dave Vellante and John Furrier to explore how Snowflake and Meta are accelerating enterprise-grade generative AI. The conversation spotlights the strategic partnership that brings Meta’s open-source Llama models directly into Snowflake Cortex AI, giving customers a secure, zero-copy path to build, fine-tune and deploy AI workloads on the AI Data Cloud.
Sangani unpacks the recent Llama 4 release and the enterprise use cases where the model excels, from content generation to intelligent agents. Rajagopal shares how customers such as TripAdvisor are already tapping Llama inside Cortex AI to deliver personalized recommendations and richer user experiences. The duo explains why open-source models resonate with enterprises – offering ecosystem flexibility, vibrant community support and full ownership of fine-tuned weights – while detailing Snowflake’s built-in trust and safety guardrails.
The discussion also dives into Snowflake AI Research’s SwiftKV optimizations, which double Llama throughput on high-end GPUs and cut inference costs by up to 75%. Viewers will learn how these advances move Cortex beyond analytics to real-time, interactive AI and what’s next as data engineering, DevOps and AI development converge on Snowflake’s platform. It’s a must-watch for builders seeking efficient, governed pathways to scale AI across structured and unstructured data.
Amit Sangani, Meta & Dwarak Rajagopal, Snowflake
>> Good afternoon everyone, and welcome back to theCUBE's live coverage of the Snowflake Summit 2025 here at the Moscone Center in San Francisco. I'm your host, Rebecca Knight, alongside my co-host and analyst, George Gilbert. I would like to welcome two new guests to the show: Dwarak Rajagopal, VP of AI Engineering and Research at Snowflake. Welcome.
Dwarak Rajagopal
>> Glad to be here.
Rebecca Knight
>> And Amit Sangani, Senior Director of AI Platform Engineering at Meta. Thank you both so much for coming on the show.
Amit Sangani
>> It's an honor.
Rebecca Knight
>> An honor, I love it. So I actually have this question for both of you, but I'll start with you first, Amit. And just talk a little bit about how Meta and Snowflake are working together and how this partnership is really helping companies move faster with AI.
Amit Sangani
>> Yeah, absolutely. Snowflake has been our partner since day zero. When we launched Llama 2 with a commercial license, Snowflake was one of our top partners, and we have partnered with Snowflake on Llama 2 and Llama 3. We just recently launched the Llama 4 models, and Snowflake has again been our partner. We are working very closely together; Llama is integrated in their Cortex AI platform, where we are making sure the models work very well across agentic systems. Dwarak can talk more about how it's being used inside the Cortex platform.
Dwarak Rajagopal
>> Yeah, absolutely. Our customers use Llama via the Cortex platform, which is fully governed and secure. Customers use it for Cortex Search and Cortex Analyst across the platform. One big example is TripAdvisor: they've been using Llama models via Cortex to power personalized travel recommendations for their users, and have seen both positive customer engagement and big business impact. So that's a great example of Llama in use.
Rebecca Knight
>> Excellent.
George Gilbert
>> So this is interesting that you talk about recommendations because historically, pre-generative AI, that was the most important machine learning application and it was its own set of algorithms. How do you enhance that use case with generative AI?
Dwarak Rajagopal
>> With Cortex Search and Cortex Analyst, you can use generative AI to generate text-to-SQL and directly answer questions over both your structured and unstructured data, and TripAdvisor has been using those insights to power personalized recommendations.
Rebecca Knight
>> So, as you both just mentioned, Llama 4 has just launched, and you've been working together since Llama 2, I believe. What should enterprise teams know about this new model and what it's good at? And are there any use cases that are really starting to shine?
Amit Sangani
>> Yeah, absolutely. So the Llama 4 models use a mixture-of-experts architecture; this is the new era of architecture within the LLM space. These models are highly specialized in the way they run, and they are compute-efficient. The way mixture of experts works, there is a router which, depending on the query, routes your request to different experts. Llama Scout, the smaller model, is a 17-billion-active-parameter model with 16 experts, and Maverick has 17 billion active parameters and around 128 experts. These models are generalist models, so they can support all the basic use cases, plus they are super compute-efficient: they can run on a single H100 node and be used for pretty much any use case. We have seen applications from healthcare to telecommunications to finance, for various different purposes.
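[The router-to-experts dispatch Sangani describes can be sketched in a few lines. This is a toy illustration of top-k gating, not Llama 4's actual routing code; the function names and the 16-expert setup mirror his description of Scout but are otherwise illustrative.]

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of gate scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_scores, top_k=1):
    """Pick the top_k experts by gate probability and return
    (expert_index, renormalized_weight) pairs -- the token's
    computation goes only to these experts."""
    probs = softmax(gate_scores)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    chosen_mass = sum(probs[i] for i in ranked)
    return [(i, probs[i] / chosen_mass) for i in ranked]

# A query whose gate strongly favors expert 2 out of 16:
scores = [0.0] * 16
scores[2] = 4.0
print(route(scores, top_k=1))  # expert 2 carries all the weight
```

Because each token activates only a few experts, total parameter count can grow without a matching growth in per-token compute, which is the efficiency property he highlights.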
George Gilbert
>> So diving into how a developer would use one of these, talk about how the text to SQL works because that's been... Snowflake's been talking about this for a long time and we know that it's been a struggle across the industry to get the accuracy up both for a complex query and a complex database schema. How does Cortex handle that? This is now on the analyst side. And how does Llama help?
Dwarak Rajagopal
>> Yeah, absolutely. One of the key challenges with text-to-SQL is mapping business metrics to the actual information in the tables. One of the things we use Cortex for is to ingest those business metrics through dashboards and other mechanisms; then you can use the agentic system to orchestrate agentic reasoning on top of text-to-SQL. And there's the recent breakthrough we had with our text-to-SQL R1 model, where we post-trained Llama to improve text-to-SQL performance. That was very critical to show: we are leaders on the BIRD-SQL benchmark as well.
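[The grounding step Rajagopal describes — giving the model table schemas plus business-metric definitions before it writes SQL — can be sketched as a prompt builder. This is a minimal illustration of the pattern, not Cortex Analyst's actual prompt format, which is internal to Snowflake; all names here are hypothetical.]

```python
def build_text_to_sql_prompt(question, schema, metrics):
    """Assemble a text-to-SQL prompt grounded in table schemas and
    business-metric definitions, so the model maps business terms
    (e.g. 'churn') onto real columns instead of guessing."""
    schema_txt = "\n".join(
        f"TABLE {table}({', '.join(cols)})" for table, cols in schema.items()
    )
    metrics_txt = "\n".join(f"- {name}: {definition}" for name, definition in metrics.items())
    return (
        "You translate business questions into SQL.\n"
        f"Schema:\n{schema_txt}\n"
        f"Business metrics:\n{metrics_txt}\n"
        f"Question: {question}\n"
        "SQL:"
    )

prompt = build_text_to_sql_prompt(
    "What was churn last quarter?",
    {"subscriptions": ["user_id", "started_at", "cancelled_at"]},
    {"churn": "cancelled subscriptions / active subscriptions, per quarter"},
)
```

An agentic layer would then validate or iterate on the generated SQL; the key point is that metric definitions travel with the schema rather than living only in analysts' heads.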
George Gilbert
>> Okay. So in other words, that now enables the business user who's not familiar with how to build a dashboard, who's not familiar with SQL, to talk in natural language, and the results come back as a table. Can it support follow-on questions? Does it understand the context of the dialogue?
Dwarak Rajagopal
>> Yeah, absolutely. So Snowflake Intelligence, which we released today in private preview and which will soon be available in public preview, is a product for business users. A user can have a chat-based conversation, ask questions in natural language, and get results as dashboards or tables, then ask subsequent questions that build on the context. The model understands this through the context and also the memory that persists across the orchestration, and that is all covered by Cortex.
Rebecca Knight
>> So what does that unlock for the business user, who, as you said, doesn't know how to build a dashboard, but then all of a sudden can get these insights back in a table?
Dwarak Rajagopal
>> Yeah, one example: say I'm a marketer and I want to understand my consumer segments. To do that the normal way, I'd have to go talk to data engineers and data scientists to build the pipelines, and it takes multiple days, if not weeks, and multiple people to get that information. With Cortex Analyst and Snowflake Intelligence, you can just ask that question in natural language, all powered by models running natively on the platform, and get the insights delivered to your desktop immediately. And you can not just receive information but also act on it: okay, I need to send an email based on this information to my product owners and create the corresponding action items. Those are some of the key capabilities being unlocked by the AI models and AI systems we are building.
George Gilbert
>> Did I understand you to say that you guys did some fine-tuning with the Llama models to make this work better? Or can you essentially prompt any of the models in the model garden?
Dwarak Rajagopal
>> We use a mixture of these things. Since Llama is open source, we have the ability to fine-tune or even post-train these models based on some of the patterns we see, and we've done that for the Llama models. The open-source model gives you the flexibility to do more: you can not just fine-tune, but also reduce the size of the models to handle the speed and latency your use cases require. That's something the Llama models unlock.
George Gilbert
>> So is this something... I know distillation is something the closed models offer, but what I thought I heard is that there's a compound system, a collection of models, working in this system. Where does Llama fit? And what are the other models?
Dwarak Rajagopal
>> For example, another product we released today is AI SQL, where we bring AI to the analyst, so every data engineer or data analyst now has an AI superpower. What it does is unstructured analysis on structured tables. The way it works is to decide which model to bring in: there is a cascade of models powered by the query engine, and we decide which model to use based on the customer's use case. If the response needs to be faster, you can do it with a smaller model, and oftentimes we fine-tune some of the smaller models. If the request requires a much longer context or bigger capabilities, we call a bigger model. It's very dynamic, and the query engine optimizes it so the user doesn't have to worry about these things.
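[The cascade routing Rajagopal describes — smallest adequate model first, bigger models only when the request demands it — can be sketched as a simple policy function. The model names and thresholds below are purely illustrative, not Snowflake's actual tiers or routing logic.]

```python
def pick_model(prompt_tokens: int, needs_long_context: bool, latency_budget_ms: int) -> str:
    """Toy cascade policy: prefer the cheapest model that satisfies
    the request's context and latency requirements. The query engine
    would apply something like this transparently per query."""
    if needs_long_context or prompt_tokens > 8000:
        # Long documents or big schemas force the large-context model.
        return "large-model"
    if latency_budget_ms < 200:
        # Interactive paths go to a small (possibly fine-tuned) model.
        return "small-fast-model"
    return "medium-model"

print(pick_model(prompt_tokens=500, needs_long_context=False, latency_budget_ms=150))
```

The user-facing benefit is exactly what he states: the caller writes one query, and cost/latency trade-offs are made behind the engine.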
Rebecca Knight
>> I'm interested in how you described it as it really supercharges their career in the sense of, I mean, not just about the enterprise, but about the individual and what the individual can accomplish in his or her career. Can you talk a little bit more about that in terms of bringing people along and getting them excited about using this technology?
Amit Sangani
>> Yeah, absolutely. The whole purpose of open-sourcing these models was to democratize access, so a developer sitting anywhere in the world can get the same infrastructure they otherwise couldn't. And giving them access gives them a lot of features as part of these models. For example, the Llama 4 models come with a 10-million-token context window, which eliminates many of the RAG use cases. They are multimodal, and they support 12 different languages, so if you're building an application in different languages, you're able to do that. All of this is available to any developer in the world: they can download the models, fine-tune them, distill them, as you said, prune them, quantize them, and use them in their own applications, or upload them to partner sites like Hugging Face so others can use them. That's the power: you're elevating innovation across the world.
George Gilbert
>> So you're in a position where model capabilities are advancing very, very fast and on many vectors: pre-training, post-training, inference time. When we're back here next year, what sort of capabilities will we see, even if it's not Llama 5 but more post-training and extended test-time compute? What will that enable Cortex Analyst, Cortex Search and the agent orchestration to do when we're here?
Amit Sangani
>> Yeah, it's a great question. I think one of the biggest innovations happening in the industry is around agentic systems, where a complex workflow is broken down into multiple steps, and the model is the foundation of how you do that. The model needs to make sure it does not hallucinate; it needs to be able to understand the responses from the first step in the workflow and then use that output as the input to the second step. That multi-step process is very complex today, and none of the models actually do it successfully yet. So by next year, if we are able to reduce hallucination significantly and models can do the multi-step process successfully, I think that's a big win for the industry, because it opens up tremendous opportunities for AI coworkers. If I have an AI agent sitting beside me that can multitask, debug code, write code, do financial analysis and a bunch of other stuff, that can amplify what humans can do, and that's what I'm super excited about.
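[The step-chaining pattern Sangani describes — each step's output becoming the next step's input — reduces to a small composition loop. This is a minimal sketch of the control flow only; real agentic systems add validation and error handling between steps, and the pipeline below is a trivial stand-in for model-backed steps.]

```python
def run_workflow(steps, initial_input):
    """Chain workflow steps so each step consumes the previous
    step's output -- the multi-step agentic pattern. Any step
    that misreads its input (a 'hallucination') corrupts every
    step downstream, which is why per-step reliability compounds."""
    result = initial_input
    for step in steps:
        result = step(result)
    return result

# A stand-in pipeline: normalize a user question in three steps.
pipeline = [str.strip, str.lower, lambda s: s.replace("?", "")]
print(run_workflow(pipeline, "  What Was Churn?  "))  # "what was churn"
```

The compounding-error point is the crux of his answer: if each step is 90% reliable, a ten-step chain succeeds only about a third of the time, which is why reducing per-step hallucination matters so much.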
Dwarak Rajagopal
>> Cool. Yeah, trust is super important for us, and all of these improvements are going to be music to our ears and our customers' ears. I'm super excited to partner with Meta on some of these things.
Rebecca Knight
>> As you said, it's all about building that trust, making sure that customers feel that they can experiment safely within this environment. Absolutely.
Amit Sangani
>> Absolutely.
Rebecca Knight
>> Absolutely. Well, thank you both so much, Amit, Dwarak, it's been a pleasure having you on the show.
Dwarak Rajagopal
>> Thank you so much.
Amit Sangani
>> Thank you so much.
George Gilbert
>> Thanks guys.
Rebecca Knight
>> I'm Rebecca Knight for George Gilbert, stay tuned for more of theCUBE's live coverage of the Snowflake Summit 2025. You're watching theCUBE, the leader in enterprise news and analysis.