Rodrigo Liang, chief executive officer at SambaNova Systems Inc., joins theCUBE’s John Furrier and Dave Vellante at theCUBE + NYSE Wired: Robotics & AI Infrastructure Leaders 2025 event. Their discussion explores the rapid evolution of AI infrastructure and SambaNova’s role in redefining performance benchmarks at scale.
Liang highlights the debut of SambaNova's Gen Four RDU, built to meet the demands of high-efficiency neural network processing. The conversation covers data sovereignty, energy constraints and the enterprise need for secure, on-prem AI.
>> Hello, welcome to theCUBE here at our Palo Alto studios. I'm John Furrier, host here with Dave Vellante, my co-host and co-founder. I'm excited to have this week a great presentation, Robotics and AI Leaders. This is our one-year anniversary of partnering with the NYSE, forming the NYSE Wired community, which is a combination of theCUBE and the NYSE groups coming together. Rodrigo Liang is here. He is the CEO of SambaNova. Rodrigo, great to have you back on theCUBE. Good to see you.
Rodrigo Liang
>> Thanks for having me.>> So this AI infrastructure thing is hot, okay? And we're seeing that the constraints are compute, storage, networking, and data. I don't say database, but there's multiple databases. So you're in the hottest area right now. Talk about the momentum you guys have in context to where the market is.
Dave Vellante
>> Right.>> Because right now, the agents are waiting to be unleashed on the world.
Dave Vellante
>> Right.>> But that's going to be predicated on what horsepower is behind it.
Rodrigo Liang
>> Yeah, so I mean, this is really exciting. I think over the last few years, we saw the world train models and go into this experimentation, trying to figure out what models to run. Now, over the last 12 months, what we're seeing is everyone's moving to inferencing. How do you deploy these models to drive value into the business? How do we do it across a broad range of industries, right, and governments? And so it's incredibly exciting. Every day, I wake up, I see some new use case, and I think the world's ready for this transition.>> It's been great to follow your momentum. I notice you've got a prop there. I'd love to take a look at that.
Rodrigo Liang
>> Yeah, this is our->> Show it up and explain what it is....
Rodrigo Liang
>> this is our Gen Four. This is our Generation Four RDU, a reconfigurable dataflow unit. What we decided to do is create silicon that matches the way that these neural nets want to run. The neural nets are dataflow by construct, and so this was an idea that was researched by my co-founders at Stanford, and we've now commercialized it after eight years, and so we're really excited. The net net of it: as these models are getting bigger, we're able to run this 10 times faster than an NVIDIA chip at 1/10th the power, 1/10th the power. And we'll talk more about power because I do think that as AI grows, that's going to be the constraint for how you scale AI.
Dave Vellante
>> And the use case is primarily inference? Is that right?
Rodrigo Liang
>> Well, a lot of people use this for inference because the power efficiency for this product is significantly better.
Dave Vellante
>> Right.
Rodrigo Liang
>> But fine-tuning, if you have private data you want to train into these models, it's fantastic for that as well.>> So I just saw the use case. You guys are using that chip for your own cloud, and are you selling it to others to embed into other systems?
Rodrigo Liang
>> Today, we sell them in racks, these fully integrated racks. So if you're an enterprise and you have data privacy, data security concerns, you don't want your prompts going out into the public domain. We'll roll the racks into your secure environment, into the same subnet that you already have, all certified with your security, and we're running the best models inside your own...>> You've got all the hot buttons for us because you hit so many things we cover: private AI, on-prem infrastructure. Startups are trying to get in there. But the big thing I want to get into with you first, before we get into some of the software side, is that with the advances in first token out, and then with reasoning, and now reinforcement learning with human feedback, you're starting to see the demand for tokens from the application layers so high that it's causing enterprises and even the big clouds to look at architecture change.
Rodrigo Liang
>> Right.>> This is where you fit in because it's energy bounded. The success of these large scale data centers is bounded by energy. So the energy is the constraint.
Rodrigo Liang
>> 100%.>> Talk about why this fits into that and then also the data flows, how that matches how tokens will be used.
Rodrigo Liang
>> Yeah, 100%. What we're seeing here is there's a disconnect between the user base and the people who are offering those services because the user base is using more and more tokens, as you discussed. Here's an interesting data point. When chat started, you put in a prompt and it responds. That's about 3,000 tokens per prompt on average. Once you get to reasoning, it's not 10x more. It's 100x more. People are generating 20- or 30-page documents. And what's scarier is agentic systems: another 10 to 100x more per prompt, right? And so this is kind of what we have to->> And the consequence of that is large-scale system demand, and more power is what has to be delivered. Is that right?
Rodrigo Liang
>> Exactly. The KPI you need to think about is tokens per user, per watt. Those are the three things: how many tokens you can produce, how quickly, for how many concurrent users, per unit of energy. So at SambaNova, we've created this new architecture to handle a very large number of users concurrently at very, very high throughput because as these token counts increase, you need to do it really, really efficiently and then drive the power way, way down, because ultimately, we're going to see that the data centers and the power at data centers are going to be the constraint.
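The per-prompt multipliers and the tokens-per-user-per-watt KPI Liang describes can be sketched as quick back-of-the-envelope arithmetic. The 3,000-token baseline and the 100x/10-100x multipliers come from the conversation; the rack wattage, throughput, and user counts below are illustrative assumptions, not vendor specs.

```python
# Token demand per prompt, using the multipliers quoted above.
BASE_CHAT_TOKENS = 3_000   # simple chat exchange, per the conversation
REASONING_X = 100          # reasoning: "it's 100x more"
AGENTIC_X = (10, 100)      # agentic: "another 10 to 100x more per prompt"

reasoning_tokens = BASE_CHAT_TOKENS * REASONING_X
agentic_range = (reasoning_tokens * AGENTIC_X[0],
                 reasoning_tokens * AGENTIC_X[1])

# The KPI Liang names: tokens per user per watt. All inputs to the call
# below are hypothetical placeholders for illustration only.
def tokens_per_user_per_watt(tokens_per_sec: float, users: int,
                             watts: float) -> float:
    """Throughput delivered per concurrent user per watt of rack power."""
    return tokens_per_sec / users / watts

kpi = tokens_per_user_per_watt(tokens_per_sec=200_000, users=1_000,
                               watts=10_000)

print(f"reasoning: {reasoning_tokens:,} tokens/prompt")            # 300,000
print(f"agentic:   {agentic_range[0]:,} to {agentic_range[1]:,}")  # 3,000,000 to 30,000,000
print(f"KPI:       {kpi} tokens/sec per user per watt")            # 0.02
```

The point of dividing by both users and watts is that raw throughput alone hides the cost side: two racks with identical tokens-per-second differ sharply on this KPI if one draws an order of magnitude more power.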
Dave Vellante
>> And to your point, it's not 10x. It's 100x. We've heard Jensen say we underestimated demand by 100x. A lot of that was training. Does that ripple through to inference as well as you do reasoning and other use cases, whether it's on-prem or at the edge?
Rodrigo Liang
>> Well look, over the next three to five years, I think you're going to see inference become 90-plus percent of everything we do, right? In the world of search, training is creating a new search algorithm. Inferencing is using the search, right? And so all of us will be doing inferencing, and so that's 95%. And I think you're going to see the use cases accelerate, and as these models and agents get deployed, I think we're going to start losing track of where it's all...
Dave Vellante
>> And you're building systems for the cloud. You could build them for on-prem. One of the trends that we've seen in talking to a lot of the banks and other companies, the big New York-based companies and others, is they're not going to put all their data into the cloud. They're going to bring the AI to the data, but they don't have an on-prem AI stack, so they're building them. Presumably, they're building with folks like you, and they need to build out an entire stack; today, much of it is just hardware.
Rodrigo Liang
>> Right.
Dave Vellante
>> So what are you seeing in terms of that trend?
Rodrigo Liang
>> That's fantastic because I think there are three blockers today. They have data privacy and security problems, and it's hybrid anyway. They have a lot of very critical data on-prem, and so they need to figure out how to monetize that data. What's public is public. You've got lots of options. But for what's private, you have three problems. One, you need a large model, right? And today, OpenAI and some of these folks have large models, but the open source models are getting really, really good. And SambaNova, we're number one on the very big models. Second is their data centers don't have enough power.
Dave Vellante
>> Right.
Rodrigo Liang
>> If you look at going on-prem, they don't have a gigawatt data center. Even the largest banks don't. And so SambaNova, deploying a rack at 10 kilowatts compared to 140 kilowatts for NVIDIA: 10 kilowatts goes into most of your existing data centers. And then the third thing is the ability to do multi-tenancy, which allows a large enterprise, with a very small footprint, to host hundreds if not thousands of users concurrently.
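The power argument here is simple division. As a sketch: the 10 kW and 140 kW per-rack figures are the ones quoted in the conversation, while the row power budget is a made-up example, not a statement about any real facility.

```python
# How many AI racks fit a fixed power budget, per the figures quoted above.
ROW_BUDGET_KW = 200        # hypothetical budget for one data-center row
SAMBANOVA_RACK_KW = 10     # per-rack figure quoted in the conversation
DENSE_GPU_RACK_KW = 140    # per-rack figure quoted in the conversation

racks_low_power = ROW_BUDGET_KW // SAMBANOVA_RACK_KW   # whole racks that fit
racks_dense_gpu = ROW_BUDGET_KW // DENSE_GPU_RACK_KW

print(f"{racks_low_power} low-power racks vs {racks_dense_gpu} dense GPU rack(s)")
```

Under these illustrative numbers the same row hosts 20 of the low-power racks but only one of the dense ones, which is the substance of the "goes into most of your existing data centers" claim.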
Dave Vellante
>> And those data centers are air-cooled, for the most part. The question that people are asking is do we want to put in the CapEx to go water-cooled? Now, what's the story here? Do you run a hybrid? Do you run water-cooled, air-cooled, a combination?
Rodrigo Liang
>> 100%. It's 100% standard air-cooled. And so 10 kilowatts, you roll in. It's a standard 19-inch rack with 42 RU, standard Ethernet, standard Kubernetes, standard Red Hat Linux. It's all standard, standard, standard because we designed this for the enterprise. And if I look at the data center there, most people don't want to upgrade the enterprise with a huge amount of CapEx. What they want is the latest AI, but in the environment they already have, and so that's what we did this for and->> Well, the other issue is existing infrastructure. The time it takes to refactor a data center to handle a water-cooling solution that's not on an OCP rack.
Rodrigo Liang
>> Right.>> That's crushing to the time to value....
Rodrigo Liang
>> that's right. Exactly right.>> That's a huge issue.
Rodrigo Liang
>> Not every deployment goes like this, but we had one customer where, from the time the machine landed on site to the time they were actually able to chat with a very large model, it was 45 minutes in that particular case. And so it's not every case that's like that, but the ability for you to actually roll something in, deploy it, and now you have the largest models in the world running in your own data center? That's time to value.
Dave Vellante
>> How are organizations testing all this on-prem AI stuff? They're building out their own stack. Do they have the skill sets? Are they cobbling together sort of the rest of the stack? How are you helping there?
Rodrigo Liang
>> Yeah, I mean, I think what you've seen is a lot of people have, over the years, bought their own NVIDIA cluster and are training models and things like that. One of the things that SambaNova is doing today is we give you access to cloud.sambanova.ai. You get all of the experimentation, all the models, some of the best open source models, the time to first token that you're talking about, all of those things. Once you're comfortable and you've done a lot of the software work, that same image just gets transplanted and dropped into your own private data center. So what happens is that time to market, your ability to lower your risk by doing the software development on our cloud, on our dime, before you have to make that investment, becomes a great enabler.>> So take me through this because I see you guys. I've been following you guys, been loving the progress, but you're hitting on a new kind of dynamic. The agents, I mean, if I was an enterprise, I'd be like, "Okay, you're my agent infrastructure." I can see that. So take me through how I would do that because ease of use is also another issue, helping the startup understand where to play in the stack. I have an ecosystem, if I'm an enterprise. I just want to get this done. I've got this POC backlog going on. I'm busy. You've got the suite. You've got a studio model. How do I integrate? Take me through the integration because I want to go fast. What's the playbook?
Rodrigo Liang
>> Yeah, so I mean, this is straightforward. So if you go to SambaNova cloud, you'll already see the best open source models there. Why do we do open source? Because most enterprises want to actually train their private data into it, but they want to own that resulting model in perpetuity, right? If I don't have a model that I can give to you, you don't get to own the resulting model of your private data. So that's number one: ownership, model ownership.>> And that's secure to them?
Rodrigo Liang
>> Secure to them because we put that->> In your cloud....
Rodrigo Liang
>> in our cloud and once we move on-prem, that's forever in their own environment.
Dave Vellante
>> But no leakage in your cloud. You fenced that off. That's their data.
Rodrigo Liang
>> Completely secure. Their data, their agents train there, and they own that model in perpetuity. And so the first thing is we give you 95% of what it takes to actually create an agent, all pre-done on SambaNova, because we take those open source models and make it all ready. Then we give you the ability to take your private data and fine-tune it in. Now you have a very custom model that understands your business better than anything.>> You've just built the algorithms into the software.
Rodrigo Liang
>> That's right. And now we have an orchestration layer that allows you to host hundreds of agents within a single rack and swap them in a millisecond.>> Oh, you went too fast. Okay, so I get the cloud. I'm onboarding. I'm training. I'm building my IP around my private data. Do I then buy a rack of SambaNova systems? What am I doing on-
Rodrigo Liang
>> So the first thing most people will do is they'll come to SambaNova cloud. So you go cloud.sambanova.ai. You go on there and you can see all the open source models. You've trained your private models. Now you have them. Next thing they do is they use the SambaNova cloud and start experimenting. Let me string these agents together in this way because I want to make sure that this workflow actually does what I want it to do because it's calling various different models at any given prompt. And so you get to come on SambaNova cloud in a very secure instance of it and you do all of that work. Once you're happy with it, then what happens is, "Okay, well because of regulation, because of certification, because of my policies, I have to run it on-prem. I don't get to operate this...">> So they buy a system from-
Rodrigo Liang
>> And they buy a system or they subscribe to SambaNova. And so SambaNova, two ways of...>> On the cloud side....
Rodrigo Liang
>> on-prem too. You can come in and say, "I want to do SambaNova Suite," which is hardware and software included on a monthly subscription. Just roll in n number of racks, and SambaNova will operate and manage all of the models on my behalf.>> Yeah, that's good.
Dave Vellante
>> And spin it up in the cloud and then bring it into on-prem.>> So I'm buying a rack. I got to make room in the data center for a rack or just the boxes or what? Take me through this.
Rodrigo Liang
>> A single 19-inch rack, as you see, and the chips are all fully integrated in there. It's air-cooled; you plug in standard Ethernet. There's no custom networking. It's standard Kubernetes, standard Red Hat Linux. And so your sysadmin people know how to connect it to your data systems, all that. And then what you do is pull up the same SambaNova cloud environment and fire up your agent, and the agent then points to all...>> You're building an agent factory.
Rodrigo Liang
>> Exactly. It's the same. Whether that's a token factory for very large models or the agentic factory for these enterprises, basically the construct is the same. You need to provide these customized models and very, very efficiently deploy them in production.
Dave Vellante
>> And the appeal is you can get there fast. You don't have to retool your data center. What's your go-to-market? What's your route to market? How can you describe that?
Rodrigo Liang
>> Yeah, yeah. So I mean, a set of different customers. So the first thing that most people do, the developers, whether you are a SaaS company in Silicon Valley or an AI development department inside a large bank, it doesn't matter. Most people come to SambaNova's cloud and start developing their own user applications, so that's very quick. And very, very quickly, you can start consuming tokens. You can come to SambaNova cloud and say, "Well, I would like to do a consumption-based model." So you purchase tokens and start using tokens on our cloud for the computing, and that allows you a very low cost of entry because you can pay as you go. Very quickly, though, as they go into production, people start realizing, "Well, there are data privacy and security concerns, or national sovereignty concerns." In some countries, you cannot expose that data to parties outside the country. For whatever reason, they start having needs to deploy their own cluster. And that's when we come in and we say, "Okay, we'll deploy the cluster." The experience is identical, right? But now I've solved the security, the privacy, the sovereignty issues because I will deploy wherever you want.>> You're a private AI solution. You're also an agentic pathway for the agentic infrastructure, as well as general purpose tokens for any kind of...
Rodrigo Liang
>> For large models. Yeah, exactly. What we want to do is take the enterprise... everyone's going through this transition, but they don't want to invest in the large machine learning expertise that, say, a Meta or an Apple has, right? And so these companies want to get the benefit of AI.>> They want their own specialty models for their business. What's the relationship with the bigger models? Because you said you can get a custom model with open source. Are they fine-tuning off them? Are they distilling off them? Because the small models, we heard Jamie Dimon at Databricks last week saying, "Hey, we're doing a ton of small models too." So they just want their own stuff.
Rodrigo Liang
>> Yeah, yeah, yeah. Well, look, here's what I think: the small models are great, and great for agents. We do those, and we can host thousands of these agents in a single rack and swap them, so that's an efficiency thing. But what you're seeing with these reasoning models is the bigger the model, the better. What's the best reasoning model today? It's OpenAI's GPT-4 with 1.6 trillion parameters, right? And now you see with the Llama models, the very large models, the DeepSeek models, as these models are getting bigger, the quality of them is getting really, really good. And so I can give you a base open source model that is a reasoning model, a very large one, say 600 to 700 billion parameters, and I'll let you fine-tune that.
Now, that fine-tuned version of it is a knowledge base that knows your business better than anybody else. Anybody else.
Dave Vellante
>> And that data's never going to seep into the open internet, the LLMs.
So is the strategy to scale the SambaNova cloud or is it to use the SambaNova cloud as a test bed, bring it to sovereign cloud, or both?
Rodrigo Liang
>> Our view is that we want to allow you to interact with us wherever your needs are. What we're seeing is where we fit most is when people start on a cloud just to understand it, but very quickly need to solve for data privacy concerns, which is often on-prem or in a sovereign cloud.>> All right, so I have to ask, this is a good thread. By the way, thanks for sharing. That's a new master class. I think you're perfect for the agent wave; we've been focused on that. But the question of workloads comes up, Rodrigo, so talk about that because now, I've got models.
Rodrigo Liang
>> Right.>> I got my proprietary model, but now I want to test it against a workload. Now, I have to factor in security, governance, all those things, and my partner ecosystem that I want to plug into it. Now, we're in the enterprise IT kind of motion.
Rodrigo Liang
>> That's right.>> I mean, quotes around that because it's more AI motion now, but what's your vision on that? How do you see that playing out? What are your customers doing? I mean, I can see them ingesting everything, all the documents and becoming a learning model for the enterprise, but now I've got to run a workload.
Rodrigo Liang
>> Right.>> How do you play in that?
Rodrigo Liang
>> Well, I think what you're going to see is, well, two things. One, we partner with a lot of the application providers. For Cogent, for example, we have partners that come in and actually create a great gen AI workflow, but behind it, it's calling our best models, right? And so what we do with the ecosystem is we sign on as many different users and partners as possible and we give them the best model. Their trust in us is that when the next model shows up next month, SambaNova takes care of it and you have access to it. Llama 5 or DeepSeek 2, whatever comes up, if you want the best model and the fastest, we'll provide it underneath. Their application, their customer base, doesn't need to know about it.>> So you're basically taking on the abstraction of owning the infrastructure layer. You're focused on optimizing that layer with the tooling, with Studio, to handle the models. And so if things like MCP rise, you really leverage that.
Rodrigo Liang
>> That's right. We're all open source, and so what we want to do is give you an API interface, and that API interface can be at the prompt level or it can be down at the PyTorch level, but these are going to be open source tools that we leverage because our customers and our partners all want to use them.>> But you're not dependent upon anything above you. You will leverage whatever's happening in the market that's getting momentum, whether it's MCP, which has been a great win this year for the industry.
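A "prompt-level" API of the kind Liang describes is typically an OpenAI-compatible chat-completions endpoint. A minimal sketch follows; the base URL, model id, and placeholder key are illustrative assumptions, not endpoints confirmed by the conversation.

```python
import json
import urllib.request

# Assumed endpoint for illustration; check the provider's docs for the real URL.
API_URL = "https://api.sambanova.ai/v1/chat/completions"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build a prompt-level chat-completion request for an
    OpenAI-compatible API (POST with a JSON body and bearer auth)."""
    payload = {
        "model": "Meta-Llama-3.1-8B-Instruct",  # example model id
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Construct (but don't send) a request; sending would need a real key:
#   urllib.request.urlopen(req) and read choices[0].message.content
req = build_request("Summarize our latest filing in two sentences.", "sk-demo")
print(req.full_url)
```

Because the interface is the widely used chat-completions shape, the same client code can point at a hosted cloud during experimentation and at an on-prem rack later by changing only the base URL, which matches the cloud-to-on-prem flow described earlier in the conversation.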
Rodrigo Liang
>> That's right.>> And better tooling that comes out from code generation or some sort of cool thing.
Rodrigo Liang
>> Yeah, that's right. And you look at these agentic platforms that are coming too, right? You look at kind of these different open source platforms that have come that people are developing on. We want to lean into all of those things because it's what the world wants to use.
Dave Vellante
>> And as it pertains to the models, you said the best models. You are responsible for vetting those models, for making sure they're safe and so on, right? You do that work. How-
Rodrigo Liang
>> Well, so what we do here at SambaNova is give people options. And so one of the things that you'll see on SambaNova cloud... people always ask me, "Well, which are the most popular models that people want?" Well, the-
Dave Vellante
>> It depends on what data is....
Rodrigo Liang
>> well, yeah, it depends on what the data is. But SambaNova, we're agnostic. So people look at SambaNova and say, "There are 25 models up there. Why those?" It's survival of the fittest, right? We put the models up there based on what people want, but we want to see users use them. If people aren't using a model, we pull it down and put the next one up, because we want to make sure that our customers get full visibility, not from a guess, but empirically driven. This is what people are using.
Dave Vellante
>> So you want to give them as much choice for the ones that are being consumed.
Rodrigo Liang
>> That's right. That's right. And then as they see these are the popular models, what happens is the ecosystem behind them, the people who build applications using those models, is also following. So to some extent, they trust that what we are offering is really a representation of the market.>> Yeah. Rodrigo, business is good, it sounds like. Give us some updates on revenue, headcount, valuation, money raised. No, I mean, you can talk about revenue, but if you could, share the momentum on the business front because I think this is at a tipping point now. People want to know who's got the tailwind and is actually performing. There's a lot of evaluation going on. You guys have a good solution. What are some of the stats on the business front, and what's your goal for the year? What are you optimizing for? What's your focus?
Rodrigo Liang
>> Yeah. Well, we're definitely focused on building these inference clouds across the world. We're in 15 countries today, and you see us deploying in some of the densest environments. So earlier this year, we deployed a sovereign cloud in Tokyo with our partner SoftBank. And this is the beauty of it. We can actually deliver these racks to our partners. This is our partner's cloud, but we give them a lot of the infrastructure that's required to build a sovereign service. But here's what you get. As soon as our Japanese partners are actually deploying the cloud, the comfort level of Japanese enterprises to run on it is significantly...>> They're localized.
Rodrigo Liang
>> They're localized. It's in Japanese. It's somebody that they know. It's an enterprise that's been around in the market for a long time. And then the thing that we did that's really, really important, because our power efficiency is so much better than everybody else's: we dropped it in downtown Tokyo. Downtown Tokyo. I don't need to go into the boonies looking for a gigawatt data center. I want to deploy an inference cloud for mission-critical applications where your latency requirements are low, where you have your user bases. I want to deploy where that is, and it turns out most of those data centers in very, very dense places are energy constrained, right? And so that becomes->> Your breakthrough is the energy piece, the chip, the suite, the cloud onboarding, ease of use, the whole package. Is that right?
Rodrigo Liang
>> Yeah, exactly. We can give you an inference cloud in a very energy efficient environment wherever you want. And so now, it's a hybrid world anyway. You're going to have your hyperscalers and do those things in a hyperscale environment, but for the needs you have that are private, secure, and that you want to own, we can give you a very efficient platform to run that.
Dave Vellante
>> And that example in Tokyo, essentially, you're white labeling SambaNova to your partner. Is that right?
Rodrigo Liang
>> Well, in that particular case, it's actually SambaNova Japan.
Dave Vellante
>> It is.
Rodrigo Liang
>> Yeah, SambaNova Japan, partnered with SoftBank. But you've got some of the best Japanese enterprises, I mean, large enterprises, coming on and using it and developing. I mean, it's just an amazing environment. Of course, we're already in the Middle East, where we're building a really exciting project. We're just starting here in Texas, and so there are several things we're doing that are allowing people to get access very, very quickly.
Dave Vellante
>> Well, we should talk about theCUBE Club.
Rodrigo Liang
>> Yeah. Yeah, we should. We should.
Dave Vellante
>> Rodrigo, thank you for coming on theCUBE, and again, congratulations. I love what you guys are doing. Again, around you, everyone else is innovating. We've heard some storage folks with high bandwidth memory and solid state. They're lowering the power envelope by getting closer to the chips. So you're starting to see everyone around you. So it's only going to be a matter of time before we have one in our home.
Rodrigo Liang
>> Yeah, yeah, absolutely. Super exciting time.>> In the basement. Thanks for coming on. Yeah, there's power there. I'm John Furrier, with Dave Vellante, here in Palo Alto for our three days of Robotics and AI Leaders, sharing the breakthroughs that are going to make the difference between success and failure in this new modern AI era. As the agents, the software, and the infrastructure players are delivering the value, they're all here with us. Thanks for watching.