Charlie Boyle, NVIDIA
In this segment from theCUBE + NYSE Wired’s “AI Factories – Data Centers of the Future” series, theCUBE’s Dave Vellante sits down with Rob Biederman, managing partner at Asymmetric Capital, to unpack a disciplined approach to early-stage investing amid AI-scale infrastructure shifts. Biederman explains Asymmetric’s founder-first model: writing $1–$10M checks (often via SAFEs), joining boards as they form and helping operators with go-to-market, operations, finance and strategy (not product/engineering). He shares why the firm avoided 2021’s lofty SaaS multiples in favor of backing proven builders earlier (single-digit pre-money), and highlights portfolio execution such as a cash-efficient LATAM e-commerce company scaling from ~$1-2M to about $50M in revenue. The discussion also explores Asymmetric’s subscale buy-and-build plays (e.g., pool cleaning in San Diego, sleep apnea clinics in Houston), where density, tech-enabled services and platform ops expand margins and enterprise value.
Biederman weighs in on AI economics as enterprises race to “AI factories,” cautioning that not every AI workload creates ROI and that overbuilt compute assumptions could face a reckoning. He argues that winners will prove a clear 10× value equation and avoid scaling go-to-market before product-market fit. Additional insights include early liquidity discipline (returning $0.20 on the dollar before the fund’s third anniversary), portfolio survivability (34 of 35 companies still operating; three positive exits), and guidance to founders: make your value proposition relevant, credible and differentiated. Tune in for candid perspective on how capital efficiency, ownership discipline and anti-thematic sourcing intersect with a world where GPU-dense data centers and AI-scale software are reshaping enterprise infrastructure and economics.
>> Hi, everybody. Welcome to theCUBE Special Coverage on the ground here at GTC San Jose 2026. My name is Dave Vellante. I'm here with Charlie Boyle, who is the vice president of DGX at NVIDIA. Big week, Charlie. Congratulations on all-
Charlie Boyle
>> Thanks, Dave....
Dave Vellante
>> all the announcements and the innovation. For people who weren't able to watch all the coverage, what's the most important takeaway that you want to leave people with from GTC?
Charlie Boyle
>> I think the biggest takeaway is the Vera Rubin platform, all the new racks that we've announced, all that working together in an AI factory to dramatically lower the cost of delivering tokens, because the more tokens that you have, the more business you can create, the more opportunities that you can have out there. And with this explosion of the Claw ecosystem, the new applications that you could only ever imagine, I can now build. And now with the power of Vera Rubin and Vera itself, I can now deliver those as a business to drive brand new sources of revenue that were never possible before.
Dave Vellante
>> I think, obviously, everybody's familiar with Moore's Law, but I don't think people appreciate the trajectory of the curve that we're on now. You're basically delivering orders-of-magnitude improvements in performance literally every year. Is that right? How do you do that? How should we be thinking about the importance of that going forward?
Charlie Boyle
>> It's totally accurate and just a testament to not only the great hardware designers that we have, but it's a ton of software work. And part of that, we said Vera Rubin is going to be 35x faster than Grace Blackwell. And we fully believe that. And even in the keynote we were talking about when Jensen put up 35x last year and everyone said, "Ah, you're never going to get that."
Independent tests showed we were 50x. And part of that is because of the software innovation; we get better every year. But that 35x allows customers, allows companies, to do things they couldn't do before. If you can think it, you can build it. And now with OpenClaw, just you and I having an English conversation with a prompt can build an agent to get real work done for us. And that real work can turn into innovation, can turn into new products, can turn into new revenue for folks.
Dave Vellante
>> So we've gone from sort of transistor density on a chip to extreme co-design as really the ... Is that the enabler that allows you to be on this 12-month cadence?
Charlie Boyle
>> It is. I mean, it's an absolutely enormous lift. And Jensen a few years ago said, "We're going to go on a one-year cadence." And everyone's like, "Are you sure?" We did it. We do it every year. We're going to keep doing it every year. And a big part of that is because the software is compatible. This is the 20-year anniversary of CUDA. This is the 10-year anniversary of DGX. Software that was running on that original DGX-1 back in 2016 runs on Rubin today. And it's that compatibility, that software layer that allows us to stay on that 12-month cycle because as a developer, you just intersect what system you have in that generation and you know it's going to be faster, it's going to drive down your cost, but your software is the same. Of course, you can take advantage of new software capabilities, but that allows you to say, "I'm going to invest in this platform. The stuff I did two years ago still works. The stuff I'm going to do two years from now is still going to work."
Dave Vellante
>> Why are you, and NVIDIA generally, Jensen specifically, so excited about OpenClaw? Peter Steinberger was on the Lex Fridman Podcast, three hours of going deep into how he did it. It's just absolutely remarkable. But why is that so exciting to you, and what should people know about it?
Charlie Boyle
>> So all of us, we've all been in the technology business for a lot of years, and I'm sure a lot of your audience has too. Every single person viewing this, I'm sure, has had an idea. I wish I had an application that did X. I wish I had something that did Y. But most of us as business leaders, even ones that have software developers working for us, you look at that and go, "Oh, I'm not going to bother them. It's going to take a while."
>> But now with a thought, I could say, "Oh, I'm going to ask the Claw to do something for me, build me something." And that may change the way I work. Now, I may ask it to build me a report or something. We build and ship systems all over the world: analyze a bunch of this data for me. But then I might show that to my colleagues and they say, "Well, hey, everyone in sales and marketing needs that, so now we need to scale it." So for that one little thing that I came up with, I still need software developers, because now I've got to optimize it. I've got to drive the cost down. I've got to drive the token economics up on it. And it makes it possible to take ideas that you thought of and threw away because you were like, "I don't know if it's going to work." Now I can try that idea, and that one idea could change the direction of your business.
Dave Vellante
>> NVIDIA's always kind of been a household name, Charlie, because as Jensen said, we as parents funded our kids to get fast GPUs for their gaming. But take us back 20 years to the launch of DGX. When you first started with the company, what's that journey been like and help the audience understand where you've come from and where you're going, how you got here.
Charlie Boyle
>> Yes. I'm not that old. It's the 20th anniversary of CUDA, 10-year anniversary of DGX.
Dave Vellante
>> Yeah, sorry.
Charlie Boyle
>> No, it's a lot of numbers, a lot of memories. In this very San Jose convention center, 10 years ago, we showed the world the first DGX-1 behind velvet ropes; we didn't want anyone to touch it. And it's funny, back then the number one customer question was, "What could I possibly need eight GPUs for?" Listening to that now, it's hysterical. Today you don't want to hire a software developer unless it's like, "Hey, I've got to have 100 GPUs."
>> But the vision then, and the reason we built it then, is still the reason we're building DGX and our systems now: take the latest NVIDIA technology, build a vertically integrated stack on it, make that easy for customers to use, and share it with all of our partners as a reference design so they can expand and scale with us. And it's still true today. Back then, it took our partners a year or more to get to market. Now, every single partner we have is on the same time to market: the same day we're shipping software, they're shipping systems. And that's what Jensen was talking about. All of our partners are building gigawatts' worth of infrastructure rolling off their factory lines every single month.
Dave Vellante
>> What's the enabler there? You're obviously very transparent about your roadmap. You talk about how you're vertically integrated, but also horizontally open. Explain that.
Charlie Boyle
>> So part of that vertically integrated piece is you have to build it all together so that you know it works. One of the things that you never see behind the scenes: before we ship one Vera Rubin rack, I've invested, Jensen has invested, billions of dollars for us to build our own clusters internally for all of our engineers. They beat on that, they develop software on that. So that day one, when we ship that first production rack, when Dell ships that first production rack, when Supermicro does, we know it works. And that extreme co-design, all the engineers having access to all that infrastructure, building and testing at scale with networking, with fabric, with storage, means that when we tell you we can get you an AI factory in three weeks, in a month, it's because we've already built it. We know the design, we know the recipe; then it's just humans putting things in data centers at that point and running tests.
Dave Vellante
>> Mellanox may turn out to be one of the greatest, if not the greatest, acquisition in the history of ...
Charlie Boyle
>> Jensen feels that way. Yes. I feel that way too.
Dave Vellante
>> And I've seen many of them. I won't bore you with my top 10, but I think I would put Mellanox right up there. Can you explain the importance of that acquisition generally, but specifically the fabric and how you've evolved the fabric?
Charlie Boyle
>> And so fabric is so important. And funny enough, the original DGX-1 had Mellanox in it. The thing we shipped in 2016 had four InfiniBand cards, and then up to eight, and everything. Back in 2016, 2017, nobody was clustering anything. But in 2018, 2019, we started having the first SuperPODs. We always believed in fabric, because we always want any application, any resource you have, to be able to get to anything else in your AI factory. And that's the power of fabric. The thing is, almost every developer that's developing on an NVIDIA GPU today, whether it's on-prem or in the cloud, is using a fantastically powerful fabric that they never see, because it's the NVIDIA software, it's CUDA, it's NCCL, it's all of those things that optimize and take advantage of the fabric, but you never see it. And that's what the best fabric out there does: it gives you the power that you need, but it gets out of your way to let you do the work that you want. And that's why Mellanox has been great. More recently, not only do we have InfiniBand, but we have Spectrum-X, so we can scale out an enormous AI Ethernet data center, and that's what next-generation fabrics are getting built on.
Dave Vellante
>> So 2023, 2024, we kind of went from request retrieve chatbots, then reasoning came in, which really kind of drove demand. And now, we're going into the agentic age, multiple agents taking action. What does that mean from an infrastructure standpoint? What do you have to do to design, to serve that? And then what does it mean ultimately for the end customers?
Charlie Boyle
>> Yeah. I've been thinking a lot about analogies for these new types of agents. I just got done building a house a few years ago, and that first request, that first idea that you had, that kind of creates your general contractor agent. But then that agent has to spawn sub-agents, specialty agents. They all need access to memory. They all need storage. They all need a sandbox; we talked a lot about safety. On that same AI factory, you may be running thousands of agents, but you want to know that your agent is running securely, that my agent's running securely, and that unless we explicitly agree that we're going to talk to each other, all of our data is separate. Now, that used to be humans. Humans used to create data; we create files, save files. But now agents are creating that data quickly, quickly storing it, and then also quickly destroying it. So not only do we need the agent, the general contractor agent, all the sub-agents, all the security, but we need the storage, the data paths on that, to support that machine-speed creation, ingestion, monitoring and destruction of data.
Dave Vellante
>> So you guys nailed the compute, obviously. The InfiniBand piece of it, Spectrum-X, you've got scale up, you've got scale out, you've got scale across. Now, you've taken another step and gone deeper into the stack with storage. You're sort of re-imagining storage, re-imagining the storage hierarchy. You've got a new reference architecture. You're not competing with the storage industry. You're not selling storage per se. You're enabling the industry to think differently about how storage should work in an accelerated computing environment. Explain what you've done, and I'd like to ask you a few questions about what the implications are for the future.
Charlie Boyle
>> Sure. And I think you hit some of the highlights there. As we think about the agentic workflow, all the storage partners that you saw supporting the release, I started talking to them about STX around the end of last year. They're all in, because this doesn't replace anything we're doing in system of record. It doesn't replace high-performance storage, it doesn't replace object storage. It's: what does an agent need to do with storage? Now, part of that is, just like many things, Jensen loves to invest in something that he thinks is going to be big. And so that's why, actually, I'm the pick of that program for NVIDIA. That's part of the reason we're talking. But as he thought about that, he said, "We need to build one reference. We need to have something that is super powerful on the network fabric. That's your BlueField-4, that's your CX9s. And storage needs an AI-capable processor, and that's Vera."
>> And so in that storage box, not only do you have drives, not only do you have CX9s and BlueField, but you have Vera. And that enables that agentic workflow, because it's the same Vera, the same agents, but now we're moving a lot of that processing a lot closer to the physical data. And that's why all of our storage partners are super excited about it: now they can take their stack and put it on that brand new reference architecture. They didn't have to do the R&D on it. We're doing it. All the manufacturing partners are building it, but now they have a new business opportunity and a new way to help their customers with agentic workflow. Because as we said, with agents and Claw, it's all about security. All of our storage partners have been securing access controls. All that information already exists in their storage systems. Now, I can plumb that in through NemoClaw, through OpenShell, to the new STX storage with their software running on top of it. And now, as a developer, I don't have to go re-ask for all the permissions that IT has already given me over the last 10 years. It's encoded in the storage, and that's why we're super excited to tie storage so closely to this new agentic workflow.
Dave Vellante
>> So with the trajectory that you're on with GPUs, networking became the bottleneck. You addressed that. Storage then became the bottleneck. You kind of reimagined that. You've got BlueField-4 DPUs that allow you to offload data-movement tasks, which normally fell to the CPU, and that created a historical bottleneck. You've got Spectrum-X Ethernet, you've got your SuperNICs, and you're now able to bypass bottlenecks and go right to RDMA. You've got kind of a reference architecture for how you scale KV cache at rack scale. All of that is new. And as you say, the ecosystem is sort of leaning into it. What does that allow us to do that we wouldn't be able to do previously?
Charlie Boyle
>> A big part of it, and the big jump even between Blackwell and this generation: with Blackwell, when we thought about an AI factory, it was just how many Blackwell racks you have. This generation, we've created a number of new rack architectures. We talked about the Vera rack, because you need a lot more CPU for your agentic workers. We talked about the storage rack. We talked about the networking. What Vera Rubin and that whole multi-rack architecture allows us to do is really drive down that cost of token economics. And one of the things Jensen talked about in the keynote, but maybe not everyone heard: he talked about dynamic power and Max-Q, and maybe those are a lot of new concepts. The simple way to think about that: in almost every data center today, and we've been in the IT business for a long time, you provision power based on what the server said it took, and then you add a little factor to that. But in a giga-scale data factory, you're wasting so much power. In an average data center today, if you have a gigawatt of provisioned power, you're probably only using 600 megawatts. You're only using 60% of the power. So one of the things we talked about, it showed up as a beautiful simulation on screen, but it's an actual physical thing: the DSX data center design. With Vera Rubin, with all those new racks, with hardware controls and software controls built into that, you can, as a data center, as a customer, set a power limit, and we're going to stay within and hit that power limit, allowing you to use every single watt you pay for to generate tokens. Because when you go into a co-location environment, a data center environment, and you ask for a hundred megawatts, whether you use it or not, you pay for it. With this new technology and this new rack architecture, we're making it so much more efficient, but also bringing AI into that infrastructure. So it's not humans and phone calls turning knobs. It's agents turning those knobs, making that data center 100% utilized, but doing it safely. And that's why we're operating our own gigawatt data center: to prove to the world it works and to show our customers how to build that design.
Dave Vellante
>> So you're saying the grid is over-provisioned and you're able to now provide an SLA and that's going to give much better power utilization.
Charlie Boyle
>> Yeah. I mean, that's the classic electrical grid. You always have safety margins. Talk to any data center provider and tell them you want to run at 100%; they'll look at you, and it's scary. But that's what we've got to change, because we've got to take wasted power out. Everything is power limited. My house is power limited. The data center is power limited. If my data center is power limited and I'm paying all that cost, I want to use every watt that I can. And it's only possible with the Vera Rubin architecture, because we put controls all the way down to the chip level that can react faster than humans possibly can. Data in the loop, processed automatically, all the way up to AI, all the way up to human-in-the-loop things; it takes care of all of that for you. And that also drives ... Think about it: I can now get, for the same power payment, 40% more GPUs in a data center. How many more tokens can I get out? How much more does that lower my token economics? It's just fantastic.
Dave Vellante
>> Okay. So you're optimizing obviously for power. What other kind of constraints are you optimizing for? Where are we at with GPU utilization? Where are we with network utilization? Other things that we should be thinking about that you're working on?
Charlie Boyle
>> I mean, the biggest thing, Jensen talked about it: an underutilized GPU is a waste. And so that's why we've added the Vera rack in the environment, because for some tasks, you've got to wait for the CPU. Well, I'm just going to offload those so that the GPU can be running 100% of the time. I can pack more GPUs in the rack than ever before. And one of the things we talked about in that DSX architecture is Max-Q. You run the GPUs a little slower for most of your workload, which gives you the ability to run 30%, 40% more GPUs in that same power envelope, because a given workload isn't always peaking your GPU power. So I can actually boost the number of GPUs that I'm running because I'm aware of what the application's actually using. I'm never hitting the peak of GPU power, so I'm going to run more GPUs to get more workload done, optimizing the entire AI factory.
Dave Vellante
>> One of the slides Jensen showed, which he said in our private meeting was the most important slide. And he kind of joked. He said, "When the audience gets quiet, that's when I'm telling you my secrets." And they want to hear, "Oh, tell me about the next GPU." And rather, he wants to talk about this sort of new model. So I'd like you to sort of describe that. He showed throughput on the vertical axis. The horizontal axis, I guess you'd call it latency or experience; responsiveness, I guess, is the better way to think about it. And then he showed Hopper, Blackwell and Rubin, Pareto curves for each, dramatic improvements, that 35x. Historically, that vertical axis was the spend more, save more, or spend more, make more, when you're training models. Now, we're entering the inference era, and a new business model is emerging. On that horizontal axis, he has a freemium model, sort of a consumer for-pay model. And then to the right, a much more intense one, maybe it's for coding, for guys like Anthropic and OpenAI and others that are really able to monetize at scale, and you're supporting that. That is actually a new revenue dimension that's coming up. I wonder if you could explain that in some detail.
Charlie Boyle
>> Yeah. And there's a brand new business opportunity there. And part of what makes that possible, you saw in the keynote, you saw the curve: a rack we didn't talk about, our LPX rack, the great technology that we got from Groq, all working together. Because at that very end of the scale, where I need tremendous throughput and extremely low latency, it's a more expensive operation to run. But the value that you get out of that, whether that's creating new software to make quarter-end for your customers, whether it's solving your CEO's problem, whether it's making sure everything absolutely ships. There are some tasks that we do where you're like, "I don't care how much it costs. I just want to get it done." And that notion, that in the same AI factory you can have both: I can serve people at a free tier, and I can go all the way to the other end of the tier, with Vera Rubin plus the LPX rack, to solve the hardest coding problems in real time. That opens up new business opportunities that were never possible before. One of the big messages at GTC is it's not only new technology, it's not only token economics; there are new business things that you could only dream of before that weren't possible, and now, with this multi-rack architecture, with LPX at the far end of that, they're economical to do.
Dave Vellante
>> One of the things that we talk about at theCUBE and theCUBE Research, at the macro level: if you look at the percent of organizations' revenue that is spent on technology, it's about 4%. And we've made the prediction that that's going to double or even triple over the next 10 years. And as you basically scale with less labor, you're going to be tapping intelligence through tokens, through APIs, and you're going to happily spend more money on that. And so a lot of the things that you guys are talking about really resonate with us in that regard.
Charlie Boyle
>> And that's the thing. Jensen talked about it, I can't remember in which forum, but everyone used to talk about CapEx, and as a CEO, a CFO, a CIO, that was a bad thing. But now, it's a good thing. I know if I'm spending more money and I'm getting great utilization out of it, every dollar that I spend, my revenue goes up. It used to be, every dollar I spent on CapEx was like, "Oh, I've got to do a refresh. It's classic IT." We've broken that model. And so people want to spend more. They want to hire more agents. They want to get more tokens, because they know when they do that, their business is going to change. Their customer experience is going to change. They're going to grow more customers. So of course you want to invest. I'm only limited by space, power, shell availability. When you break down those things, once you have that, every dollar that I invest, I'm getting way more than a dollar back. So of course you want your CapEx to go up, your infrastructure investment to go up, and I see no end in sight on that one.
Dave Vellante
>> What Jensen said that really, again, resonated with us is that every CEO on the planet needs to understand where they fit on that Pareto curve. Are you monetizing with throughput? Are you monetizing with customer experience and responsiveness, or both?
Charlie Boyle
>> Yeah.
Dave Vellante
>> Because that is going to be the future of your revenue and you're going to happily spend to access that intelligence. That is your future business model.
Charlie Boyle
>> Yeah. I mean, you even look at software, at creating software as a business. How valuable is it if you're asking your software provider ... We've all asked this before: "Hey, could you do this feature for me?" You normally heard, "Six months, it's in the next release." But what if you could say, "Well, if you write me a check for this much money, you can have it tomorrow, because I'm using the supercharged agent to do that"? That's brand new revenue that didn't exist before. That's just one idea of millions of ideas out there of how today's companies can transform and create great new streams of revenue out of existing products they have.
Dave Vellante
>> We said in 2024 that GTC 2024 was the most important conference in the history of the computer industry. I think you keep upping your game every year, and now it's 2026.
Charlie Boyle
>> Yeah, every year.
Dave Vellante
>> How do you top this, Charlie?
Charlie Boyle
>> It gets bigger and better. We take over the city. Every session I go in is packed, everyone's so pumped up. Because not only are we bringing out new technology, but I think everyone is seeing the explosion of how they can do something with AI. Every person I've talked to, every customer is telling me what they did last year and how excited they are with all this new technology of how much more they're going to be able to do now and try new things and really grow their business and grow their teams.
Dave Vellante
>> Well, Charlie, thanks for spending some time with us. We at theCUBE are pumped up. We started on Sunday in our Palo Alto Studio with about six panels on the future of the AI factory. We had an evening event hosted by John Furrier and Brian Baumann. Thank you for watching. This is Dave Vellante for the entire CUBE team from GTC 2026 in San Jose.
>> Hi, everybody. Welcome to theCUBE Special Coverage on the ground here at GTC San Jose 2026. My name is Dave Vellante. I'm here with Charlie Boyle, who is the vice president of DGX at NVIDIA. Big week, Charlie. Congratulations on all-
Charlie Boyle
>> Thanks, Dave....
Dave Vellante
>> all the announcements and the innovation. For people who weren't able to watch all the coverage, what's the most important takeaway that you want to leave people with from GTC?
Charlie Boyle
>> I think the biggest takeaway is the Vera Rubin platform, all the new racks that we've announced, all that working together in an AI factory to dramatically lower the cost of delivering tokens, because the more tokens that you have, the more business you can create, the more opportunities that you can have out there. And with this explosion of the Claw ecosystem, the new applications that you could only ever imagine, I can now build. And now with the power of Vera Rubin and Vera itself, I can now deliver those as a business to drive brand new sources of revenue that were never possible before.
Dave Vellante
>> I think, obviously, everybody's familiar with Moore's Law, but I don't think people appreciate the trajectory of the curve that we're on now. You're basically delivering orders of magnitude improvement and performance literally every year. Is that right? How do you do that? How should we be thinking about the importance of that going forward?
Charlie Boyle
>> It's totally accurate and just a testament to not only the great hardware designers that we have, but it's a ton of software work. And part of that, we said Vera Rubin is going to be 35x faster than Grace Blackwell. And we fully believe that. And even in the keynote we were talking about when Jensen put up 35x last year and everyone said, "Ah, you're never going to get that."
Independent tests showed we were 50x. And part of that is because the software innovation, we get better every year, but that 35x allows customers, allows companies to do things they couldn't do before. If you can think it, you can build it. And now with OpenClaw, just you and I having an English conversation to a prompt can build an agent to get real work done for us. And that real work can turn into innovation, can turn into new products, can turn into new revenue for folks.
Dave Vellante
>> So we've gone from sort of density on transistors on a chip to extreme co-design is really the ... Is that the enabler that allows you to be on this 12-month cadence?
Charlie Boyle
>> It is. I mean, it's an absolutely enormous lift. And Jensen a few years ago said, "We're going to go on a one-year cadence." And everyone's like, "Are you sure?" We did it. We do it every year. We're going to keep doing it every year. And a big part of that is because the software is compatible. This is the 20-year anniversary of CUDA. This is the 10-year anniversary of DGX. Software that was running on that original DGX-1 back in 2016 runs on Rubin today. And it's that compatibility, that software layer that allows us to stay on that 12-month cycle because as a developer, you just intersect what system you have in that generation and you know it's going to be faster, it's going to drive down your cost, but your software is the same. Of course, you can take advantage of new software capabilities, but that allows you to say, "I'm going to invest in this platform. The stuff I did two years ago still works. The stuff I'm going to do two years from now is still going to work."
Dave Vellante
>> Why are you, and NVIDIA generally, and Jensen specifically, so excited about OpenClaw? Peter Steinberger was on the Lex Fridman Podcast, three hours of going deep into how he did it. It's just absolutely remarkable. But why is that so exciting to you, and what should people know about it?
Charlie Boyle
>> So all of us, we've all been in the technology business for a lot of years, and I'm sure a lot of your audience has too. Every single person viewing this, I'm sure, has had an idea: I wish I had an application that did X. I wish I had something that did Y. But most of us as business leaders, even ones that have software developers working for us, look at that and go, "Oh, I'm not going to bother them. It's going to take a while."
But now, with a thought, I could say, "Oh, I'm going to ask the Claw to do something for me, build me something." And that may change the way I work. I may ask it to build me a report on something: we build and ship systems all over the world, analyze a bunch of this data for me. But then I might show that to my colleagues and they say, "Well, hey, everyone in sales and marketing needs that, so now we need to scale it." So for that one little thing that I came up with, I still need software developers, because now I've got to optimize it. I've got to drive the cost down. I've got to drive the token economics up on that. And it just makes it possible to try ideas that you thought of but set aside because you weren't sure they would work. Now I can try that idea, and that one idea could change the direction of your business.
Dave Vellante
>> NVIDIA's always kind of been a household name, Charlie, because as Jensen said, we as parents funded our kids to get fast GPUs for their gaming. But take us back 20 years to the launch of DGX. When you first started with the company, what's that journey been like and help the audience understand where you've come from and where you're going, how you got here.
Charlie Boyle
>> Yes. I'm not that old. It's the 20th anniversary of CUDA, 10-year anniversary of DGX.
Dave Vellante
>> Yeah, sorry.
Charlie Boyle
>> No, it's all good. A lot of numbers, a lot of memories. In this very San Jose convention center, 10 years ago, we showed the world the first DGX-1 behind velvet ropes; we didn't want anyone to touch it. And it's funny, back then the number one customer question was, "What could I possibly need eight GPUs for?" Listening to that now, it's hysterical. Now you can't hire a software developer unless it's like, "Hey, I've got to have 100 GPUs."
But the vision then, and the reason we built it then, is still the reason we're building DGX and our systems now: take the latest NVIDIA technology, build a vertically integrated stack on it, make that easy to use for customers, and share it with all of our partners as a reference design so they can expand and scale with us. And it's true today. Back then, it took our partners a year or more to get to market. Now, every single partner that we have is time-to-market: the same day that we're shipping software, they're shipping systems. And that's what Jensen was talking about. All of our partners are building gigawatts' worth of infrastructure rolling off their factory lines every single month.
Dave Vellante
>> What's the enabler there? You're very obviously transparent about your roadmap. You talk about how you're vertically integrated, but also horizontally open. Explain that.
Charlie Boyle
>> So part of that vertically integrated piece is you have to build it all together so that you know it works. One of the things that you never see behind the scenes: before we ship one Vera Rubin rack, I've invested, Jensen has invested, billions of dollars for us to build our own clusters internally for all of our engineers. They beat on that, they develop software on that, so that day one, when we ship that first production rack, when Dell ships that first production rack, when Supermicro does, we know it works. And that extreme co-design, all the engineers having access to all that infrastructure, building and testing at scale with networking, with fabric, with storage, means that when we tell you we can get you an AI factory in three weeks, in a month, it's because we've already built it. We know the design, we know the recipe; then it's just humans putting things in data centers at that point and running tests.
Dave Vellante
>> Mellanox may turn out to be one of the greatest, if not the greatest, acquisitions in the history of ...
Charlie Boyle
>> Jensen feels that way. Yes. I feel that way too.
Dave Vellante
>> And I've seen many of them. I won't bore you with my top 10, but I think I would put Mellanox right up there. Can you explain the importance of that acquisition generally, but specifically the fabric and how you've evolved the fabric?
Charlie Boyle
>> So fabric is so important. And funny enough, the original DGX-1s had Mellanox in them. The thing we shipped in 2016 had four InfiniBand cards, and then up to eight, and everything. Back in 2016, 2017, nobody was clustering anything. But in 2018, 2019, we started having the first SuperPODs. We always believed in fabric, because we always want any application, any resource you have, to be able to get to anything else in your AI factory. And that's the power of fabric. The thing is, almost every developer that's developing on an NVIDIA GPU today, whether it's on-prem or in the cloud, is using a fantastically powerful fabric that they never see, because it's the NVIDIA software, it's CUDA, it's NCCL, it's all of those things that optimize and take advantage of the fabric, but you never see it. And that's what the best fabric does: it gives you the power that you need, but it gets out of your way to let you do the work that you want. And that's why Mellanox has been great. Now, not only do we have InfiniBand, but we have Spectrum-X, so that we can scale out an enormous AI Ethernet data center, and that's what next-generation fabrics are getting built on.
Dave Vellante
>> So 2023, 2024, we kind of went from request retrieve chatbots, then reasoning came in, which really kind of drove demand. And now, we're going into the agentic age, multiple agents taking action. What does that mean from an infrastructure standpoint? What do you have to do to design, to serve that? And then what does it mean ultimately for the end customers?
Charlie Boyle
>> Yeah. I've been thinking a lot about what the right analogies are for these new types of agents. I just got done building a house a few years ago, and that first request, that first idea that you had, that kind of creates your general contractor agent. But then that agent has to spawn sub-agents, specialty agents. They all need access to memory. They all need storage. They all need a sandbox, and we talked a lot about safety. On that same AI factory, you may be running thousands of agents, but you want to know that your agent is running securely, that my agent's running securely, and that unless we explicitly agree that we're going to talk to each other, all of our data is separate. Now, that used to be humans. Humans used to create data: we create files, save files. But now agents are creating that data quickly, storing it quickly, and then also quickly destroying it. So not only do we need the agent, the general contractor agent, all the sub-agents, all the security, but we need the storage, the data paths on that, to support that machine-speed creation, ingestion, monitoring and destruction of data.
Dave Vellante
>> So you guys nailed the compute, obviously. The InfiniBand piece of it, Spectrum-X, you've got scale up, you've got scale out, you've got scale across. Now, you've taken another step and gone deeper into the stack with storage. You're sort of re-imagining storage, re-imagining the storage hierarchy. You've got a new reference architecture. You're not competing with the storage industry. You're not selling storage per se. You're enabling the industry to think differently about how storage should work in an accelerated computing environment. Explain what you've done, and I'd like to ask you a few questions about what the implications are for the future.
Charlie Boyle
>> Sure. And I think you hit some of the highlights there. As we think about the agentic workflow, all the storage partners that you saw supporting the release, I started talking to them about STX around the end of last year. They're all in, because this doesn't replace anything we're doing in systems of record. It doesn't replace high-performance storage, it doesn't replace object storage. It's: what does an agent need to do with storage? Now, part of that is, just like many things, Jensen loves to invest in something that he thinks is going to be big. And that's actually why I'm the pick for that program at NVIDIA; that's part of the reason we're talking. But as he thought about it, he said, "We need to build one reference. We need something that is super powerful on the network fabric, that's your BlueField-4, that's your CX9s, and storage needs an AI-capable processor, and that's Vera."
And so in that storage box, not only do you have drives, not only do you have CX9s and BlueField, but you have Vera. And that enables the agentic workflow, because it's the same Vera, the same agents, but now we're moving a lot of that processing a lot closer to the physical data. And that's why all of our storage partners are super excited about it: now they can take their stack and put it on that brand new reference architecture. They didn't have to do the R&D on it; we're doing it. All the manufacturing partners are building it, but now they have a new business opportunity and a new way to help their customers with agentic workflows. Because, as we said with agents and Claw, it's all about security. All of our storage partners have been doing security and access controls; all that information already exists in their storage systems. Now, I can plumb that in through NemoClaw, through OpenShell, to the new STX storage with their software running on top of it. And now, as a developer, I don't have to go re-ask for all the permissions that IT has already given me over the last 10 years. It's encoded in the storage, and that's why we're super excited to tie storage so closely to this new agentic workflow.
Dave Vellante
>> So with the trajectory that you're on with GPUs, networking became the bottleneck. You addressed that. Storage then became the bottleneck. You kind of reimagined that. You've got BlueField-4 DPUs that allow you to offload data movement tasks from the CPU, which was historically a bottleneck. You've got Spectrum-X Ethernet, you've got your SuperNICs, and you're now able to bypass bottlenecks and go right to RDMA. You've got kind of a reference architecture for how you scale KV cache at rack scale. All of that is new. And as you say, the ecosystem is sort of leaning into it. What does that allow us to do that we wouldn't be able to do previously?
Charlie Boyle
>> A big part of it is the big jump even between Blackwell and this generation. With Blackwell, when we thought about an AI factory, it was just how many Blackwell racks you have. This generation, we've created a number of new rack architectures. We talked about the Vera rack, because you need a lot more CPU for your agentic workers. We talked about the storage rack. We talked about the networking. What Vera Rubin and that whole multi-rack architecture allows us to do is really drive down that cost, the token economics. And one of the things that Jensen talked about in the keynote, but maybe not everyone heard: he talked about dynamic power and Max-Q, and maybe those are a lot of new concepts. The simple way to think about it is that in almost every data center today, and we've been in the IT business for a long time, you provision power based on what the server said it took, and then you add a little factor to that. But at giga scale, in a data factory, you're wasting so much power. In an average data center today, if you have a gigawatt of provisioned power, you're probably only using 600 megawatts. You're only using 60% of the power. So one of the things we talked about, the DSX data center design, showed up as a beautiful simulation on screen, but it's an actual physical thing. With Vera Rubin, with all those new racks, with hardware controls and software controls built in, you can, as a data center, as a customer, set a power limit, and we're going to stay within and hit that power limit, allowing you to use every single watt you pay for to generate tokens. Because when you go into a co-location environment, a data center environment, and you ask for a hundred megawatts, whether you use it or not, you pay for it. With this new technology and this new rack architecture, we're making it so much more efficient, but also bringing AI into that infrastructure. So it's not humans and phone calls turning knobs.
It's agents turning those knobs, making that data center 100% utilized, but doing it safely. And that's why we're operating our own gigawatt data center: to prove to the world it works and to show our customers how to build that design.
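The provisioning gap Charlie describes can be put as back-of-envelope arithmetic. The numbers below are only the illustrative figures from the conversation (a gigawatt provisioned, roughly 600 megawatts actually drawn), not measured data:

```python
# Back-of-envelope sketch of the power-provisioning gap described above.
# Figures are the illustrative ones from the conversation, not measurements.

provisioned_mw = 1000   # 1 gigawatt of provisioned (and paid-for) power
typical_draw_mw = 600   # typical actual draw with conventional provisioning

utilization = typical_draw_mw / provisioned_mw
wasted_mw = provisioned_mw - typical_draw_mw

print(f"Power utilization: {utilization:.0%}")   # 60%
print(f"Stranded power:    {wasted_mw} MW")      # paid for but unused
```

The claim in the interview is that closing this gap, via chip-level power controls that hold the facility at its limit, converts the stranded 400 MW into token-generating capacity.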
Dave Vellante
>> So you're saying the grid is over-provisioned and you're able to now provide an SLA and that's going to give much better power utilization.
Charlie Boyle
>> Yeah. I mean, that's the classic electrical grid. You always have safety margins. Talk to any data center provider and tell them you want to run at 100%. They'll look at you, and it's scary. But that's what we've got to change, because we've got to take wasted power out. Everything is power limited. My house is power limited. The data center is power limited. If my data center is power limited and I'm paying all that cost, I want to use every watt that I can. And it's only possible with the Vera Rubin architecture, because we put controls all the way down to the chip level that can react faster than humans possibly can. Data in the loop, processed automatically, all the way up to AI, all the way up to human-in-the-loop things, but it takes care of all of that for you. And that also drives ... Think about it: I can now get, for the same power payment, 40% more GPUs in a data center. How many more tokens can I get out? How much more does that lower my token economics? It's just fantastic.
Dave Vellante
>> Okay. So you're optimizing obviously for power. What other kind of constraints are you optimizing for? Where are we at with GPU utilization? Where are we with network utilization? Other things that we should be thinking about that you're working on?
Charlie Boyle
>> I mean, the biggest thing, Jensen talked about it: an underutilized GPU is a waste. And that's why we've added the Vera rack to the environment, because for some tasks you've got to wait for the CPU. Well, I'm just going to offload those so that the GPU can be running 100% of the time, and I can pack more GPUs in the rack than ever before. And one of the things we talked about in that DSX architecture is Max-Q. You run the GPUs a little slower for most of your workload, which gives you the ability to run 30%, 40% more GPUs in that same power envelope, because a given workload isn't always peaking your GPU power. So I can actually boost the number of GPUs that I'm running, because I'm aware of what the application's actually using. I'm never hitting the peak of GPU power, so I'm going to run more GPUs to get more workload done, optimizing the entire AI factory.
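The Max-Q packing idea Charlie describes, capping per-GPU power below its nameplate peak so more GPUs fit in a fixed power envelope, can be sketched numerically. All figures here are invented for illustration; they are not NVIDIA specifications:

```python
# Hypothetical sketch of Max-Q-style packing: provision GPUs against a
# capped power limit instead of nameplate peak, so more of them fit in
# the same facility power envelope. All numbers are illustrative only.

envelope_kw = 1000          # fixed rack/facility power budget (kW)
gpu_peak_kw = 1.0           # assumed nameplate peak draw per GPU (kW)
gpu_maxq_kw = 0.72          # assumed capped ("Max-Q") per-GPU limit (kW)

gpus_at_peak = int(envelope_kw // gpu_peak_kw)   # provisioned at peak
gpus_at_maxq = int(envelope_kw // gpu_maxq_kw)   # provisioned at the cap

gain = gpus_at_maxq / gpus_at_peak - 1
print(f"GPUs at peak provisioning: {gpus_at_peak}")
print(f"GPUs at Max-Q cap:         {gpus_at_maxq}")
print(f"Extra GPUs, same envelope: {gain:.0%}")
```

With these invented numbers the cap yields roughly 39% more GPUs in the same envelope, in the ballpark of the "30%, 40% more" figure from the conversation.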
Dave Vellante
>> One of the slides Jensen showed, which he said in our private meeting was the most important slide. And he kind of joked. He said, "When the audience gets quiet, that's when I'm telling you my secrets." And they want to hear, "Oh, tell me about the next GPU," but rather he wants to talk about this sort of new model. So I'd like to describe that. On the vertical axis, it was throughput. On the horizontal axis, I guess you'd call it latency or experience; responsiveness, I guess, is the better way to think about it. And then he showed Hopper, Blackwell and Rubin, Pareto curves for each, dramatic improvements, that 35x. Historically, that vertical axis was the spend more, save more, or spend more, make more, when you're training models. Now, we're entering the inference era and a new business model is emerging. On that horizontal axis, he has a freemium model, sort of a consumer for-pay model, and then to the right, a much more intense one. Maybe it's for coding; it's for guys like Anthropic and OpenAI and others that are really able to monetize at scale, and you're supporting that. That is actually a new revenue dimension that's coming up. I wonder if you could explain that in some detail.
Charlie Boyle
>> Yeah. And there's a brand new business opportunity there. Part of what makes that possible, you saw in the keynote, you saw the curve: a rack we haven't talked about yet, our LPX rack, the great technology that we got from Groq, all working together. Because at that very end of the scale, where I need tremendous throughput and extremely low latency, it's a more expensive operation to run. But look at the value that you get out of it, whether that's creating new software to make quarter-end for your customers, whether it's solving your CEO's problem, whether it's making sure everything absolutely ships. There are some tasks where you're like, "I don't care how much it costs. I just want to get it done." And now, in the same AI factory, you can have both. I can serve people at a free tier, and I can go all the way to the other end with Vera Rubin plus the LPX rack to solve the hardest coding problems in real time. That opens up new business opportunities that were never possible before. One of the big messages at GTC is that it's not only new technology, it's not only token economics; there are new business things that you could only dream of before that weren't possible. Now, with this multi-rack architecture, with LPX at the far end of it, I can do things that were only dreams and are now economical to do.
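The throughput-versus-responsiveness trade-off in this exchange is a Pareto frontier: each serving configuration gives some total factory throughput at some per-user responsiveness, and only the non-dominated points are worth operating at. Here is a minimal sketch with invented operating points (none of these numbers come from the keynote):

```python
# Hypothetical sketch of the throughput-vs-responsiveness Pareto frontier.
# Each tuple is (per-user responsiveness, total factory throughput),
# both in arbitrary tokens/sec units; higher is better on both axes.
# All operating points are invented for illustration.

points = [(20, 900), (50, 700), (40, 600), (100, 400), (200, 150)]

def pareto_frontier(pts):
    """Keep the points that no other point beats on both axes."""
    frontier = []
    for p in pts:
        dominated = any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in pts)
        if not dominated:
            frontier.append(p)
    return sorted(frontier)

print(pareto_frontier(points))
```

In this sketch, (40, 600) drops out because (50, 700) beats it on both axes; the remaining points are the menu of viable business models, from high-throughput free-tier serving at the left to expensive, highly responsive serving at the right.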
Dave Vellante
>> One of the things that we talk about at theCUBE and theCUBE Research is, at the macro level, if you look at the percentage of organizations' revenue that is spent on technology, it's about 4%. And we've made the prediction that that's going to double or even triple over the next 10 years. As you basically scale with less labor, you're going to be tapping intelligence through tokens, through APIs, and you're going to happily spend more money on that. And so a lot of the things that you guys are talking about really resonate with us in that regard.
Charlie Boyle
>> And that's the thing. Jensen talked about this, I can't remember in which forum, but everyone used to talk about CapEx, and as a CEO, a CFO, a CIO, that was a bad thing. But now, it's a good thing. I know if I'm spending more money and I'm getting great utilization out of it, every dollar that I spend, my revenue goes up. It used to be that every dollar I spent on CapEx was like, "Oh, I've got to do a refresh. It's classic IT." It's no longer that; we've broken that model. And so people want to spend more. They want to hire more agents. They want to get more tokens, because they know when they do that, their business is going to change. Their customer experience is going to change. They're going to grow more customers. So of course you want to invest. I'm only limited by space, power, shell availability. When you break down those things, once you have that, every dollar that I invest, I'm getting way more than a dollar back. So of course you want your CapEx to go up, your infrastructure investment to go up, and I see no end in sight on that one.
Dave Vellante
>> What Jensen said that really, again, resonated with us is that every CEO on the planet needs to understand where they fit on that Pareto curve. Are you monetizing with throughput? Are you monetizing with customer experience and responsiveness, or both?
Charlie Boyle
>> Yeah.
Dave Vellante
>> Because that is going to be the future of your revenue and you're going to happily spend to access that intelligence. That is your future business model.
Charlie Boyle
>> Yeah. I mean, you even look at software, at creating software as a business. How valuable is it if you're asking your software provider ... We've all asked this before: "Hey, could you do this feature for me?" You normally heard, "Six months. It's in the next release." But what if they could say, "Well, if you write me a check for this much money, you can have it tomorrow, because I'm using supercharged agents to do it"? That's brand new revenue that didn't exist before. And that's just one idea of millions of ideas out there of how today's companies can transform and create great new streams of revenue out of existing products they have.
Dave Vellante
>> We said in 2024 that GTC 2024 was the most important conference in the history of the computer industry. I think you keep upping your game every year, 2026.
Charlie Boyle
>> Yeah, every year.
Dave Vellante
>> How do you top this, Charlie?
Charlie Boyle
>> It gets bigger and better. We take over the city. Every session I go in is packed, everyone's so pumped up. Because not only are we bringing out new technology, but I think everyone is seeing the explosion of how they can do something with AI. Every person I've talked to, every customer is telling me what they did last year and how excited they are with all this new technology of how much more they're going to be able to do now and try new things and really grow their business and grow their teams.
Dave Vellante
>> Well, Charlie, thanks for spending some time with us. We at theCUBE are pumped up. We started on Sunday in our Palo Alto Studio with about six panels on the future of the AI factory. We had an evening event hosted by John Furrier and Brian Baumann. Thank you for watching. This is Dave Vellante for the entire CUBE team from GTC 2026 in San Jose.