Exploring Google Cloud's AI-Powered Infrastructure at Google Cloud Next 2025
Nirav Mehta, Senior Director of Product Management at Google Cloud, joins theCUBE at Google Cloud Next 2025 to explore the latest advancements in AI infrastructure. As the event highlights breakthroughs in AI enhancements and distributed computing, Mehta provides an insider perspective on Google's innovative strategies and technologies revolutionizing the cloud landscape.
In this insightful conversation, Mehta discusses Google's workload-optimized infrastructure, em...
Keep Exploring
What is the new workload-optimized infrastructure for AI called?
What are some of the key features of the TPU hardware and software that make it optimized for scale, especially in B2C scenarios with millions of consumers requiring real-time decisions?
What unique features do Hyperdisk ML and Exapools offer in terms of block storage optimization for AI?
What is the importance of Anywhere Cache in an AI world, and how does it function similarly to proximal storage?
What is the product called Google Distributed Cloud, and how does it work with both hardware and software?
>> Welcome back everyone to theCUBE's live coverage. I'm John Furrier, host of theCUBE, with Dave Vellante, Savannah Peterson, and Paul Nashawaty, breaking down all the action at Google Cloud Next. It's been a whirlwind of announcements, product enhancements, and breakthroughs around AI infrastructure and distributed computing, that is, the cloud in a hybrid environment, AI infused everywhere, all at once, all going down here at Google Cloud Next. Nirav Mehta is here, CUBE alumni from 2016, senior director of product management, Google Cloud. Great to have you on again. It's been a while.
>> Yeah.>> Thanks for coming on.
>> John, really good to be here. Thanks for having me.>> So, obviously, the innovation on the AI side of the infrastructure is a huge deal. Other companies call it AI factories. I call it the God box. I actually used that term back in 2016. It's not a box anymore, it's a system. And what's going on right now, as you can see from what you guys have announced, we covered that on our breaking analysis, keynote analysis segment. You're checking the boxes, product improvements off down the app stack. Okay, cool. Middle layer, call it the middleware layer, data, and all this stuff going on there, but all the action is in the AI infrastructure. CapEx commitment of 65 billion. You got the TPUs coming and a lot of breakthroughs around Pathways, the role of storage. This has been all the action at all the major infrastructure shows: Supercomputing, Nvidia GTC, what's going on around the chips, all these subsystems, I use that word intentionally because it's a system now, a collection of clusters. So, this is where the action is. So, give us the update. What is the most important thing that people should pay attention to in this announcement as it relates to the infrastructure? Because that sets the table for the acceleration of the innovation above it. Agents, agent-to-agent, all the stuff. What's your take?
>> You bet. I think the infrastructure's right in the center of it all again. And if there's one thing I want to really sum up, it's that we are really about workload-optimized infrastructure. And I love that you used the word system, because customers are expecting us to take all the Google know-how of the last 20 years, custom-built hardware, optimized software, commercial models, and all the at-scale inference application experience, and deliver a system, because it's all about faster time to value. And the instantiation of this workload-optimized infrastructure for AI is what we call AI Hypercomputer. And you mentioned a few things there: TPU, Pathways, vLLM, XLA compilers, up to, we can talk about workload scheduling, flexible pricing. All of this is coming together wonderfully. I love that you call it the God box. Once again, we have one, and this time it's a system.>> Yeah, the terms, the cliches in the tech industry are legendary, holy grail, God box, but we're entering a new era of IT. And one of the themes coming out of Google Next, and again this rhymes with all the consistency of other events, is that the old ways of how things were working, how compute worked, accessing a database, accessing an application, are all changed with GenAI, because now you have, I won't say omnidirectional Pathways, but just the software construct is completely different above the compute. So, what has to change on the infrastructure side under the Hypercomputer, which is the system that's going to empower everything? What are some of the key requirements you guys have to look at that people should know about as they look at refactoring how they do software development, how they handle their data? And that's important. Cybersecurity now is embedded in everything. So, what's the impact on the requirements you guys have to build?
>> Yeah, and let me start with some things that customers will never see, which is how we are building our data centers. Because it's not about how much cash you can put out to build these data centers, but do you have enough power? And so, it starts with highly sustainable components. You might have seen we've innovated optical circuit switches. Down to the very lowest unit of our computing, we want to ensure they're low-power devices, but also extremely easy to fail over. You don't want humans walking around swapping cables when switches go down. So, optical switches, then what we do in the silicon, the custom silicon, of course, TPU and then Titanium, which is our offload technology. And then up to how we optimize software for the agentic workflows that we are seeing today. And so, what is Pathways, for example, Google research that we are bringing to everyone, and what does it do? It really allows you to take these somewhat asynchronous workflows, separating prefill from decode, so you can really optimize the use of your infrastructure. And to put it very simply, if you are a developer using vLLM with an XLA compiler running on a TPU, you don't have to worry about much. You just use your Python, use the PyTorch framework. XLA will take care of how to use the TPU with the best possible performance.>> Explain vLLM for those who don't know the definition, what is that specifically?
>> Yeah, it's a way for people to use, basically, like PyTorch and JAX, a way to approach programming that is very familiar, so you don't have to relearn any of the Python that you're going to use to program these GPUs and TPUs.
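To make the "just use your Python" point concrete, here is a minimal, hypothetical sketch of vLLM's offline Python API. Nothing in it comes from the interview: the model name and prompt are illustrative, and running it on a TPU assumes a vLLM build with TPU support installed.

```python
# A minimal sketch of vLLM's offline inference API (model and prompt are
# illustrative; any supported Hugging Face-format checkpoint would work).
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

# generate() batches prompts and handles scheduling and KV-cache
# management internally; the developer stays in plain Python.
outputs = llm.generate(["Summarize what an XLA compiler does."], params)
print(outputs[0].outputs[0].text)
```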
>> Can we talk about the seventh generation TPU? I feel like I should be more up to speed now. We're in the seventh generation.>> Ironwood.
>> So, yeah, let's talk about Ironwood. Help us understand the motivation for TPU generally. Ironwood, I know, bigger, faster, better, more improved. But the strategy, how would you describe Google's silicon strategy? Specifically, is it designed for cost optimization, better performance, better exploitation of that stack, training, inference? Help us understand that.
>> Yeah. In the continuum of the TPU families, as you've said, we are going after multi-dimensional scale, obviously. Why scale? So, of course, very large training jobs; in the early years the TPU was heavily targeted at the foundational model builders. But now, we are at the precipice of a massive explosion in inference, and we already have a lot of traditional enterprises using TPUs at massive scale to do centralized inference. So, optimizing for scale, especially for B2C scenarios where you're engaging millions of consumers, real-time decisions, how fast can you process them, every second counts. You mentioned how to use some of the TPU infrastructure so that it is actually optimized for prefill-decode. That is also a heavily co-designed element of Ironwood, hardware and software designed together, because it's even more important for us to optimize for the software framework. We don't have expert users; we now have at-scale data scientists in traditional enterprises using it. Security, reliability, responsibility layers, they're, of course, quite the mainstay for how Google has approached Vertex, Gemini. And on top of TPU, all of these are available. And lastly, reliability and security. You obviously want to run these clusters at extreme scale with the same logic we use when we build Google Search, which is commodity systems wired together with very high bandwidth; a part fails, no big deal, we're going to just automatically fail over, the optical switch I mentioned, rapid mirror redirection. You don't really have to stop. So, all of this has come together.>> And you mentioned inference, but you have customers also using TPUs for training, correct?
>> Yes, very much.>> So, it's a multi-tool piece of silicon. Okay. And it's an Arm-based design, correct?
>> It's actually proprietary-
>> Your design. I get it, but it's not Arm-based?
>> No.>> Oh, I don't know why I thought it was Arm-based, because everything's Arm-based these days. Okay, okay. So, what would you recommend to customers? Because you offer choice; I get any sort of compute I want out of GCP. How would you juxtapose that versus, say, Nvidia GPUs? Also your strategy relative to other hyperscalers, how would you differentiate?
>> Yes. First of all, we are great partners with Nvidia. We build the same kind of end-to-end systems with GPUs as we do with TPUs. And we think there's room for a lot of these choices. Choice is super critical for the industry right now for a variety of reasons, including supply chain health. But the main driver for us having innovated so much on the TPU is that we learned, from building massive-scale AI for Google, that certain scale can only come together if you build it ground up and co-design everything. So, it's not so much, for us, GPU versus TPU. It's the right thing for the right job. We have many architectures where they're being used together, maybe inference happening at the edge on a GPU or a CPU, training happening on TPU.
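As a hedged illustration of the "right tool for the right job" point, the sketch below shows how JAX code compiled through XLA runs unchanged on whichever accelerator backs the VM, TPU or GPU. Everything in it is illustrative rather than taken from the interview.

```python
# Hypothetical sketch: the same XLA-compiled JAX code runs on whatever
# accelerator the VM provides, so the GPU-versus-TPU choice is a
# deployment decision, not a rewrite.
import jax
import jax.numpy as jnp

print(jax.devices())  # e.g., TPU cores on a TPU VM, GPUs on a GPU VM

@jax.jit  # XLA compiles this function for the available backend
def step(x):
    return jnp.tanh(x @ x.T)

x = jnp.ones((1024, 1024))
print(step(x).sum())
```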
>> Now, I want to ask you about rapid storage and Hyperdisk. I think your names are better than Amazon's names, but what is it? And Exapools. These are great names.
>> Hyperdisk, my product, my baby.>> Which one?>> Hyperdisk and Exapools. These are really evocative names. Excellent. Explain the architecture, what's different about them? Why is it modern? Take us through that.>> And what's different, too, because storage is changing. That's clear.
>> Yes. Well, look, we are the only hyperscaler offering block storage optimized for AI. So, there's this product called Hyperdisk ML, which is highly optimized for inference. Very simply put, you can have up to 1,200 compute virtual machines connect to a single Hyperdisk ML block storage instance. It'll load models very fast, up to 12 times faster than some alternatives like object storage. And all of these VMs simultaneously have access to it. No other block storage offers that. Exapools, you mentioned Exapools. That is highly optimized for very large training jobs. It's one exabyte or higher capacity delivered in a highly compact placement. So, you get this massive storage device on which you would load an HDFS file system and then perform AI jobs, HPC jobs on top of that. No one's doing this with block storage. You asked me what's different, and then I can talk about rapid storage, which is actually object storage.>> But before you go on, so the benefit of object storage is you've got put/get syntax, simple. The advantage of block storage is, like you say, you get better performance. Is there complexity involved, or have you abstracted that? How have you abstracted that?
>> Because of well-established ways to use HDFS and other such file systems, it is not that hard, right?>> Yeah.
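For reference, the put/get model Vellante contrasts with block storage can look like the following hypothetical sketch using the google-cloud-storage Python client; the bucket and object names are made up for illustration.

```python
# A minimal sketch of object storage's put/get model with the
# google-cloud-storage client (bucket and object names are hypothetical).
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-model-artifacts")

# "put": upload a local checkpoint as an object
bucket.blob("ckpt/step-1000.pt").upload_from_filename("step-1000.pt")

# "get": download it back
bucket.blob("ckpt/step-1000.pt").download_to_filename("restored.pt")
```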
>> We had a customer go to more than two exabytes within a couple months.>> But you've got to have the business value because it's more expensive.
>> But at scale, the cost of the training drops way down.>> Yeah, it crosses over. No question.
>> Like, think of what it takes. From eight hours, you're now training in one hour. That's a huge business value.>> Right, right. You're printing money at that point.
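As a rough sketch of provisioning the Hyperdisk ML volume Mehta describes, using the google-cloud-compute Python client: the "hyperdisk-ml" type string and the READ_ONLY_MANY access mode are my assumptions from reading the product documentation, not details from the interview; verify them against current docs before use.

```python
# Hypothetical sketch: create a Hyperdisk ML volume that many VMs can
# attach read-only. Type name and access mode are assumptions, not
# interview details.
from google.cloud import compute_v1

client = compute_v1.DisksClient()
disk = compute_v1.Disk(
    name="model-weights",
    size_gb=500,
    type_="zones/us-central1-a/diskTypes/hyperdisk-ml",  # assumed type name
    access_mode="READ_ONLY_MANY",  # assumed: allows simultaneous attachment
)
op = client.insert(project="my-project", zone="us-central1-a", disk_resource=disk)
op.result()  # block until the disk is created
```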
>> Exactly. And then rapid storage, that one is more object storage brought closer to your zone where your compute is. So, really proximal placement of storage. There also, we would expose it as a file system. You could use GCS FUSE.>> Nvidia at their conference talked about the role of storage. In fact, it had its own slide in the keynote. So, obviously, that jumps out at us because we've been following storage and the changes there. And also in other slides, when these AI factories or Hypercomputer architectures get presented, the configurations of things are laid out, and they are systems. So, what is the role of storage? You mentioned it's proximal to the zone. Is it changing because of the latency requirements, the scale? What about the storage capabilities and attributes, how it's configured, designed, I should say, in the system? Because if it's a system, it reminds me of the '90s, when memory management was a big deal. If we remember back in the PC days, server days, now it seems like memory management in air quotes, HBM, you've got solid state looking like it's going to be a capacity tier. These are going to be layered in there, too. So, what does that enable, and what does that do to storage? Because there still needs to be storage.
>> Oh, very much. Yeah. I think what we are seeing is the need for loosely coupled compute and storage versus the big honking single hyperconverged-type devices because you're seeing so much churn in how you want to scale up and when you want to scale up your compute versus your storage. So, this is where it's really clicking for us. We brought block storage, object storage in different form factors depending on what you want to optimize for latency versus cost. And also, scale, by the way, where can you just, within a day, spin up an exabyte of storage? So, a lot of our customers, very large customers, are saying the fact that you can do that, you're unique already and you can do it with adjacent placement, optimized latency to the compute platform.>> And what about Anywhere Cache? Another great name.
>> Yeah.>> All your names are evocative, but explain what's behind Anywhere Cache and why it's important in an AI world.
>> So, it's also similar in a way to proximal storage. So, bringing object storage closer to your compute instance, and it almost operates like a cache from the OS; we are able to use it like a cache. So, just different ways in which, when your gets and puts to remote object storage are just not enough, customers are driving us, by the way, to optimize and say, if only I had this available rapidly from the OS as a cache layer. And so, you can see tiers now: when do you need that versus rapid storage versus Hyperdisk ML? And it can sometimes get confusing, but frankly, it's all driven by the workload and scenario. So, we are helping you choose very quickly: GPU, TPU, when do you use Hyperdisk Exapools? Customers are super savvy right now. Some of the foundational model builders are telling us exactly what to build.>> Yeah. If you look back at all the history of theCUBE videos, and even before we started theCUBE, the holy trinity, the triangle, was compute, storage, networking.
>> Yes.>> Okay. You mentioned hyperconverged. That was a big part of that. So, now, you guys talk about compute, storage, and software, but then, there's a ton of networking announcements, and you mentioned this distributed cloud is networking-based. You've got the open WAN, Cloud WAN. So, okay, let's keep the three, the holy trinity, I call it: compute, storage, networking. But the role of software is critical.
>> Yes.>> And all of these AI systems, I like to call them the God box, but the mainframe, what do we want to call it? Big iron's back, and you guys, I'll use the word iron. Ironwood, like how they got iron in there, because big iron used to be a term for the big machine that ran everything. What runs on the hardware? Because it's not like your traditional OS. Back in the old days, mainframes had proprietary OSs, and so now, what's running the machine? What's running the AI Hypercomputer? Is it like a homegrown OS? Is it the Pathways combination with the vLLMs? Is it all this connective tissue software? What is the core operating system?
>> Highly optimized, specialized microcode down to the hardware layer. And yes, it's not like a traditional system; down to every chip there is a computer, and then the compiler is doing some really advanced things that you wouldn't see in your traditional C++ compiler. For example, our compilers are aware of things like our optical switching or network layers. So, it can really do the thinking for you on how to use it all.>> Talk about the compilers. This is not talked about in the industry much. I know it's in the weeds, but I want to go there real quick. What's the role of the compiler as you have more microcode that's really specific, intentional connective tissue to coordinate, schedule, and run all these things? Because they're running as clusters, and they're talking to each other. There are also latency issues around how you lay it out. Having compilers that are well-written and proprietary, I'll use that word, and that's a good thing in this case. Why is that so important? What's the benefit of that?
>> I think it's, again, delivering a system where the customer doesn't have to worry about the underlying data center and the compute infrastructure. Because again, like I mentioned, nothing trumps time to value. There's a race out there. Every bank, every retailer I'm talking to is saying, "Please, I trust you. You guys know how to build these things. Don't make me go learn another language. I'm going to use this framework, get me going tomorrow."
>> I was struck in the keynote, I think it was Thomas, but maybe it was somebody else, who said, roughly paraphrasing, not all customers are going to put their data into the cloud, so we have solutions. And it was during the Nvidia Jensen conversation; Dell got a little mention, I don't know if you caught that. So, you essentially have solutions to bring your AI stack on premises. A lot of, you mentioned banks, a lot of the banks we're talking to are saying, "Look, we got these systems. They never moved into the cloud. We're not going to move them into the cloud. It's too expensive. We want to keep them right there. But we want to bring the AI to the data." What's your play there?
>> Yes. We have this product called Google Distributed Cloud. Think of it as both hardware and software that allows you to take a piece of Google Cloud and run it either air-gapped within a company or connected back to us at a distributed edge. What you saw in the keynote is that we announced ways in which we can run the software part of this on third-party hardware, including the Nvidia DGX, and then run on top of our stack a version of Gemini, Gemini Flash. So, what we are hearing from all these governments, banks, and others is, look, "Google, I want to work with you for the AI parts of it. How can I get that in a sovereign way?" And so, they get Gemini with the same long context window and all of that.
>> Yeah, the regulated industries are going to eat this up. So, basically, I get your system air-gapped if I want.
>> Exactly right.>> Or in a hybrid-
>> Hybrid connection, like McDonald's is using; we have hundreds of locations. Fascinating use cases there, by the way. Same with Wendy's. You see the pattern.>> Acceleration of deployments, everything else. It's great to have you on. It's been a while since 2016. Thanks for sharing. Last question, on a personal note: you've been in the industry, you've been doing a lot of engineering and building a lot of products. You've seen many cycles of innovation. We're all kind of historians at the end of the day. We've seen what's happened before us; now we see what's in front of us. What is it like right now as an engineer, as a leader, building out AI infrastructure? What's it like, and what are some of the things that go on that you look at and say, "Oh my God, this is amazing"? Because we're kind of living in a historic time. It's a generational shift in computing, in how software's built. A younger generation's coming up through the system, and they're all doing enterprise startups. I've never seen anything like it before. So, it's like you've got all this stuff coming together. What's it like?
>> Well, first of all, isn't it just amazing to be in the infrastructure space? And I know you've done a lot of this.>> We love it.
>> Yes, super cool, again, but here's the thing. You asked me about history. I'm in a time warp because in a given week, I will have a meeting on mainframe modernization, on AI infrastructure, and also explain why we are getting ahead of quantum with simulation. And sometimes, I go home, I go, wait, which past, present, future?>> Which decade am I in? It's a hyperspace.
>> It is. And you know what? It really is happening. I love that we are standing on the shoulders of giants and everything we learned, from the mainframe to x86 to AI, and I respect our industry and everything all our colleagues have done.>> And what's really good, too, about the infrastructure advancements, I think, is that not only is it fun, because we've been talking about it, but it's changing.
>> Yes.>> But it's enabling social change, it's enabling betterment for society. Human intelligence, everything in the stack is pointing towards the value curve, as Dave pointed out yesterday. You can see value fast. You're up the value curve before you even pay, in some cases. We had one of your partners on site; they're on a freemium model, and they get value before they even pay.
>> For sure.>> It's like this is a speed game, not just hardware and speeds and feeds, but the value.
>> We are touching humans faster with the new tech. Immediately, we see the impact in hospitals, in retailers, with what we are doing.>> Well, we've been covering it like a blanket and we're going to continue. Thanks for coming on theCUBE again.
>> Great to see you, man.>> Thanks for coming on. Love the work you guys are doing. Again, Google's doing the work, 65 billion in CapEx this year, just this year in commitment to spending to keep the advancements going. It's not just-
>> I think it's higher.>> I think it might be 65.
>> Okay.>> Well, 65, 78.>> Oh, 78. 78, 75. 75 is the right number. Thank you.>> You're an analyst, like a slice of salami, as you said.>> We got the real->> Pushing the envelope. AI infrastructure enabling rapid change. This is theCUBE. We're documenting it every day. Thanks for watching theCUBE here at Google Next.