In this KubeCon + CloudNativeCon North America segment, theCUBE’s Savannah Peterson sits down with Jago Macleod and Gari Singh from Google and analyst Kate Holterhoff from RedMonk for a fast-paced look at how GKE is scaling to meet AI demand. Singh explains how Google doubled a reference cluster from 65,000 to 130,000 nodes in a year for massive AI training jobs that can require 130,000 GPUs, and what it really takes for the control plane to schedule, start and communicate across clusters of that size. Macleod details how Google moved internal control-plane state from etcd to Spanner for massive scale, and how new Kubernetes capabilities like Dynamic Resource Allocation, in-place pod resizing, Vertical Pod Autoscaling and improved cluster autoscaling are helping customers run AI on Kubernetes and manage Kubernetes with AI.
The conversation also explores how hardware limits and efficiency are reshaping cloud-native design, from power and cooling innovations seen at Supercomputing to squeezing more capacity into every data center. Holterhoff shares how Kubernetes, AI conformance efforts and projects like OpenTelemetry (OTel) are coming together to support AI agents and complex workflows with strong community backing and observability. Looking ahead, Macleod points to a future of millions of accelerators on Kubernetes clusters and better “graceful degradation” as systems hit scale ceilings, while Singh envisions true platform agents that can auto-size and reshape pods so developers simply deploy and let the platform optimize.
Jago Macleod, Gari Singh, Google & Kate Holterhoff, RedMonk
Savannah Peterson
Principal Analyst & Host, SiliconANGLE Media, Inc.
HOST
Jago Macleod
Director of Engineering, Kubernetes, Google
Kate Holterhoff
Senior Industry Analyst, RedMonk
Gari Singh
Product Manager, Google Cloud, Google
Savannah Peterson
>> Good evening and welcome back to surprisingly chilly Atlanta, Georgia. We're coming to the conclusion of day one of our KubeCon coverage. I am very excited to be bringing you the third of our six VIP exclusive segments with Google Cloud, celebrating a decade of the Google Kubernetes Engine, GKE. My name is Savannah Peterson, and I'm here with three awesome guests, two prior victims and one fabulous new face. Kate, Jago and Gari, thank you so much for being here.
Jago Macleod
>> Great to be here.
Gari Singh
>> Thanks for having us.
Kate Holterhoff
>> Pleasure.
Savannah Peterson
>> It's always a lot of fun. This is going to be even more fun. I love that we're getting a little bit of a party this time around. Gari, last time we were hanging out, we did not have this many nodes.
Gari Singh
>> Nope.
Savannah Peterson
>> How many nodes are we at?
Gari Singh
>> 130,000.
Savannah Peterson
>> So I just want to bring this up because it was only a year ago, and I still have my sticker here: 65K. They have doubled the number of nodes in 12 months. That's pretty impressive. So what are these nodes doing?
Gari Singh
>> What are these nodes doing? A lot. No, I mean, in general, most people aren't going to go to 130,000 nodes, of course, although more people are starting to creep up on it.
Savannah Peterson
>> So what types of instances end up at that level? Because it's a ton.
Gari Singh
>> It's usually the massive training jobs, massive AI jobs that need a lot of compute. Typically, with these nodes you use the entire GPU, right? There's typically a match of a node to a GPU. So you'll end up saying, "I need 130,000 GPUs to train whatever these massive models are," Anthropic, Gemini, whatever it may be, all in parallel. So you need to quickly provision those up. Yeah, I mean, I think for us it was also more than just pushing the number to say that we could do more. I think we do pretty good due diligence on this, and the fact is, we don't want to take two years to start them up. There are a lot of factors that go in there. Can we actually start these up in a timely fashion, scale them down, and can things actually communicate through them? So it's not just the same thing of, "Hey, I ran 130,000 nodes," clapping it back. We actually have to make sure-
Savannah Peterson
>> Like what does that actually mean.
Gari Singh
>> Yeah, exactly.
Savannah Peterson
>> And what does that do for me, what's the solution.
Gari Singh
>> Can we schedule that, right? Can Kubernetes, the control plane, actually know about those nodes as updates are coming in, health checks and all of that? So building on some of the same work we did last year scaling etcd, we've now replaced etcd with Spanner internally for massive scale. Some people don't realize that YAML or JSON is, we'll put it, very verbose. And so imagine you have a hundred-
Savannah Peterson
>> That is such a generous way of saying that.
Gari Singh
>> So you take 130,000 and you multiply that times 1K, 3K, whatever, and you've now got this massive string of records that you have to store. So even on small things like that, we've been working super hard to get this.
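Singh's multiplication can be made concrete with a back-of-envelope sketch. It assumes, for illustration only, one stored object per node at the 1K to 3K size he mentions, plus each node rewriting a small heartbeat-style object every 10 seconds (roughly the kubelet's default node-lease renewal interval). None of these figures are measured GKE numbers.

```python
# Back-of-envelope sizing for per-node control-plane state.
# Assumptions (illustrative, not measured GKE figures):
#   * one stored object per node, roughly 1 KB to 3 KB once serialized
#   * each node rewrites a small heartbeat-style object every 10 seconds,
#     roughly the kubelet's default node-lease renewal interval

NODE_COUNT = 130_000


def total_megabytes(node_count: int, kb_per_object: int) -> float:
    """Stored size in decimal megabytes for one kb_per_object object per node."""
    return node_count * kb_per_object * 1_000 / 1_000_000


def writes_per_second(node_count: int, interval_seconds: float) -> float:
    """Average update rate if every node writes once per interval."""
    return node_count / interval_seconds


if __name__ == "__main__":
    for kb in (1, 3):
        print(f"{NODE_COUNT:,} nodes x {kb} KB = {total_megabytes(NODE_COUNT, kb):,.0f} MB")
    print(f"heartbeat writes: {writes_per_second(NODE_COUNT, 10):,.0f} per second")
```

Under these assumptions the stored size lands around 130 to 390 MB, while the steady rate of about 13,000 writes per second is arguably the more demanding part for a single control-plane datastore.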
Jago Macleod
>> And it does challenge a lot of the underlying assumptions. These workloads have one pod per node, and if the pod goes down, the whole workload has to stop and you have to do some hard work-
Savannah Peterson
>> Everybody's waiting in the queue to go do their thing.
Jago Macleod
>> And hardware's super expensive and you don't want to be down for long. So this scalability isn't just about the number of nodes, it's also about the number of pods and node pools and all these other aspects that fit inside that envelope.
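Macleod's point that one failed pod can halt the whole job can be put in rough numbers. This sketch assumes, purely for illustration, that every node independently stays healthy over some time window with the same probability; the 0.99999 value is hypothetical, not a GKE statistic.

```python
# Hypothetical illustration: if each node independently stays healthy over some
# window with probability p_node, a job that needs every node at once keeps
# running only if all of them stay healthy, i.e. with probability p_node ** nodes.


def prob_all_healthy(p_node: float, nodes: int) -> float:
    """Chance that all `nodes` independent nodes stay healthy together."""
    return p_node ** nodes


if __name__ == "__main__":
    P_NODE = 0.99999  # assumed per-node reliability, chosen only for illustration
    for nodes in (65_000, 130_000):
        print(f"{nodes:,} nodes: {prob_all_healthy(P_NODE, nodes):.3f}")
```

Doubling the node count squares the survival probability (about 0.52 down to about 0.27 under these assumptions), which is one way to see why time-to-recover matters more as clusters grow.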
Savannah Peterson
>> And their dependencies upon each other. Yeah.
Jago Macleod
>> Definitely. And even though we did shift to Spanner internally, we still mostly contribute to etcd upstream. And the scalability is through the whole control plane.
Gari Singh
>> 30 something thousand nodes maybe released.
Jago Macleod
>> It really does percolate through the whole ecosystem, these scalability improvements.
Savannah Peterson
>> The reason I started with the nodes is this is all about scale. So this is a conversation about scale, because we're really achieving, I think, a moment within the Kubernetes community, within the GKE community and the open-source community, where we're tipping over; it's all at scale. It's hot right now. Everyone's talking about us. We were kind of the transparent nerds who tried to help each other learn for a long time, and now everybody's like, "Hey, that community kind of figured it out." Kate, you've watched the evolution here a little bit. Did you anticipate that we would be where we are today, at the scale of GKE, Kubernetes and container management generally?
Kate Holterhoff
>> Yeah, I mean, I'd say that the scale is something that, I wouldn't say we've been anticipating it so closely. I mean, there's always an element of mystery. But yeah, in general, knowing that we were going to have to expand the capabilities to allow for inference at this level is something that I think we've been preparing for. I mean, it's great to be in this particular conference center. I'm local to Atlanta, and I was here for the Supercomputing conference.
Savannah Peterson
>> I was there last year as well.
Kate Holterhoff
>> Amazing. Okay. Well, one of the huge takeaways that I had was just all these different cooling things, and the fact that all of these server solution providers were now trying to find ways to make sure they could run these workloads.
Savannah Peterson
>> It's one of the great challenges of our time right now. Yeah, I mean seriously though, access to water, access to space, it matters-
Jago Macleod
>> Electricity.
Savannah Peterson
>> Yeah, it's not as sexy, but it's like basic utilities are actually driving a lot. I didn't mean to interrupt you. Keep going, Kate.
Kate Holterhoff
>> Absolutely. No, it's good. It was just remarkable to me because I went there expecting to see all the fun of quantum chandeliers and things, but instead what I took away-
>> Deeply cool. But I learned that all these folks who used to be plumbers are now turning their attention to making cooled server doors.
Savannah Peterson
>> And they're making some coin doing that. Liquid cooling is fascinating.
Kate Holterhoff
>> I was so into it, I had no idea. So yeah, it was a lot of fun. I mean, I was a front-end engineer back when I was a practitioner, but I tell you what, the hardware stuff, I mean that is the coolest part. I mean, for me, that's where the excitement is. So yeah.
Savannah Peterson
>> So you've been anticipating it, but watching it, and I like that even as an analyst you're consistently delighted and surprised by some of the cool stuff in our world.
Kate Holterhoff
>> For sure. Yeah, you won't get very far as an analyst unless you're deeply curious and super excitable I would say. This is something where we get to hear about the coolest stuff that's happening in the industry right now and get a sneak peek and get to ask the smartest people questions about it. Yeah, absolutely.
Savannah Peterson
>> Well, actually, on that note, Jago, could you talk to us a little bit about the announcements, what's going on with Axion? It's been a big year for you guys.
Jago Macleod
>> It has been, and I think hardware really has gotten fun again recently with ARM, and all the CPUs are doing interesting things. It kind of seems-
Savannah Peterson
>> There's ARM inside my glasses right now...
Jago Macleod
>> a little bit faster every year, but nothing really groundbreaking for about 20 years in the CPU space. And now we've been working a lot with Intel and NVIDIA and others on using some of the mechanisms in Kubernetes to get much closer to the hardware so that folks can squeeze that last bit of performance out of it.
Savannah Peterson
>> How do you do that? Talk to me about that.
Jago Macleod
>> So DRA, Dynamic Resource Allocation, has now gone GA, and of course AI workloads are the real motivating factor, but it inspired a lot of really cool conversations with the SchedMD folks behind Slurm. Their next round of adopters already run Kubernetes and are now adopting Slurm, but don't want to learn how to run it on VMs or bare metal, or how that all works. And so I've been really pleased with the collaboration with both the SchedMD folks on Slurm and the Anyscale folks on Ray. These communities are really working well together. But the HPC world has 20 years of experience doing these very low-level interactions with the CPU, and they think it's madness that we just let the Linux scheduler figure out where to put a workload on a node with 300 cores. That's so cute. So we've been learning a lot, and it's worked really well. We've been improving the performance of all of those things on top of Kubernetes, and it's awesome to see.
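The HPC objection Macleod describes is about explicit CPU placement versus leaving it to the kernel scheduler. As a minimal, Linux-only illustration (not how Kubernetes implements it), a process can pin itself to one core with Python's `os.sched_setaffinity`. In Kubernetes, exclusive core assignment is normally delegated to the kubelet, for example its CPU Manager with the `static` policy, rather than done in application code.

```python
import os

# Linux-only sketch: pin the current process to one explicit CPU instead of
# letting the kernel scheduler place it on any allowed core. This only shows
# the idea of explicit placement; it is not a Kubernetes mechanism.


def pin_to_first_allowed_cpu() -> int:
    """Restrict this process to the lowest-numbered CPU it may run on."""
    allowed = os.sched_getaffinity(0)  # 0 means "the calling process"
    target = min(allowed)
    os.sched_setaffinity(0, {target})
    return target


if __name__ == "__main__":
    before = os.sched_getaffinity(0)
    cpu = pin_to_first_allowed_cpu()
    print(f"allowed before: {len(before)} CPUs; now pinned to CPU {cpu}")
    os.sched_setaffinity(0, before)  # restore the original affinity mask
```

Real HPC-style placement also considers NUMA nodes, caches and attached devices, which is the kind of hardware detail DRA and related kubelet features aim to surface to the scheduler.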
Savannah Peterson
>> What are some fun examples of ways in which that will translate into benefit for the world? How are those solutions going to help? I mean, obviously there's costs, but what are some of the other benefits of squeezing out every little inch of that?
Jago Macleod
>> Well, I think it's capacity, and capacity-chasing is the challenge of our time. It's the electricity, and the more you can shrink and be more efficient, the more workloads you can fit in a data center. So it really is out of necessity. And this necessity is inspiring all kinds of creativity in that space. So we're learning from wherever we can learn from. And it's no longer that a node is a node is a node with more or less CPU or memory. There's all this super-specialized hardware, and the end-user community sometimes complains about the complexity, but I mean, it's a complex space. So we do need to build higher-level controls for the scheduler and the autoscaler to make use of this new information and therefore save the end user from some of this complexity. But it's-
Savannah Peterson
>> It's distributing the complexity.
Jago Macleod
>> But it's really in there. And if you hide it, then you just lose the efficiency.
Savannah Peterson
>> Well, and it's that complexity that allows these solutions to be so customized and to be so on point for whatever the solution is.
Jago Macleod
>> Yes, exactly.
Savannah Peterson
>> Which I think is one of the trickier parts. Gari, we got to kick off the year talking.
Gari Singh
>> Yep.
Savannah Peterson
>> Has the year played out the way you thought it would? What's excited you between January and now?
Gari Singh
>> Yeah, actually the year has pretty much played out.
Savannah Peterson
>> I love that you're giving this genuine thought. I can see you, "January, February, March."
Gari Singh
>> What did I tell you back then? I guess I would say I'm pleasantly, I mean, not pleasantly surprised, but yeah, most of this stuff... When we last talked, a lot of the things I was excited about were this next generation of scale, some of the stuff we were working on, and things that were coming in Kubernetes: interesting things like in-place pod resizing, Vertical Pod Autoscaling, and the improvements we've made in cluster autoscaling, both in GKE and in open source. And then, not related directly to running AI, I think we talked a lot about, and I was fascinated with, the idea that we were going to see a lot of AI operations coming soon, more agentic AI for running Kubernetes and managing Kubernetes. So I think we've got both of those things going on now, right? Bobby would probably say we have running AI on Kubernetes and running Kubernetes with AI. And I think both of those are going extremely well so far this year.
Savannah Peterson
>> I mean, it's been quite the catalyst for the community. Kubernetes had been going like this, and then I just feel like we've gone like this in terms of adoption, awareness and application. Are you seeing that too, Kate?
Kate Holterhoff
>> Yeah, I mean, I think one of the big takeaways I've had from this conference is absolutely how we're pairing these technologies, and that the AI conformance work and a lot of these initiatives are preparing us to use Kubernetes as part of that workflow, to make sure that we can create AI agents and run these workflows in a way that makes sense, that has a lot of community support, and that is going to be performant.
Savannah Peterson
>> Yeah, absolutely. Jago, what's your favorite thing that's powered by GKE? It's coming to all of you, so don't worry.
Jago Macleod
>> Yes, that's a tricky one.
Savannah Peterson
>> I know you don't have to pick a favorite kid, but you can tell me one of your favorite kids.
Jago Macleod
>> I'm not good at not telling people things, so I'm trying to figure out what I'm allowed to say and what I'm not allowed to say. I can share that there are a couple of hundred, few hundred Google products that run on GKE and even some that I didn't even realize were running on GKE. So when there's a customer that adopts your product and you don't know until many, many months or years later, and then they suddenly have a feature request and that's-
>> Yeah, exactly. So I love those surprises and that's probably my favorite.
Savannah Peterson
>> I think you're going to see a lot more of that in the next couple of years.
Jago Macleod
>> And Google is a very demanding customer, I've realized.
Savannah Peterson
>> Really?
Jago Macleod
>> Yeah. It turns out.
Savannah Peterson
>> We've never worked with them. I don't know what you're talking about.
Jago Macleod
>> I know. I think we avoided early on trying to get any internal customers for that reason, and now we don't even know and they adopt it. So that's pretty cool to see.
Savannah Peterson
>> That is pretty cool. I bet across the business. Gari, what about you?
Gari Singh
>> I guess I would say I have two categories of favorites aside from the AI. One thing that AI maybe left behind, I don't know: we had been talking a lot about Kubernetes in the traditional enterprise, and then AI happened and we went there. But I'd say it has been pretty fascinating, and now they are coming together, of course, but we see a lot more banks and whatever, and you can see it in the references, that are now using Kubernetes at scale. And that's pretty interesting. And then they're even skipping ahead to start talking about running agents on there, and this next generation. So to me, that's impressive, having worked with them for years, from J2EE, I'm that old, up till now.
Savannah Peterson
>> You're that young, Gari, don't worry.
Gari Singh
>> It is great. And I guess my other favorite one still, I think we can mention it, you guys can edit it out if not: there's a customer called AirAsia. It's still one of my favorites of all time. They run on Autopilot. There's literally one dude who runs it, not the whole airline or whatever, but they have just one guy. So I still love the fact that we can run these massively complicated workloads, but at the same time, we've built something simple enough for a small team to power their application. So to me, and I'm an extremist, as you know, I like having both of those going.
Savannah Peterson
>> Yes, I do. But I like that, because when we talked about it in the early community days, it was always about decreasing complexity: how do we decrease complexity for this and let people do things faster? If one person can manage the heavy lift for an airline, which is wildly complex, with a ton of sensitive data and, obviously, some lives at stake, casually-
Gari Singh
>> I mean, to be fair, there's a bigger team, but you get-
Savannah Peterson
>> No, no, but I mean, of course, of course. But that tells you that you've actually achieved that goal, in my opinion. If one person can do that, that means it doesn't take 75 people just to understand what's going on or how a stack is working. It's honestly pretty compelling. All right. My last question, and I would love for all three of you to answer this, is when we are hanging out at KubeCon, either in Amsterdam or in Salt Lake next fall, what do you hope to be able to say then that you can't yet say today? Or in your case, Kate, what are you most excited about or hoping the industry pulls through? A slightly different lens for you. But Jago, I'm going to start with you.
Jago Macleod
>> Millions of accelerators on Kubernetes clusters. I think you were sharing about the doubling in the last year. I think that acceleration is real, and it is continuous acceleration. And the next milestone is not another incremental improvement, but another order of magnitude. I think we have the path to get there, and we're pretty close, so...
Savannah Peterson
>> My next sticker is going to be a million nodes?
Jago Macleod
>> A million nodes, two and a half million nodes.
Savannah Peterson
>> I love this.
Jago Macleod
>> That's my next target.
Savannah Peterson
>> I look forward to when we can play that back and I'm holding the sticker sitting next to you.
Jago Macleod
>> The other one I want to talk about is this: we've consistently raised the ceiling, but we have not put enough attention on graceful degradation when you hit the ceiling. These enormous clusters are very much white-glove setups, configured with extreme care. We're also working hard on self-service up to 15,000 nodes and beyond, with no one touching it, no one knowing, and me being surprised by someone saying, "So we're running 25,000 nodes and we had a question." So that's my other...
Savannah Peterson
>> I'm really glad you brought that up. It is something people don't talk about enough. All right. All right. We're going to hold you to that. Looking forward to 2.5 million nodes. Kate, what are you hoping to see?
Kate Holterhoff
>> Well, just in terms of Kubernetes or in terms of the CNCF, what are we?
Savannah Peterson
>> Whatever your heart sings right now.
Kate Holterhoff
>> Oh my gosh. Well, I don't know. I've been very interested in the OTel project, so I guess I've been interested in some of their SIGs in terms of how they're thinking about the front-end community and things like that. So that's a pet project for me. But yeah, I don't know. I am always following OTel and seeing where that lands. So yeah, something about the SIG for that, but I know it's a little niche, but...
Savannah Peterson
>> Hey, we're all niche nerds here. You're safe.
Kate Holterhoff
>> All right. All right.
Savannah Peterson
>> Gari, what about you? What are we going to be talking about next year?
Gari Singh
>> I'll still go with AI, but more on the agentic side. What I'd like to see, and again, maybe this hasn't fully come true since our last talk, is two things. First, I think we really can have platform agents: when we talk about platform engineering, we start talking about true platform agents doing a lot of that work. So I think we'll see a lot of that in the community, hopefully a lot of it coming from us as well. And second, selfishly, on the pure GKE side, I think we're closing in. Lots of open-source stuff came out, again, in-place pod resizing, pod snapshotting, all these things that we have, and I'd love to be able to say, "Hey, you deploy this pod and we literally just figure out how to size it for you and resize it, and you don't do anything, and we just move it around." And we're not far; you could build that yourself today. But I'd love for somebody to just deploy it, and then we could add things like DRA or whatever, here's what I need-
Savannah Peterson
>> It can shape-shift, do whatever it needs to do to be optimized.
Gari Singh
>> That's the big problem for people: "I don't know how many resources my pod needs, and maybe my app does really well, so can you just scale it for me?" We have all the building blocks, but I'd love to just show, "Hey, you can deploy a pod here on the show, and we'll just start running some crazy load and see what happens."
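Singh's "deploy a pod and we figure out how to size it" idea can be sketched in two steps: derive a CPU request from observed usage, then express it as an in-place resize. The recommendation rule here (90th-percentile usage plus a 15% margin) is an assumption for illustration, not the Vertical Pod Autoscaler's actual algorithm, and the container name `app` and the usage samples are hypothetical. The patch body is shaped for the pod `resize` subresource that recent Kubernetes versions use for in-place resizing, for example `kubectl patch pod <pod> --subresource resize --patch '<json>'`.

```python
import json

# Toy rightsizing sketch (assumed rule, not VPA's algorithm): take the
# 90th-percentile observed CPU usage in millicores, add a 15% safety margin,
# and round up. Integer math avoids floating-point rounding surprises.


def percentile_nearest_rank(samples: list[int], q: int) -> int:
    """Nearest-rank percentile of integer samples, for integer q in 1..100."""
    if not samples or not 1 <= q <= 100:
        raise ValueError("need at least one sample and 1 <= q <= 100")
    ordered = sorted(samples)
    rank = -(-q * len(ordered) // 100)  # ceil(q * n / 100) using integers
    return ordered[rank - 1]


def recommend_cpu_millicores(samples: list[int], q: int = 90, margin_pct: int = 15) -> int:
    """Percentile usage plus a percentage margin, rounded up to whole millicores."""
    base = percentile_nearest_rank(samples, q)
    return -(-base * (100 + margin_pct) // 100)


def resize_patch(container: str, millicores: int) -> dict:
    """Strategic-merge patch body setting one container's CPU request."""
    return {
        "spec": {
            "containers": [
                {"name": container, "resources": {"requests": {"cpu": f"{millicores}m"}}}
            ]
        }
    }


if __name__ == "__main__":
    cpu_samples = [120, 150, 180, 200, 210, 250, 300, 320, 400, 900]  # hypothetical
    rec = recommend_cpu_millicores(cpu_samples)
    print(rec)  # 460
    print(json.dumps(resize_patch("app", rec)))
```

Using a percentile rather than the maximum keeps a single spike (900m in the sample) from inflating the request; a production autoscaler would also need to consider memory, limits, usage history and how often to resize.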
Savannah Peterson
>> I love that.
Gari Singh
>> Maybe we should try that.
Savannah Peterson
>> I know. I was just going to say, "Now we know what we're doing next show, Gari." That's a great idea actually. I think that could be pretty cool. That would be a neat way to really bring the audience in too. Thank you. This has been really fun as always. I hope that you continue to have a fabulous KubeCon and I hope all of you are having as much fun as we're having here in Atlanta, Georgia at KubeCon. My name's Savannah Peterson. You're watching theCUBE, the leading source for enterprise tech news.