In this KubeCon + CloudNativeCon North America segment from Atlanta, theCUBE’s Savannah Peterson sits down with Google’s Akshay Ram, Kelsey Hightower and Eddie Villalba to explore how GKE is evolving to handle modern AI inference workloads. They break down GKE’s inference-focused innovations, from the Inference Gateway and GKE Inference Quickstart to new Kubernetes inference API primitives and CRDs that better handle unpredictable LLM request patterns, day-zero performance tuning and accelerator-aware load balancing. The discussion also touches on Google’s work with open-source communities such as Ray and vLLM, and the push toward more hardware-agnostic model serving across GPUs and TPUs.
The conversation goes deeper into Kubernetes as an extensible workload API and “infrastructure framework,” where CRDs turn real-world practices like inference into reusable APIs instead of one-off engineering efforts. Ram, Hightower and Villalba share practical advice for teams just getting started with Kubernetes and AI, emphasizing core principles such as understanding resource types and contracts, leaning on community support and GKE quickstarts, and recognizing the late-mover advantage of inheriting a decade of hard-won patterns. Looking ahead to 2026, they imagine a world where inference is “just another microservice,” customers remix Google’s building blocks in unexpected ways, and advanced optimizations from research labs and large AI builders flow quickly down to startups, enterprises and regulated industries at their own scale.
Akshay Ram, Kelsey Hightower & Eddie Villalba
Savannah Peterson
>> Good afternoon, open source community and welcome back to beautiful Atlanta, Georgia. We're here on day one of three days of coverage at KubeCon, bringing you an exclusive and very special series celebrating 10 years of GKE. My name's Savannah Peterson with an absolutely brilliant panel to my left. Akshay, Kelsey and Eddie-
Eddie Villalba
>> Hi.
Akshay Ram
>> Hi.
Eddie Villalba
>> Good to see you again.
Savannah Peterson
>> CUBE alum and VIP in Atlanta again nonetheless.
Eddie Villalba
>> I know. Twice. I love it.
Savannah Peterson
>> This is becoming a thing for us.
Eddie Villalba
>> Well, I want you to come to Austin's though, so got to figure out how to do that.
Savannah Peterson
>> I'm all about that. We'll talk about that.
Eddie Villalba
>> Yeah. Yeah. All right. I like it.
Savannah Peterson
>> We'll stick Bobby on it. We'll find out what to do next. But we had a lot of fun last time. I'm really excited to dig even deeper talking about inference today. I'm going to open it up with you, because this has been a big year of GKE inference-focused announcements, development and collaboration. Can you give us a little recap?
Eddie Villalba
>> Yeah. Sure. So if you think about from when we started talking about all this, so inference gateway, starting at the top of the stack. And I think we looked at the customer journey. What is that customer journey for this new kind of serving workload. So we talked a little bit about that inference gateway and then even going further, okay, great, I've got a gateway that lets me get to what I need to and more performance, better performance, but where do I even start? How do I know what accelerator to use, what model to use? So then working with our team on making sure that we have good referenceable benchmarks that a customer can use and then compare that to what they're doing with our GKE Inference Quickstart and then going through all the announcements that we did with our partnership with Ray and working in the open source with vLLM. So I think if you look at the whole stack, it's us looking at what is that customer's journey to learn what it means to inference a workload in this new world of serving and then figure out what does that pathway look like and then what do we bring as Kubernetes and as a contributor to Kubernetes and as GKE to bring to the fold to make it the best solution for the customers then.
Savannah Peterson
>> Go for it. I can see you're .
Akshay Ram
>> Yeah. I just wanted to jive off what Eddie is saying: at KubeCon EU, we actually launched inference API primitives in Kubernetes in the open source, and they're working backwards from that. As we spoke to customers, traditionally in web services, which Kubernetes was really built for, the request-response pattern is quite predictable. You hit a request endpoint. It's a web service. It reads from a database and you respond, and then you get predictable request-response patterns. But then with LLMs, you can have a request that's just like an FAQ, a "how's the weather" kind of request, which is pretty predictable. Or you can say, "Summarize this document," with a ginormous document. So the load on the backend varies a lot, so we heard from customers and they're like, "Hey, there's a lot of variance here."
Savannah Peterson
>> I was just going to say there's a lot of different form factors of what these workloads-
Akshay Ram
>> Exactly.
Eddie Villalba
>> Yeah. Yeah.
Savannah Peterson
>> use cases look like. For you building it, that's a lot of complexity as the creators of this foundation as well.
Eddie Villalba
>> Absolutely.
Akshay Ram
>> Exactly. So then we went to first principles 101. We're like, "Hey, load balancing is round robin, which expects predictability. Let's just have it be led by a custom metric from the model server," something like KV cache, which is a good proxy, and then route on that. And then we're like, "Hey, there's benefit there." And that is effectively what Eddie was saying, which is also really important: what customers really care about from inference is performance. "So if there's a performance benefit in terms of lower latency, higher throughput, sign me up for it." That's effectively what they would say.
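The routing idea described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Inference Gateway's actual algorithm: the `Replica` class, the `kv_cache_utilization` field, and both picker functions are hypothetical names, and the utilization numbers are made up.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Replica:
    """A hypothetical model-server replica and one metric it reports."""
    name: str
    kv_cache_utilization: float  # assumed range: 0.0 (idle) to 1.0 (full)


def round_robin_picker(replicas):
    """Return a function that hands out replicas in fixed rotation, ignoring load."""
    rotation = cycle(replicas)
    return lambda: next(rotation)


def pick_least_loaded(replicas):
    """Pick the replica whose reported KV cache utilization is lowest."""
    return min(replicas, key=lambda r: r.kv_cache_utilization)


replicas = [
    Replica("server-a", kv_cache_utilization=0.92),  # e.g. busy with a long document
    Replica("server-b", kv_cache_utilization=0.15),
    Replica("server-c", kv_cache_utilization=0.40),
]

next_round_robin = round_robin_picker(replicas)
round_robin_choice = next_round_robin()      # first in rotation, regardless of load
metric_choice = pick_least_loaded(replicas)  # lowest reported utilization

print(round_robin_choice.name, metric_choice.name)
```

In this made-up snapshot, round robin sends the next request to the busiest replica simply because it is next in rotation, while the metric-led picker avoids it. A real router would likely weigh more than one signal.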
Savannah Peterson
>> It's what makes it real.
Akshay Ram
>> Exactly.
Eddie Villalba
>> Yeah.
Akshay Ram
>> What makes it real.
Savannah Peterson
>> To the real world, it's inference that's making something instant, or Automagic-
Akshay Ram
>> Exactly.
Eddie Villalba
>> Right.
Savannah Peterson
>> or 10 times faster or 100 times faster than it used to be. And that increases the pressure and visibility of everything that you're working on.
Akshay Ram
>> 100%.
Eddie Villalba
>> Yeah.
Akshay Ram
>> And one more fun fact is here, customers-
Savannah Peterson
>> We want all the fun facts.
Akshay Ram
>> Yeah.
Eddie Villalba
>> We're all about fun facts. Yes.
Akshay Ram
>> Some customers, they don't, if you think about traditional Kubernetes like a web service or a database, you get deployed and you take your time and then you optimize it. Here, it's redlining on day zero, because the whole point is to maximize the throughput on the accelerator. So there's no more optimizations on day one or day two. It's day zero performance maximization. So from that perspective also, it was a big learning in that customers really, really want to get the most out of it. And then you got to assemble the Avengers, the best optimizations you've got.
Eddie Villalba
>> I love the Avengers. Yeah. That's it. Yeah. Yeah.
Akshay Ram
>> The best optimizations you've got to get the most performance of the hardware.
Savannah Peterson
>> So in the Avengers, what role did Kelsey play?
Eddie Villalba
>> Oh. He was the originator. He's Captain America. Come on.
Savannah Peterson
>> Exactly. Exactly. I was giving you a compliment.
Akshay Ram
>> Yeah. Yeah. Captain America. It's that one. Yeah.
Savannah Peterson
>> It's Cap. Yeah.
Eddie Villalba
>> Come on. I started with Kubernetes Up and Running. That's how I learned it. So yeah, we talk about how it was when we first started. Actually, we had a question yesterday at one of our events and it was someone very new to Kubernetes and they said, "Well, we thought Kubernetes was supposed to be only for stateless applications." I'm like, "Well, back then when it first started, sure." But the world has changed and stateful apps have been around now for quite some time, right after we changed the name and we finally found a name that stuck. But now this is even more complex, because the entire stack has to be completely integrated, but composable, and that's the hardest thing. If I can be opinionated about a stack from my software and networking layer all the way down to the metal, I can build whatever anybody wants. But when I want it to be so composable, which is what Kubernetes is supposed to be, that's a lot of variability that has to happen at performance.
Savannah Peterson
>> An incredible amount of variability.
Kelsey Hightower
>> Right.
Akshay Ram
>> Yup.
Eddie Villalba
>> So the amount of work that Akshay's team and our amazing engineering team and the stuff that we're doing, it's literally building for something that is not just for today, but for tomorrow and potentially all these new workloads that are coming in that don't meet the pattern today, but we want to make sure that there's something there for that next pattern. And then what Kelsey built and the team built back in the day was this is something that can orchestrate all that with these principles and look, now we're at this new level and it's still working. Yes, we still have to do work. We have to-
Savannah Peterson
>> Why?
Eddie Villalba
>> chop wood and carry water to get there, but it works and we're getting there, so yeah.
Savannah Peterson
>> Yeah. Yeah. Yeah. It really does really work there. Kelsey, did you anticipate the robustness of GKE? I know you had predicted that we would be at this stage, but to your point, we're anticipating, and I almost think of it as melting wax or clay. You've got to get into these crevices before they're even fully defined yet. How do you build for a future like that?
Kelsey Hightower
>> Most platforms are a snapshot of the 10 years previous. OpenStack is a snapshot of how people thought about infrastructure 10 years previous. And we think about all the platforms that have come in between. They are always snapshots of how one company or one community has been practicing things. When Kubernetes comes out, there was a critical decision. Brendan Burns, one of the pioneers of Kubernetes, I remember sitting next to him in the Seattle office and we doubled down on this concept called custom resource definition. Kubernetes is a workload API platform and we knew we were never going to figure out every type of workload, so we started with the simple web apps. There were no volumes, no configurations, no secrets. And those things got added over time for the workloads that we understood, but we always knew that there were going to be workloads that we wouldn't be able to define, because we don't know about them yet, so we put in a first-class extension point. And what that meant was that Kubernetes always gives a type system to a practice. So the community practices things. Inference is still something people are trying to perfect. We've seen people try to do it the old way and in better ways, but what Kubernetes allows you to do is take the practice and turn it into an API. And so I think a lot of the things you've seen Google release over the years have been saying, "What are people actually doing in production and can we codify it so the next generation of people that are trying to do it for the first time, they're not starting over from scratch?" So that's what Kubernetes is and I think that decision to make sure that was going to be extensible was one of the things that allows us to do this without starting over from scratch.
Akshay Ram
>> 100%. Inference Gateway is a Kubernetes CRD, which effectively allows us to extend, and now we are in the world of inference.
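To make "a type system for a practice" a little more concrete, here is a small sketch of a resource with a type and a contract. It is written in Python purely for illustration. Real CRDs are declared to the Kubernetes API server with a schema, and the `ExampleServingSpec` name and its fields are hypothetical, not the actual Inference Gateway API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleServingSpec:
    """Hypothetical contract for a model-serving resource (not a real Kubernetes kind)."""
    model: str
    accelerator: str
    replicas: int = 1

    def validate(self):
        """Return a list of contract violations; an empty list means the spec is valid."""
        errors = []
        if not self.model:
            errors.append("model must be set")
        if self.accelerator not in {"gpu", "tpu"}:
            errors.append("accelerator must be 'gpu' or 'tpu'")
        if self.replicas < 1:
            errors.append("replicas must be at least 1")
        return errors


good = ExampleServingSpec(model="example-model", accelerator="tpu", replicas=2)
bad = ExampleServingSpec(model="", accelerator="fpga", replicas=0)

print(good.validate())
print(bad.validate())
```

The point of the sketch is only that once a practice has a declared shape, anything that does not fit the contract can be rejected before it reaches production.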
Eddie Villalba
>> That's right.
Akshay Ram
>> Yeah.
Savannah Peterson
>> Yeah. I think that's a really good point and I love that you just brought up you're never going to know everything you don't know yet about how something's going to be used. I think a lot of people like to pretend like they're going to know and it just doesn't work like that. You never know what's going to happen until something's out in the wild. You said something that was really interesting when we were getting warmed up about being hardware-agnostic. That is a challenge. How are you doing that internally?
Akshay Ram
>> So we're actually working with a lot-
Savannah Peterson
>> I say that with love. Yeah.
Eddie Villalba
>> No. It is hard. .
Savannah Peterson
>> Yeah.
Akshay Ram
>> I think we are working a lot with the open source model server communities. This has been an extension point to Kubernetes, because before, again, in the web service world or in the stateful world, applications were quite decoupled from the hardware, but we said everything is coupled, yeah, entirely.
Savannah Peterson
>> Almost entirely in a lot of cases.
Akshay Ram
>> Yeah. Yeah.
Eddie Villalba
>> Yeah.
Akshay Ram
>> So now we had to extend the communities we work with, and one of the communities that is really important now is, for example, the vLLM community. They're an open source model server, which works across both GPUs and TPUs. So we're working with a lot of communities to really think about inference in a way which is reasonably hardware-agnostic. It takes some work, but it's getting there. And what is also a fun fact is a lot of these optimizations, the inference optimizations, they're also written by university researchers. And they effectively get access to hardware so that they can actually think about how to write the optimizations. vLLM was born out of the UC Berkeley lab. So there are a lot of nuances of working with the researchers, research communities, the open source communities in some of these frameworks to really make it work well across hardware.
Eddie Villalba
>> I think they're the most optimal, because they could be scrappy, because they're researching. Enterprises can't be scrappy as much, because we have stockholders we have to take care of, we have things that we need.
Akshay Ram
>> Yeah.
Kelsey Hightower
>> Yeah. Exactly.
Savannah Peterson
>> Well, we have to pretend there isn't this big mess in innovation, even though there always is a bunch of stuff on the floor as we're doing this.
Eddie Villalba
>> So loving the fact that we can get that bleeding edge thought process, but then really quickly, the channel and then the community that was built by Brendan and Kelsey and the whole team and Tim and everybody and building that community, come together saying now, there's a fast path from this researcher at a university thinking this is cool and this is how we're doing it to production at a hyperscaler.
Akshay Ram
>> Yeah.
Savannah Peterson
>> Right.
Akshay Ram
>> And then have a mix.
Savannah Peterson
>> Which is wild.
Eddie Villalba
>> Yeah. It's crazy.
Savannah Peterson
>> Just to double down on that, the thing that's inspired me personally the most over the last few years is the true collaboration between enterprise, a startup-
Akshay Ram
>> Research.
Savannah Peterson
>> research, government. We can't build the future unless everyone's at least communicating.
Eddie Villalba
>> That's right.
Akshay Ram
>> 100%.
Savannah Peterson
>> And it's quite a moment for all of the nerds in the world.
Eddie Villalba
>> Yeah.
Kelsey Hightower
>> It also helps that the workload is so expensive to run.
Savannah Peterson
>> An important point, Kelsey.
Kelsey Hightower
>> But there's an incentive to try new things that can reduce the cost.
Eddie Villalba
>> That's right.
Akshay Ram
>> Yeah.
Savannah Peterson
>> Absolutely right.
Kelsey Hightower
>> Because when you talk about a 5% difference in cost, most people will say, "We'll just keep it as it is." When you talk about huge contributions that reduce the bottom line greatly, sure, I will try that experimental thing if it's the best solution for me to even attempt to run this workload.
Eddie Villalba
>> Yeah.
Akshay Ram
>> 100%. It's like an open marketplace and anybody can come and contribute and the best optimization wins. To your point, even if it's experimental, it can drive a lot of optimization, so I'm going to use it.
Eddie Villalba
>> That's right.
Akshay Ram
>> That's 100% right.
Savannah Peterson
>> And it continues the flywheel of creativity, I think, within this community, which is a really unique piece of this. You touched on a conversation you were having, Eddie, with someone who was just getting started with Kubernetes and I actually want to bring us up just for a second, because it can feel potentially overwhelming hearing some of the conversation here, seeing us celebrate 10 years of GKE. The reality is there's still a lot of people just getting started, especially with the AI workload catalyst that's been happening. So I would love to hear from you three very inspiring people. What would be your advice, Eddie? I'm going to you first. What would be your advice for someone getting started or trying to figure out how to navigate this right now?
Eddie Villalba
>> Yeah. I'd say one is just I think we had a question just like that and we said, look, understanding those base principles about these are resources that you can use and as an API to go to it and it's an orchestration engine. Understanding at least those base principles I think is almost a critical step in realizing this is a lot more than just a web server that's running on a server.
Savannah Peterson
>> Correct.
Eddie Villalba
>> It's something that does a lot of things, has a lot of moving parts and everything else is just an upscale from there. Everything else has just got a deeper dive from there. But I would say understanding in depth, what does it mean to say when Kelsey says we have a resource and that has a type and that type has a certain contract and this is what that contract should look like. And then the next layer right below that was then to say how does Kubernetes handle that type when certain things happen, so that you understand what's happening behind the scenes? To me, coming from learning from Kelsey and I got to work with also Brendan and understand the distributed computing concept and say, well, how is all this being reconciled and going through the controller? That goes deep, but at the same time, it was almost like a light bulb went off and it's like this is more than just I'm running a container somewhere. There's something that's making intelligent decisions about where things need to be and it has to follow a pattern. And then once you get that, then all the pieces start to fall into place. And I mentioned this on our last podcast. Inference is just another workload. Highly specialized serving workload for sure, but it's just another workload. So if I can get those principles of what does it mean to scale a web service to millions of pods and containers and so on, so forth, the next step is, okay, now, I just need to know where those other things run from, but it's the same kind of concept. Different scale, different economics, but it's the same.
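The reconciliation idea mentioned here (a controller comparing what you asked for with what is actually running) can be sketched as a tiny loop. This is a simplified illustration, not Kubernetes controller code; the `reconcile` and `apply_actions` functions and the replica naming scheme are assumptions made for the example.

```python
def reconcile(desired_replicas, running):
    """Compare desired state with observed state and return the actions to take.

    `running` is the list of replica names currently observed.
    """
    actions = []
    diff = desired_replicas - len(running)
    if diff > 0:
        # Too few replicas: start the missing ones.
        actions = [("start", f"replica-{len(running) + i}") for i in range(diff)]
    elif diff < 0:
        # Too many replicas: stop the extras at the end of the list.
        actions = [("stop", name) for name in running[diff:]]
    return actions


def apply_actions(actions, running):
    """Apply the actions and return the new observed state."""
    running = list(running)
    for verb, name in actions:
        if verb == "start":
            running.append(name)
        else:
            running.remove(name)
    return running


observed = ["replica-0"]
observed = apply_actions(reconcile(3, observed), observed)  # scale up to 3
scaled_up = list(observed)
observed = apply_actions(reconcile(2, observed), observed)  # scale down to 2

print(scaled_up, observed)
```

Once observed state matches desired state, `reconcile` returns no actions, which is the steady state the loop keeps driving toward.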
Savannah Peterson
>> Well, it's another thing that's got to get done if we're going to make all this real, so it makes perfect sense. Kelsey, what about you? What's your advice? You've probably given more advice about this than anyone perhaps, so yeah, how has it changed today?
Kelsey Hightower
>> I remind people that they're getting started, but they're not starting over. And I think a lot of times when companies approach these problems, they're thinking they're the first people in the world to solve these problems. Just like when you write code, we don't write libraries from scratch. You go find one that works. We find frameworks that are tried and true and we bring those in. Kubernetes is an infrastructure framework. And a lot of the work that we're doing, we're making new infrastructure libraries. And whenever a new database shows up, new drivers and frameworks show up. And there's nothing different about this example, so even if you think you're a unique snowflake, you're probably not.
Savannah Peterson
>> Real talk. You're not.
Kelsey Hightower
>> Yeah.
Savannah Peterson
>> Yeah.
Kelsey Hightower
>> It makes a lot of sense to say, what is the industry doing? And Kubernetes is one of those rare instances where the community has found Switzerland in Kubernetes and we try to make these ideas work there, because the value is around Kubernetes, and we found comfort in making Kubernetes the operating system and we contribute our drivers, we contribute our abstractions, so you can get on with the rest of it. So that's the way I would encourage people to look at it. If you're getting in here late, you have the late mover advantage. People have been doing this for a decade. You just get to inherit the maturity.
Eddie Villalba
>> That's right.
Savannah Peterson
>> I love that. That's a great way to look at it and thinking of Kubernetes as the comfort food on what is a bit of a roller coaster ride for all of us right now to tie together our food analogies. All right. What's your advice?
Akshay Ram
>> I think my advice is similar to what Eddie and Kelsey said, but with a little bit of a spin, in that we're actually in a moment that's like the early days of Kubernetes, where so much is in open source. To your point, the communities are all working together, so everyone's actually learning on the fly.
Savannah Peterson
>> It feels good. It feels safer to learn on the fly.
Akshay Ram
>> It feels safer to learn on the fly, so you can absolutely ask the dumb question.
Savannah Peterson
>> There is no stupid question.
Akshay Ram
>> And there is no stupid question.
Eddie Villalba
>> Yeah. There isn't.
Savannah Peterson
>> No. This is not-
Akshay Ram
>> So then you can actually ask the 101 question and I feel like in general, the vibe is more around curiosity, exploration and really optimization of a specific hardware. So I think it is actually far more approachable than it seems to be. We on GKE have released a bunch of quick starts. We're constantly doing benchmarks. As Eddie said, don't focus on the differences. Focus on the similarities, but there's so much which is similar-
Eddie Villalba
>> Sure.
Savannah Peterson
>> And that's a really good point.
Akshay Ram
>> and the difference is only the small point. And even the differences, there's a community of people who are there to help out. So yeah, there's-
Savannah Peterson
>> You're not alone.
Akshay Ram
>> so many Slacks which are so active. If you look at the vLLM Slack, there are questions on Kubernetes, there are questions on a new model, there are questions on all sorts of stuff. So there's a lot of community, so you can jump on board if you're ready. Yeah.
Savannah Peterson
>> Yeah. I love that as a note. I couldn't agree with that more. And there's plenty of people who want to help. We all had to have help when we got started too. Nobody's above-
Eddie Villalba
>> That's right. Yeah.
Savannah Peterson
>> reaching out or saying, "Hey, this is actually how I do it. Come over. Look over my shoulder," whatever that might be, or "Come play in my sandbox." I think that's really important. Okay. I've got one final question for you, gentlemen. Last segment, I asked Kelsey where this was going to be in 10 years, but I'm going to bring it tighter, because we're moving fast when we're talking about inference, which is obviously fast and it matters. When we're hanging out at KubeCon in Salt Lake City in 2026 or Amsterdam if you're feeling ambitious, totally up to you, what do you hope to be able to say then that you can't yet say today? Akshay, I'm going to start with you.
Akshay Ram
>> Yeah. That's a tough one.
Kelsey Hightower
>> Look, some of the best platforms in the ecosystem right now actually come from the non-vendors. So the thing Google has done really well is we married Kubernetes to the cloud. We abstracted away the pain points that it takes to run it, but some of the best deployment software comes from a tax company. Some of the observability comes from-
Savannah Peterson
>> Which is wild to think about straight up, Kelsey, because, yeah, we don't think of the tax companies as number one on the innovations list.
Kelsey Hightower
>> I think Google's always done a good job of making sure that we build solutions that are componentized. So what I think is going to happen next year is when customers that are using this in production get their hands on it, they're going to customize it to that final step. It's like going to the grocery store, buying your own ingredients, but when you get home, you decide if you want to modify the recipe a little bit to your own taste. So once these things get released, the announcements went out. When customers get their hands on it, as someone who's also worked in the cloud for a number of years, we step back and observe, like, wow, we never imagined people would take these components and do that.
Savannah Peterson
>> It's so cool. That's got to be one of the most fun parts.
Kelsey Hightower
>> Yeah. I think in 2026, we'll see someone on that stage talking about that thing we never thought that they were going to do.
Eddie Villalba
>> Yeah. And I would then double down and say, I hope, aspirationally, what we see is the foundational model builders. The Google DeepMinds, the Anthropics, all of the ones that are at the edge of it, they're going to say, "We always have to have our competitive piece, but we're going to be able to trickle those down to the startup, to the smaller company." We're still all in the same boat right now, but once these larger organizations start doing these at scale, at high performance, at a low cost, I'm hoping that aspirationally, Google Cloud specifically, when having relations with these customers, can bring that information down in a more consumable way for every customer. The federal government, the bank, the healthcare provider, the startup that's just starting up a little app tomorrow, they should all be able to say, "I can play the same way that these big guys are doing it, but at my scale." And I think that's what we can do.
Savannah Peterson
>> Absolutely. My car can look different, but the engine's the same.
Eddie Villalba
>> Yeah.
Akshay Ram
>> Yeah.
Savannah Peterson
>> Yeah. No. I love that. All right. Now, we're coming back to you. You got an extra bonus minute there to think your-
Akshay Ram
>> No worries. Yeah. Yeah. I just want to summarize what Eddie says, because I feel the same way. Inference should just be the new microservice. It's just another workload and people should really be talking about the value they get out of it in terms of the productivity gains, in terms of how it's transforming their organization and how they're doing it at low cost and low scale.
Savannah Peterson
>> Awesome.
Akshay Ram
>> High scale. Yeah.
Savannah Peterson
>> Well, I can't wait to see the thing we never thought of and all of that being done at scale, at low cost, and perhaps efficient for our wonderful planet. Gentlemen, thank you.
Kelsey Hightower
>> Yeah. Thanks.
Akshay Ram
>> Thank you.
Savannah Peterson
>> This has been really delightful. And thank all of you for tuning in to our fabulous VIP special series with Google Cloud here at KubeCon in Atlanta, Georgia. My name's Savannah Peterson. You're watching theCUBE, the leading source for enterprise tech news.