In this interview from KubeCon + CloudNativeCon EU 2026 in Amsterdam, Brian Stevens, senior vice president and AI chief technology officer of Red Hat, joins Robert Shaw, director of engineering at Red Hat, to talk with theCUBE Research's Rob Strechay and Rebecca Knight about the contribution of llm-d to the CNCF and what it means for bringing production-grade AI inference into the Kubernetes ecosystem. Stevens explains why inference — not training — is becoming the critical challenge as enterprises move AI into production, and why CIOs need infrastructure that speaks Kubernetes. Shaw, a maintainer of llm-d and longtime vLLM contributor, details how the project optimizes entire clusters of model servers to handle the explosive token demands of modern agentic workloads. Together they describe an SLO-driven architecture that disaggregates prefill and decode phases, giving IT teams independent control over input processing and token generation.
Key themes include the cross-foundation collaboration that made llm-d possible, with core changes flowing into vLLM under PyTorch, KServe adapting its custom resource definitions and the Kubernetes gateway becoming AI-aware. Shaw outlines how enterprises are splitting GPU clusters into two deployment patterns: dedicated monolithic stacks for high-priority workloads and shared multi-tenant model-as-a-service environments where developers across the organization experiment and build. He highlights the roadmap ahead, including request prioritization for interleaving critical and non-critical applications, support for next-generation rack-scale accelerator architectures and the security challenges emerging from agentic patterns. Stevens reflects on how rapidly the landscape has shifted — from every enterprise building bespoke DIY inference stacks a year ago to a standardized, community-driven reference architecture today. From the accelerating quality of open source models to the growing compute demands of agentic AI, both leaders provide a practical roadmap for how Kubernetes-native inference will scale to meet enterprise workloads in the years ahead.
KubeCon + CloudNativeCon 2026 Preview with Mike Barrett
Mike Barrett, VP & GM, Hybrid Platforms, Red Hat
Rob Strechay, Dir./Principal Analyst & Host, theCUBE Research
>> Hello and welcome to theCUBE, as we head into KubeCon + CloudNativeCon Europe 2026 in Amsterdam. It's clear that this event has become much bigger than just Kubernetes. I've been saying for a while that the cloud native part has really been hitting, to put it mildly. What started as a developer and platform engineering conference is now one of the most important global stages for AI infrastructure. Not just models, but the systems that make AI usable, governable, and scalable in production. And Europe is a critical lens for this conversation as all of these forces converge. AI adoption is accelerating across industries, regulatory pressure and digital sovereignty requirements are tightening, and there is a growing realization that open source and cloud native architectures are the only viable way to balance innovation with control. That's why the Linux Foundation and CNCF ecosystem matter more than ever. They aren't just hosting projects, they're defining the operating model for AI infrastructure. In fact, this week, the Linux Foundation announced $12.5 million in new funding to strengthen open source security, coming from a coalition of leading organizations, including Anthropic, AWS, GitHub, Google, DeepMind, Microsoft, OpenAI, and more. That funding is being managed through Alpha-Omega and the Open Source Security Foundation, or OpenSSF, who we'll be talking to at KubeCon next week. And the reason this matters is that it's all tied directly into AI. As AI accelerates development, it's accelerating the discovery of vulnerabilities in open source software, creating real pressure on maintainers and the integrity of entire software supply chains. This is where KubeCon + CloudNativeCon EU becomes uniquely important. It's where platform engineering meets AI reality, where CNCF projects become the foundation for AI pipelines, and where European priorities (sovereignty, compliance, data control) reshape global architectures.
So to help me break this all down and to add some perspective, I want to welcome in Mike Barrett, who's the vice president and general manager of Red Hat Hybrid Platforms. Welcome on board, Mike.
>> Yay. Thanks, Rob. It's great to be here.
>> Yeah. I mean, I think that was a long way of getting at the fact that this is probably one of the most critical KubeCon + CloudNativeCons ever. I think that a lot of the reality of getting to actual ROI on AI is really hinging on being able to do AI at scale with sovereignty and really have it be protected. You just came back from a series of customer visits across Europe. What stood out to you most in terms of how organizations are actually approaching AI?
>> Yeah, and it's a great question, since KubeCon is going to be in Europe, in Amsterdam: what's going on over there? I did have an opportunity to visit, I think, seven countries. I was over there for quite some time visiting some of our largest customers. It really came down to four main things that people wanted to talk to me about. And AI was obviously one of them, but there were some other things. There's virtualization. There is still a very strong desire to modernize your virtualization stack, and to move further away from legacy solutions. I think that's because the economics of that situation have changed dramatically, and that's what's causing people to go investigate it again. But if you are going to spend a dollar in that area, you want to make sure that dollar gives you something more than just what you had. Is it giving you a platform and a solution that can hold your AI future, one that you can build on? So we talked a lot about virtualization with our customers. Then we got into AI, and we're going to get into some of the technologies, I think, later in the discussion. But what people are starting to realize right now is they've experimented, and a lot of them have moved to production.
But a lot of them are doing so, I don't want to say with an easy path, but with frontier models. And with a frontier model, you can accomplish a ton. But when you hit a home run with a frontier model, other people in the company want it. They want to touch it, they want to participate in it, and then you are pushing that solution out to tens of thousands of employees. And then it hits you that, oh, we should probably look to see if we're doing this the most effective way. Can we mix in some small language models to do things along the way of calling back to this frontier model? So we see a lot of mixing of models at this point. Maybe the finance BU has their Llama model. Maybe the accounting BU has an OpenAI model. So the question is, what is the most cost-effective way to surface the intelligence that you want back? And because of that, they're looking for a horizontal platform. They're looking for something that has the chops to do the inference, to do the model selection, the cataloging, the pipelining, all those agent orchestration tasks. And that's where we really come in strong with people that are looking to mix those things. The last thing, developer services, was front of mind. And a lot of that has to do with what you talked about: this new attention on CVEs, on hardened images and having zero-CVE images. There's literally an entire startup market that has popped up to pursue this, taking open source solutions and selling to the customer the fact that you will build that for them in a SLSA Level 3 environment that is ready to be compliant with anything that you may be getting into as an industry. Red Hat has always done that, but we've always actually participated in, maintained, and fixed the open source too. But now customers want us to expose to them how we're building our software, and they want to participate in that zero-CVE and hardened image solution with us. And so we're working a lot towards that as well.
>> Yeah.
I think that you hit the nail on the head with that when you start to look at how everything has really just ramped up, and how people want to really join in from a community perspective. I know as part of your tour over there, you also got down to Spain. You were at Mobile World Congress, which was really showing off the future vision of telco and AI. How did what you saw at MWC compare to what customers are actually doing today, especially around AI infrastructure and cloud adoption?
>> Yeah, what a lineup. You have Mobile World Congress, you have KubeCon. In between, you have the NVIDIA conference. And as I'm walking around these things, I see theCUBE booth everywhere. And let me just stop for a minute and say thank you to theCUBE for what you do. We all watch the morning shows like ABC Morning Show, CBS Morning Show, they have guests on. We watch the NFL pregame shows. We watch Face the Nation. You are that for our industry. And without you filling that hole, I don't know what we would do. So thank you for what you do.
>> Well, we appreciate it.
>> Yeah. So the Mobile World Congress, if you have not been... Have you been to this conference?
>> I have. It's been a while. I wasn't there this year.
>> Yeah. So I grew up with a father who loved cars. And so for the '80s and '90s and early 2000s, I went to the Detroit Auto Show, which was the North American car show that all fans went to. This was like that. These are massive booths with huge displays, which take up eight halls just for the exhibit hall, just for the vendors to participate in. And the vendors' booths have meeting rooms and bathrooms and facilities. It's insane, the size of this show. Of course, there's 6G. There was Starlink talking about satellite-to-phone devices. There are a lot of smart devices and screens, but there were a lot of AI solutions. And just about every banner of every booth had the word AI in it.
And for this market, for the telco market, you get a lot of error correction, self-healing, aggregate decisions, autonomous RAN routing. But for the most part, it falls into two buckets. You have AI being applied to what is going across the network: the packets, the technology of telecommunications. And then you have AI being applied to the data inside the packets, like the actual intelligence of our data and what we're talking about to each other. I would say 75% of what I saw in terms of AI was on the packet level. What are we doing for error correction and self-healing and aggregate decision-making? And maybe 25% was actually getting into the packets and starting to rip apart the data. But I think what's exciting to the world around AI in telco, and why it's like peanut butter and jelly, is that telco has all the energy. If you look around the world at who owns the feeds to power servers, it's, for the most part, the telco companies. And when you look at who governs data, and where the data is flowing through, it's mostly telco companies. So if you can have GPUs and you can have intelligence, that's a very lucrative technology sector to participate in. When we partner with our customers that are starting to get involved in AI, we've told a consistent story, probably for the last eight years in this telco industry, and it's one of horizontal cloud. So if you're unfamiliar, a lot of these telcos have vertical stacks. When they go and buy a solution, they're probably buying solutions that involve the hardware, the operating system, and actual bits and bytes of the stack and the top from Nokia and Ericsson and everybody.
Because of the beauty of Linux and the beauty of Kubernetes and the beauty of standards, it's opened the door for a lot of them to look at how they can cut costs and have a horizontal investment, one where they bring in multiple vendors who all agree to APIs and specifications. Then they can cut down their operating costs and do more with that technology. So those vendors and those providers that have moved to a horizontal cloud are in a perfect position to take advantage of AI because AI comes from them. It doesn't come from the solutions that they're buying and putting onto their vertical stacks, and that's what we help them position themselves for.
>> Yeah. And like you said, I think that entire thing wraps around that whole car analogy. You understand what's going on across the industry, and there are different standards, and then people innovate on top of that as well.
>> One other thing I noticed, and this connects to my trip across Europe: a lot of the telcos are getting into sovereignty aggressively. And it's because a lot of the boutique and regional telcos always had business services businesses. So in your local town, you'd probably have a local telco provider who's selling businesses things like phone systems and websites and things of that nature. Now they're selling servers because of sovereignty. People that are looking for colos and managed service providers can use the extra slack from those regional telco providers. And we have a lot of them turning to us to build stacks in all sorts of places all over the world.
>> Yeah. And I think that's also, I mean, I know a lot of who you're working with are what were called neoclouds. Now, I think they want to be called AI clouds or something to that effect. But when you start to look at these specialized providers for that type of infrastructure, the networking is one of the hardest parts of AI.
And it's because, I think, and I was talking about this yesterday at GTC, where I'm coming to you from, when you start to look at it across that, are you bringing the data to the AI or the AI to the data? And I think the answer is yes, it's going to be both, which is really, again, a fascinating thing. And I think sovereignty is a big push of that, like you were saying. Was there anything that you saw from a sovereignty perspective that you think is really going to drive this KubeCon + CloudNativeCon in Europe this year?
>> Oh, yes. Yeah. It's one of those things where you take a step back and you think, could I have foreseen this? Is this something that was showing signals that I should have picked up on? But sovereignty is massive right now, massive for the industry, but also for Red Hat's relationship to our customers. And I try to think in my mind and track back: what were the things that happened along the way? And I'm guessing that it probably started around COVID, where a lot of those major industries realized that it wasn't a global economy, that stuff was concentrated in certain areas, and people needed to diversify and build things in other countries to protect themselves. Then DORA hit really aggressively. And this was the fact that if I'm in certain industries, I need to prove that I'm on multiple clouds. I'm not allowed to be completely invested in one cloud. I need to show that I have exit strategies. And then the word geopolitics came in, and I don't want to get theCUBE banned on X, so we'll just call it geopolitical. But in this bucket, you have wars and tariffs and personalities, and there's all sorts of fun stuff that we wrap up in the word. Then NIS2 hit. And NIS2 was a new requirement that was born, I think, out of a lot of the ransom attacks and some of the other attacks. But in this new regulation, you had to prove that you could restore not from backup.
You had to prove that you could rebuild your entire business, and had plans to build the entire business from scratch. It meant deploying stuff new and hydrating it with runtime information instead of just going to the dark site or HA site, because that could have been attacked as well.
>> Absolutely.
>> Then the Cyber Resilience Act hit.
>> I was going to say, the CRA one is coming on strong.
>> Yeah. And this one hits vendors a lot because we have to prove how... We have to open the kimono a bit and show you how we build software, and prove to you that we have provenance along the way and that we have particular timeframes on when we fix things. All this added up to the type of customer that looked at it and said, "Look, we need some digital independence. We need to realize that no matter what we're invested in and how much it works for our business, we need to start carving out some time to make sure that we can return to our country if we have to, return to very specific locations if we have to." And we've been helping a lot of our customers with that.
What that brings in is a lot of security features. It brings in stuff like post-quantum encryption, which is going to have a huge milestone later in 2026 that the US government is mandating. It has confidential computing. So a lot of our hardware these days has these special enclaves that let you encrypt in memory. And everybody understands encrypting on the file and encrypting over the network, but nobody talks about encrypting in memory, and it's important if you don't own the infrastructure anymore. If you're doing a colo and you push it out to a telco, but it's an intimate thing, you want that. Then we get into workload identity. The clouds have been great about it. I think the clouds taught the rest of the world about it: you get on AWS and you have to attest that you're Mike Barrett before you're allowed to take an RDS endpoint. Everything has that handshake, but we don't necessarily have that on-prem. And so we invest in SPIFFE and SPIRE, for example, which bring that sort of quality everywhere. And then you have AI. And there are some really exciting things going on in AI.
>> Yeah. I like how you kind of framed that up because I think when you start to look at the ecosystem that the CNCF has really built, it's really heavily evolving as kind of the foundation for AI. I think a lot of people looked at, and like you said earlier, the models were kind of the sexy part. I honestly think that the infrastructure has become sexy again, because to be able to get to things like inference and agentic systems and to be able to scale, it's really important. What are you seeing from the CNCF landscape evolving, because you guys play across so many different pieces of it?
>> Yeah. I think it started three KubeCons ago, was it, where in the keynote they interviewed the big names in the Kubernetes ecosystem. And they said, "Hey, what's next? What is the big thing?" And a lot of them said inference.
And three years ago when you said the word inference, people were like, "What? What is that?" But it has come back to inference because I think what people realize is that Kube, as successful as it is, and will remain, was written for webscale. It was written for the type of applications that horizontally scale. And then we added persistence and we added StatefulSets and we added JobSets. Everything had a "set" at the end of it that taught Kube how to sort of manage that application pattern for you. Now, we have inference, and Kube didn't necessarily have those chops. When you look at inference, you need an inference server to begin with. Red Hat didn't necessarily have one three years ago. We went out and purchased Neural Magic, which was just a brain tank of people that work on inference and the vLLM project, which has been nothing but a home run for everybody because it really tailors itself to any GPU and TPU and sort of processor type, accelerator type, but also any model. And it does both ends of that for you instead of being very specific to a vendor solution. But once we took that into the solution, we needed to figure out how to distribute it across Kubernetes. And so we started this llm-d project, and the llm-d project has a lot of really cool things inside it. And some of those cool things are like LeaderWorkerSet. When you look at a web server, you understand that when an instance goes down, you want to just automatically add another instance. But what if it's a larger AI job? You need something that understands that, well, this created a hundred pods and one of the pods went down. Do you want to restart a hundred pods or do you want to just go with the 99 pods? You need all this decision-making that wasn't baked into the intelligence, so that's there now. You need a batching facility that is bigger than the cluster it's sitting on. So how do I batch things across multiple Kubernetes clusters? And that's the Kueue project.
There's sharing of very expensive hardware, and that's the DRA project. There's routing to very specific model endpoints with your tokens instead of going to the least-loaded model endpoint. And that's AI routing, that's intelligent routing. And then there's this sharing of the KV-cache across multiple machines and making sure that you can understand that topology, which is now bigger than a single node in a distributed system. So there's just tons of work that went in to get Kubernetes where it is. A lot of that is wrapped up in this llm-d project and the vLLM project for us. But that was generation one. Now, and you mentioned you're at the NVIDIA conference, there are a lot of beautiful things happening with agentic processes, in our personal lives and in our enterprise lives. But I think everybody realizes that we need a way to sandbox these things. We need a way to not let them run on your work laptop and have accidental access to everything. Everything has got to be very predetermined in the environment that you're going to allow them to run in. So I think what you'll see at KubeCon this year is a lot of people talking about how we take all these concepts that we've had in Linux and Kubernetes and put them together to sandbox something in an intelligent way for things like OpenClaw and NemoClaw to run in.
>> Yeah, I agree with you. And I mean, again, I think it's going to be a lot of fun because llm-d and vLLM are two of my favorite things that are going on within that AI space, because I think, like you said, the DNA of Red Hat, of being able to help bring this to everybody, has been huge. I mean, I've heard vLLM a bunch while I've been here at GTC, and it's going to be the talk next week. But I also want to kind of hit on something before we bust out of here, which is that I'm really excited that on Monday I'll be at Red Hat Commons. You also have a keynote going on as part of the bigger KubeCon + CloudNativeCon in Amsterdam.
Kind of help us and kind of give us a little bit of a preview into that, and what people should be paying attention to as they go to these. And if you haven't signed up for Commons, do, because legit, the best thing is the networking and the conversations with the customers and sharing ideas. I mean, that, to me, was one of the best things at that. Beyond what you learn, which is awesome as well.
>> Yeah. And I know there are a lot of day-zero events to choose from and you only have eight hours in a day. But yeah, if you want to hear actual use cases from real customers, that's always been the draw for OpenShift Commons, for people to come in and realize, "Oh wow, that's touching that industry." But we've always taken KubeCon and the CNCF extremely seriously. It's one of our pride and joys that we send our engineers to the upstream first. It's what we do here at Red Hat. And then we downstream a product from that. If something's delayed or there are disagreements in the upstream, we will not ship our product until that is solved. So we're definitely upstream first, open source first, down to our core. When you look at the schedule for KubeCon in Amsterdam, and you search Red Hat, there are 55 hits. Across the stack, things you've never heard of in the networking, things you've never heard of in the shared processor, all the way up to AI, all the way up to sovereignty, we're talking about it a lot. I think we have seven projects that are trying to graduate, five incubating, 12 in Sandbox. So there's just a lot of open source happening there. We're involved with two of the keynotes. We have one with Solo and with the PyTorch Foundation where we'll be talking about llm-d and vLLM. And then we have one that's very specific towards sovereignty and AI. On the Commons side, an unbelievable lineup. So we have this company, Diebold, who creates ATMs and intelligent payment systems. And you'll hear about how they're using Kubernetes to modernize that.
We have Siemens, a massive global corporation, talking about how they're moving off legacy virtualization over to KubeVirt and OpenShift. You'll have the airport in that region, Schiphol, talking about how they're maintaining their sovereignty and getting into CVE awareness with our ACS solution. You have the UK Health Security Agency talking about how they're using confidential containers. This is cutting-edge stuff up in our ARO solution on Azure. There's a robotic dog that is going to be there that is running OpenShift AI to issue its commands. And in that area, BBVA, the financial institute, is big on OpenShift, and they'll be talking about how they're running AI and machine learning workloads. So definitely stop by, please. It's been nothing but excellence at this event. So we look forward to seeing everybody there.
>> Yep, absolutely, I can attest to it. Well, Mike, thanks for coming on board today. This has been great. I really appreciate you sharing all of that information. I look forward to it, seeing everybody next week.
>> Yeah, it was great to be here, and thank you for theCUBE.
>> Yes. And thank you all out there for watching this KubeCon + CloudNativeCon EU Amsterdam 2026 coverage on theCUBE. Stay tuned, we're going to have a whole lineup next week. A ton from Tuesday through Thursday. Join us then on theCUBE, the leader in analysis and news.