Stu Miniman & Robert Shaw, Red Hat

At KubeCon + CloudNativeCon North America in Atlanta, theCUBE’s Rob Strechay sits down with Red Hat’s Stu Miniman, senior director of market insights, and Robert Shaw, director of engineering, for a ground-level look at how Kubernetes is becoming the production backbone for AI. Shaw explains why most large language model (LLM) deployments are landing on Kubernetes and unpacks the latest on vLLM and llm-d, two projects hardening inference at node and cluster scale, respectively. He details how vLLM maps the fast-moving open-weight model ecosystem (e.g., the Llama series and new entrants like DeepSeek) to diverse accelerators from NVIDIA, AMD, Google (TPU), Intel and AWS, while llm-d targets cluster-level optimizations such as load balancing heterogeneous workloads and specializing the prefill and decode phases to boost tokens per node.

Miniman connects these innovations to what Red Hat is showcasing at the event: trust and security in the AI era (with projects like SPIFFE/SPIRE and KServe), plus hands-on learning at OpenShift Commons. He highlights community stories from industries such as financial services and the public sector (with names like Ford, Morgan Stanley and Northrop Grumman) and underscores how platform engineering, observability and hybrid/edge architectures are evolving to meet demanding AI inference patterns. The conversation also touches on cost and performance economics, why hybrid remains foundational for AI training and inference, and how Kubernetes, GitOps and CNCF projects are coalescing to scale real-world AI use cases beyond simple chatbots into agentic applications.