Just Another Container: Demystifying Gen AI Inference on GKE | Google Cloud Passport to Containers
What does it really take to run generative AI at scale? In this Google Cloud Partner AI Series episode, theCUBE Research’s Savannah Peterson sits down with Poonam Lamba, senior product manager for GKE AI inference and stateful workloads at Google Cloud, and Eddie Villalba, outbound product manager at Google Cloud, to unpack how Kubernetes, and specifically GKE, is evolving to support enterprise AI inference with real-world impact. Lamba shares how Google is meeting developers where they are, with tools such as the GKE Inference Gateway and custom compute classes. Villalba adds his perspective on how AI is “just another workload,” but with some important twists. From dynamic scheduling to stateful services and network-aware storage, the discussion makes it clear that Kubernetes isn’t just powering the web anymore; it’s the foundation for AI at scale. Whether you’re deep into DevOps or exploring agentic AI, this episode offers a grounded look at what’s next for containerized intelligence.