In this KubeCon + CloudNativeCon North America segment, theCUBE’s Savannah Peterson sits down with Jago Macleod and Gari Singh from Google and analyst Kate Holterhoff from RedMonk for a fast-paced look at how GKE is scaling to meet AI demand. Singh explains how Google doubled a reference cluster from 65,000 to 130,000 nodes in a year for massive AI training jobs that can require 130,000 GPUs, and what it really takes for the control plane to schedule, start and communicate across clusters of that size. Macleod details how Google moved internal control-plane state from etcd to Spanner for massive scale, and how new Kubernetes capabilities like Dynamic Resource Allocation, in-place pod resizing, Vertical Pod Autoscaling and improved cluster autoscaling are helping customers run AI on Kubernetes and manage Kubernetes with AI.
The conversation also explores how hardware limits and efficiency are reshaping cloud-native design, from power and cooling innovations seen at Supercomputing to squeezing more capacity into every data center. Holterhoff shares how Kubernetes, AI conformance efforts and projects like OpenTelemetry (OTel) are coming together to support AI agents and complex workflows with strong community backing and observability. Looking ahead, Macleod points to a future of millions of accelerators on Kubernetes clusters and better “graceful degradation” as systems hit scale ceilings, while Singh envisions true platform agents that can auto-size and reshape pods so developers simply deploy and let the platform optimize.
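The Vertical Pod Autoscaling capability Macleod mentions can be illustrated with a minimal manifest. This is a sketch, not a definitive configuration: VPA ships from the kubernetes/autoscaler project rather than core Kubernetes, and the Deployment name `my-app` is a hypothetical placeholder.

```yaml
# Minimal VerticalPodAutoscaler sketch (requires the VPA components
# from the kubernetes/autoscaler project to be installed in the cluster).
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  # Points the autoscaler at a hypothetical Deployment named "my-app".
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    # "Auto" lets VPA apply its CPU/memory recommendations to pods;
    # "Off" would only surface recommendations without acting on them.
    updateMode: "Auto"
```

Combined with the in-place pod resizing work Macleod describes, a recommendation like this could in principle be applied without restarting pods, which is the "auto-size and reshape" behavior Singh envisions for platform agents.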
Jago Macleod, Gari Singh, Google & Kate Holterhoff, RedMonk
Savannah Peterson
Principal Analyst & Host, SiliconANGLE Media, Inc.
HOST
Jago Macleod
Director of Engineering, Kubernetes, Google
Kate Holterhoff
Senior Industry Analyst, RedMonk
Gari Singh
Product Manager, Google Cloud, Google