Steren Giannini, head of product at Google Cloud Run, and Yunong Xiao, director of engineering at Google Cloud, join theCUBE Research’s Savannah Peterson to discuss the evolution of serverless technology, particularly Google Cloud Run's integration of containers and serverless infrastructure. They talk about how Cloud Run has redefined serverless technology by merging it with containerization, exploring its significant impact on scalability, efficiency and customer satisfaction.
The conversation provides profound insights into the myths surrounding serverless computing, specifically addressing scalability and its applicability for web-scale applications, according to Giannini. Key takeaways include the unique capabilities of Cloud Run in providing on-demand access to stronger GPU resources, which greatly benefit AI inference applications. With the industry's shift toward AI for expedited innovation, Giannini reveals how Cloud Run allows companies such as L'Oreal and VIVO to achieve remarkable cost efficiencies and scalability.
Breaking Barriers: Cloud Run, Serverless Scalability and the Future of AI-Driven Infrastructure
TheCUBE's Savannah Peterson chats with Google’s Steren Giannini, head of product for Google Cloud Run, and Yunong Xiao, director of engineering, about how Cloud Run blends serverless and containers to power scalable workloads, including AI. Learn how companies such as L’Oréal use Cloud Run to reduce costs, boost sustainability and speed innovation — all with a developer-friendly experience and creative, fashion-inspired values.
Savannah Peterson
>> Hello cloud community, and welcome back to our special series, Passport to Containers, with the Google Cloud team. My name's Savannah Peterson, coming to you from our Palo Alto studios. Super excited for this particular segment. As you know, we have a multipart series going on, but today we're going to be talking about Cloud Run, serverless, and fashion. You're definitely not going to want to miss this interview. Gentlemen, thank you both so much for taking the time to hang out with me this morning. We are already having a good time.
Steren Giannini
>> Thanks for having us.
Savannah Peterson
>> Yes, absolutely. Steren and Yunong, you guys have been working with Google for a long time. Bobby speaks excruciatingly highly of you, in the best way. Steren, I'm going to start with you since you're described as the godfather of Cloud Run. Tell the audience what it is, what it does, and what makes you the godfather.
Steren Giannini
>> All right. Yeah, I'm Steren Giannini. I'm the product lead for Google Cloud Run, and I happen to be one of the founding members of it. So literally in the room with a few people who asked, "What if serverless was more than functions?"
Originally, the term serverless was really anchored on functions as a service. But as we looked at the problem space, we realized, hold on. If you extract what it fundamentally means, it means fully managed infrastructure, on-demand scalability, you don't pay when you don't use it, a very good developer experience, but it doesn't necessarily have to mean functions. What we started to realize is that at the same time, the software industry clearly agreed that containers were the packaging format for software. We wondered, what if we took serverless and we took containers? Two worlds that at the time really had nothing to do with each other; they were almost fighting each other.
Savannah Peterson
>> And when was this?
Steren Giannini
>> 2017, 2018.
Savannah Peterson
>> Oh, yeah.
Steren Giannini
>> You had the Kubernetes crowd.
Savannah Peterson
>> Right.
Steren Giannini
>> And the functions as a service crowd, the serverless crowd, clearly saying, "You cannot mix the two."
Internally at Google, people raised concerns, "No, you cannot say serverless containers. That doesn't make sense."
And that's what we shipped. In 2018, 2019, we announced Google Cloud Run after iterating on it privately for a while with early testers, introducing a broader definition of serverless but still capturing its essence: the productivity it brings to customers, its high scalability, its efficiency. After many years now, we've really evolved it. Cloud Run has grown from a bet, a prototype, to something that is now Google Cloud's focus when it comes to serverless. We have so many reference customers we are so proud of, and we have evolved Cloud Run quite a bit, which I'm looking forward to talking to you about.
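Steren's point that containers became the standard packaging format rests on a simple documented contract: a Cloud Run container just needs to listen for HTTP on the port passed in the `PORT` environment variable. A minimal stdlib-Python sketch of such a service (the handler body and defaults are illustrative, not Google's sample code):

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

def resolve_port(environ):
    """Cloud Run injects the listening port via the PORT env var;
    fall back to 8080 for local runs, per the container contract."""
    return int(environ.get("PORT", "8080"))

class Handler(BaseHTTPRequestHandler):
    """Minimal HTTP handler: any GET returns a plain-text greeting."""
    def do_GET(self):
        body = b"Hello from a serverless container\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To actually serve (blocks forever), the container's entrypoint would run:
# HTTPServer(("", resolve_port(os.environ)), Handler).serve_forever()
print(resolve_port({"PORT": "9090"}))  # 9090
```

Because nothing here is Cloud Run-specific, the same image runs locally, on Kubernetes, or on another cloud, which is the portability story discussed later in the interview.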
Savannah Peterson
>> Yeah, that must have been quite a moment. Did you feel like you shocked the market a little bit when you came out?
Steren Giannini
>> So definitely we've observed other clouds responding to Cloud Run.
Savannah Peterson
>> I was wondering about the-
Steren Giannini
>> Probably most notably by launching products which looked very much the same. And yes, indeed, I believe we have influenced the creation of a new category of serverless containers, as well as redefined and expanded the term serverless, which, as I said at the beginning, was very much functions and events.
Savannah Peterson
>> I love that.
Steren Giannini
>> And that has changed now.
Savannah Peterson
>> You're preserving the value add there, but really broadening the application. You've been working in serverless for a long time across clouds. What's so exciting about where we are now and what are some of the myths around serverless?
Yunong Xiao
>> Yeah, I'm really glad you asked. I think the biggest myth I would say is that serverless doesn't scale, right? It's great for your toy apps, it's great when you're going from zero to one, but it doesn't work from one to 100, or from 100 to 1,000. I think that is one of the biggest myths, and often that sets our customers and the industry back. What we're actually observing in serverless today is that you can grow and scale with serverless, especially on Cloud Run, where it's built on top of 20 years of infrastructure development at Google. Last week we just had a customer scale to 2 million RPS in one single region on serverless.
Savannah Peterson
>> Whoa.
Yunong Xiao
>> Right? And so these are really-
Savannah Peterson
>> That's huge.
Yunong Xiao
>> Yeah, web scale numbers that we're talking about. I really wanted, once and for all, to do away with the myth that serverless is only for toy apps. No, it is great for web scale apps, it is great for hyperscalers, and it's great for digital natives.
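To put the 2 million RPS figure in perspective, here is a back-of-envelope instance-count estimate using Little's law. The latency number is hypothetical; the per-instance concurrency of 80 is Cloud Run's default setting (it is configurable):

```python
import math

def instances_needed(rps, avg_latency_ms, concurrency_per_instance):
    """Little's law: in-flight requests = arrival rate * latency.
    Divide by per-instance concurrency for an instance-count estimate."""
    in_flight = rps * avg_latency_ms / 1000
    return math.ceil(in_flight / concurrency_per_instance)

# Hypothetical: 2M RPS, 50 ms average latency, 80 concurrent
# requests per instance (Cloud Run's default concurrency).
print(instances_needed(2_000_000, 50, 80))  # 1250
```

So even a two-million-RPS spike resolves to roughly a thousand-odd container instances under these assumptions, which is the kind of fleet the autoscaler manages on the customer's behalf.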
Savannah Peterson
>> Why do you think there is that perception?
Yunong Xiao
>> Right, I think it started with a few things. One, functions, right? Functions aren't as great or as flexible a runtime to really enable large, robust applications. But then also I think the mindset of the industry is changing. And especially now, with the AI revolution, there is such a big impetus for customers to go to market. Reduce the time to market, developer ergonomics. Now they're less focused on, do I need to go build my own platform that can support 2 million RPS, versus, what if I could just use Google's platform? What if I could just use Cloud Run? I think with that expedited time-to-market pressure, all of your boards and CIOs are saying, "When do I get my next LLM?"
Or, "When do I get my next gen AI feature?"
That's really a forcing function for our customers to be like, "Okay, great, we're just going to use Cloud Run," scale with it, and they're actually very happy with that.
Savannah Peterson
>> I am not surprised. And having that infrastructure, as well as the breadth and depth that you all have, as people make this pretty big transformation is a big deal. What about serverless and Kubernetes?
Steren Giannini
>> Yeah, so I can talk a bit about that.
Savannah Peterson
>> Yeah.
Steren Giannini
>> Because as we created Google Cloud Run, we also wanted to have a strong portability story. We actually launched, prior to Cloud Run, an open source project to allow people to run the same containers and API, and autoscale services, on top of Kubernetes. To be very clear, Google Cloud Run is not running on Kubernetes. It is, as Yunong said, running on a highly scalable Borg infrastructure, which scales to, I think we did demos of, from zero to 10,000 service instances in 10 seconds.
Savannah Peterson
>> Wow.
Steren Giannini
>> And for GPUs, by the way, we have the same kind of number that I'm looking forward to talking to you about.
Savannah Peterson
>> Speedy.
Steren Giannini
>> But basically, that being said, we still wanted people to be able to escape, if they wanted to, onto their own infrastructure, onto Google Kubernetes Engine, or even into other clouds. First, containers are a standard packaging format. The container you deploy to Cloud Run has nothing proprietary about Cloud Run. That's a very unique value proposition. You can literally take it, you run it on your local machine, you run it on Kubernetes, you run it on another cloud, but hopefully you prefer to run it on Google Cloud Run because it's more efficient and highly scalable. The container is portable, and we went even beyond that by offering a portable API. The Cloud Run API, if you look at it, actually will remind you of the Kubernetes API. You can literally copy-paste it into your Kubernetes cluster, you change one thing, and you can deploy it to Kubernetes. However-
Savannah Peterson
>> Wow, just given the complexity of that, that's pretty impressive.
Steren Giannini
>> Exactly. We went above and beyond to look like Kubernetes, but we are absolutely not. It's a different implementation of the same specification.
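Steren's "copy-paste" claim refers to Cloud Run accepting the Knative Serving `serving.knative.dev/v1` Service shape. A sketch of that shared shape, rendered as a Python dict for illustration (the real artifact is YAML, and the image reference below is hypothetical):

```python
def knative_service(name, image):
    """Service manifest in the serving.knative.dev/v1 shape that both
    Knative and Cloud Run accept. Shown as a dict for illustration;
    in practice this would be a YAML file."""
    return {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {"containers": [{"image": image}]},
            }
        },
    }

# Hypothetical service and image name.
svc = knative_service("hello", "gcr.io/my-project/hello")
print(svc["apiVersion"])  # serving.knative.dev/v1
```

The same document, with little more than metadata adjustments, can be applied to a Knative-enabled Kubernetes cluster, which is what makes it "a different implementation of the same specification."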
Savannah Peterson
>> Wow, that's impressive. Okay, cool. Well, when we sat down, I got very excited because you brought up one of my favorite conversations to be having right now. Inference is what makes AI real for the world, for most people who experience that instantaneous action, whatever that might be. Y'all are doing some really cool stuff with Cloud Run and inference. Yunong, I want to start with you. Can you tell us what's going on?
Yunong Xiao
>> Yeah, I think this is where the intersection of serverless and the value proposition really comes in for customers. Inference is where the money's going to be made. You spend all of your money on training, and you never get that money back. Where you make that money back is on inference. What we're seeing is that there's a great market for inference across the broad industry, so not just digital natives, but across the Fortune 500 and the blue chips, right? What is the big problem that people are struggling with in inference, and just gen AI generally, today? It's the cost. It's very expensive hardware that you have to buy, and there's a capacity crunch. What we're seeing with our customers, and with customers generally speaking, is all of them are struggling to even just get supply of the cards, or the TPUs or GPUs, to be able to run their inference applications. And often, on a lot of our competitors, and just broadly within the industry, when you want to run inference applications, you can't really get these cards on demand. Your cloud provider will ask you to sign a two or three year contract to reserve the cards because they're so scarce. And so this is, again, a great opportunity for Cloud Run, where we are actually today providing on-demand access to GPUs. And again, it's the same value proposition: scale up dynamically, scale down, pay as you go and pay per use.
Savannah Peterson
>> Which is unique, just to be super clear.
Yunong Xiao
>> Which is extremely unique across the industry, and we're seeing really great uptake and growth with our customers.
Savannah Peterson
>> I can imagine. I mean who doesn't want that bundle? Yeah.
Steren Giannini
>> But we haven't talked about the performance yet. You might be used to requesting a virtual machine with a GPU and having to wait 10 minutes, 20 minutes. On Cloud Run, you get an instance with a GPU and the drivers installed, ready to serve, in five seconds. And from zero, right? From nothing, where you don't pay, in five seconds, you have your instance up with your GPU.
Savannah Peterson
>> Wow.
Steren Giannini
>> And then there's the time for you to load the model. What we observe with models like Gemma 2 9B is that you can go from zero to returning the first word of the LLM in 20 to 30 seconds. That means that you can trust the autoscaler to be fast enough so that when your users are not using your API endpoint, you will not pay, or pay less. But when you have a traffic spike, you can rely on that autoscaler to spin up GPUs on demand extremely fast to serve that traffic of requests and prompts that you get.
Savannah Peterson
>> That flexibility and elasticity is so critical right now. I mean, from a cost management perspective, from a productivity perspective, and frankly realizing ROI and figuring out what you want when something really works, you do want to ramp up like that. You don't want to have to deal with it. Wow. So had you been working on this, well, let me phrase this a different way. How has our AI revolution accelerated or augmented the development of what you've been doing? Because I feel like this definitely would've put a lot of attention on that. Yunong, I'll start with you again.
Yunong Xiao
>> Yeah, I think the biggest thing for us is to know that all of these, so let's go back to the fast starts, the cold starts for a second.
Savannah Peterson
>> Yeah.
Yunong Xiao
>> It is a prerequisite to have a serverless experience. If your app takes 10 or 20 minutes to start when there's a demand spike, you're actually causing outages for your customers. I just want to note that the many years of innovation that we've built into the platform is really what's enabled us to ship serverless GPUs on Cloud Run really quickly. We've been thinking about this for a long time, and we started in earnest, I think, early last year. In a very short amount of time, we were able to ship this into our customers' hands. But again, that's because of all the innovation we've done in the previous years and the investments there.
Savannah Peterson
>> It's the ecosystem and the building blocks that you've already had as that foundation that allow you to provide those solutions for your customers. I love that we're all passionate about the actual results, not just the perceived results. You mentioned you have some customers who can share some narratives with us.
Steren Giannini
>> I think, those customers, why do they even pick serverless GPUs on Google Cloud Run? It's often because they love the developer experience of Cloud Run and, as Yunong said, are looking for an efficient way to do AI inference. The first customer who worked with us, even before the feature was publicly available, was L'Oreal. L'Oreal built an internal AI chatbot and image generation system for their marketing, so basically L'Oreal marketing can use-
Savannah Peterson
>> Huge in makeup....
Steren Giannini
>> an internal, yeah.
Savannah Peterson
>> Totally imagine, honestly, it makes so much sense.
Steren Giannini
>> An internal image generation system, so that it is tailored to the L'Oreal brand and aesthetics, and that's running on Cloud Run. And why do they pick it? Because their usage is from their own internal marketing team and employees. During the night, everybody's at home, nobody's using it, and therefore they were looking for a scale-to-zero runtime. And that's what Cloud Run offers too. During the night, when most L'Oreal employees are at home, L'Oreal pays nothing. And when there is a spike of demand to generate those marketing images for L'Oreal products, this is where Cloud Run can scale on demand. They were very helpful giving us feedback as users, but also, they were very proud to be on stage with us the day we launched serverless GPUs, because they were so happy to use it. And for them, they realized massive cost gains by scaling down, notably down to zero during troughs, and being able to handle peaks with Cloud Run.
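The economics Steren describes for an internal tool can be sketched with made-up numbers: an always-on GPU is billed around the clock, while a scale-to-zero service is billed only for busy hours. The hourly rate and traffic pattern below are invented for illustration, not Google Cloud pricing:

```python
def always_on_cost(hourly_rate, hours_in_month=730):
    """A reserved GPU instance is billed for every hour of the month."""
    return hourly_rate * hours_in_month

def scale_to_zero_cost(hourly_rate, busy_hours):
    """Pay-per-use: billed only while instances are actually serving."""
    return hourly_rate * busy_hours

# Hypothetical rate and an internal-tool traffic pattern:
# ~10 busy hours on each of ~22 workdays, idle nights and weekends.
rate = 0.75           # $/GPU-hour, invented for illustration
busy_hours = 10 * 22  # 220 billed hours instead of 730
print(always_on_cost(rate))                  # 547.5
print(scale_to_zero_cost(rate, busy_hours))  # 165.0
```

Under these assumed numbers the pay-per-use bill is well under a third of the always-on bill, which is the "massive cost gains by scaling down to zero during troughs" in concrete form.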
Savannah Peterson
>> This also impacts the overall sustainability of these projects as well.
Steren Giannini
>> Oh yeah, interesting, because L'Oreal happens to be one of the customers who cares the most about sustainability, and yes, indeed. Fun fact, I actually founded and launched not only Cloud Run but also Google Cloud Carbon Footprint, which allows you to monitor the carbon footprint of your Google Cloud usage. What we observe is that due to Cloud Run's on-demand nature, you really are more efficient in your resource usage. If you have an idle machine, it will use maybe a little bit less power than when it is active, but it is still attributed to you and reserved for you, while with Cloud Run, due to its highly multi-tenant and efficient nature, you can literally only use what you need. You are more efficient, therefore more efficiently using energy and reducing your carbon footprint.
Savannah Peterson
>> It makes total sense. I read somewhere recently that 25% of power usage is just things being plugged into the wall while we're not using them. That goes for machines too; you don't have to be training a large model to have the power draw. Obviously, that increases it, but just it existing is, to your point, a huge chunk, and being able to cut that out doesn't even impact anything else. It's not like you're changing your behavior, you're just not wasting, which is really cool. Do you sleep, by the way? I mean, you founded a bunch of things at Google.
Steren Giannini
>> Yeah, I'm actually, on the side, I'm a bit passionate about sustainability. My home runs on 100% solar power, of course.
Savannah Peterson
>> I love that.
Steren Giannini
>> But also, I was trying to look at how we can help our customers with that. And by the way, the best way to lower your carbon footprint is to pick the right Google Cloud region. Not every power grid is as green as the others. And L'Oreal notably, they are running in Europe, where some of those regions are highly optimized in terms of carbon footprint.
Yunong Xiao
>> I mean, I might just add, the best way to reduce your carbon footprint in Google Cloud is to use Cloud Run. The multiplexing, the multi-tenant nature of it, the fact that we hold a pool for customers as they scale up and scale down and we manage that capacity, ultimately is going to be the most sustainable way forward.
Steren Giannini
>> You reduce waste.
Yunong Xiao
>> Yeah.
Savannah Peterson
>> Well, and it makes so much sense, and also, you don't want to have to think about that. I mean, becoming a power management expert isn't necessarily something somebody needs to add to their AI transformation, but it's obviously one of the hugest things that people are thinking about when they're factoring in their expenses and everything else. Yunong, you mentioned you had some other fun customer applications.
Yunong Xiao
>> Yeah, and so this is maybe a little bit different, which is one of our customers that we're launching with, VIVO. They're the third-largest cell phone maker in the world, and they're moving most of their phone-based AI features onto Cloud Run. What they've told us is that AI, for cell phone makers, for smartphone makers, is a life and death struggle. And that's how serious it is.
Savannah Peterson
>> Wow.
Yunong Xiao
>> If you think about the competitive nature across all of their competitors, if you don't have the new circle-to-search, if you don't have the magic eraser features, if you don't have text-to-speech, if you don't have on-the-fly translation, those are the features now that customers care about. So really, it is that life and death struggle for them. And for them, going to market is the number one priority. Everyone's running thin teams, everyone has tremendous market pressure to compete, and this is where the two dynamics of Cloud Run come in. One, the fast time to market and the developer experience. The thing that I should only care about is developing my business logic, not managing the infrastructure, not figuring out how much capacity I need to use, not managing any of that. And then secondly, how do I reduce my cost of deploying these features, because I'm deploying them to tens or hundreds of millions of customers on their phones. If I were to provision for peak, that's extremely expensive, but also, there are the environmental factors of it. So those two things really come together, where VIVO, after we've talked to them about Cloud Run, one, they love the platform play at Google Cloud. So this is the other thing that I should mention: with Cloud Run, with serverless, it isn't just compute, right? Compute is useless by itself, but there's the platform play with our networking, with our VPC, with our storage, with our applications.
Savannah Peterson
>> It's really the whole ecosystem too-
Yunong Xiao
>> Yes....
Savannah Peterson
>> that surrounds it, not even just the actual solution there.
Yunong Xiao
>> Cloud Run is fully compatible and fully integrated with all of the Google Cloud platform components, and so it was very easy for VIVO to migrate their existing workloads, whether it's a magic eraser, whether it's translation, from where they were before onto Google Cloud Run. We've seen them do that within the span of a few weeks actually.
Savannah Peterson
>> Wow.
Yunong Xiao
>> That transformation, that migration, this is what we were talking about. It's like that big myth that it only works for zero to one. No, it does work for zero to one, but it also works for one to a hundred, it works for a hundred to a million.
Steren Giannini
>> And let me explain why it was so fast.
Savannah Peterson
>> Yeah, please.
Steren Giannini
>> So you literally open up Google Cloud from scratch, click Cloud Run, and you can deploy the same container you were probably using on your old platform and you can check the checkbox to get a GPU, and you have it. As I said, you deploy in a matter of seconds and then you get the-
Savannah Peterson
>> So simple. What I love about this is it allows you to scale. I mean, really exponentially adjust or tilt up the nose of the aircraft or the rocket ship depending on whatever you're building and do it so simply. People talk about decreasing complexity in our world and containers in particular all the time. It's been the entire conversation around Kubernetes for the last 10 years, but the fact that you can literally do that.
Steren Giannini
>> No cluster. You come and you migrate your workload because of this container standard. It's all about productivity and time to market. Cloud Run enables your developers to be productive and your business to be fast.
Yunong Xiao
>> I think that's the advantage of Cloud Run and Google. Maybe one of the other myths for serverless is lock-in, right? Where, if you think about functions as a service, there's a very custom API that you write your applications to, one that requires customers to make code changes to migrate their applications in, and then once you have, for you to get off, you have to make code changes again. Our premise for Cloud Run is we don't want to win your business because we're trying to lock you in. We want to win your business with an open product, because it's the best product for you. Kudos to Steren and the team for keeping everything about Cloud Run open. The APIs, native integrations with Terraform, the container standard mean that you can very easily take any containerized application, put it on Cloud Run, and then just scale it as you go.
Steren Giannini
>> That performance and those GPUs, yes, they are quite differentiating and unique among the cloud providers. As I said, a five-second startup time on the GPU, but that's not where we stop. We are not satisfied. We will soon announce even better performance. Five seconds, for us, is slow. We come from serverless; we are used to milliseconds, not seconds. That's one thing we are still working on, improving that startup time. The GPUs we offer today, we want to offer bigger ones. Of course, there is a large set of GPU types. Today, you can run many models on Google Cloud Run, but some of the biggest you can't yet. We are looking forward to adding bigger GPU types, which hopefully will have the same kind of performance as the ones we have today.
Savannah Peterson
>> Well, having spoken to you too, I suspect that will be the case given your aspiration and drive to make it happen, I love to hear it. It would not be 2025 appropriate if we did not bring up agentic in this conversation. I know that you've got some interesting things going on at Cloud Run and agentic. Steren, let's stay with you and get us started there.
Steren Giannini
>> So here for the past few minutes we've discussed AI inference. That's really an autoscaled service that will run an open source model that you can fine-tune yourself, or an off-the-shelf model like Gemma, Llama, whatever. This is for the inference piece. But what we observe is customers using Google Cloud Run to build something bigger, which is AI agents. I think we tend to put the agentic and agent words on many things, but from our perspective as the runtime, the needs of those types of agents are very similar to the needs of web applications, actually. Let me explain why.
Savannah Peterson
>> Yeah.
Steren Giannini
>> Usually, your agent will use an orchestration framework. Last year we were on stage with the LangChain founder. LangChain is a very popular orchestration framework for agents, and he himself says Cloud Run is the best place to deploy your LangChain agent.
Savannah Peterson
>> That's got to feel good.
Steren Giannini
>> Yeah, yes. It's easy when people love what you do, and they do, because LangChain is one of the frameworks I would recommend if you want to build an agent which does more than call an inference endpoint. By the way, this inference endpoint can be hosted on Cloud Run, or can be consumed directly as a service. If you want to consume Gemini as a service from Vertex, that's the same from the perspective of the agent or the orchestration. But this agent orchestration framework will, prior to that, process the user input, enrich it with some context, maybe retrieve some more context from a database. That's something we call RAG, retrieval-augmented generation. You do all that before you call the model. And actually, nowadays what we see is multi-agent systems, where you have one agent potentially branching to another one to do some different tasks, or coding tools. What we've observed is customers, who will be with us at Cloud Next, who have built a code execution environment on Cloud Run. If you think about it, the agent will generate some code on the fly, and then you will execute that code on Cloud Run too. Why? Because, we haven't talked much about it, but every container on Cloud Run is strictly sandboxed. We run the customer's code in that sandbox, but customers themselves can run untrusted code in that sandbox. That means you can have an agent that orchestrates the user input, adds some more context via retrieval-augmented generation, executes untrusted code via this sandbox, and calls an LLM. We also observe agents that browse websites, and these agents become quite complex and powerful, and we've observed a few customers picking Cloud Run, again, for its characteristics of autoscaling on demand and developer productivity.
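The retrieval-augmented generation step Steren describes, enriching the user input with retrieved context before calling the model, can be sketched as below. The toy word-overlap retriever stands in for a real vector search over an embedding index, and the documents and query are invented:

```python
def retrieve(query, documents, k=2):
    """Toy retrieval: rank documents by word overlap with the query.
    Real systems use vector similarity over an embedding index."""
    q = set(query.lower().split())
    scored = sorted(documents, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def build_prompt(query, documents):
    """RAG: prepend retrieved context to the user question before
    the orchestration framework calls the model."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Invented documents standing in for a context database.
docs = [
    "Cloud Run scales container instances on demand.",
    "Croissants are best eaten fresh.",
    "Cloud Run containers are sandboxed.",
]
prompt = build_prompt("How does Cloud Run scale?", docs)
print("scales container instances" in prompt)  # True
```

The assembled prompt, not the raw user input, is what gets sent to the inference endpoint, whether that endpoint is a model hosted on Cloud Run or a managed service like Gemini on Vertex.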
Savannah Peterson
>> I believe that. So this transition, this new lens on agentic wasn't that big of a leap for what you were already doing?
Steren Giannini
>> No, I mean the GPU, the AI inference was a big leap technically.
Savannah Peterson
>> Right.
Steren Giannini
>> I can go way deeper into why it was extremely hard and why we believe we have something very differentiating in terms of infrastructure. AI agents, from a technical standpoint, look very much like an API where you have a database and some Python code that needs to execute. Potentially the only differences are those code execution environments, or maybe asynchronous processing.
Yunong Xiao
>> But what I would chime in with is, even before we had GPUs, we had customers performing those sorts of use cases anyway, already on Cloud Run, so it wasn't a leap to be where-
Savannah Peterson
>> Exactly, that's what I was.
Yunong Xiao
>> Call them by another name, but certainly we had customers who were executing untrusted code within Cloud Run sandboxes. They just didn't have the GPU accelerators.
Steren Giannini
>> Customers have built very popular dedicated chatbots for their brands that they expose to their users, and those have been benefiting from Cloud Run's autoscaling during some events. I cannot share many more details, but basically, you can imagine a sports event where obviously everybody will interact with the chatbot at the exact same time, so you need a scalable runtime that will be able to handle that load. As Yunong said, we have customers doing 2 million requests per second right away, and Cloud Run can scale to that, and so your AI agent can scale to a spike of user prompts on Cloud Run.
Savannah Peterson
>> This is like an anxiety reducer for anyone who works in tech. It feels like you're a bit of an antidote in that regard for when these moments do happen. Super impressive. Let's go back to the inference just for a second because I would love to follow up on that. Why is what you've done so unique?
Steren Giannini
>> That's a good question. So why is it unique? Because of its unique performance and developer experience. But why are we leaders here, differentiated here? I would say because we've built on extremely differentiating infrastructure, which is Cloud Run running on Borg, highly, highly optimized for startup time. Yunong's engineering team was able to go even beyond that by adding GPU support to that infrastructure. There is a lot of engineering innovation that has happened, and maybe you want to go deeper into that.
Yunong Xiao
>> Yeah, maybe I can add that.
Savannah Peterson
>> Please.
Yunong Xiao
>> I think first, from a serverless perspective, typically the economics are very hard for the cloud providers, because if you think about it, the base proposition is you don't pay upfront, you pay as you go, and you could spike to some number of instances that we don't know ahead of time. Typically, you solve this by keeping a large pool of idle instances around, and then the cloud provider ends up eating the cost, right? Where Google can really differentiate is, we keep coming back to this thing called Borg. Borg, for those of you who don't know, is Google's own internal orchestration system for compute workloads. It's what Kubernetes was largely inspired by, but it runs at Alphabet scale. If you think about all of our properties, YouTube, Search, Maps, Ads, et cetera, all of that runs on Borg.
Savannah Peterson
>> Little bit of scale.
Yunong Xiao
>> Just a little bit, right? Maybe the biggest scaling system in the world.
Savannah Peterson
>> Yeah.
Yunong Xiao
>> But with that economy of scale, with that shared architecture, that is how we can enable this innovation at a cost basis that actually makes us a really profitable business. This is where you can see Google really innovating in this area. A lot of other players in this space are building this on top of VMs or instances; they don't have that Alphabet scale that would help them provide the right value proposition for their own businesses to put this out to the market. That's one. The second one is that, fundamentally, we have a very strong business within Cloud Run. Obviously, we can't get into some of the numbers, but we've seen some really explosive growth in the last five or six years. It's a very strong business, really strong revenue, really strong margins, and obviously, when a business is doing well, we want to continue to invest there. Those are the two reasons: the technical innovation and economics of it, and the success of the business, so we want to really double down. AI gives us a really good moment in time to continue that serverless revolution, because as we talked about, a lot of these businesses really don't have the time to tinker around with platform teams, to build platforms. They have to go now. It's a life-and-death struggle for a lot-
Savannah Peterson
>> And by the time they do it, it'll be time to do it again.
Yunong Xiao
>> Yes.
Savannah Peterson
>> You'll never catch up with the velocity we're going right now.
Yunong Xiao
>> That's why they really see that value proposition even more than traditional compute workloads, because they've never had these platform teams to manage GPUs. Some of them can't even get GPUs, right? And they get tons of downward pressure from their board saying, "When is the next agent shipping? When is the next gen AI feature shipping?" And so that is such a great fit for Cloud Run.
Savannah Peterson
>> I mean, I feel like you've convinced me that everyone should be using Cloud Run at this point. It just doesn't make any sense to do it a different way, for so many different reasons. I'm not surprised you're delighted.
Steren Giannini
>> Give it a try yourself, for the audience. As I said, we have quickstarts; we can link them on the video.
Savannah Peterson
>> Oh, we absolutely should, and we definitely will.
Steren Giannini
>> As I said, you open up Google Cloud, you follow the quickstart, and in just three to five minutes, you will have your agent up and running. You will have your AI inference endpoint up and running with a GPU.
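The quickstart flow described above boils down to roughly one command. A minimal sketch, assuming a hypothetical service name and container image; the GPU flags are illustrative and may require a recent gcloud version:

```shell
# Deploy a containerized inference server to Cloud Run with one NVIDIA L4 GPU.
gcloud run deploy my-inference-endpoint \
  --image=us-docker.pkg.dev/my-project/my-repo/inference-server:latest \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-l4
```

Everything here other than the `gcloud run deploy` command shape is a placeholder; follow the official quickstart for the exact steps.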
Yunong Xiao
>> And it's production ready.
Steren Giannini
>> We would be happy to share those-
Yunong Xiao
>> Just to be clear, it's production ready.
Savannah Peterson
>> Right.
Yunong Xiao
>> You can scale up right away, but when you don't, it doesn't cost you anything.
Steren Giannini
>> So from a development and exploration perspective, you will basically stay below the free tier. But when you need it to scale, it scales to production levels, like the millions of requests you might need.
Savannah Peterson
>> It's really great to be able to play like that and to learn, especially with everything that's going on. Google has always been known for caring about the developer experience as well, so it does not surprise me.
Steren Giannini
>> It's hard work. Since 2018, it has been a lot of passion and craft, building something that we know developers love. It's rewarding to see the feedback from the developer community: Cloud Run leads Google Cloud's developer satisfaction numbers. It has amazing usability. We measure usability, or rather a third party measures usability across all the big clouds, and we always come out at the top. We've published those numbers in the past. And even internally, when we measure ease of use, task success, and just satisfaction like CSAT-
Savannah Peterson
>> Yeah.
Steren Giannini
>> Cloud Run is at the top of Google Cloud.
Savannah Peterson
>> Well, congrats on that too, I bet that feels good. I bet you're excited to see everyone who uses it at Google Cloud Next here in a few weeks too. It's a fun time. All right, Yunong, I've got a little bit more questioning for you both, but I have one for you in particular. You really strike me as someone who's no-BS, and I like that a lot. What do you think are some of the myths around agentic right now, especially with this hyped conversation we were talking about? I mean, you were essentially already doing a lot of this, and now we just have a term we're using for it.
Yunong Xiao
>> Yeah, maybe it's like the old Shakespearean thing: a rose by any other name. I think agentic is just a term meant to describe a use case of traditional inference workloads. I have these agents, they perform some functions for me, I can chain them together, and it's a helpful term. It's helpful for us to group these workloads into a specific category in terms of use case. But ultimately, I think the thing that we should realize at the end of the day is: what is the value they're providing to the customer? Therein is where we can actually unlock the value for customers.
Savannah Peterson
>> I just want you to be louder for the folks in the back. This is the one thing I keep asking: what's the value? What's the thing there? What's going to make this better, or how does this make the world more sustainable, or whatever that might be?
Yunong Xiao
>> It's very clear that it has to solve concrete problems, and they're usually along a few axes. First of all, are they generating actual value for customers? There are so many examples that we see out in the world where, hey, I have an agent, it does something. Does it solve a problem? Does it solve a pain point for customers? Does it streamline things? Does it make things easier? There are lots of agents out there that do that, which is awesome, but there are also lots of agents that don't necessarily do that. We're still in the very early days, and folks are all experimenting, which is great to see. I think the second level of that is: do these agents provide better ROI for their businesses? An agent may not provide more value for customers directly. Think about a customer service agent: maybe it does, maybe it doesn't, but on the back end, it's replacing thousands of manual steps. And therein, I think, is also where a lot of the value comes in. I'm excited to see what happens over the next few years as we build the ecosystems around agents, so you're not innovating from scratch.
Savannah Peterson
>> That's where I think it's going to start to get interesting.
Yunong Xiao
>> Right. And it starts-
Savannah Peterson
>> And that's where you'll save some money, I think.
Yunong Xiao
>> Exactly. Again, I think Cloud Run is one of the best places to go run and build these agents, because at the end of the day, it's inference via GPU, and sometimes it's a lot of inference by a lot of different agents together. And again, the other thing we haven't talked about is where Cloud Run really shines: lots of different services all working in concert together, and the ease of management of those services. Because you're not managing the infrastructure, it's trivially easy for you to create new revisions, new projects, new services, and string them all together. And then, therein, you can have a very small, thin team potentially build 20, 30, 40 agents and not have to maintain the infrastructure. And so that cost of scale and that time to market really accelerate with Cloud Run.
Savannah Peterson
>> The little kid nerd inside me is getting very excited thinking about this because you think of how many things will be enabled by that kind of innovation ecosystem. It becomes a different game to create when these things will unlock.
Steren Giannini
>> As an agent builder, deploying to Cloud Run really frees you from having to worry a lot about operations. Many customers or individual builders deploy to Cloud Run and then don't have to do anything for it to keep running.
Savannah Peterson
>> I mean, how glorious, right? To have that kind of load taken off.
Steren Giannini
>> We take care of the security updates, we take care of the infrastructure upgrades, and it's all about trusting Cloud Run to do the operations for you so that you can focus on cooking your AI agents. We know this field is moving extremely fast, so the last thing you want to do is set up infrastructure.
Yunong Xiao
>> I would take that even a step further: if Google is your SRE, we will probably do a better job of it than most people, yourself included. You're getting better service without having to worry about any of that. Really, where I would like to see the industry shift, and I think we see this especially in AI, is to focus on the thing that generates value for your customers. That's the business logic. For almost all businesses, that's your business logic; it isn't running infrastructure. Leave that to us, right? We will help you with that.
Savannah Peterson
>> That. Oh yeah, no, let everyone do what they're best at and what they have a seasoned history of doing. Okay, last technical question before a little bit of a fun round here at the end. If you could wave a magic wand and the whole market, everyone, maybe even your families, knew something about inference that they don't currently know, what do you wish you could tell them? Yunong, I'm going to start with you.
Yunong Xiao
>> I think the thing that I would tell folks about inference is that at the end of the day, it is not magic or rocket science. It's actually quite simple if you look underneath the hood, right? It's a bunch of vectors and linear algebra. My encouragement to everyone is: go try building your own inference applications. There are lots of really great frameworks out there that help to abstract all of that complexity away. LangChain, Ollama, vLLM, some of these folks we partner very closely with, and stay tuned for Google Next, where we'll have some exciting announcements there. But literally anyone with a little bit of background running software can go out and build their own inference application. It's incredibly easy to do, so don't be afraid of what you think is the complexity beneath it, because the industry is moving fast toward the point where the intersection of, say, Ollama plus Cloud Run means that anyone who can write a little bit of Python or JavaScript can immediately start deploying an inference application themselves.
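To make that concrete, the client side of such an inference application really is a few lines. This sketch assumes an Ollama-style `/api/generate` API; the service URL and model name are hypothetical placeholders:

```shell
# Build an Ollama-style generate request body.
cat > request.json <<'EOF'
{"model": "gemma", "prompt": "Say hello in one sentence.", "stream": false}
EOF

# Once a service is deployed, you would POST it (hypothetical URL):
# curl -s https://my-inference-endpoint-abc123.run.app/api/generate \
#   -H "Content-Type: application/json" -d @request.json
```

The actual call is commented out because it needs a live deployment; the point is that the whole client side is some JSON and a POST.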
Savannah Peterson
>> I love that. That was a really great sound bite there. Very good motivational line too.
Steren Giannini
>> I think we should add open models to the picture, which we have seen since Llama, right? Who would have predicted that AI inference was going to go from very proprietary APIs to a flourishing ecosystem of open models that are pushing everyone forward? With those open models, you can take them off the shelf and actually tweak them for your business, for your specific needs, either by fine-tuning or with LoRA adapters, and then you can run them very easily. We've partnered, and I really want to say that name again, with Ollama. I believe Ollama is the simplest way to run those open models on your local machine, but obviously also on Cloud Run. Literally, you take the official Ollama container, you tell it which model you want, and you put that model in Google Cloud Storage. As Yunong said, there's very close integration between Google Cloud Storage and Cloud Run, which means your open model can be loaded into Cloud Run very easily and with high performance. With Ollama, all you have to do is deploy to Cloud Run, and then you have your own AI that you've built yourself. The first time I did it, it felt strange, because I'm talking to a computer, to my local computer or to my Cloud Run service. It's not a company-owned thing. No, I did it. It's my own little AI that I can talk to, and it felt strange when this AI returned some words to me. It was the first time a computer was talking to me at that level of proficiency, and more importantly, a computer that I had deployed on my own infrastructure on Cloud Run.
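The Ollama-on-Cloud-Run workflow he describes can be sketched roughly like this. It is an illustrative sketch, not the official tutorial: the service name, bucket, and GPU flags are assumptions, and flag syntax may vary by gcloud version:

```shell
# Deploy the official Ollama container with a GPU, mounting a Cloud Storage
# bucket that holds the model weights so Ollama can load them at startup.
gcloud run deploy my-ollama \
  --image=ollama/ollama \
  --region=us-central1 \
  --gpu=1 --gpu-type=nvidia-l4 \
  --add-volume=name=models,type=cloud-storage,bucket=my-model-bucket \
  --add-volume-mount=volume=models,mount-path=/root/.ollama
```

`/root/.ollama` is where Ollama keeps its models by default when running as root; everything else here is a placeholder to replace with your own project's values.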
Savannah Peterson
>> I mean, it felt weird, but I bet it felt pretty cool and exciting too. Those are those little magical moments where you really do get to see what your customers are experiencing all the time. I can only imagine what it's like for them to have that moment too. You're like, "Oh, wow, it works."
Yunong Xiao
>> Yeah, I think the barrier to entry has really come down, and it's that intersection of great developer tooling, so Ollama and Cloud Run, with the democratization of access to GPUs. If you're an indie developer, as an example, I encourage you to go on any cloud provider and try to get a VM with a GPU. It is incredibly difficult because of the lack of supply, but also the cost, and so that barrier to entry is super high. What we're really trying to do with Cloud Run and some of our partners is make it trivially easy for anyone out there to get access to a GPU, but also to be able to deploy their own open source models onto the cloud. And so, again, we want to use that as a way to really accelerate innovation in this space as well.
Savannah Peterson
>> Oh, yeah. I hope that when we are at Next, maybe we can talk about some of those examples and some of the smaller teams. I do think that's very interesting, and the term democratization gets a little abused from time to time, but in this case, when smaller teams have the same toolkit as the biggest companies in the world, free to begin with, that's quite compelling when it comes to access. We'll make sure that we put a bunch of links in the comments below so that people can do exactly what you are describing. I want to close this on a note I was not anticipating closing on today. I could have talked to you guys all afternoon. We're going to talk about fashion. You don't always talk about fashion here in Techland, but I think we absolutely should. Steren, I've just got to give a shout-out. I came in right away and complimented your shirt today. Who made that shirt?
Steren Giannini
>> That shirt.
Savannah Peterson
>> Yeah.
Steren Giannini
>> Was made by my wife. That's the shirt I actually always wear when I'm on stage or doing interviews. It's not the exact same one, I've had many of them-
Savannah Peterson
>> Do you wash it?
Steren Giannini
>> Yes. But also, I love it so much, I started to wear it on stage at Google I/O years ago, and now it has become my stage attire. Every time my wife asks me, "What do you want for Christmas?" I say, "Look, the same shirt." And I have many of these.
Savannah Peterson
>> I love that.
Steren Giannini
>> And now it's also great because when I have to dress for today's interview, I don't have to think; I know what I'm wearing. It's always the same shirt that I love, that my wife made. And it's a gradient shirt, so it's quite hard to make; it demands a lot of time and a lot of craft.
Savannah Peterson
>> How many are you up to now? How many do you have?
Steren Giannini
>> I have the next one already, but this is the third one I believe.
Savannah Peterson
>> Oh, I love this. And what's your wife's name?
Steren Giannini
>> Anne.
Savannah Peterson
>> Anne?
Steren Giannini
>> Yes.
Savannah Peterson
>> Shout-out to you, Anne. I am very impressed with your craft. If you ever want to outfit anyone here at theCUBE, we would love to wear your creations. Steren is really killing it, and it just looks great. And I also like that that means you don't have to think. Most people would look at a multi-hued gradient shirt and be a little intimidated by that from a fashion perspective, and I love how for you, you're like, "Oh yeah, I'll just throw it on. This is what I wear when I go on stage." It's absolutely awesome. Yunong, you've also got some interesting fashion perspectives. There's an article of clothing you really hate wearing. You want to tell us about your no-sock life?
Yunong Xiao
>> Yeah, I don't wear socks. I haven't worn socks in quite some time, essentially since I moved to California. The weather's nice. I grew up in Canada where-
Savannah Peterson
>> I was wondering if you grew up somewhere cold, that was going to be one of my questions. I figured that had to be a little bit, you had to free the feet.
Yunong Xiao
>> Yeah, exactly. It's very freeing. You don't have to wear socks. I'm big into Crocs. You can't see them, but I've got them on right now. There are some more fashion-forward Crocs that you can even wear to customer meetings, and they fly-
Savannah Peterson
>> Do you have business Crocs? Is that actually a thing?
Yunong Xiao
>> They fly stealthily by. They're black, they're dark. If you have long enough pants, it doesn't even show that you're not wearing socks.
Savannah Peterson
>> Do you have the little Croc charms on your business Crocs?
Yunong Xiao
>> Those, my kids take away. Those don't quite make it. But if you travel a lot, not having to pack socks actually frees up more space in the luggage as well, on top of being carbon-neutral.
Savannah Peterson
>> Yes.
Yunong Xiao
>> Not wearing socks, not buying socks, not having to wear socks puts you a little bit in the game of doing better in the carbon world.
Savannah Peterson
>> I love that y'all are thinking about sustainability all the way from crafting fashion to lower carbon-emitting sock habits. Yunong, Steren, this has been absolutely fantastic. Honestly, you guys are great. I can see why Bobby wanted us to hang out. There's been more smiles and giggles than I was expecting, but there has been as much learning as I was expecting so thank you both for taking the time, seriously.
Steren Giannini
>> Thanks for having us. And to the audience: give Cloud Run a try yourself.
Savannah Peterson
>> Oh, I'm going to go build some stuff.
Steren Giannini
>> Feel all of the things. We are very proud to have built this and share it with the world.
Yunong Xiao
>> It's a pleasure to be here. Had a great time, and hopefully we'll be back at some point as well.
Savannah Peterson
>> You will absolutely be back and you will definitely see me at Google Cloud Next. Hopefully I'll have a story of something I played with or built or have questions for you on that journey. Thank all of you, wherever you might be tuning in on this beautiful rock. We're here at theCUBE's Palo Alto studios in California, my name's Savannah Peterson. You're watching theCUBE, the leading source for enterprise tech news.