Red Hat Summit + AnsibleFest 2025 | Mark Lohmeyer, Google Cloud

Mark Lohmeyer

Vice President & General Manager, Compute & AI Infrastructure

Google Cloud

Mark Lohmeyer, vice president and general manager of compute and AI infrastructure at Google Cloud, joins theCUBE’s Rob Strechay and Rebecca Knight at Red Hat Summit 2025 to share Google Cloud’s evolving approach to artificial intelligence infrastructure. The conversation highlights Google’s open-source leadership and deepening collaboration with Red Hat to support next-generation AI workloads.

Lohmeyer discusses Google Cloud’s 2025 focus on inference and reasoning models, emphasizing how technologies like JAX and Kubernetes are foundational to its infra...

Rebecca Knight

>> Good afternoon, everyone, and welcome back to theCUBE's live coverage of Red Hat Summit AnsibleFest 2025. I'm your host, Rebecca Knight, alongside my co-host and analyst, Rob Strechay. Rob, we are here in Boston, Massachusetts. This is a home game for us. And for you, it's like old home week because you're always the mayor of these shows, but this one in particular, you know everyone, you go way back with everyone.

Rob Strechay

>> Well, I think it's always fun to see where people are at in their journeys and what new companies they're with. This particular one, I've gone through multiple different places we've both been. And I'm really excited about this one. Again, off the main stage and to our set here. So-

Rebecca Knight

>> Exactly. So, fresh from the main stage, I would like to welcome Mark Lohmeyer, vice president and general manager of compute and AI infrastructure at Google Cloud. Thank you so much for coming back on the show, Mark.

Mark Lohmeyer

>> Thank you for having me here.

Rebecca Knight

>> Yeah, and well done up there on the main stage.

Mark Lohmeyer

>> Oh, thank you.

Rebecca Knight

>> A lot of pressure with those bright lights. I want to talk about the Red Hat-Google Cloud collaboration, but before we do, just set the scene for our viewers a little bit in terms of AI workloads. They're changing fast, they're evolving. What are the capabilities that are top of mind for Google Cloud customers right now? And really, how do you think about this market?

Mark Lohmeyer

>> Sure. So, we've said 2025 is the year of inference at Google, and we're seeing this in terms of our internal use as well as how our external cloud customers are using AI. And you're seeing, of course, the emergence of reasoning models, which require multiple steps to reason through what is the best possible answer. And then, those reasoning models get built into agents and agentic workflows. So, different agents talking to each other to solve complex challenges. And if you think about the pressure that this new class of inference workload, these types of models puts on the infrastructure, it's really like nothing we've ever seen before. And so, we think that the infrastructure really needs to evolve, almost transform, to be able to meet the needs of this next stage of inference with these very complex reasoning models.

Rob Strechay

>> Because again, I think people may not think straightforward and think Google Cloud and think open source, but then again, Kubernetes really was born at Google, so I mean, again, people who know, know, if you're in the know, I guess. But there's some really exciting stuff. And I think you talked about it, some of the partnerships, especially around vLLM and what you're doing in the open-source community. How does open source really play a role, especially with AI in what Google's up to?

Mark Lohmeyer

>> Sure. So, as you said, Google has a very rich history in open source. Kubernetes, of course, changed the world in many ways. But also, more recently technologies like JAX, which is an amazing framework for AI model training and serving that we actually developed within Google to support our creation and training of Gemini. But then we thought, "Hey, this is such a powerful technology. Let's open source it. Let's make it available to the broader community." So, we have a rich track record here in this space. And of course, Red Hat does as well. And so, we're really excited to be able to partner with them on bringing these open-source models to this next big problem that we think needs to be solved for AI, which is what we were talking about before. And just to double-click on vLLM for a sec, one thing that we noticed working with our customers within Google Cloud is we were seeing them start to rapidly adopt vLLM on top of GPUs. It gives them great performance, great cost efficiency, day-zero support for new models. And so, we're seeing it rapidly gain traction on GPUs. And so, we thought to ourselves, "Hey, what if we were to enable the same vLLM technology on Google Cloud TPUs and unlock the great price performance and value of TPUs, but make it super easy for customers to use their models across both, based on the needs of their business and the needs of their workload?" So, that's really where the first stage of this partnership together with Red Hat got its start.

Rebecca Knight

>> So, how are enterprises balancing those things, the cost, the performance and the openness when it comes to AI infrastructure?

Mark Lohmeyer

>> Yeah, so I think it's an interesting one because it's such a diverse space. So, you have such a broad range of models based on how the models are getting incorporated in the applications where they're getting used. In some cases, you might be optimizing for the highest level of intelligence. In other cases, you might be optimizing for the lowest levels of latency. But across the board, of course you want to drive down that cost per inference transaction. Because if you think about it, for most businesses, that cost per token, let's say, that is a key determiner of either how many customers you can serve at a certain cost point or what the profitability of your business model is if it's based on AI. And so, really driving that efficiency and that cost per token down is a priority for almost everyone I talk to.
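
The cost-per-token point above can be made concrete with a little arithmetic. The following is a minimal sketch, not a pricing model from the interview: the function name, the hourly accelerator cost, and the throughput figures are all hypothetical placeholders chosen only to show how higher throughput on the same hardware lowers the cost per token.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Return the serving cost in USD per one million generated tokens.

    Both inputs are assumptions supplied by the caller: what one accelerator
    (or one serving replica) costs per hour, and how many tokens per second
    it sustains.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical figures: the same $10/hour replica at two throughput levels.
baseline = cost_per_million_tokens(hourly_cost_usd=10.0, tokens_per_second=5_000)
improved = cost_per_million_tokens(hourly_cost_usd=10.0, tokens_per_second=10_000)
```

With these placeholder numbers, doubling throughput at a fixed hourly cost halves the cost per million tokens, which is why serving efficiency shows up directly in the economics Lohmeyer describes.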

Rob Strechay

>> And I think, again, Google does it at just such scale that you have some really good visibility into how to do it. Obviously, you have the Gemini models and you have a number of different models that you worked through and developed yourselves, plus you support all of those other models that are out there and all the tons that are in Hugging Face and everything like that. But there's also the fact that you have to coordinate across servers. Because most of the time when you're training, it's not just happening in one server, and that's where llm-d comes in. And how are you guys working on that and how does that really help with Google?

Mark Lohmeyer

>> Yeah, so we're super excited to announce this project together with Red Hat and many other contributing partners that you heard today. And from a Google perspective, this is a problem that we've been working on for a number of years now. And as you referred to with Gemini and serving multimodal models and reasoning models, we pretty quickly realized that you need to be able to have the ability to not just operate at a single inference engine level, but be able to dynamically route requests to the right inference engine, based on utilization, you need to have the ability to do things like disaggregate prefill versus decode for serving, you need to have the ability to disaggregate storage for caching across different storage tiers. And so, some of these technologies, insights that we created over a few years, really helping Gemini to scale and delivering great intelligence per dollar, we're bringing those experiences, together with Red Hat, into llm-d and making it available not just for Google's own internal use, but for our overall broader customer base and the industry more broadly.
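
The two ideas mentioned above, routing each request to the least-utilized inference engine and keeping separate engine pools for prefill and decode, can be illustrated with a toy sketch. This is not how llm-d or Google's internal serving stack is implemented; the `Engine` class, its fields, and the `pick_engine` and `route_request` functions are hypothetical names invented for this illustration, and real schedulers weigh many more signals (such as cache locality and queue depth).

```python
from dataclasses import dataclass


@dataclass
class Engine:
    """A hypothetical inference engine replica."""
    name: str
    role: str           # "prefill" or "decode"
    capacity: int       # assumed maximum number of concurrent requests
    in_flight: int = 0  # requests currently being served

    @property
    def utilization(self) -> float:
        return self.in_flight / self.capacity


def pick_engine(engines: list[Engine], role: str) -> Engine:
    """Pick the least-utilized engine of the given role that has spare capacity."""
    candidates = [e for e in engines if e.role == role and e.in_flight < e.capacity]
    if not candidates:
        raise RuntimeError(f"no {role} engine has spare capacity")
    return min(candidates, key=lambda e: e.utilization)


def route_request(engines: list[Engine]) -> tuple[str, str]:
    """Assign one request to a prefill engine and a decode engine.

    Prefill (processing the prompt) and decode (generating tokens) are
    placed independently, so each pool can be sized and scaled on its own.
    """
    prefill = pick_engine(engines, "prefill")
    decode = pick_engine(engines, "decode")
    prefill.in_flight += 1
    decode.in_flight += 1
    return prefill.name, decode.name


# Hypothetical fleet: two prefill replicas and two decode replicas.
engines = [
    Engine("prefill-a", "prefill", capacity=4, in_flight=3),
    Engine("prefill-b", "prefill", capacity=4, in_flight=1),
    Engine("decode-a", "decode", capacity=8, in_flight=2),
    Engine("decode-b", "decode", capacity=8, in_flight=6),
]
```

Calling `route_request(engines)` on this fleet returns `("prefill-b", "decode-a")`, the least-loaded replica in each pool. Splitting the pools this way is what lets prefill and decode capacity be scaled independently as load shifts.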

Rebecca Knight

>> So, Mark, as AI use cases diversify, what are some of the bottlenecks that you're seeing in terms of enterprises maybe getting stuck into themselves a little bit? And how are you helping them overcome them and how does that also inform your product roadmap?

Mark Lohmeyer

>> Yeah, so it comes down to this distributed nature that we were talking about before. Let's imagine an enterprise application or an enterprise use case where you have multiple different agents that need to work together to accomplish some complex task. And maybe some of those agents are actually based on reasoning models that require multiple steps. Ultimately, what that means is you've got a very dynamic load on that inferencing computing system that can scale up, can scale down, the nature of that load can change very dynamically over time. And so, the infrastructure itself needs to be able to respond to meet those changing demands. And that's where these distributed capabilities that we talked about before become so, so important. So, you think about taking one of those models, distributing it across different classes of compute, scaling those different classes of compute up or down based on how that's changing in real time. These are the techniques that you need ultimately to deliver those outcomes, but at the lowest possible cost and the highest utilization of those underlying very valuable resources.

Rob Strechay

>> Yeah, I think, again, when you start to look at how all of this is coming together, I mean AI takes a village, to put it mildly. And I think one of the big things that we've seen is how the ecosystem has come together, especially since Google does have the open-source chops, and I think Google Cloud really does leverage a lot of that. And again, I've known that from my history understanding Google Cloud Platform. Where do you see this going next? Because I mean, Red Hat was one of the early adopters of some of the models. They're also working with the agent-to-agent technology, how does that all go together with... Because it also then supports the TPUs and everything else underneath there, which...

Mark Lohmeyer

>> Yeah, so you mentioned the ecosystem. We think this is super, super important. So, you saw the announcement with many other partners today that came together to make this happen. And if you look at the different types of partners that we announced with, there's hardware companies, there's model companies, there's other open-source companies, a pretty broad range of companies that have aligned behind this solution. And so, one thing we hope to achieve over the coming weeks and months is to continue to expand that ecosystem of partners and to have that be a very active, engaged ecosystem where everyone is contributing, everyone is making llm-d better and better, working across a broader and broader portfolio of solutions. Because we think ultimately that's what's going to help us drive customer adoption and help deliver the ultimate customer outcome. That they'll have the flexibility to use it with whatever accelerators they want, that they'll have the flexibility to use it with whatever models they want, that they'll have the flexibility to integrate it with the rest of their enterprise infrastructure. So, we think that that ecosystem expansion is... We're excited to get off to a really great start here, but we think also expanding over time is going to be really important too.

Rebecca Knight

>> But with that expansion, and the ecosystem, as you point out, is vast and diverse, how does that affect the customer experience when there are so many different vendors and so many different partners working together? And as Rob said, coopetition is the enduring trend and theme here.

Mark Lohmeyer

>> Yeah, it's a really good question. So, I think when I talk to customers, they want a level of operational consistency, regardless of what's happening behind the scenes. But within that, they also want to have choice and flexibility. Maybe for some workload or model, they would like to leverage NVIDIA GPUs, and we're proud to support those really well in Google Cloud. But for another use case, they might want to leverage TPUs. And so, one of the beautiful things about what we're doing with llm-d is because it's an open-source project, all these different companies can contribute together on that single consistent platform. That platform can work across all these different underlying hardware solutions. And ultimately, the end customer is the one who benefits because they get that common way to operate. They get a common way to do inferencing with great performance, low cost, but with the flexibility to use whatever makes sense for their particular business and for their particular AI needs. So, we're really excited about unlocking that broader opportunity for inference as a result.

Rob Strechay

>> So, I mean, to your exact point, I think one of the interesting things is that Google has brought some technology in that agent-to-agent technology for agents talking to each other where you have MCP, which they talked about earlier today as well, and now Red Hat's contributing to agent-to-agent. Do you see that this has been one of those things that's been good, even with the TPUs all the way down to the hardware? Because I think to me, that has to be flawless in how that all-

Mark Lohmeyer

>> Absolutely. So, I think openness and interoperability at every level of the stack is really, really important. I think from a broader Google perspective, we're fortunate to have amazing capabilities at every level of the stack, from the hardware, to the optimized software on top, to the models, to the applications. But we also recognize that when we talk to customers or users, they also want it to be open at every level and they want the ability to use whatever makes sense for their business. And so, we're excited to support that level of openness within our solutions. And A2A is a great example of that. We started talking about that broadly at Google Cloud Next, a few months ago. Recently, Microsoft announced that they would be supporting that, which was fantastic to see. And then, we're really excited to also be enabling this together with Red Hat and their leadership around A2A as well. So, it's another great example a little further up the stack of how interoperability and openness is really important to unlocking the true value.

Rebecca Knight

>> So, when you look ahead, talk a little bit about some of the infrastructure-related innovations that you're especially excited about, perhaps ones that aren't getting enough attention.

Mark Lohmeyer

>> Sure. Sure. So, I think we talked a lot about llm-d, we talked about vLLM. Of course that layer of software is absolutely critical. And the fact that it gives you flexibility across these different types of accelerators. But more broadly when we look at AI inference, you really need to take a systems-level approach to solve the problem. And so, in addition to caring about those things, you also end up caring about storage, you end up caring about networking, you end up caring about how those things work together ultimately to deliver that outcome. And so, from a Google perspective, we're investing in all of those areas. If I take storage as one example near and dear to some of our previous roles, we've actually taken every aspect of the Google Cloud storage portfolio, block storage, object storage, file, et cetera, and we've created optimized versions of those specifically for the needs of AI. So, for example, we recently announced something called Hyperdisk ML, which is a block storage service designed specifically for AI inferencing and rapid loading of models into GPU or TPU memory. So, I just use that as one specific example, but think about how that entire infrastructure stack works together at a systems level, becomes super important for this next generation of workloads.

Rob Strechay

>> Given that you work on the compute side of the house, I think it looks like, especially with the way vLLM and llm-d really look to optimize, especially on the memory side of things, that has to be great from an efficiency standpoint, not only for the customer, because again, they get more work done at a lower cost, but is that one of the plays that Google Cloud is looking for and why? Because it helps you not only do that, but you can put more workloads on those assets.

Mark Lohmeyer

>> Oh, yeah, absolutely. So, if you think about just the core capabilities of vLLM, highly-efficient use of memory, which is a very, very precious resource, whether you're talking about GPUs or TPUs or other accelerator platforms. So, that's a key piece of it. But also, the fact that as the industry rallies around this day-zero model support, so whether it's a Google model or whether it's a third-party model. For an end customer, knowing that that model is going to be supported on day zero in that platform, they can get going quickly and deliver value is super important. So, unlocking all of those benefits with vLLM across our portfolio is important. And the other thing I should have mentioned earlier is it's vLLM and llm-d, but also, how do we integrate that into the actual end products that the customer takes advantage of? And so, you're going to see us enable this within the full Google Cloud portfolio of AI services, in partnership with Red Hat and their amazing capabilities that the customers can take advantage of on Google Cloud as well.

Rebecca Knight

>> That was a perfect note to end on. Mark Lohmeyer, thank you so much for coming back on the show. It's a pleasure having you on.

Mark Lohmeyer

>> Thank you.

Rebecca Knight

>> I'm Rebecca Knight for Rob Strechay. Stay tuned for more of theCUBE's live coverage of Red Hat Summit AnsibleFest. You're watching theCUBE, the leader in enterprise tech news and analysis.