Brian Monroe, NVIDIA & Venkat Ramakrishnan, Pure Storage
Venkat Ramakrishnan
VP & GM, Cloud Native Business Unit, Portworx by Pure Storage
Brian Monroe
Senior Software Engineer, NVIDIA
In this KubeCon + CloudNativeCon North America 2025 segment, theCUBE's Rob Strechay sits down with Venkat Ramakrishnan, vice president & general manager of Portworx by Pure Storage, and Brian Monroe, senior software engineer at NVIDIA, to unpack how a container-first strategy is powering NVIDIA's R&D – from chip design to firmware and internal apps – across on-prem and cloud Kubernetes. Monroe details why his platform team chose Portworx to wrangle DAS-backed, commodity storage at scale and deliver true self-service to application teams.
Rob Strechay
>> Hello, and welcome back to KubeCon + CloudNativeCon North America 2025 from ColdLanta. I'm Rob Strechay, excited to have you on. And we have some returning guests and somebody who's new to the show. So I want to thank Brian Monroe, who's a senior software engineer for NVIDIA. I think people may have heard of your company before. You've been in the news a little bit.
Brian Monroe
>> Small company.
Rob Strechay
>> Small company. And then Venkat Ramakrishnan is back with us again, the VP and GM of Portworx by Pure Storage. Welcome on board, guys. Besides it being cold here, things are heating up on the floor; we're getting going. And the hot stuff going on is with Kubernetes and how people are using it with AI and other things. But first, Brian, let's get to know you a little bit. What is your team up to, what's your role at NVIDIA, and what do you work on?
Brian Monroe
>> So my team is involved in providing services and infrastructure for our Farm team, and our Farm team is responsible for providing the infrastructure for our R&D, our chip design, and chip implementation. So my team specifically focuses a lot on enabling capabilities. One big part of that is our Kubernetes infrastructure: we have both Kubernetes on-prem and Kubernetes in the cloud. My responsibility typically focuses on the on-prem Kubernetes, basically providing as much of a self-service, turnkey environment as possible for our R&D application developers to deploy their applications.
Rob Strechay
>> So you're an end user. So you're really in the day-to-day with all the rest of the people here trying to figure things out. What does your container strategy look like going forward? Because, like you said, funny enough, you're dealing with the Farm, fabs, and chip design and all that. You've got to get things moving; it takes a long time for new chips to come about. But what is your container strategy as you look out into the future?
Brian Monroe
>> So generally, we're definitely, I would say, a container-first environment. In every aspect of our R&D group, we're trying to get out of fixed infrastructure models. Every place where we can dynamically provision, dynamically allocate resources, stand up and tear down as quickly as possible, and repurpose is really our goal. So that's getting rid of really simple things like jump hosts and bastion hosts, turning those into containers, in addition to our traditional workloads: stateful and stateless application workloads on Kubernetes that are used for all different kinds of things. Could be metrics collection, could be traditional business applications, could be whatever. Wherever there's a need for deploying something, we're looking at containers as the way to get it out there.
Rob Strechay
>> So why are you up here with Portworx?
Brian Monroe
>> Well, we're a Portworx customer.
Rob Strechay
>> Okay. Well, how does that play into this strategy of being container-first, being Kubernetes-native?
Brian Monroe
>> So obviously there are a number of things in the container world that you've got to come up with some unique solutions for. Storage is a little bit unique in the container world, networking is a little bit unique, things like that. Resource management, all these things are areas that change a little bit, and you have to adapt and come up with new solutions to meet them. So specifically in the storage world: NVIDIA has typically been investing in the hardware. We would buy a lot of bare-metal-type servers and deploy those, and that was traditional for the batch workloads we ran in those environments. Then we got into Kubernetes, and we were still sort of doing that model. So we had lots of commodity infrastructure with DAS storage, and we needed to find a way to manage that storage effectively. We looked at several different vendors, and ultimately we aligned on Portworx as the right option for us.
Rob Strechay
>> Yeah. As I look at it, there have to be interesting things and interesting feedback coming back to Portworx from the folks at NVIDIA. Tell us a little bit about what the journey has been from your side.
Venkat Ramakrishnan
>> We have been partners. We have been working with NVIDIA even prior to the Pure acquisition of Portworx; they've been a long-term customer of ours. And the platform team at NVIDIA, Brian's team, supports a large number of developers and different development teams that do everything from chip design, firmware development, internal enterprise software development, AI training, and a whole bunch of workloads that run on Kubernetes and on Portworx. And this is essentially the sweet spot use case for us, because Portworx delivers data management at scale for Kubernetes but makes it self-service for application developers. You could build your code in your namespace, but if you want to run data services, or if you're building a dev pod where you're using a GPU for training while you're developing your software, you never need to file a ticket to get a storage service, like a file, block, or object, because you're running Portworx. And it's available, scoped to your namespace, with multi-tenancy. That means you could have one team that's developing a whole bunch of ASIC firmware and another team that's doing AI training sharing the same Kubernetes infrastructure without ever stepping on each other's data. That's the level of multi-tenancy and shared service model that Portworx delivers. And these are teams that support a large number of developers with a few platform engineers, so we enable them to operate at scale. It's been a phenomenal journey working with NVIDIA and learning with them as they evolve as well.
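The self-service, namespace-scoped storage model described here maps onto standard Kubernetes dynamic provisioning: a platform team publishes a StorageClass, and developers claim storage in their own namespace with a plain PersistentVolumeClaim, no ticket required. A minimal sketch; the class, namespace, and claim names are illustrative, and the provisioner/parameter names follow Portworx's CSI documentation:

```yaml
# StorageClass published by the platform team; "repl: 3" asks Portworx
# to keep three replicas of each volume spread across nodes.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-repl3               # illustrative name
provisioner: pxd.portworx.com  # Portworx CSI provisioner
parameters:
  repl: "3"
---
# A developer in their own tenant namespace claims storage without
# knowing anything about the DAS-backed nodes underneath.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: training-data          # illustrative
  namespace: ai-training       # illustrative tenant namespace
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: px-repl3
  resources:
    requests:
      storage: 100Gi
```

Because the claim lives inside one namespace, Kubernetes RBAC and quotas give the multi-tenant isolation described above.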
Rob Strechay
>> I can imagine. Again, there's no question that you guys have to go fast, but you also have to be secure and available. Talk to how availability factors into what you're building there.
Brian Monroe
>> So I think the big thing in terms of delivering the solution to our stakeholders is that, as we've talked about, we don't really want to get our hands into it. We stand it up, we layer our delivery model on top of that, and then basically allow our teams to allocate their namespaces, deploy their applications, and deploy their persistent volumes and storage right there. They don't have to think about what's under the covers. And then the second part of that: once it's up and running, we need that availability. So when we take down a cluster to do maintenance, we need to be able to shift our workloads. We generally try to do zero-downtime maintenance, so we basically take down one node in a cluster, do the upgrades, things like that, and shift the workloads around. The Portworx storage infrastructure, with replication spread across multiple nodes, allows us to move those workloads around to various locations without having to worry about taking down a specific business process or function.
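The one-node-at-a-time, zero-downtime maintenance flow described above is typically paired with a Kubernetes PodDisruptionBudget, which caps how many replicas of a service a node drain may evict at once. A hedged sketch; the names and labels are illustrative, not taken from NVIDIA's environment:

```yaml
# Limit voluntary disruptions (e.g. `kubectl drain` during node upgrades)
# so at most one replica of this service is down at any moment.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: metrics-collector-pdb   # illustrative
  namespace: platform           # illustrative
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: metrics-collector    # illustrative label
```

With replicated volumes on the storage layer, the evicted pod can reschedule onto another node that holds a replica of its data, which is what makes the workload shifting possible.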
Rob Strechay
>> So are these applications very ephemeral or are some of them long-running? And what types of applications are on these pods?
Brian Monroe
>> We have both. We have a lot of batch-type workloads that run for a period of time each day and collect information or do some other work that's part of the R&D process they might need to implement. We have others that run continuously. So it's a pretty diverse mix, and there's no real specific type of application.
Rob Strechay
>> So when you look at this and you look at scalability, how does that factor into what you're doing as well?
Brian Monroe
>> From a scalability standpoint, we need to have the ability to scale out as needed. To our stakeholders, we try to provide a very flexible environment. So when someone comes in and says, I need more resources, it's a function of, okay, either we add more capacity, or we're always sort of planning for how we stretch to accommodate the sorts of things they might need. And storage is a big piece of that, so we need to be able to add more storage or add more capacity. It's more a matter of adding an additional node or a set of nodes to the cluster, scaling out the storage, and that sort of thing.
Rob Strechay
>> So the other side of the coin with scale is resiliency. How does that factor in and how are you dealing with that as well?
Brian Monroe
>> From a resiliency standpoint, we try to make everything as commodity as possible, everything as standardized as possible, and we try never to make things so specific that an application's got limited resources. We try to provide a lot of flexibility in our designs, and that gives some inherent flexibility. Of course, we add a lot of the other capabilities around observability, metrics gathering, all that sort of stuff, to make sure we're taking an inventory of our systems and how they're functioning.
Rob Strechay
>> These have got to be key learnings that you guys take back. And again, with your new stuff coming out and all the announcements you had earlier, you've got to see this as a really key thing, learning from them how they're doing it, because different industries have different requirements.
Venkat Ramakrishnan
>> Yeah, definitely different industries have different requirements, but I think the fundamental problems are similar when you go from one industry to another. For example, you're talking about scale and resiliency. Now, NVIDIA, especially as a developer platform, a developer experience platform, is it different from, say, a payment gateway? Absolutely, yes. But let's look at the fundamental KPIs and SLAs we need to drive. They have thousands of developers building code constantly. And at the time of a release, like a chip tape-out or a new software release, you're likely going to need more resources, and you're testing a lot more in parallel because you're trying to hit a timeline. So your underlying infrastructure has to be elastic and scalable. You could start off with something, but you need to be able to scale based on your need. Same thing with the payment gateway: you could be processing payments steadily, but at the time of, for example, a Black Friday, your payment gateway needs to be able to scale. So the underlying problems of scale are the same. Same thing with resiliency. You don't want an outage where tens of thousands of your developers are sitting idle doing nothing, because that's the most expensive resource in your company. You don't want an outage in a payment gateway, because you're going to affect millions of users who can't make their payments. The outcomes and the effects can be different, but the underlying need for resiliency and the need for scale are similar. So what we have learned from NVIDIA, and what we have learned from other large banking customers, has gone into the Portworx product significantly. And we have now extended that beyond the container-platform-as-a-service use case to virtualization, where, obviously, our friends at Broadcom have jacked up the prices of VMware and a lot of people are leaving VMware.
So Portworx now supports Kubernetes virtualization as well. And some of these learnings, and some of this hardening that we did at scale, customers with virtualization workloads can now benefit from too. If we can do this at scale with Kubernetes containers, where we're supporting tens of thousands of developers and trillions of dollars of payments, VMs are a much easier workload for Portworx to handle, and that's why we are extending it. We are announcing, as I mentioned in the previous session, Kube datastores, which is our latest innovation, delivering VM datastore-like capability across Kubernetes virtualization. We are cementing our claim to be the de facto data management and storage layer for Kubernetes virtualization. We are going to extend our Kube datastores to do everything that VMware users are used to when it comes to data and storage management workflows, similar to vSAN and VMFS, and solve end-to-end data management problems for VMs as well as containers.
Rob Strechay
>> So as you look out into the next 12 months here, 12 to 24 months for that matter, how do you see the business value you're getting today playing into your container strategy going forward? And how do you see it looking when we're in Salt Lake? As I found out, we're going to be in Salt Lake in a year, so we're back to cold weather again, maybe snow again.>> I really like that you're calling this ColdLanta.
Rob Strechay
>> I know. I can't call it Hotlanta today, that's for sure. So where do you see your container strategy and how you're trying to get value, more business value over the next 12 months?
Brian Monroe
>> So we're going to be updating and expanding in a number of areas. I'm being asked to deploy new CAITs environments in other geographic locations. Some of those are more satellite, but they're still areas where you have development groups, things like that. So we're doing that, we're increasing that footprint. And I'm being asked to put up more mission-critical workloads. When I say mission-critical workloads: we were traditionally doing applications with sort of more simplistic stateful workloads, and now I'm being asked to put up multi-node Redis instances spanning eight nodes with underlying storage. We just POC'd our Pure array with cloud drives storage as a way to look at managed storage, so we can add additional resources and flexibility under the storage layer. So we're beginning to expand and add capacity, resiliency, and really flexibility to the infrastructure. That's really where I'm focusing my energy right now, and then making sure that we have best-of-breed solutions in place to enable and evolve.
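A multi-node Redis deployment with per-node persistent storage, like the one described above, is commonly expressed in Kubernetes as a StatefulSet with volumeClaimTemplates, so each replica gets its own volume from the storage layer. A minimal sketch under stated assumptions; the names, image tag, and storage class are illustrative, not NVIDIA's actual configuration:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis                    # illustrative
spec:
  serviceName: redis
  replicas: 8                    # spanning eight nodes, as described
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7         # illustrative tag
          volumeMounts:
            - name: data
              mountPath: /data   # Redis persistence directory
  # Each replica gets its own PVC stamped from this template.
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: px-repl3   # illustrative Portworx-backed class
        resources:
          requests:
            storage: 20Gi
```

Pod anti-affinity or topology spread constraints would additionally ensure the eight replicas land on eight distinct nodes.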
Rob Strechay
>> Yeah, I love that. I love this. Hey, thank you both for coming on board. This has really been great, and I think you had a lot of insights into that, and I think people can learn. Again, people have heard of your company, so if they can understand how you're doing it and supporting your organization, they can see it's possible. So thank you for coming on board today.
Brian Monroe
>> Thank you.
Rob Strechay
>> Thanks as always.
Brian Monroe
>> Thanks for having us, Rob.
Rob Strechay
>> Yep. And thank you for watching this episode of theCUBE, live from KubeCon + CloudNativeCon North America 2025 from ColdLanta. See you soon.