SC24 | Kevin Cochrane, Vultr & Ted Marena, AMD

Kevin Cochrane

CMO

Vultr

Ted Marena

Dir. Business Development for DCGPU

AMD

Powering the AI revolution: How Vultr and AMD are reimagining GPU-accelerated workloads

Since graphics processing units offer unparalleled speed, efficiency and scalability for specific tasks that are computationally intensive, GPU-accelerated workloads are critical in the artificial intelligence age. The collaboration between Vultr’s composable cloud infrastructure and Advanced Micro Devices Inc.’s next-generation silicon architecture unlocks new frontiers of GPU-accelerated workloads, providing businesses with powerful, cost-effective and flexible resources to handle demanding AI functions, according to Kevin Cochrane (pictured, right), chief marketing officer of Vultr (aka The Constant Company LLC).

Coverage of Supercomputing 2024 in Atlanta, Georgia continued with guests Kevin Cochrane of Vultr and Ted Marena of AMD, a Vultr partner. Vultr utilizes AMD EPYC processors and Instinct cloud GPUs for cloud compute services and agentic AI initiatives. AMD's GPUs are optimized for inference with ROCm software. The focus is on scaling AI models for business functions with distributed computing and secure data residency. Vultr provides a managed Kafka service and serverless infrastructure for real-time data processing. The goal is to enable real-time digital representations for bette...

Savannah Peterson

>> Good afternoon nerd fam and welcome back to beautiful Atlanta, Georgia. We are here coming to the close of day two of our three days of coverage of Supercomputing 2024. My name's Savannah Peterson, joined by the wonderful, clever and a bit silly this afternoon, Dave Vellante.

Dave Vellante

>> You know what I don't get? If there's all these GPUs in here, how come it's so cold?

Savannah Peterson

>> I know. Are they trying to cool our liquid?

Dave Vellante

>> Are they not plugged in?

Savannah Peterson

>> I'm not exactly sure. They must be doing something great with the heat and the exhaust. Speaking of bringing the heat, we've got two fabulous guests. Kevin and Ted, thank you both so much for taking the time today.

>> Guys.

>> Thanks for having us here.

Savannah Peterson

>> Kevin, this is becoming a bit of a tradition for us.

Kevin Cochrane

>> That is true.

Savannah Peterson

>> I'm very excited to really dive in. I had a fabulous conversation with Nathan about Vultr last week at KubeCon. You guys are on a tear right now. You want to talk about momentum? Y'all hot?

>> Thank you.

Savannah Peterson

>> Y'all might need some liquid cooling yourselves.

>> Exactly.

Savannah Peterson

>> I'm very impressed by the scale. This partnership seems like a match made in heaven. You had big announcements just a couple months ago. What's going on right now? What brings you to the stage today?

Kevin Cochrane

>> Well, it's very simple. We have had a long-standing partnership with AMD. On our core cloud compute, we are big consumers of AMD EPYC processors to support some of the most complex cloud native application workloads on the planet. Also, HPC workloads as well. It's a very natural extension for us to extend the partnership into AMD's Instinct product line, their cloud GPUs. In particular for our enterprise customers that are moving beyond the art of the possible in 2024 and moving into inference at scale for all of their agentic AI initiatives, it's incredibly important that they have that inference-optimized, cost-efficient, energy-efficient GPU powered by AMD.

Savannah Peterson

>> That must make you feel good to hear.

Ted Marena

>> I love hearing that.

Savannah Peterson

>> I see that grin. That's exactly what you're aiming for. That is exactly what you're trying to do, right?

Ted Marena

>> Yes, we are. I mean a lot of people, Kevin spelled it out, they know AMD is a CPU organization, but we have a sizable business in GPUs and it's the fastest growing business that we've ever created. Our Instinct product line went up to, did a billion dollars in a quarter. Just one step, so we went from about 100 million to 4.5 billion dollars in a year. It was staggering. The growth rate is significant. The GPU technology, the performance, the tools, the ecosystem that we brought, companies like Vultr are taking advantage of. We're super excited to be working with them.

Dave Vellante

>> You were saying before we came on that AMD built its GPUs for inference, and that's really the target sweet spot.

Ted Marena

>> Let me clarify. You're generally correct.

Dave Vellante

>> My interpretation of, okay.

Ted Marena

>> Yeah. Our GPUs are built, they can do inference, they can do training, they do AI and HPC well. That's what the hardware is architected to do. When I say we initially optimized for inferencing, I'm talking about the software stack. Our ROCm software stack is an open source software stack and that is what we initially optimized for inferencing because we knew the market was moving there. It's kind of like the saying, "Go to where the puck is going, not to where it is." That's really the way we looked at the market.

Dave Vellante

>> Well the reason it was so interesting to me is because we've written, I mean years ago we said that a lot of the action is AI training in the cloud, but the real action is going to be in inference. That's where the interesting use cases are going to go. That's where the money is going to actually be made. Yeah, you got big five LLMs cranking out, that's great with just horrendous economics. Then you've got all these enterprises with all this data that want to apply agents, and that's where the real money is going to be made by your customers.

Dave Vellante

>> Yes.

Savannah Peterson

>> We use them where the benefit is too.

Dave Vellante

>> That's where the value is.

Savannah Peterson

>> I mean, inference makes it real. Yeah, go ahead.

Kevin Cochrane

>> Yeah, I mean we used to say in 2024 that inference was a new web app, and here in 2025 agents actually put inference to work. We recently did a study here at Vultr of 1,000 C-level executives that were responsible for AI planning for 2025, and what we saw across all verticals was that they were looking to deploy agents at scale, up to 180 different agents to support every single line of business function, both for internal use cases to support employees as well as external use cases to support their customers. In every single case, they were looking to just take an existing open source foundational model and they were looking to RAG-enable it, retrieval-augmented generation, and they were looking to power inference. The core message here was that they weren't looking to buy large scale training clusters, they were looking for large scale deployment of GPUs for inference.
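The "RAG-enable an existing model" pattern mentioned here can be sketched roughly as below. This is an illustrative toy under stated assumptions, not Vultr's or AMD's implementation: the sample documents, the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) and the word-count similarity are all invented for the example. A production deployment would typically use an embedding model and a vector store for retrieval, and would send the resulting prompt to a foundation model served on an inference cluster.

```python
import math
import re
from collections import Counter

# Hypothetical proprietary documents; in practice these would live in a vector store.
DOCS = [
    "Refunds are processed within 5 business days.",
    "The warehouse in Atlanta ships orders on weekdays.",
    "Support is available by chat from 9am to 5pm.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query (the retrieval step)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query, documents, k=2):
    """Augment the question with retrieved context (the augmentation step)."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    # The generation step (calling a model) is deliberately left out.
    print(build_prompt("How long do refunds take?", DOCS, k=1))
```

The point of the sketch is that the model itself is unchanged: only the prompt is enriched with an organization's own data at inference time, which is why the workload is inference rather than training.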

Dave Vellante

>> They wanted to do that on their own proprietary data, correct?

Kevin Cochrane

>> That's 100% correct.

Dave Vellante

>> Yeah, and that's where it gets interesting. I mean the title of this is Empowering Enterprises to Handle Data Amid the Agentic Era. That's what's really interesting about this is that proprietary data, that's never going to go onto the internet.

Kevin Cochrane

>> No, that's correct.

Dave Vellante

>> Not going to be trained by ChatGPT organizations.

Dave Vellante

>> They aren't going to do that. They're going to want on-prem.

>> 100%.

Dave Vellante

>> Yeah.

>> There's really two things there. To your point, for people that are looking to train their own models, they will deploy GPUs on-prem and they'll take that trained model and then they're going to need to deploy for inference, and that in all cases is going to have to be in the cloud. Secondly, when they are looking to even do training in the cloud, for those people that are doing it, they want to have a safe, secure, compliant cloud provider that can guarantee data residency, that can guarantee data sovereignty, and that can guarantee that their data's not going to be used for any other purpose than for training their specific model. That's where Vultr comes into play. Whether you're training on-prem and deploying to the cloud or you're training in the cloud, we're going to give you the safest, most secure, most cost-effective environment across all six continents to make that a reality all on an integrated AMD stack.

Dave Vellante

>> When I think about the history of computing, you had this monolithic mainframe and the applications were pretty much back office and finance, and then that exploded and the application stack changed. Client server, three tier. AI is completely changing the application stack.

>> Correct.

Dave Vellante

>> Okay, so what you guys are doing is saying, "Here's all this computing power and we know that every wave, whether it was mainframes to minis to PCs, brings new thinking about how to build applications." To me, the reason I love this title, Empowering Enterprises to Handle Data Amid the Agentic Era, is you really can't do the agent swarms without data.

>> That's right.

Dave Vellante

>> The other thing is you got to have compute, you got to have data, you got to have all these parameters, and now you're bringing inference to the table. How do you see that playing out? I'm really interested in the computing power that you're bringing as both inference, but don't these agents also have to be trained? Isn't there maybe a new form of training that emerges that I'm getting to? I'm not sure what the lines are between inference and training.

>> Yeah, fine-tuning, that's what it's called. Fine-tuning, when you hear that, it's basically like mini training. That's kind of the way to think about it. Even when you're inferencing, you're going to be doing this fine-tuning as new data comes in because the data in your organization or that you're storing in Vultr's cloud or what have you gets updated, and so you are going to need from time to time to do that fine training.

>> Yeah, and it's really an important point that you make here, which is we almost in the early days of the GPU revolution, we swung back to an earlier time where suddenly we were looking to build these big monolithic, mainframe-style systems where we were scheduling jobs to run training workloads and all of the data was in one big monolithic pool to support that training cluster. I mean that was just 2024. That was just a month ago.

Savannah Peterson

>> I know. We were talking about how crazy time is moving right now.

Kevin Cochrane

>> Crazy time, but it's time to fast-forward from the early part of 2024 and the early part of the market, the 1960s mainframe era of the market and get into modern application development, deployment at enterprise scale, which is distributed computing worldwide. That distributed computing means you have distributed agents, which means you need to have distributed inference clusters, you need to have distributed clusters for fine-tuning, and you need to be able to safely, securely distribute your data to update your agents at all points all around the world. I think that's the power of, I think what Vultr brings to the table with AMD, because we have the global scale as an independent hyperscaler and we are now provisioning an integrated AMD stack, CPU and GPU, to power all of your applications, all of your agents and all of your AI models worldwide.

Dave Vellante

>> To the point that you're just really emphasizing, this is not a static world.

Dave Vellante

>> No.

Dave Vellante

>> It moves very quickly. I'm imagining, we always talk about human in the loop. It's not going to just be AI taking over and these agents aren't necessarily God agents, they're worker bees. They're going to do stuff that you don't want to do. Go figure it out.

Savannah Peterson

>> They're going to help meet your data where it is. They're going to help meet your customers where they are and then you're going to bring that back and do the right information with that data so that you can achieve results. They're like Santa's elves.

Dave Vellante

>> Yes. They're like Santa's elves.

Savannah Peterson

>> That's a Savvy original.

Dave Vellante

>> I love this. We'll stick with that metaphor. When the elves don't know what to do, they ask either the head elf or maybe they go up the chain to Santa, so they bring the human into the loop. Where it gets interesting is the agents will learn from the reasoning traces of the humans on those exceptions.

Savannah Peterson

>> Correct.

Dave Vellante

>> The agents don't know what to do, need a human, okay, but then they'll learn from that. That's, I guess, fine-tuning or inferencing or a combination of both. Then you get this virtuous cycle up.

Savannah Peterson

>> The North Pole.

Dave Vellante

>> Up the North Pole. Thank you. Completing the metaphor. Okay, you each get a candy.

Savannah Peterson

>> I'll be here all week.

Dave Vellante

>> You each get a candy cane.

Savannah Peterson

>> If it doesn't work out, sack it, we'll just go to stand up. Yeah.

Dave Vellante

>> Very good.

Savannah Peterson

>> And a gold star for the

Dave Vellante

>> Get a sticker. Have a sticker.

Savannah Peterson

>> Just for that. Yeah.

Savannah Peterson

>> Yeah. I can keep going.

Ted Marena

>> I was just going to make the point that one of the reasons we're really enthusiastic about working with someone like Vultr is that we're providing the hardware, and our GPUs are not the market leader. Of course we're aspiring to get there. And so to make it as easy as possible for people to adopt our hardware, that's really the goal of what we're doing. Our open ROCm ecosystem, all the drivers, APIs, everything that we provide, if you're using PyTorch, you can just move over. You take your code, just immediately target it, and when using ROCm it runs on Instinct, but someone like Vultr makes it even easier as you're doing these various workloads.

Kevin Cochrane

>> I think there's a couple of key points here, which is number one, if you're talking about humans in the loops of the agents, remember, humans are interacting with these agents. If a human's interacting with an agent, that means there's actually an application experience. It's not just this naked model that is just sitting there.

Savannah Peterson

>> That's a great point, Kevin.

Kevin Cochrane

>> There's actually a mobile application that you're actually accessing. There's a website that you're accessing and that application code is actually running on a CPU cluster. That's a cloud-native application. It's containerized code running on Kubernetes.

Savannah Peterson

>> I was just going to say, we were just talking about this, literally just talking about

Kevin Cochrane

>> The application is still involved, and so you need your AMD EPYC compute clusters in the cloud at global scale to actually deliver that application to wherever your consumers or your customers are. At the same time, that application wants to run inference against a model, and that model wants to be co-located with that application code, so the container that's running in the CPU cluster needs to be next to the container that's running in the GPU cluster in order to have a good experience. I think the key point here is that all of these applications, every application, every website that's ever been built is going to get rebuilt, and GPUs are going to be at the core, inference is going to power it. The key is we have all of these developers all around the world and they're not AI engineers right now. They need to learn a new set of skills. I think the power of the ROCm open ecosystem that AMD is building is to lower the learning curve to get a whole new class of developers that today can build a web application but they don't know how to build an AI application. You saw them all at KubeCon. Every single one of those people at KubeCon needs to learn how to build on a GPU.

Savannah Peterson

>> Absolutely.

>> I think what AMD is doing really nicely is building that open ecosystem, lowering the learning curve for developers all around the world.

Dave Vellante

>> The reason why this is so important, and you, KubeCon, CNCF-

Savannah Peterson

>> Yeah, I've got a great analogy for this one, so you go ahead. You call in your dev team.

Dave Vellante

>> It's got something involved with Santa.

>> Yeah.

Savannah Peterson

>> I got a quickie. Just went-

Dave Vellante

>> I'm over my head, that one.

Dave Vellante

>> For Christmas, we just want the MI325X.

Dave Vellante

>> 10 years ago, I'm going to blow your minds here. When you think about the universe, 10 years ago, John Furrier coined a phrase, "Data as code." And I was thinking about that like, wow, that's impressive. Last week on the Cube Pod he said, "Business as code." I was like, oh wow, that's interesting. Now tying it back into agents, the business processes, so many of them are not automated. Extend it back to agents learning and the humans in the loop, we're going to be creating new processes on the fly.

>> Yeah, that's right.

Dave Vellante

>> I want to get an outcome, figure out, AI, the best way to do it.

Dave Vellante

>> You're 100% right.

Dave Vellante

>> Give me a plan and I'll give you feedback on how feasible-

Savannah Peterson

>> It's not even just processes, it's new problems. We're answering different questions.

Dave Vellante

>> Yeah, totally.

>> I mean, it's almost like route optimization when you're trying to get somewhere. Google Maps, it's going to dynamically try to figure out what's the best route to get you to that outcome given current traffic conditions and road closures and so on and so forth. To your point, I think it's a brilliant one, which is like, look, we're going to try to achieve this business outcome. We might not even know the question to ask.

Savannah Peterson

>> Exactly.

>> It might actually guide us in the right questions to ask to solve the problem. It might just go and give us the answer to the problem. In either case, we're going to be able to be much more outcome oriented, and this is the benefit of the agentic AI revolution. Because you started talking about now we're going to actually start actually delivering real results at enterprise scales, well that's because we're moving into this agentic AI era and it's now hopefully powered by a new world-class infrastructure by the two of us.

Dave Vellante

>> I'm sorry, I'm so excited about this I keep interrupting you.

Savannah Peterson

>> Yeah. I'm a woman detective.

>> It's fine.

Savannah Peterson

>> This is not the first time but listen for the analogies, mate. Listen for the analogies.

Dave Vellante

>> You carried me through the last week. I'm just catching up. You're talking about the outcomes. There's top-down goals of an organization that the systems are going to be able to interpret, infer and those are going to be guidelines and you're going to optimize for those. To me, that's mind-blowing.

Savannah Peterson

>> I just want to give you a shout-out with some numbers. This is why Vultr's in such a unique position and honestly why I trust Kevin's hot take every year and look forward to it, is because you've got 1.5 million customers in 185 different nations.

Kevin Cochrane

>> That's right.

Savannah Peterson

>> Which tells me you've got a true lay of the land. I mean you are like Santa's workshop to a degree. You're talking about route optimization. You've got to think about how to get around the world in front of all these people all the time.

Dave Vellante

>> That's right.

Savannah Peterson

>> When it comes to that up-skilling, I got some more analogies for you, it's the difference between cooking and baking. Baking is science. You put something in the microwave versus the chemistry of baking. We're kind of at that stage here, and I feel like you're a wonderful team for Christmas dinner for delivering between AMD's hardware and solving these problems with all the wonderful things that y'all are doing. I got to ask, and you kind of said this but I want it on cam so we have the sound bite, 2025, is it the year of agentic AI?

Kevin Cochrane

>> 2025 is the year of agentic AI.

Dave Vellante

>> '26, '27, '28 will also be the years-

Savannah Peterson

>> You're not just showing us you can count.

Dave Vellante

>> That's right. Thank you for pointing that out. The reason I say that is because a lot of people last year were talking, even earlier this year talking about single agents. Now people are talking about multiple agents.

>> Multiple.

Dave Vellante

>> You're also talking about agent control frameworks.

>> That's right.

Dave Vellante

>> That's actually a phrase that we kind of created out of thin air, but others have different names for it.

Kevin Cochrane

>> That will be coming. That is 2026. Don't augur the future. Remember, I have to come back here.

Savannah Peterson

>> Right. That's going to give us our hot take next year.

Savannah Peterson

>> That's the hot take for next year.

Dave Vellante

>> The hard part is data.

Savannah Peterson

>> Right.

Dave Vellante

>> Harmonizing all the structured data, the unstructured data, the JSON data, the graph data, the SQL data.

>> That's right.

Dave Vellante

>> That's really hard and that's why I think it's going to take 5 to 10 years for this whole thing to play out.

>> On the data side, I mean you have to remember that's where all of your security, governance and your compliance starts really coming into play. It's really important when you're deploying these agents and you're doing real-time updates of your vector stores for your RAG applications, and you're doing real-time updates of your training clusters for fine-tuning your models. It's very critical that you have safe, secure, dedicated network connections, that you're meeting all of the local data residency and data sovereignty requirements. This is kind of like our specialty here at Vultr. Without going into specifics, what we excel at is making sure that you can meet all of your data residency, data sovereignty requirements and set you up with a hybrid multi-cloud architecture so that at no point in time are you ever at risk of your data actually leaving your dedicated network pipes and being exposed on the public internet. Training, tuning, inference, eminently safe in Vultr. We encourage everyone just to go to Vultr.com to learn more about them.

Dave Vellante

>> Really important point.

>> Really important.

Dave Vellante

>> Because if you think about the way data works today, you have operational systems factored and then you take all that data, the exhaust of that, and you stick it into a data lake or an analytic system and you say, "Oh, it's got to be governed. I need metadata to that. I need technical metadata, I need operational metadata, business metadata."

>> Right.

Dave Vellante

>> Then as soon as you do that, it's out of date, so you've said something really important, it's real time.

Savannah Peterson

>> Real time.

Dave Vellante

>> That is a really hard problem.

>> Correct.

Dave Vellante

>> All the business logic is trapped into the operational systems and all the metadata, but that's going to be solved. It's going to take time, but that is a completely-

>> That's right, and we're here to help to solve this. This is why we have our managed Kafka service. This is why we have our serverless infrastructure to do the real-time data updates in a safe, secure, dedicated manner to actually feed all of your RAG applications so that... We work with AMD in a lot of hardcore industry use cases across the manufacturing vertical, the telecommunications vertical, the healthcare and life sciences vertical, and a lot of that real-time operational data that's coming in from drones or other IoT devices, it needs to get processed and it needs to real-time update all of your agents. How do you do that safely and securely? This is some of the things that we're pioneering as we go to market together this year because it's absolutely critical. This is where we have key services, turnkey services, that can make this a reality today.

Dave Vellante

>> This is the vision of a digital representation of an enterprise, people, places and things. Not strings that databases understand, but stuff that business people understand.

Savannah Peterson

>> Absolutely.

Dave Vellante

>> Real time.

Savannah Peterson

>> Going back to business as code.

Savannah Peterson

>> Real time.

Dave Vellante

>> View of your business, finally.

>> Right.

Dave Vellante

>> It's going to happen.

Savannah Peterson

>> We're modernizing. It is. I think we're all happy about that. All right, you teased this, but I need the sound bites from both of you. Ted, now you've proven yourself, so you're going to become a regular just like Kevin is. Last question for you. When we're hanging out at the next Supercomputing, what do you hope to be able to say then that you can't yet say today?

Ted Marena

>> A year from today, I hope to be able to share with you next generation architecture that's going to unleash the next level of innovation for GPU infrastructure, AI and HPC.

>> Awesome.

Savannah Peterson

>> Beautiful.

>> Little leg there.

Savannah Peterson

>> Oh, yeah.

>> Love it.

Savannah Peterson

>> Kevin?

>> Next year I hope to show you what a truly composable cloud infrastructure looks like, where you can have single click deployment of pre-composed stacks to solve for any use case across every vertical. Name a use case, drug discovery, single click deployment of the entire global architecture, the entire hardware software stack, and all of the models pre-configured ready to roll in under 10 seconds.

Dave Vellante

>> Whoa.

Savannah Peterson

>> Single click is a strong... Kevin likes to make bold claims around the cube, so I'm inspired.

>> I saw someone by the name of Nathan Goulding in that hot meeting, and so luckily both Nathan and I have a pretty fun year ahead of us.

Savannah Peterson

>> Yes, and it's possible he also talked about single clicks.

>> Oh, he did, did he? Well, you're going to have to see us at KubeCon next year.

Savannah Peterson

>> Yeah.

Dave Vellante

>> Guys, that was fun, thought-provoking. Really appreciate you guys coming on.

>> Thank you so much.

Dave Vellante

>> Thank you.

>> Thank you.

Savannah Peterson

>> What a sleigh ride.

Dave Vellante

>> Thank you.

Savannah Peterson

>> What a sleigh ride, gentlemen. That was absolutely fantastic. Ted, Kevin, thank you so much.

>> Thank you so much.

Savannah Peterson

>> We really do appreciate it.

>> Wonderful to see you guys.

Savannah Peterson

>> Dave, what a joy. We're really starting to have some fun up here.

Dave Vellante

>> Rollin now.

Savannah Peterson

>> I know. Yeah, watch out.

Dave Vellante

>> Not even half way.

Savannah Peterson

>> I hope you're having half as much fun as we are up here in Atlanta, Georgia at Supercomputing 2024. My name's Savannah Peterson. You're watching theCUBE, the leading source for enterprise tech news.