SC24 | Steen Graham, Metrum AI & Manya Rastogi, Dell Technologies

Steen Graham

CEO

Metrum AI

Manya Rastogi

Technical Marketing Engineer

Dell Technologies

Scalable AI: Open-hardware solutions driving the next wave of AI innovation

Silicon diversity is redefining the future of artificial intelligence by addressing critical challenges in performance, cost and scalability. As AI workloads grow increasingly complex, spanning inference, training and multimodal applications, the need for adaptable, open hardware solutions is at an all-time high. This shift prioritizes flexibility and efficiency, allowing enterprises to choose the best tools for their specific needs while driving innovation and resource optimization across industries, according to Steen Graham (pictured, left), chief executive officer of Metrum AI Inc., which is partnering with Dell Technologies Inc. on AI workload innovation.

SuperComputing 2024 is underway in Atlanta, Georgia. John Furrier and Savannah Peterson are discussing AI workloads and hardware advancements. Steen Graham and Manya Rastogi join the conversation to talk about Dell's new solutions and the importance of knowing the fidelity of AI before deployment. They also discuss the PowerEdge XE9680 offering with Intel Gaudi 3 Accelerator and the benefits of optimizing this combination for cost reduction and efficient cluster building.

The conversation then delves into the importance of diversity in AI choice and the need to prioritize work...

Savannah Peterson

>> Good afternoon, nerd fam and welcome back to Atlanta, Georgia. We are here midway through day one of our three days of coverage at SuperComputing 2024. My name's Savannah Peterson, delighted to be joined by John Furrier for this afternoon's content. John, this is one of our favorite shows.

>> Yeah, hardware is the advanced feature of this show. Hardware advancements, clustered systems, and people want faster, faster, faster horsepower.

Savannah Peterson

>> Absolutely. And one of the things we're talking a lot about is AI workloads. I'm very excited to have this next conversation with a CUBE regular and VIP. Steen, welcome back to the show. Great to see you.

>> Thanks for having me. Really thrilled to be part of the nerd fam, as you said.

Savannah Peterson

>> Absolutely. This is the ultimate nerd fam.

>> Right? The SuperCompute is the best. Brilliant people, brilliant innovation. Doesn't get better.

Savannah Peterson

>> Yes. And Manya, thank you so much for coming to hang out with us.

>> Absolutely. Thank you so much for having me. Thanks for the opportunity. It's always fun with theCUBE. Yeah.

Savannah Peterson

>> So Steen, we're going to open with you just because we've been talking about AI workloads all day. When we're talking about measuring hardware and optimizing, that's your bread and butter. What's happened since the last time I saw you here on this stage?

>> Well, we've been building a ton of new solutions and advancing our product portfolio. I think the one thing about enterprises today, before they deploy AI, they really want to know what the fidelity of it is. Both from all the performance metrics, we love throughput and latency, but the quality of the AI. So we've been thrilled. We actually announced our new 'Know Your AI' platform where we test the AI in development and in production for those domain-specific quality metrics as well as those typical metrics we all love at SuperCompute, like latency and throughput as well.

Savannah Peterson

>> That is really exciting. I know that we've got some PowerEdge party to talk about.

>> Absolutely.

Savannah Peterson

>> Manya, tell me what's going on.

>> So yeah, so Dell here right now is at SuperCompute. We have a great booth. Everyone is welcome to join it, but-

Savannah Peterson

>> It was packed, by the way. I walked by to do a little drive-by and I couldn't even get inside.

>> It's a big booth and it's packed. But the main thing I want to talk about today, together with Steen, is the Dell PowerEdge XE9680 offering with the Intel Gaudi 3 Accelerator. So that's one of the latest things. We have already started making it available to our limited customers, and then, come December, we are targeting making it generally available to everyone. And it's just part of the messaging for Dell with silicon diversity. We want all our customers to have every single choice that they can have with the XE9680 servers, and that's where this comes in.

Savannah Peterson

>> What are the benefits of optimizing that combination?

>> So, few things. That's a good question. There are few challenges which exist in the industry today. What Gaudi 3 with Intel kind of solves is that okay, you just don't have any more like one GPU, you have choices. So, customers are not stuck in with, "We have to go with this." Then second thing, it's basically an OAM, which is open compute accelerator module, like the card, the GPU card. And it ultimately builds up to a big board like eight of them, which are in the XE9680 server. So basically, it's a way out of the proprietary networking and software so people can move out of it and the networking is kind of in it so you don't have to add additional, which ultimately leads to other benefits such as cost reduction. And third thing I would say, all this networking also provides you an opportunity for scale out, like building big clusters and in an efficient way, in a cost-efficient data center that you can get out of it. So, that's three things.

>> You brought up silicon diversity. One conversation we're having with AI now is choice, whether it's model choice or now workload performance. As performance becomes a cost resource allocation challenge, diversity is more important than ever. This is the top story because you don't want to run a super powerful thing on something that doesn't need it, but they're all connected. Can you guys share your perspective on this because this is the nuanced point I think that highlights why diversity matters. And then how do you verify which workload's best for Gaudi or a super cluster or this or that? What's the key? Take us through that.

>> Do you want .

>> Yeah, I think, I mean you're absolutely right. There's choice in the market, which is fantastic. There's a ton of innovation in the market and I think for us, whether some GPUs will have different memory footprint size. That's really important from a functionality perspective. I think if you're driving your two-door, 911 Porsche, you can get a couple people in it, it goes zero to 60 really fast. If something's a four-door, I can carry a few more models in it across that memory footprint. And then there's a different TCO story. So as you think about memory footprint, you think about pricing and you think about raw performance, you can put all those together to provide choice in the market as well. So that's typically what we look at. And we've had the opportunity to run a ton of code on Gaudi 3, both traditional leading performance metrics like running VLM-based inference, and we're seeing it in line with Intel's published estimations, which is great to see because they're showcasing the TCO story on that front. And that price point, that TCO story, that drives another opportunity for us to provide more innovative, affordable models in the ecosystem and new enterprise AI deployments as well.
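As a rough illustration of the memory-footprint point above, the sketch below checks which models fit in a given amount of accelerator memory. Everything in it is hypothetical: the model names, their sizes, and the memory capacities are invented for illustration and do not describe Gaudi 3 or any other product. The smallest-first packing is just one simple way to count models, not a method described by the speakers.

```python
def models_that_fit(memory_gb, model_sizes_gb, reserve_gb=0.0):
    """Return the names of models that fit in memory, packing smallest first.

    memory_gb      -- total accelerator memory (hypothetical figure)
    model_sizes_gb -- mapping of model name to memory footprint in GB
    reserve_gb     -- memory held back for caches, activations, and so on
    """
    budget = memory_gb - reserve_gb
    placed = []
    # Smallest-first packing maximizes the number of models that fit.
    for name, size in sorted(model_sizes_gb.items(), key=lambda kv: kv[1]):
        if size <= budget:
            placed.append(name)
            budget -= size
    return placed


# Hypothetical footprints, invented for illustration only.
MODEL_SIZES_GB = {
    "embedder": 2.0,
    "llm_small": 16.0,
    "vision": 30.0,
    "llm_large": 140.0,
}

if __name__ == "__main__":
    for memory in (24.0, 64.0):
        print(memory, models_that_fit(memory, MODEL_SIZES_GB, reserve_gb=4.0))
```

A larger memory footprint lets more of the stack (embeddings, multimodal, LLM) sit on one device, which is the "four-door" side of the analogy; price and raw speed still have to be weighed separately.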

>> And you have existing investments you've made, so adding onto that will help. We see that come up a lot by adding on the cost side. But what about the workloads, specifically? I mean, is it inference that's the killer app? Is it more training? How do you guys look at the use case workloads on this performance?

>> I mean, I can give you a perspective from what Dell sees from the customer side of view. It is like, yeah, that's what you said. Inferencing, fine-tuning or distributed fine-tuning, those are the main AI/ML/DL kind of workloads that everyone is expecting. And that's what we are working with Steen on, testing some of that. We have some data for our competitors, like part of the silicon diversity with other GPUs. So I mean, those are the main workloads that we are trying to address right now. And then RAG comes into picture and something specific called agentic RAG and I am pretty sure Steen can educate you all about it, but those are some cool applications that can be built on top of that and that are not just a basic simple RAG, which we did two years ago. We have moved ahead and all that work is possible. That's the proof of life. You can come in the Dell booth, it's right there. That's I guess probably the first one which exists that, I don't know if anyone else has done it yet, but that's there right now so you can see it live in action.

Savannah Peterson

>> Wow. Definitely going to have to go see it live in action. Since she just teed you up, Steen, I'm going to let you go ahead and take it away and tell us all about agentic RAG.

>> Well, I think yeah, mean it's a great question. Is it inferencing? Is it fine-tuned training? The answer is both. But I think where the industry has gone over the last couple of years is we moved to more systematic software and compounded software and that's where we add the embeddings models, we add multimodal models. And what we're actually showcasing live is kind of the next evolution of that, which is autonomous AI agents or digital workers. And so you see live in the Dell booth, we've got multimodal capabilities, we've got voice capabilities, but we've got an AI agent that's doing customer service support for an internet service provider. And that AI agent can use chain of thought reasoning to do things like look at what plan you have. If you're saying you've got poor internet connectivity, it'll upsell you a plan based on chain of thought reasoning. It'll create and automate a ticket to upgrade your plan as well. If you're having network connectivity issues, it'll issue a work order to repair your network as well. So, these autonomous or semi-autonomous AI agents that we can prescribe, they're really a great platform to run on top of a Gaudi 3. And I think you guys have probably heard Zuckerberg talk about how there's going to be more AI agents than humans at some point in time. And so what we do need is a few chips out there to run some AI agents set with choice in off-com and off-the-shelf hardware like the XE9680 provides us that choice as well.

Savannah Peterson

>> I want to hang out here for a second because you just brought up a couple of good points. I mean, Salesforce just announced they want to have a billion agents within the next 365 days. Powerful, so you're not wrong. Like you said, there's got to be something to run it on. And you also just brought up some interesting customer use cases there and I'm wondering if you can tell us, and this is a question for both of you. I'm very curious. I love your car analogy by the way, the two-door speed racer versus the four-door or the minivan for example. What are some of the trends that you're seeing across the industry in terms of verticals? Who needs what car, I guess is what I'm asking?

>> The biggest thing in the market right now is I think what we're seeing is there's a lot of batch-based processing that can be done in the AI workloads. And that's where, for example, we're also demonstrating a use case around legislative bill insights. This is a use case that the state of California actually put up in the market and said, "Hey, one of the biggest problems that we have is we get new legislation and then we have to do so much work. Each committee has to do an evaluation of that new legislation in a short period of time." But that's a workload that you can do overnight, that we don't want a human working overnight to get the environmental impact analysis done, but the AI agent can work overnight as well. And so that's where you're not looking for as much latency as you would a live chatbot where you have a human in the loop interaction. And I think what we're seeing with enterprises is they want to run some of these tasks. They want to pre-program AI agents to get things done, and then they want to loop humans in the loop later for quality assessments as well. But then humans don't need to be in the loop for every token, every second, which is I think what we're all hoping for AI to do some work for us. That we don't have to sit there with it and co-pilot all the time. Sometimes it's got to fly the plane and we'll make sure it's on the right track.

Manya Rastogi

>> I'll just add in terms of the different industries, you asked what different verticals. If I mentioned we have done a lot of stuff on manufacturing or retail or healthcare and then internet service provider, that's one of the latest demos that Steen has done for us. So, we are hitting on all different industries, what they need depends on what kind of application they want to do. But yeah, I guess we are right there moving with the industry in that direction, trying to solve one problem at a time.

>> When you brought up the car thing and said mini-van thing, I thought about other van, like van life.

Savannah Peterson

>> Yeah, my camper van. I didn't even think about that.

>> But it brings up the use cases of speed and I want to see if you don't mind sharing your perspective, because there's Llama benchmarks in there which come out great on the performance, but the developers are starving right now for performance. What's going on in the developer community? Because they're the ones going to build the software and they're going to want to have their own system and they want to have the car, they're going to want to have the van, they're going to want to have the diversity of processing and horsepower, so to speak. How does it translate to their environment?

Steen Graham

>> Yeah, I mean, it is an amazing time for developers because we have so much choice in the marketplace right now. But the real challenge is because you've got that much choice, you're constantly looking at different things. And so what we find is different vector databases work really good for different tasks or sometimes you've got to add a graph database in. We're always trying new embeddings models based on domain-specific information as well. And so every component of that modern AI agent stack needs to be kind of evaluated and assessed for that particular use case as well. So, that choice leads to a little bit of paralysis, but that's the thing you're going to want to pack in your minivan is you're definitely going to want your embeddings model. We're definitely going to want to choose a good vector DB or a good graph DB, right? And we're going to need to think about what our agentic framework is too, if we want that minivan to operate seamlessly overnight as well. And so these are all things that you need to think through and I think it's great we have a lot of choice, but there's a lot of confusion and you want to pack all these things together in a compound systematic way. That's really important for developers today.

>> But yeah, I mean I guess we have something like this with Gaudi 3 offering. I would just say something to make it simpler for customers to decide, the performance... I think the main differentiator that we need to mention is the performance per dollar. So overall, it's kind of public news when Intel announces that with Gaudi 3, it's cheaper than the competitors, the GPU that are in the market and the performance we are trying to see here, it's on par with what we see with -

>> And when's it going to be available? Because I was in New York and I was at a meetup and I heard two founders talking as I was going down the hallway. I heard them say, "I can't get any GPUs." I mean, they're startups, they need more... And this is common across larger companies. It's hard to get the GPUs to build on.

>> That's true. And I guess we know all about it. It's so hard to get just the servers and the GPUs where everyone, for our own testing right now, it's kind of limited.

>> So, when's it going to be available?

>> So it's available, it will be shipping in December. And yeah, like I said, one of the biggest things you can get with it is the price point. It's a great price point for customers to get an experiment about it, like the startups you mentioned. Maybe if you don't want to get stuck up with expensive setup, this can be another choice. And, yeah.

Savannah Peterson

>> I can imagine you both probably have to give folks a lot of advice on which car to buy in this circumstance. I mean, do you have blueprints or matrices for, let's say I'm an enterprise right now listening to this conversation saying, "I don't even know actually exactly where to start." I mean, everyone says they're doing AI. We all know we're not quite at full scale across every enterprise. How are you up-skilling folks and understanding the different trade-offs there?

>> It's not easy, of course, I would say. It's not like, okay, I can make few differences right away. The idea, this is better, this is not better, this not better. No. I guess the overall AI is growing up building on top of one another. It's been some time, it will take more time, we get to a point where we can say... That's my view, that you can find out, yeah, this one choice for one thing. But I guess at this point it's more about let's find out, let's do the work, what works best for which thing on AI. Like fine-tuning, if we are doing training, if we are connecting them together and building up a cluster, this one works better or not. So yeah, I guess that's where we are going to find out later.

Steen Graham

>> I think the other thing is Dell does a great job, Dell has Dell validated designs, and so they do have reference implementations of what they put in the market. And in addition to that, when we partner with Dell, we put the reference code in the market as well, so through Dell AI GitHub repo.

Savannah Peterson

>> Cool. Yeah.

>> And so in that reference code, you'll also see these points. To your point earlier, it's no longer about the Llama 3 performance numbers. It's about the full AI agent's performance on top of that agentic RAG stack, which is so much more software than just the LLM, is doing a small part. And so you kind of need to know that full architecture, we're providing the solution code, we're providing as a reference. And so that really enables people to think through like, "Hey, it didn't take us years to build an agentic solution that works for internet service providers on Gaudi 3. It took us weeks." And I think other people that don't have our expertise, we're going to put it out in the open so they can actually reference that and build great solutions on Gaudi 3 or other new entrants to the -

>> What about your solution, what you guys do and for the audience watching, and your value proposition?

Steen Graham

>> So for us, I mean we have a great enterprise gen AI platform that we build a ton of different vertical specific solutions. So for companies that believe that they have proprietary IP, that they've got a great workflow and that want to get AI agents out there to save their employees time and delight their customers, we offer vertical industry solutions for that. And one of the things that we had to do to make high fidelity AI is we had to build a performance testing test suite for all these new techniques. Not just the LLMs, the multimodal models, the vector DBs, the graph DBs, the agent frameworks. Because they all perform different both from a throughput latency affordability perspective, but also from a quality of the AI based on those domain specific metrics. And so you have to use different tools for different jobs, especially for independent businesses. And ultimately that's what we're working on.

Savannah Peterson

>> Given them the... Oh, go... Did you want to add?

>> I wanted to add, that's like the work that Steen's been doing. So that's part of the whole thing together as partnership. There is a lot of performance and then we also get metrics like what's the power consumption, what's the memory usage, what's the CPU utilization, GPU utilization? So I think that all builds up once you ask the question how customer can decide. I think that's another thing. What's the performance? What's my power budget? What's my price budget? And then which one will look better? So, I think that's all part of this effort that will come together. Finally, we'll be able to get to that.
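The decision questions raised here (performance, power budget, price budget) can be sketched as a filter followed by a ranking. This is a minimal sketch under stated assumptions: the `Candidate` type, the `shortlist` function, and every figure below are hypothetical and invented for illustration; they are not measurements of any Dell, Intel, or competitor system, and ranking by throughput per dollar is just one reasonable choice of ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One system configuration with hypothetical measured metrics."""
    name: str
    tokens_per_second: float
    power_watts: float
    price_usd: float

    @property
    def perf_per_dollar(self) -> float:
        # Throughput delivered per dollar of system price.
        return self.tokens_per_second / self.price_usd


def shortlist(candidates, power_budget_watts, price_budget_usd):
    """Keep candidates within both budgets, best performance per dollar first."""
    eligible = [
        c for c in candidates
        if c.power_watts <= power_budget_watts and c.price_usd <= price_budget_usd
    ]
    return sorted(eligible, key=lambda c: c.perf_per_dollar, reverse=True)


# Invented numbers, for illustration only.
CANDIDATES = [
    Candidate("system_a", 1000.0, 10000.0, 300000.0),
    Candidate("system_b", 900.0, 8000.0, 200000.0),
    Candidate("system_c", 1500.0, 14000.0, 450000.0),
]

if __name__ == "__main__":
    for c in shortlist(CANDIDATES, power_budget_watts=12000.0, price_budget_usd=350000.0):
        print(c.name, c.perf_per_dollar)
```

In this made-up data the fastest system is excluded by both budgets, and the slowest one ranks first on performance per dollar, which is the kind of trade-off the speakers describe.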

Savannah Peterson

>> I love to hear it. Steen, how do you prioritize what systems to measure first when you're doing this? I mean, the velocity of the market is crazy and you have to be looking at a lot of different factors.

>> Yeah, I mean I think we're mostly just based on particular industry-specific use cases that we're building out. And sometimes you'll be like, "Hey, there's certain vector DBs that are really, really important for time-sensitive material if you're pulling time-sensitive queries, and some just need to be right all the time." And so you can kind of infer based on the use case, but you're really trying to get that high fidelity, "How do we get from 99% to 99.8%?" Because for enterprises, they're not ready to deploy unless they see really high certainty. And the cool thing about what we help them with is continuously monitoring the domain in development and in production. So, we create scorecards, like you would get in middle school that comes back and says, "Hey, here's how the AI is doing. Now, here's how it did in development, here's how it's doing in production." And you're constantly asking it new questions independently. So, there's a bunch of tests for the AI that's happening behind the scenes to make sure it's giving you the right answers all the time as well. The continuous learning is always there. Luckily we have software to do that because the human part of that continuous testing is endless permutations at this point.
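The scorecard idea described above, comparing how the AI did in development with how it is doing in production, can be sketched as follows. This is only an illustrative sketch: the `Scorecard` type, the `grade` and `regressed` helpers, and the pass/fail data are hypothetical and are not Metrum AI's platform or API. Each test result is reduced to a simple pass or fail, whereas real domain-specific quality metrics would be richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scorecard:
    """Pass/fail tally for one stage (for example development or production)."""
    stage: str
    passed: int
    total: int

    @property
    def accuracy(self) -> float:
        # Fraction of test questions answered correctly; 0.0 if nothing was run.
        return self.passed / self.total if self.total else 0.0


def grade(stage, results):
    """Build a scorecard from a list of booleans (True means the answer passed)."""
    return Scorecard(stage=stage, passed=sum(results), total=len(results))


def regressed(dev, prod, tolerance=0.01):
    """Flag production accuracy that falls more than `tolerance` below development."""
    return prod.accuracy < dev.accuracy - tolerance


# Invented results, for illustration only.
DEV_RESULTS = [True] * 99 + [False]
PROD_RESULTS = [True] * 95 + [False] * 5

if __name__ == "__main__":
    dev = grade("development", DEV_RESULTS)
    prod = grade("production", PROD_RESULTS)
    print(dev.accuracy, prod.accuracy, regressed(dev, prod))
```

Running the same graded questions continuously in both stages is what lets a drop in production quality show up as a number rather than as a customer complaint.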

Savannah Peterson

>> So, what grade do you give Dell Systems?

>> I mean, Dell's are A+. I mean, the XE9680 is like the flagship eight-way system in the market. And they got to market early with it, right on time for the gen AI boom. And they offer choice across all three major providers in the market. And now with Gaudi 3, they've got that preeminent third choice. And so if you want to make the right selection between your workload, you've got the choice across all those modalities. And I think it's really important for the marketplace because choice leads to innovation and we'll continue to innovate on all three platforms.

Savannah Peterson

>> We're all about that. All right, I have one final question for you, Steen. I'm sure you've heard me ask this before, so I'm going to you first.

>> Okay.

Savannah Peterson

>> You'll both be back on this stage this time next year because you've just smashed the panel, as I expected. What do you hope to be able to say this time next year that you can't yet say today? Steen, I'm going to you, like I said.

>> Well, I think right now with AI, we've really kind of optimized software in a great way and we're building this really systematic software with AI workers that will save people material time and ultimately drive top line revenue and getting enterprises to really high fidelity solutions in that regard, I think is part A of my answer. I think part B is I think we're seeing a lot of evolution around the new modality of AI, which is like robotics. And I think everybody complains that, "Hey, I want to write my own poetry. I don't want the AI to do that. I want the AI to do my laundry." And I think you'll see an emerging new modality of physical world implementations of AI paired with our modern, new generative AI inspired techniques that I think humanity will come to realize, like this is a massive time saver and it's going to make my life easier and really deliver on the-

Savannah Peterson

>> .

Savannah Peterson

>> Yeah, exactly. I'm with you, Manya. So Manya, what about you?

>> I'll just say from hardware perspective, from Dell Technologies, next year, definitely we'll be having more offerings. Look forward to Intel offering the greener and Gaudi 3 in a PCIe form factor. Right now it's the OAM part, and now we'll have this next year, probably sometime mid-next year. And yeah, probably new innovations happening on top of that. So, that's one key takeaway that that's the new Intel Plus Intel server that's .

Savannah Peterson

>> I love it. I can't wait to hear all about it. And who knows, we might have full takeover by agentic AI at that point. It's going to be a wild year. I feel like between the next 12 months, anything can happen. Steen, Manya, thank you so much for taking the time. This has been a wonderful chat as always. And John, always a pleasure.

>> Yeah.

Savannah Peterson

>> And thank all of you for tuning in wherever you may be on this beautiful rock. We're in Atlanta, Georgia here on day one of SuperComputing 2024. My name's Savannah Peterson. You're watching theCUBE, the leading source for enterprise tech news.