Clips from this segment:
- Unlocking Limitless Potential: IBM's Storage Scale Revolutionizes High-Performance Computing and AI Workloads with Yottabyte Scalability
- Integration of new gear and existing storage with IBM Storage Scale solution
- Enhancing AI System Reliability Through Automatic Recovery and IO-500 Benchmark Performance Improvements
- Pushing intelligence down to storage for data optimization
Live coverage in Atlanta, Georgia for Supercomputing '24, hosted by John Furrier and Dave Vellante. Sam Werner, VP at IBM, talks about the importance of storage in AI and HPC. Storage vendors focus on efficiency and speed because of power-hungry GPUs, and exabytes of data require efficient storage solutions. IBM's Scale system supports high-performance computing and AI, offering fast throughput and scalability. The system's efficiency and caching capabilities enable high-performance, cost-effective storage solutions. IBM's focus is on making storage easier and more efficient.
John Furrier
>> Welcome back everyone to theCUBE's live coverage here in Atlanta, Georgia for Supercomputing '24, SC '24 for short. It's been around since 1988. I'm John Furrier, host of theCUBE with Dave Vellante, my co-host, also on theCUBE Pod every Friday. Go check it out this Wednesday; we're pre-recording it, we're on the road. Dave, it's been quite the show. Three years watching this evolution. We saw four years ago, all the recruiting was being done here for all the top engineers, machine learning, AI was coming into the scene. HPC was moving the needle inch by inch, and then all of a sudden, game-changing inflection point. And the data's coming in, the infrastructure's leveling up, new software's coming in, architecture's leveling up again. It's an ongoing advancement and storage is a big part of it. I called it super storage, super networking, super cloud, super apps. Storage is a big conversation. Let's get into it. Sam Werner, vice president of product management at IBM. Sam, great to see you.
Sam Werner
>> Great to be here.
John Furrier
>> So we were talking before we came on about the role of storage, and I think one of the most important things besides power and cooling for keeping racks from melting, is storing the data. Because having that data and the data platforms, that's where all the action is right now, and it's really not just about blob storage or object store; it's a bigger picture. Gen AI needs that data at low latency. This has been the top conversation on theCUBE for a year. This is where UIs are. Give us your take on that storage paradigm as this new era unfolds.
Sam Werner
>> It's a great space to be in right now with everything happening with AI. And if you've ever been to the super computing event, you'll notice it's also a storage show, that all the big storage vendors are here talking about what they do. And there's a reason for that. You have to be able to store all this data very efficiently. You've got to have low power and cooling costs because you have to make way for all the very power hungry GPUs required to do AI and super computing. First of all, you have to have very efficient storage. It has to be super-fast because you don't want these GPUs sitting idle. They're very expensive. And then you have to be able to organize massive amounts of data to get any real value out of AI. So, there's a lot of challenges that storage has to address to make this work.
Dave Vellante
>> And this show, we were just talking about, it's GTC like. Sam, you call it the open systems of GTC, and the other one's NAB. That's another big storage show, isn't it? I mean, it's just, sometimes with all this compute, we get lost in the fact that we're talking about petabytes upon petabytes of data and it's just exponentially growing.
Sam Werner
>> That's right. I mean, we have customers at exabytes of data and growing, right? And with a lot of these new AI models, people want to keep all the data forever too, so you can trace back how you came to conclusions you did. So that, you even bring in tape storage, which we also provide, and we can give you long-term archival of this data at the absolute lowest possible cost when it comes to energy consumption because these are just tape cartridges and the extreme economics.
Dave Vellante
>> I love the sound bite, John, and you've written about this. I have too. They say GPT-4 was trained on half a petabyte of data. JP Morgan supposedly has 150 petabytes of data. So, that's kind of your hybrid IT, hybrid wheelhouse.
Sam Werner
>> That's right. That's absolutely right. What's interesting is when you look at the big AI models that are out there, they've already consumed all the world's data, right? They've trained on everything there is that you can get your hands on. They've come up with really creative ideas to get more data, and they've even created data in order to have more to train with. But then you go look at an enterprise. The large enterprise customers out there, their data's not in any of these models. 1% of their data is in those models. So how do they bring all of that data to the models to actually get real value for them and for their customers? That's the next big frontier.
John Furrier
>> And what's the challenge on that? Because that's a huge headroom. We've written about it, Dave, your breaking analysis comparing Jamie Dimon to Sam Altman and showing that the enterprise opportunity is huge because they're going to have their own models. They have to build their own machines, they need to have their own systems. And we're talking about the classic storage, networking and compute now redesigned in this new era. As we're calling it, the old chapter is now closed. We're entering the era of clustered systems, where storage is a big part of it. What is the core problem that you see your customers facing and you guys are solving? How would you describe that?
Sam Werner
>> Well, I mean, I think everybody knows one of the biggest challenges they have, first of all, is to decide where they're going to do it. Are they going to do it in a cloud or are they going to do it on-prem? I think they find out the cloud is extremely expensive, and it actually is probably more economical to build their own. So, I see a lot of them looking to build their own, and then they realize their data centers aren't really up to the task. In a lot of these cases they don't have what they need. First of all, if you've walked around, you notice a lot of liquid cooling happening in this place.
Dave Vellante
>> Back to the future, Sam.
Sam Werner
>> Back to the future.
Dave Vellante
>> It's a hot area, as we say.
Sam Werner
>> We've been doing water cooling a long time at IBM. But yeah, I mean, so they have challenges with power. They have challenges with having a data center that can support water cooling. So they're having to rethink how they're going to build all this. And I will tell you, the storage piece is extremely important too. You need to get it into a small footprint at the most affordable cost in terms of power, cooling, everything that goes with it, and then organize your data in a way that you can actually bring it for training. When you train a model, you have to do checkpoints, because GPUs fail, and they fail a lot. So you have to constantly do checkpoints. And while you're doing a checkpoint, everything stops. So your training cycle time is elongated if it takes a long time to write your checkpoints. And that's where storage is so critical. We're right in the middle of shortening down the cycles to do training.
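The checkpoint arithmetic Sam describes can be made concrete with a back-of-the-envelope model. The sketch below is illustrative only (the numbers are assumptions, not IBM figures); it shows how checkpoint write time eats into useful training time when the whole cluster stalls during a write:

```python
def training_efficiency(step_time_s, ckpt_interval_steps, ckpt_write_s):
    """Fraction of wall-clock time spent on useful training when the
    whole cluster stalls while a checkpoint is written out."""
    useful = ckpt_interval_steps * step_time_s
    return useful / (useful + ckpt_write_s)

# Illustrative numbers only: 1 s per training step, checkpoint every 500 steps.
# Slow storage: a 2 TB checkpoint state at 10 GB/s takes ~200 s to write.
slow = training_efficiency(1.0, 500, 200.0)
# Fast storage: the same state at 100 GB/s takes ~20 s.
fast = training_efficiency(1.0, 500, 20.0)
print(f"slow storage: {slow:.1%} useful, fast storage: {fast:.1%} useful")
```

Under these assumptions the faster storage recovers roughly a quarter of the cluster's wall-clock time, which is why checkpoint write bandwidth matters at GPU prices.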
Dave Vellante
>> The scale today is a completely new dimension. So, I wonder if you could talk about how IBM generally, and IBM Storage specifically, thinks about scale, how you're supporting AI at scale. I mean, everybody thinks scale, they think cloud, but now when you start bringing this stuff on-prem, help us understand. Because you may not need giant scale for some of the smaller language models, but at the same time you may. How are you guys thinking about that?
Sam Werner
>> Well, I have a product that's very aptly named for this challenge, which we call IBM Storage Scale. And it dates back to the very beginning of high-performance computing. It was the file system for high-performance computing, used to be GPFS, right?
Dave Vellante
>> GPFS, yeah.
Sam Werner
>> And yes, GPFS is part of Scale, but it's so much more than that. We've been working on it for years and years, and it is the fastest storage solution in the industry for running high-performance computing, and now AI. I mean, we just published the benchmarks that we did with NVIDIA GPUDirect. We're now certified with our Scale System 6000 for the NVIDIA SuperPODs, and we're getting 310 gigabytes a second of throughput for reads and 155 on writes, which is by far the fastest in the industry for those benchmarks. But we can start you, to your point on scale: we can start you in a little two-box setup. I mean, we're starting now at the 500 terabyte range. Very affordable to get started with AI, but this thing can grow. I mean, Scale can grow to yottabytes in its file system.
Dave Vellante
>> Can you talk about how you're doing that? If I may, just how you're getting that level of right performance. What's the caching look like? What's the state of the art today for lowering latency?
Sam Werner
>> Well, I mean, first of all, Scale is an extremely efficient file system that we've been innovating on for years and years to get down to hardware limitations on performance. We have so little overhead in our path. And yes, we use cache, and we go up to three terabytes of cache in our 6000. So you can get to extremely high performance, but one of the great things we also allow you to do is we'll put NVMe drives in the main system to give you super high performance, but then you can put racks and racks of spinning disk behind it so you get super cheap economics, right? Because spinning disk is still a fraction of the cost of even QLC drives, a quarter of the price or even less. So, we can do racks of that. We run all the performance through our NVMe drives, and we have the intelligence in it to manage the data onto this back end of spinning disk. And it's extremely parallel. It's a parallel file system, so we're able to do all this IO in parallel, so you're able to take advantage of all of the IO paths you have. We take advantage of every lane. We're now on PCIe Gen 5, and we're able to completely peg the throughput on the-
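The NVMe-in-front, spinning-disk-behind design Sam describes is a classic tiering pattern. Here is a minimal sketch of the general technique, an LRU promote/demote policy, purely for illustration; it is not Storage Scale's actual placement logic, and the class and method names are invented for the example:

```python
from collections import OrderedDict

class TieredStore:
    """Toy model of two storage tiers: a small, fast NVMe tier in front
    of large, cheap spinning disk. Reads are served from NVMe when
    possible; on a miss the block is promoted and the least-recently-used
    block is demoted to disk."""

    def __init__(self, nvme_capacity):
        self.nvme = OrderedDict()   # block_id -> data, kept in LRU order
        self.disk = {}
        self.capacity = nvme_capacity

    def write(self, block_id, data):
        self.disk.pop(block_id, None)
        self.nvme[block_id] = data           # new writes land on NVMe
        self.nvme.move_to_end(block_id)
        self._evict()

    def read(self, block_id):
        if block_id in self.nvme:            # fast-path hit
            self.nvme.move_to_end(block_id)
            return self.nvme[block_id]
        data = self.disk.pop(block_id)       # miss: promote from disk
        self.nvme[block_id] = data
        self._evict()
        return data

    def _evict(self):
        while len(self.nvme) > self.capacity:
            cold_id, cold = self.nvme.popitem(last=False)
            self.disk[cold_id] = cold        # demote LRU block to disk

store = TieredStore(nvme_capacity=2)
store.write("a", b"1"); store.write("b", b"2"); store.write("c", b"3")
# "a" was least recently used, so it has been demoted to the disk tier.
```

The point of the pattern is exactly what Sam says: all the performance flows through the small fast tier while capacity economics come from the cheap tier behind it.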
Dave Vellante
>> So, it lends itself to AI, John.
John Furrier
>> Yeah, I'm glad you asked that question because I was going to talk about the multimodal data coming in. Computer vision is massive, so they need huge horsepower on that. Storage is critical. Search is a killer app. You're seeing retrieval augmented generation, or RAG, as the entry level, but that's not going away. I still want to find stuff and get all that data ingested into vector embeddings in the format of the neural network. Talk about the performance that you guys are seeing and what you're offering. I think storage can't be overlooked, and it's not overlooked. It's the critical area for making gen AI work. Computer vision and the search paradigm, because they're coming fast and hard. Vision is the world.
Sam Werner
>> Yeah.
John Furrier
>> I mean, you're going to see more camera footage than ever before.
Sam Werner
>> You're so spot on, and I love this topic right here because the way it all works today is you take all... Everybody treats storage just as a fast data provider, and then we'll make sure you have persistence of that data and don't lose it, right? That's how storage is treated today. So everybody is copying all this data to their AI model. And you talked about RAG. People are building these vector databases, they're copying files out of storage onto servers. They're doing all of the vectorization using GPUs, and they end up with six copies of this data to get to this vector database. And customers we work with have only been able to vectorize up to 10% of their data because the vector database gets too big, or they have new data coming in and they have to update it, and you can just never get all your data in there. So, we're changing this paradigm completely. Rather than bringing all your data and building this vector database, what if we did all of that in the storage, put accelerators in the storage products, and actually did the vectorization as data changes? We know when the data changes, so we can actually vectorize it in real time.
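The "vectorize as data changes" idea Sam describes can be sketched as an incremental index that re-embeds only changed files, instead of tearing down and rebuilding the whole vector database. This is an illustration of the general pattern, not IBM's content-aware storage implementation; the `embed` function is a stand-in for whatever accelerator does the real embedding work:

```python
import hashlib

def embed(text):
    """Stand-in embedding: a real system would call an accelerator here."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:4]]

class IncrementalIndex:
    """Re-embed only files whose content changed since the last pass,
    rather than rebuilding the vector database from scratch."""

    def __init__(self):
        self.fingerprints = {}   # path -> content hash
        self.vectors = {}        # path -> embedding
        self.embed_calls = 0     # proxy for GPU work spent

    def refresh(self, files):
        for path, content in files.items():
            fp = hashlib.sha256(content.encode()).hexdigest()
            if self.fingerprints.get(path) != fp:      # new or changed
                self.vectors[path] = embed(content)
                self.embed_calls += 1
                self.fingerprints[path] = fp
        for path in list(self.vectors):                # dropped upstream
            if path not in files:
                del self.vectors[path]
                del self.fingerprints[path]

index = IncrementalIndex()
index.refresh({"a.txt": "hello", "b.txt": "world"})   # embeds both files
index.refresh({"a.txt": "hello", "b.txt": "world!"})  # re-embeds only b.txt
```

A batch rebuild would pay the embedding cost for every file on every pass; the change-detection approach pays only for what actually changed, which is the difference Sam is pointing at.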
Dave Vellante
>> You got our attention.
John Furrier
>> Sam, first of all, yes, I love this, because a lot of people look at RAG, retrieval augmented generation, as an easy thing to do, and it's almost trivial to play with it.
Sam Werner
>> Right.
John Furrier
>> Scale, you mentioned earlier. At scale, you see things you don't otherwise see, and you're seeing the winners like AWS, NVIDIA, IBM. When you're at scale, you can see problems and solve them that no one else can. This is a huge point in today's modern era, this post-AI era, whatever you want to call it. Talk about the scale piece of it, because the search works. As you get down the RAG road with search retrieval, it breaks because it's not optimized.
Sam Werner
>> Right.
John Furrier
>> You're saying you're optimized. Explain that in more detail, because I think that's a killer feature for where people break right now in their POCs.
Sam Werner
>> Yeah. And let me take it one step farther in how we do it... So I'm going to address your question, but there's another capability we have in IBM Storage Scale. So the thing I was talking about, we call it content-aware storage, or CAS, and that capability we're putting into our products so you can query the storage itself to get the answers rather than bringing the data out and doing it elsewhere. But in order to do that, you have to imagine an enterprise has data all over the place. They have it maybe in HDFS behind their Hadoop or Spark clusters. They have it in object stores, maybe in the cloud and in their own data center. They have old NAS systems sitting around. I have data everywhere. These files and all my different applications land in different places. It's not practical to say, "Copy everything into one uber storage pool," right? That's a big challenge to an organization. We don't say you need to do that, because we have something in Scale called active file management. With active file management... Scale has a global namespace. We give you one single namespace you can use everywhere and scale forever. But then we can talk to all of your unstructured data sources using our active file management and cache the data in, and we can keep track of changes across all that storage with CAS. So as the data changes in all these different repositories you have, we'll constantly update within Scale and give you this query engine that we call CAS. So, at scale, we can do it with high performance in our box and we can attach to all your data sources. That's how you do it.
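The global namespace with cache-on-access that Sam describes can be sketched roughly as follows. Everything here, the class, the prefix-routing scheme, the example paths, is hypothetical and only illustrates the general idea; Scale's active file management is far more capable:

```python
class GlobalNamespace:
    """Toy model of one namespace spanning heterogeneous backends
    (NAS, object store, HDFS, ...), with reads cached locally on first
    access so later reads don't go back to the remote source."""

    def __init__(self, backends):
        self.backends = backends  # path prefix -> dict acting as a remote store
        self.cache = {}
        self.remote_reads = 0

    def read(self, path):
        if path in self.cache:
            return self.cache[path]              # served from local cache
        for prefix, store in self.backends.items():
            if path.startswith(prefix):
                self.remote_reads += 1
                data = store[path[len(prefix):]]  # fetch from the source
                self.cache[path] = data           # cache the data in
                return data
        raise FileNotFoundError(path)

ns = GlobalNamespace({
    "/nas/":    {"reports/q3.csv": b"nas data"},
    "/object/": {"images/cat.png": b"object data"},
})
ns.read("/nas/reports/q3.csv")   # first read: remote fetch, then cached
ns.read("/nas/reports/q3.csv")   # second read: served from cache
```

The design point is that applications see one namespace regardless of where files physically live, and hot data migrates toward the fast tier on demand instead of being bulk-copied up front.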
Dave Vellante
>> Are there physical limitations, like distance limitations? To do that within... It's a global system, correct?
Sam Werner
>> Yeah. There's always the performance cost of distance, but because we cache the data in as it's needed... I mean, we have customers who run Scale all over the world and share data with each other, and we act as an accelerator. I have a customer, for example, that runs high performance workloads in a public cloud, and they have their data in their data center, and they have the data in the object store. They use our caching capability to run the high performance workload in the public cloud, and it caches data out of their data center and out of their object storage into the cloud to give them the high performance.
Dave Vellante
>> And the accelerator that you talked about on the storage, what is that? Is that an ASIC? Is that your design?
Sam Werner
>> No, so that's a great question. We'll support some of the standard GPUs you see in the industry, but also, IBM and IBM Research, we built our own accelerator that we call... Well, I don't know what we're calling it these days. We've called it our AIUs-
Dave Vellante
>> ....
Sam Werner
>> which is IBM inferencing chips.
Dave Vellante
>> Right. Okay.
Sam Werner
>> Yeah. And so we have our own accelerators, and then we'll also use other industry standards.
John Furrier
>> On your Scale example with the search, I want to come back to what the impact is to customers, because again, I appreciate and I recognize that that's a great position to be in today. People are in pain, and you've got the headroom for the future. Scope the order-of-magnitude change from a customer experience standpoint, old way versus new way. Because you mentioned what they had to go through with the retrieval piece. You've got the scale. I get that. What's it like for them from a deployment standpoint, how they consume the new way? What's the customer experience, the consumption?
Sam Werner
>> Well, I'll say the biggest thing is that this actually works compared to the way they're doing it today.>> Exactly.
Sam Werner
>> I know people think they can do it today, and you already made this point. At smaller scale, sure, it works the way they're doing it, but when you get to larger scale, you have so many problems of why you can't do it this way. I mentioned the 10%. The data's always stale because data's constantly changing and you're not able to update your database. I've talked to customers who destroy their vector database every day to start over because they don't know what's changed.>> Yeah, they tear it down.
Sam Werner
>> So, the only way to do it, tear it down and rebuild it.>> There's no resilience. None. Zero.
Sam Werner
>> No, and it chews up a lot of GPU power and-
John Furrier
>> I'm sold on this; you've got to be sold. So thank you very much. Now, there's two scenarios. I'm an IBM customer, so what do I do? What do I have to do to take advantage of this? Or two, I'm not an IBM customer. I have a little bit of IBM storage. What do I do? Do I rip and replace? Or do I just install new gear? Is it software? Take me through it. I'm trying to figure out the steady-state customer environment.
Sam Werner
>> It is new gear. The good news is, in my opinion, and I am only slightly biased, best storage in the industry, we bring that in. Like I said, you can start pretty small. I mean, a 500 terabyte box maybe, right?>> Yeah.
Sam Werner
>> You put that in. We'll attach to all your existing storage. You don't have to go rip all your storage out. You're just bringing in a great new experience with Scale. It'll provide acceleration. It's totally certified with NVIDIA. We're supporting the NVIDIA data flows as well as our own watsonx stuff. So we at IBM have our own watsonx platform, and we support both of the different data pipelines, NVIDIA's or IBM's.
John Furrier
>> What are the key product features on the roadmap right now that you're prioritizing, and how do you see your growth strategy building off that?
Sam Werner
>> You talked about scale a lot, and how enterprises... Look, everybody's challenged with skills, so we're really focused on making it easier and easier. And so it's not just about supporting AI, it's putting AI down into our products. Automatic recovery, automatically diagnosing problems, telling you what issues need to be addressed without you having to have such a huge set of skills to support it. I mean, Scale has a background in high performance computing, where people love to play with it. And that's part of the fun of being in HPC. But in this world of AI where people are using it for enterprise applications, they really want it to be simple and hands-off, so we're bringing the AI to the systems. That's a big part of our roadmap. I mean, we're at Supercomputing; we love to talk about performance. We will continue to lead the way in performance, improving our IO-500 benchmarks.
John Furrier
>> Yeah, price performance is back in fashion. It's never really gone away. So energy's also a big deal. Sustainability, price performance. These are the top conversations here at Supercomputing.
Sam Werner
>> Yeah. And I challenge anybody watching this to go look at the kilowatts required to run our Scale System 6000 versus the competitors. Look at how much capacity you need to support the NVIDIA published SuperPOD specs or BasePOD specs. Look at our energy consumption versus the competition. We're at least half. And in a lot of cases, depending on which competitor, even more than that. I mean, half against our best competitor. And that's huge, because as I said before, the enterprises are bringing in GPUs. They don't have the energy for all this stuff. Let NVIDIA take up all the energy. We'll give you all your data.
Dave Vellante
>> Every little bit helps. We know that compute is the big culprit, but if you're running out of power and you can steal some from storage, or you can improve storage efficiency, why not? People forgot about it when we went from spinning disk to flash, but now the systems are so large at scale. I don't know what percentage of the blame pie it is, but it's double digits.
Sam Werner
>> Yeah. And we can help even more with our tape products. I always have to bring up tape because we designed a tape library for hyperscalers and some of the hyperscale-
Dave Vellante
>> I know, when I try to get my pictures off of Facebook, it takes a while.
John Furrier
>> We always say tape is dead, long live tape.
Sam Werner
>> Yeah, that's right.
John Furrier
>> So final question for you. We've seen this movie before in other ways. Now we're obviously in a new changeover; I mentioned that earlier. In every performance wave, we've seen the same thing happen around storage specifically. There's been an acceleration layer. There's been an offload. Every time you have a constraint, you optimize around it. You mentioned GPU cycles. That's a big one. How do you see the architecture evolving as we move forward? The techniques are all there; it's just re-architected in a new, modern way. What would you say to someone who's evaluating and thinking, "Oh, I got wasted CPU cycles. I've got to re-architect my clustered system"? What's your advice as you look at this new architecture?
Sam Werner
>> Well, I mean, I think it is what I've already been talking about, really. I mean, if you look at what we're doing with content-aware storage, I think it's time. We've talked about this a lot of times over the history of storage. I've been around storage a while. Pushing some of the jobs down into the storage, I think we finally have the killer app for that. And it's this idea of putting the intelligence down in the storage to do the optimization of the data and provide just what you need. Because you talked about it. I mean, the huge amounts of data, especially when you get to the multimodal models, copying all this data I mean, physically is not possible, right? If you want to architect something the right way, you're going to want to push some of this down. Put some of the GPUs down in the storage, do some of the work down there with a lot less data traffic to support it.
Dave Vellante
>> So, you just didn't have the killer use case previously, is what you're saying.
Sam Werner
>> I don't think we ever had it until now.
John Furrier
>> Well, I would say also, first of all, I think you're right on that too. The other data point I'd share, from theCUBE research team, is that the ISV developer market is going not up the stack into frameworks with the models; the killer value extraction is closest to the hardware.
Sam Werner
>> That's right.
John Furrier
>> For classic development, not machine code developers, the normal chip developers, but the old school, classic software developer, they go low level. So as you push down, are you enabling that up? That's, I guess, where I see that going. What's your view on that paradigm of developers getting closer to the action to get great performance, squeeze every inch out of that intelligence?
Sam Werner
>> I mean, I think that's part of the idea of pushing the data pipeline work down into the actual array, where it's really close to the hardware. And yeah, I mean, the application developers are going to be down at that level, right?
John Furrier
>> Well, Sam, great to have you on theCUBE.
Dave Vellante
>> Always.
John Furrier
>> Again, storage continues. Dave, we've been doing theCUBE for 15 years. I think the first time we did a CUBE segment, we said, "Storage is sexy," and it's never not been sexy in terms of its relevance. And now more than ever. Sam, thanks for coming on theCUBE.
Sam Werner
>> Thank you so much.
Dave Vellante
>> Thanks, Sam. Appreciate it.
Sam Werner
>> It's great seeing you both.
John Furrier
>> Okay, you're watching theCUBE here, the leader in high-tech coverage. Check out theCUBE Pod every Friday with me and Dave. Thanks for watching.