In this insightful discussion, Sam Newnam, senior director of Artificial Intelligence Solutions at Hammerspace, explores the evolving landscape of data platforms and their critical role in the Future of Data Platforms Summit. Newnam shares expert insights into the unique challenges and innovative solutions that Hammerspace provides in addressing modern AI data problems.
This video delves into Newnam's expertise and the key discussion topics, including the timely issues addressed by Hammerspace in managing data silos. TheCUBE Research analysts and video hosts guide the conversation, exploring Hammerspace's strategic approach to handling vast datasets with file granularity, making data usable across diverse protocols and geographies.
Newnam highlights the major takeaways from the conversation, emphasizing the role of Hammerspace in simplifying data management through global namespace technology and enhanced data automation. According to Newnam, Hammerspace's platform accelerates data processes and reduces costs, offering organizations a manageable, scalable solution to their data challenges. This approach underscores the importance of a solid data foundation adaptable to the rapid developments in AI technology.
Sam Newnam, Hammerspace
In this Future of Data Platforms Summit interview, Sam Newnam, senior director of AI Solutions at Hammerspace, joins theCUBE's Rob Strechay to unpack how enterprises can overcome the chaos of unstructured data in the AI era. Newnam explains how Hammerspace's approach, built on global namespaces, automation and pipeline acceleration, helps organizations organize, move and activate data at file-level granularity across hybrid, multi-cloud environments.
The discussion explores the urgent challenges facing data and AI teams as they attempt to power agentic applications.
>> Hello and welcome back to the Future of Data Platforms Summit. In this episode, I'm joined by Sam Newnam, who's the senior director of AI Solutions at Hammerspace. Welcome, Sam.
Sam Newnam
>> Good morning. Thanks for having me.
Rob Strechay
>> We look at data platforms in many different ways. And as we jump into this, I think it's really good to help the audience understand why Hammerspace is a really good fit for this Data Platforms Summit. Can you help people understand what are the core AI data problems that Hammerspace looks to address and is really solving today, and why are they so urgent?
Sam Newnam
>> Yeah, that's a great question. Hammerspace really attacks the data silo problem, right? For years we've had traditional NAS systems and unstructured data, and we've stuffed it everywhere: on-prem, in the cloud, with some vendor, in some SaaS application. And so, beyond cleaning up those strategies, everybody's looking for a way to use that data, to find the value of that data. Hammerspace is uniquely fit to step in and add visibility, as well as manageability, to that chaotic data state.
Rob Strechay
>> Yeah, it is definitely chaotic, to put it mildly. And I think one of the things is that we get a lot of data teams and data engineering teams and AI teams that will be watching this, and I think they can really benefit from the approach that Hammerspace is taking. What is that approach that Hammerspace is taking to a data platform?
Sam Newnam
>> Yeah. So, it starts with file granularity. Gone are the days of massive volumes and proprietary replication. We're looking at millions and billions of files being fed into AI every day. And so, people need to be able to get a grasp on that, to move a particular dataset, to understand its tier, and most of this data has to transcend protocol, provider and geography at this point. And so, you really need a platform, not storage, not some silo, but something that understands both the file systems and the complexity of the usability of that data.
Rob Strechay
>> I think that is so... Again, the usability being a key challenge. What challenges do you see that data and AI teams have with distributed data in particular? And why does it pose such a specific challenge to AI and agentic application development as we move forward?
Sam Newnam
>> That's a great question. These applications need data from everywhere. It's different when a human can look at a file directory and understand the context, but when you have two agents talking to each other, trying to negotiate a decision on real-time inventory purchases and that sort of stuff, data has to be organized. They don't have that same cognitive ability to make decisions on data. And so, now we have to have truly organized, accessible, and usable data tied to these particular agents and the applications that are using them. And we see a lot of these teams struggling because they simply can't organize the data at the right place, at the right tier, or at the right time.
Rob Strechay
>> Yeah, I think that gets to something that we talked about with Frederic Van Haren from HighFens, which is the usability of the data. What are the core technologies that Hammerspace is bringing to the table to solve these problems?
Sam Newnam
>> Yeah, Frederic's actually a good friend. We've talked a lot. I think it really comes down to three main areas. First is simplification, right? You look at a global namespace and what Hammerspace brings to the table, that's visibility to your data. The second piece is automation. How do we move that data repeatably? It's easy to build an application on day one, but how do you make sure that the finance data, the marketing data, everything delivers to that RAG pipeline on time every single evening? Third is acceleration. It's not just about the speed under the GPUs. Obviously we want to make the best use possible of those resources, but we look at it as pipeline acceleration. How do I get all of that data lined up where it needs to be at the right time? We think we can bring customers 50% faster to understanding their data and using that data in their applications, not just improving the latency of that data underneath the GPU.
Rob Strechay
>> Yeah, and let's dig a little bit deeper into that, because when I was talking with Frederic, in the episode before this, we talked about the fact that you're diving in and getting usability out of things like the NVMe in the servers. You guys do some really interesting stuff there.
Sam Newnam
>> So, part of it goes back to what I said about the right tier at the right time. Cost is running all over the place with these agents. When you look at token production and that sort of stuff, there's this R&D budget, and then when does it start making a customer money? And so, we were able to take the NVMe in the front of the GPU servers, which is often used as scratch disk in a messy model-spray mentality, depending on whether they're doing training or inference, and make a unified storage volume out of it, adding that into our global namespace. And so, now we can help customers that have traditional NAS storage, where we can assimilate that data, take all the metadata from things already in their environment, promote that to tier zero directly above the GPUs, have it be used, whether it's in a RAG pipeline, embedding, training or fine-tuning, and then restore that data back to where it was. So, we're not asking customers to take a massive forklift migration onto yet another data silo to achieve these AI outcomes.
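The workflow Newnam describes, assimilate metadata in place, promote a working set to the NVMe tier for a job, then restore it, can be sketched in a few lines. This is a purely illustrative model, not the Hammerspace API; every class and method name here is hypothetical.

```python
# Hypothetical sketch of the tier-0 workflow described above: metadata from
# existing NAS is assimilated into a global namespace, a dataset is promoted
# to the NVMe tier in front of the GPUs for a job, then restored to its
# original tier afterward. All names are illustrative, not a real API.

class GlobalNamespace:
    def __init__(self):
        self.files = {}          # path -> current tier/location

    def assimilate(self, nas_inventory):
        """Ingest metadata in place; no data is copied during assimilation."""
        self.files.update(nas_inventory)

    def promote(self, paths, tier="tier0-nvme"):
        """Stage the working set onto the NVMe tier ahead of a GPU job."""
        previous = {p: self.files[p] for p in paths}
        for p in paths:
            self.files[p] = tier
        return previous          # remembered so data can be restored later

    def restore(self, previous):
        """Return files to the tier they came from once the job completes."""
        self.files.update(previous)


ns = GlobalNamespace()
ns.assimilate({"/finance/q3.parquet": "nas-a", "/marketing/ads.csv": "nas-b"})

saved = ns.promote(["/finance/q3.parquet"])        # stage for RAG/training
assert ns.files["/finance/q3.parquet"] == "tier0-nvme"

ns.restore(saved)                                  # job done: tier back down
assert ns.files["/finance/q3.parquet"] == "nas-a"
```

The key point of the sketch is that only the location mapping changes during promotion and restore; nothing is forklifted into a new silo.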
Rob Strechay
>> They're not having to duplicate the data all over the place and things like that, which, absolutely... Because one of the things we hear a lot is cost, and we'll get to that in a minute. But I think one of the other things we see with these organizations is that they really place an importance on open standards, and they're really looking for interoperability where, to your point, they don't want a forklift upgrade. How does Hammerspace's approach take this into account for those organizations?
Sam Newnam
>> So, you're hitting on a real uniqueness of our platform, and we've got a very big open-source movement behind us, right? Our CTO is the kernel maintainer for NFS. Look at the number of commits we've made. I think it's really easy for someone to go build an app on top of something. It's another effort to change something. And so, we looked at it and said, "Linux is the underlying operating system for everything that's happening in AI. Why would we not do all of our homework there?" So, when you get to the kernel modifications we've made and the ease of use, there are no custom clients. Operational ability, SOPs, all this stuff: take any Linux client and it works with our system; upgrade that client to a modern kernel, you're still working. I remember years back in my storage career, I used to look at these massive interoperability matrices and had to lay out an entire upgrade plan from NICs to OS to server. Those days are done, right? We're moving so much faster, and we don't think clients want to deal with that world anymore. They want Linux. They want it simple. They want the protocols and understandings that they've dealt with most of their career, and that's what we feel we're offering them.
Rob Strechay
>> Yeah, it goes back to that cloud operating model, and I think, again, it plays out what we were talking about briefly before: the tangible benefits of this approach that Hammerspace is using to underpin a data platform. From a TCO perspective, what are organizations seeing, and what are some of the biggest cost savings they're seeing as well?
Sam Newnam
>> Yeah, I was going to say, I think everybody says, "Hey, I've got to do this. How much is it going to cost me and how much will I make off this particular project?" And so, setting aside what ROI calculations look like right now, I think every single customer is asking, "How do I build an AI factory with what I have today?" And that's what we're offering enterprises: the ability to start their journey with what they have. And so, whether that's tier zero, where I'm utilizing existing power and space from the GPU servers, it cuts back on port density, on rack density, on cooling. And you look at the shift in gravity, right? We've always talked about data gravity, but it's really GPU gravity, and even power gravity at this point. And so, when you look at cost savings, obviously it has to do with everything that's in the rack, but we also think it has a lot to do with data preparation, right? Migrations are expensive. Forklifting data to a new platform comes at a cost sometimes exceeding the value of the hardware you're moving it to. So, we allow customers to say, "Hey, we're going to use what you have. We're going to start the journey there. Then, we'll scale you into your applications journey." We think that's a much better savings mantra than trying to say, "Hey, how do I lay out millions of dollars and hope that this project's going to work?"
Rob Strechay
>> Yeah. I was just thinking back to your whole discussion about all of the different things that went into running storage and DR for multiple companies in the past. That's why I love the cloud operating model so much as well. God, if I had to look up an HBA code rev level and stuff like that... Most organizations are looking for that easy button, that cloud operating model, whether it's on-premises or in a hyperscale cloud. You guys have a very unique way: everything is hybrid these days, and you fit that model. Explain that, and how some of the clouds and organizations that you're working with are taking advantage of it.
Sam Newnam
>> No, you're right. I think 80% of the customers I talk to have a hybrid strategy. They're often experimenting in the cloud. Maybe they're not going to stay there long-term, for security reasons or whatever, and they're planning on coming back on-prem, but it is the quickest and easiest path for them to start. And often, a portion of their data is already in the cloud. And so, I used to laugh. I would ask during a large presentation, "How many of you have ever built a hybrid cloud?" And nobody would raise their hand, because it's hard: understanding the file systems on-prem, the file systems in the cloud, dealing with egress fees. There's so much that goes into that. Our uniqueness is in really stitching a global namespace together. A user should be able to sit on a console in the cloud or a console on-prem, do whack-whack, see the same files and their whole environment, and not worry about which bucket it's in or the economics of that bucket. So, we usually take an easy approach to say, "Hey, let's simplify all this." I've talked about simplification, but I really feel like that's what every enterprise is asking for, not how do I deal with more buckets and more tools, but how do I go back to, "All right. I want to understand my file system simply. I want to be able to have access to that data," and it shouldn't matter whether that's on-prem or in the cloud. We'll handle that for you. If it's on-prem and it needs to be in the cloud, we can move that automatically. We'll clean that up with a policy after it's done being used. There's so much intelligence to our automation and orchestration that makes that entire hybrid cloud, not just the VPN but the actual data movements, disappear for the user.
Rob Strechay
>> Yeah, I mean I think that's a great thing because, again, actually stitching it together can be some of the hardest parts of having a hybrid cloud. And really, looking at it, like you said, at the namespace, how you actually do the data movement, how you do the data protection and all those data services: we look at it as the storage layer, the data services layer, and then the data management layer, as well as the GRC layer. I think that's really one of the approaches, and I think you fit very well, and this is why I thought it was really interesting to bring you forward as one of the sponsors: the unique way that you're taking this approach up the stack. There are others out there, though, doing different things in this landscape. What are some of the gotchas or pitfalls that Hammerspace is uniquely positioned to help organizations avoid, where different approaches could hamper a data platform strategy?
Sam Newnam
>> I think enterprises are at a fork in the road. They're looking to upgrade infrastructure and they're trying to make a decision about what's going to get me AI ready, what's going to future-proof me. Hammerspace's uniqueness is that while we provide storage and all these things, we sit on top of so much stuff. Think about a clip-on-the-side mentality. I can take all your favorite NAS, bring that into this global platform. I can provide you extremely high-performance parallel file from our system, and I can push that data to and from the cloud seamlessly. And I don't think there are a lot of other platforms out there that deliver that whole story. I think there are really good platforms that solve point problems, but look at future-proofing your organization. We talk about the zettabytes of information that's coming in '26, '27, '28. You talk about physical AI, where you're going to have 4K video streaming and sensor data all over the place. That's not a storage problem, that's a data problem. And how do you move the right datasets to the right place, to the right agent? The feedback loops, the learning that's going to happen through that. I think Hammerspace takes it up a level. We really think about data virtualization, about solving that in the same way that we've got to extract the metadata from the files. When you think about the ownership of where data lives, it's got to be file granularity, it's got to be application-specific, and I don't think traditional storage technologies were built for that type of workload.
Rob Strechay
>> Yeah, especially when you get into things like RAG and training and fine-tuning and some of those other things that are going on with AI. It really gets complicated, but let's bring it back to the customers. What are some of the outcomes that your customers are seeing that you can talk about?
Sam Newnam
>> Yeah. No, that's a great question. I think we really look at outcomes in a couple of ways. First, we think taking on the pipeline, not just the data, brings customers probably 50% or better acceleration to a project launching, to the MVP, to production, to those users getting access to the data that that particular agentic or generative workflow needs. Two is that we really look at the TCO portion of it. Being able to utilize the existing hardware you have, whether that's the NVMe in your GPU servers or the NAS platform you've invested in for the previous couple of years, being able to let them start their journey without this massive outlay of cash for net-new technologies is a massive advantage and outcome for those organizations. And I would say third, it really comes down to the amount of time saved. Every organization is looking for AI talent, and not everyone can afford that new agentic architect. And so, being able to use the staff and technology you have today and very quickly AI-enable your organization within a new data platform, without changing everything underneath, really means that those GPUs are used faster. You're getting outcomes quicker from the application standpoint, and so is your organization, which I think is what's hard to put a number to.
Rob Strechay
>> Yeah, I think so. And I think, like you were talking about, when we talk to organizations, a lot of times they're sitting there going, "We need to simplify this. We can't find the people. Our platform engineering team has become very thin and they wear multiple different hats. They're not specialists in any one part anymore, and they need to be broader across all of the different technologies that we're using." And that brings them to, "Hey, I need to bring in vendors that are partners that are going to help me simplify this as we go forward."
Because, to the earlier discussion, with a lot of the AI pipelines and being able to bring the data to the AI, in a lot of cases they don't want to copy hundreds of terabytes of data from project to project, and then, "Oh, now I have to go and figure out where it all is," and all of these different things, and the different metadata becomes really complicated. One of the other things, though, as you start to look at the entire stack, is governance. A big part of the data platform stack is governance and making sure that the right people can see the right data at the right time. How does Hammerspace play into this for organizations?
Sam Newnam
>> I think it's a great question. I was talking with a customer the other day and he's like, "I have this 26-year-old data scientist that shows up and wants God-level access to everything to go work on this new AI strategy." And he said, "The trouble was, I had to give him a list of 20 different logins." The tentacles into all these data areas were a problem. And so, governance becomes incredibly important. Auditability, being able to understand what data was used by whom, when. And we really think the first step is a single point of access. Every user deserves global collaboration, whether in Tokyo, San Francisco, you name it, to that same set of data, with every toolset that you connect, because it's not just about us. You talked about the stack. We have lots of ecosystem partners. Part of our open strategy isn't just about Hammerspace. It's, how do I connect into all the data labelers? What am I doing for integrations into schedulers? We want people to use data from the interfaces they're commonly using, not having to always jump into our interface to do something. But it really comes down to thinking through: if I have a single point of access for every app, I have control. I'm not worried about observability across multiple platforms and all these different endpoints. Part of that simplification you talked about is that global namespace, starting with a single point where I'm dealing with access controls and security.
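The single-point-of-access argument can be made concrete with a small sketch: when every tool reaches data through one gateway, access control and the audit trail live in one place instead of behind 20 separate logins. This is a hypothetical illustration of the pattern, not any real product's interface.

```python
# Hedged illustration of "single point of access" governance: one gateway
# enforces ACLs and records an audit trail for every read, allowed or not.
# Class, method and path names are entirely hypothetical.

class DataGateway:
    def __init__(self, acl):
        self.acl = acl           # user -> set of allowed path prefixes
        self.audit_log = []      # (user, path, allowed) for every attempt

    def read(self, user, path):
        allowed = any(path.startswith(p) for p in self.acl.get(user, ()))
        self.audit_log.append((user, path, allowed))   # audit before deny
        if not allowed:
            raise PermissionError(f"{user} may not read {path}")
        return f"<contents of {path}>"

gw = DataGateway({"data_scientist": {"/datasets/public/"}})
gw.read("data_scientist", "/datasets/public/train.csv")    # allowed, audited
try:
    gw.read("data_scientist", "/finance/payroll.csv")      # denied, audited
except PermissionError:
    pass

assert len(gw.audit_log) == 2         # both attempts left an audit record
assert gw.audit_log[1][2] is False    # the denial was captured too
```

The design choice worth noting is that the denied attempt is audited as well; "what data was used by whom, when" only works if failed access is recorded alongside successful access.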
Rob Strechay
>> Yeah, I couldn't agree more. I think that when you start to look at it, especially in the unstructured space... In the structured space, they're trying to address this with things like Iceberg, Delta and other open table formats. In the data layer for unstructured data, I think file-level auditability and things of that nature make a lot of sense, because the standards have been there and it's not new, and we know how to put the different governance layers on top of that, to that exact point. Bringing it all around here, final thought. As you look into Hammerspace's future, where do you see the roadmap taking you in respect to enhancing customers' data platform strategies?
Sam Newnam
>> So, while it may sound a little conceited, we think people have to build on a platform. Anything you build well has to have a foundation, a base that's solid. And I think anyone who sits across this table and tells you what AI is going to look like in 12 months is making it up. We see things moving so fast, with the amount of podcasts we watch every day and the reading we have to do to keep up with the net-new technology that's coming out. So, for me, it goes back to three core tenets I think about as the future. One, AI has to be built on a foundation that's scalable, flexible, and open. We've got to be able to adapt to what's coming. Two, as you think about what's coming next, some of this stuff will leapfrog. We see enterprises that are skipping generative, the chatbots and textual output, and going straight to the agentic side of, "Hey, I want things to start making decisions for me. I want to augment humans and reduce mistakes with better data and information." And I think physical AI is going to be here before we really think about it. We read China had its first humanoid soccer game and all sorts of stuff that's out there, and we laugh and think this is fun, but then we really think about supply chain and manufacturing, these robots that are aligning a tire to a car, measuring the torque specifications of every single nut that's getting put on there. I think there's so much data that's going to be created that we're not prepared for. I think we all look at the statistics that come out about what data means, but we're using terms that I don't think you or I ever thought we would use in our careers. When we talk about yottabytes and zettabytes and all these different things of data, that is a massive problem that it's going to take a true platform to help manage.
Rob Strechay
>> I couldn't agree more. I love it and I'm looking forward to it and looking forward to more conversations around this. Thanks, Sam, for coming onboard.
Sam Newnam
>> Absolutely. I really appreciate you having me today.
Rob Strechay
>> And thank you. Stay tuned. We've got a lot more coming from the Future of Data Platforms Summit. Stay right there.