Exploring Data Management and AI Resilience in Modern Enterprises
Mark Ward, chief executive officer of Congruity360, discusses the intersection of data management and artificial intelligence in this segment from the Data Protection and AI Summit, hosted by Christophe Bertrand, principal analyst at SiliconANGLE and theCUBE.
In the video, Ward analyzes Congruity360’s role in managing unstructured data, with a focus on cyber resiliency and AI applications. The conversation examines how AI can be both a significant asset and a potential threat to data protection. Key topics include eliminating redundant data, improving storage efficiency and reducing infrastructure costs.
Ward shares insights on how organizations can leverage AI to improve data trustworthiness and compliance while minimizing risk. With a focus on smart data management, the two discuss strategies early adopters use to mitigate AI-related threats, such as AI-driven cyberattacks. According to Bertrand, understanding and classifying data correctly is imperative not only for compliance but also for optimizing AI outcomes.
Clips from this video:
- Mark Ward of Congruity 360: Revolutionizing Unstructured Data Management and AI-Driven Cyber Resiliency at the Data Protection Summit
- Enhancing AI Performance: The Role of Smart Data in Eliminating Redundancies and Optimizing Infrastructure Costs
- AI can be both beneficial and harmful, depending on its application in data management.
- Organizations are moving irrelevant data to reduce cyber attack exposure and storage costs.
- Navigating the Future: Convergence of Data Management, Governance, and Cyber Resilience in AI and High-Performance Computing
In this interview from the Data Protection and AI Summit, Mark Ward, chief executive officer at Congruity360, joins theCUBE Research’s Christophe Bertrand to explore how smart data management is reshaping cyber resilience in the AI era. Ward outlines why eliminating redundant, obsolete and trivial data can cut storage footprints by up to seventy percent, slash AI infrastructure costs and feed models with cleaner inputs that drive better outcomes.
Ward explains how Congruity360’s content-level scanning and CDM Hub give customers granular control over which data is flagged and how it is remediated.
>> Hello, everyone. And welcome back to the Data Protection and AI Summit. We have a great guest today, Mark Ward, who is the CEO of Congruity 360. Mark, welcome.
Mark Ward
>> Thanks, Christophe.
Christophe Bertrand
>> So Mark, tell us about... remind us about Congruity 360, what you do, and then we'll get into this great conversation around, well, what needs to happen to protect data from AI or with AI?
Mark Ward
>> Christophe, thank you. Our company, Congruity, is a leader in unstructured data management, with specific focus on cyber resiliency and use cases for AI. So I think we'll have a great conversation.
Christophe Bertrand
>> And thank you so much for joining us. I know you're traveling, you're actually in Ireland as we speak, so that's probably why you have the orange going as well. All right, good. So let's bring up some research. I have a great chart that's a research that actually Congruity sponsored a few weeks ago, it was in the context of cyber resiliency. But what's interesting here is that we wanted to understand what IT pros and cyber pros really worry about and what will essentially keep you awake at night in the next 12, 18 months in the context of security. And what we found, of course, is cloud is a big topic, but what was number two, almost number one in the list here of options was AI-driven cyber attacks. And I bring this up because when we talk about AI, AI can be your friend or it can be your foe. The other thing I want to point out is data is really front and center, data encryption, exfiltration, you name it. But the most important part of this chart, I think is also in the lineup of the top five, where you see number five, regulatory compliance due to those incidents. So this is sort of the negative aspects potentially of AI here, but that's how it really connects with what Congruity does. So Mark, let me ask you maybe a first question here, which is AI, obviously, it's a foe, it's an enemy, but can it be your friend, too? And what do you think organizations are seeing today based on your customer base?
Mark Ward
>> Yeah, I think it has to be a friend moving forward. And I think the early adopters are getting ahead of the issues that could potentially make it an enemy. And let me share a couple of examples on how. We, at Congruity talk about smart data drives smart outcomes, AI obviously being the intelligence in between those two comments. In order to get smart data, you have to basically limit the amount of, I'll call it garbage that is potentially available for your AI outcomes. We do that by identifying through metadata what information is redundant. So copies upon copies or, as you and I both know in the storage world, snapshots across snapshots. We eliminate obsolete data. We eliminate data that is potentially the wrong data for the AI engine or the AI outcome being used. So we make the information actually quite a bit smaller. We then feed it to those AI workloads, and there's two values there. Immediately, obviously, you're giving the AI machine better information to learn and yield outcomes on, but the one that's often overlooked, and really where we're seeing the early adopters leverage our technology is in the reduction of infrastructure costs. By eliminating anywhere from 60 to 70% of the data, by eliminating rot, we're able to reduce the amount of AI compute and AI storage required on the backend. And as you know, with the cost being the cost, that's a big, big outcome. So reduce the amount of information that is garbage and reduce the amount of infrastructure costs and AI, it's a win-win.
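The metadata-driven screening Ward describes, flagging redundant copies, obsolete data and trivial files before anything is fed to AI workloads, could be sketched roughly as follows. This is a minimal illustration, not Congruity360's actual product logic: the age threshold, the extension list and the function name are all assumptions.

```python
import hashlib
import os
import time

# Hypothetical policy values -- a real deployment would take these from the business.
OBSOLETE_AFTER_DAYS = 10 * 365            # untouched for roughly a decade
TRIVIAL_EXTENSIONS = {".tmp", ".log", ".ds_store"}

def classify_rot(root):
    """Bucket files under `root` into redundant / obsolete / trivial (ROT)
    using only metadata and content hashes."""
    seen_hashes = {}
    report = {"redundant": [], "obsolete": [], "trivial": []}
    now = time.time()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            ext = os.path.splitext(name)[1].lower()
            if ext in TRIVIAL_EXTENSIONS:
                report["trivial"].append(path)
                continue
            age_days = (now - os.path.getmtime(path)) / 86400
            if age_days > OBSOLETE_AFTER_DAYS:
                report["obsolete"].append(path)
                continue
            # Same content hash as an earlier file => a redundant copy.
            digest = hashlib.sha256(open(path, "rb").read()).hexdigest()
            if digest in seen_hashes:
                report["redundant"].append(path)
            else:
                seen_hashes[digest] = path
    return report
```

Everything flagged here is a candidate for removal before the remaining "smart data" is handed to AI compute and storage, which is where the 60 to 70% infrastructure reduction Ward cites would come from.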
Christophe Bertrand
>> Absolutely. And that's really how you get to be able to use data. So I want to double-click, because I think one of the biggest topics or biggest issues for our viewers is the fact that when we say AI and data, it can mean many things and it can also mean is this a good thing, is this a bad thing? I'm worried about those AI-fueled cyber attacks? I'm worried about not being able to use the right data for my outcomes. And I'm also looking at the vendor community to leverage AI to make me more efficient. So these are the type of themes that I want to double-click on with you. So we covered here one aspect which is very interesting, which is efficiency. So the ability to use your solution to go get rid of the redundant and obsolete data, but more importantly, there is a need to build trust in data, right? That means that the data cannot be corrupted, hasn't been affected by a cyber attack, and it's compliant. So how do you play in that space? How do you help me as an end-user use data that's compliant that I can go reuse for intelligent processes?
Mark Ward
>> Yeah, it's a great question and really one that, I'll call it physics had been fighting, I'll call it, with the development of the technology for a long time. So really two things that work here. The first is the overall size and locations of enterprise data. So we all know traditional on-prem data center data sitting in the large storage environments that exist. We have the cloud that obviously has become a very important part. But we also have SaaS applications that are managing enterprise data in a very important fashion. And oftentimes, that SaaS data is a primary tool for AI learning. So you get those three different environments. So you need to have solutions that look at all of those different environments. You also need to have the ability to figure out how to eat this data elephant. And that means by actually reducing the size of the data environment. And we start again by eliminating duplicate data, eliminating obsolete data, eliminating trivial data. What that allows us to do, Christophe, is actually focus in on that content as you mentioned, that either under regulatory compliance governance or governance within the actual corporation, or focus from the AI scientist on only getting access to data that is appropriately available or made available to them from a security perspective and from a compliance perspective, we need to shrink that. And we do that through our content-level scanning that allows us to use AI to find PII data, PHI data, compliance data that should be eliminated, NYDFS, if you're a financial services, high trust, if you're in the healthcare industry. We use our AI learning machines to actually identify and then remediate that data. So again, what the outcome is we're providing to the AI team is what we call smart data. Smart data is less data, data that's only appropriate for the AI outcomes, and data that has been basically cleaned of the risks associated with governance and regulatory compliance.
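The data-dictionary-driven content scan Ward describes, finding PII and PHI terms so they can be obfuscated or remediated, might look conceptually like this. The dictionary entries and patterns below are illustrative assumptions; a real dictionary would be built with the customer against the applicable regulation (NYDFS, HITRUST and so on).

```python
import re

# Illustrative dictionary -- in practice assembled with the customer per regulation.
DATA_DICTIONARY = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # PII: US SSN shape
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # PII: email address
    "mrn": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b"),        # PHI: medical record number
}

def scan_document(text):
    """Return which dictionary terms fire and how often, so the file can be
    flagged for obfuscation, remediation, or exclusion from AI training sets."""
    hits = {}
    for label, pattern in DATA_DICTIONARY.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = len(matches)
    return hits
```

A file with any hits would be routed out of the "smart data" set until it is cleaned, which is the shrink-then-scan sequence Ward outlines.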
Christophe Bertrand
>> Right. And this is actually fundamental because in order to get an outcome from AI, well, of course, you need data, but you need data that's compliant. So you're answering that the first point. What you just said is very interesting. You actually leverage AI in your own solution. So maybe we could double-click on that because I think it's one of the things that is becoming or should become a requirement. If I think in terms of a high-level RFP, as I'm trying to build my information architecture, my data architecture, what should I be looking for in a vendor like you in terms of using AI? And am I still in control as the user of Congruity? Is the AI agentic or is it more like ML? Am I still controlling how much it's sort of governing access? Those are important questions. Can you maybe walk us through that?
Mark Ward
>> Yeah, I'd love to. And it's a great question as folks look to make this decision. So history is always a great kind of storyboard for us to figure out how we get to where we are. So a lot of the previous compliance and regulatory work done for large enterprises was done through the big four, it was done through offshoring, it was done through professional services. And the primary, I'll call it objective that companies have in the compliance and regulatory, I'll call it issue resolving, is identifying those key terms and data profiles within their data that could flag a potential regulatory compliance risk. In the old days, that was done through a lot of people reviewing a lot of data. Took a lot of time. It was prone to, I'll call it inaccurate reporting. And it was expensive. What we've done, and we've been doing this now for over five years, is we've taken the concept of a data dictionary, so a group of data terms that have been identified by the company who we're working with to identify those particular datatypes that are creating either a compliance or a regulatory issue. Our system gets smarter. So what our system does is go out and not only look at those terms, but terms like that, and it provides a very quick identification. But more importantly, it also provides the customer, and this is something unique that we have, we have what's called the CDM Hub. It's basically an end-user review terminal or portal that allows the users to self-identify and self-create that data dictionary. Those set of terms that will help the business either obfuscate, remediate, delete, move that data which is going to create a risk to them. So a couple of different implementations, as I said. Taking that professional services heavy workload and turning it into what I would refer to in your language as a more ML-driven approach. 
And then taking our software technology from our CDM portal and allowing end-users to actually work with us to make our system smarter based upon constantly improving the terms and data that we use to identify the risk.
Christophe Bertrand
>> Right. And this is really interesting because my next question is going to be about what happens next in the evolution of AI itself with this notion of agents and agentic AI in general, frameworks with agents, essentially automated processes making potentially decisions for you. So now, and I want to sort of go back to the cycle here, you have a solution that allows an end-user to really figure out, optimize what data they can use, so get rid of all of the extraneous stuff they don't need, get rid of old data that they don't need to be able to use it or reuse it for AI. Two, you then need to make sure it's compliant so it can be actually used for AI. So you're adding a lot of intelligence. And we're talking about in your case, data that's unstructured, which is the majority of data out there. So there's a lot to go through. And you're still in control at that point, so I think so far so good. Where I'm getting a little bit worried is now I'm going to give up control to an agent that is going to manage this intelligent data for AI purposes. Isn't there or should there be guardrails around that? Because I could see that as becoming an issue in the future potentially. Or more importantly, from your standpoint, what are you looking to do to make sure that when you actually publish those agents and leverage agents, your end-users will be safe from any liability issue, that everybody will behave the way they should? Because agents, they're kind of like humans in a sense, in this context they're going to be accessing data. Do they have the right to do that? They're going to be making decisions, maybe talking to other agents. And are you in the end removing humans from the loop? So I know it's a lot, but you're really at the heart of this right now.
Mark Ward
>> So I'll refer back to that product I mentioned earlier called the CDM Hub. The CDM Hub was actually developed from one of our large European customers who is managing their GDPR exposure. And what that does is that not only gives the end-user owner of the data, but it's actually a hierarchical interface so that the management organization for that end-user for that department has ultimate say on what data is being used for the discovery and what data will actually be remediated based upon the rules that they put in place. So this hierarchical approach to making sure that human intervention at the appropriate levels is applied to what the machine learning engine produces, it seemed to be a great outcome for large corporations that we're working with. Large and small. So I would point to that. The other kind of comment that I would point to, and this has been very frequently asked of me and others in the industry, is what actually is the accuracy of your outcomes in terms of identifying at the discovery point or remediating? And that's a very important, again, I'll call out for your audience that is looking to move to an IT data classification framework like ours. Our accuracy, based upon the data dictionaries that we leverage, that we build ourselves from a compliance management perspective or we create with our customers, is yielding a very, very high accuracy level. In fact, we manage this monthly, and it's in the high 90s actually, over 97%. So that yields 3% of the data that is potentially incorrectly classified for human intervention to take a peek at. Hey, the less that the humans have to do, the more we have the appropriate, I'll call it governors or hierarchical management systems in place to allow the right thing to occur, the better. So accuracy, very, very important. And the secondary item is having a system that allows the end-user and the end-user's management team to make the right final decisions for remediation.
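The workflow Ward sketches, auto-accepting high-confidence classifications and routing the remaining few percent to human reviewers in a portal like the CDM Hub, reduces to a simple split. The 0.97 floor below mirrors the "over 97%" accuracy figure he cites, but the function and its inputs are hypothetical, not Congruity360's API.

```python
def route_classifications(items, confidence_floor=0.97):
    """Split machine classifications into auto-accept vs. human-review queues.

    `items` is a list of (path, label, confidence) tuples from a classifier;
    anything below the floor goes to a reviewer instead of being remediated
    automatically.
    """
    auto, review = [], []
    for path, label, confidence in items:
        (auto if confidence >= confidence_floor else review).append((path, label))
    return auto, review
```

The hierarchical sign-off Ward describes would then sit on top of the `review` queue: department owners and their management approve or reject before any remediation runs.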
Christophe Bertrand
>> Right. So should in the future agents be making some of these decisions, they would be using the 97% of accurate data. So you essentially build a number of risk limitation checkpoints before the data can be leveraged for other purposes. So that's an important point, because the other aspect also that we wanted to cover in this summit is, well, look, you're also protecting the business. When you say data protection and AI, of course we talked about AI as a foe, as an enemy with cyber attacks. That's one aspect. But at the end of the day, it's about business risk. And being compliant, protecting your data from disappearing or being misused, those are part of the same conversation. So now I'd like to maybe turn to the future, Mark, and really ask you, looking at your crystal ball and based on what you've heard from the many customers you have, where do you think it's going and how quickly and what are you going to be working on, again, without divulging anything too confidential, but what do you see in the next, let's say two, three years? Because I think things are about to accelerate in the next two, three years.
Mark Ward
>> I agree. I agree. And again, I'll just take you quickly back to where this all started. A number of months ago, or I guess it was back in January with the Cyber Resiliency Summit that we had an opportunity to participate in. Cyber resiliency is changing every day as we know. And I would say the biggest thing that we've seen over the past six months has been how corporations are looking at their data and investigating it, categorizing and classifying it to determine what information is retained in a production environment, which means that it is being protected by data protection products like Commvault and Rubrik and Cohesity. And what data is being removed from that, I'll call it data protection process, and moved to a safe, secure, what we refer to as a net gap solution. So moving data that falls underneath the category of no longer relevant to the business, it could be age, it could be copies of copies, it could be different types of data, but moving it to a changed access control permission system where no longer does the user have access to that data, the user has to request access to that data. What it does and what our customers are doing are reducing their surface attack footprint by 50% or more by moving this rot data off to a new instantiation. Could be very low cost object storage, could be older network attached storage, power scale or net app or whatnot. But it's drastically, again, limiting the risk exposure while reducing infrastructure costs. So we're really seeing the cyber resiliency world looking towards AI and in particular data classification to reduce their problem and reduce their costs. So what we're seeing is we're seeing customers say, "Hey listen, why am I backing up 2 petabytes of data when the actual data that I'm backing up has 50% of rot in it?" Again, redundant copies, it's over 10 years old. It is JPEGs and MPEGs that shouldn't be being backed up. 
Move that in one archived approach to a safe and secure solution and reduce your backup costs by 50%.
Now, at the same time, you're cleaning up the data for AI. So this is where really your point around bringing cyber resiliency and AI together makes so much sense from what we're seeing our enterprise is doing. Eliminate cost, improve their smart data outcomes, and figure out how to reduce their attack surface at the same time. They go together. And it is something that we're seeing again, many, many customers kind of solve at the same time.
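The archive-planning step Ward describes, pulling aged data and media files out of the backup cycle and measuring the footprint reduction, could be sketched as below. The ten-year cutoff and extension list echo his examples, but the catalog shape and names are assumptions for illustration.

```python
import datetime

MEDIA_EXTENSIONS = {".jpeg", ".jpg", ".mpeg", ".mp4"}  # Ward's JPEG/MPEG example
STALE_AFTER_YEARS = 10

def plan_archive(catalog, today):
    """Given a metadata catalog of (path, size_bytes, last_modified) tuples,
    pick files to move to restricted-access archive storage and report the
    fraction of the backup footprint that removes."""
    cutoff = today.replace(year=today.year - STALE_AFTER_YEARS)
    to_archive, total_bytes, archived_bytes = [], 0, 0
    for path, size, mtime in catalog:
        total_bytes += size
        ext = ("." + path.rsplit(".", 1)[-1].lower()) if "." in path else ""
        if mtime < cutoff or ext in MEDIA_EXTENSIONS:
            to_archive.append(path)
            archived_bytes += size
    reduction = archived_bytes / total_bytes if total_bytes else 0.0
    return to_archive, reduction
```

Files on the archive list would land on low-cost object or NAS storage behind request-only access controls, shrinking both the backup bill and the attack surface, as Ward describes.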
Christophe Bertrand
>> Right. And I should add that it's actually very critical to be optimizing the size of your data environment, because the other thing that we haven't really talked about much, Mark, is the fact that AI itself will generate more data. And we've seen projections of zettabytes of data being generated between now and the next five, seven years around the world globally. So it's going to be a big issue to manage and store all of this data. And I think there's only one prediction we can make about the future, which is, well, there will be more data. What you're saying is, yeah, there'll be more, but I'll make it less if I can to help. So I think these are all very... It's a confirmation of the trends that I'm seeing from my standpoint, which is really this combination of data management, data classification meets governance, meets cyber, meets disaster recovery, data protection, backup, recovery. All of these big trends are converging with one common factor, data. And in the end, that's what it's all about.
Mark Ward
>> An interesting point, Christophe, just because for your audience, I think they'll find it somewhat interesting anyhow, I hope. And that is, as you know, when you have very expensive compute and very expensive storage, you're forced to be a better custodian. So I give everybody the example in the high-performance computing world and things like hierarchical storage management, technologies like object storage being created. When you have very expensive environments, you have to make the decisions to protect and place the information in its right environment at its right cost point. Data classification gives you the ability to do it. And I would say that what's happening or what happened in HPC some 10, 20 years ago is going to be what happens in AI going forward.
Christophe Bertrand
>> It is amazing. We did not talk about this, but we're going to be talking with some players in the tape space. And we'll be talking about tape, about this very topic because I do believe it's exactly what's going to happen. And having spent a little time in HPC myself, I can see this sort of history repeating itself as you mentioned earlier. Mark, thank you so much. It's been great having you on the summit here talking about these key issues. So thank you so much for joining us.
Mark Ward
>> Thank you very much. Have a great day.
Christophe Bertrand
>> And to our viewers, thank you so much for joining this segment. My name is Christophe Bertrand, principal analyst at theCUBE Research. And stay tuned for more.