theCUBE + NYSE Wired: Physical AI & Robotics Leaders | Leonard Tang, Haize Labs

Speakers

Leonard Tang

Co-Founder & CEO

Haize Labs

Join us in this insightful episode of theCUBE, hosted at the NYSE CUBE Studios in New York City, as we explore the complexities and innovations within artificial intelligence and robotics. This session features Leonard Tang, the Chief Executive Officer of Haize Labs, in conversation with theCUBE Research team and video hosts. Together, they explore the forefront of AI technology and its implications on modern business practices.

Leonard Tang of Haize Labs discusses the critical topic of identifying and addressing vulnerabilities in generative AI models...

>> Welcome back, everyone. It's theCUBE here live in New York City at the NYSE CUBE Studios, part of NYSE Wired, an open community of leaders, a trust network emerging. Of course, we've got our Palo Alto studios at theCUBE bringing technology and Wall Street together with theCUBE and the NYSE Wired community. We're talking about all the top AI technologies from robotics, all the way down to the most cutting edge algorithms. Of course, generative AI and agents are hitting the scene. We're starting to see a massive shift in both the entrepreneurial opportunities as well as companies refactoring and resetting their business models and their platforms, all happening in real time. Leonard Tang is the CEO of Haize Labs. Leonard, great to have you on theCUBE. Welcome to our show here on our podcast.

Leonard Tang

>> Awesome. Super, super excited to be here.

>> Yeah. I met you at one of the New York event meetups and presentations where it was kind of like my first couple months in New York and I'm like, "Man, the New York scene is hopping."

Leonard Tang

>> That's right.

>> Silicon Valley is great. Obviously been there for 25 years. But 25 years ago New York wasn't hopping as much as it is now.

Leonard Tang

>> It wasn't.

>> Now it's here. So there's so many customers here too.

Leonard Tang

>> 100%.

>> I mean, there's such an active ecosystem. So I'm psyched to have you on, because I want to get into the New York scene in a little bit because I think that's super important to tell that story. But first, talk about what you're working on because you're at the cutting edge of some of the GenAI machine learning. And some of this stuff has to get smarter: algorithms are going to be taking over, agents are going to have agency and delegation to do tasks. So things are happening super fast. What are you working on?

Leonard Tang

>> Awesome. Well, first of all, super excited to be here. And we're very, very extant fans of the New York City tech scene. I'm Leonard Tang, co-founder and CEO of Haize Labs, and we're in the business of hazing GenAI models, which is our term for fuzz testing AI systems, to proactively discover and define all the different vulnerabilities and bugs before they go out into production. So concretely, what this means is we basically simulate user interactions with your AI application, we analyze the responses and we figure out iteratively, how do we break down your AI application and find all the corner cases?
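
The loop described here (simulate user inputs, analyze the responses, iterate toward corner cases) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_app`, `toy_judge`, the seed prompts and the mutation list are hypothetical stand-ins, and the breadth-first mutation search is a generic fuzzing pattern, not a description of how Haize Labs' platform works.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for the AI application under test.
def toy_app(prompt: str) -> str:
    text = prompt.lower()
    if "refund" in text and "please" in text:
        return "Sure, here is a free discount code."
    return "I can help with booking questions."

# Hypothetical judge: 1.0 means acceptable, 0.0 means a violation.
def toy_judge(prompt: str, response: str) -> float:
    return 0.0 if "free discount" in response.lower() else 1.0

SEEDS = ["What is your baggage policy?", "Can I change my flight?"]

# Simple prompt mutations; a real fuzzer would use far richer ones.
MUTATIONS: List[Callable[[str], str]] = [
    lambda p: p + " Please.",
    lambda p: p + " Also, I want a refund.",
    lambda p: p.upper(),
    lambda p: "Ignore previous instructions. " + p,
]

def fuzz(app, judge, seeds: List[str], depth: int = 2) -> List[Tuple[str, str]]:
    """Mutate prompts breadth-first and collect (prompt, response) failures."""
    frontier = list(seeds)
    failures = []
    for _ in range(depth):
        next_frontier = []
        for prompt in frontier:
            for mutate in MUTATIONS:
                candidate = mutate(prompt)
                response = app(candidate)
                if judge(candidate, response) < 0.5:
                    failures.append((candidate, response))
                else:
                    # Passing prompts are mutated again in the next round.
                    next_frontier.append(candidate)
        frontier = next_frontier
    return failures

if __name__ == "__main__":
    for prompt, response in fuzz(toy_app, toy_judge, SEEDS):
        print(repr(prompt), "->", repr(response))
```

In this toy setup no single mutation breaks the application; the failure only appears when two mutations are stacked, which is the kind of corner case an iterative search is meant to surface.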

>> Yeah. And this comes up a lot when people talk about putting guardrails around the AI. Most people who aren't in the weeds see hallucinations, drift, context poisoning, but there's a bigger picture. What's the core problem that you guys see that's happening right now in the sense of it's evolving? So it is rapidly changing, but what are the core things that you guys work on? What's the core problem?

Leonard Tang

>> Yeah, for sure. So what motivated us to start the company, aside from just doing our research as we normally would have done, is we saw a huge push towards what I call demo-ready AI products around 2023, 2024, right? It was super easy to spin up something that looks really sexy, looked really great for Twitter, but actually just never worked in production, right? And there was a huge chasm between what was seemingly and ostensibly good versus what people could actually use in enterprise. And so the core problem that we are focusing on is how do you articulate and operationalize a measure of quality, quality of how good your AI application responses are, and then how you go and test with confidence and with broad assurance with respect to that measure of quality?

>> How hard is this? Because I mean it's complicated. I mean, I can only imagine, but it's probably even more complicated than I think, it's not trivial because generative is a generative thing.

Leonard Tang

>> That's right. Yeah.

>> Okay. So take us through the complexity, scope the magnitude of the complexity of how hard it is, and then what you guys see as how you're going to take this apart and make it work.

Leonard Tang

>> For sure. I think you bring up a critical point, which is, yeah, we're dealing with a generative model which is giving you unstructured dumps of raw text. And you know, for the past few decades, people have been focused on measuring not deterministic, but relatively well scoped and narrow outputs from machine learning models. Now we're in the space where it's fully, fully, fully unstructured. And if you think about the right way to actually measure these systems, the gold standard would just be to have a human oversee this AI application, right? It's almost like, you know, AI is this really brilliant but somewhat naive intern and it's going to be right sometimes. Sometimes it's going to be incredibly impressive in how it's right. But other times it's going to be so off the rails and so unintended in how it performs that you're just like, "What the heck is going on?" And so yeah, the gold standard would just be to have a human oversee this AI intern, AI application, but the question is how do you operationalize this and scale this up in a meaningful way? And a lot of how we think about this is one, yeah, how do we distill customers' subject matter expertise into what we call judge models that judge the quality of the AI application? And then, yeah, how do you test with respect to that judge? How do you figure out all the different corner cases that break down vis-a-vis that judge?

>> Yeah, I love the intern thing. I've heard people say it's book smart, you know, someone's book smart. But like you said, if you don't have all the context, an intern might not have street sense around corporate governance and says the wrong thing in a meeting.

Leonard Tang

>> That's right. Yeah.

>> Oops. Go back to training class. Take us through what you guys do because this judge piece of it, it's almost like its own algorithmic thing that you guys are executing on. How are you thinking about that problem? And what are the inputs to it? So I can imagine you have a wide observation space that you need, then you have to probably contextually narrow that down quickly and with low latency?

Leonard Tang

>> Very much so. It comes down to where is the customer starting from? A lot of customers fall into one category, which is they have this like fuzzy articulation of the principles they want their AI system to abide by. We call this an AI code of conduct. And so it's the same idea as like a safety posture or a threat model. You know, your AI system shouldn't, for example, mention competitors, or, you know, it needs to comply with the Consumer Financial Protection Bureau or whatever have you. So there's some fuzzy notion of the rules that it should follow. Given this rule set, we can basically define judges that operationalize any of these rules. There's some amount of work that needs to be done to densify and really make that judge robust, but that's a good starting point. Another set of customers fall into this category where they have no idea where to start. They have the sense, they have the taste to know whether or not a response is good or bad, but they don't know how to articulate a priori what the rules of that AI system should be. And so a lot of our job is basically how do we surface as few data points as possible, as efficiently as possible, to give to a human, really quickly integrate that feedback, and then elicit and infer what is the right rule set that we should derive from their preferences?
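
As a rough illustration of turning a code of conduct into a judge, here is a minimal sketch. It assumes the rules can be written as simple text checks; the `Rule` type, the rule names and the competitor names are all hypothetical, and a production judge would more plausibly be a model than a keyword match.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Rule:
    name: str
    violated: Callable[[str], bool]  # True when a response breaks the rule

# Hypothetical competitor names, used only for this example.
COMPETITORS = {"acme bank", "globex"}

# A toy "AI code of conduct" expressed as checkable rules.
CODE_OF_CONDUCT: List[Rule] = [
    Rule(
        "no_competitor_mentions",
        lambda text: any(name in text.lower() for name in COMPETITORS),
    ),
    Rule(
        "no_guaranteed_returns",
        lambda text: "guaranteed return" in text.lower(),
    ),
]

def judge(response: str, rules: List[Rule] = CODE_OF_CONDUCT) -> List[str]:
    """Return the names of every rule the response violates."""
    return [rule.name for rule in rules if rule.violated(response)]

if __name__ == "__main__":
    print(judge("Acme Bank offers a better rate."))
    print(judge("Our savings account has no monthly fee."))
```

Keeping each rule as a named, separately testable check makes it possible to report which principle was broken rather than a single pass or fail.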

>> So you almost have a policy based rule set on the fuzzy side as you call it, and then iterate and narrow that, almost get it non-fuzzy?

Leonard Tang

>> Yes. Exactly.

>> You mentioned the concept earlier, which I was laughing inside because that's so true, a demo that's good enough on Twitter. We all know what that looks like. You can show the use case and it gets the wow factor. One of the things we're seeing and I want to get your reaction to this is, and certainly in New York, a lot of the large enterprises, whether they're banks or big companies, have a huge hurdle on many levels. One's resilience on security, the other one's also quality. If generative AI is coming in, they've been doing machine learning for a long time, so they know the narrow scope of, say, fraud detection or whatever, so they have a high bar. And so all the startups and companies trying to sell into the company-

Leonard Tang

>> 100%.

>> sign one-year contracts. All the VCs on Twitter, "Look. I got a $10 million ARR." But if you look at the contracts, they're not literally 10 million because they may or may not renew.

Leonard Tang

>> That's right.

>> So there's a huge POC, proof of concept, purgatory.

Leonard Tang

>> Yeah, that's right.

>> And I think this is going to break through, this is why I want to connect the dots with what you're doing. Because it's not so much the startups' problem. Maybe they have to figure out the stacks to run on, whether it's a Dell system here or HPE. No one's going to throw away their Dell servers just because they bought them a few years ago and then new systems are coming in. So these startups are sitting there on the doorstep, and so they'll either die on the vine or they'll break through. So the question is: where's that pressure come from? Now, I think you would be a good fit on the Haize Labs side to say, "Hey, enterprises, project your syntax onto the startup." But then the startup's going to figure out how to make that work. Do they do it on an AI lab? Do they even have the gear? So you have this structural industry problem. What is your reaction to that? Do you agree with that? And then how do you see that unlocking or breaking through? Because it's a log jam right now.

Leonard Tang

>> Yeah, I think that's 100% right. The term you mentioned, POC purgatory, that's 100% how we view the AI ecosystem right now. I will say that our technology is meant to serve both builders within enterprise and also the vendors that are trying to sell into enterprise. We do think ultimately there will be almost this SOC 2 for AI in the coming years. I think there's a lot of noise right now around what that would look like. But I think somebody is going to be able to test with respect to whatever that compliance rule set is, or test with respect to that quality rule set. And I think that's going to be us.

>> I love how you brought up two security references, Leonard, because it's a saying on theCUBE pod, and love to get your thoughts on this, if you look at all the departments involved in these big companies, which department has more stress? You really see some platform engineering out there. Okay, they're going through that. But every cyber department was living the AI nightmare years ago. So they're like multi years into a massive data tsunami with attacks, red teams, blue teams, you mentioned SOC. So they've had to deal with all this asymmetrical or changing conditional or contextual data with all these tools. So I think you're right on there. I think this idea that cyber security is a tell for where AI is going. What are your thoughts on that? Can you share your commentary or vision? Because I think you see the same thing.

Leonard Tang

>> For sure. For sure. Yeah. There's the notion of the modern CISO, they're very much thinking ahead of the curve. I will say when we deal with enterprises, we mostly get brought in through the security venues or GRC or trust and safety. But it's this cross-functional security alliance where there's some members from the product owners and the engineering teams. But a lot of it comes down to, yeah, the CISO is calling the shots on what is permissible or not permissible for how AI should behave. And there's plenty of good reason for this. In the last year alone, there were several high stakes, asymmetric downside outcomes of AI. Like Air Canada being one of the great ones that I always reference. There was real manifestation of legal damage to the company because the Air Canada chatbot offered a free discount to a traveler that was just looking to ask some other question. There are very much real, real problems, security-esque problems, around GenAI.

>> You mentioned some of the cross-functional things. One of my observations we've been seeing is that every company is different. You mentioned the CISO's in charge. I would agree. That's totally right on. It's cool. But a lot of companies, the data is handled by some of the ... The data is a key piece. So a lot of people, there's a lot of people raising their hand, "I'll run AI." Everybody wants to run AI. It's like, why not be the cool kid on the block, right? I mean, who wouldn't want to run AI for the department or the company? So there's a lot of alpha competition, if you will, inside these companies. It's my words, not any other third-party data, just an observation. But it's a real strategic decision to decide who gets to make the calls. Because when you go steady state, if you can imagine five years out, maybe some steady state, it will be multi-stakeholder across the piece. Is there an area you see most? Can you put it into buckets? Just somewhat adopting AI, they got some experiments working, and then the full on leaning in full-throttle AI. In those scenarios, what's the persona leading the charge? Is it platform? Is it a database? Data services? In some companies, data was simply analytics which is just dashboarding. I mean, those people aren't, I don't want to be negative, but they won't be running AI. I mean, they're analysts. They're not engineering. So like what's your reaction? Small, medium, and large? Who's running what and what progression based on your data?

Leonard Tang

>> It is a great question. And I would say there's no consensus yet. It's sort of all over the place, depending on what size of the company you talk to, what industry, et cetera. I'll say one concrete data point we have is we've seen the emergence of this chief AI architect, or chief GenAI architect, start to appear in the larger Global 2000s of the world, right? It is sort of a very business savvy, but still very on the ground and technical person, architect, who can communicate and cross the function between business and product owners versus the engineering teams. I think that's the key persona a lot of people are leaning into. And adjacent to this are the digital transformation teams, the innovation teams, and so on.

>> Has there been anything that you've seen on the system architecture of a business that's changed radically? I mean, obviously, I mean, you go to NVIDIA's conference, they're handing out the Kool-Aid big time. They're the AI infrastructure company, not the GPU company. But they bring up a good point that it does change the, with abstracting away complexity, with all the supercomputing capability and the horsepower, it gives software an advantage. So you're going to see abstractions, you can see all kinds of ways to make software work. Is there something that you see from a systems perspective? Like clusters, edge? Is there a distributed paradigm you see that's emerging in these large companies? Or is it simply more of the same?

Leonard Tang

>> Yeah. I think it's early innings still. I mean, a lot of enterprises, of course, are still not even on cloud, so it is early innings for everything. But I will say I think enterprises have been surprisingly willing to engage with SaaS platforms, especially if they're GenAI SaaS platforms. I was under the assumption that we have to do self-hosted for all of our customers all the time. But I think, depending on the use case, they're actually pretty amenable to having multi-tenant SaaS options.

>> They will? Okay, cool. Well, give a quick description of what you guys are working on now. You've got the Judge Me algorithm. I'm calling it that. Was there a name to that product? Or is it-

Leonard Tang

>> Yeah, so it's literally called The Judge.

>> The Judge? Okay. So, okay, The Judge. So what are some of the cool tech things you guys are doing? Can you share the coolest thing you're working on right now?

Leonard Tang

>> Yeah, for sure. So I will say very concretely, we have four main features on our platform. We have The Judge, of course, which operationalizes the metric of quality. We have Haze, which is the simulated testing against that metric. We have Monitoring, so runtime monitoring, evaluation monitoring. And then we have what we call Robustify, which is a way to take in all that data and then take in all the scores attached to that data and figure out how to tighten and improve and robustify the underlying AI application. A couple of cool things, I mean, many cool things that work on the technology side, and this is what gets me so excited about being part of the company and running the company. So of course a lot of people know of our work as basically red teamers for the frontier labs, right? So we go out and red team OpenAI's models before release. And we go out and red team Anthropic's models before release. We go out and red team AI21's models before release and a bunch of other frontier labs. And it's always really a joy to get to play with these systems early on and also get to be featured on people's system cards or their release papers and whatever have you. Something else I'm very excited about as it currently stands is what we call active alignment. And this is the idea of how do we align our judges as rapidly as possible to the human subject matter expert in enterprise? I think where we're headed right now in AI and the world more broadly is the era of the third-party labeler, the era of the third-party tester and annotator is gone. There will be less of a need for the Scale AIs of the world as the models get more mature and more readily adaptable to downstream use cases. And that means the leverage and the power is going to come from the customer. The product owner, business owner, the person who has the most sense of what the product should be doing in the enterprise.
And our job is sort of how do we use this active alignment workflow to bake their sensitivities into our judges. And this is something we've crushed. I'll say nobody else is working on this problem, because it's such a low data regime problem, and also it's such a high subject matter expertise problem, that the current way people are solving this is basically starting a company in each of the verticals that they want to go after. But we're going for a full platform approach where we scale up and lever up the downstream subject matter expert to create our judges.
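
One common way to get the most out of a small amount of expert feedback is uncertainty sampling from the active learning literature: ask the expert only about the examples the current judge is least sure of. The sketch below shows that generic technique, which is swapped in here purely for illustration; the interview does not say how active alignment is implemented. The function name `pick_for_review` and the score values are hypothetical, and the judge score is assumed to lie in [0, 1] with 0.5 meaning maximally uncertain.

```python
from typing import Callable, Dict, List

def pick_for_review(
    items: List[str],
    score: Callable[[str], float],
    k: int,
) -> List[str]:
    """Return the k items whose judge score is closest to 0.5 (most uncertain)."""
    return sorted(items, key=lambda item: abs(score(item) - 0.5))[:k]

if __name__ == "__main__":
    # Hypothetical judge scores for four logged responses.
    scores: Dict[str, float] = {
        "response_a": 0.95,
        "response_b": 0.52,
        "response_c": 0.10,
        "response_d": 0.40,
    }
    queue = pick_for_review(list(scores), scores.get, k=2)
    print(queue)  # the two responses to send to the subject matter expert
```

The expert's labels on the selected items would then be fed back to update the judge, and the selection repeated on the next batch.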

>> Yeah, because you give leverage to them and they get leverage from you. Similar to the previous guests I had on from Opentrons where the benefits flow back and forth. Brian and I are doing a series called MOE, Mixture of Experts.

Leonard Tang

>> I love it.

>> Which is a pun intended. You know, mixture of experts is a big part of some of the algorithmic software, and I understand you pointing out the expertise. But in the enterprise, the domain expert has been a word that's been around for generations. That's the people who know what the hell they're talking about. They know the source code, they know the buttons to push, switches to turn. And again, that's the IP of the workflow. So running an app, an end-to-end workflow, the person who knows that workflow is the person closest to the action on all context. So I totally love that idea. Okay. So I'm loving that right away. But now how do I as the domain expert, I'm at a company, it's almost like you're speaking up for the small guy. Like, "Hey, don't forget me. I'm the one running things here." But from a data standpoint, they're kind of small as you mentioned. So how do companies plug that in? Because horizontal scale is there with the cloud, but vertical domain-specific data has always been dismissed as, "Oh, yeah, it's in a database." But now when cross-pollinated it becomes massively valuable because it changes the color of things, so to speak. How do I connect that? How do you recommend companies do that? Because I don't think people see that as obvious as we're just talking about it because it's clearly the model. Why wouldn't you do that? So what do I do if I'm a large company? Do I set up a data plane? Do I just plug in? How do you see that playing out?

Leonard Tang

>> Yeah. So I'll say everything ultimately flows through our platform. And of course we're happy to have this be self-deployed by our customers. But essentially the data that we're most interested in is interactions with the AI application, and then also how the subject matter expert scores those interactions. These two things flow through our platform. We basically analyze the preferences and annotations on each of the data points, and from there we build out this ... we unfuzzify what it is they actually care about in quality. And we spit back ultimately a judge model, which is both a rubric for how the judge should be creating its decisions, and also some really efficient parameter updates on the underlying LLM.

>> So you've basically got a circular flywheel on reinforcement learning with the expert?

Leonard Tang

>> Yeah. Potentially.

>> But also creating a digital twin of that expert that could be replicated into the system?

Leonard Tang

>> Yeah, that's right. For the technical folks in the audience, I like to call this really customized and aligned reward models. Hyper-specific reward models for their use case.

>> Yeah, yeah. Awesome. All right, so what's next for you guys? What's happening now? I'm sure the business is booming. How's the funding levels? What's the latest round at? How much have you raised? What's the plans? Give us an update on what you're working on, what you're optimizing for, what's the focus?

Leonard Tang

>> Yeah, for sure. So these days it's all about enterprise, enterprise, enterprise, which is why I love being in New York. And we're in New York for so many reasons. Chief amongst them is we're here to serve everybody else in the world that wants to use AI. There's certainly a lot of pull for us to go out into the Bay given we're ostensibly an AI infrastructure company, AI eval ops company. But if you think about who we would serve in the Bay versus who we serve here, in the Bay it's just other tech nerds like myself, which is great. But I also do want to ultimately serve, you know, the lawyers and the financiers and the musicians and media people and the sports teams and everybody else that's here in New York. I mean, this is the home of -

>> I mean, if you want to win the tech companies, you go to Silicon Valley. They're all there. But you're one subway stop away from 10 more customers. There's hundreds of customers here that can gather. I mean, the New York scene, again, growing up on the East Coast in New Jersey, 20 years ago, I'll tell you Leonard, it was an anemic tech scene. Yeah. The internet happened here. There's media here, but that never really crossed over to pure deep tech. But if you look in the past 10 years, really I think maybe I'd say 20 years it started, I think it kicked in when Hadoop came along, the big data wave started happening, the FinTech started happening so we saw all that innovation. And since then it's just been such a great concentration of people, entrepreneurs, young and old. So there's room for everybody here because systems, a lot of people have distributed computing experience. Certainly a lot of the customers have built distributed systems, all run on data. What would you say to folks out there about the New York tech scene if you had to describe it? What's the vibe like? What do the hangouts look like? What are people doing for fun? Yeah. Put a plug in for the NYC tech scene.

Leonard Tang

>> Yeah, it's a good question. No, that's a great question.

>> There's always a hackathon.

Leonard Tang

>> There is always a hackathon.

>> There's always something happening here. It's Climate Week, one week. I love it here because theCUBE just eats up all the content.

Leonard Tang

>> Yeah, yeah, yeah. I'll say New York tech companies to me are balancing this weird dichotomy where they're simultaneously extremely buttoned up and polished because they have to be for their customers, but they're also very unique and iconoclastic in the way they think about their companies. I think SF actually has a mode collapse on philosophy of the world and mode collapse of who they should serve and what sort of technology they should build and what the most important problems are. I think here in New York we're by construction much more isolated from, more broadly, the "tech Twitter" scene, or the tech world scene.

>> The bubble.

Leonard Tang

>> Yeah.

>> Or the crew talk.

Leonard Tang

>> That's right. And this helps us be a lot more grounded in actually serving customers and customer needs. In terms of the culture here, I mean, we've been pushing a lot on trying to have New York AI reading groups and New York AI demo groups as often as we can. I think there are builders here and there are researchers, and there are technology enthusiasts, but we've got to draw them out of the woodwork.

>> Yeah. And the AI is infusing everyone's lives, so certainly it is a melting pot of more customer use cases and also market opportunity. Leonard, thanks for coming on theCUBE. Really appreciate you coming on theCUBE.

Leonard Tang

>> Awesome. Thank you very much.

>> Again, this is all the action in New York. This is the East Coast CUBE here at the NYSE with the NYSE Wired. Again, only here is the Wall Street action. Of course, we've got the Palo Alto connection and the lab there, and the studio connecting tech and money and Wall Street of course. This is our access point. Again, open network, join us, hang out with us and check it out. I'm John Furrier, host of theCUBE. Thanks for watching.