theCUBE + NYSE Wired: AI & Retail Trailblazers | Nikhil Simha, Zipline AI


Nikhil Simha

CTO

Zipline AI



Nikhil Simha, CTO of Zipline AI, discusses the company's platform for the data side of AI, which cuts data processing time from months to weeks or even days. He describes the shift from supervised machine learning toward unsupervised approaches driven by embeddings. Zipline AI's open-source project, Chronon, automates data processing for ML and AI workloads, making it easier for developers to generate training data. The platform offers increased developer velocity, better data lineage management, and automated monitoring. The potential for...


>> Hello, welcome back to theCUBE. I'm John Furrier. We are here at the NYSE. A great lineup of folks here all week long as part of NRF Media Week. And we're going to end out the day with Nikhil Simha, CTO of Zipline AI. Welcome back to theCUBE. We were just chatting before we came on camera. Great to see you. Give a quick overview of what you guys do. Explain what Zipline AI does and what you guys have done for funding. I know you guys are right out of the gate here from Airbnb. Now you were at Airbnb and your co-founder's at Stripe, or you-

Nikhil Simha

>> My co-founder was also at Airbnb.

>> So you guys were both at Airbnb.

Nikhil Simha

>> Yeah.

>> Okay. Give the quick overview of what Zipline does.

Nikhil Simha

>> Yeah, absolutely. So we started out as an open source project coming from Airbnb and Stripe. So what we do is we make the data side of AI faster. So we take that from months' worth of time to a week or a day.

>> And then you worked at Facebook.

Nikhil Simha

>> I worked at Facebook before Airbnb and Amazon before that, all the time doing ML infra or data infra related stuff.

>> So was that pre kind of the deep learning, at the beginning of the deep learning wave? So you were doing unsupervised machine learning, supervised machine learning. What kind of machine learning were you guys working on?

Nikhil Simha

>> Mostly supervised machine learning that is in the production path for ads models or for search models or for fraud detection, that kind of stuff, which is very supervised machine learning based. But later on it ended up becoming more unsupervised because of embeddings. So by the time I left Facebook, there's a lot of embedding models everywhere.

>> So a lot more action going on, not as hardcore, rigid, more expansive opportunity. And so you're at AirBnB and you guys open source. What did you guys open source? What product was it specifically?

Nikhil Simha

>> So it's called Chronon, an Apache-2.0 licensed data platform for ML and AI workloads essentially.

>> Yeah. What's the purpose? To me, I hear Chronon, I think Cron jobs, but what is the product?

Nikhil Simha

>> Yeah.

>> Take me through what it is.

Nikhil Simha

>> Yeah. To do data for ML, you need to stitch together batch systems, stream processing systems, indexes, and services, and all of this is manual work today. What we do is we allow people to write their SQL for batch training data generation, and we generate all of this under the hood so people don't have to learn Flink or all of these other systems. They just need to learn the SQL, and we get the rest of the systems in place.
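The idea in that answer, one definition driving both the batch and the streaming side, can be sketched in a few lines. This is a toy illustration in plain Python, not Chronon's API: `FeatureDef`, `batch_compute`, `StreamingState`, and the sample events are all hypothetical names and data made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureDef:
    """Hypothetical declarative feature: a windowed sum over keyed events."""
    name: str
    key: str       # field to group by
    value: str     # field to sum
    window_s: int  # trailing window length in seconds


def batch_compute(defn, events, as_of):
    """Batch path: scan the full event history and aggregate per key."""
    totals = defaultdict(float)
    for e in events:
        if as_of - defn.window_s < e["ts"] <= as_of:
            totals[e[defn.key]] += e[defn.value]
    return dict(totals)


class StreamingState:
    """Streaming path: fold events in one at a time, filter to the window on read."""

    def __init__(self, defn):
        self.defn = defn
        self.buffer = defaultdict(list)  # key -> [(ts, value), ...]

    def update(self, event):
        self.buffer[event[self.defn.key]].append(
            (event["ts"], event[self.defn.value])
        )

    def read(self, key, as_of):
        lo = as_of - self.defn.window_s
        return sum(v for ts, v in self.buffer[key] if lo < ts <= as_of)


# Made-up events purely for illustration.
events = [
    {"user": "a", "amount": 10.0, "ts": 100},
    {"user": "a", "amount": 5.0, "ts": 400},
    {"user": "b", "amount": 7.0, "ts": 450},
    {"user": "a", "amount": 2.0, "ts": 900},
]
spend = FeatureDef(name="spend_10m", key="user", value="amount", window_s=600)

batch = batch_compute(spend, events, as_of=900)  # training-data side
stream = StreamingState(spend)                   # serving side
for e in events:
    stream.update(e)
```

Both paths read the same `FeatureDef`, so the value used for training (`batch["a"]`) and the value served online (`stream.read("a", 900)`) agree by construction. In a real system the two paths would be separate engines generated from the one definition, which is the manual stitching the answer describes.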

>> And that obviously saves a lot of time, obviously, but what is the benefit to that speed of integration? What's the main impact?

Nikhil Simha

>> Yeah, so there are three main things. One is obviously the velocity for developers. ML is all about having as many iterations as possible in a given amount of time. The second one is we can truly manage the lineage of data processing all the way into the model, and because of that we can govern effectively. So if you have things like EU, ALR, whatever, all of these things can be automated into the system. So whenever there is a violation, before that goes into commit, we can tell like hey, there's a violation here. And the third thing is we can monitor automatically. So instead of you having to connect with other vendors or whatever, this is all integrated. So you just write your definitions, and we figure out what the monitoring of that should be and we generate that.
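The lineage point can be made concrete with a small sketch: if every derived feature records what it was built from, a pre-commit check can walk back to the raw columns and flag restricted ones. The graph, column names, and `RESTRICTED` tag set below are invented for illustration and are not taken from Chronon; the walk also assumes the graph has no cycles.

```python
# Hypothetical lineage graph: each node lists the upstream nodes it derives from.
LINEAGE = {
    "fraud_model": ["spend_7d", "home_country"],
    "spend_7d": ["payments.amount"],
    "home_country": ["users.passport_country"],
}

# Assumed policy tag: raw columns that need review before use in a model.
RESTRICTED = {"users.passport_country"}


def upstream_sources(node, lineage):
    """Return every raw source reachable from `node` (assumes an acyclic graph)."""
    parents = lineage.get(node)
    if not parents:  # nothing recorded upstream: treat the node as a raw source
        return {node}
    found = set()
    for parent in parents:
        found |= upstream_sources(parent, lineage)
    return found


def violations(node, lineage, restricted):
    """List restricted raw sources that feed into `node`, in sorted order."""
    return sorted(upstream_sources(node, lineage) & restricted)


flagged = violations("fraud_model", LINEAGE, RESTRICTED)
```

Here `flagged` names the restricted column that reaches the model through `home_country`, even though the model definition never mentions that column directly. That transitive view is what makes an automated check possible.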

>> All right. So what was the rationale for leaving the job? You guys said hey, we've got some real good action here? I mean, first of all, AI-taught, obviously a great opportunity to start a company. I'm not down on it at all. In fact, I love it. So Airbnb is actually a great pedigree. I mean, they're one of the hall-of-famers. Facebook, Netflix, Airbnb, Twitter, Lyft. I mean the list goes on and on of some amazing open source... Uber. I mean that's in that class of Web 2.0/SaaS crossover greatness, handmade large scale systems. I mean, you know what I'm talking about. You were cutting your teeth into that. So you guys are sitting there saying, "Hey, we just solved a big problem that maps to a broader opportunity and let's go for it." And then what happened next?

Nikhil Simha

>> Yeah. So we have seen a few companies try to use it, and then we realized actually they would pay for helping them to do it and they would pay for a good managed solution. So that's what prompted us. We just followed the users-

>> Follow the money.

Nikhil Simha

>> Yeah.

>> Who were the first couple customers? Can you talk about it or no?

Nikhil Simha

>> I can talk about the open source users, not about the-

>> Talk about what they did, what some of the use cases and what was the benefit? What made you realize this is a huge opportunity, now if we can just get paid, but we could scale this, it's got real benefits.

Nikhil Simha

>> Yeah. I can start with what Airbnb and Stripe used it for. So Airbnb used it in almost every ML model inside the company. So we have search recommendations, fraud detection, payment fraud, shape stuff, growth, customer support using LLMs. So all of that needs data going into the prompts, data going into the models. So Chronon is used to generate this data, like turn raw data into features and prompts. And at Stripe it's mostly used for transaction fraud, and there are new use cases coming up at Uber and OpenAI.

>> How are the marketing departments using it? When you say they're using machine learning to feed their proprietary data or their unique data, their domain data into a large language model or a frontier model, or is it they build in their own language models to feed it into? Or both?

Nikhil Simha

>> I have seen both actually. So I have seen complex workflows where it's like a node of decisions at every step you make decisions. And some of the decisions are made by frontier models, and some of the decisions are made by custom models that are supervised, and some of the decisions are made by unsupervised models that are also custom. So there's a whole range of applications.

>> What is the most popular... I mean RAG is a great example. I call last year the year of retrieval-augmented generation. Now, "Hey, look how great this is." I mean, that's great, vector embeds and all that, but the build value's going to come into pipeline these ML data sets into pre... Either small language models or ones that are interacting with others. What's the hottest area right now that you think that people are going to jump on?

Nikhil Simha

>> So I think in terms of news cycles, I would say the language models are at the top. But in terms of where our users are coming from, they still tend to come from traditional machine learning models that are serving search, that are serving customer support, that are serving fraud detection.

>> So business stuff that's already been kind of... Went through the resilience framework tests?

Nikhil Simha

>> Yeah.

>> So where it's kind of been verified-

Nikhil Simha

>> Yeah.

>> In the enterprise at least.

Nikhil Simha

>> Yeah. So that's where the strongest business impact is. And that's where a lot of sophisticated ML teams are like... They're trying to build like-

>> Hardcore ML is basically laying down the freeways so the picks and shovels vendors can go to the customers basically.

Nikhil Simha

>> Yep.

>> Because that's just real... It's a critical path. I mean fraud detection. You're at JP Morgan Chase. Are they a customer yet?

Nikhil Simha

>> Not yet.

>> You know what their budget is? $17 billion a year.

Nikhil Simha

>> Wow.

>> IT budget. They do $10 trillion in transactions a day. I interviewed the CIO there, Lori Beer. They are hardcore. You know what her answer was when I asked what her resilience strategy was for AI? You know what she said? "Oh, it's easy. We just apply our resilience framework to it." I go, "What? AI?" She goes, "Well, it's just another app." So the mindset of the customer is LLMs or these language models, foundation models, the multimodal model, this is another app to them.

Nikhil Simha

>> We treat it the same way actually. So regular ML AI, fine-tuned small LLMs, we don't really differentiate between those. We just try to-

>> It's got a purpose. It's an app. Basically it's an approach, a workflow. What's the big problem that you solve on the workflow that you'd say that this was a game-changer? What would get Peter Wagner, who's a very tough VC to get through the knothole on because he's super smart. He's got a great team at Wing. He's been on theCUBE before. I love that guy. He's one of the old-school venture capitalists. I shouldn't say he's my age actually, but he's good. What did he like about this? I mean, they got a high bar over there at Wing VC.

Nikhil Simha

>> They do. They do. So I think the biggest application of the data pipeline stuff that we do is embeddings. Embeddings are super hard. Not from a model point of view, but from the data plumbing side of things.

>> Like how? What way?

Nikhil Simha

>> Yeah. So embeddings are essentially models that are outputting features to the next model. So you're chaining two models, and the pipelines involved are more complex than a regular machine learning model where there's only one stage. There are hundreds of features, but there is only one stage. Embeddings make it twice as complex, and changing anything here will require you to automatically-
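One way to picture why a two-stage chain is harder to maintain: a change to the embedding stage has to propagate to the model that consumes it. The sketch below tracks that with content fingerprints. The function names, feature names, and version strings are hypothetical, and this is only one possible way to detect what needs rebuilding, not a description of how Chronon does it.

```python
import hashlib


def fingerprint(*parts):
    """Stable short hash of a definition plus the fingerprints it depends on."""
    return hashlib.sha256("|".join(parts).encode()).hexdigest()[:12]


def pipeline_fingerprints(embedding_inputs, embedding_version, extra_features):
    """Stage 1 is the embedding; stage 2 is a model that consumes it."""
    emb = fingerprint("embedding", embedding_version, *sorted(embedding_inputs))
    # The stage-2 fingerprint includes the stage-1 fingerprint, so any
    # change upstream shows up downstream as well.
    model = fingerprint("model", emb, *sorted(extra_features))
    return {"embedding": emb, "model": model}


before = pipeline_fingerprints(["listing_text"], "v1", ["price", "views_7d"])

# Change stage 1: add an input to the embedding.
after_emb = pipeline_fingerprints(
    ["listing_text", "listing_photos"], "v1", ["price", "views_7d"]
)

# Change stage 2 only: swap one of the model's own features.
after_feat = pipeline_fingerprints(["listing_text"], "v1", ["price", "views_30d"])

rebuild_after_emb = [k for k in before if before[k] != after_emb[k]]
rebuild_after_feat = [k for k in before if before[k] != after_feat[k]]
```

Changing the embedding marks both stages for rebuild, while changing a stage-2 feature leaves the embedding untouched. With a single-stage model there is only ever one thing to rebuild, which is the asymmetry described in the answer above.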

>> So you're chaining the embeds, which are basically prompts to each other or prompt received.

Nikhil Simha

>> So the prompt could be at the end of the embedding. So the embedding could go into a RAG system. The RAG system will pull out like a prompt, and the prompt will go into a frontier model or an LLM. So we manage this whole pipeline, but embeddings is where most people cannot do it manually.
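The flow described here (embedding, then retrieval, then a prompt handed to an LLM) can be sketched end to end with toy parts. The `embed` function below is a bag-of-words stand-in for a real embedding model, the documents are invented, and the final LLM call is left out. It shows the shape of the pipeline only.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=1):
    """Return the k documents whose embeddings are closest to the query's."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query, docs, k=1):
    """Assemble the text that would be sent to a frontier model or LLM."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Invented documents purely for illustration.
docs = [
    "refunds are issued within five business days",
    "hosts can set a minimum stay for each listing",
    "payment disputes are reviewed by the fraud team",
]
prompt = build_prompt("how long do refunds take", docs)
```

Every step before the LLM call is data plumbing: the documents have to be embedded, indexed, and kept fresh, and the query has to be embedded the same way. That is the part of the chain the answer says most teams cannot maintain by hand.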

>> So it's got to be automated?

Nikhil Simha

>> Yeah.

>> This will feed into scalable agent infrastructure. So he's probably looking at this as a key fabric of an operating agentic system?

Nikhil Simha

>> I think so, yeah.

>> Yeah. I mean, if you can make that embeddings work, because models talking to each other probably will be happening. I mean, why wouldn't I want to talk to a smarter model that is more peaked or specialized and vetted, especially at the supply chain and lineage knowledge around the data, why wouldn't I trust that? So if I delegate to you authority, I got to trust that that's going to happen, right?

Nikhil Simha

>> Absolutely. Yeah. I think-

>> If I'm an app developer, I don't want to deal with that. I mean it's like why DevOps existed.

Nikhil Simha

>> Absolutely. I think a lot of our value comes from automating this for developers, automating this pipelining.

>> So how do I get my hands on some of this stuff? So thecubeai.com is our embeddings, all the videos, all the stuff. So I want some of that. What do I do? I want to have contextual programmability in the agents to know that if/then/else... I'm oversimplifying, but I want to get into, if things are happening in the neural net, I want to stay on a path where I have more knowledge, I'm smarter as I go.

Nikhil Simha

>> Yeah. So I mean you could talk to us. We are happy to help. But there's an open source project out there so you could try to use it. And that also supports embeddings, but you'll need to plumb together multiple systems to make that system production ready. But-

>> And the name of the project again is Chronon?

Nikhil Simha

>> Chronon.

>> Chronon. C-R-O...

Nikhil Simha

>> C-H-R-O-N-O-N.

>> Yeah, Chronon. Okay. So it's Apache 2.0. It's not yet posted on Apache or Linux Foundation. It'll probably be the Apache Foundation if they're going to support it.

Nikhil Simha

>> I think-

>> But you just use their license. You're not yet a designated project yet.

Nikhil Simha

>> Yeah, we are not in the foundation.

>> How many people are working on the open source project now?

Nikhil Simha

>> I would say about 20 to 25.

>> So hardcore machine learning folks who are kind of in this one area, basically your friends basically, not friends, but I mean... The people who are doing the hard... This is down mission critical AI infrastructure we're talking about.

Nikhil Simha

>> Absolutely.

>> It's not like side project in the garage. I mean fraud detection, every bank's going to need this. We've seen that grow in open source big time. How many customers do you have? Do you have any customers yet, or no?

Nikhil Simha

>> We have two. Essentially we have signed two customers. I'm not at liberty to talk about who they are. But we have multiple other leads, but we are trying to be focused on-

>> Design partners, get into some folks who are going to really help you guys out.

Nikhil Simha

>> Yeah.

>> So it's a pure seed deal.

Nikhil Simha

>> Absolutely. Yeah.

>> So it's a good seed, but it's not a series A.

Nikhil Simha

>> It's not.

>> Yeah. So it's a super seed.

Nikhil Simha

>> Yeah.

>> Well congratulations. I love what you're doing. Again, we've been talking on theCUBE for years about that. In fact, I think we were the first media company and research firm to put out the power law of AI models. This was pre-ChatGPT. They were about to launch. We said boldly at that time, now it's obvious, that there's going to be a power law of models where the proprietary... We called them proprietary then, now they're called frontier models because no one wants to say they're proprietary. It's a bad word, but they're calling them frontier models were the big ones, you know, expensive. But then the power law shapes down, and you're going to have no neck, then a torso will develop and you're going to start to see specialty models come in. And then more evergreen distinct domain models where... I mean, if you're a JP Morgan Chase or you're a big bank, why would I ever make my models public? That's proprietary information. That's intellectual property.

Nikhil Simha

>> Yeah. I mean-

>> I don't want... My engineer's going to work on that, so they're going to need to have some plumbing.

Nikhil Simha

>> Yeah, absolutely. I think-

>> That's where you guys are going, right? That's where you see this going?

Nikhil Simha

>> Yeah. We are working with customers who are super critical about where their data lives. They don't want it to exit their account. So data is not going anywhere basically. So we are not working with smaller companies with that.

>> All right, so last opportunity. I'll end the day out. You're closing us out here on day one of our coverage. Give a plug in to the folks watching what you guys are looking to do. Are you guys looking to hire some people? What kind of problems do your engineers solve? What's the culture right now? So you're a raw startup, great funding levels, seed's great. Good amount. Great people. Good investors. Okay, so what are you looking to hire? What's the makeup of the individuals? What are some of your goals? Go ahead. Give a plug.

Nikhil Simha

>> Yeah. So we are a team of engineers mostly doing distributed data infrastructure, with backgrounds from Amazon, Facebook, Airbnb, Palantir, Stripe, essentially all world-class people. And we basically are building this platform where we automate pipelines that include batch processing, stream processing, indexes, and services, all behind one query. So imagine if you're a user, end user, writing a query and seeing all of this infrastructure spin up. That would save them months if not years of time. And we have seen this play out at Airbnb and Stripe really well, and we are trying to replicate this for the rest of the world.

>> As a managed service.

Nikhil Simha

>> As a managed service.

>> Welcome to theCUBE. It's great to have your kind of brain power here on theCUBE. Again, this is the young guns, next generation coming from amazing companies, first generation, large scale. Now supercomputing for the masses is here. Capabilities are hitting the table. More and more machine learning. More and more plumbing. AI infrastructure is the hottest area. It's going to have to feed in the agentic layer coming in very, very fast. But you can't do agentic without AI infrastructure. So again, you heard it on theCUBE. Thanks so much for your time. Congratulations again. All right, fresh funding, startups. Here on theCUBE everything from startups to big companies. I'm John Furrier, your host of theCUBE. Thanks for watching.