Vipul Prakash, co-founder and Chief Executive Officer of Together AI, discusses the future of AI-native infrastructure and applications in this episode of theCUBE’s AI Factory series. Hosted by John Furrier and filmed at the New York Stock Exchange, this episode explores the AI trends and innovations that transform the landscape of computing.
Prakash shares insights on the rapid growth of AI-native applications compared to traditional Software as a Service models, emphasizing the need for efficient AI computation as companies scale. The discussion covers Together AI's journey, celebrating its three-year anniversary, and the challenges of pioneering new AI infrastructures. The hosts, along with analysts from theCUBE Research, provide context and analysis throughout the conversation.
Key takeaways include the importance of AI-native applications and infrastructure to support rapid scalability and efficiency, as explained by Prakash. The discussion also addresses data movement challenges and the evolving role of testing in AI environments. Additionally, Prakash highlights Together AI's approach to building AI factories and the necessity of chief AI officers in driving organizational AI adoption.
Vipul Prakash, Together AI
In this conversation at theCUBE + NYSE Wired: AI Factories – Data Centers of the Future, theCUBE’s John Furrier sits down with Vipul Prakash, co-founder and CEO of Together AI, to explore how AI factories are redefining enterprise infrastructure. Prakash details the breakneck growth of AI-native applications – where usage that once took SaaS nine months to double is now happening in nine days – and why this demand is forcing a rethink of compute, storage and networking. He explains how leading apps segment traffic across closed APIs and fine-tuned open-source…
>> Hello, I'm John Furrier with theCUBE. We are here at theCUBE Studios at the NYSE in New York. Of course, we have our Palo Alto Studio connecting Wall Street and Silicon Valley tech and money together as part of our NYSE Wired program and open community. This is our AI Factory series we're kicking off. It's an ongoing series about the future of the data center. It's really about the future of distributed computing and how that's going to apply to powering this next generation of applications and scale that will bring AI as a great utility for society and, of course, business and people. So our next guest is Vipul Prakash, co-founder and CEO of Together AI, CUBE alumni. Recently, we chatted in Paris, France, for the RAISE Summit. That video's on YouTube if you're interested. And he's leading the effort. He's pioneering the market, a whole other level of scale. Vipul, great to see you. Thanks for coming on.
Vipul Prakash
>> Great to see you again, John. Thanks for having me here.>> We're in New York. You're in the Bay Area. There's a studio right around the corner from you there, but I really appreciate you coming on because the conversation that we had in Paris really was one of the motivations around this series. Of course, we've been following NVIDIA for a very long time. Of course, just a couple years ago, Jensen Huang coined the term AI factories. And since then, Dell Technologies and many others have been adopting this large-scale notion that the data center is the computer, and now data centers, plural. And now services are emerging out there in terms of neoclouds, the AI Acceleration Cloud, which is the category you started here with Together AI. So this is a complete validation of the architecture of what will power these next-generation apps. And also, you celebrated your three-year anniversary. Congratulations. I think-
Vipul Prakash
>> Thank you so much. Yeah.... >> you mentioned that to me in Paris.
Vipul Prakash
>> It's been an incredible 10 years.>> Yeah. So congratulations. You guys are hitting escape velocity. So first I got to ask you, what is the current state of the company? We just talked in July, a lot's happened. What's new? I guess what feels like 10 years of AI time. What's new?
Vipul Prakash
>> Yeah. No, it's incredible. I think one of the things that we are seeing, and I think the industry is seeing, when you had SaaS applications, the ones that were growing really, really rapidly, and I was... My previous company was doing search and searching social media, so I had a sort of view into how fast that particular app category was growing. And SaaS apps maybe doubled in nine months, and that was considered to be very fast growth. We are seeing that happen to AI-native applications in nine days. We have customers who are scaling up so rapidly, their products are so rewarding and really kind of getting distributed internationally. And that is creating this... what you're saying, immense sort of need for AI computation and efficient AI computation. So we've just been seeing this acceleration, honestly. And I think the next three years are going to be even more incredible.>> Vipul, I have to ask you about some of the things that we've been seeing as part of this series, and others, by the way. We had the Crypto Trailblazers, which is... You know, it's an infrastructure on finance with blockchain, similar trend, certainly on the AI cloud side, the neoclouds, and then on-prem on the enterprise AI, is that the startups that are coming out of the woodwork are building AI-native applications.
Vipul Prakash
>> Right.>> And there's a lot of retrofitting going on, for sure. I mean, everyone, like SAP and others, they're certainly bringing AI to the app. That sits on top of the models. And so the AI-native apps, whether they're an existing app that has infused AI into it or native AI apps, have to have an infrastructure to support it. Now, you know we cover cloud like a blanket, so cloud-native was a big term. That was Kubernetes. It fueled DevSecOps. That was cloud. A new era is here. So I want you to define, in your mind, what is an AI-native application? What is the AI-native infrastructure that's required to support that?
Vipul Prakash
>> Yeah, if we think of AI-native applications, where the fundamental functionality of the application is driven by an AI model, these are applications we're seeing like ChatGPT, Cursor. We have another customer, Hedra, that does video generation. Really, the core aspect of the application, the value that it provides, is driven by AI.
I think you're right. This is different from sort of introducing some AI features in a traditional application, because the AI is so central to these applications, their requirements for efficiency, for scale, for growth of the underlying AI infrastructure are extreme. And they're really sort of driving the need for building these AI factories rapidly, which consume tokens to learn from them and then produce them at high throughputs, low latencies, and are really on all the time.>> So on the token piece, talk about that dynamic, because the models are changing very, very fast. I was talking to one customer, and I won't say the name, but they said... A customer. A customer of you guys and others. Not your customer, but an end user.
Vipul Prakash
>> Yeah.>> They said, "We got so focused on the models that we lost track of the real goal." So I want to ask you, how should companies think about managing the ratchet game of the models, and also the specialty models, too? Some models are better at some things than others. You have a long-tail power law developing around models. And the revelation to me was, this is what I want to get your reaction to, is that we optimize for the AI-native piece as well as the underlying... and let the models be part of it, just be a flywheel, and then let the software figure out the models. What's your reaction to that?
Vipul Prakash
>> We see this as really the emerging trend right now: the applications that have really scaled up are using open-source models. These are off-the-shelf open-source models, which they then fine-tune and post-train on their own data. Because once you have millions of users, you are collecting a lot of data and success criteria for the results that you're producing. And that becomes a really great set of data to fine-tune an open-source model on. So this is what we are seeing, is that applications are using closed-source APIs, but then they are segmenting their traffic into self-built or adapted versions of open-source models, which they're deploying with Together at scale. And you're absolutely right. I think in some ways the model... which model it is, what the architecture is, it almost matters less. What matters is the benchmarks and the quality these models produce. And there's also this search for finding the right neural network architecture, or the right sort of modification of the transformer, that produces more efficiency at runtime. There has been this evolution of the transformer architecture, especially over the last year, I would say, where the underlying mechanisms of attention and how it's sharded into experts... all of that has been changing. And a lot of that is to figure out how to go from producing a hundred billion tokens a day to a trillion to 10 trillion to a hundred trillion tokens a day more efficiently.>> Yeah. I love the fact that AI native... Thanks for the definition, but the token piece is key. You bring up the architecture piece. I want to get your thoughts on this. Because the AI world is bounded by power. That's a whole different discussion. We'll get to that in a second. Data drives everything. So one of the things that's coming up is data movement. Okay. That's a concept just in data generally; these days we hear a lot about that: harmonization layers, semantic layers, whatever. So you got data movement.
Data movement, for what you're talking about, is also very relevant. Because when you talk about where the memory sits and how the interconnects work... So large-scale systems, as you guys look at your systems, this is a fundamental infrastructure enablement opportunity. Can you share your thoughts? Because data movement has been around for a while, it's been a challenge, but it's more critical now and it's acute when you start thinking about latency. And you say runtime, which is generative, right? You're generating-
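The traffic-segmentation pattern Prakash described a moment earlier, sending most requests to a fine-tuned open-source model while keeping a closed API for the rest, can be sketched in a few lines. This is a minimal illustration, not Together AI's actual routing logic: the model names, task types, and canary fraction here are all hypothetical.

```python
import random

# Hypothetical model tiers; names are illustrative only.
CLOSED_API_MODEL = "closed-frontier-model"
FINE_TUNED_MODEL = "finetuned-open-model"

def route_request(task_type: str, canary_fraction: float = 0.1) -> str:
    """Pick a model tier for one request.

    Task types with enough fine-tuning data go to the adapted
    open-source model; everything else, plus a small canary slice
    kept for quality comparison, goes to the closed API.
    """
    fine_tuned_tasks = {"summarize", "classify", "extract"}
    if task_type in fine_tuned_tasks and random.random() >= canary_fraction:
        return FINE_TUNED_MODEL
    return CLOSED_API_MODEL
```

Routing a small canary slice back to the closed API preserves a continuous quality baseline, so the fine-tuned model's outputs can be compared against the frontier model on live traffic.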
Vipul Prakash
>> Yeah.... >> something that's running at runtime. Talk about the role of the data movement and what that does for the requirements.
Vipul Prakash
>> And we are finding that the data is sitting fairly... It's fairly adjacent to the computation. So we provide to our customers large storage systems that are fabric-connected, fast parallel storage systems right next to the models. And they're using this for context. They're using this for fine-tuning. In fact, all the data that is being generated as part of the interaction with the product is coming and sitting next to the models more and more. So we are definitely seeing this. And for systems like... Now we have embodied systems, models for robots that are being created, which have a fairly large data set, both as a starting data set and the generative data set. So we'll see AI factories being fitted out with vast amounts of storage in the coming years.>> Yeah. Yeah. And the memory stuff's key. I got to ask you, one thing that came up before you came on the remote was we were chatting about the needs for testing at scale. Could you share what you're learning? Because you mentioned some of those things around models. What are some of the changes on the testing side? Because testing is huge. Obviously, security, the bar for security is high in most applications and environments. Take us through what the scale environment looks like, or needs to be, or what you're seeing change in testing requirements.
Vipul Prakash
>> Yes. Yeah. Testing is really complex for generative AI because you are generating novel tokens against sort of novel requests that are coming to the application. So there's a lot of work on testing. And I think this spans the stack from benchmarks. Like, companies will create their different kinds of benchmarks, and you're running them against every model update, every software update. But you also have... These are statistical systems, so they do have... They sort of have what I'd describe as bit rot: a running system will start losing accuracy, so there has to be continuous testing. We do entropy testing of these models. We are looking at every piece of the infrastructure, from networks to GPUs to ECC, because errors introduced anywhere can really reduce the quality of the model. And it's a field where there's a lot of novel work happening, as well as using things like A/B testing. Many of the apps that we support will introduce a new model to a small percentage of their users and look at the results before they deploy it to the entire AI factory. And then that deployment has to be done really rapidly.>> You guys have been very successful in three years in your product. I want to talk about customer use cases in a second, but talk about the product and the technology. The industry's seeing the financing of big data centers. I mean, it's like every day another hundred billion dollars goes here or there. North Carolina, I got 20 billion going in there. It may not go all in one year, but there's always a capacity forecast. You got to buy equipment. You got to buy gear. Again, a data center is the computer, so it's not like buying a Dell server and throwing an operating system on it. It's Dell and NVIDIA. I mean it's complex. You've done that. Where are you on your AI-enabled, AI-native infrastructure service? Any changes? Obviously, NVIDIA is pushing more and more products.
I mean, it shifts from TSMC, "Can I get in line?" Now it's, "Can I get more Blackwells? Can I get more NVIDIA?" Now it's, "Can I get more networking and storage?"
So it seems to be evolving very quickly. Can you share as the CEO and co-founder, as you look at your system, how you're building it out, what are some of the things you're seeing, constraints, opportunities, challenges?
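The staged rollout Prakash described, introducing a new model to a small percentage of users before deploying it fleet-wide, is commonly implemented with deterministic user bucketing. A minimal sketch, assuming a hash-based assignment (the function and experiment names are illustrative, not any vendor's API):

```python
import hashlib

def in_new_model_cohort(user_id: str, experiment: str, percent: float) -> bool:
    """Deterministically decide whether a user sees the new model.

    Hashing (experiment, user_id) yields a stable bucket in [0, 100),
    so the same user keeps the same assignment while the rollout
    percentage is ramped from, say, 1% to 100%.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = (int(digest[:8], 16) % 10000) / 100.0  # stable value in [0, 100)
    return bucket < percent
```

Because assignment is a pure function of the user and experiment IDs, raising the percentage only ever adds users to the new-model cohort; it never flips existing users back and forth between models mid-experiment.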
Vipul Prakash
>> Yeah. I think the biggest constraint today is power. And these data centers have to come up in three to six months. Traditionally, data centers are built in 18 to 36 months, so you are... I think the first thing that you're looking for is data centers that are already energized. Many of these data centers happen to exist with crypto mining companies who are creating this adjacency into AI infrastructure now. So we work with many of them to do capacity planning, figure out what data center we can transform into an AI data center. And we have been doing this. We have built two new AI factories, one in Maryland, one in Memphis. There are two more coming. And really, the biggest focus here is how quickly and efficiently we can bring these up. And yes->> Get the cash ....
Vipul Prakash
>> there is... These are massively expensive endeavors and have to be financed.>> Talk about the AI factory. Because remember, back 16 years ago when theCUBE started, I think you were doing your Topsy venture at that time. Actually, earlier than that. But the big data movement happened around Hadoop. And the phrase is cliché now, but at the time it was, "Data's the new oil. We need refineries." Well, guess what? We have factories now.
So you're building an AI factory. You have it. As enterprises come online, they're going to have AI factories. So what's your vision, Vipul? Because this is a growth opportunity, and it's not an either-or. We're seeing big AI factories and smaller AI factories. They're all big, but some are massive.
Vipul Prakash
>> Yes,>> You have a massive AI factory. But a mid-sized bank might say, "I need an AI factory. That's where my data is. That's where my crown jewels are. I want to have an AI factory on premises."
Vipul Prakash
>> Absolutely. And what we provide at Together is a really end-to-end solution: the data center, hardware, the layer of software, developer experience, security, and the sort of most efficient software technology to train models, to fine-tune them, to serve them at scale, all in a sort of easy-to-use cloud interface. And we think that this is going to become... As enterprises are applying their proprietary data... And they're not just applying this data, they are operationalizing this data in some ways with AI models. And then these models will become some of the most important and strategic assets for companies. So we think the requirements for sovereignty and security are going to be sort of stronger here, and being able to do it efficiently as well... You know, if we can produce two times more tokens than an enterprise can do themselves from the same infrastructure, that just has a very, very direct economic impact on how effective their AI program is. And that is something we are... I consider Together to be the world's leading research lab in how to make these models more efficient.>> Well, you guys certainly are doing a great job on the factory side. I have to ask you, as a leader building out this next-generation infrastructure, about the role of the chief AI officer and what customers are thinking. I recently spoke with John Roese, who's the CTO at Dell. He's also the chief AI officer, which he told me he wanted to be out of that job as fast as possible because he wants AI infused everywhere. He doesn't think it should be a central position. Dion Harris came on theCUBE yesterday from NVIDIA. He covers the old HPC and now AI infrastructure solutions. So you kind of had that old HPC, high-performance computing, world moving along inch by inch and then, boom, AI comes in. Wow. Large-scale supercomputing. Actually, there's a show called Supercomputing that started in 1988 when I graduated college. It's still around.
That actually is the best show right now because it actually has supercomputing. Right?
Vipul Prakash
>> Right.>> So okay, we finally got the product market fit for the conference. So Dion Harris was talking about things from liquid cooling. John Roese talking about the chief AI officer. You're building AI factories, enterprises are looking to go there. What is the role of an organization right now as you talk to customers? When you say, "Hey, this is the future," they all agree. There's no debate probably, but there's probably a discussion around execution.
Vipul Prakash
>> Yes.>> Can you share your thoughts on this? Because you're on the front lines. You're doing it. So is Dell. So is NVIDIA. So are others. And startups are coming in. Whole nother motion to the enterprise, like a startup. I got to do a POC. But wait a minute, I got to get in the stack. I got to balance security. What are the conversations? Take me through. What's the discussion about operationalizing the AI factory?
Vipul Prakash
>> I would say... What we've seen is the companies that have chief AI officers are generally being more successful in AI adoption. I think there is organizational inertia around adopting new technology. There are all sorts of questions about policy and security and safety that have to be tackled. So I think it's not just a technological problem, it's an organizational one. And there has to be someone who is really cutting through that and helping enterprises adopt AI. So that's one of the things we see. When we are looking at customers, if they have a chief AI officer, we just feel that this customer... They're actually going to... they're exactly going to->> They have their act together. They have their act together. I can say it.
Vipul Prakash
>> I think that's important. And I hope that this will become a more common role across enterprises.>> Yeah. Organization culture's different. I've seen it be a competency center. I've seen it be a recruiting center, a learning center. I've seen it be policy. Each kind of has their own cultural view of it. Some want to make it technical. It kind of depends. But I would agree. The people who have chief AI officers tend to have their act together or are moving aggressively and leaning into the operational playbook. Could you share some customer stories with us? Stories drive movements. This is one big one. What examples can you share that illustrate the AI factory momentum's real, legit, next level?
Vipul Prakash
>> Yeah. We work with several enterprises where this is true, from companies like Zoom to... We've worked with a company called VFS that does visa processing and, in a very regulated environment, works with 156 governments around the world. And they appointed both a digital officer and an AI officer and have been able to take this very sort of manual, very exacting work that was being done through human processes, took an open-source model, fine-tuned it, built a lot of model risk management around it, and that has become a part of their product. It's really kind of incredible. And we're seeing this happen in companies where they see this either as a very effective technology for efficiency inside the company or, for digital-native enterprises, as a real disruption. It could be a disruption to their core business, and they have to make it a priority and not let it get pushed quarter over quarter.>> I loved your line that the application and infrastructure value is about being powered and driven by AI. Totally relevant. I know you guys just celebrated your three-year anniversary. Congratulations. What's your focus now? Put a plug in for what you're working on. You guys are doing some pretty cutting-edge work building out the factories, but also on the business momentum side as well. You got great new AI-native app developers coming on board. What are some of the things you're working on? What's your focus on? What are your goals? Put a plug in. What are you looking to do?
Vipul Prakash
>> Yeah. We are really set up for large amounts of scale. That's what we've built over the last year, so we can take on the largest AI workloads, with all the sort of builds that are coming up this year and next year, as well as our software stack. One of the things that we are working on now: as we run large-scale inference for many of these customers, there tends to be this sort of gap between peak load and average load. During the days, they hit peak loads. And at nights and on weekends, they are at the average load. And we are building systems that allow them to train and do inference on the same infrastructure efficiently. So if you have 10,000 or 50,000 chips in your AI factories, really combining the training and inference into a single operation, I think, is going to be really important because, otherwise... Today they're different things. They're different teams. There's different infrastructure. But it really should become one. And that requires developing software, coordination systems, demand forecasting systems. So we are building that stack. We are actually about to deploy it for our first customer, and we think this is going to become a trend for AI factories.>> Love it. Training is where the value's established. Inference is where the money's made, as they say, because the application's going to be inferring and then reasoning more stuff around transformer coding. Awesome. Thank you so much for coming on. I really appreciate your time. And thanks for being part of our inaugural AI Factory week. And again, this was motivated by our conversation in Paris. Thanks so much for coming in and being part of it.
Vipul Prakash
>> Thank you. Thank you so much for having me.>> All right. I'm John Furrier with Dave Vellante. We're kicking off our AI Factory series, an ongoing series, talking to the leaders who are making it happen, building the next-generation infrastructure, enabling the AI-native applications. It's transforming the stack. It's changing business, and the value is being created and driven by AI technology. We're doing our best to bring you the data. Thanks for watching.
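The train/inference consolidation Prakash describes, backfilling off-peak inference capacity with training jobs so a fixed GPU fleet never sits idle, can be sketched as a toy capacity planner. This is an illustrative model only; the headroom factor and the forecast input stand in for the demand-forecasting systems he mentions.

```python
from dataclasses import dataclass

@dataclass
class ClusterPlan:
    inference_gpus: int
    training_gpus: int

def plan_capacity(total_gpus: int, forecast_inference_gpus: int,
                  headroom: float = 0.1) -> ClusterPlan:
    """Split a fixed GPU fleet between inference and training.

    Inference is reserved its forecast demand plus headroom for
    spikes; whatever remains (nights, weekends, troughs) is
    backfilled with training jobs so the factory stays utilized.
    """
    reserved = min(total_gpus, int(forecast_inference_gpus * (1 + headroom)))
    return ClusterPlan(inference_gpus=reserved,
                       training_gpus=total_gpus - reserved)
```

A real scheduler would recompute this split continuously from live demand forecasts and preempt or checkpoint training jobs when inference spikes, but the core idea, one fleet and two workload classes sharing it, is captured here.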