theCUBE + NYSE Wired: AI Factories - Data Centers of the Future | Vladimir Stojanovic, Ayar Labs

Vladimir Stojanovic

Co-Founder & CTO

Ayar Labs

In this conversation from theCUBE + NYSE Wired’s “AI Factories – Data Centers of the Future,” Ayar Labs co-founder and CTO Vladimir Stojanovic joins theCUBE’s John Furrier to unpack how optical I/O is redefining AI-scale infrastructure. Stojanovic explains how Ayar Labs’ co-packaged optical engines (chiplets) attach to AI accelerators, switches and extended memory to deliver ultra-low latency, extreme bandwidth and strong energy efficiency, stitching together thousands of GPUs to operate as one system. He details why the next decade of AI will be won by inter...

>> Hello, I'm John Furrier, your host of theCUBE here at our New York Stock Exchange CUBE studios. Of course, we've got the CUBE studios in Palo Alto connecting Silicon Valley and Wall Street. Of course, part of the NYSE Wired program and community, an open network of experts and leaders making things happen. This is kicking off our AI factory series, The Future of the Data Center. As generative AI and AI-native applications hit the scene, the infrastructure that provides them is the large-scale systems, or AI factories, that produce the outcomes and the value. So as all those benefits will be realized and captured, the creation is really what's happening at the chip level and again, these large-scale supercomputers. We are living in a supercomputing era and we got a great guest here, Vladimir Stojanovic, who's the CTO and co-founder of Ayar Labs, about to accelerate into mass production of their optical device switch. Is it a switch? It's not a switch, it's a connector. It's co-packaged optics.

Vladimir Stojanovic

>> The fabric

>> Fabric. It's a switch. Yeah, it's software, hardware. Vladimir, thank you so much for coming on. So let's get into it. You guys have, I won't say been in stealth. You've been building intellectual property and product in really one of the hottest areas right now. Explain what you guys do. Just set the table. I know you're ramping up with TSMC and just getting ready to hit the market big. What's it all about?

Vladimir Stojanovic

>> Yeah. Ayar Labs is all about data movement and accelerating that data movement for AI infrastructure. We do so by building a product that is called optical I/O. This is a product that is co-packaged with AI accelerators, with switches, with extended memory modules to allow you to have super low latency, extra high bandwidth, and really good energy efficiency in these AI fabrics. It essentially allows us to stitch together a collection of thousands of GPUs working together as one and crunching through these big AI problems.

>> So you support the supercomputing era, right?

Vladimir Stojanovic

>> I would say it's worldwide supercomputing because this is no longer some supercomputer in a government lab. It's really world-accessible supercomputing.

>> We really love this series because AI factory is a nice marketing term that Jensen Huang put out there a couple of years ago at a conference, at a GTC conference, but it really abstracts away all the work being done under the covers, under the hood. Because the outcomes are going to be realized, but there's so much going on, Vladimir, in these new systems. It's a systems architecture rethink. And if you look at the data center of the enterprise, they all went to the cloud, then they come back and rebuild their on-prem. But then you got the data centers that everyone's watching in the news, a hundred billion dollars, 50 billion in North Carolina. Another state gets something, billions of dollars. There's a massive build-out of infrastructure for these systems. An old data center had servers on a rack. Now the servers, they're one big computer. The data center is the computer. So this is the megatrend. Okay, you guys are at the center of it. And so I had asked, is it a switch? You said no, "A fabric." Explain the convergence that you guys are on because it is a convergence of scale-out and scale-up efficiency.

Vladimir Stojanovic

>> Absolutely. And you could view it in a way that the last 10 years of AI were about models and compute engines, and the next 10 years will be about how you interconnect all of these most efficiently to be able to follow the trend of increased expectations from AI: larger models, smarter, faster, but also, on the other hand, more efficient in terms of performance per total cost of ownership. AI finally has to bring return on investment. So building systems that can be higher performance but also bring that performance per TCO up is really what it's all about, and it's all about interconnecting that space.

>> Yeah, I mean, because everyone sees the supercomputer chips, the Blackwells, whatever XPUs. You had mentioned that you guys are categorically calling this optical I/O for scale up. Talk about what's going on there because scale up is a hot area because scale out, certainly we cover that too, but there's a little bit of scale out too because you want to have multiple nodes, but also you want the density and the energy piece. That's where the scale out has got the most demand. Talk about the dynamics between why scale up is important, but also supporting scale out as well.

Vladimir Stojanovic

>> Yeah, exactly. Scale up really allows... It's so much bandwidth that allows the GPUs to look at each other's memory and really work together as one on a big problem. And you've seen this trend already. We started with one GPU on a card, then eight GPUs on a shelf, then 32, then 64 GPUs working together on a problem. And anytime you made that upgrade, you actually were able to not only crunch a bigger problem but also crunch it with better return on investment. And this trend just continues: 256, 512, 1,000, 10,000 GPUs. That's kind of the scope that we can cover with optical I/O. Unfortunately, or fortunately for Ayar Labs, electrical I/O cannot follow that trend. It basically is limited to a meter of reach and you simply cannot connect that many GPUs within a meter without blowing up the power density and cooling limitations in a data center. And this allows us to create a cluster-wide AI inference that boosts the key performance metrics, both smarter AI and interactivity but also performance per TCO, which is really the key finally to getting a return on investment in AI.

>> And you made a good point. You want the throughput which you can deliver, and if you don't have optical, you're going to have electrical. That's heat too. So in these dense scale-up systems, talk about that dynamic. I know it's a little bit off the topic. Well, it's not off, it's on topic, but it's not obvious to many. The heat that's generated and the density of the scale up is a huge design factor.

Vladimir Stojanovic

>> Absolutely. And look, just looking over the last few years, we've gone from 30-kilowatt racks to 80-kilowatt racks to 120 to now 180-kilowatt racks. There's talk about 600-kilowatt racks in a couple of years. That trend is clearly unsustainable. It's essentially an exponential trend. And once you switch over to in-package optics allowing you to go anywhere with low latency and high bandwidth and not as much energy expenditure, you are able to flatten that. You can use hundred-kilowatt racks, but a hundred of them, to give you that level of compute power that you need for future AI workloads.
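The rack arithmetic in this answer can be sketched in a few lines of Python. This is a minimal illustration, not Ayar Labs' sizing method: the function name `racks_needed` is hypothetical, and the sketch assumes the total power budget stays fixed and only the per-rack density changes.

```python
import math


def racks_needed(total_kw: float, rack_kw: float) -> int:
    """Number of racks required to host a fixed power budget.

    Hypothetical helper for illustration; assumes every rack is
    filled up to its power limit.
    """
    if total_kw <= 0 or rack_kw <= 0:
        raise ValueError("power values must be positive")
    return math.ceil(total_kw / rack_kw)


# "A hundred kilowatt racks, but a hundred of them" is a 10 MW budget.
total_kw = 100 * 100

if __name__ == "__main__":
    for rack_kw in (100, 180, 600):
        print(f"{rack_kw} kW racks: {racks_needed(total_kw, rack_kw)}")
```

The same budget fits in far fewer 600-kilowatt racks, but only at a power density that strains cooling. Spreading it over many 100-kilowatt racks requires links that reach beyond a single rack, which is the role Stojanovic assigns to optical I/O.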

>> Talk about some of the efficiency. I mean, latency is one thing. Heat is another. Talk about network hops. That's another factor. Where does that come into play?

Vladimir Stojanovic

>> Absolutely, you're right on the spot there. With optical I/O, we allow both compute sockets and switches to have a much higher radix of connectivity and that allows you to connect to more peers as well as have a flatter switch network. So minimize this number of hops because really the key in these scale-up fabrics is the latency going through the switches and actually it's the queuing latency. With a lot of bandwidth and a lot of radix, you also allow path diversity through a bunch of parallel switches and that significantly cuts down this end-to-end latency. And that's one of the strong points of optical I/O as well.
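The link between radix and hop count can be illustrated with the standard folded-Clos (fat-tree) capacity formula, in which L tiers of radix-k switches support up to 2·(k/2)^L endpoints without oversubscription. This is a minimal sketch under that assumption; the radix values and function names are illustrative and do not describe Ayar Labs' products or any specific fabric.

```python
def max_endpoints(radix: int, tiers: int) -> int:
    """Endpoints supported by a non-blocking folded Clos of `tiers` levels."""
    return 2 * (radix // 2) ** tiers


def min_tiers(endpoints: int, radix: int) -> int:
    """Smallest number of switch tiers that can connect `endpoints`."""
    if radix < 4:
        raise ValueError("radix must be at least 4")
    if endpoints < 1:
        raise ValueError("endpoints must be positive")
    tiers = 1
    while max_endpoints(radix, tiers) < endpoints:
        tiers += 1
    return tiers


def max_switch_hops(tiers: int) -> int:
    """Worst-case switches traversed: up to the top tier and back down."""
    return 2 * tiers - 1


if __name__ == "__main__":
    # Illustrative radix values only (assumptions, not vendor figures).
    for radix in (64, 512):
        tiers = min_tiers(10_000, radix)
        print(f"radix {radix}: {tiers} tiers, "
              f"up to {max_switch_hops(tiers)} switch hops")
```

Under these assumptions, connecting 10,000 endpoints takes three tiers with radix-64 switches but only two with radix-512 switches, so the worst-case path drops from five switch traversals to three. Each switch removed from the path also removes a place where queuing latency can build up.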

>> Yeah. It's a very nuanced point, but a lot of people don't understand that those hops can also affect GPU cycles as well.

Vladimir Stojanovic

>> Absolutely. GPU basically idles while that transaction is happening. And think of it as you have an extremely expensive compute resource, essentially idling waiting for the network to return the information. That's basically a very low return on investment on that expensive hardware.
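A rough way to quantify the idling described here is to treat each step as compute time plus network wait time with no overlap between the two. This is a simplified sketch: real systems overlap communication with computation to some degree, and the timing values and function names below are made up for illustration.

```python
def gpu_utilization(compute_us: float, network_wait_us: float) -> float:
    """Fraction of each step the GPU spends computing.

    Assumes the GPU stalls for the full network wait (no overlap).
    """
    if compute_us < 0 or network_wait_us < 0:
        raise ValueError("times must be non-negative")
    total = compute_us + network_wait_us
    if total == 0:
        raise ValueError("step time must be positive")
    return compute_us / total


def idle_fraction(compute_us: float, network_wait_us: float) -> float:
    """Fraction of the hardware's time (and cost) spent waiting."""
    return 1.0 - gpu_utilization(compute_us, network_wait_us)


if __name__ == "__main__":
    # Hypothetical numbers: 80 us of compute per step, varying network wait.
    for wait_us in (20.0, 5.0):
        idle = idle_fraction(80.0, wait_us)
        print(f"wait {wait_us} us -> {idle:.1%} of GPU time idle")
```

In this model, cutting the wait from 20 to 5 microseconds per 80-microsecond compute step lowers idle time from 20% to roughly 6%, which is the return-on-investment argument in miniature.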

>> So, Vladimir, I got to say I love your background on your Zoom here on the remote. I can't stop staring at that product. Good branding by the way, Ayar Labs. But if you look back at the product, it's very intriguing and I had mentioned switch out of the gate. It just came out. But you said no, it's a fabric. Talk about the product because I think this is super important as people start to look at these new architectures, the engineering involved in the fabric, it has a switch, so it's in there. Unpack the product real quick. I want to understand because there's a lot going on in that small package.

Vladimir Stojanovic

>> Yeah, definitely. Look, in terms of product, it's super important that the product is in the right form factor, for two reasons: that the form factor is actually compatible with the high-volume manufacturing capability of the right ecosystem, and that the form factor is also performant. So we design and build chiplets. These are optical engines that are co-packaged with AI infrastructure. You can co-package them with the AI accelerator socket, you can co-package them with the switch. You can co-package these chiplets with the memory boards. And once you decorate these packages with these optical engines, you enable more than an order of magnitude higher bandwidth and radix out of these sockets. So that's the key. The way of building these chiplets needs to be essentially extremely compatible with the high-volume manufacturing ecosystem. And so earlier we mentioned our partnerships with the TSMC ecosystem as well as the most recently announced Alchip partnership. We're partnering with Alchip to actually bring these optical chiplets, optical I/O chiplets, into the package that they're assembling for hyperscale data customers.

>> You beat me to the punch on the ecosystem. Talk about the Alchip and the TSMC relationship. What's going on there? What should people know about?

Vladimir Stojanovic

>> Look, this is the ecosystem in the world that cranks out the AI accelerators and optical engines for the AI infrastructure. The COUPE process that we are using at TSMC really allows you to build breakthrough-performance optical engines. In terms of density, we're looking at essentially a decade of scaling ahead of us, both in energy and throughput efficiency of these optical engines, but also, more importantly, with partners like Alchip, we're getting these optical engines into very complex AI socket packages. You have a CoWoS interposer there with a lot of memory and compute. Next to that on a substrate, you have optical engines with detachable connectors. Someone has to put all of these things together. So we're working very closely with them to enable this type of ecosystem.

>> And that announcement was what, a reference design or reference implementation with Alchip? Because they're custom-based. I mean, TSMC gets that. They're the king of the castle right now. Everyone loves them. You got to stand in line. We know what they've done. Alchip is doing what? Are they putting a reference architecture together? What specifically do they have going on with you?

Vladimir Stojanovic

>> Think of it as someone needs to create an enablement, right? At the end of the day, Alchip does both the frontend and backend ASIC services. They work on one end with their hyperscale customers to build ASICs or accelerators, but they also do backend integration. How do you put these chips all together on an interposer? How do you get that interposer into a package? How do you get optical engine into the package? How do you test all that? So they enable a lot of that backend so that it's essentially a turnkey solution for a hyperscale customer.

>> And that's going to increase the volumes and quality. What's it take to do high-volume manufacturing for co-packaged optics?

Vladimir Stojanovic

>> Sorry. Can you repeat that?

>> What will it take to get the high volume up for manufacturing?

Vladimir Stojanovic

>> Look, we're going through that every day right now at Ayar Labs, and a lot of it is about the backend manufacturing. This is what's so great about the TSMC ecosystem, that you have a lot of partners you can access to enable volume testers. For the TSMC COUPE process specifically, some new kinds of test capabilities are needed, and we're working very closely with top-tier tester manufacturers to enable these things. And the hardware we have right now is going through these pipelines, cleaning up those assembly steps. So you have that ecosystem you can rely on and put together to really get the high-volume product. But it's that backend that's very critical. We've solved the frontend. The design works, the platform works. What you really need is this high-volume manufacturing backend that a tier-one customer can access and essentially use to scale to the extremely high volumes that we see in these AI applications.

>> Okay. So you got the frontend, you got the backend. As co-founder, you got to be pretty pumped up right now, all that work. Take us through how long were you guys working on this? What was the origination date? What were some of the stories? Now, you're on the cusp of marketing it, getting volume going. Obviously, again, the market is perfect for you guys. Again, I think it's the convergence of scale up, scale out. It's just phenomenal. Scale up needs the most help right now, but clearly the data centers are getting bigger. They're wanting to be big. Big, medium and small AI factories are coming to the scene. How do you feel? What's it like? Tell a story.

Vladimir Stojanovic

>> It's actually a great feeling, being able, long ago, to read the tea leaves, essentially, and actually in a few stages. It's easy in retrospect to look back at how things worked out, but usually there's a series of recognitions that you need to make to arrive at a certain spot. Look, the first one we made goes back to our academic days, an MIT, Berkeley and CU Boulder partnership of the three schools on a big DARPA project looking at essentially solving the interconnect problem for high-performance computing. Back in the day, let's say 10 or more years ago, AI was very nascent, but we did recognize the memory system problem, essentially the memory bandwidth problem, for high-performance compute. We tailored the whole approach to essentially taking the photonics technology from the lab immediately into the fab. We had a partnership with GlobalFoundries, and actually IBM before that, to demonstrate that, but really focusing on this high-volume manufacturing aspect. And then the second realization, roughly around when transformers were born as a concept, was that this is going to grow immensely, like no workload that was previously run in high-performance computing. And the third realization, roughly around, let's say, 2017, 2018, once we had already started the company, was that the chiplet is actually the form factor that is going to be the best insertion for several generations into this type of computing, because Moore's law was dying out and you had to heterogeneously create an environment anyway inside a compute package. And that was the perfect entry for optics so that it's actually scalable and also high-volume manufacturable.

>> And the heat side of it, again the energy savings cannot be dismissed. It's so important. You guys got that down too. I love the optical market right now. It's funny, Vladimir. Supercomputing is a conference that's so hot right now. It started in 1988 when I graduated college, and it was like in the HPC area, high-performance computing where you've been doing your research. It's like it moves along inch by inch and now it's actually a supercomputing show. We're talking about supercomputing, like real large-scale systems. It feels like we finally made it.

Vladimir Stojanovic

>> And a great driver set of applications that are really challenging both software and hardware. Look, the way we see it is that for AI applications, you have these two metrics that matter. It's how fast it responds to a single user and how smart it is. And a user could be another agent, an AI agent, or a human. So you have that interactivity axis, but you also have this performance per TCO axis. And interestingly, for any hardware generation, you have this Pareto curve that you can ride with software, but the only way to move upwards on both of these is to create a new hardware architecture that is optimized with that workload. And that's what we're doing here with the photonic fabric and our optical I/O powering that connectivity.

>> And it's so consistent with the market. That's the Pareto curve. Jensen's famous charts. He loves to talk about that all day long. This is why we love this market. Final question, what's the status? When are we going to see the product come off the line? When are you going to put it in the hands of builders? When's that all going to go down?

Vladimir Stojanovic

>> Look, it's a super exciting time for us at Ayar Labs. We are deep in the trenches really going through the execution phase, getting the hardware through the fab, setting up the testing backend infrastructure. The high volume aspects of it are critical. It's not like a lab infrastructure you're putting together. It really is an ecosystem set of partnerships that are driving these equipment enablements. And look, we are working as fast as we can. The demand is there. We're talking maybe 18 to 24 months is our window and we're trying really hard to make it.

>> I have one final question since it just popped in my head since I got you because you're an expert. I was talking with Sid, one of our analysts, former guard for 10 years, been in Bell Labs. We're talking about that same paradigm shift, that big shift and the vendors become the new vendors. And for the folks like the Dells and the server vendors, and you've got NVIDIA plowing the fields and innovating, you've got the optical engine fabric and device, these new systems are here and it's the new curve. It's the new way. It's clear to people. How do those existing server vendors and other suppliers, in your opinion, how should we be thinking about it? Because they're just more servers. They're connected together. So the AI factory to me is just a bunch of servers that no one sees or it's somewhere and connected. It's not like a server, where you load Linux on it and write an application. It's a whole other ball game. What's your view on that? What's your opinion on... I mean, the game is still the same, it's just the landscape changed.

Vladimir Stojanovic

>> Look, on one hand, you transfer one set of challenges to another set of, let's say, more manageable challenges. But you still need to solve them. It's an absolutely crucial question and something that we're engaged in very closely with our partners across the full stack. Think of it, as you said, as the whole rack or a cluster. We build an optical engine, a laser source. How does that get integrated into the package, into the board, into the rack? There are innovations at every level of the stack. And we're working closely with our partners to highlight where the opportunity for a new thing is and actually how to create these systems most efficiently and cost-effectively.

>> And they're systems. And it's a systems revolution. Again, it's back.

Vladimir Stojanovic

>> It's a system.

>> It never left.

Vladimir Stojanovic

>> Supercomputer.

>> And the young developers have that systems mindset too. They're starting to see the engineers start thinking about consequences. Second-order effects. Vladimir, thank you so much for being part of our community and part of our program. Congratulations to you and the team for what you do. We'll keep in touch. A lot to follow up on for sure. Again, it's just beginning-

Vladimir Stojanovic

>> Absolutely. Thank you.

>> ... for you guys and the whole industry. Thanks for coming on our AI factory inaugural kickoff.

Vladimir Stojanovic

>> Absolutely. Thank you, John for having us.

>> I appreciate your time. I'm John for theCUBE. We are here at theCUBE NYSE Studio. Of course, we have our Palo Alto Studio connecting Silicon Valley and Wall Street. Again, the AI factories is a concept to simplify all the complex systems that are being engineered and connected to provide great data movement, high intelligence, and actually powering the AI native applications. Again, we will continue to cover this. We're doing our part to bring the data to you. Thanks for watching.