Russ d’Sa, LiveKit
In this video, Russ d'Sa, founder and chief executive officer of LiveKit, explores the transformative potential of artificial intelligence at the Cerebras SUPERNOVA 2025 event in San Francisco. The conversation, hosted by John Furrier of SiliconANGLE Media Inc., is broadcast by theCUBE, a leading technology research and media platform.
d'Sa shares insights into LiveKit's journey, from connecting humans during the pandemic to facilitating advanced AI interactions. Through a close partnership with OpenAI, LiveKit now powers multimodal features in the ChatGPT application, bridging human and machine communication. Furrier highlights d'Sa's expertise and delves into the challenges and advancements in AI infrastructure.
Throughout the discussion, d'Sa emphasizes the significance of real-time AI applications and explains how Cerebras chip technology substantially enhances AI performance. Key takeaways include the importance of low latency in voice and vision AI and how Cerebras technology is pivotal in meeting those demands, advancing AI toward more human-like interactions.
Russ d’Sa, founder and chief executive officer of LiveKit Inc., joins theCUBE’s John Furrier at Cerebras SUPERNOVA 2025 to explore how real-time AI is reshaping the future of communication. From the Fort Mason stage in San Francisco, d’Sa shares how LiveKit has evolved from enabling human connectivity during the pandemic to powering the multimodal backbone of ChatGPT through its work with OpenAI.
The conversation turns to the growing demands of low-latency AI systems, particularly in voice and vision use cases, and d'Sa details how Cerebras technology plays a pivotal role in meeting them.
>> Hello and welcome to theCUBE on the Ground. We are here at the pop-up CUBE. We're in San Francisco, at Fort Mason, for Cerebras' Igniting the Future of AI, its Supernova conference. It's kind of like a conference, but it's also some news. I'm John Furrier, host at theCUBE. Russ d'Sa here, founder and CEO of LiveKit, an innovative startup that could use some more AI and more horsepower. Russ, thanks for coming on and dropping in as they set up the show here.
>> Hey John, thanks for having me.
>> You're one of the first partners of Cerebras.
>> That's right.
>> So you're like a legend, the OG in the Cerebras landscape. They've only been around for a while, but they're hitting the market. Oversold event here. I mean...
>> Yeah, it's amazing, the turnout that's coming to this event. It's been a couple of years, but the progress that they've made is just incredible.
>> It's fun to see the tech scene in Silicon Valley kind of go global. Obviously we've got the New York studio at the NYSE, President Trump and all the luminaries out in the Middle East. You're starting to see that whole AI global landscape changing all the geopolitics. So obviously AI is impacting the world. It's the tech that's emerging fast. You guys are in a very innovative area, multimodal AI, which people mostly know is hot: LiveKit.
>> That's right.
>> Great story. You started it during the pandemic. What's up now? Give us an overview of what you guys are working on.
>> Yeah, so these days I kind of say we're the accidental AI company. We started during the pandemic to connect humans to other humans. That was primarily what our software did. And then along the way, OpenAI decided that they wanted to build a voice interface to ChatGPT, and they found LiveKit and decided to use it to build all of their voice mode and advanced voice mode. Now we power all of their multimodal features in the ChatGPT app, and with the infrastructure, we're really focused on not just connecting humans to other humans, but connecting humans to machines. What I like to say is, if you're going to build an AI model that is as smart as a human being, the way you're going to interact with that model is probably not going to be with a keyboard and a mouse. It's probably going to be with natural human inputs and outputs, which are voice and vision. And so we build the infrastructure that allows you to capture video and voice and transport it to that AI model on the other end.
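To make d'Sa's transport point concrete, here is a minimal sketch of the pattern he describes: capture microphone audio in small frames and stream it to a model endpoint in real time. The WebSocket URL and message format are hypothetical placeholders, not LiveKit's or OpenAI's actual API; it assumes the `sounddevice` and `websockets` Python packages are installed.

```python
import asyncio

import sounddevice as sd  # microphone capture
import websockets         # transport to the model endpoint

SAMPLE_RATE = 16_000            # 16 kHz mono PCM, a common speech-model input
FRAME_MS = 20                   # small frames keep per-chunk latency low
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000

async def stream_mic(url: str) -> None:
    queue: asyncio.Queue = asyncio.Queue()
    loop = asyncio.get_running_loop()

    def on_audio(indata, frames, time_info, status):
        # Runs on the audio driver's thread; hand the frame to the event loop.
        loop.call_soon_threadsafe(queue.put_nowait, bytes(indata))

    async with websockets.connect(url) as ws:
        with sd.RawInputStream(samplerate=SAMPLE_RATE, channels=1,
                               dtype="int16", blocksize=FRAME_SAMPLES,
                               callback=on_audio):
            while True:                           # runs until interrupted
                await ws.send(await queue.get())  # one 20 ms chunk per message

asyncio.run(stream_mic("wss://example.com/voice"))  # hypothetical endpoint
```

The small fixed frame size is the design choice that matters here: nothing waits for a full utterance before transport begins, which is what keeps the exchange conversational.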
>> It's awesome that you said that. I was watching Sam Altman talk on a TikTok this morning, really talking about the difference between the user experience and behavior of Gen Z, Millennials, and Xers. Obviously Boomers, we'll probably just type commands: what is LiveKit to ChatGPT? Versus the younger generation, for whom it's an assistant, it really is. Life decisions happening. So multimodal is the buzzword. LLMs, large language models, that's text; video and audio are native in the new AI-native entrepreneurial experience, so you're like an AI star. When I was younger during the web scene, when I was an entrepreneur, they were called internet entrepreneurs and then web entrepreneurs; they were kind of classified. So there's a whole other level of entrepreneurship going on, what I call AI entrepreneurship, or value entrepreneurship; there's a lot of value. But it's not as easy as just throwing an app on the cloud. There's a lot of engineering, enterprise-like things going on. It's more complexity, but the upside's there. So I mean you've got voice, that's a native media model, that's what people use, and voice is the killer app.
>> Yeah. I think people are really still exploring what the AI-native applications are. You have this magical LLM technology; what are the new types of experiences that it can enable? I think voice and video are the native inputs in that world, and so we make it easy for you to get that input and the output from your AI model between the user and the experience that you're building.
>> We discovered this about 10 years ago on theCUBE, and we started having this, what we call the video-full problem. We were doing so much video on theCUBE, we're like, where do we store it? So we were just dumping it on S3. Nobody was putting video on S3. They had higher-level services. They had some broadcast software. They bought some legacy packages. And so there really was no native unstructured data format. And then we realized, wow, we could do stuff with the video. Video is data. So voice and video are great ways to consume, but with voice interaction as a prompt, everything will be voice. Hey ChatGPT, hey CUBE, find that video. I mean, voice is the interface, but it's also data. So from an engineering standpoint, how do you look at that? Because you're building this first generation of what I call native multimodal infrastructure. How do you think about that? What's your vision?
>> Yeah, you were talking about video and how much data it is. It's the highest-bandwidth data that there is. If you look at humans, somewhere around 50% of your neurons are dedicated to just visual processing. I think we're going to see the same thing with AI as well. Today, yes, voice is the primary modality for these types of applications, but we have only scratched the surface on integrating computer vision. Can the AI model see your screen? Can it sit as a copilot with you as you interact with software and help you as you interact with it? When we start to get to physical AI, embodied AI, where you're taking these models and putting them into humanoid robots, those robots are going to have eyes, they're going to have two cameras. How do you actually allow these kinds of AIs, or physical AIs, to see and hear and perceive the world the way a human does? We're building the infrastructure that's going to power that future, hopefully.
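Furrier's "voice is the interface, but it's also data" observation maps onto a simple engineering pattern: tee each utterance to the low-latency inference path and to durable storage, where it can later be transcribed and indexed. A minimal sketch, assuming `boto3` for the S3 half he mentions; the bucket name and the `send_to_model` hook are hypothetical.

```python
from datetime import datetime, timezone

import boto3  # AWS SDK, for the S3 storage pattern Furrier describes

s3 = boto3.client("s3")
BUCKET = "example-voice-archive"  # hypothetical bucket name

def send_to_model(pcm: bytes) -> None:
    """Hypothetical hook into the low-latency inference path (see sketch above)."""
    ...

def handle_utterance(pcm: bytes, session_id: str) -> None:
    # Low-latency path: voice as the *interface*.
    send_to_model(pcm)
    # Durable path: voice as *data*, ready to transcribe and index later.
    key = f"{session_id}/{datetime.now(timezone.utc).isoformat()}.pcm"
    s3.put_object(Bucket=BUCKET, Key=key, Body=pcm)
```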
>> The robotics is a great callout, because I was talking to an entrepreneur, and there's a lot of robotics action out there. I would say gen-one robotics was get the robotics right, and gen-two robotics is, what's the software layer that's going to be injected into the hardware? Which is basically open source based, because the models are coming from open source. So you have this new thing. I was interviewing one of the founders of a company that does drones. It was very counterintuitive. He goes, no, no, we're making our drones smaller. What? That's kind of like... And slicker. They come from IDEO, so it's design oriented.
>> Yep.
>> But his point was that they want the form factor to have usability, not be this big flying box or whatever. That poses a challenge for the developer, because now it's software. This is your wheelhouse. You're in the middle of this intersection of smaller, faster, cheaper on the hardware side, and smarter, more intelligent on the software side.
>> Right. Yeah. We kind of think of ourselves as sitting right between that evolution of hardware and the evolution of applications that users are interacting with. There are a lot of difficult infrastructure challenges that you need to solve, especially when you're delivering these kinds of applications at scale, that developers would otherwise have to build over and over and over for every one of these use cases. We're solving those undifferentiated problems and tucking them into a platform that everyone can leverage.
>> All right. So talk about your business model. Are you guys targeting developers? What's the sweet spot? Obviously OpenAI is a massive win. That was a great module they launched. I can see a lot of headroom here. Are you guys just kind of rolling with the tide, so to speak, rolling down the river? Do you have targets? Developers obviously are hot right now. Is there a focus? What are you optimizing for?
>> Yeah. I mean, I think we're optimizing for growth right now. We want to become a platform that is pervasive and has high impact across many at-scale use cases. But on the business side too, and the revenue side, we monetize in a similar way to AWS. I sometimes joke internally that what we're building is AIWS, kind of building this suite of services across storage, compute, and networking that just makes it super easy to build your application, deploy it, scale it, and then monitor it.
>> Is there a unification involved, or do you view it as more of simplify the core and let the developers plug into it? What does the ideal steady-state stack look like, if you could explain it?
>> Yeah. I think it's really having a core that maps closely to the software development lifecycle. So a developer comes and says, hey, I want to build an application. I'm going to use LiveKit to build that entire application end to end. So I can build it, I can put it out there, start to scale it up as more and more users are attracted to the application, and then I can also rapidly iterate on it. This is actually becoming very important for AI use cases in production: I make an update, how do I know that the AI is still behaving the exact same way? How do I evaluate it? How do I monitor it? How do I put in the right guardrails? And so we're building tooling to make that entire end-to-end development cycle very easy.
>> And that's where I think the unification comes in. I was just talking to some folks about cybersecurity, and the problem with tokens is you don't know what's going to come out until you actually prompt it.
>> That's right.
>> So there are a lot of potential blind spots in there. How do you guys view that when you talk to customers? Because obviously data security is just embedded into the AI. How does that factor in?
>> I think the first step is really getting visibility. So we're tackling that first. You have an AI... And this is a harder problem to solve for voice and video than it is for text. Computers can understand text. It's just a string of characters. You can write assertions around text in deterministic code, but how do you do the same thing for an AI that is saying something, right? How do you know it's not saying the wrong thing? Or if it's generating video, how do you know what it's actually generating?
>> Or if it's authentic.
>> Right. Right, right.
>> It's in my voice.
>> Exactly. Yeah. How do you know it's not deepfaking and violating copyright and things like that? And so the first step, though, to solving that problem is just getting visibility into the system. How do I actually hook into a live session? How do I see or get a record of everything the AI did? That's the part we're tackling first, and then after that, we'll start to tackle the mitigation techniques.
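d'Sa's point that "you can write assertions around text in deterministic code" is the tractable half of the evaluation problem. A toy sketch of that half follows, with a hypothetical `run_agent` stand-in for a real inference call; the equivalent checks for generated audio or video are the harder problem he goes on to describe.

```python
GUARDRAIL_CASES = [
    # (prompt, predicate over the reply, what the check protects)
    ("What's my account balance?",
     lambda reply: "verify" in reply.lower(),
     "must not invent account data"),
    ("Say hello.",
     lambda reply: len(reply) < 500,
     "greetings should stay short"),
]

def run_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real call into the text pipeline."""
    return "I can't share account details without verifying your identity."

def run_evals() -> None:
    # Deterministic assertions over text output, rerun after every update.
    for prompt, check, why in GUARDRAIL_CASES:
        reply = run_agent(prompt)
        status = "PASS" if check(reply) else "FAIL"
        print(f"[{status}] {why}: {prompt!r}")

run_evals()
```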
>> Well, I'm super excited for you. I love your story. We'll do a follow-up for sure in Palo Alto.
>> Yeah, happy to.
>> We have a pool party on June 18th. Love to have you come. It's for AI leaders. It's at the Rosewood. It's not a pool party in that sense. It's at the pool. It's not like bring your swimming shorts. That would be weird.
>> I was envisioning all kinds of stuff. I wasn't sure.
>> That would be weird. But we're going to do in-studio interviews. Love to have you on. Final question for you: Cerebras has got the power, the fast chip, they call it the big chip, and inference is hot. What do you guys gain from the AI infrastructure advancements? Because it's like a relay race right now in the tech industry. People are running really hard. Folks like you guys at LiveKit, you're building, you're moving fast. But the AI infrastructure providers are getting their legs too, and they're just getting stronger; every day it feels like another level of performance.
>> 100%.
>> How do you gain from that, and can you talk about your relationship with Cerebras?
>> Yeah, I think that Cerebras solves probably one of the most important challenges with voice AI and vision AI, which is latency. If you're trying to build an experience that is convincingly human... I can respond to you in 200 milliseconds, and most of the inference hardware out there can't meet that performance envelope. Then, when you start to think about reasoning models, where you have to do multiple loops of thinking before you answer, you can actually do more thoughtful inference with something like Cerebras within the same performance envelope another provider needs for the inference alone. And so I think it's a huge unlock for voice AI and vision AI, real-time AI in general, and we benefit really greatly from what they've put into the world.
>> Yeah, I love the vision, so much data coming in. I guess the final question I'll ask you, because a lot of people ask me, and I don't really know the answer other than my observation of seeing people game the stats: not all benchmarks are the same. I mean, I've seen stats, well, we're faster at inference, and other people say they're faster, but I could juice the power and get the inference up. So there are techniques. So what do you look at when you say who's really fast? Is it the power envelope? Is it the tokens? What are, in your mind, the requirements for smelling out who's got the better product?
>> I think the thing that matters for these multimodal use cases is two things. Tokens per second definitely matters, and Cerebras is amazing there. But the other thing that is important, especially for voice, and that Cerebras also excels at, is time to first token. You want the data to come out as quickly as possible so that you can get a response to the user, or start streaming a response to the user, almost instantly. So time to first token, and then that rate of tokens coming out, both of those things are paramount for a voice AI application.
>> And the reasoning puts more pressure on it, obviously.
>> Of course. Yes, it does.
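The two numbers d'Sa names, time to first token (TTFT) and tokens per second, can be measured the same way against any provider, which also speaks to Furrier's benchmark-gaming concern. A minimal sketch that treats a streaming inference API as a plain token iterator; the provider call itself is assumed.

```python
import time
from typing import Iterable, Tuple

def measure_stream(stream: Iterable[str]) -> Tuple[float, float]:
    """Return (time to first token in seconds, tokens per second)."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _token in stream:
        count += 1
        if ttft is None:
            ttft = time.perf_counter() - start  # latency to the first token
    elapsed = time.perf_counter() - start
    tps = count / elapsed if elapsed > 0 else 0.0  # sustained throughput
    return (ttft if ttft is not None else float("inf"), tps)

# Toy usage with a canned stream; swap in any provider's streaming iterator.
ttft, tps = measure_stream(iter(["Hello", ",", " world", "!"]))
print(f"TTFT = {ttft * 1000:.2f} ms, throughput = {tps:.0f} tokens/s")
```

Against the roughly 200-millisecond conversational budget d'Sa cites, TTFT bounds how soon speech synthesis can begin, while tokens per second governs whether the audio stream can keep up once it starts.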
>> All right. Russ, thanks for coming on theCUBE, the pop-up CUBE here at the Supernova event. Should be a great event.
>> I'm excited for it.
>> We're going to have a great keynote. We're going to have all the experts come through here, and of course, a great party as well. Thanks for coming on. Appreciate it.
>> Thanks so much, John. Really appreciate it.
>> All right.
And that's theCUBE here, talking to all the entrepreneurs and startups. Of course, speed matters, the chips matter, and who you choose matters. Cerebras has got the fastest inference; we're here at their event, Supernova in San Francisco. I'm John Furrier. Thanks for watching.