theCUBE + NYSE Wired: AI Factories - Data Centers of the Future | Chris Stephens, Groq


Chris Stephens

Field CTO

Groq


Introduction by John Furrier at NYSE Studios discussing AI infrastructure and investment trends.

Distinction between two market opportunities: generative AI developers and sovereign AI builders.

Groq's speed and efficiency in deploying AI infrastructure in various regions globally.

Groq's pioneering innovations in hardware and software for next-generation inference compute solutions.

Importance of software systems in deploying AI infrastructure and enhancing performance.

Role of energy efficiency and power consumption in AI systems and infrastructure viability.


In this segment from theCUBE + NYSE Wired’s “AI Factories – Data Centers of the Future” series, Chris Stephens, field CTO at Groq, joins theCUBE’s John Furrier from the NYSE to unpack how AI factories are reshaping enterprise infrastructure and sovereign compute strategy. Furrier notes Groq’s momentum (~$750M raised, a valuation approaching ~$7B, and a new partnership with McLaren) as Stephens outlines why inference is the “killer app” and now a market of its own. He details rapid standups of sovereign AI inference “factories” (~51 days in the Middle East, ~4...


>> Hello, I'm John Furrier, host of theCUBE here at our East Coast CUBE Studios at the New York Stock Exchange. This is part of our East Coast and West Coast Connections. Of course, we've got our Palo Alto studio and we're covering all the events around the industry. AI infrastructure's the hottest thing right now, powering the investment side, the build-out side and the ultimate deployment of AI-native applications at scale. This is part of the future of our data centers, our AI factories, AI large-scale infrastructure series, part of the NYSE Wired program and community that we have a partnership with. We've got Chris Stephens, field CTO of Groq, here with us. Groq we've covered many times. Jonathan and Raj, you've seen them on theCUBE. Chris, great to have you come in remote. Thanks for coming in.

Chris Stephens

>> Great to be here, John. Good to be back.

>> Obviously, as folks who watch theCUBE know, we've been covering you guys and following inference since, I think, three years ago. Inference is the killer app. We both said that at supercomputing when Jonathan first came on. Certainly mainstream. You guys have 750 million raised, close to a $7 billion valuation. Great momentum validation. Then just last week, you guys announced a deal with McLaren to be a partner of their car. Again, there's the hat. Look at that. Love the swag. Looking forward to my hat. Love what they're doing. Again, part of the ongoing momentum, okay, speed. McLaren, good fit there, but talk about the market right now because it's the top news in the mainstream. You can't change the channel, see a video, or read a blog post without hearing about chip investment, the importance of chips, the data center CapEx, the investments that are being built around the world. I mean, the numbers are beautiful and straight up and it's not stopping there, and then ultimately, the scale of the app's requirements. So of the three main stories, you guys are in the middle of it. Great momentum. Give us the current state of the union. You're a field CTO, you're talking to all the designers, the builders. What's the story?

Chris Stephens

>> Yeah. Yeah. I mean, in my role, I'm fortunate to spend most of my time, just about all of my time, out talking to customers. Building on your point, the customers that I'm talking to sit really in two big buckets, and those are all of these apps, innovators, developers, startups that are trying to build and power their applications through generative AI. Then the other half of the world are like what we're doing in the Middle East and what we did recently with Bell Canada, where the notion is building sovereign AI, compute AI factories, as you talked about in the intro. There's so much momentum behind really both of these things, and what I love about Groq is that we're sitting in between both of these market opportunities. We talk at Groq about build fast and I think people immediately associate that with, "Well, I got to go build applications fast, keep my startup powered by the power of Groq and provide a low latency fast experience for my customers." But what we're seeing in the other half of that world is the same thing. I mean, the work that we're doing in the Middle East, we stood that thing up in about 51 days from handshake contract to serving tokens. The work in Canada with Bell was fewer than that, around 40-some days. You probably saw something else recently we announced in Helsinki around the same time you and Jonathan were together in Paris, and we got that thing off the ground in about 30 days. So to be able to bring the power of high-performance inference to customers, but bring that to them quickly, right? How do we stand up sovereign AI in Europe? How do we stand up sovereign AI in the Middle East or other parts of the world? Being able to do that fast is a huge part of the momentum that we're having. Absolutely.

>> Speed is not just serving tokens and speed of inference, it's speed of deployment. You brought up some good points, and this is something that's not talked about enough, but it's super critical: the idea of sovereignty. You mentioned the Middle East. You mentioned Helsinki. You mentioned Canada. I mean, these are obviously countries, but when you start thinking about geography, sovereignty becomes key because there's a network. There's countries. There's deployments. We are predicting at theCUBE and theCUBE Research that Mobile World Congress, early this coming year, will be heavy on sovereign cloud, even though it's kind of a telecom show. You got to move packets around. But these centers, these large-scale compute centers, this is what these data centers are turning into, not just some data center real estate play. This is a design solution, and the fact that you guys are flying these massive clusters out to these countries and dropping them in, it's like dropshipping a data center. I mean, I'm not oversimplifying it there, but you're talking about 60 days, less than 60 days.

Chris Stephens

>> Yeah. You could go find some kind of hype videos of the work that we did launching this thing in the Middle East or in Europe, and you're right, it's almost like dropshipping an AI inference compute center. What's interesting about your point about Mobile World Congress and the telcos is we're seeing so much interest from fill in the blank country name national telco, and a lot of that's built off the work that we did with Bell in Canada. These telcos are not ironically like a tight network.

>> They get a lot of data. They got a lot of data too.

Chris Stephens

>> They got a lot of data and they have a lot of trust with their customers. You can continue to take Bell as a use case. The large government entities, large scale enterprises across Canada already trust Bell with their networking and a lot of the needs they have there. So it's a natural extension to add to that inference compute and to add to that Groq naturally as a part of that. We're seeing the same thing play out with other similar telcos in conversations we're having in other parts of the world.

>> I want to bring up an old-school concept, not to show my age, but I did riff on this with Jonathan once on theCUBE. I want to get your thoughts on it. If you look at AI and the inference side and the AI-native apps, like you said about the software stack, certainly work's being done there, but if you go back to the '90s, late '80s, early '90s, the OSI model, the Open Systems Interconnection model, was a seven-layer open stack concept, but it really had to start at the physical level, the transmission, the wiring, Ethernet versus Token Ring, then TCP/IP. That solidified that movement. What you're getting at here with how you guys are deploying is you got to deploy the infrastructure, the physical plants. Okay. First, that enables the next level. It's like a classic physical infrastructure stack approach, so you got to figure that out first. What's your perspective and reaction to that? Because inference is the killer app. All apps are going to want to handle inference, but you can't just blink it and have it done. You got to lay it down first.

Chris Stephens

>> Yeah, I think that's right. I don't want to lose your TCP/IP thing, sorry, I got to come back to that, but the first thing that we're seeing in the year and a half or so since we launched GroqCloud, and what we've been talking about, is a solidification of inference as its own market, right? You made the point better than I could, which is most organizations, governments, large entities, startups, innovators, all in between are not likely to be doing training. They're going to be doing inference. Every single time a customer's interacting with your application powered by GenAI, that's inference. We all know that and we've seen a solidification of that as a market over the past year and a half or so. Absolutely, your first principle there is spot on. Then when you take it to that OSI example and the parallel that you're drawing to the '90s, we're also seeing the same thing play out 100%. I said I wanted to come back to the TCP/IP thing. If you look at what people are starting to do with agents, you'll find that with protocols like MCP, you can play the internet backward. Before we had TCP/IP, there was not a standard way to share packets around, right? So the internet wasn't what we know it to be today, and then you wind up piling on top of that HTTPS. So now I can do that securely and I can have payments happening. For example, financial transactions happening over the web, as we all know, or at least those of us that are about as old as you and I are, John, understand that there was a time before that.

>> It's true. Yeah.

Chris Stephens

>> Now, we're seeing a very similar pattern playing out on top of GenAI systems where, again, you take protocols that are becoming standards like MCP and then ask how do you build the HTTPS sort of analogy to that, so that agents are not only communicating in a standard way, they're communicating in a secure way, and you can do payments and financial transactions by way of agents. Those things seem to be coming to pass, right? But we see a lot of organizations that are building and experimenting in that space. To your point, all of that then has to make its way back down the stack to be powered by the right set of infrastructure, which then goes back to your point about sovereign, right? If I am Germany and I'm imagining powering all this agentic AI, GenAI economy inside of Germany, then I want that to be done in a way that's secure all the way back to the sovereign compute itself, and that's where, like I said, Groq exists in two markets. That space of building the things like we're doing in the Middle East and with Canada, and now in many conversations around the world, is how do I, as a sovereign nation, and a lot of it is, again, partnership through their primary national telcos, build that for the citizens of my country? We're seeing that conversation play out in basically every country in the world.

>> Yeah, I'm glad. I want to lay that out because I was talking to some folks here on Wall Street, being from Silicon Valley, and they're like, "How big is this market?" To your point, there's two markets going on. Inference is the huge market. I totally see that 100%. Everyone sees it, but they're kind of squinting through some of the noise. The signal is that every app will be using inference, but the data center build-out side is the enabler and I think the crossover point, so I want to talk about how you guys are seeing that. Okay. Deploy, silicon to API, I get that. Great. Boom, big CapEx market, deploy. I got my sovereign cloud, sovereign infrastructure. It has data sovereignty in there as well. You got that in there. What's the next level? Where do you guys see the inference starting to take shape? You mentioned agents. What are some of the use cases? What's some of the roadmap your customers are looking at that they're enabling and taking down that low-hanging fruit? Could you share the progression?

Chris Stephens

>> Sure. I mean, a lot of the early stage use cases are in workflow automation, a lot of looking at customer-facing agentic systems, AI-powered chat to help customers solve problems. There's a lot of that happening. You think in the call center space and in customer service, customer experience, there's certainly a lot of those happening. There's a lot of really interesting things happening. Also, again in this, if you riff off of the comment that SaaS is dead that was made whatever, a couple years ago, two years ago or so, what that begins to mean is that the operation of all these workflows that make up large enterprises and how those become disrupted by self-powered agentic systems is also a pattern that now we're seeing take shape. Customers are beginning to build applications in that space, piloting meaningful applications. Then so that becomes a space where you've got maybe sovereign inference inside of one nation, but there's multiple sites. Now, it's like how do I connect these things together or how do I connect together across, say, one sovereign instance to another maybe in another country, and how do I stitch all of those pieces together is another place where I see things going. Then there's all these native services that customers are asking for that we're starting to put into Groq. We just launched native MCP service. We have a product on Compound that does research. So bringing more and more capabilities into the inference stack so that the thing that customers are getting out of the box is a layer that's solving a whole bunch of those infrastructure problems, again, MCP, Compound research, these things that come out of the box in your inference service, so then you're on that CX layer, application layer, and the services that you need are coming from a provider like Groq.

>> Yeah. What I love about that reference is that you hit a couple points there of inference being kind of a horizontal. You could have inference working to manage intelligence around what data leaves the sovereign area or not. You can have inference on the app performance, which subsystem to go to in the cluster. But I think your point about SaaS is interesting because I've been asking every expert I talk to, is SaaS dead or is it an evolution? But what's come out of it, Chris, and I want to get your reaction to this, is that you have pre-existing software applications, could be a SaaS app or even something like SAP running out there, that are going to be re-infused with AI, so it's almost a retrofit of SaaS, and then you've got AI-native applications that are built from the ground up with models in mind. So as the speed game flywheel kicks in, they're building stacks to handle AI native. What's your reaction to that, what I call the retrofitted SaaS or app market and then the net-new AI native? Because inference will be in every one, but it'll be different, and what does that mean to the cluster of the large scale system? Because this isn't your grandfather's data center architecture. This isn't rack and stack. The system has to be designed to handle fast token serving, intelligent routing, memory, all these things that go into the chips in the new way.

Chris Stephens

>> I think my point of view on this has evolved a little bit. I don't know exactly where I'd place my bet, to be honest with you, right now. What I mean by that is, you gave SAP as an example. The benefit that SAP has, or Workday, ServiceNow, Salesforce, all these enterprise applications, is they do own that workflow today, and so it's incumbent upon them, and they're doing this, to power those applications. You called it retrofit, but if either of them I would...

>> Yeah. They won't like the retrofit.

Chris Stephens

>> It's amazing.

>> The same. Yeah.

Chris Stephens

>> It's incumbent upon them to optimize the workflow they already own by way of the GenAI services the customers are asking for, right? How can I power this SAP process that's running my business? Probably have a whole bunch of autonomous agents. Because I'm SAP and I already know the workflow, that puts them in a good spot to have an advantage in terms of potentially continuing to win in that space. But then you look at your AI natives and you think of the number of companies out there that are 10, 20, maybe 50 employees doing 100 million, some of them approaching a billion in ARR, and their purpose in life is the disruption of that exact same market in that exact same workflow. I think I probably would've bet on the disruptors a little more heavily, say, 12 months ago. I don't know that I'm seeing something change per se, but I'm definitely seeing innovation that's coming from these incumbent players. Whoever ends up winning is definitely going to turn into this... Instead of managing what used to be a paper process by way of some digitization, agentic systems powering these workflows inside of organizations is absolutely a pattern that I see all over the place.

>> I kind of agree with you. My opinion's changed too. By the way, I use the word retrofit. I don't think those guys would like that word, but the evolution because they have data. You mentioned Salesforce, ServiceNow, they've got tons of data. I see if the startups and the upstarts doing AI native could get the data and traction, then it's a level playing field. So that's a question. That's a good point.

Chris Stephens

>> Yeah.

>> Yeah. Let's get into some of the things that you guys are doing. Just give a quick update on some of the latest, greatest on Groq's speeds, reliability, performance. We didn't really get into the product side. What are some of the stats? Can you share some updates on the latest and greatest in the momentum?

Chris Stephens

>> Yeah. I mean, you started off with the round that we just closed, which is a validation around not just the market, but Groq's strategy and roadmap to fit into that market. The number of trillions of dollars that the inference compute market might be over the next five or 10 years, luckily for Groq, that number of trillions keeps growing, right? So, we continue to hope and imagine that that's true. In order to meet that though, our customers are asking us for innovation largely in two paths. What I touched on already a little bit is on the software side. Groq is known for our chip and people tend to associate, rightfully so, Groq equals this LPU that brings all this innovation in price performance in inference, but Groq is really a system and we believe the benefit of using Groq is taking advantage of that whole system. Yes, the chips, but also our custom compiler that makes the management and operation of all these different model architectures a lot easier, wrapping that up in GroqCloud, right? So there's continued innovation down that software stack. I mentioned a couple things with Compound doing deep research. I mentioned the native MCP that we're building. So look for continued innovation in that software stack. Think of your enterprise type services and solutions that you want baked into your inference stack. So that's one thing. Then naturally there's continued innovation that we're driving on the hardware side. Look for step changes in performance that come out based on innovations that we're capable of on the hardware side.

>> I love that point about the software stack being an ongoing differentiator. Just scope the order of magnitude of importance of system software as you guys look at what I call the hardcore software market going down to the physical layer and then the enablement of what's happening at the top of the stack with code assistants. You're seeing more and more capabilities, more aperture opening up for what used to be traditional computer science. The alpha coders are going deeper to the system side. Talk about the importance of that and how does that evolve because MCP is moving very fast. There's pros and cons. People debate that all the time, but it's early. But that's going to get better. That's going to enable interoperability of agents, delegation, all cool stuff. Software enables that. Scope the importance of that for us, the software piece.

Chris Stephens

>> When you asked me that 20 seconds ago, I was thinking about how would I size that, and I don't think you could overstate how important that piece is, which again is why I'm always trying to reinforce with customers that Groq is a system for inference. Yes, we have an innovative chip design. We'll continue innovating in that dimension over time, but you're not buying chips from Groq. Unless you're building sovereign AI like Bell Canada or others that we've talked about, you're probably not even buying hardware from Groq. You can. It depends on your use case. But even there, what you're consuming is the system, and the system is the management of the chips, the optimization of all these different model architectures, be they transformers, be they diffusion, you name it, onto that system of chips through our compiler, and then accessing that in a way that your engineering teams are familiar with, whether that's directly through the API or at least managing and dealing with that through software. To me, I can't underscore enough, to your question or when I talk to customers, the fact that this is a system of software, yes, based on an innovative core piece of IP in our chip, but it's the software of the software, the software that brings all that power to a customer. Imagine the alternative is I'm buying chips and I've got to build a compiler or manage custom kernels and all the things that teams hadn't been used to doing to try to power their AI applications. We're taking all that away and bringing that on ourselves so that you as a customer are focused on your customer experience, the way you're going to deliver value to your customer, and not all that operational stuff that goes into running a large scale inference cluster. That's what all this Groq software is doing, and that's why I can't even... It's almost like incidentally important.

>> Yeah, it's a prompt that has a long answer because I think it's important because this market, this generation is a systems mindset. Every wave has a mindset. Agile, lean startup, whatever you want to call it, this is a systems revolution. It's not just the data center that's exciting, it's the edge, it's the distributed computing paradigm. The system software and the resources, the power required, large scale compute, inference, reinforcement learning, computer vision, all these things, this is new. I mean, it's a generational 20-year journey.

Chris Stephens

>> Oh, at least. My personal background is more coming out of data science than software, and it's fascinating to me the number of conversations that I've had using the phrase joules per token. If you talk to Jonathan, you've probably heard-

>> Yeah, go ahead.

Chris Stephens

>> You probably heard this many, many times. So contemplating the power element, if you build your stack, and on the bottom of it is the power, as a data scientist 15 or 20 years ago, I never thought about that, right? Now you've got to contemplate as an organization all the way down to what's the power consumption of the system that I'm building, whether you're Bell Canada, and that obviously is fundamentally important to what you're doing, or you're a consumer of those things. We see, for example, in other parts of the world a significant concern around how are these things powered? Where does the power come from? How much energy does this system consume? Going back to the Groq hardware layer, that becomes a significant advantage for us. The number of joules that it takes, the amount of energy to produce tokens, is a significant advantage that we have. As the ability to run these data centers and run this compute and inference becomes more and more in demand, the less energy it takes to power those, an advantage that Groq has, I think, is a thing that we're going to continue to lean into.

>> Well, Chris, everyone knows that getting more horsepower is a competitive advantage. It's bounded by power. So you can have the biggest, baddest machine system on the planet, but if it doesn't have the power, it ain't going to work well. So, congratulations. Of course, reliability, you guys have that. Congratulations on the momentum. Looking forward to following up with you and thanks for being part of our AI factories, future of the data center, large scale systems revolution that's happening. Again, it's fun to watch the momentum. The CapEx market's hot, but inference is going to open up and be massive for you guys. Thanks so much.

Chris Stephens

>> Yeah, for sure. Thanks for having me. Again, look for us on the tail of the McLaren car, Singapore Grand Prix this weekend. You'll see the Groq logo on the tail of the car on the wheel.

>> Can't wait. Congratulations.

Chris Stephens

>> We will power them to victory.

>> Yeah, I can't wait for the swag. Look for the hat. Tell Jonathan I said hello and congratulations. Challenge coin to you on this one. Thanks.

Chris Stephens

>> Thanks, John.

>> Bye. I'm John Furrier here at theCUBE. We're at NYSE Studios for theCUBE. Of course, we've got our Palo Alto studio connecting Wall Street and Silicon Valley, tech and money coming together. Large scale build out is happening. This is going to bring in a new generation of applications, societal impact, and of course, better user experiences and moving the needle. Doing our part here at theCUBE to bring the data to you as fast as possible. Thanks for watching.