In this interview from the RAISE Summit 2025, Matt Hicks, chief executive officer at Red Hat, joins theCUBE’s John Furrier to discuss how open source and enterprise AI are converging to reshape infrastructure. Speaking live from Paris, Hicks explains Red Hat’s mission to drive down token costs with smaller, more efficient models, making large-scale agentic workloads economically viable for organizations of every size.
The conversation dives into Red Hat’s latest releases: RHEL for business developers, the Red Hat Inference Server built on vLLM, RHEL AI...
John Furrier
>> Welcome back, everyone, to theCUBE's live coverage here in Paris, France. I'm John Furrier, your host of theCUBE. We are here for the RAISE conference 2025. This is where the global community is meeting to really discuss the future of AI infrastructure, AI software, AI development for software, app dev, agent building. And of course, the entrepreneurial scene is hot, and the leaders are all here as they expand their business opportunities in the enterprise and also out in the consumer space. AI is front and center. Matt Hicks is here. He's the CEO of Red Hat, CUBE alumni. Matt, great to see you in Paris. Normally it's at Red Hat Summit or at your offices.
Matt Hicks
>> Yeah.
John Furrier
>> Great to see you.
Matt Hicks
>> Good to see you halfway around the world for us.
John Furrier
>> Yeah. I just got to say, first of all, I'm super excited and grateful for all the work you guys do at Red Hat. The performance of the company's been phenomenal. I've been watching the success just in the past three years, just the needle moving across the board. I saw some of the Red Hat Linux work you guys have done with the mainframe at IBM. All the work that you're doing with IBM just made IBM stock probably one of the fastest performing stocks. Some will say Red Hat's a big part of it. It is a big part of it. Not the only thing, but a super significant part. You guys are continuing to innovate, and we're here in Paris where it's like mainstream cloud meets on-prem, distributed computing, sovereign AI, sovereign cloud, a lot of mixing of these themes, but all point to one thing: a shift is happening. What's your vision here in France? What are you speaking about? What are some of the conversations you're having here?
Matt Hicks
>> I think a lot of it is AI focused: how do enterprises, individuals, startups, entrepreneurs, how do they think about open source and AI? That has been a major theme. Open source has those advantages in sovereignty and other areas as an enabler to it. And then AI can amplify open source, but it also changes how code's developed. It introduces some other challenges with it. And so, that has been the major topic: how do we get here? How do we do this well, whether it's at an enterprise scale or a three-person startup?
John Furrier
>> Yeah. And you guys are continuing to release... I saw the news: Enterprise Linux for developers, RHEL for business developers launched in line with RHEL 10 GA, AI Inference Server, Red Hat AI, expanded support, multiple models, protocols, OpenShift Lightspeed GA, Advanced Developer Suite, In-Vehicle OS close to GA, partnerships with AMD, NVIDIA, Meta, Google Cloud, Azure, Oracle, Red Hat, all strengthening your leadership. Got that out of the way. So that's all the announcements that you guys have been doing with-
Matt Hicks
>> Yeah. That was good. That was good.
John Furrier
>> Was that good? Okay. Took some notes. I was ready. Also, we've had the Red Hat Summit, so that was good. Okay. A lot going on there, right? So I guess zooming out, the big picture is you've got a lot of entrepreneurial activity. The developer market is so hot. Open source has really opened up and democratized a lot of the AI development. Still some experimentation, huge wave coming into the enterprise. We're seeing enterprises doing a lot, but it's not yet full throttle, and open source is going to be the driver there. So as agents are coming, what's your view on the models as models become programmable and they interact? You've got MCP server, you've got A2A. These are tech features going to bring people together; agents are coming. How do you see the preparation needed for model integration into the software? How is AI helping? And from a Red Hat perspective, you're at the kernel. So if you think like a kernel developer, you look up, you see all that innovation, what's your view?
Matt Hicks
>> So for us, our role in this is pretty simple. We are incredibly passionate about making each one of those questions to an AI model, or token counts, as cheap as possible. That's why we focus on smaller models and open source models. The reason is when you make one call, you know what you're dealing with, but reasoning has been introduced, which does a lot of calls and a lot of tokens behind the scenes. And agentic work is going to add another order of magnitude to that. If the unit price isn't small enough, we don't want enterprises to hit a ceiling in what they can do because the costs just balloon and explode. And so that is where we play in the infrastructure layer. It's a really exciting time.
John Furrier
>> And what are you doing there? Because obviously, the reasoning, multi-step reasoning, and then reinforcement learning with human feedback, all is going to... And then obviously agents there, it's going to create a massive demand for tokens. So obviously, more tokens, as you pointed out, which means more infrastructure. I might have to over-provision or buy a bunch of gear. You're saying get in front of that. What are you guys doing specifically? Because that's a good mission.
Matt Hicks
>> It's simple. Red Hat Inference Server, which is built on vLLM, is to allow any open source model to run on any GPU provider you want and solve the enterprise problem, which is, "I bought some great AMD or NVIDIA cards, I'm running this model, and I can only get to 20% utilization and I don't know why." That's a kiss of death for a nascent AI project. We want to get you to 90% or 100% so you can use what you bought. And then the scaling is we want to be able to let you make models smaller, to make them cheaper just with math, and drive more efficiency techniques. Inference Server is that: any card, any model. And then we announced the llm-d work with a bunch of other partners to make this work across a cluster as well, which you'll need to ask a lot of questions and get a lot of answers out of an enterprise.
John Furrier
>> I wish I had an hour with you because one of the things I've been saying on theCUBE with Dave Vellante on theCUBE Pod is it was easy. Back in the old days, you get a server, you load Linux on it. All right? Okay. That's what you do. Now you have Linux everywhere. So I'm guessing where you're going with that. So you're really thinking about distributed Linux kernels around... maybe we're oversimplifying it or maybe getting it wrong, but it's not one thing. It's not like the old days where I'm running a server; everything's got servers in them.
Matt Hicks
>> Yeah.
John Furrier
>> So is that where you're going with this?
Matt Hicks
>> You could think of there are two... There's the old days, which was servers that we know and love, CPU servers, and you would put Linux on them and you could run any app. And then a ton was built in that middleware, Kubernetes, all of these things. And then servers, we added things like Kubernetes so that we can take a thousand servers and make them operate as a unit. We envisioned the same thing for AI, where instead of CPU servers, you'll have GPU servers and different architectures. Instead of RHEL like we know it, you'll have Red Hat Inference Server or RHEL AI that is purpose-tuned for GPU models. Instead of middleware, they're going to be LLMs that you're running, and you will still need to run them in a cluster. So there are a lot of parallels. The technology just happens to be different at almost every tier. But we like that the de facto standards are close enough that it's in our wheelhouse. We know this space, we've been through it for 20-plus years and processes and CPUs. Being able to extend that to AI models and GPUs, it's exciting to see what people will build on that.
John Furrier
>> And also, one big theme. I was talking to someone at an event, here outside the event at an event party or dinner, and they said they think that we're maxing it, not maxing out, but close to getting the step-up function on the hardware. Yeah, NVIDIA's getting better, CPUs are getting better. The software innovation is where the action's going to come from, and then it points to DeepSeek and many other things. So software is predicted to be the innovative area. How do you see that interfacing with Red Hat's Inference Server? And by the way, power is a bounding function, bounded by power. So if I have some x86 or compute, I might want to manage workloads across those two, their resource.
Matt Hicks
>> Yeah. Well, and your x86 estate is running the applications that will need to make the calls into these models. So these estates have to coexist. When you look at the Red Hat side, vLLM is to make those models as efficient from a software perspective as possible. That's new forms of memory management, KV cache capabilities. But then llm-d, when you expand that across a cluster, it's only a cluster in that the word is the same. Learning how to do that well with AI models and just a vastly different structure, those are the two areas where software will unlock for enterprises, a typical enterprise, what companies like OpenAI have been doing, and we're passionate about that, of being able to let a company achieve the same results that the larger-
John Furrier
>> Where could people go or code with, to connect the dots to take what you said? Because I think a lot of people are in discovery mode right now trying to figure out, okay, I'm building a generational infrastructure. I know Linux, everyone's comfortable with it. They love where Kubernetes has become de facto on orchestration, platform engineering, check, check, check. Now, the AI stack is looking a little bit different. How could people understand it differently? Is there an open source project? Are there certain things at Red Hat that people can get involved in to learn more?
Matt Hicks
>> vLLM is a technical project, and you have to have NVIDIA cards and those, but it's a great starting point. If you're a sysadmin and "I have infrastructure and I can deploy this," it's an incredible community. It's a really good starting point there. If you're an OpenShift user and you're trying to figure out how to make that next tier up, starting at OpenShift AI makes for a much simpler introduction for a lot of these pieces across the cluster. That's where we'll land this technology. vLLM or llm-d is a technical entry point to it. RHEL AI as a single server or OpenShift AI would be the two starting points from a user perspective on it.
John Furrier
>> Got it. All right, so I have to go back in time because a lot of the trends, AIOps was hot a few years ago. A lot of conversations here at this show, I would say, are above, more AI-like native. No one's really talking about a lot of the infrastructure other than the neoclouds and the picks-and-shovels vendors and the GPU clouds. So ops have to run everything. So the big question amongst the hallway conversations is, will these companies ever stay alive? Will they be durable? Because operationally, it's not just AI talking to prompts, you have to run stuff on it. As an operator, not an operator. You're CEO, of course you're not an operator in that sense.
Matt Hicks
>> I was an operator.
John Furrier
>> But as someone operating infrastructure, this is where I see platform engineering bringing a lot to the table. What's the operating infrastructure requirements from Red Hat's perspective and your perspective that needs to be in place? Because a lot's coming on top. One is the calls, you mentioned that. Is there anything else you can share?
Matt Hicks
>> We're starting to reimagine even our core products for this world. RHEL 10, to oversimplify to a really important feature set, it runs immutable as image mode. That's a really big step: if you're running thousands of these in containers and they're volatile, being able to roll back and forward and controllers will be critical. So rethinking the admin perspective of RHEL has been a really important area there. And then second, we announced we will be adding MCP capabilities on top of our products so that if you have orchestrators of MCP, you can talk to those RHEL instances and do things with them. I think that's a level of maturity we know of. We know you will run a lot, it'll be volatile. We're going to take what we learned in OpenShift and amplify it. It'll be different, but we know containers are a common path. We think MCP will be a very strong standard in talking across these. And then part of it is just keeping up with the pace, seeing what people do, learn from it, build it into the products, enable the next level.
John Furrier
>> Yeah, MCP's been a nice surprise this year. It's almost become a de facto standard from the community. And I think that's a rallying point, and that's what we're trying to tease out: do you see a rallying point in the developer community above Red Hat, in the AI stack above you? MCP was one. A2A's getting some traction. There's some difference between state and stateless. That'll be one of those things that they'll let Jim Zemlin at the Linux Foundation figure out. I know he's going to probably add a project there. What's the rallying point in the enterprise around the up stack more? How do you see that evolving? Because startups and enterprises, there's no Kubernetes-like vibe yet. Is there something that you see or that needs to be in place that could motivate the community to, "Hey, let's just all decide, let's do this and then we'll move to the next thing"?
Matt Hicks
>> I'll be honest. I think the thing that will motivate communities is the first real hybrid successes of enterprises saying, "I can use frontier models, but I can also do these pieces myself on small models." And that's very early right now in enterprises. Being able to solve really concrete problems with the infrastructure bets, the running them at capacity, that's where we are, in that early stage of getting the first wins. Because I'm pretty confident if you can solve simple problems on one model with singular calls, you will move into multi-step reasoning. You will move into agentic and you will solve more and more problems. But I think most enterprises, they're not past that first phase yet of, "I know my architecture, I know how to run OpenShift AI. I know my balance of NVIDIA and AMD or Groq going forward." Those are the pieces, I think, we need in place that will then let agentic work really thrive on it.
John Furrier
>> Yeah. Leverage the foundational bedrock, if you will. Could you explain the one call thing? Because I think that's worth revisiting and I thought that was good. So your thesis is minimize the cost per token through calls. Number of calls. Explain that.
Matt Hicks
>> It's minimize the cost per token just from a raw software perspective, because for you to do more powerful things, which we know works, you will make exponentially more calls and more token usage. And so, if we can't keep the token costs at a rock-bottom low efficiency level, you will never get to agentic, because to solve it, your GPU costs will be higher than solving another one.
John Furrier
>> It's utilization. It's just efficiency. It's just good engineering. Matt, I know you got a hard stop. Thanks for coming on theCUBE, great to see you.
Matt Hicks
>> Awesome. It's great to see you too.
John Furrier
>> Congratulations on a very successful Red Hat Summit recently. Of course, theCUBE was there. theCUBE's in Paris right now getting all the action. We've got to get those token costs down because the context windows, the reasoning required, there is a demand for compute. There's demand for GPUs. The software layers are coming. Of course, theCUBE is here. Thanks for watching.
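
The cost argument Hicks makes in the interview (more model calls per request as workloads move from single calls to reasoning to agents, and a higher effective price per token when GPUs sit at 20% utilization) can be illustrated with simple arithmetic. The following is a minimal sketch, not anything from Red Hat or vLLM. Every number, function name, and parameter in it is a hypothetical assumption chosen only to show the shape of the calculation.

```python
# Illustrative arithmetic only: all figures below are hypothetical assumptions,
# not measurements or prices from Red Hat, vLLM, or any GPU vendor.


def workload_cost(requests, calls_per_request, tokens_per_call, price_per_million_tokens):
    """Total cost of a workload given how many model calls each request fans out into."""
    total_tokens = requests * calls_per_request * tokens_per_call
    return total_tokens * price_per_million_tokens / 1_000_000


def effective_price_per_million(gpu_cost_per_hour, peak_tokens_per_hour, utilization):
    """Price per million tokens actually served when the GPU runs below peak throughput."""
    served_tokens_per_hour = peak_tokens_per_hour * utilization
    return gpu_cost_per_hour * 1_000_000 / served_tokens_per_hour


if __name__ == "__main__":
    # Assumed fan-out: 1 call (single question), 10 (reasoning), 100 (agentic).
    for label, calls in [("single call", 1), ("reasoning", 10), ("agentic", 100)]:
        cost = workload_cost(requests=1_000, calls_per_request=calls,
                             tokens_per_call=500, price_per_million_tokens=2.0)
        print(f"{label:12s} {cost:8.2f}")

    # Same hypothetical GPU at 20% versus 90% utilization.
    for utilization in (0.2, 0.9):
        price = effective_price_per_million(gpu_cost_per_hour=4.0,
                                            peak_tokens_per_hour=1_000_000,
                                            utilization=utilization)
        print(f"utilization {utilization:.0%}: {price:.2f} per million tokens")
```

Under these assumed inputs, cost grows linearly with the number of calls per request, and the effective price per token scales inversely with utilization, so moving from 20% to 90% utilization cuts it by a factor of 4.5.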