Jeremy Eder, Distinguished Engineer and Chief AI/ML Strategist, Red Hat
Sally O'Malley, Principal Software Engineer, Red Hat
Snowy Salt Lake City, Utah hosts day one of KubeCon North America coverage, led by Savannah Peterson and Rob Strechay. Returning guests Sally O'Malley and Jeremy Eder discuss Red Hat's collaboration on deploying LLM-powered AI. New Kubernetes working groups focus on efficiency and cost reduction in AI model deployment. Collaboration with IBM on InstructLab aims to simplify fine-tuning AI models. Red Hat emphasizes open source's role in reducing cost and complexity. Projects like Podman Desktop AI Lab and Backstage by Spotify facilitate AI app development. Integrations with PyTorch en...
Savannah Peterson
>> Good afternoon, nerd fam, and welcome back to snowy Salt Lake City, Utah. We're here barreling through day one of three days of coverage of KubeCon North America. My name is Savannah Peterson with my favorite co-pilot, Rob Strechay, steering the plane here to a very exciting conversation up next. But we've had a cool morning.
Rob Strechay
>> Very cool morning. And I think, again, even just comparing this back to Paris and how many advances we've made and how much people are leaning into AI and where you see, yeah, a lot is being done on bare metal and Linux in the big hyperscale-type GPU deployments. A lot of the enterprise is looking towards Kubernetes and in a big way for being able to do their AI, which is perfect for what we're about to talk to.
Savannah Peterson
>> I know. I'm really excited. We've got two Kube VIP veterans over here, both from Paris and from Chicago, I believe is where I saw you.
Sally O'Malley
>> I think so.
Savannah Peterson
>> Yeah. So we've got Sally and Jeremy. Thank you both for taking the time.
Jeremy Eder
>> Awesome. Thank you for having us.
Sally O'Malley
>> Yes.
Savannah Peterson
>> Super busy week for both of you, I know, to the point where you're already starting to lose your voice, Sally. Poor thing.
Sally O'Malley
>> I know. That's why I have my tea here.
Savannah Peterson
>> You're good. We've got your mics. Don't even worry about that. You could be whispering and we'd be able to hear you, but I'm super excited. One of the things that struck me right away when we had a chance to chat and while we were getting set up is how much your role has changed over the last year.
Sally O'Malley
>> Year.
Savannah Peterson
>> I mean, and that's not that long of a time.
Sally O'Malley
>> No.
Savannah Peterson
>> It feels like it's been 10 years in Techland, though.
Sally O'Malley
>> At least.
Savannah Peterson
>> And I think that's reflective also of some of the shifts and true focus right now of what's going on at Red Hat. Tell me a little bit about the journey and what you're working on right now.
Sally O'Malley
>> Yeah, so what's happening right now is data scientists have been doing their thing for a decade longer and us software engineers have been doing our thing. Data scientists don't really know a lot about containers and how to deploy things and I don't know a lot about data science. So this past year has been us coming together and really creating solutions for, now that we have these LLMs that are going to change the world, how do we deploy them? How do we scale them? How do we make them accessible to everybody? That's what we've been working on at Red Hat.
Savannah Peterson
>> It really is, it's a culture of collaboration, but I think this whole community is about celebrating collaboration. Jeremy, I know that's something that you are very passionate about and something that you've been working on since the last time I saw you only six months ago. Two new working groups, a whole lot going on. Give us the breakdown.
Jeremy Eder
>> So in the Paris timeframe, so March, February of '24 timeframe, it was pretty obvious that the killer workload for the next couple of years for Kube was going to be some flavor of inferencing that requires very particular management of hardware that is new to Kubernetes. And so, in a true open source fashion, we worked together to spin up a couple of working groups to do this. Go back eight or nine years, we had similar things in Kubernetes for, we did the resource management working group, which was folded into SIG Node eventually. And then we also did something called a Network Plumbing Working Group. And the idea behind that one was we understand some really heavy hitter workloads need specialized networks. And so that plumbing is in place and has gotten downstream into product and that was workload-agnostic. Now that we have a particular workload that has sucked all the oxygen out of the room and the execs have our attention and so forth, we can go back to some of those principles. And what we did in that timeframe earlier this year was spin up Working Group Serving and Working Group Device Management. And each one of those has a different focus. So Device Management being how do we drive efficiency at a density for the utilization of these really exotic hardware, let's call it. And maybe just plainly speaking, it's expensive.
Savannah Peterson
>> Yeah, exotic was a really choice word there, though. I loved that. That was an exotic car rather than saying it's exorbitantly expensive. That was well played, Jeremy.
Jeremy Eder
>> Very. Yeah.
Savannah Peterson
>> Yeah.
Jeremy Eder
>> Well, I don't write checks around here, so it's fine by me. Call them exotic. They're handling like DRA. That's where DRA is coming in, and I'll come back to DRA in a second. And then in Working Group Serving is where we're trying to, let's talk about an engineering KPI like cost per million tokens served and what technologies are we missing right now to drive that number down, let's call it two orders of magnitude? Some number to make it actually open up the market for people that can come in and actually not make it this have and have-not scenario. I can talk more about Working Group Serving in a second. One thing we're doing in the DRA space is, and DRA was part of our OpenShift Commons keynote yesterday, along with another technology out of IBM research called InstaSlice. And InstaSlice is attempting to solve this problem while DRA matures. So it will eventually be replaced by DRA once that thing comes up in feature spec. But our customers are asking for it now, so we'll have a time-bound solution for it. That's a really exciting thing. And all of that is open, as you can imagine. We just demoed it live. So you can do really impressive efficiency and density improvements, all Kube-native.
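The KPI Jeremy mentions, cost per million tokens served, can be made concrete with a little arithmetic. The following is a minimal sketch, not anything shown in the conversation: the function name, the hourly accelerator price, the throughput, and the utilization figure are all hypothetical values chosen for illustration.

```python
def cost_per_million_tokens(gpu_hourly_cost_usd, tokens_per_second, utilization=1.0):
    """Return the serving cost in USD per one million generated tokens.

    All inputs are illustrative assumptions:
      gpu_hourly_cost_usd -- what one accelerator costs per hour
      tokens_per_second   -- sustained throughput of the model server
      utilization         -- fraction of each hour spent serving (0 < u <= 1)
    """
    if gpu_hourly_cost_usd < 0:
        raise ValueError("gpu_hourly_cost_usd must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")

    # Tokens actually served in one hour at the given utilization.
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hourly_cost_usd / tokens_per_hour * 1_000_000


def target_after_reduction(current_cost, orders_of_magnitude=2):
    """Cost target after driving the number down by N orders of magnitude."""
    return current_cost / (10 ** orders_of_magnitude)
```

With these made-up inputs, a 3.60 USD/hour accelerator sustaining 1,000 tokens per second at full utilization works out to roughly 1 USD per million tokens, so a two-orders-of-magnitude reduction would mean a target of about one cent. Halving utilization doubles the cost, which is why density and sharing of the hardware matter to this number.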
Savannah Peterson
>> Which is compelling and matters. And speaking of doing that, driving costs and complexity down, something everyone in this room wants to have, to see happen. You hear every company, they want AI to be easy to deploy or Kubernetes or whatever it is that we're talking about in our product development. How does the open source community help a big company like Red Hat do that efficiently?
Sally O'Malley
>> Yeah. Well, Jeremy was talking about collaboration with IBM Research. One of the other projects we've been collaborating with IBM on is InstructLab. So InstructLab is an open source project where you can take a small, and by small I mean a seven- or eight-billion-parameter, LLM-
Savannah Peterson
>> Casual, small.
Rob Strechay
>> Tiny.
Savannah Peterson
>> Yeah.
Sally O'Malley
>> And you can bring your own data to it easily. You don't need to be an expert, you don't need to hand your data off to a team of data scientists. So it enables a user to easily bring their data and fine-tune and align a smaller model to make it super useful for a narrowly focused, domain-specific use case. And I really think that's where we're going with LLMs. You were talking about the cost of running them, and with the smaller models, if we can show that they can be super effective, that's going to bring the cost down and make it more accessible to everyone.
Rob Strechay
>> And I think a lot of that has to do with the development that you're doing. I mean, you have two hats on, right? You have your CNCF hat, I guess you could say, in the projects, and then you also have your Red Hat developer hat on. And what you're doing is bringing a lot of this, hey, we need to support other things like different GPUs or different providers and things of that nature. Talk to how you're building that in, because I think it is a bridge between where you were a year ago and where we're going, which is, hey, these smaller language models, not that one, two trillion. But hey, I'm going to train these on my data and I'm going to then put them out at the edge or something like that where maybe it's on RHEL AI or on OpenShift AI to be able to serve my customers more directly.
Sally O'Malley
>> Right. And what we've been doing at Red Hat is providing this trusted, consistent way of moving things around and deploying things everywhere. That's what we've always done. And so with RHEL AI, same tools, same packages, same models are going into OpenShift AI so that you get that consistent experience. And all of our Lightspeed offerings, making sure those feel the same across our products. Those are things that we're working on at Red Hat today.
Jeremy Eder
>> I think getting to small language models is a necessary conclusion. I think what we're seeing is that customers are looking for the foundation models to provide them with the basic knowledge, arithmetic, how to speak certain languages, and then want to operate on their own proprietary data in a very controlled fashion. So that's where the small language models come in. They obviously have a cost impact. Where the bleeding edge of research is now is making sure that we don't lose fidelity or quality as we try and use smaller and smaller models. So IBM shipped a bunch of smaller models called Granite 3.0, I think a month ago, and they're down to a 2.6 billion parameter model. And the question is what use cases might smaller models be useful for and in what scenarios do you need, actually, larger models? And then do we even have the hardware to serve these other things? So one thing we're adding to OpenShift AI at the moment is distributed inferencing using Ray Serve from Anyscale, another open source project, and we use Ray for training already inside OpenShift AI through KubeRay. All these things coming together slowly are opening, again, we're increasing the market and we're trying to meet the market where they want to be, which is a cost number that doesn't turn them sheet white and allows them to continue to have faith in Red Hat and IBM from a security standpoint. I'm really glad you mentioned that. It's a huge leg in the stool. Data transparency is a huge leg in the stool. We saw the OSI recently ship an open source definition for AI and let's just say that that needs to evolve in a certain way and I think there's other frameworks that may make sense to couple with it. One we're looking at is called the Model Openness Framework. So those conversations are for lawyers mostly, but as I understand, when you ask an LLM and you get an answer, you don't expect that. To me, that's how users see data transparency, not so much through the legal lens.
Savannah Peterson
>> I think you bring up a good point. It goes to our earlier chuckle-y conversation about the pizza and cheese sliding off and the glue. That's an interesting, a harmless example of data transparency. Although don't eat glue. Please don't eat glue, folks. I'm actually just going to say that right now. We don't need a Tide Pod moment, but I think you're absolutely right about that in terms of transparency. Something I want to bring up, so I want to ask you this question. I know you have a strong, I mean you are a developer. You have a great developer lens, and when Jeremy was talking about why it's so beneficial to work with open source projects, both of you actually, in terms of driving down cost and complexity, this question is not expected, but it came up as I was thinking about this. Do you think that some of these open source projects are so widely adopted by big companies, like Red Hat or some of the huge massive enterprises that we see here behind us, because they're essentially for developers, by developers? There's been the mindfulness to make it not as complex out the gate, in terms of building out these projects.
Sally O'Malley
>> Yeah, I guess so. With open source, it's at the core of everything we do at Red Hat. It's what we were founded on. You can't really separate. I can't even think about not being open source because I've been at Red Hat for 10 years. It's just how we do things. It's how we know it is the right way to do things. It's the only way.
Savannah Peterson
>> I love that.
Sally O'Malley
>> It's where all innovation happens. I'm not sure I answered your question, but we're all-
Savannah Peterson
>> I think you did.
Sally O'Malley
>> Very passionate about open source at Red Hat.
Jeremy Eder
>> One point I wanted to make is that we have this Backstage open source thing from Spotify. Huge Spotify fan, but especially what they did with Backstage. We productize that through, we call it Developer Hub. Here's where I can see us going. When I was with you in Paris, I asked you a question. I said, "Do you think there'll be an eventual maturity of the data science persona and the MLOps persona into something new and unique?" I see that actually happening before our eyes. And one of the ways we're trying to catalyze that is recasting the data scientist as a bit more of a developer persona. And so if we can use things like Backstage to ... an LLM is useless without an app with it. They're essentially developed together by potentially different people, them converging on a platform like OpenShift AI or others and utilizing Backstage to provide the governance around it and templating and developer efficiency around it. That's what gets us to that mindset where I don't have to fight with the tooling very much. So to your point about developers for developers, that's what Backstage is. Yeah.
Sally O'Malley
>> If we want to talk about developers for developers more, Podman Desktop has AI, this is something that I've worked on over the last year, the AI Lab for Podman Desktop. It's this amazing tool that you can just launch local, LLM-powered applications and see them in your browser. And I teach a course at Boston-
Savannah Peterson
>> Wow-
Sally O'Malley
>> Yeah, at Boston University.
Savannah Peterson
>> That makes it so accessible.
Sally O'Malley
>> Oh, yeah. So I teach a course at Boston University. It's software engineering career prep, and I have 40 students and last week we introduced Podman Desktop AI Lab. They're like a mix of Windows and Mac users. And so I got to see them all download the model. It's a GGUF model, quantized model, like four gig. That's what these Podman AI Lab models are. Download them from Hugging Face and it took like 15 minutes because the wifi at these universities is so slow. So we had to wait and wait and they're like-
Rob Strechay
>> So the Terriers need better wifi. That's what you're trying to say?
Sally O'Malley
>> I know you-
Rob Strechay
>> Note to BU.
Sally O'Malley
>> I love BU, they're great. And then as the LLM was downloaded, it was a matter of then building the model server and the students got to see that, okay, an AI application requires a model and a file server for it, a model server, and so that's running. And then they get their front end, this little Streamlit chatbot, and they go to the local browser and they're just learning what a port is. And so they put it in and the chatbot is live and you see them light up. And that's how I know we're onto something really special here. And it's actually the same thing I experienced in college back in the '90s when Linux came about. I didn't study computer science in college. I had a lot of friends who did, and they were so excited about Linux. I remember they showed me this sed command line thing and I could tell that was really special and that's something that has kept with me and why I went back to the industry all those years later.
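The three pieces Sally describes (a model, a model server, and a front end that reaches the server through a port) can be sketched with the Python standard library alone. This is a hypothetical stand-in, not how Podman Desktop AI Lab is implemented: the stub server only echoes the prompt instead of running an LLM, and the `/chat` path, the JSON fields, and all function names are assumptions made for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubModelHandler(BaseHTTPRequestHandler):
    """Stand-in for a model server: echoes the prompt instead of running an LLM."""

    def do_POST(self):
        # Read the JSON request body sent by the front end.
        length = int(self.headers.get("Content-Length", 0))
        prompt = json.loads(self.rfile.read(length))["prompt"]
        body = json.dumps({"reply": f"(stub) you said: {prompt}"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def start_stub_server():
    """Start the stub on a free localhost port; return (server, port)."""
    server = HTTPServer(("127.0.0.1", 0), StubModelHandler)  # port 0 = pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def ask(port, prompt):
    """What a chat front end does: POST the prompt to the model server's port."""
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/chat",
        data=json.dumps({"prompt": prompt}).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Ignore any system proxy settings so localhost traffic stays local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read())["reply"]
```

Calling `start_stub_server()` and then `ask(port, "hello")` with the returned port shows the round trip; call `server.shutdown()` when done. The port number is the only thing the front end needs to know about the server, which is the point the students were learning.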
Savannah Peterson
>> It's about making it real. I mean, the experience you just described was when somebody realizes they have the power to build anything they want.
Sally O'Malley
>> It's that hello world moment that never goes away for developers, right?
Jeremy Eder
>> That aha moment for InstructLab, I mean, I will never forget it. It was the type of thing where someone was wrong on the internet and I was able to fix it myself. It's empowering. It's just so empowering.
Sally O'Malley
>> You did it with a baseball team and I did it with the hockey club in Salt Lake City. There's a new NHL team in Salt Lake City, so I used InstructLab-
Savannah Peterson
>> Oh, cool.
Jeremy Eder
>> They're playing the Hurricanes tonight.
Savannah Peterson
>> Yeah.
Sally O'Malley
>> I wonder if there's tickets. Anyway-
Savannah Peterson
>> There are tickets. I know the production guys are watching it.
Rob Strechay
>> Former Coyotes, because they used to be at ASU, so that's where they used to play but go ahead. Yeah.
Sally O'Malley
>> I trained the LLM to know about the Utah Hockey Club. The first time I asked it about it, it was like, oh, it's a recreational team that was founded in 1920. And then I trained it and it was like the Utah Hockey Club was started in 2024. So I'm like, it works. Not that I didn't know it worked, but ...
Savannah Peterson
>> It still has that magical, there is that magical feeling. No, I know exactly what you mean. I used to work in 3D printing and the first time your digital design becomes a physical object. That was my favorite part of the experience was watching someone hold their, having pressed print on your brain. And that's essentially what you're doing with this as well.
Sally O'Malley
>> Yeah. And okay, so Red Hat is a whole company of people like this, and we're truly excited about open source and the possibilities. And right now, we're creating a new playground for ourselves that's going to be here for decades to come.
Jeremy Eder
>> The business people are going to come calling at some point.
Sally O'Malley
>> Yeah. I mean that's fine because you know what? Great open source projects create great products. The healthier the community is, the open source community, the easier it is to create.
Jeremy Eder
>> Can you put that one on a T for me so I can jump in. There's adjacent communities that matter too. So PyTorch, which is not necessarily under CNCF, but is going to be critical. And then Linux Foundation AI & Data. Incidentally, that's where vLLM is going to be incubated, which is a key-
Sally O'Malley
>> vLLM is-
Jeremy Eder
>> vLLM is the key technology for us. It's part of our, I keep calling it the LAMP stack for AI. So vLLM will be one of those, PyTorch will be one of those. And the key pieces, those are ecosystems that we feel safe participating in, that have governance in place and stuff like that where we can get an equivalent seat at the table. So PyTorch in the future, integrations between CNCF projects and PyTorch is going to be critical. And then there's other ecosystem projects like LangChain and others that have to be wired into those, I think, into those governance things, ideally.
Rob Strechay
>> Well, again, there's nothing to talk about yet, but you guys are making investments in that vLLM space as well, with acquisitions. And we'll talk about that probably in London, down the road when actually it's real and stuff like that. But I think what's interesting is, like you said, again, it's getting turned into support for AMD, support for other GPUs, later for Intel, for some of the GPUs already doing Nvidia, the time slicing and being able to use that. You're trying to give the easy button to organizations as part of-
Jeremy Eder
>> Absolutely. One of the key pieces, one of the key differentiators, I think, that RHEL AI provides is all of the Nvidia stack built in completely QE'd by us, supported by both companies. That's an easy button to the extent we could do it so far. And we're just going to keep, that's a blueprint that works for our customers and they don't want to fight with that. We did our best on Kubernetes with the GPU operator that Nvidia ships, so that's all fine. But being able to embed it and QE it on site would, not ship a build chain, that's an entirely different supply chain security story that we can start telling for our customers.
Rob Strechay
>> Great.
Savannah Peterson
>> What an exciting time. Okay. I have one last question for you because definitely-
Sally O'Malley
>> That's it?
Savannah Peterson
>> I know. We rush, already over time, but this has been fascinating. I could talk to you all for the whole afternoon. Your poor voice might not make it, but it's done well so far. What do you hope to be able to say? Because obviously so much has changed since the last time I had the pleasure and honor of talking to both of you. What do you hope to be able to say next time we get to sit down, whether that's in London or Atlanta or both, that you can't yet say today? And obviously you don't have to violate any privacy or anything, but what are you hoping to see within the community?
Sally O'Malley
>> Gosh, you want to go first? No?
Jeremy Eder
>> The two areas that are super interesting to me, the bookends for InstructLab, are evaluation frameworks and the data transforming, data engineering ETL side. So InstructLab itself requires data as an input, and then we really do need to be able to apply traditional software engineering discipline to the evaluation frameworks by the next time we are together. We have a really interesting project which went from zero to 8,000 GitHub stars in a week called Docling-
Sally O'Malley
>> Docling.
Jeremy Eder
>> Yeah.
Savannah Peterson
>> First of all, that is quite an impressive-
Jeremy Eder
>> That's pretty wild.
Sally O'Malley
>> I was going to mention that. Yeah.
Savannah Peterson
>> That is wild.
Jeremy Eder
>> I'm glad someone remembered.
Sally O'Malley
>> So Docling is around the data ingestion piece. I said within InstructLab, the user can bring their own data. Well, right now it requires the user to know a little bit about Git and a little bit about YAML. We want anyone to be able to contribute their data. We don't want that to be an obstacle. And Docling is the project that's going to abstract that away.
Savannah Peterson
>> Oh, cool.
Sally O'Malley
>> Yeah.
Savannah Peterson
>> Wow. I mean, it's not surprising that everyone got so excited about it right away.
Jeremy Eder
>> Yeah. And it was well-built with ... this team is not new, that built it. They've been doing image recognition and data transform for many years out of Zurich, I believe. So people who look at it, who know, understand this is a team that isn't from scratch. So maybe that's one of the reasons. At the same time, it's an open source project. Anyone can take it and use it today. We're thinking about productizing it as the ingest side of the problem and Red Hat tries to solve that.
Sally O'Malley
>> Yeah.
Jeremy Eder
>> Yeah.
Savannah Peterson
>> Oh, exciting. What about you, Sally? Anything you want to be able to say secretly?
Sally O'Malley
>> I actually was going to talk about making data ingestion easier ...
Savannah Peterson
>> Oh, perfect.
Sally O'Malley
>> Making that more accessible to everyone.
Jeremy Eder
>> Well, it came up.
Sally O'Malley
>> It did.
Savannah Peterson
>> What a lovely team, the two of you are. Jeremy and Sally, thank you so much for hanging out. It really is a pleasure and I feel like we've learned so much from you both. And Rob, same. Likewise with that. You're always full of fun facts like I said.
Rob Strechay
>> Always fun.
Savannah Peterson
>> I know. Always fun. I hope all of you are having fun, wherever you might be, on this beautiful day here, day one of KubeCon in North America in Salt Lake City, Utah. My name's Savannah Peterson. You're watching theCUBE, the leading source for enterprise tech news.