Understanding Today's Digital Business With Dynatrace | Alois Reitbauer, Dynatrace | KubeCon + CloudNativeCon NA 2025


Alois Reitbauer

Chief Technology Strategist

Dynatrace

Rob Strechay

Dir./Principal Analyst & Host

theCUBE Research

Increased testing of models and agentic AI leads Dynatrace to reshape observability in the enterprise

Dynatrace Inc. built its place in the technology world by providing observability and security for traditional workloads. That mission has now come to include support for artificial intelligence workloads with AI-powered observability, which has given the company a prime position from which to observe the latest trends.

Dynatrace’s Alois Reitbauer speaks with theCUBE about observability.

“We see a change in how people are building applications,” said Alois Reitbauer (pictured), chief technology strategist of Dynatrace. “In the past it was basically OpenAI, you used OpenAI and then it started to switch to other models




In this KubeCon + CloudNativeCon North America 2025 segment, Alois Reitbauer from Dynatrace joins theCUBE’s Rob Strechay to dig into what it really takes to move AI applications into production on Kubernetes. Reitbauer explains how “AI-native engineering” and “AI-native operations” are emerging as teams A/B test different models, wrestle with token costs and capacity constraints, and design smarter guardrails for sensitive data such as medical records. He introduces the idea of an “agent scorecard” to track whether AI agents are actually delivering business v...


Rob Strechay

>> Hello and welcome back to KubeCon, CloudNativeCon North America, still in North America, and it's 2025. We're still in Atlanta. We're finishing things up strong here on day three where we're bringing things to a conclusion. I'm excited because I get to talk to Alois Reitbauer-

Alois Reitbauer

>> Yes.

Rob Strechay

>> from Dynatrace. Again, you're out talking to customers all the time in your role there and you're really looking at what's coming next, especially with observability and how that really addresses business value back to customers. I love how you guys really make those connections back. What have you been hearing this week as we look at, the big thing that has been talked about all week long has been AI and inference and how Kubernetes is really going to be the platform for that going forward? How have you seen those conversations going this week?

Alois Reitbauer

>> So our conversations are actually twofold. It's a lot of AI related conversations right now. Number one is really helping customers now to move things into production or their AI applications into productions. As we know, a lot of them are not in production today or don't even make it to that level of maturity. I think that is still going on.

Rob Strechay

>> Right.

Alois Reitbauer

>> We also see a change in how people are building applications. In the past it was basically OpenAI. You used OpenAI and then it started to switch to the other models. Now we see people experimenting way more, like A/B testing models, and the practice of, I would say, AI-native engineering is evolving, and at the same time we move towards these AI-native operations, SRE practices. What do we deal with in production? And what we see there is obviously that there's still the conversation about performance, about problems people are having, obviously token usage in two areas: cost, AI is expensive if you run it, but also capacity, where can you actually run it? This is also still a scarce resource. But the main shift in the conversation is: can you tell me that this actually has value, that this actually works for people? Are they happy with the resource that they get? Are the applications really paying off or do we still need to fine-tune on those applications by the output that they're providing?

Rob Strechay

>> Yeah, we were talking before we went live here and I think I liked how you talked about, hey, you've got to find product market fit for your AI before. Which those of us who've been at startups, that's always the challenge is doing that. And I think that's why when I looked at the MIT study where it's like, oh, 5% is in production, that didn't bother me because I think that's people trying to find product market fit for their AI agents. But for that 5%, the stuff that does make it to production, there's a lot of stuff that, there's a lot of guardrails and governance and visibility that people need to have into that. What are you hearing from that? Because you guys really play deep in that space.

Alois Reitbauer

>> Guardrails are key, and guardrails started to emerge very early on. I think we will also see a lot of innovation still happening on guardrails. Sometimes they're a bit too static, yet we will see more dynamic analysis happening on top of that data. Just to give you one example, a guardrail can catch PII data like your medical health records. You could say, okay, the guardrail would flag this information, but it's fine if you see your own medical data as intended. If your doctor sees your medical data, that's also fine. If the assistant at the doctor that you're going to sees it, it might still be fine. But if another doctor sees it, it is not. So depending on which type of sensitive information we deal with, I think we still need to figure out how do we track this? How do we understand this behavior? And I think right now we are still pretty much statically tracking as we work through systems. Now really thinking to the next step about agentic, we have to track against goals, and I think that's where business observability comes in. You're delegating a task. You're not looking AI over the shoulder saying, "Hey, you have done this, this, this and this and you shouldn't have done this, this is not good." With moving towards agentic, you would give it a goal, like a business goal. "I want you to onboard this customer onto a platform, help them do something with them." How many customers get onboarded? So I have this high level concept that I'm using, almost like an agent scorecard, that will say, okay, how good are you at achieving your business goals? How often do you run into guardrails? How much did it cost? What was the performance that you were using? What was the response time that you were getting? The response time to the end user, not so much in the technical sense, but assume you're in a chat, you need something. If this takes you 25 minutes, you don't seem to get any value out of it. And I think we start to establish this practice.
And especially now moving to agentic, I think we don't have those best practices yet. To that point, even with the simpler RAG-based applications, we are just getting there, establishing it: how do we measure the value? But I definitely see the shift from do they technically work to do they provide value to the end users of those applications?
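
Editor's illustration: the "agent scorecard" described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Dynatrace's implementation or any product API. The names `AgentRun` and `scorecard`, and every field name, are hypothetical; only the four questions they answer (goal achievement, guardrail hits, cost, response time to the end user) come from the interview.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class AgentRun:
    """One delegated task, e.g. onboarding a single customer (hypothetical record)."""
    goal_achieved: bool      # did the agent reach the business goal?
    guardrail_hits: int      # how many times a guardrail fired during the run
    cost_usd: float          # cost attributed to the run (assumed unit: US dollars)
    response_seconds: float  # time until the end user got a usable answer


def scorecard(runs: list[AgentRun]) -> dict[str, float]:
    """Aggregate runs into the four questions raised in the interview."""
    if not runs:
        raise ValueError("need at least one run to score an agent")
    return {
        "goal_success_rate": mean(1.0 if r.goal_achieved else 0.0 for r in runs),
        "guardrail_hits_per_run": mean(r.guardrail_hits for r in runs),
        "avg_cost_usd": mean(r.cost_usd for r in runs),
        "avg_response_seconds": mean(r.response_seconds for r in runs),
    }
```

Calling `scorecard` on a list of recorded runs returns one number per question, which keeps the focus on outcomes per delegated goal instead of on individual steps the agent took.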

Rob Strechay

>> Right. And I think I've had the pleasure of talking to a bunch of the Dynatrace customers about a number of things. And I think one of the things that always has really impressed me and is exactly what you're talking about, how you connect the business outcomes and things like customer satisfaction, or CSAT, back to how the application or in this case the agents are actually performing. Are you seeing that customers are starting to lean in and try to make those connections so that they can understand are they getting value out of that?

Alois Reitbauer

>> I think they have to because you have to hedge your bets. In the beginning, when everybody started to work on AI, there was this big FOMO: we need now to take budget from here to there to make AI. But not only that, you took budget away from another project, and that project had associated business goals. Now you need to meet that same business goal. And I think also I recently did a talk and one of the questions I had on the screen was, to the audience, do you think AI is going to be a competitive advantage in your industry? And what do you guess people said?

Rob Strechay

>> I'm going to assume that almost 100% said yes.

Alois Reitbauer

>> Yes. And my answer is no, it isn't. It's going to commoditize, it's going to be the standard of how we build things. So you have to get it really good really fast. So you now have to really build up the expertise and the practice in a field that is evolving super fast. Like with the model example, people are now testing different models, you have to test them in production because that's where you're getting the data from. You need to work with better data sets. I think that's what companies have to realize. What you assume is a competitive edge by building something will be commoditized at the speed at which we're working on very quickly. So it's not a nice to have to be good at building AI native application, it's going to be a must have if you want to succeed in the industry that you're in. And that's why we have to very quickly learn and fail, I think quicker than with any other technology that we have been using in the past.

Rob Strechay

>> Yeah, I mean I liken it that I use it that everything that's old is new again kind of premise. So we've gone through these cycles of, hey, we went from three tier applications, then we went to distributed services and now we're going towards, hey, these services can actually build themselves and things like that with these agents that are codifying agent skills potentially and building them and you got to kind of make sure that they don't go rogue on that kind of stuff. To me, that would seem to be that if you don't have the visibility into what's going on in those black boxes that you've now built, you could get yourself into some very difficult times either for not understanding why it's acting the way it is or it goes off and does something it's not supposed to do.

Alois Reitbauer

>> And I think the engineering practice is also pretty young. So I think just understanding... I think one of the very positive trends was that AI models now actually share a train of thoughts that helps you to understand how the model arrived at where they arrived, pretty much. I think that helps us a lot on the debugging side. But debugging AI in agentic applications is kind of different. Why did it behave one time like this and the other one? You need to look at individual transactions. And the more we move into more dynamic systems, like going more into this agentic world, the more the individual transactions will be different. So detecting anomalies will get harder. Every transaction is almost a snowflake. So how do you see what is very similar versus what is really significantly different? I mean, all of those things can be built. But that's what I think people have to learn: how to work with the data and obviously how to properly adjust to that type of system behavior. So I think that's really where we are, learning as we are rolling it out. And the other trend that I see, it's again what we see very often in technology, and you mentioned platform engineering before, I see something similar happening now when we talk about observability and using agentic AI in observability, that everybody's now trying to custom-build platforms. Just as we had at the beginning of platform engineering. Now I'm going to build my own IDP, I'm going to build my own IDP. And people realized while they were stitching stuff together, it worked quite fine, but it turned out to be a lot of work, a lot of engineering time went into it, and there are actually products out there that can do it. I see the same actually happening right now in the whole agentic application space, especially where that's not core to your business.

>> Yeah, I agree.

Rob Strechay

>> I think I was talking to somebody and in fact they were talking about did you build your own IDP or did you roll one of the ones there? And everybody at BackstageCon talking about all of that type of stuff. And I think pieces in making it reusable, because I think a lot of the toil falls on SREs and what they're trying to... It's almost unknowable how they have to have these skill sets that are so broad these days. Are you seeing that in the customers you're talking to and prospects that, hey, they're looking for things that can help them just understand the landscape, bring it back in a way and tie it back to the business that makes a lot of sense?

Alois Reitbauer

>> Yeah. I wouldn't necessarily say that the SREs... Well that's maybe the wrong statement. But cares too much about the business, but the business is the number one thing to think about it in the morning, that they do think about the business of their company, but they need to keep systems up and running. We keep adding more and more complexity and they have to run them. And one conversation that we have a lot when we talk about agentic applications or AI native applications, we want to monitor the same way as the rest because we can't add more complexity at certain layers and we need also AI to assist us. I think that's where for me I'm not so much I'm in the house of, okay, AI is improving efficiency. I think nobody who's building an AI application or is using AI should think about efficiency first. I'm more do you improve the quality of the work output? If you can make an SRE more efficient by using AI, that is way more helpful. You're trying to trigger, for example, a remediation action on the system. Yes, AI can help you to trigger that action. It can prepare it for you. You're only reviewing it, executing it. That is one step. That's what I hear a lot of people talking about. But where I think we eventually need to get to is AI challenging you and telling you, "Hey, the last time somebody tried this on the system, these were actually the side effects. Let me check why this might happen again today." So your overall decision process, not for the immediate mitigation, but remediation might actually get longer, but the decisions that you take and how you modify, tune the systems are actually getting better. And that's what I don't see is a lot of people looking at, okay, we get more efficient, we can do more with less, but there's very little conversation can AI actually do something better by more or less submerging information to people who have to take decisions more efficiently out of these very large systems. 
I think that's way more the conversation that we should have. How can we help people to take more informed decisions, still optimizing the ones that we can automate? But the other part I think is very much neglected the same way where, I think that's an interesting shift right now, everybody was talking about development, help people to be more efficient writing code with AI. Now we see the shift, "Hey, actually 80% of our time is spent on actually running and maintaining them. If we get even half the advantage in there, that has an order of four magnitude more impact."

Rob Strechay

>> Right. Yeah. And I think that's exactly it. I also think there's a huge, not just skills gap, but lack of skills and skilled people in that area of running the stuff. And people, it's a place that honestly, most companies don't want to invest more money in. So if you can have an AI agent or as I keep calling them AI buddy, to help out those SREs to really achieve more efficiency, that to me is really key. I mean, you guys have had AI in your product forever. I mean, again, and I think when you start to look at the learnings you have internally going through this process and then applying it out to customers, how do you see the things that customers are coming to you and asking? It's like, yes, we understand that because we went through this process as we built our own software and we're actually doing this as well, and this is why we think if you go down this path you can be successful.

Alois Reitbauer

>> Yeah, I think that's where we help them. Number one step is you have to obviously get a lot of the load off people so they can focus on other tasks. But our focus was really, okay, there is a task. It is very well understood. It can be easily automated. There's no reason for somebody to do it. But from day one, we started to have explainability in there. That somebody can take, does this decision actually make sense? Can I trust a decision that was made and do I actually understand it. So it's not just it's just creation, it happened to be like this. Because also you want to have this feedback loop so that people also learn how decisions are taken. And also I think especially the output quality in the context you can provide becomes even more vital right now as we start to connect AI systems. Like, in our case, we take our predictive and causal AI, now the generative AI or AI agents that now can take or propose action. Suddenly that context becomes so important because depending on how much content you provide, the quality of this action is getting better. And that's where we see that they're interested. They want to go eventually to zero incident policy, that's what you hear from a lot of them, AI taking care of more. I think once they're done with this... We will see it in three phases. We see first it's remediation. If something's burning, you have to take out the fire. Step number zero, fix what's broken. Step two is going to preventative, we see more people are going to preventative operations, ensure that the system wouldn't even fail because you have predictive AI, you understand how a system might potentially fail, what might fail there, you're taking it there, and then you move into optimization. I think it will up-level the entire industry, but so many people are stuck taking fires out on a day-by-day basis that they never get to this next level. 
It's almost like, I hate the word, but like a maturity curve to move from A to B to C, and that's where we start. We free up their time so that they can start to focus and even get strategic. If part of your infrastructure is down right now, or part of your applications are down right now, you're not thinking how could I optimize my system to run 20% more effective? And you never get there. And then we increase the pace, we increase the complexity, and that's where we need to help them.

Rob Strechay

>> I totally agree. And I think, again, it's always fun to talk with you about these things because you see it, we actually see it very alike, but you're talking to these customers all the time. We're talking to organizations. Last question here. When we're together either in Amsterdam or a year from now in Salt Lake City, what do you think we're going to be talking about?

>> I think the topics won't change necessarily.

Alois Reitbauer

>> We will see more people being in production. I think there will be, again, a shift throughout this year. It's going to be interesting what it's going to be. We will talk, I think, also a bit more about how do we get production feedback to developers in AI? That's a big topic. It's really the first technology that's so heavily focused on production data. It's so hard to test. It's almost the extend left but give them the data from the right. How can we do this? I think that's something we're talking about. And I hope we'll see the maturity in the industry, on the coding side but also the operations side, where people don't talk about the vision and what can be built, where we will start to hear from the first people, okay, this is what we built. This worked really well. We tried this. We are not yet there. So there will be some, okay, we had this great vision. It didn't work. So some more reality check for people. I think it's also the maturity, how the industry will go, what can actually be done. People thinking more long term. We see people even on our side going more into this auto-remediation. We have some who have it in production for moderately simple use cases. We have very few customers who have very advanced use cases, but we will learn as we go, how we want to go there. We will see more companies adopting it. We should have more conversations about what people have learned versus how they anticipate to go in that direction.

Rob Strechay

>> Yeah. I can't wait to be talking about this again with you. Alois, you're awesome to talk to about this stuff.

Alois Reitbauer

>> Thank you.

Rob Strechay

>> Because I think it's moving so fast and I think this value is key. And I love that you brought up the explainability, which has been absolutely not around in this conference this entire week, and it's hugely important. So thank you again for coming on board.

Alois Reitbauer

>> Thank you for having me. - Yep.

Rob Strechay

>> And thank you for watching KubeCon, CloudNativeCon North America 2025 from Atlanta. We're done here, but you can go back and watch all the episodes on theCUBE.net and on YouTube. Stay tuned to more from theCUBE, the leader in tech analysis and news.