In this KubeCon + CloudNativeCon North America interview, theCUBE’s Rob Strechay meets with automotive platform engineering leader Gaurav Saxena to discuss how open source is powering large-scale, real-time vehicle platforms. Saxena details why OpenTelemetry sits at the heart of a cloud-agnostic strategy spanning AWS, GCP, Azure and on-premises. By correlating traces, logs and metrics, enterprises get end-to-end visibility across customer journeys – from mobile apps to over-the-air updates in millions of vehicles.
The discussion looks at the practical patterns teams can apply immediately: separating operational vs. analytics data with hot/cold tiering, running large fleets of OpenTelemetry collectors with Kubernetes operators and building zero-trust pipelines with software supply chain security. Saxena explains how inner-source practices align platform and app teams so developers focus on business logic while the platform optimizes placement for cost and latency. The conversation closes with a look ahead to the main Atlanta event, with AI and machine learning revved up and ready to be pushed to the vehicle edge.
Gaurav Saxena, Automotive Industry
Keep Exploring
- What role does open source play in your strategy, and how does it affect your cloud capabilities?
- What are the challenges of integrating telemetry from diverse environments using OpenTelemetry, and how is platform architecture designed to normalize and correlate that data?
- What are some best practices for deploying and managing OpenTelemetry from a platform engineering standpoint?
- What challenges exist in managing OpenTelemetry collectors at scale, particularly in relation to global operations and data residency regulations?
- What measures can be taken to ensure that data collectors are managed securely, comply with regulations, and are deployed without vulnerabilities, particularly in sensitive environments like automotive systems?
>> Hello and welcome to this AnalystANGLE. I'm Rob Strechay, Managing Director and Principal Analyst with theCUBE Research. Really excited to be here gearing up for KubeCon and CloudNativeCon North America in Atlanta. It's really going to be the epicenter of what's next in cloud native innovation. This year will be a significant turning point for the industry. We'll see AI and Kubernetes come together in ways that are reshaping the entire stack. Platform engineering is transforming developer productivity. Observability is becoming more intelligent and automated, security is moving toward true zero-trust architectures, and edge and hybrid clouds are unlocking entirely new deployment models. Over the next few minutes here, you'll get to hear us break down some of the trends, and we'll be talking to some leaders and users of this technology to really dive in deep and get a better understanding of what's going on. Right now, I'm excited to welcome Gaurav Saxena, who's an engineering leader at an automotive industry leader. So Gaurav, really glad to have you on board. Exciting stuff going on. I know I'm going to get to see you pretty soon down in Atlanta in just a few weeks here, so welcome on board.
Gaurav Saxena
>> Thank you, Rob. Thanks for having me over here.
Rob Strechay
>> So as a platform engineering leader, how do you view the strategic value of things like OpenTelemetry and other open source projects in reducing fragmentation and building a unified internal developer platform? Is open source now foundational, or is it still supplementary, in enterprise-scale platform strategies for what you're doing with your platform team?
Gaurav Saxena
>> Yeah, open source is really paramount to our strategy here. We actually embrace OpenTelemetry for all of our observability needs. We are one of the biggest users of the CNCF projects, you name it: Crossplane, Kubernetes, OpenTelemetry. The reason it helps our strategy and vision is because we are also cloud-agnostic in a way. We can run our workloads in any of the hyperscale cloud providers, AWS, GCP, Azure, or in our on-prem data centers. So having unified open source tooling gives us the power to deploy in any cloud that we want to. That's the core of our strategy here.
Rob Strechay
>> Yeah. And we hear that a lot, and I think that, to me, is one of the great things about Kubernetes and KubeCon and CloudNativeCon. I've been arguing that it should be just CloudNativeCon, I mean. Again, Kubernetes is now 11 years old, and some of the platforms that have been out there are also over 10 years old now. I think it's a lot of fun, but like you said, let's dive in a little bit more, because I think you have a really unique perspective. I think the world is hybrid or multi-cloud, and if they're not there yet, they're going there. Platform teams must support everything from Kubernetes and serverless to legacy and SaaS. What are the biggest challenges in bringing together telemetry from these heterogeneous environments using OTel, or OpenTelemetry, and how are you designing your platform architecture to really normalize and correlate that data?
Gaurav Saxena
>> Yeah. The way we are using OpenTelemetry in my industry is that we use all three pillars: traces, logs, metrics. Profiling is right now in the special interest group; it's getting fleshed out in terms of the spec, but it's coming out soon. We use all these three signals in as correlated a way as possible, so that when we have to troubleshoot any customer journey workflow, we are able to deep dive into the root cause analysis, whether at the time of debugging a production incident or just trying to find out some analytics of how that user is interacting with our platform, whether it's our own internal teams or the driver of a vehicle. We also ingest the data from a vehicle back to our cloud platform to analyze the sensor data of the vehicle, so our development teams can take a look at the data, do predictive maintenance, and send over-the-air updates and commands back to the vehicle to make the vehicle function properly. And this end-to-end distributed tracing helps us navigate the bottlenecks, if our internal systems are not performing well enough, and make them better, or track the downloads that we are sending to the vehicle in an efficient way. As you can imagine, we have millions of vehicles on the platform. How do you manage them at scale to make sure that those vehicles are getting the proper updates, and that those vehicles are sending the data we need in order for us to make it reliable? That's where the distributed nature of our tracing plays a very important role. So we have a large fleet of OpenTelemetry collectors that power these applications.
To answer the second part of your question, the challenge here is that when we go and update our own Kubernetes-based infrastructure of OpenTelemetry collectors or operators, updating them at scale, a whole fleet of collectors, while still serving the traffic and still collecting the traffic with zero downtime is a challenge, because observability is not an afterthought; it's a first-class citizen. You want to make sure that when you are serving your live customers, any real-time data that you want to take advantage of gets into the system as fast as possible. So latency is also a challenge: how fast we can get the data from vehicles to the cloud platform. The upgrade maintenance and the latency are the major challenges that we are trying to tackle to make sure that our systems perform better.
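The correlation Saxena describes, tying logs and metrics to a trace across a customer journey, hinges on propagating one shared trace ID through every signal. A minimal, stdlib-only Python sketch of the idea (a real system would use the OpenTelemetry SDK; the event names and helper functions here are illustrative):

```python
import logging
import uuid
from contextvars import ContextVar

# The active trace ID, propagated implicitly through the call stack.
current_trace_id: ContextVar[str] = ContextVar("trace_id", default="")

class TraceIdFilter(logging.Filter):
    """Stamp every log record with the active trace ID."""
    def filter(self, record):
        record.trace_id = current_trace_id.get() or "none"
        return True

metrics = []  # stand-in for a metrics exporter

def record_metric(name, value):
    # Metrics carry the trace ID as an exemplar-style attribute.
    metrics.append({"name": name, "value": value,
                    "trace_id": current_trace_id.get()})

def handle_ota_download(vehicle_id):
    # One trace per customer journey (here, an over-the-air update).
    current_trace_id.set(uuid.uuid4().hex)
    logging.info("manifest download started for %s", vehicle_id)
    record_metric("ota.bytes_downloaded", 1024)
    logging.info("manifest download finished for %s", vehicle_id)
    return current_trace_id.get()

logging.basicConfig(level=logging.INFO,
                    format="%(levelname)s trace=%(trace_id)s %(message)s")
logging.getLogger().addFilter(TraceIdFilter())
tid = handle_ota_download("VIN-123")
# The same trace ID now links the two log lines and the metric sample.
```

Root-cause analysis then becomes a query by trace ID: every log line, metric sample, and span from one journey shares the same identifier.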
Rob Strechay
>> I mean, I think what was great about what you were just describing is that it's all of the above that you have to deal with. You have a lot of homegrown stuff, like you said. You have the apps that are on phones for different information, interacting with the car or interacting with service or support or what have you. So it's a very complicated set of systems, to put it mildly. But with OpenTelemetry, and I think you kind of hit on this a little bit, driving almost exponential data generation, how are you managing the cost implications of ingesting, storing, and analyzing observability data at scale?
Gaurav Saxena
>> Yeah. So there are two parts to answering this question. One we call operational data, and one is called analytics data. So there are two fragments of data here. To make sure that your costs are not hitting the roof and are manageable, you want to fan out these two separate categories, the operational data and the analytics data, into separate buckets. For example, if I'm serving an over-the-air deployment to a vehicle, that's operational data for me, because I want to understand: is the update, the manifest file that I'm sending to the vehicle, getting downloaded by the vehicle, and at what increments of time? And I also want the notification when that happens. I want to track this as fast as possible so that I can find out about an issue before the user of the vehicle has to report it. That data gets sent into what we call the hot storage, meaning that's where we need latency as low as possible. Now, when the download has been done, the end results of that we actually fan out via a separate OpenTelemetry collector and put into the cold storage, so we can run analytics 30 days from now, six days from now, because-
Rob Strechay
>> So you're sampling all the time and what you're doing is data tiering off so that you don't have ... You're processing it all centrally, but you're basically tiering it to the right place for the right cost and right amount?
Gaurav Saxena
>> Yeah.
Rob Strechay
>> Yeah, I think that makes a lot of sense when you start to look at it. I think it can get out of hand. I mean, I was talking on another podcast about this in the security realm and the security observability data and just the old thing was, "Oh, we're just going to all shove it into one big data store and we'll keep it forever kind of thing." And then people are like, "Oh, wow, our cloud bill is really expensive right now and it's all just this legacy data. Do we tier it? Do we start?" And this is stuff we've dealt with in "big data."
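The hot/cold fan-out the two have been discussing can be sketched in a few lines. In practice this routing would live in an OpenTelemetry Collector pipeline configuration rather than application code, and the event names here are made up for illustration:

```python
# Operational signals go to low-latency "hot" storage for real-time
# troubleshooting; completed results go to cheap "cold" storage for
# batch analytics run days later.
hot_store, cold_store = [], []

OPERATIONAL = {"ota.download.progress", "ota.download.notification"}
ANALYTICS = {"ota.download.result"}

def route(event):
    name = event["name"]
    if name in OPERATIONAL:
        hot_store.append(event)   # queried in near real time
    elif name in ANALYTICS:
        cold_store.append(event)  # analyzed 30+ days later
    else:
        cold_store.append(event)  # default to the cheap tier

route({"name": "ota.download.progress", "vehicle": "VIN-123", "pct": 40})
route({"name": "ota.download.result", "vehicle": "VIN-123", "ok": True})
```

The design choice is exactly the tiering Strechay summarizes: process everything centrally, but land each signal in the tier that matches its latency need and cost budget.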
You and I have met up at the Databricks conference and had a chat there about this as well, which we'll broach in another AnalystANGLE. But I think from this perspective, when you look at it from a platform engineering standpoint, what do you see as some of the best practices for deploying and managing OpenTelemetry collectors at scale? And how do you really ensure reliability and performance without overwhelming your infrastructure or your budget? Because I know it's a delicate balance.
Gaurav Saxena
>> Right. So fortunately, the open source community around the CNCF has a lot of tooling around managing your OpenTelemetry collectors at scale. Now, we as an end user have a duty to make sure that the contributions we make here get back into the community, in this case around managing the scale of the OTel collectors. It's still a largely unsolved problem in my point of view. We are talking about running thousands of Kubernetes clusters, and not just in one region; the industry that I work for has a worldwide footprint, so we also have to take care of the data residency regulations, GDPR compliance, and data from Europe can't come to the US, right? And stuff like that. So how do you make sure that not only are you managing the collectors at scale, but also that the data has proper access controls and compliance, and is also secured? The OpenTelemetry collector is just one piece of the puzzle. How do you deploy your collectors in those regions to make sure that they have no vulnerabilities? Because think about it: the automotive, the car that you're deploying to, can't run unsecured software. You talked about that in your opening remarks about zero-trust, isn't it?
Rob Strechay
>> Right.
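The provenance and attestation questions Saxena is about to walk through reduce to two checks: did the bytes change, and did a trusted party sign the digest? A toy Python sketch of that idea (real pipelines would use tools like Sigstore/cosign and in-toto attestations; the key and image bytes here are placeholders):

```python
import hashlib
import hmac

# Illustrative signing secret, not a real key; production signing would
# use asymmetric keys and a transparency log.
SIGNING_KEY = b"platform-release-key"

def sign_digest(digest: str) -> str:
    return hmac.new(SIGNING_KEY, digest.encode(), hashlib.sha256).hexdigest()

def verify_artifact(image_bytes: bytes, expected_digest: str,
                    signature: str) -> bool:
    digest = hashlib.sha256(image_bytes).hexdigest()
    if digest != expected_digest:
        return False  # image was modified after the build
    return hmac.compare_digest(signature, sign_digest(digest))

image = b"otel-collector image layer bytes..."
digest = hashlib.sha256(image).hexdigest()
sig = sign_digest(digest)  # produced once, at build time

assert verify_artifact(image, digest, sig)
assert not verify_artifact(image + b"malicious", digest, sig)
```

The same check gates every regional rollout: a collector image that fails digest or signature verification never reaches a region that talks to vehicles.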
Gaurav Saxena
>> How do you make sure that your collector has authenticated with the right set of vehicles that you're trying to reach? How can you make sure that when you do an upgrade, there is no malicious code that gets inserted, either at the time of deployment or at the time of building that image itself? So how can you do what we call software supply chain security in terms of provenance? Where did that image come from? What are the dependencies it came from? Can we attest that? Is there a git commit? Who signed that commit? And when we as an end user do it, are we using the right tag and cryptographically signing it before we deploy to one of those regions, to those collectors responsible for collecting the data from vehicles? As you can see, there are a lot of moving pieces here. There's a lot of operational burden and workload. The way to handle that is to have the right set of monitoring tools. So when we upgrade our collectors, we do it, let's say, a batch of collectors at a time for a geographic region. We have a whole runbook around that. If things are not going well, we have the runbook to roll back to the old version of the collector, because right now we also have to play catch-up with the OpenTelemetry collector version, because they get released every two weeks, and we want to take a look at the release notes, what features we are trying to incorporate from that upstream into our version of the OTel collector, and make sure that it is still compatible with your current set of users. So that's a challenge. Fortunately here, the Kubernetes operator for the OTel collector plays a very important role. It's basically tooling that helps you deploy that without looking much at the source code itself. So there's some automation done by Kubernetes itself, by the CNCF, and we also have our own automation on top of that, that does the rollout.
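The batched-upgrade runbook described above, upgrade one regional batch at a time, verify health, roll back and halt on failure, can be sketched as follows. The function and health check are hypothetical stand-ins for what an OpenTelemetry Operator plus custom automation would actually do:

```python
# Upgrade collectors in batches; if a batch fails its health check,
# restore the batch's previous versions and stop the rollout.
def upgrade_fleet(collectors, new_version, healthy, batch_size=2):
    upgraded = []
    for i in range(0, len(collectors), batch_size):
        batch = collectors[i:i + batch_size]
        previous = {c["name"]: c["version"] for c in batch}
        for c in batch:
            c["version"] = new_version
        if not all(healthy(c) for c in batch):
            for c in batch:  # rollback this batch per the runbook
                c["version"] = previous[c["name"]]
            return upgraded, False  # halt the rollout
        upgraded.extend(c["name"] for c in batch)
    return upgraded, True

fleet = [{"name": f"collector-{i}", "version": "0.90.0"} for i in range(4)]
done, ok = upgrade_fleet(fleet, "0.91.0", healthy=lambda c: True)

# A failing health check rolls the batch back and stops early.
fleet2 = [{"name": f"c{i}", "version": "0.90.0"} for i in range(4)]
done2, ok2 = upgrade_fleet(fleet2, "0.91.0", healthy=lambda c: False)
```

Batching is what preserves the zero-downtime property Saxena emphasized earlier: at any moment, most of the fleet is still collecting traffic on a known-good version.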
Rob Strechay
>> Yeah. No, I think that's the key, right? Being able to bring your stuff together with the open source, and contribute it back into the main branches as part of that community. But one of the things, and you touched on this a little bit: how do you create a shared incentive between development teams and the platform engineering teams so that OpenTelemetry is implemented in a way that supports both innovation and the operational excellence that you have to have for your customers?
Gaurav Saxena
>> Yeah. I mean, we really embrace these inner-source contributions. We as platform engineers are a little bit far away from the end customers, because we are the infrastructure team that's helping these applications use our tooling to do the job they are tasked with, which is, let's say, OTA deployments, or "get me data from vehicles so that I can use that data to build my machine learning models, do some analysis of the data set, and then deploy that model back to the vehicle." So my part of the role is basically to provide that infrastructure to cater toward both the AI/ML set of applications and getting the telemetry data from vehicles. To answer your question specifically on inner-source contribution: we as a platform team work with our end users, which are our internal teams, to understand their use cases and provide the right amount of abstraction, the right interfaces that they need, so they don't get bogged down by the internals of platform infrastructure. Meaning, if I'm an application developer, I have a business goal of deploying this application to the vehicle that can fetch me data from these many sensors. As a service developer, I should not be worried about how my application is getting deployed, because that's a concern of operations. If I have to do that, then I'm basically spending my time on my CI/CD pipelines, operations, observability. That does not really move the needle for the business logic. So my part of the role is basically to work with those service teams to understand their use cases and provide them guidance on the right set of interfaces that I can expose from my platform services, so they can use those to deploy wherever their workloads are or wherever their KPIs point. So as the developer, I don't need to worry about, "Am I deploying in my AWS region or GCP region?" I only care about: here are my limits for my requested CPU and memory, I need an ingress, I need a database, platform team.
You tell me what the right site is, both from the FinOps perspective and from the perspective of which region the more latency-sensitive applications need. You figure that out and deploy the right set of infrastructure to cater toward that need, and that's where we bring in the value.
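The contract Saxena describes, developers declare needs, the platform decides placement, can be illustrated with a toy placement policy. The regions, latencies, and cost figures below are invented for the example:

```python
# Hypothetical site catalog spanning clouds and on-prem.
REGIONS = [
    {"name": "aws-us-east-1", "latency_ms": 40, "cost_index": 1.0},
    {"name": "gcp-europe-west4", "latency_ms": 25, "cost_index": 1.3},
    {"name": "onprem-dc-1", "latency_ms": 80, "cost_index": 0.6},
]

def place(workload):
    # First satisfy the latency KPI, then pick the cheapest site.
    # (A real scheduler would also weigh data residency and capacity.)
    candidates = [r for r in REGIONS
                  if r["latency_ms"] <= workload["max_latency_ms"]]
    return min(candidates, key=lambda r: r["cost_index"])["name"]

# The developer's spec never names a cloud or region.
spec = {"cpu": "2", "memory": "4Gi", "ingress": True, "max_latency_ms": 50}
site = place(spec)
```

The abstraction is the point: the service team states limits and KPIs, and the platform is free to re-optimize placement for cost and latency without touching application code.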
Rob Strechay
>> Yeah. No, I think that to me makes a lot of sense. I think, again, when you start to look at it, it's about servicing the end customer. Even though you are, like you said, a little bit further removed from that, you're still pretty plugged in. I mean, you're touching their devices and all of that stuff with all of this telemetry, so that you can get that information back and help them have a good customer experience, or CX, from that perspective. That also brings me to my next question, because I think I can see where you're going with this: how are you measuring the impact of observability on developer experience, deployment velocity, and just the overall platform ROI, as it might be?
Gaurav Saxena
>> Yeah, great question. I'll divide this into two parts. One being, let's say you are the driver of a vehicle that I help in producing: how satisfied has your driving experience been? For example, if I am doing an OTA update to your vehicle, has it improved your driving experience? How do we measure that? That's one side of the equation. The second side is the internal teams that we help use our services for their workloads: how do we measure their developer productivity or developer experience? So there are two sides, broadly classified, in terms of the experience. For the first part, the vehicle side, since we have millions of vehicles on the platform, we measure in terms of our deployment times, our release times of that OTA software to the vehicles themselves. We track, for each of the manifests that we ask users to download in their vehicles, when the download started and when the vehicle got connected, because vehicles are a moving target. Sometimes a vehicle could be in a garage where there is no signal to receive the manifest. It may have received a partial manifest but stopped downloading because it's no longer connected to the platform. So we make sure that the observability is providing all those nuances around when it started and why it stopped, as an example. Can we get those events? So that we can time our OTA deliveries for that particular vehicle based on its usage and driving pattern, as an example. We get that data through all of our observability tooling. In terms of measuring developer productivity, the same OTel collector, the same observability stack, is actually checking your CI/CD pipelines. So for example, for that particular software release, what was the build time, and where did we spend more time, as an example? Was that build cryptographically signed? Did we take all the measures? Did we have all the security scans, like all the license checks?
Because we are a corporation using open source software, we should make sure that our license checks are still in compliance, right?
Rob Strechay
>> Right.
Gaurav Saxena
>> And before we then push it out to Artifactory, before the vehicles can pull the image from there. So we measure, for each of those steps, the latency and the request volume, so that we have all the data for the betterment of the next set of releases, to make our software much better the next time we have to do that deployment.
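The per-step pipeline measurement Saxena describes, build time, signing, scans, each timed and kept for release-over-release comparison, can be sketched with a small timing wrapper. The step names and sleeps are stand-ins for real pipeline stages:

```python
import time
from contextlib import contextmanager

# Record the duration of each CI/CD step so slow stages stand out
# across releases; in practice these would be exported as OTel spans.
step_timings = {}

@contextmanager
def timed_step(name):
    start = time.perf_counter()
    try:
        yield
    finally:
        step_timings[name] = time.perf_counter() - start

with timed_step("build"):
    time.sleep(0.01)   # stand-in for the real build
with timed_step("license_check"):
    time.sleep(0.005)  # stand-in for license and security scans
```

Emitting these timings through the same observability stack used for production telemetry is what lets one set of tools answer both "is the vehicle healthy?" and "is the release pipeline healthy?".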
Rob Strechay
>> Yeah, no, I love that. I think to that point, I think again, like you said, you have to be in compliance. I've built many different SaaS-based platforms and on-prem platforms using software that's open source and you have to be within compliance of all those licenses to be honest. And I think, again, the fact that you're contributing back is great too. So kind of final thoughts here. We're both going to be in Atlanta. I'm excited to be there. I mean from what I'm hearing, the crowd is going to probably be the largest that they've ever had. It will be interesting. Usually, they get larger crowds in Europe than they do in North America, but I know a lot of people who are going to this one and I'm excited for it to put it mildly. What are you looking forward to? Because I know you're doing a panel and you're doing a couple talks there. You're going to be a busy guy to put it mildly, so what are you looking forward to in Atlanta?
Gaurav Saxena
>> Mostly, the use cases that I'm working toward today, which are not in production right now but hopefully soon will be, around edge IoT, right? What I've talked about so far is the deployments from cloud to device and then getting the data from device to cloud, the bidirectional workflow. What if we can move some of these workloads to the edge, to the vehicle? That's where the power of the newest AI models can help us, as close as possible to the vehicle. When you are doing the transactions, you can reduce the dependencies on so many network hops. How do we do that in the native Kubernetes way is what I'm most interested in getting out of those conference presentations by various members of the Cloud Native Computing Foundation community.
Rob Strechay
>> And I think that's the great thing. I did a panel in London at the European one; even though it was called EU, it was in the UK, and that was a whole bane of existence, or constraint, there. But I think it was a great time, and I think, again, it's great having you on, Gaurav. Thanks for coming on board, and I'll see you soon.
Gaurav Saxena
>> Thank you for having me, Rob.
Rob Strechay
>> And thank you for joining us and we'll see you soon in Atlanta for KubeCon, CloudNativeCon North America on theCUBE, the leader in tech analysis and news.