KubeCon + CloudNativeCon NA 2025 | Alon Horev, Vast Data

Alon Horev

Co-Founder & VP, Technology

Vast Data

How AI-Kubernetes integration brings cloud intelligence to the edge

The fusion of artificial intelligence and Kubernetes is redefining modern infrastructure, shifting how systems are built, deployed and scaled. Cloud-native design now extends beyond centralized data centers to the intelligent edge, where real-time analytics and data sovereignty drive innovation. At the center of this shift lies AI-Kubernetes integration.

In this KubeCon + CloudNativeCon North America interview, theCUBE’s Rob Strechay speaks with Alon Horev, co-founder and chief technology officer at Vast Data, to unpack how Kubernetes and cloud-native standards are powering real-time AI at the edge and across hybrid environments. Horev explains why Kubernetes is the “orchestration platform of today” and why open protocols (such as S3 and Kafka) are essential to avoid product lock-in while enabling developer choice and portability.

Horev discusses platform engineering, multi-tenancy and zero-trust as fo...

Rob Strechay

>> Hello and welcome to this CUBE conversation. I'm Rob Strechay and we're here gearing up for KubeCon CloudNativeCon North America in Atlanta, the epicenter of what's next in cloud-native innovation. This year will be a significant turning point for the industry, in my view. We're seeing AI and Kubernetes come together in ways that are reshaping the entire stack. Platform engineering is transforming developer productivity, observability is becoming more intelligent and automated, security is moving toward true zero-trust architectures, and edge and hybrid clouds are unlocking entirely new deployment models. To help me unpack the goings-on and what you can expect, I want to welcome on board Alon Horev, who's the co-founder and CTO at Vast Data. Welcome, Alon.

Alon Horev

>> Thank you, Rob. Thank you for having me.

Rob Strechay

>> Yeah, I think this is going to be exciting. Both of us are going to be there. Vast is coming in a big way for the first time. Really into this. Even though you're really no strangers to Kubernetes and cloud-native, I mean, it's been in your guys' DNA from the beginning. Vast has deep roots in both Kubernetes and cloud-native architectures. Explain to us how Vast is actively contributing to or integrating with open-source communities to help developers build AI applications more effectively and efficiently.

Alon Horev

>> Yeah, and that's a great intro. If we go back nine, 10 years, to when we just started the company, we went on this mission of enabling customers to build platforms that can store lots of information. What we've done over the last few years is also start to integrate compute capabilities, to bring compute to the data that has been collected over time. What AI models bring to the table is the ability to understand what's buried in data, what's buried in unstructured data, especially with embedding models for video and for audio. And Kubernetes has been an enabler for customers to build environments, to build platforms across clouds, on-prem environments and colos. When people say cloud-native, it's really the ability for infrastructure providers to build consistency across the different venues and locations where processing takes place. And what we've been seeing lately is that processing needs to get closer to the edge. When you have an AI model that can reason about and understand what's happening live, connected to a video camera, you can't necessarily move that data all the way to the other side of the country or to another continent in order to run inference. Sometimes there's sovereignty and security involved in being able to do those things at the edge, so being able to deploy in different locations matters. It's very widely known right now that Kubernetes is the orchestration platform of today.
Being able to deploy both in the cloud and at the edge is really a liberating component for infrastructure teams, and we've been supporting Kubernetes from day one in the sense that Kubernetes orchestrates compute and we orchestrate access to data. Being able to present customers and Kubernetes clusters with access to data, be it block storage, file storage or object storage, all those different shapes of data and different protocols, has been critical for customers to, again, be able to deploy and mobilize workloads across different environments.

Rob Strechay

>> I think that's exactly how we're seeing it play out as well, and I think Vast is very well positioned from that perspective. I mean, you talked about the infrastructure teams, which are really what platform engineering is all about. I know you guys have multi-tenancy as a foundation of your product and the architecture underneath. Help us understand: how does Vast enable developers, through open source tooling and APIs, to self-provision and control their data pipelines without heavy infrastructure intervention? Because I think, to your point, it's about how you get to value on the data faster.

Alon Horev

>> That's correct. The transition that customers and infrastructure teams are making these days is from the classic IT way of doing things, where you would provision a lot of different types of infrastructure just for a single project, to a more resource-efficient way of sharing resources. Sharing resources is exactly what Kubernetes lets you do. It allows you to provision services and applications and basically give Kubernetes some guidelines around how many resources they can get. Doing that for the data layer is similar but different. On the one hand, you want to be able to use a single pool of compute power and storage power to feed all of those different users, workloads and applications. But then it comes with a bunch of new requirements around multi-tenancy, around security. How can I have governance? How can I know who's accessing a certain data asset after making it available to a few teams? So being able to support different workloads that are competing for resources required this architecture that we've built, which enabled us to implement QoS policies and zero trust, in the sense that we can have strong authentication all the way to the end user without compromising security, without compromising flexibility. We have customers that are deploying this for their enterprise, where you can see multiple different teams and departments, and not everyone necessarily needs to see all of the data. In a big enough organization, you already have security controls even within the organization. And sometimes it's for cloud service providers that are building their clouds, and we're a component there, enabling multi-tenant access to data services like, again, file, object, event streaming and various additional services that we support.
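
The idea of one shared pool with per-tenant guidelines can be sketched in a few lines of Python. This is only an illustration of the concept discussed above, not Kubernetes or Vast code: `SharedPool`, the tenant names and the unit of "resources" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SharedPool:
    """Hypothetical shared pool: one capacity, a quota per tenant."""

    capacity: int
    quotas: dict[str, int]
    used: dict[str, int] = field(default_factory=dict)

    def request(self, tenant: str, amount: int) -> bool:
        """Grant `amount` units only if both the tenant quota and the pool allow it."""
        quota = self.quotas.get(tenant)
        if quota is None:
            return False  # unknown tenants are denied by default
        tenant_used = self.used.get(tenant, 0)
        pool_used = sum(self.used.values())
        if tenant_used + amount > quota:
            return False  # the tenant would exceed its own guideline
        if pool_used + amount > self.capacity:
            return False  # the shared pool itself is exhausted
        self.used[tenant] = tenant_used + amount
        return True


pool = SharedPool(capacity=10, quotas={"research": 6, "support": 6})
print(pool.request("research", 6))  # within quota and capacity
print(pool.request("research", 1))  # over the tenant quota
print(pool.request("support", 4))   # fills the remaining capacity
```

Note that the quotas deliberately add up to more than the capacity: tenants share one pool, and the per-tenant limit only stops any single team from taking all of it.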

Rob Strechay

>> I think you hit the nail on the head and I think again, when you start to look at it, you guys have people like CoreWeave and other large enterprises that are doing cloud native and really have open source driven environments that use Vast as an underlying platform. What lessons have you learned from working with these open source centric customers that you could share with those who are watching right now?

Alon Horev

>> That's a great question. What matters to a lot of people who are leveraging open source is the freedom. They want to be free to replace different parts of their stack. They want to be able to deploy in different locations, and standards like Kubernetes are great for that. When you look at the data layer, there's sometimes a conflation between a product and a protocol. When people are looking at S3 or at Kafka, there's a product called AWS S3 and there's a product called Confluent Kafka, but each is also a protocol. I think one of the enablers for a lot of players in this community is being able to adopt and support the standard, to make it easier for the end users and the customers so they don't need to change the application nonstop. So Vast supports and encourages standard protocols. Amazon's S3, Kafka and all of those protocols are standards that we support, and that encourages an ecosystem to build tools on top of those and to keep improving and keep contributing without necessarily locking the end user in to a specific product. Locking into a protocol is not lock-in, in the sense that you can really swap out the underlying implementation for different products or different implementations.
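
The product-versus-protocol distinction can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: `ObjectStore` stands in for a protocol such as S3, and `VendorAStore` and `VendorBStore` are hypothetical in-memory stand-ins for two products that implement it. The application function depends only on the protocol, so either product can be swapped in without changing it.

```python
from typing import Protocol


class ObjectStore(Protocol):
    """The 'protocol': the only surface the application depends on."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class VendorAStore:
    """Hypothetical product A: keeps objects in a dict."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class VendorBStore:
    """Hypothetical product B: append-only log, same protocol."""

    def __init__(self) -> None:
        self._log: list[tuple[str, bytes]] = []

    def put(self, key: str, data: bytes) -> None:
        self._log.append((key, data))

    def get(self, key: str) -> bytes:
        # The most recent write for a key wins.
        for stored_key, data in reversed(self._log):
            if stored_key == key:
                return data
        raise KeyError(key)


def archive_report(store: ObjectStore, name: str, body: str) -> int:
    """Application code: written once against the protocol."""
    key = f"reports/{name}.txt"
    store.put(key, body.encode("utf-8"))
    return len(store.get(key))


for backend in (VendorAStore(), VendorBStore()):
    print(type(backend).__name__, archive_report(backend, "q3", "hello"))
```

With a real S3-compatible service, the same effect is usually achieved by pointing an existing S3 client at a different endpoint rather than by rewriting the application.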

Rob Strechay

>> Absolutely. Having been over at one of the aforementioned hyperscalers, I can tell you that when you go and design stuff, it's not necessarily for... having the ability to move around, to put it mildly. So I love that you're talking about it that way, because I think, again, with the introduction of the data engine and the serverless orchestration that you've built on top of Kubernetes, you're leveraging a whole number of other open source protocols, as you put it, such as Kafka and functions frameworks or serverless runtimes. Help people understand how you're allowing developers to really take advantage of that inside the Vast platform.

Alon Horev

>> So with version 5.4, which we released just a couple of weeks ago, we are for the first time diving into orchestration of compute. We went there because our customers have been telling us and showing us that it's not trivial for them to run compute in real time over data that comes into a system. Imagine retail use cases, or smart city use cases where human lives are at risk. Being able to react in real time, being able to invoke an AI model or an application on data that just came into the system, becomes critical for those kinds of customers. So we went ahead and implemented the serverless function API, and we implemented the triggers. The combination of those two things, functions and triggers, allows us to invoke compute in real time as data comes into the system. That allows us to index information, but also to trigger applications to get their job done as fast as possible. The way we want to deliver that functionality to customers is through Kubernetes because, again, Kubernetes is that component that you put on top of your hardware or your underlying resources in order to use them efficiently and to think about high availability: what to do when things break, how to scale an application automatically. Kubernetes gives you all of that. So the level of abstraction that Vast provides with the data engine that we've just released allows users to basically implement the function, and then Vast, along with Kubernetes, can scale that automatically as load grows or shrinks, taking care of high availability and all of the nasty things you need to handle when things fail or break. It really supports the infrastructure team's initiatives in leveraging the same pool of resources both for functions running on Vast and for other applications or agentic workflows that they would be triggering otherwise. We look at Kubernetes as that sole foundation for orchestrating compute across the data center.
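
The functions-plus-triggers pattern described above can be sketched as a tiny in-process dispatcher. This is not the Vast data engine API: `TriggerRegistry`, `on_prefix`, `ingest` and the key layout are hypothetical names chosen for illustration, and a real system would run the functions on Kubernetes rather than inline.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    """A piece of data that just arrived in the system."""

    key: str
    payload: bytes


@dataclass
class TriggerRegistry:
    """Hypothetical registry that binds functions to key-prefix triggers."""

    bindings: list[tuple[str, Callable[[Event], str]]] = field(default_factory=list)

    def on_prefix(self, prefix: str):
        """Decorator: run the function for every new key under `prefix`."""
        def register(fn: Callable[[Event], str]) -> Callable[[Event], str]:
            self.bindings.append((prefix, fn))
            return fn
        return register

    def ingest(self, key: str, payload: bytes) -> list[str]:
        """Accept new data and invoke every function whose trigger matches."""
        event = Event(key, payload)
        return [fn(event) for prefix, fn in self.bindings if key.startswith(prefix)]


registry = TriggerRegistry()


@registry.on_prefix("cameras/")
def index_frame(event: Event) -> str:
    # Stand-in for invoking an AI model on the newly arrived frame.
    return f"indexed {event.key} ({len(event.payload)} bytes)"


print(registry.ingest("cameras/lobby/0001.jpg", b"abc"))
print(registry.ingest("logs/app.txt", b"no trigger matches this key"))
```

The author of `index_frame` writes only the function body; deciding when it runs, and on how many replicas, is left to the platform.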

Rob Strechay

>> I think that ties in really well, because when you start to look at platform engineering, which will be a big thing, those teams are tasked with bringing all these parts and pieces together, and there's a lot of toil, as we like to say, going into that. Another one of the big pieces and pillars that will be talked about ad nauseam, I would almost say, at KubeCon and CloudNativeCon is really AI. And I know you guys have some really neat special sauce that helps customers get to AI faster and see the value faster. Why don't you talk about Vast's shared-everything architecture enabling open source AI frameworks like RAG pipelines, vector databases and inference services to scale seamlessly across these multi-tenant environments while maintaining isolation and security, which, like you said, is so critical these days?

Alon Horev

>> When you look at an example chatbot, an example AI application that goes beyond just using a foundational model that was trained on public information, things get complicated real fast. When you think about the simple thing of loading a model into a GPU and running inference against it, there's not a lot of state involved. There's loading the model, and you can find great open source models these days. But if it's as simple as that, consuming just a standard model based on public information, you might as well use any one of the inference services that you can find online. What enterprises are doing today is trying to marry their assets, their data, their customer engagements, to what agentic applications can actually do, and this is where challenges start to pop up around governance, around security, around quality. How can you feel comfortable putting an agentic application motivated by an LLM in front of your customer, supporting your customer on your behalf or selling to your customer on behalf of the organization? So we identified a few things that are really table stakes when it comes to building those kinds of applications. It starts with the infrastructure, which comes with a bunch of different requirements. You need a vector database to index. Sometimes it's text, but today you can also index embeddings for video and audio and anything that can really go through those AI models. That kind of use case allows an agentic application to pull data that has been previously indexed. So if you have a support bot, it can use a frequently asked questions guide to pull in relevant pieces of information. If you're a lawyer, you want to refer to past precedents and index a lot of legal documents.
All of those use cases are what's called RAG, Retrieval-Augmented Generation: the practice of indexing existing information and being able to serve that to an application that would basically enrich the LLM with data that is outside of the training set it was trained on. And then when you review an entire platform like that, there's a process of indexing, of data ingestion. How do you keep the indexes up to date? How do you scale to billions and billions of vectors if you have lots of information? So what happened is that when we reviewed RAG architectures, we saw a lot of moving pieces. We saw a data store. We saw an event stream that could trigger ingestion pipelines to say, "Hey, a new piece of data came in. We need to ingest it." We saw a vector database being involved. We saw a database for metadata being involved, taking care of who can see what. We were able to fold all of those different things into a single solution and architecture that gives us a lot of efficiency. It gives our customers a lot of efficiency because it's a single software stack that enables that kind of use case. And it's not that Vast builds the application; Vast provides a platform that enables anyone to build that kind of application easily. If you want to write a support bot, if you want to provide an assistant for your lawyers, or any application of that kind that can be improved or accelerated through information brought into an agentic workflow, RAG provides a lot of value there.
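
The moving pieces listed above (an index of embeddings, metadata about who can see what, and retrieval that enriches the prompt) can be sketched with the Python standard library. This is a toy under loud assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, the documents and team names are invented, and no actual LLM or vector database is involved.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical documents, each with metadata saying which teams may see it.
DOCS = [
    {"text": "reset your password from the account settings page", "teams": {"support"}},
    {"text": "refunds are issued within five business days", "teams": {"support", "sales"}},
    {"text": "pending litigation summary for the legal department", "teams": {"legal"}},
]
INDEX = [(embed(doc["text"]), doc) for doc in DOCS]  # built once, at ingestion time


def retrieve(query: str, team: str, k: int = 1) -> list[str]:
    """Return the top-k documents visible to `team` that overlap with the query."""
    q = embed(query)
    scored = [(cosine(q, vec), doc["text"]) for vec, doc in INDEX if team in doc["teams"]]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for score, text in scored[:k] if score > 0]


def build_prompt(query: str, team: str) -> str:
    """Enrich the question with retrieved context before it reaches an LLM."""
    context = "\n".join(retrieve(query, team))
    return f"Context:\n{context}\n\nQuestion: {query}"


print(build_prompt("how do I reset my password", "support"))
```

The access filter runs before ranking, so a document a team cannot see never reaches the prompt, which is the "who can see what" concern from the discussion above.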

Rob Strechay

>> No, I mean, I think that's a great place to be from that perspective. What you're getting at is why developers choose the Vast platform, with some of these open source services built in that they can get at through APIs, versus going out and bringing all the components together themselves. That makes a lot of sense for what you're trying to do. So I would say you're really trying to cut down the number of parts and pieces outside of the platform that they have to bring together.

Alon Horev

>> That's correct. That's correct.

Rob Strechay

>> And so where do you go now? I mean, again, we're going to be at KubeCon really soon. How do you see yourselves at Vast connecting more closely with the open source community over the next 12 months?

Alon Horev

>> That's a great question. We're going to see evolution in the world of Kubernetes coming from building higher and higher level primitives. When you think about production workloads, as an example, and their requirements for disaster recovery and extreme resiliency across sites, we're just now starting to see the Kubernetes community think about that, and about standards for how to push down the notion of keeping data in sync, keeping copies in sync across sites. So we're definitely going to see continued work there. Another aspect that we're going to continue promoting is where inference happens. There has been an explosion of inference over the past year, with more and more users and companies adopting AI, and this kind of push forward requires rethinking, reimagining all of the different pieces of the stack. When we see an inference application these days, it's definitely getting deployed in Kubernetes. There is no question about it at all. But what can we do to promote that and push that effort forward? How can we optimize for the KPIs those kinds of workloads need? I mentioned uptime being one of them. Having an environment that can be resilient to failures, that can reroute users and workloads even assuming big failures, like an entire region going down, becomes critical. How can we support inference frameworks that promote KV cache offload, like Dynamo or LMCache? We have developers who are tightly integrated into those products and are contributing code on a weekly basis. We look at all of those components of the stack and we work with our customers and our partners on accelerating them, making them more secure, making them more resilient.
We get to introduce some of the constraints and things that we hear from our customers around governance, around security, around what it really takes to bring those workloads to production. Often those open source products start off with a small group of developers who are not necessarily bringing things into production at an enterprise. They are perhaps bringing them into production for a startup or for an environment that's less regulated. So we're definitely trying to contribute there, making everything that will be deployed on top of Vast carrier-grade and making sure that we are providing the right APIs in order to get there.

Rob Strechay

>> Alon, one of the pillars we really didn't touch upon yet is observability. Help me, and the people out there, understand how you're involved in that portion of what's going on in cloud native.

Alon Horev

>> So observability is one of the most critical aspects of everything we do. When you look at the mission of scaling environments and workloads, when we're talking about infrastructure teams deploying massive environments and serving a lot of end users, observability is their Swiss Army knife for understanding what's actually happening. What's changing in the environment? What are the users experiencing? And more so, when you think about enterprise-grade AI applications, observability is about understanding how those applications are behaving. What is the customer experience? What is the latency that they're getting? So observability has always been a fundamental part of everything we've done, starting already from the data store: being able to observe every single operation, knowing, "Hey, who deleted the file last night?" or, "Why is my Slurm job suddenly taking two hours instead of one hour?" Being able to answer those questions without telling someone, "Hey, go rerun this," has been a critical objective that we've had from day one. The environments that are being built these days, based on lots of GPUs, are just so expensive. You don't get to have 10 chances of rinse and repeat and doing a lot of iterations. You want platforms that can actually, in retrospect, answer any questions that you have. And again, when you look at enterprise applications where you have AI models connected to what users are seeing and facing, observability is a key part of it, because when you want to upgrade a model or upgrade your code, how do you get to compare the level of quality that the users are getting, the performance that they're getting? So observability is a big part of it. When you look at our data engine as an example, we described that as an environment where you can schedule functions connected to triggers.
We have traces, we have logs, we have a bunch of mechanisms to visualize those, so the end users can really deploy something, continuously monitor it and know how it's performing. Then they can also upgrade it over time, compare, and make sure that everything is always heading in the right direction. We can also imagine agents doing that on their behalf, just testing out different models, trying to optimize for cost and efficiency and latency all at the same time. So the future is definitely going to be built on top of observability, on top of metrics, being able to drive compute automatically to the right place.

Rob Strechay

>> As we talked about, I think that really dives into where developers and platform engineering teams are looking to go in the future. To put it mildly, I think observability and the development ecosystem, as you tie in and contribute back, is really great. Thanks, Alon, for coming on board. I'm looking forward to catching up with you live down in Atlanta. Thanks again.

Alon Horev

>> See you there. Thank you.

Rob Strechay

>> And thank you, and I look forward to seeing you all at KubeCon CloudNativeCon North America in Atlanta on theCUBE, the leader in tech analysis and news.