In this interview during theCUBE's coverage of AWS re:Invent, Christine Yen, chief executive officer of Honeycomb.io, sits down with theCUBE’s Dave Vellante to discuss why observability is emerging as the critical "trust fabric" for AI-driven software development. Yen explains that as AI coding assistants accelerate development velocity and increase the volume of code in production, the potential for instability and "unknown unknowns" rises significantly. She argues that traditional monitoring, which often minimizes data collection to control costs, cannot keep up with the distributed dependencies responsible for nearly 70% of outages. Instead, high-fidelity telemetry is required to create the necessary feedback loops that allow engineering teams to validate agentic behavior and maintain system reliability.
The conversation also highlights Honeycomb’s latest strategic announcements designed to meet these challenges, including the launch of Honeycomb Private Cloud for organizations with strict governance needs. Yen details the company’s full embrace of OpenTelemetry standards for metrics and the general availability of Honeycomb Canvas, a natural language interface that simplifies complex querying. Yen and Vellante further explore the misconception that AI will reduce the need for oversight, with Yen positioning observability as the "seatbelt" for AI – allowing teams to move fast while retaining the ability to detect and resolve issues in real time.
Han Xiao, Elastic
In this interview during theCUBE's coverage of AWS re:Invent, Han Xiao, vice president of AI at Elastic and former chief executive officer of Jina AI, joins theCUBE’s Rob Strechay to unpack how Jina AI’s technology is reshaping the Elastic ecosystem. Xiao explains how Jina’s search foundation models – specifically embeddings, rerankers and small language models – serve as the "brain" behind Elastic’s orchestration framework. This integration aims to solidify Elastic as the essential computational layer for search, enabling developers to build highly accurate…
>> Hello and welcome to theCUBE's coverage of AWS re:Invent 2025, where we're talking about all things agentic and AI. To help me unpack this a little bit more, I've got Han Xiao, who is the Vice President of AI at Elastic and also the Founder and former CEO of Jina AI. Welcome on board, Han.
Han Xiao
>> Thanks, Rob.
Rob Strechay
>> So before we dive in deep here, help us understand Jina AI: what your goals were before the acquisition by Elastic and what it's really about.
Han Xiao
>> Yeah, so Jina AI was founded in 2020, and our one goal has been to build world-class search models, what we call search foundation models. That particularly includes the embeddings, rerankers and small language models that people can use to build better search systems: high-quality, high-relevance search systems. Over the last five years, we have been working extensively on building world-class models, making sure they work on multilingual and multimodal data and that they can be used as foundational building blocks when people build highly scalable, highly accurate search systems.
Rob Strechay
>> Yeah, I think that makes a lot of sense when you look at what's going on with AI and this entire realm; it's so busy. What did the acquisition really mean to Elastic, and what does it underscore for Elastic as well?
Han Xiao
>> Yeah. One of the observations we've made is that when people try to build a very high-quality search system, they typically need a lot of building blocks, and they also need an orchestration layer that connects all the dots. Elastic, and in particular Elasticsearch, is one of the most downloaded and most widely used frameworks for developers and businesses building production-ready search systems. We want to be the computational layer behind those search systems. We want to use our embedding models to represent all of this document data, image data and multimodal data as vectors, so that people can leverage semantic search and a vector database to query that data more accurately. Then we also have a reranker model, which serves as another computational block that runs after first-stage retrieval and boosts accuracy and precision even further. Finally, when you think about today's agentic AI, there is always a large language model at the last stage that generates answers and refines the search results. For that, we also provide small language models that help the agent better understand the context and output more meaningful, more human-readable search results.
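The pipeline Han describes, first-stage vector retrieval followed by a reranker, can be sketched in a few lines. Everything below is a toy stand-in: the bag-of-words "embedding" and the term-overlap "reranker" are hypothetical placeholders for real trained models (such as Jina's embedding and reranker models), used only to show where each stage sits.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real system would call a trained
    # dense embedding model here instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def first_stage_retrieve(query, docs, k=3):
    # Stage 1: cheap, scalable retrieval over the whole corpus.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def rerank(query, candidates, n=1):
    # Stage 2: a stand-in for a cross-encoder reranker that rescores
    # only the small candidate set to boost precision.
    q_terms = set(query.lower().split())
    return sorted(candidates,
                  key=lambda d: len(q_terms & set(d.lower().split())),
                  reverse=True)[:n]

docs = [
    "Elasticsearch is a distributed search engine",
    "Embeddings map text into vectors for semantic search",
    "Rerankers boost precision after first-stage retrieval",
    "The weather today is sunny",
]
candidates = first_stage_retrieve("semantic search with embeddings", docs, k=2)
best = rerank("semantic search with embeddings", candidates, n=1)
```

In a production system, the reranked passages would then be handed to the large language model for the final answer-generation stage Han describes.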
Rob Strechay
>> So Han, help me understand how bringing Jina to Elastic is really helping out the ecosystem.
Han Xiao
>> Right now, for developers and businesses to build a high-quality search system, you need not only the orchestration layer, not only the framework, but also the brain behind that framework. Who is going to provide that brain? At Jina, we have spent years of research and development building the best search models that can be plugged into any search system. This is particularly useful in today's world, where you have to handle multimodal and multilingual data and a lot of things cannot be implemented with traditional keyword-based search. That's where Jina's models, and in particular the deep neural network-based models, are really helpful: they can represent that multimodal data in a searchable format.
Rob Strechay
>> Yeah. That to me is so important to organizations as they look at having that control plane for their agentic operating system, which is these applications that are multimodal in many ways. But you also talk about context engineering. Help us understand where Jina AI's team has really been pushing in that whole space as well.
Han Xiao
>> Yeah. Context engineering is a very hot topic today. A lot of people talk about context engineering and how important it is in search and in any agentic system. To me, context engineering is all about cherry-picking the best words to send to the LLM before it emits its output. It sounds very simple, right? You basically try to selectively copy context into the LLM. But in reality, there are a lot of technical details. For example, how do you preserve the information while reducing the total number of tokens in the context? How do you rank different search results in the context so that the LLM can recognize which snippet matters most for the answer? And how do you mask out personally sensitive information before sending it to the LLM API? Those kinds of things fall into the context engineering category. For that, we have a lot of small language models that can be very useful for context engineering. For example, you can use an embedding model to compress the context while preserving the information overall. You can use a reranker to rerank passages, compute the optimal ones and put them at the top of the context so that the LLM gives them more emphasis in its answers. In general, this is a very exciting time for small language models to shine in the context engineering domain. One observation I've made over the last year is that, originally, those embedding and reranker models were built for large-scale purposes: they were built to batch-process billions of documents and to serve as the core retriever behind a search system. But since 2025, what I see is a lot of embedding and reranker usage happening inside the context, inside the 1-million-token context window of large language models. I believe this trend will continue into 2026.
People will look for stronger, smarter small language models that can optimize the context in a very aggressive way, so that the large language model can give much, much better results.
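The three concerns Han lists, masking sensitive information, ranking snippets and staying inside a token budget, can be sketched as a minimal context-building step. The email regex, the term-overlap ranking and the whitespace token count below are simplifying assumptions standing in for real PII detectors, reranker models and tokenizers.

```python
import re

def mask_pii(text):
    # Redact email addresses before the text leaves our boundary;
    # real pipelines cover many more PII categories than this.
    return re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "[EMAIL]", text)

def rank_snippets(query, snippets):
    # Stand-in for a reranker model: order snippets by term overlap
    # with the query so the LLM sees the most relevant ones first.
    q = set(query.lower().split())
    return sorted(snippets,
                  key=lambda s: len(q & set(s.lower().split())),
                  reverse=True)

def build_context(query, snippets, token_budget=20):
    # Greedy packing: take ranked snippets until the (whitespace-token)
    # budget is spent, keeping the context small for the LLM.
    picked, used = [], 0
    for s in rank_snippets(query, [mask_pii(x) for x in snippets]):
        cost = len(s.split())
        if used + cost > token_budget:
            break
        picked.append(s)
        used += cost
    return "\n".join(picked)

snippets = [
    "Contact alice@example.com for billing questions",
    "Reranking puts the most relevant passage on top of the context",
    "The context window of the model is limited",
]
ctx = build_context("relevant passage ranking context", snippets, token_budget=15)
```

The string handed to the LLM contains only the highest-ranked snippets, already redacted and trimmed to budget.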
Rob Strechay
>> Totally agree. I think that, again, speaks to the specificity of a small language model and how it can really help with particular tasks as you build things out. Help us understand the context of Jina with ELSER and the Elastic Inference Service, and what it means going forward for Elastic.
Han Xiao
>> The Elastic Inference Service is right now the default inference service behind all Elasticsearch systems. When developers use embedding services, reranker services or any small language model, underneath they are calling ELSER, or what we call the Elastic Inference Service. From now on, Jina is going to be the default model provider for ELSER, so developers get a very handy experience and access to all the top models from Jina AI. But ELSER will also stay open: not only Jina AI models but also other very strong embedding and reranker models will appear on ELSER. What we want to provide is the best developer experience for all businesses, making sure they have immediate access every time there's a new embedding model, new reranker model or new small language model that can be used to build a search system.
Rob Strechay
>> That makes total sense, and I think it fits in very well with what everybody is trying to do at re:Invent this year as they look toward building out these agentic workflows and how they can bring different types of data and different types of actions together. So hey, Han, thank you for coming on board. This has been great. Really appreciate it.
Han Xiao
>> Thank you, Rob. Yeah.
Rob Strechay
>> And thank you for watching our coverage of AWS re:Invent 2025 on theCUBE, the leader in analysis and news. Stay tuned for more.