AWS Mid-Year Leadership Summit 2025 | Mai-Lan Tomsen Bukovec, AWS


Mai-Lan Tomsen Bukovec

VP, Technology, Data & Analytics

AWS

‘The bottom turtle’: Amazon’s storage workhorse is getting an AI upgrade

When Mai-Lan Tomsen Bukovec joined Amazon Web Services more than a decade ago, the company’s Simple Storage Service, better known as S3, was a humble online bucket for photos, logs and the occasional startup backup. Today it houses exabytes of corporate treasure, and Bukovec, now vice president for data and analytics, is busy teaching that bucket to think. “Every AI application is a data application,” she tells me inside a glass conference room at AWS’s Seattle Re:Invent headquarters. “Your bottom turtle is always going to be the data — and the world’s data is in S3.”

AWS S3 steps beyond storage to power AI’s next wave

Amazon Web Services Inc.’s S3, Simple Storage Service, has evolved from simply being a durable and scalable storage solution to a fundamental layer driving innovation across AI, analytics and data infrastructure. TheCUBE’s John Furrier talks with AWS’ Mai-Lan Tomsen Bukovec about S3 advancements. Following the rise of generative AI and the convergence of structured, unstructured and semi-structured data, S3 now powers a modern AI-first infrastructure that’s redefining the enterprise technology stack.

Introduction of the AWS Halftime Report with insights from experts and executives.

Importance of data in AI applications and the rise of AI data agents.

Mai-Lan on S3's Evolution and the Future of Effortless Analytics Across Diverse Data with AWS Integration

Rise of S3 Tables and Iceberg for efficient interaction with tabular data.

Metadata as the future of data architecture and its advantages in AI processing.



In this AWS Mid-Year Leadership Summit segment, Mai-Lan Tomsen Bukovec, vice president of technology, data and analytics at AWS, joins theCUBE’s John Furrier in Seattle for a Halftime Report on where S3, S3 Tables and metadata services are steering the cloud giant’s data strategy. Tomsen Bukovec explains why “every AI application is now a data application” and how Iceberg-powered S3 Tables are giving customers a zero-migration path to agent-ready data lakes. She also unpacks her expanded remit, which unites Redshift, Athena, file services and streaming to mak...



>> Hello, I'm John Furrier. We are here in Seattle for the AWS Halftime Report with theCUBE, here at the headquarters of AWS. We're getting all the action from the experts, the executives and the leaders, finding out what's gone on in the first half of the year, and so much has happened. And in six months we'll have re:Invent, another tsunami of announcements. Mai-Lan is here. She's the VP of technology, data and analytics, and she's well known for making S3 the beast it is now, powering all the workloads. Mai-Lan, great to see you. Thanks for coming back on theCUBE for our Halftime Report.

Mai-Lan Tomsen Bukovec

>> Great to see you, John.

>> So S3 has had its birthday, it's been celebrated, and it continues to thrive. We talked about S3 Tables and all the metadata that comes associated with it. Iceberg is all the rage. You're starting to see that first-party data. S3 continues to be so valuable even in GenAI development, with Bedrock and SageMaker advancing; S3 is still cranking away with the innovation that enables all of that. So much has happened.

Mai-Lan Tomsen Bukovec

>> It is so exciting to work in the space of data now. And John, you've heard me say every modern business is a data business. Well, today, every AI application is a data application. These incredibly capable models, like the latest Claude 4 models, can do so much, and they can set up so much infrastructure for these amazing agentic workflows. But at the end of the day, the thing that really customizes and personalizes an AI application, and gives it a business's special sauce, is that business's own data, and the world's data is in S3.

>> You've been so close to storage and innovation at AWS from the beginning. Now you have a new role. Explain your new role for the folks watching, because I think it really connects the dots: storage isn't just storage anymore, it's a data platform... And that's a word some people are overusing in the industry, but the fact is the data is stored, it's being used, it's in flight, it's in motion, it's being used for inference and training. It's super critical. Talk about the new role and why it all ties together.

Mai-Lan Tomsen Bukovec

>> Well, when people think about the world of data, they think about structured data, they think about unstructured data, and they think about something called semi-structured data. And when we thought about how to make things easier for our customers, our whole goal is to make it effortless to do analytics on any type of data. And so services like Redshift and Athena now sit in the same organization as Amazon S3 and our file services, as well as our streaming and messaging services. Our whole objective here is to plumb that path, from the analytics query that drives so many businesses all the way down to storage, and make it as effortless and as price-performant as possible.

>> The thing that's interesting right now is if you look at GenAI, you say, "Okay, the models are getting more efficient." You see step-function growth in the tooling and the models, and the tooling that's becoming available is creating so much innovation. So you've got tooling, you've got GenAI happening, and the role of data movement is becoming key. The analytics world has been around for a while; they've had dashboards. And then you've got the platform side, Kubernetes, DevSecOps, dealing with the platform. Those two worlds, analytics and platform, are coming together: data thinkers, data engineering, data science, analytics, engineers... How do you look at that intersection? Bedrock and SageMaker sit in the middle there, and on top you've got Q. This is where all the action is. At Snowflake Summit, at Databricks' Data + AI Summit, all the conversations are about MCP connecting the models. So you see model efficiency driving new patterns of data, whether it's how HBM memory sits next to the chips or where the data is stored. How do you look at that innovation opportunity for customers? They have data platforms, models are coming faster, new things are happening. Does that change the equation for storage and its requirements? What are some of the things you see there?

Mai-Lan Tomsen Bukovec

>> Well, one thing that hasn't changed, John, is that if you think about turtles all the way down, your bottom turtle is always going to be the data. And one of the interesting evolutions we're seeing in these AI models is the capability of this agentic infrastructure, these AI agents and the workflows they can run. And so I believe that the next generation of data workers are really going to be AI agents. If you think about that, you think about the bottom turtle being the data, and you think, "Okay, I have these analytics engines. I have AI data agents that are going to be operating on that data." You can see this starting to happen right now. I just got back from Singapore. John, that's why I have a tan right now.

>> Looking good.

Mai-Lan Tomsen Bukovec

>> And one of the customers I was talking to there was StarHub. StarHub is a super interesting business that sits at the intersection of insurance and technology, and they have built a data agent that operates in production today. What it's doing is accessing data for its insurance business from both structured data sources, like databases, and unstructured data in S3. And so everything you can imagine a human developer or application doing today can be done by an AI agent, especially with super capable models like Claude Sonnet 4 and Opus 4. And if you look at the evolution, what remains the same is that data is always going to be at the heart of every type of customization, personalization and application that you're going to build for your business. But now you have a different actor, and that actor is the AI data agent.

>> That is a great concept. Think about how APIs connect: they were stateless. Data has state to it. And so you say, "Okay, agents can do more than just connect APIs." We're starting to see this come up in conversations around agents talking to other agents. We saw Swami's keynote at re:Invent; he was teasing that out. Okay, you've got data agents, what does that look like? How do they talk to each other? Is there delegation? Is there trust? Is there state involved? How do you see that playing out? Because if this happens, you're going to have a bunch of data agents running around doing, I won't say maintenance, but connecting data and reasoning about how it could be used for GenAI. So you're enabling the layers above you. What's your vision for data agents?

Mai-Lan Tomsen Bukovec

>> Well, I think the most important thing when you think about any operator on your data, whether it's a human or an AI, is that you always have the best data perimeter in place. If you talk to a CISO, they will talk to you at length about the importance of the data perimeter. That is something we have focused on at AWS from day one. And one of the reasons why we've had so much enterprise adoption of the cloud is that CISOs have often led the charge by saying, "I need the cloud platform that has the best tools to establish and maintain a data perimeter." And whether it's the foundational pieces of IAM or really innovative services like IAM Access Analyzer, which is based on automated reasoning, a special form of computer science and math, we have always innovated in this space. And those tenets hold true whether it is a human or an AI: you make sure you've established a data perimeter, and then the AI agent coming in and working within your data perimeter has to follow the rules of the road.
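As a hedged illustration of the data perimeter idea (not something shown in the interview): one common building block is an S3 bucket policy that denies access to any principal outside your AWS organization, using the global condition keys `aws:PrincipalOrgID` and `aws:PrincipalIsAWSService`. The Python sketch below only builds the policy document. The bucket name and organization ID are hypothetical placeholders, and a real perimeter would usually combine this with other controls such as service control policies.

```python
import json

# Hypothetical placeholders for illustration only; substitute your own values.
BUCKET = "example-data-bucket"
ORG_ID = "o-exampleorgid"


def data_perimeter_bucket_policy(bucket: str, org_id: str) -> dict:
    """Build a bucket policy denying S3 access to principals outside one AWS organization.

    Requests made by AWS service principals are exempted via aws:PrincipalIsAWSService,
    following the "IfExists" pattern used in AWS data perimeter policy examples.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EnforceIdentityPerimeter",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {
                    # Deny when the caller's organization differs from ours...
                    "StringNotEqualsIfExists": {"aws:PrincipalOrgID": org_id},
                    # ...unless the caller is an AWS service principal.
                    "BoolIfExists": {"aws:PrincipalIsAWSService": "false"},
                },
            }
        ],
    }


# The rule is identical whether the caller is a person or an AI agent's IAM role.
print(json.dumps(data_perimeter_bucket_policy(BUCKET, ORG_ID), indent=2))
```

Because the policy is evaluated per request, an AI agent gets no special path around it: it must run under an identity inside the organization, just as a human user would.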

>> I love the data agent idea. Things pop into my head; I think of orchestration. I can see a data agent managing requests, routing data, moving data when appropriate or not moving it, figuring things out. If that happens, and it is happening, what does that do to the application? Because now the business logic is also in the data. So you've got data as an infrastructure thing, S3 and all the services, but when you get into analytics and the intelligent apps that will be enabled, what's your vision for how the data agent connects to intelligent apps?

Mai-Lan Tomsen Bukovec

>> Well, a lot of what we're seeing with the rise of Iceberg, which you talked about, is that it gives a common way to interact with tabular data. That's one of the reasons why we introduced S3 Tables back in late 2024: we saw so many data lakes moving toward Iceberg as a common way to interact with tabular data stored in S3. The capabilities of Iceberg make it easy for a human user to interact with the data, and the same holds true for the AI data agent. So as Iceberg is adopted across the data lakes of today and tomorrow, I think you're going to see leverage. And this is actually one of the things I hear all the time, John, from Chief Data Officers, CTOs and CIOs: "I have put so much investment into my modern data architecture. How do I take advantage of these new developments in AI without having to redo my schema, without having to do a data migration? How do I play it forward and evolve my data architecture so that I can build these agentic workflows, but also have my AI agents respect the data perimeter I have in place today?" And that is one of the key things we've seen: the fastest companies to adopt AI are the ones already on modern cloud infrastructure, who can evolve from there.

>> And what are the requirements to do that? S3 Tables interoperating with other data lakes? What's the solution for that customer? It's like, "Okay, I want to maintain what I invested." But is it connection-oriented? What's the solution?

Mai-Lan Tomsen Bukovec

>> Yeah, John, honestly, it just starts with using AWS, because if you think about Bedrock, if you think about all the capabilities we have in place, they all integrate with S3 today. I'll give you an example: Amazon Q in QuickSight, super popular.

>> Huge product. Love that product.

Mai-Lan Tomsen Bukovec

>> Just ask questions of your data using natural language. We have integration between QuickSight and S3 Tables. So as long as you're starting with a foundation of AWS, which is the basis of modern cloud platforms, you're already a huge step toward having the infrastructure you want.

>> It's really interesting to see how S3 has become such an enabler. And what's fascinating is that some key things have to happen, openness, good performance, but you built the hooks in. SageMaker has a role, Bedrock has a role. What's the best configuration you hear about from customers driving S3 and S3 Tables to get the best analytics and the best of generative AI? Because I want to bring in the innovation from generative AI, that's where the breakthroughs are, but I already have analytics in place. So what are some of the best practices from customers implementing S3 Tables end to end with the breakthroughs of GenAI and the best of analytics?

Mai-Lan Tomsen Bukovec

>> Well, I would say if you have Parquet data in S3, you should store it in an S3 table. The reason is that you get the built-in capabilities of Iceberg, which then flow through to zero-ETL and the other capabilities in our AWS services. I will also say, John, that I think the next generation of data lakes is actually going to be metadata. As the world accumulates more and more data, coming in from sensors, from ETL, from applications, customers need to find the data they're going to use for AI and for their next generation of data products. That's why we launched S3 Metadata, which became generally available earlier this year. Now, John, we could have built S3 Metadata with an S3 API, because you know we try very hard to have a very simple API, but we didn't. We built it as an S3 table, because we knew customers were going to want to run SQL queries against their metadata. So think about an object you have in storage, think about annotating metadata onto it, whether custom or system metadata, and then think about querying the metadata to find the data you need for AI, for analytics or for whatever you need. That, I think, is going to be the pattern of the future.
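A minimal sketch of the "query the metadata, not the data" pattern described here, under stated assumptions: the table name and columns below are a simplified, hypothetical schema (not the actual S3 Metadata schema), and SQLite stands in for whatever SQL engine you would point at an S3 table, such as Athena. The query selects objects by a custom classification value without reading any object contents.

```python
import sqlite3

# Hypothetical, simplified metadata table: one row per object plus a custom tag column.
# This is NOT the real S3 Metadata schema; it only illustrates the query pattern.
SCHEMA = """
CREATE TABLE object_metadata (
    key            TEXT PRIMARY KEY,
    size_bytes     INTEGER,
    content_type   TEXT,
    classification TEXT,   -- custom metadata, e.g. a governance label
    last_modified  TEXT    -- ISO-8601 date
)
"""

ROWS = [
    ("sales/2025/01.parquet", 1_200_000, "application/x-parquet", "approved-for-ai", "2025-01-31"),
    ("sales/2025/02.parquet", 1_350_000, "application/x-parquet", "approved-for-ai", "2025-02-28"),
    ("hr/salaries.parquet", 400_000, "application/x-parquet", "restricted", "2025-03-01"),
    ("logs/app.log", 9_000_000, "text/plain", "approved-for-ai", "2025-03-02"),
]


def find_ai_training_candidates(conn: sqlite3.Connection) -> list[str]:
    """Query metadata only (no object reads) for Parquet files cleared for AI use."""
    cur = conn.execute(
        """
        SELECT key FROM object_metadata
        WHERE classification = ?
          AND key LIKE '%.parquet'
        ORDER BY key
        """,
        ("approved-for-ai",),
    )
    return [row[0] for row in cur.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO object_metadata VALUES (?, ?, ?, ?, ?)", ROWS)
print(find_ai_training_candidates(conn))
# → ['sales/2025/01.parquet', 'sales/2025/02.parquet']
```

The restricted file and the non-Parquet log are filtered out purely from metadata, which is the point made in the interview: you decide what to use before pulling any bytes out of storage.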

>> And is that benefit speed, or more intelligence? Because metadata has the advantage that you don't have to go through all the querying and retrieval.

Mai-Lan Tomsen Bukovec

>> You don't have to pull your data out of storage.

>> So you just get it when you need it? Intelligence and speed.

Mai-Lan Tomsen Bukovec

>> Exactly. And one thing we've found from a cost perspective is that data at rest is always going to be your most cost-effective option for storage, and yet you do need to use it. The beauty of having lots and lots of storage is that you can use it for AI, you can use it for different things, but you want to know what's in your storage before you use it. And the missing link there is metadata. So if you can layer all kinds of information into your metadata, from governance and data classification to usage of your data, then you can use SQL queries to ask, "What is this data doing? Who used it last? How should I think about it?" And then you can decide, "Do I want to use it for AI, or for ETL, or whatever?" That is why I think metadata is going to be the most queried part of data in the future.

>> And great for agents as well. Agents will feed on the metadata.

Mai-Lan Tomsen Bukovec

>> Exactly.

>> Or co-locate the metadata on their-

Mai-Lan Tomsen Bukovec

>> That's exactly right. So imagine an AI agent that does something with the data: the AI agent can annotate the metadata with what has been done with the data and which AI agent took that action.
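To make the annotation idea concrete, here is a hedged sketch assuming a hypothetical append-only annotation table (table and column names are illustrative, and SQLite stands in for a real metadata store). Each agent records the action it took, and anyone can later ask which actors touched a given object.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical append-only annotation table; names are illustrative, not an AWS schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE object_annotations (
        key      TEXT NOT NULL,
        actor    TEXT NOT NULL,   -- human user or AI agent identity
        action   TEXT NOT NULL,
        acted_at TEXT NOT NULL    -- UTC timestamp, ISO-8601
    )
    """
)


def annotate(conn: sqlite3.Connection, key: str, actor: str, action: str) -> None:
    """Append one annotation; rows are never updated, so the history is preserved."""
    conn.execute(
        "INSERT INTO object_annotations VALUES (?, ?, ?, ?)",
        (key, actor, action, datetime.now(timezone.utc).isoformat()),
    )


def actors_for(conn: sqlite3.Connection, key: str) -> list[tuple[str, str]]:
    """Return (actor, action) pairs for one object in insertion order."""
    cur = conn.execute(
        "SELECT actor, action FROM object_annotations WHERE key = ? ORDER BY rowid",
        (key,),
    )
    return cur.fetchall()


# Two hypothetical agents act on the same object and record what they did.
annotate(conn, "claims/2025/q1.parquet", "agent:claims-summarizer", "summarized")
annotate(conn, "claims/2025/q1.parquet", "agent:pii-scanner", "classified:contains-pii")
print(actors_for(conn, "claims/2025/q1.parquet"))
# → [('agent:claims-summarizer', 'summarized'), ('agent:pii-scanner', 'classified:contains-pii')]
```

Keeping annotations append-only is a design choice in this sketch: it turns the metadata into an audit trail of who, human or AI, did what to the data.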

>> Yeah, again, what's so exciting is the annotation concept and S3 Tables, which, by the way, was super exciting when you announced it at re:Invent last year. I think we did two segments on theCUBE, and it was clearly a game changer. If you look at DeepSeek, they created all this innovation around doing reinforcement learning without the human in the loop. So in a way, metadata is going to be ripe for generative AI around reasoning and inference. Has that crossed the table yet for you? You have announcements coming; where do you see GenAI fitting in here? Because agents are going to need to hook into this and orchestrate the data, since data availability is the key to having the best data to reason against. If I'm a machine learning or GenAI algorithm or model, I'm only as good as the best data I can get.

Mai-Lan Tomsen Bukovec

>> Or the best metadata you can get?

>> Or the best metadata in this case.

Mai-Lan Tomsen Bukovec

>> So if I think about how S3 approaches metadata, we apply the same traits of S3 to metadata as we do to data. That means it's strongly consistent. It has the cost point of S3, which means you can store ever-increasing amounts. It's got the durability, the availability, the reliability. Our goal is that the metadata system is a system in itself, and the ability for actors, human and AI, to interact with it in a very reliable way is going to be crucial to using the metadata to use the data cost-effectively at scale. So it really is, like I said, a missing link for all of these different interactions. And we have-

>> It's almost a meme: metadata about metadata about metadata, and there's the data. A little over the top, but having data about data is super valuable; I can't overstate that. And I'm just trying to connect the dots, because I can envision the efficiencies and the intelligence that come from knowing just enough about the data to take action and have authority over it.

Mai-Lan Tomsen Bukovec

>> Yeah, one way to think about it is all the context about your data. Today, when you go into a business, the context of what the data is, where it is and how you use it lives with people. So imagine taking all that context and putting it into metadata. It really opens up a whole universe of ways to make that data more useful, or the opposite: when it's pretty clear from the metadata that the data isn't useful, you put it into archive and save on storage cost.
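A sketch of that archive decision, under stated assumptions: usage dates are assumed to be recorded as metadata, and the row format and tag name are hypothetical. The code picks objects idle longer than a threshold and builds a lifecycle configuration, in the dictionary shape S3's lifecycle API accepts, that transitions objects carrying an archive tag to S3 Glacier Flexible Retrieval (`GLACIER`). Applying the tag itself, for example with S3's `PutObjectTagging` operation (which replaces an object's entire tag set), is left out.

```python
from datetime import date, timedelta

# Hypothetical tag that marks archive candidates; choose your own key and value.
ARCHIVE_TAG = {"Key": "archive-candidate", "Value": "true"}


def stale_keys(rows: list[tuple[str, date]], today: date, max_idle_days: int) -> list[str]:
    """Return keys whose last recorded use is older than max_idle_days, sorted."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(key for key, last_used in rows if last_used < cutoff)


def archive_lifecycle_config(min_age_days: int = 1) -> dict:
    """Lifecycle configuration moving tagged objects to S3 Glacier Flexible Retrieval.

    Note: lifecycle "Days" counts from object creation, not from when the tag was
    applied, so tagged objects older than min_age_days transition on the next
    lifecycle evaluation.
    """
    return {
        "Rules": [
            {
                "ID": "archive-unused-data",
                "Filter": {"Tag": ARCHIVE_TAG},
                "Status": "Enabled",
                "Transitions": [{"Days": min_age_days, "StorageClass": "GLACIER"}],
            }
        ]
    }


# Hypothetical usage data as it might be recorded in metadata: (object key, last used).
rows = [
    ("reports/2023/q4.parquet", date(2024, 1, 15)),
    ("reports/2025/q1.parquet", date(2025, 4, 2)),
    ("raw/sensor-dump.csv", date(2023, 6, 1)),
]
print(stale_keys(rows, today=date(2025, 6, 30), max_idle_days=365))
# → ['raw/sensor-dump.csv', 'reports/2023/q4.parquet']
```

Splitting the decision (a metadata query) from the enforcement (a tag-filtered lifecycle rule) keeps the expensive part, moving bytes, inside S3 itself.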

>> What's really great about AWS, I have to say, and not to give a little commercial for Amazon, is that something like DeepSeek can come in, new innovation can come in, and you don't have to change anything. All the other stuff stays as good as it is and gets better, and S3 is a great example. S3 Tables has been a huge deal. How has business been for you? This past six months feels like a year, so much has happened: a lot of announcements, new models, you mentioned a bunch of them. S3 Tables launched at re:Invent. What's the business response been from people realizing this and jumping right in? Give us a taste of the feedback. Are they moving fast to S3 Tables? Is it making an impact? Give us an update.

Mai-Lan Tomsen Bukovec

>> Well, there's been a tremendous amount of excitement, John, and I think part of it is the idea of basing S3 Tables on Iceberg. Iceberg is an incredibly widely adopted open table format (OTF), and many companies have based their whole standard for interacting with tabular data on Iceberg. Combine that with the fact that we already have exabytes of Parquet data, tabular data just sitting there in S3, and that union, being able to use an OTF standard like Iceberg with my data, has been incredibly exciting. People are also excited about metadata and are starting to use it, but they're now working through the workflows: how could I take advantage of metadata in ways I wasn't able to before?

>> Like what?

Mai-Lan Tomsen Bukovec

>> Well, okay, so we have frontier thinkers like Netflix. Netflix, as you know, has for years built platforms on top of AWS and open source; they have a deep commitment to open source. They have a whole custom catalog system that runs a lot of their business. And that is what a lot of our customers are trying to evaluate: "If my bottom turtle supports unlimited growth in metadata, how would I change my governance? How would I change how I create a data product? How would I change the workflows in my AI data process?"

>> You need to measure everything.

Mai-Lan Tomsen Bukovec

>> That's right. You used the word game changer, and this idea of metadata is a bit of a game changer. Our fastest adoption of metadata has been people using the system metadata, and putting tags on objects and using those tags in their metadata. But we have so many conversations with customers about all the different ways they want to thread it through any use of data: analytics, AI, governance, what have you. It is really going to be part of the bottom turtle of S3.

>> Yeah. And I think the whole Parquet thing is a great example, because interoperability between table formats was a hard problem to solve, and Iceberg kind of takes that away. So the data now interoperates.

Mai-Lan Tomsen Bukovec

>> Yeah, and S3 is a very fun place to work, because you listen to your customers, John, and the story of data has really been written by S3 customers. Look at where S3 is now versus more than 10 years ago. I won't tell you how long I've worked on S3, because then I'm going to feel old, John.

>> I was a customer in 2007.

Mai-Lan Tomsen Bukovec

>> Okay, all right. High five.

>> Early customer.

Mai-Lan Tomsen Bukovec

>> But if I think about the evolution of S3, we have so much semi-structured data now, because the convergence we're seeing is to put all the data, structured, unstructured and semi-structured, into S3 because of the economics, the durability and the availability. That is really the trend we're seeing.

>> And coding down to that level gets you some performance advantages, having that knowledge and then that interoperability. That's the trend we're seeing at the chip level too: people building software around memory management, around the chips. So, storage; and then you also have higher-level services like Amazon Q Business, another good one, and QuickSight as well. I have to ask you, going forward to re:Invent, what is your goal now? The first six months have been a world tour, a whirlwind; the announcements have been phenomenal. You've got re:Invent in six months. What's the focus for you? Is it data agents? If you had to pick the key things you're going to make happen this year, what are they?

Mai-Lan Tomsen Bukovec

>> Well, I can't spill the beans, John, as you know.

>> You can tease us a little bit.

Mai-Lan Tomsen Bukovec

>> But what I can tell you is that if you look at big changes like S3 Tables and metadata, a lot of what we're doing comes, again, from our customers writing the story of data. So we look and we say, "How can we build so that you can do that more easily using S3 traits?" So much of S3 storage is driven by application data, primary application data. But I believe this world where storage is the place you put your application data, storage is the place you put your analytics data, and storage is the place your AI applications run from, is all going to start coming together more. And we have some really interesting things coming up in the next six months that make S3 a native place for AI data.

>> You're converging all that data from the different applications that were once just applications, into S3, but not taking away anything, you're only enabling more?

Mai-Lan Tomsen Bukovec

>> That's right. That's right. And the reason why is we're always inspired by what our customers do, and our goal is to provide these capabilities so our customers can just invent the latest experiences. And they do and we are inspired by them.

>> You've seen a lot on S3, you've seen everything. Mai-Lan, great to have you on theCUBE. Thanks for everything. Final word, put a plug in for your group. Shout out for the team. What's the hottest thing going on? What's the cool stuff?

Mai-Lan Tomsen Bukovec

>> Well, I have to say the cool stuff for us has always been the foundational things, John. We're talking about all these new features, but the engineers who come into work every day working on your data, your analytics, your streaming, your messaging are a hundred percent committed to the integrity of your bytes and the integrity of your query. So my shout-out is to the whole team that works on all the great data and analytics services: thanks for doing what you do, which is making sure your data is there and you can get it back out whenever you need it.

>> It's not a storage pouch, it's a data platform. Thanks for coming on. Okay, we're here for the Halftime Report at AWS's headquarters. I'm John Furrier, thanks for watching.