In this interview from the AWS Pi Day 20th Year Celebration, Andy Warfield, vice president and distinguished engineer at Amazon Web Services, joins theCUBE's Rob Strechay to discuss how Amazon S3 evolved from a simple cloud storage service into foundational infrastructure for the internet and the AI era. Warfield traces S3's origin back to Amazon's internal need to eliminate islands of storage across teams building web-facing applications. He explains how the regional, multi-availability-zone architecture enabled a level of durability and availability that traditional storage systems could never achieve. The shift from eventual consistency to strong object-level consistency marked a turning point, simplifying application development and unlocking S3 as a primary data platform rather than a cold archival tier.
The conversation also explores how S3 is expanding beyond objects with the introduction of new data primitives — S3 Tables for structured Iceberg-based data and S3 Vectors for AI workloads — while preserving the open, multi-engine flexibility that made the original object API so powerful. Warfield details how customers such as Pinterest leveraged S3 as a shared data substrate, enabling teams to independently build on common datasets with whatever tools best fit their needs. He highlights the growing role of agentic and citizen-developer tooling, noting that developers can now build meaningful applications in a single afternoon by combining S3 primitives with emerging coding tools. Cost reduction remains central to the strategy, with storage prices down 84% since 2006 and intelligent tiering alone saving customers over $6 billion. From powering early genomics breakthroughs to underpinning the next generation of AI-native applications, Warfield provides a roadmap for how S3 will continue moving closer to application builders through higher-performance domains and deeper computational integration.
Andy Warfield, AWS
This 20th Amazon Web Services Pi Day celebration features Andy Warfield, vice president and distinguished engineer at Amazon Web Services. The conversation reviews the evolution of Amazon Simple Storage Service (S3) from object storage to a foundational data substrate that supports analytics, security lakes and artificial intelligence (AI) workloads.
Rob Strechay of theCUBE Research hosts the discussion and Dave Vellante of SiliconANGLE cohosts the AnalystANGLE segment. Warfield describes S3's cloud-native architecture, engineered durability and innovations such as strong consistency, event notifications, Storage Lens and intelligent-tiering. They explain how S3 becomes the underlying substrate for data lakes, analytics, security lakes and emerging AI pipelines.
Key takeaways emphasize S3's shift from a simple object store to a critical data layer that supports analytics, security lakes, cost-efficient checkpoints and AI pipelines. Warfield states that continued work on tables, vectors and open table formats boosts throughput and performance. Strechay and Vellante note that abstraction, engineered durability and strong consistency enable developers and enterprises to scale cloud-native applications.
This episode highlights S3 features such as Storage Lens and intelligent-tiering and explores the role of S3 in modern data platforms, including data lakes, analytics and AI-driven workloads. The discussion addresses operational and architectural factors to consider for storage durability and performance and the implications for security and cost optimization.
>> Hello and welcome to this CUBE Conversation, as we dive deep into this very special AWS Pi Day celebration. For background, for those who don't understand this and the significance, March 14th, 2006 was the original Pi Day, when AWS launched what it called Storage for the Internet. 20 years later, S3 holds over 500 trillion objects, serves more than a quadrillion requests per year, powers over a million data lakes and underpins AI workloads all over the place. Today, we're unpacking how a service that started as cloud storage has become foundational infrastructure for the internet and now for the AI era. To help me unpack this all, I am so excited to be joined by Andy Warfield, who's the VP and distinguished engineer from AWS. Welcome onboard, Andy.
Andy Warfield
>> Thanks for having me, Rob.
Rob Strechay
>> I mean, I always love talking to you. I mean, again, we've gotten to talk for a number of years and you just bring a lot of involvement and enlightenment onto why things have happened with AWS. So, why don't we start with the beginning of S3 and Pi Day? And why did AWS decide to build S3 back in 2006? Who was the customer AWS was working backwards from? And what internal problem at Amazon really dictated to go down this path?
Andy Warfield
>> It's a great question. I should be clear upfront that I wasn't here at that time that S3 launched, and so I'm drawing on the experience of the team on this one a little bit. I was still, at that point, I think finishing up grad school. It's crazy that it's 20 years ago, working on Xen, which Amazon ended up using on the cloud side as well. And then, moving into a role where I was an S3 customer, building on top of very early S3 for some projects that we worked on. And so, talking to the team and looking back on a lot of the early S3 docs and the original PR, like you say, there was an intention that it'd be storage for the internet. I think the thing that I've learned over the years talking to the team about the origin story is that internally Amazon had a lot of teams that were building internet-facing applications and they were running into a problem that I think a lot of enterprises, to a certain degree even today, run into around teams having to stand up their own enterprise storage. And so, they were finding that they were creating a lot of islands of storage. They were finding that there was a lot of repeated undifferentiated management of that stuff. And that, at the end of the day, the storage that they were building on top of wasn't the best fit for the types of web-facing applications that, at the time, Amazon was really pioneering with. And so, S3 was born out of both a really significant internal need around having an elastic, scalable web-facing storage service, but also, the observation that what was good for Amazon was probably something that was really valuable for a lot of external builders.
Rob Strechay
>> Yeah. And I think also one of the big factors... I mean, I remember back then I was building out, I was still actually on the IT side myself back then, and building out data centers and we were looking at S3. And one of the things that I remarked was, again, the 11 nines of reliability and durability. And that was one of the main factors, which as a person who owned disaster recovery and understanding that and business continuity for a financial services company in North America, that was a big thing that we were looking at when we started to look and broaden our expectations of what storage and that storage layer could be, but that was a great thing.
Andy Warfield
>> For sure. And the thing that you're saying is really interesting, in that I was in the same space for a bunch of my career. Working on the enterprise side and thinking about disaster recovery strategies, you were always figuring out where the co-lo for your second site would be. And whether it was like ISDNs or T1s or what kind of network you needed to provision to get onto that. And talking about SLOs and SLAs and RPOs and stuff for having that redundant copy of your data. And a really fascinating thing about the way that S3 quickly evolved was into building as a regional service, as a storage system that was built on top of the AWS three availability zone primitive, and actually rethinking storage from the perspective of having three physically-distinct sites with separate infrastructure, physical separation and power and all of those things, and actually being able to achieve a level of durability and availability that other storage systems, up until that time, had really never been able to even design for.
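As a rough illustration of the "11 nines" figure mentioned above, the arithmetic can be sketched in Python. This sketch assumes that "11 nines" means 99.999999999% annual durability per object and that object losses are independent, which is a simplification. The object count is a made-up example, not a figure from the interview.

```python
# Assumption: "11 nines" is read as 99.999999999% annual durability per object,
# i.e. a 1e-11 chance of losing any given object in a year.
ANNUAL_LOSS_PROBABILITY = 1e-11


def expected_annual_losses(object_count: int) -> float:
    """Expected number of objects lost per year under the assumption above."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_expected_loss(object_count: int) -> float:
    """Average number of years between single-object losses."""
    return 1.0 / expected_annual_losses(object_count)


if __name__ == "__main__":
    objects = 10_000_000  # hypothetical number of stored objects
    print("expected losses per year:", expected_annual_losses(objects))
    print("years per expected loss:", years_per_expected_loss(objects))
```

Under these assumptions, storing ten million objects works out to roughly one expected object loss every ten thousand years, which is the scale of durability that the multi-site design is aiming at.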
Rob Strechay
>> Yeah. It would seem that, again, it was when I spent a little time at AWS and everything, and it was really, like you said, the underpinning of a lot of things, plus having been a partner of AWS's and built on top of S3 with SaaS delivered products and all of the things, I mean, now you've been there for a bit now, and again, the whole idea of any developer, any amount of data, any time, that seems to be still that direction that AWS has gone in from the initial intent to where it is now. What really, again, has helped drive that forward? Because I don't know that the original folks thought S3 would become what it is today, to put it mildly, which is an amazing platform for that matter.
Andy Warfield
>> I don't know. I think that it would be a fun thing, and maybe we should do it sometime, to try and walk through the eras of learning and the team's mindset over S3, because I think there have definitely been some discrete transitions on how the teams looked at stuff. But I think you're totally right, that the observation was very early on, that presenting up object APIs and building for really the experience of unlimited scale and unlimited throughput and just removing limits as a source of things that add friction and slow you down as an enterprise or as a developer really defined certainly the first half of S3's existence. Singularly, that thing of learning as a team how to operate that way and how to respond to events on anything that happened. I think it's often said in storage all up that to build any storage system, like a file system or a SAN or block store or anything, it's about 10 years of maturity to just get a system to a point that it's really, really trustworthy operationally. And with this one, you've got a thing that's not being shipped as software or as something that's run by the customer. It's a service, but the team really needed to build and establish that maturity. And I think looking back that that first 10 years, for sure, was around learning how to scale, learning how to build new regions, learning how to have metrics and alerting and stuff to really stay in a great posture for scale and performance and operations. I've been on the S3 team for almost nine years now, so really the second half of S3. And I think that it was an interesting moment that from the time that I joined, we had an incredible foundation to talk about and the team still focuses loosely on those fundamentals. And so, in my early days on the team, I did a lot of work on performance. 
We really started to grow our story and performance to talk about taking advantage of the width of the S3 front-end and being able to build really, really high throughput-oriented storage applications by working across the entire web server fleet. We really, over that same period of time, angled into other sharp edges that customers and builders had on top of S3. So, a really, really significant one is consistency. For the first 10 or so years, S3 was eventually consistent, which was a design decision that let us move fast as a team at scale, but left customers and builders having to make decisions to deal with in their applications. And when we did that launch of object-level consistency for S3, suddenly the system became much easier to build on top of. And there's this benchmark that comes up in a lot of our conversations where customers tell us that, "We did a thing," and it allowed them to make their software simpler and that's like such a rewarding thing for us. And I think it's a thing that the team measures ourselves against. And so, I think you have this long period of just being ruthlessly focused on refining that original product without really even changing its boundaries too much.
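To make the consistency point above concrete, here is a minimal toy model in Python. It is not the S3 API and says nothing about how S3 is implemented. The `ToyStore` class, its two replicas and the `stale_reads` helper are all hypothetical names invented for this sketch. It only shows why a read issued right after a write could return old data under eventual consistency, and why that concern goes away with strong read-after-write consistency.

```python
import random


class ToyStore:
    """In-memory stand-in for an object store with two replicas (not the S3 API)."""

    def __init__(self, strong: bool, seed: int = 0):
        self.strong = strong
        self.primary = {}
        self.replica = {}  # lags behind the primary until sync() runs
        self.rng = random.Random(seed)

    def put(self, key, value):
        self.primary[key] = value
        if self.strong:
            # Strong mode: the write is visible on every replica before put returns.
            self.replica[key] = value

    def get(self, key):
        # A read may be served by either replica.
        source = self.rng.choice([self.primary, self.replica])
        return source.get(key)

    def sync(self):
        # Background propagation that eventually brings the replica up to date.
        self.replica.update(self.primary)


def stale_reads(store: ToyStore, trials: int = 200) -> int:
    """Write a value, read it straight back, and count reads that miss the write."""
    stale = 0
    for i in range(trials):
        store.put("k", i)
        if store.get("k") != i:
            stale += 1
        store.sync()
    return stale


if __name__ == "__main__":
    print("eventual:", stale_reads(ToyStore(strong=False)))
    print("strong:", stale_reads(ToyStore(strong=True)))
```

In the eventually consistent mode, an application would need its own retries or bookkeeping to cope with the stale reads. In the strong mode, that code can simply be deleted, which is the "make their software simpler" effect described in the answer.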
Rob Strechay
>> Yeah. I mean, I totally agree. I think that even that one, because having been involved over that era when it went from eventually consistent to having consistency, I think that was key to a lot because that allowed people to not have to do things in, like you said, in their own software. And I think the storage services that are on S3 have grown over the years. I mean, being able to be, like you said, it's already Multi-AZ, but how do you then bring it region-to-region? And how do you do other things with global clustering and things of that nature that also, to your point, unlocked a lot of different activities for customers that they could now build their software in a different way? And I know we're going that direction because it's much more than a storage system at this point. When did it become apparent for you and for the team that it was just so much more than just object storage?
Andy Warfield
>> I think that's a great question. I think that we've had signal and a lot of evidence of customers building much more. So, first of all, to the thing that you're saying, the just object storage thing, to me, at the time that I first started working with object storage and then moved on to work actually on the S3 team, object storage was very much an archival primitive. It was a thing that you could actually turn on in your backup system or as a target to stick cold data. There were object front-ends for tape libraries and stuff. And I think a thing that was remarkable to us was... One example was the move of, at the time, a lot of Apache Hadoop through the S3A connector to run on top of S3 directly, instead of on top of HDFS. So, you had these early companies doing data lake style analytics and they were building these big JBOD disk racks inside their enterprise data centers and running HDFS on it to be able to do large-scale log analytics or warehousing type things. And with the launch of the S3A connector, suddenly they could run those same Hadoop jobs on top of EMR or on EC2 instances in Amazon and just use S3 as a backend for it.
And I think that was one of the cases, it's probably not the first one, where we started to see the data in S3 started to become effectively primary data, and effectively, first-class application data. And through those early years, certainly a lot of my first years here, it was like the realization that suddenly performance was a factor because people had latency needs off of the data. You were seeing examples of startups and other companies doing media streaming directly through S3, like sending live feeds into S3, running transcode, and with a timeliness requirement, getting that out to broadcast. And suddenly, there was this moment of like, "Holy smokes, the system's being used for a lot more than just cold storage."
Rob Strechay
>> Yeah. I mean, I think again, just knowing some of the customers and some of their use cases from talking to them over the years, I mean, you have the streaming, you have the big streamers, you have others that are doing mobile-required applications that has distribution, you have SaaS providers, some of the biggest SaaS providers in the world that are on top of it. And not to mention things like genomics. I remember during COVID and everything like that with all of the data that was coming in to go and find and process for these new genomics types of applications that then led to the mRNA stuff and things of that nature. It's just incredible, the different use cases that it's spawned off in that way.
Andy Warfield
>> Yeah. And it's interesting, I mean, the early days obviously were very much around the SaaS case thing you mentioned, right? And S3 as an origin for CDNs, like for CloudFront in particular. An interesting thing about the genomics space that you talk about is the genomics folks, it's a bunch of researchers in all sorts of domains. It's researchers in universities, it's researchers in healthcare companies and drug manufacturing, drug research and so on. And they tend to, especially in the earlier days of genomics, collaborate through shared tooling and shared libraries. I remember early on the launch of what the genomics folks called the GATK4, which was like the most popular large-scale analytics framework for genomics. And getting hooks into that for S3 suddenly created this thing where they had shared software that they could all build and work on top of, but also, shared infrastructure abstractions that they could work with. And I remember talking to scientists who were so excited about the fact that this shared tooling and the opportunity to have scale and elasticity for storage and compute meant that it was a thing that they didn't have to think about in terms of doing genomics research. And so, it's those stories that I think the team finds so rewarding that by solving infrastructure problems, we can get a distracting thing out of the way and suddenly science can move faster. And it's amazing to see those examples. We see those examples even today with some of the users on top of things like S3 Vectors.
Rob Strechay
>> Yeah. I mean, there's just been so many launches over the last 20 years. I mean, the multi-part upload, S3 Lambda event notifications delivering 300 billion plus events per day. Like we talked about, the strong consistency. CloudFront integration, again, which has been one of the big use cases all along. How do you see these and others, as we go forward, really have fundamentally changed how developers build their cloud-native applications? What do you see as some of the ones that have really been key for you?
Andy Warfield
>> I mean, the stuff that we talked about around removing sharp edges has definitely been so hard to appreciate in terms of its impact. So, stuff like consistency, conditional operations, and a lot of the other thankless internal changes that just mean that we scale more and give more consistent performance and things like that. I think that that's probably, I don't know, somewhere in the neighborhood of 80% of what the team does is just core under-the-covers focus on improving and scaling the system. I think to the directions that we're going next and the way that we really see more feature-level things impacting our customers, I would say that there's two themes, Rob, right now that are really, really, I think, capturing our imagination and really seem to be resonating with customers. The first one is around adding new data types, new primitives to S3. And so, that's the launch of S3 Tables and S3 Vectors over the past few years. And so, we can totally talk more about that and some of the experiences there. I'd say the other one that is overlapping with those two, but is really, interestingly, picking up speed is the role that S3 and those other data primitives play as an underpinning for building applications from agentic tooling and I don't know what term you want to use, citizen developer, opening up development to a broader audience and having the right building blocks to build on top of that I think is a space where we're really, really excited.
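The conditional operations mentioned in this answer can be illustrated with a small in-memory sketch. It is a toy model of the general compare-and-swap idea, not the S3 API. `ToyBucket`, `PreconditionFailed` and the `if_match` / `if_none_match` parameter names are hypothetical and chosen only to echo the HTTP precondition headers, and the version tag here is just a SHA-256 digest of the body.

```python
import hashlib


class PreconditionFailed(Exception):
    """Raised when a conditional write's precondition does not hold."""


class ToyBucket:
    """In-memory model of conditional writes (hypothetical, not the S3 API)."""

    def __init__(self):
        self._objects = {}  # key -> (etag, body)

    @staticmethod
    def _etag(body: bytes) -> str:
        # Assumption for this sketch: the version tag is a digest of the body.
        return hashlib.sha256(body).hexdigest()

    def get(self, key: str):
        """Return (etag, body) for an existing key."""
        return self._objects[key]

    def put(self, key: str, body: bytes, if_match=None, if_none_match=False) -> str:
        current = self._objects.get(key)
        # "Create only if absent": fail when the key already exists.
        if if_none_match and current is not None:
            raise PreconditionFailed(f"{key} already exists")
        # "Update only if unchanged": fail when the stored tag differs.
        if if_match is not None and (current is None or current[0] != if_match):
            raise PreconditionFailed(f"{key} changed since it was read")
        etag = self._etag(body)
        self._objects[key] = (etag, body)
        return etag


if __name__ == "__main__":
    bucket = ToyBucket()
    tag = bucket.put("config", b"v1", if_none_match=True)
    bucket.put("config", b"v2", if_match=tag)  # succeeds: nobody else wrote
    try:
        bucket.put("config", b"v3", if_match=tag)  # stale tag: rejected
    except PreconditionFailed as err:
        print("rejected:", err)
```

The design point is that two writers racing on the same key cannot silently overwrite each other. The loser gets an explicit failure and can re-read and retry, instead of each application building its own locking layer.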
Rob Strechay
>> Yeah. I mean, why don't we stay there? Because I think to your point, I mean, you mentioned two of my favorite with the vectors and the table buckets. Well, I mean, to me, that really opened up yet... Like we were talking about, it's more than just object storage. I mean, there are definitely others who've built databases and/or data lakes on top of S3, but the table buckets and Iceberg and looking and leaning into open standards, I mean, to me, that was great. And actually, I think it ties into what you were talking about with the agentic portion of it as well, because I think they go hand in glove with each other as well. What are you seeing as customers are pushing you into that next era, the AI era as we go that way?
Andy Warfield
>> I think one thing that I wish we could say we did it super, super intentionally, but it was a thing that customers, I think, taught us and it's a thing that I think the team really internalizes as a central thing to how we think about S3 is that the original object APIs were incredibly accessible. They were a thing that you could build libraries around and they left your data in a position that you could take data that was intended for one thing, and then you could go and build other things on top of that. And so, a really, really common example was there were loads of customers that had log data being ingested from their SaaS applications and they would do an ETL-style conversion on that log data, this is years ago, into Parquet. And they'd have a curated set of Parquet files representing a table of logs or a table of click-trail data or whatever inside of S3. And inside those customers, you'd see this really remarkable thing happen over a few years, which is they would stand up teams that own their own internal services. And for example, like Pinterest who built loads of stuff on top of S3, would launch new teams to do new things with the data that Pinterest shared across the organization, but they would take those Parquet tables, they would transform them into whatever they needed for that team and they would run with it. And so, the idea of having a shared substrate for data and a shared API for it really meant that customers had a ton of freedom in terms of how they built on top of their data and they were free to use whatever tool was most appropriate and they just had a ton of velocity as a result of that. And that was a thing that I think we really wanted to preserve as we moved into other things. And so, as we started to look at tables, we were seeing a lot of customers building on top of OTFs, like Apache Iceberg. 
And the idea was that the backend of your database, that structured storage as a complement to the unstructured object storage that you already had with S3, had the same property of wanting to be able to use any engine or any tool on top of it. You want to be able to work with Athena on top of it. You might want to work with some partner or external engine, you might want to just go write code against it. And so, we kind of leaned into the tables thing as customers loved the idea that on top of S3, they could have a structured data primitive, but they didn't love the fact that building on top of Iceberg, they were having to do all of their own table maintenance and they were having to operate it and reason about potentially durability-impacting issues if they had reference issues in their runtime for Iceberg. And so moving Iceberg a little bit off the stack and building a managed construct where they could just have a table API was something that really seemed to resonate with folks and I think holds true to where we approach the stuff as S3. And that's what's taken us into vectors as a third data type. And so, that's kind of the thinking behind those. The bit that's been really, really interesting is as we see customers, and even as I and the team are doing a lot of building, whether we're building operational tools for ourselves or building log analysis tools or just doing weekend projects, everybody is like, I think, really invigorated by the stuff they can get done with these emerging coding tools. You can build a pretty meaningful thing over the course of a Saturday afternoon. And the property of being able to work with data that you already have or being able to write data into S3 and then riff off of it with a different project the next weekend or hand it to someone else on your team to work with is just like a surprisingly awesome fit. 
The fact that having these building block level primitives means that it almost like individualizes what historically has been something that you needed to go and find the right SaaS tool for. And that's a thing that right now we're really excited about the speed of what folks are doing.
Rob Strechay
>> Yeah. And I think it's partially because of the flexibility of S3 and all the different things that you can do with it. I mean, even there's the FSx and S3 integrations that are used in that ML HPC type of scenario as well when people have large datasets and things of that nature and they need some file system access to it and things of that nature. So, it makes a lot of sense to me that it becomes the underpinning for a lot of this. One thing we didn't touch on that, again, I feel like you have to touch on is the fact that, again, over the last 20 years, it's about 84% lower price on S3 than it was in 2006. And when you look at it, there's the price reductions and over $6 billion saved by intelligent tiering alone. When you look out and look at the price curve and everything like that, especially with what's going on in the market and everything like that, it would seem that this sets you up for that Amazon principle of always looking to lower prices and give more value at the same time. Is that something you're still pushing forward with?
Andy Warfield
>> Of course. I mean, at the end of the day, from a storage perspective, price is another form of friction for being able to build. If you're concerned about that aspect of your system, where you end up having to spend time auditing and identifying data that you can have or perform deletions on and things. And the way that certainly a lot of our larger customers have always talked to us about this is to say that there's an opportunity cost to all of that work and to deletions all up that there's, in a lot of cases, a future value on data that is hard to quantify. And we're certainly seeing right now that with a lot of customers that are heavy on AI adoption, for example, that customers who found ways to save those long-term datasets, whether they're logs, or transaction records, or older artifacts from their business, they're finding that that stuff is actually a really meaningful source or an indexable source for training models or building applications on top of. And so, we pushed a lot over the years on things like introducing storage classes, introducing intelligent tiering. I think intelligent tiering is an example of something that we basically turned into a one-click product to be able to allow us to choose the storage class that best fits the behaviors of your data. And I think we've saved over $6 billion on behalf of customers by launching that feature. And so, it's something that we really treat as a feature of the system that we just have to keep pushing on.
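The idea of choosing a storage class from the access behavior of the data can be sketched as a simple rule. This is a hypothetical illustration, not how the service is implemented. The function name, tier labels and the 30-day and 90-day thresholds are assumptions, modeled on the publicly documented defaults for S3 Intelligent-Tiering as best understood here, and should be checked against current AWS documentation.

```python
# Assumed thresholds (in days without access) for this sketch only.
INFREQUENT_AFTER_DAYS = 30
ARCHIVE_INSTANT_AFTER_DAYS = 90


def choose_tier(days_since_last_access: int) -> str:
    """Pick an access tier from how long an object has gone unread."""
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    if days_since_last_access >= ARCHIVE_INSTANT_AFTER_DAYS:
        return "archive_instant_access"
    if days_since_last_access >= INFREQUENT_AFTER_DAYS:
        return "infrequent_access"
    return "frequent_access"


if __name__ == "__main__":
    for days in (0, 29, 30, 89, 90, 365):
        print(days, choose_tier(days))
```

The point of automating a rule like this is the one made in the answer: customers no longer spend time auditing data to decide what to move or delete, so keeping long-lived datasets around becomes cheaper by default.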
Rob Strechay
>> Yeah, I totally agree. And so, hey, this has been great. We've been talking about the last 20 years. Where do we go from here with S3?
Andy Warfield
>> I think the team is really, really excited about the velocity and the rate of launches and innovation over the past couple of years. I think that we're working closer to application builders than we ever have at this point. You mentioned there are databases and other services that are built directly on top of S3 that often allow themselves to focus on a really differentiated performance API layer, but then back with S3 for all of their durability and scale. I think that we will probably continue to move closer to applications, which is going to drive us into higher-performance domains. It's probably going to tie us into deeper integration with computational tasks that you want to perform on your data. And it's definitely, absolutely, going to bring us more tightly integrated with the kinds of development tools that are emerging today that are allowing a much broader population to build applications. And that's a place where I think we're really, really focused in terms of continuing to evolve this service.
Rob Strechay
>> Yeah. I have no doubt about that, Andy. I think that makes a lot of sense. And as you would say, you're always looking around the corner to see what's there and make sure that there's no one-way doors either. But I love it and thank you, Andy, for coming on. This has been great. I always love talking about this stuff and where it's going.
Andy Warfield
>> Good chatting with you too. Thanks a lot for having me on.
Rob Strechay
>> Yep. And thank you for watching this very special AWS Pi Day celebration on theCUBE, this CUBE Conversation. Stay tuned for some further analysis here on theCUBE, the leader in analysis and news.