Kevin Miller, Vice President of Global Data Center for AWS, discussed his team's efforts to manage the expansion of data centers worldwide with a focus on efficiency and reliability. They have made improvements such as reducing blast radius and enhancing cooling efficiency. Liquid cooling, especially for AI training, has been a core innovation. AWS is exploring nuclear power for a lower carbon footprint and for reliability. The goal is to be water positive by 2030, returning more water to communities than is consumed.
>> Welcome back, everyone, to theCUBE's coverage here in Las Vegas for AWS re:Invent 2024. I'm your host, John Furrier of theCUBE. It's our 12th year covering re:Invent. I remember 2013, our first, which was actually the second re:Invent. I think re:Invent one was kind of a beta. I don't think Andy wanted to say that on stage, but it was technically the second. So much has happened since: the expansion of cloud, cloud regions, the overall CapEx infrastructure spend now to support the next-generation inflection point around gen AI, and the overall growth of technology globally has been a big story. We've got the person here inside theCUBE, Kevin Miller, Vice President of Global Data Center for AWS. He handles all the expansion, all the work involved, and thinking about getting more power and capacity.
Kevin Miller
>> That's right. Thanks for having me.
>> Like nuclear reactors. Well, welcome to theCUBE. We're going to go nuclear in this section.
Kevin Miller
>> It sounds fun. Yeah, thanks for having me.
>> Thanks for coming. So first of all, explain to the folks what your job is, because you have probably one of the coolest jobs, and probably a tough one: figuring out, from an organizational standpoint, how to bring more capacity. It's come up in every conversation.
Kevin Miller
>> Yeah, absolutely.
>> David Brown and I talked about it at length. The chips are only as good as the power that can power them.
Kevin Miller
>> That's right. Well, chips need power, and we need data centers to run them and keep them secure and available, so that's my job. It actually is a really fun job. I have a great team, a worldwide team. When you talk about our 34 regions around the world, at the end of the day, the cloud is not ephemeral. It is a set of data centers that we have in these geographies and countries around the world. My team owns everything from taking a bare piece of land to building data centers, putting all the equipment inside, all the electrical and cooling equipment. Actually, a lot of that is our own design. And then there's the control system that runs the data center, which is really important for running it efficiently and with high availability. So we own all of that, and then of course all the 24 by 7 operations. Each of our data centers has 24 by 7 staffing to make sure it continues to run well, and so that's all part of my team's role.
>> Just a quick question on that whole design side: is it open source, or is it proprietary knowledge for you guys? How do you look at these designs? Do you collaborate in an open ecosystem? We've seen success with the Open Compute Project, for instance. What's the strategy around some of the design thinking?
Kevin Miller
>> Yeah, we have some engagements there, but a lot of what we do is a fundamentally different approach to the data center. Instead of working on discrete, highly featured components, we actually look at simplifying and building smaller building blocks that we can scale up, similar to the way we build services. So it's been a different approach, and a lot of that is our own development, but we certainly work with a broad ecosystem of partners around it.
>> Back when theCUBE had no real filter, we could just get anyone on, like James Hamilton, when he would unveil all the secrets. He used to talk about blast radius.
Kevin Miller
>> Oh, yeah.
>> That core concept.
Kevin Miller
>> Yeah, absolutely.
>> In that context, it was about scale. At scale, things break a lot. I know you guys have done such great work over the years. We've covered it on SiliconANGLE and theCUBE, around the way you did Nitro and the connectors; the whole history has been ongoing progress. What are some of the blast radius conversations you have now around design? Because as you roll out at scale, it's not just a real estate play; you're designing a system.
Kevin Miller
>> Oh, yeah, absolutely.
>> It's the space, it's the location. Does it have power? Is it near water and cooling? Does it have sun? All these things now come into consideration. What are some of the features to get those primitives right? That's like the classic AWS flywheel.
Kevin Miller
>> Right, building the right primitives. Well, within the data center, that's exactly how we think about it. We think about building primitives, and in fact this week we announced a number of new components in our data centers. One of the key stories is that we've reduced the blast radius by close to 90% for some of our electrical systems. We're really focused on continuing to improve reliability as well as efficiency, both carbon and cost efficiency, across all of our capacity. We're really proud to be talking about some of those innovations. At the end of the day, they really help our customers.
>> What was the news that you guys announced here? I want to make sure we get that out there. What was the key news?
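As a rough illustration of the blast-radius point above, the sketch below uses a hypothetical model, not AWS's method: it assumes capacity is split evenly across independent failure domains (the "smaller building blocks" described earlier) and that one component failure takes down exactly one domain, so the share of capacity affected is 1/n. The domain counts are made up; they only show how re-partitioning from 2 domains to 20 yields a 90% reduction.

```python
def blast_radius(num_domains: int) -> float:
    """Fraction of total capacity lost when one failure domain goes down.

    Hypothetical model: capacity is split evenly across `num_domains`
    independent domains, and a single failure affects exactly one of them.
    """
    if num_domains < 1:
        raise ValueError("need at least one failure domain")
    return 1.0 / num_domains


def reduction(before_domains: int, after_domains: int) -> float:
    """Relative reduction in blast radius after re-partitioning capacity."""
    before = blast_radius(before_domains)
    after = blast_radius(after_domains)
    return (before - after) / before


if __name__ == "__main__":
    # Illustrative numbers only: 2 large domains vs. 20 smaller building blocks.
    print(f"{reduction(2, 20):.0%}")
```

Under these assumptions, the printed reduction is 90%; real electrical topologies have shared dependencies that this toy model ignores.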
Kevin Miller
>> Well, there's a lot of innovation around liquid cooling and around other components in our data centers, to streamline, to improve availability, to reduce blast radius, and to really drive improvements in carbon efficiency. With some of our improvements, we're seeing up to a 46% improvement in cooling efficiency at peak cooling times, in terms of the power that's used for cooling. For a lot of that, we do some pretty cool modeling with computational fluid dynamics to make sure we never over-cool, because any excess cooling is just waste: you're wasting energy. Just like with services, where we ask how to remove underutilization and drive utilization up, the same philosophy applies to our data centers.
>> What's the core innovation in the data center right now, if you had to point to one thing?
Kevin Miller
>> I think a lot of it is around liquid cooling and bringing liquid straight to the chips, because it's a much more efficient way of removing heat. And of course, with all the AI innovations, what's really important, for training in particular, is that we want training to be efficient. For it to be efficient, you actually need to pack as many chips as you can into as small a physical space as you can, because that reduces latency, and latency is what drives efficiency. So that's one of the biggest.
>> And we saw the UltraServers with Trainium2 now available.
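A back-of-the-envelope sketch of why packing chips closer together matters for latency. It assumes signals travel through cable at roughly 2 × 10^8 m/s (about two-thirds the speed of light, a common rule of thumb for optical fiber) and counts propagation delay only, ignoring switching and serialization, so it gives a lower bound rather than a model of any AWS system. The cable lengths are hypothetical.

```python
# Assumed signal speed in cable: ~2/3 of c, a common rule of thumb for fiber.
SIGNAL_SPEED_M_PER_S = 2.0e8


def one_way_delay_ns(distance_m: float) -> float:
    """Propagation delay only (no switching or serialization), in nanoseconds."""
    return distance_m / SIGNAL_SPEED_M_PER_S * 1e9


if __name__ == "__main__":
    # Hypothetical cable runs between accelerators.
    for meters in (1, 10, 100):
        print(f"{meters:>4} m -> {one_way_delay_ns(meters):.0f} ns one way")
```

Under this assumption, each extra meter of cable adds roughly 5 ns of one-way delay, so shorter physical distances set a lower floor on chip-to-chip latency.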
Kevin Miller
>> That's right.
>> That's powering.
Kevin Miller
>> Yeah, we have them visible here at the show; it's great.
>> We saw Peter DeSantis' keynote. That's six months' worth of content to unpack. It was like a geek fest. If you're into hardware, that was like-
Kevin Miller
>> It's a great one.
>> That was a pretty damn good one. Let's get into that. Unpacking that whole future, what are some of the things you're looking at to power some of these large LLMs? Because that's the number one thing: getting the cost down and the scale up. Price performance is a phrase that's been around the industry, but we're seeing an era now where price performance is hitting the scale where each next thing that comes out increases performance but lowers the price, and it's step-functioning through.
Kevin Miller
>> Right. Well, I think one of the biggest new measures we're really focused on is performance per watt, right? Because we want to be able to do the same amount of work with lower electrical usage, and we're really seeing those benefits. When you look at our Graviton and our Trainium processors, we're seeing some of those benefits play out. We're doing the exact same thing in the data center as well; when I talk about not over-cooling, it's really about how we get that waste out. But when we look at electricity more generally, we are very close partners with a lot of electric utilities, generators, and grid operators. We've spent a lot of time working through it, and in fact, in some parts of the world, I think we probably have better perspectives on grid capacity and transmission constraints than the grid operators do. That's part of what we do to really partner and make sure we're doing the right things to deliver electricity for the data centers.
>> It's funny you mentioned the grid. I won't say they're slow, but there's a lot of operating technology we've seen in the IoT space, especially in manufacturing and other older operating technologies, that isn't necessarily IP-based. The grid is a similar model: it's been around for a while, and it's critical infrastructure.
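Performance per watt is useful work divided by the power drawn while doing it, so the same workload at lower power scores higher. The sketch below shows only that arithmetic; the `Measurement` type and all figures are hypothetical and are not measurements of Graviton, Trainium, or any AWS system.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One hypothetical benchmark run."""
    name: str
    work_units_per_s: float  # throughput, e.g. requests/s or tokens/s
    power_w: float           # average power drawn while doing that work

    def perf_per_watt(self) -> float:
        return self.work_units_per_s / self.power_w


def improvement(old: Measurement, new: Measurement) -> float:
    """Relative gain in performance per watt of `new` over `old`."""
    return new.perf_per_watt() / old.perf_per_watt() - 1.0


if __name__ == "__main__":
    # Made-up numbers: same throughput, lower power draw.
    baseline = Measurement("baseline", work_units_per_s=1000.0, power_w=500.0)
    candidate = Measurement("candidate", work_units_per_s=1000.0, power_w=400.0)
    print(f"{improvement(baseline, candidate):.0%}")
```

With these made-up inputs, cutting power from 500 W to 400 W at equal throughput is a 25% gain in performance per watt.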
Kevin Miller
>> Absolutely.
>> How do you look at grid modernization right now? Again, you guys are bringing a lot to the table. I want to get into grid modernization, and we'll talk about some of the community impact, but with grid modernization, there's also giving power back to the grid, when the grid can't always accept it.
Kevin Miller
>> Well, I think a lot of the opportunity in the grid is that you take what we've done in the data center and scale it up to the size of a grid. One of the things we've really innovated on in the data center is the control system and all the software, the industrial software. When you're talking about industrial operations, 99% is way too low. You can't tolerate 99%; you need highly reliable systems. Six nines, seven nines: that's a reliable system, and that's what you need in the grid. So you take that kind of industrial utility software that we run in data centers and apply the idea to the grid. I think there's a lot of opportunity to be smart about where we place new capacity, both for taking power from the grid and for putting power on the grid, and to use industrial software to manage that more efficiently. There's a lot of opportunity there.
>> Kevin, for us old-school guys, grid computing used to be a term in the '90s.
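The gap between 99% and "six nines, seven nines" is easier to see as allowed downtime. This sketch converts a count of nines into seconds of downtime per year, assuming a 365-day year; it is plain arithmetic, not a description of how AWS or any grid operator measures availability.

```python
# Assumes a 365-day (non-leap) year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_per_year_s(nines: int) -> float:
    """Allowed downtime, in seconds per year, at an availability of `nines` nines.

    2 nines = 99%, 6 nines = 99.9999%, 7 nines = 99.99999%.
    """
    if nines < 0:
        raise ValueError("nines must be non-negative")
    unavailability = 10.0 ** -nines  # fraction of the year the system may be down
    return SECONDS_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (2, 6, 7):
        seconds = downtime_per_year_s(n)
        print(f"{n} nines: {seconds:,.1f} s/year ({seconds / 86400:.4f} days)")
```

Under that assumption, 99% allows about 3.65 days of downtime a year, six nines about 31.5 seconds, and seven nines about 3.2 seconds.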
Kevin Miller
>> That's right.
>> And so when I think of grid, I think the ideal state is to have a modern set of resources, versus now, where it's highly selective. Take us through your selection process, because it's not like you're rolling into these towns and saying, "Hey, we're going to build a grid node here." Ideally, maybe in the future that could happen, but a lot of times the resources may not be in an area.
Kevin Miller
>> Yeah, and we spend a lot of time on that. There's no one factor that drives it; we look at a number of factors. Obviously, we look at where power is available and where we have land availability. We also spend a lot of time working with the community to understand the local workforce. We've recently had some large announcements, $11 billion in Mississippi and in Indiana, around South Bend. These are places where we've worked with the local communities, and we see a real hunger for workforce development and a joint collaboration to make sure, whether through community colleges or other programs, that we're actually training the next-generation workforce. And beyond having land and power and the other conditions, we need to understand all the constraints for us to build data centers and make sure we're operating within them.
>> I remember talking to Andy Jassy at one of the re:Play parties many, many years ago, maybe about a decade ago, about when Major League Baseball went from old stadiums with AstroTurf to the pristine fields, the Fenway Park vibe. New ballparks were being built, and they would revitalize cities because they were essentially a physical plant, a ballpark, and then around it, restaurants. The economic throughput that came from that facility was huge, and we were riffing that, hey, these regions are bringing a digital revitalization. So talk about how you're seeing that play out, because I see that being a big story: you go to Indiana and drop $11 billion, you're in Mississippi with 10 billion. That's real money.
Kevin Miller
>> Oh yeah, absolutely.
>> And there's a facility plant there, you got a ballpark region.
Kevin Miller
>> Right. No, I mean, it's a data center, and so obviously there are a number of ways that shows up in the community. We do some economic studies and look at the GDP benefit. The benefit to the local economy is usually multiples of the direct investment, because the direct investment of course covers the construction and all the jobs created during construction. There are jobs, obviously, to operate data centers, and all the workforce development that has to happen for that to take place. But we also partner closely with communities. Oftentimes we build programs like Think Big Space, where we go into a local school and build a space, bring in 3D printers and Lego kits and Technic kits, and we see a ton of engagement from younger kids, who really realize, "Hey, I can solve these interesting science and technology problems." So it's really fun to see some of that open up. And in some places, it's literally planting trees and helping communities do what they want to do in their community.
>> And get them exposed to the technology. Those kids are going to be building the next power and cooling technology.
Kevin Miller
>> Absolutely.
>> You never know, robotics clubs are popular in those areas now.
Kevin Miller
>> Oh, they're so popular, yeah. There are some amazing things they can do with robotics. So it's really about fostering that innovation and bringing it to communities that may not have had it previously, where there just isn't that genesis, whether it's parents or employers or someone who wants to do it, and we can show up. I spend a lot of time meeting with my teams in the communities, and frankly, they love spending that time in their community, and it really shows.
>> Kevin, I have to ask you about the whole old way, new way around data center construction. The old way was, oh my God, we need to rack some servers. Oh, there's some real estate available. Real estate trusts usually run all that business, and it's just a plant. Then they plug in power, and next thing you know, they suck all the power out. Not really scientific, and now it's a bit more crafted.
Kevin Miller
>> Oh, yeah.
>> Talk about the difference between the two: how the discipline's changed, how the market's changing, how the personnel's changing. Going from the old way to the new way, what does the transition look like?
Kevin Miller
>> Yeah, well, I think the big thing for us is, as I was saying, we build our own data centers, and we really rethought the way that works. Traditionally you'd buy, for example, complicated UPS systems that have a lot of functionality within one system, whereas we've really pulled that apart and simplified it. And then, again, it's the industrial control software on top that is actually a lot of the magic. In a traditional data center, if you have a failure, you might have a long runbook: do steps one through 84 to resolve it. We had one of those procedures where diagnosing a certain failure would take us hours, and during that period we were running our generators; we had to have backups running. Now we use industrial software to manage that, and instead of hours, the software literally takes two seconds to identify the problem and then change the configuration of our electrical system to fix it. Therefore, we don't even have to run a generator. That's the kind of benefit we see. When you think holistically about the data center, you focus on simple components and then the software on top. Just like when we build services, having predictable failure recovery paths is so important, and it really leads to higher availability for customers. And obviously there are carbon benefits; not having to run generators is a huge benefit.
>> I was talking to Dave Brown about Trainium2, and we got to do a little throwback: "Oh yeah, back in 2007 I was working on this." He's an OG on EC2, with all the history, S3 and whatnot. You've been there.
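The shift described above, from an 84-step runbook to software that identifies a failure and reconfigures the electrical system in seconds, amounts to encoding each known failure mode as a predefined, predictable recovery path. The sketch below is entirely hypothetical: the fault names, action names, sensor keys, threshold, and dictionary lookup are illustrative assumptions, not AWS's control software. Unrecognized faults fall back to escalation rather than guessing.

```python
from enum import Enum, auto
from typing import Dict, List, Optional


class Fault(Enum):
    """Hypothetical fault types a power-monitoring system might report."""
    UPSTREAM_FEED_LOST = auto()
    BREAKER_TRIPPED = auto()
    UNKNOWN = auto()


# Each known fault maps to a fixed, pre-tested sequence of actions.
# Action names are hypothetical placeholders, not real control commands.
RECOVERY_PATHS: Dict[Fault, List[str]] = {
    Fault.UPSTREAM_FEED_LOST: ["isolate_failed_feed", "transfer_load_to_alternate_feed"],
    Fault.BREAKER_TRIPPED: ["isolate_breaker_segment", "reroute_load_around_segment"],
}

# Made-up threshold: treat a feed reading below this as "no voltage".
MIN_FEED_VOLTAGE_V = 1.0


def classify(readings: Dict[str, float]) -> Fault:
    """Toy diagnosis from sensor readings; a real system would use richer signals."""
    voltage: Optional[float] = readings.get("feed_voltage_v")
    if voltage is not None and voltage < MIN_FEED_VOLTAGE_V:
        return Fault.UPSTREAM_FEED_LOST
    if readings.get("breaker_closed") == 0.0:
        return Fault.BREAKER_TRIPPED
    return Fault.UNKNOWN


def plan_recovery(readings: Dict[str, float]) -> List[str]:
    """Return the predefined action sequence, or escalate if the fault is unknown."""
    return RECOVERY_PATHS.get(classify(readings), ["page_on_call_operator"])


if __name__ == "__main__":
    print(plan_recovery({"feed_voltage_v": 0.0}))
    print(plan_recovery({"feed_voltage_v": 480.0, "breaker_closed": 0.0}))
    print(plan_recovery({"feed_voltage_v": 480.0, "breaker_closed": 1.0}))
```

The design choice being illustrated is predictability: every automated response is a known, testable sequence, and anything outside that set goes to a human.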
Kevin Miller
>> Yeah.
>> But one thing he said was striking, and I want to get your reaction for your world. He said, "We've gotten really good at building silicon and at the time it takes to tape out, through the advancements of what Amazon brings to the table." Through those experiences, certainly with Annapurna, he went into great detail, but the point was that, hey, we've gotten good at this.
Kevin Miller
>> Yeah, that's right.
>> How good have you guys gotten at data center building? What have you learned, and what does AWS and Amazon's scale bring to the table? Obviously supply chain, knowledge.
Kevin Miller
>> That's a big part of it, yeah. And I totally agree. We're very good at chip design, and I'd say in many parts of the data center we're that way as well. When I think about our electrical systems especially, we've gotten very good at understanding what it takes to go from one mile an hour to 60 miles an hour. When we're building a new component, we'll test it, but then the question is, okay, next year can we get 100X of those? Obviously, to do that, you have to be thinking way ahead in the supply chain and the components. It also goes into the design, because if you can have simpler designs with more repeatable components that you can manufacture quickly, that reduces your time to deploy a data center and reduces the variability you see from different supply constraints. We've gotten very good at building data centers. A lot of that is about how much we can manufacture offsite, pre-manufacture, and have staged, and then how we pull it all together as quickly as possible. So it's been pretty cool to watch.
>> Yeah, awesome. I think you guys bring a lot to the table. I will just say as a side note here, I was just in Atlanta for Supercomputing '24.
Kevin Miller
>> Oh, yeah.
>> Liquid cooling was the hottest topic. I mean, who would've thought?
Kevin Miller
>> No pun intended.
>> It was hot.
Kevin Miller
>> No, it's critical going forward, just given the density that we're seeing and the need to have as many chips as you can in a small space.
>> Are you guys doing direct cooling on the chip?
Kevin Miller
>> That's right.
>> Or around the chips, around the system?
Kevin Miller
>> No, we're doing direct-to-chip cooling through cold plate technology. We think there's a whole ecosystem. Of course, it depends a lot on the chips as well and the rest of the components, and what they will accept. We think direct-to-chip is where we are.
>> You can really see the heat coming in and then cooling down, doing some routing.
Kevin Miller
>> One thing I think people don't quite understand with liquid cooling is that some people think it involves a lot of new water. It's actually a closed-loop system; it's not bringing fresh water in. It takes chilled water, and by chilled I mean not very cold, to the chips, takes the heat away, and then that heat is exhausted. We're actually really proud of how much water we don't use. We don't use a lot of water in our data centers, and this doesn't change that.
>> Well, I think that brings up the whole point: it's not about assembling a bunch of bespoke servers and bolting them together. You guys are looking at it as a system architecture.
Kevin Miller
>> Exactly. You really think about it holistically. I think that's one of the key innovations.
>> All right, so you've got the energy thing you're working on, you're powering the LLMs, you've got grid modernization, you design your own data centers, you're learning a lot, you're getting good at it. You're serving the regions, all good stuff. Now, nuclear has been discussed.
Kevin Miller
>> I think we were going to go to nuclear at some point today.
>> We've got to go to nuclear here, nuclear power. A lot of people, myself included, aren't up to speed on how safe it is, though there's a lot of documentation out there, or on how it could actually be a driver. Talk about the strategy on nuclear. What's gone into it? What's the thinking? Share your thoughts on this.
Kevin Miller
>> Yeah. Well, a couple of things to start with on nuclear. First off, it's not going to happen overnight, and it obviously has to be safe. It has to be even safer than what we've had in prior generations. It's really interesting: if you look at the US Department of Energy website, there's a lot of good information they've published about the ideas and the possibilities, particularly with small modular reactors. There's a lot of focus on inherent safety and passively safe designs that don't require any action; in the worst case, the system shuts itself down. We certainly look at nuclear energy as lower carbon than even wind or solar, because wind and solar still have carbon in all of the mechanics, everything it takes to build those sites. So it's actually lower carbon when you look at it, and it obviously provides more firm power, a 24 by 7 power source. So again, it's not overnight, but we really do think that down the road it's going to be a really important enabling technology as we, Amazon, and the world at large transition towards a carbon-free future. We think it's a pretty critical and important technology.
>> You guys are working on that, okay, cool. So not a lot to report at this point, other than that you're working on it.
Kevin Miller
>> Yeah, we've made some investments. We're partnering closely with an SMR manufacturer, X-energy, as well as with utility partners that are planning to deploy those, because we plan for all of this to be grid-connected. Our data centers are grid-connected. We want to pull power off the grid, and we are excited about new sources of energy on the grid.
>> That's really good, because we need the grid to be safe and secure and efficient.
Kevin Miller
>> Reliable.
>> Reliable, all that good stuff.
Kevin Miller
>> Operated, yeah.
>> All right, let's zoom out and do an abstract thought exercise. You get good at building data centers, and you're connected to the grid. Now we're in a distributed computing paradigm. Do you see a vision where all these cameras on light poles are basically being powered by a data center, computer vision?
Kevin Miller
>> Oh, yeah.
>> And you start to get to local zones, and I start getting to distributed infrastructure. Do you see those designs or those data centers connecting to extensions of the data center, like a power edge, an edge device that's basically a data center in a box? I'm oversimplifying it, but how do you see that expansion? Because that would bring more goodness to the edge. Obviously there's the space thing going on, but let's just stay on Earth for now.
Kevin Miller
>> Yeah, we'll see what happens in space. No, I think that's right. First off, fundamentally, there are laws of physics at work, like when you look at how rivers form. The widest part of the river, I would say, is the regions, the core infrastructure we have today. But the river has to start as a stream somewhere, and those streams can be, like you said, the individual cameras. A lot of cameras today actually have a ton of compute power and processing on them, and that's not going away. I think there's still going to be a set of software that runs there that's actually helpful. Then the next level up might be, well, we have 600 points of presence around the world today, so it could be a point of presence or a local zone, and eventually those aggregate up into the widest part of the river, which is where the region is. I think fundamentally you're still going to see something like that over time. It's not going to be one size fits all. It's going to be a progression.
>> We saw the cell tower expansion; every town and city had to have one. Now it's a little bit different dynamic, but I can envision-
Kevin Miller
>> Well, I think it's pretty similar, actually.
>> Yeah, local towns and cities will need them, and there's no playbook for them.
Kevin Miller
>> That's right. I mean, it remains to be seen. What's that intermediate step? Because, again, I think a lot of edge computing devices, whether it's a phone or a camera or any other device, are going to have a fair bit of capacity on them to begin with. So then the question is, where do you need a lot of networking, and where do you actually need some compute power? We'll see.
>> Well, Kevin, great to have you on theCUBE. I love this topic. As you know, I love to geek out on the infrastructure.
Kevin Miller
>> Yeah, absolutely.
>> Final question for you. What's the coolest thing you guys have done, the one you can point to and say, "That's the coolest thing we've done"?
Kevin Miller
>> Well, I think I am very proud of our work to reduce our water usage. You look at what we've done in regions like San Francisco or Ireland, where we use between 85 and 90% less water than prior generations; that's really important. We're really focused on our goal to be water positive by 2030 and actually return more water to communities than is consumed in our operations. We're already making some really good progress on that, and that's something I'm really proud of.
>> Yeah, and that's a game changer. That's an earth changer.
Kevin Miller
>> Yeah, absolutely.
>> Thanks for coming on theCUBE, really appreciate it.
Kevin Miller
>> Yeah, thanks for having me, John. Good to be here.
>> All right, get all the action going. Nuclear here: we're getting the nuclear option coming soon for powering those chips. But the key to success with these chips is getting that power and energy, because the LLMs that are coming are going to get smaller, faster, cheaper: price performance. Again, powering this next generation of AI is AWS. I'm John Furrier here in theCUBE, the nuclear reactor of content. We'll be back after this short break.