At AWS re:Invent 2024 in Las Vegas, the focus is on the evolution of AWS since 2023. The discussion highlights the integration of data, analytics, model development, and gen AI application development. SageMaker plays a key role in this approach, along with hardware and Bedrock for inference workloads, offering flexibility in model choice and development. The goal is to reduce inference and training costs for customers transitioning from prototype to production through features like model distillation and fine-tuning. Bedrock also enables multi-agent collaboration.
>> Welcome back, everyone, to theCUBE's coverage here in Las Vegas for AWS re:Invent 2024. It's our 12th year doing re:Invent. I'm John Furrier, host of theCUBE. Watching AWS's journey from 2023 to today has really been a historic documentary, if you will, of the innovation, but now we're on a whole other level. What's happening with this, I call gen two cloud. Amazon doesn't use that term, but I think it's gen two, a whole other level of how software's going to be rewritten. Matt Garman mentioned that to me, and you start to see inference and all these new things pointing to a future that's not yet determined. The future is unwritten. It's being written now. Bhaskar is the vice president of machine learning and AI services for infrastructure. Bhaskar, great to have you.
>> Great to be here.
>> The future is unwritten right now, and the canvas is the new software emerging. You're seeing that with everyone's excitement around gen AI and essentially around agents. Swami teed it up at the end of his keynote because it is coming, but there's a lot of good stuff happening in between. We were talking before you came on about how SageMaker has dropped into the infrastructure level, and now Bedrock is the centerpiece for using that. The infrastructure stack has changed a bit. Take a minute to explain what's going on with the stack. What is infrastructure? What does the model layer look like? And then where does it stop and start? Where's the line?
>> Great question, and happy to be on your show. I think if you look at where our customers are, customers are trying to understand how to move from data to analytics to model development to gen AI applications. And if you look at where the industry was prior to our announcements this week, it was siloed. There were siloed tools and services for data. There were siloed tools and services for analytics. There were siloed tools for model development, and then there were siloed services and tools for developing a gen AI app and deploying it. But when we spoke to our customers, and this is the Amazon DNA, we speak to our customers and we work backwards from our customers: what do our customers want? And customers were really asking for a seamless way to move from data to analytics, to model development, to gen AI app development. How do you actually provide that comprehensive horizontal task completion in a seamless manner? And that takes us back to your question, which is, what does SageMaker become now? From our perspective, we are responding to our customers, and we are saying the next generation of SageMaker is the unified experience that will move our customers, and help more customers, from data to analytics to model development and also to gen AI app development. So if you look at that, and you asked me how I think about the various layers of the infrastructure and where one starts and where one stops, let me walk you through that given this background. The bottommost layer, the way we look at it now, has two parts to it. One is the actual hardware. And as you saw, we have our Trainium2, our own silicon, and we are super excited about the price performance of that. That is part of the infrastructure piece. And now the next generation of SageMaker encompasses the data, analytics, model development and AI, and that has now become an infrastructure component for our customers to create gen AI applications.
Now, if you go one more layer up and say, "Great, I have my model with my own data that was used to customize the model, either by fine-tuning or by model distillation," you probably saw the model distillation announcement->> I did.
>> On Bedrock. Models that are distilled on Bedrock are 500% faster and 73% cheaper to run. And so that brings us to the next layer, which is Bedrock for inference. We now consider Bedrock the platform for inference workloads on AWS. And the key element here for us is how do we help our customers write and build their gen AI applications while understanding that there's not going to be one model that rules them all. If you take the clock back two years, maybe a year and a half, I think there was a feeling that there was going to be one model to rule them all: I'm going to choose the model, build my applications, and I'm done. Now, if you fast-forward, we had model A winning the benchmarks. A few months later, it's model B, and then model C and model D. This constant leapfrogging of models is now becoming a reality, so how do you help our customers build their gen AI applications while providing model choice and also helping them move from one model to another? In fact, 97% of AWS customers use more than one model. I don't know if you knew that, but that's very interesting->> Well, we heard. We also heard from Andy Jassy in the keynote that internally, as you guys were experimenting with the models, there was diversity around model choice even inside AWS before the announcements came out.
>> Exactly. And so that, I think, has now become a reality. Given that reality, how do we help our customers with the right set of services and tools for them to build their gen AI applications? This is why some of the launches we did today on Bedrock, whether it was model distillation, whether it was prompt caching, whether it was the fine-tuning of models, all help to reduce inference cost, because inference cost is a very critical piece of going from prototype to production. These tools and features we launched today help customers reduce and manage inference costs. And it's not just inference costs. When you talk about costs, it's also the cost of training. This is where SageMaker HyperPod is very useful: the training, the task governance, the flexible training plans, the recipes.
>> It's funny, because I love the positioning. It brings in that orchestration of the workloads, because job completion is, at the end of the day, what you're striving for, and job completion is the KPI that everyone's benchmarking around. You don't want to redo a job that's been completed when you can offload it. There's so much complexity.
>> Complexity in dealing with that, and we want to automate it. We want to make it extremely easy for our customers and make it cost-efficient so they can actually go to market faster with much cheaper training costs. That's the training piece, so there's infrastructure, which is SageMaker plus the architecture, the hardware architecture. One more layer up gets you to Bedrock and the Bedrock features and capabilities for inference. Now if you go one more layer up, you think of this and say, okay, if I'm in Bedrock, how do I build using the tools that Bedrock provides, whether it's knowledge bases, whether it's RAG, whether it's Bedrock Guardrails, and we just announced Automated Reasoning checks that integrate with Bedrock Guardrails to reduce hallucination. And whether it's multi-agent collaboration, how do you use all of these tools in Bedrock to build your gen AI application so that you can now go to market?
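To make the model-choice point above concrete, here is a minimal sketch of the call pattern, assuming boto3 with Bedrock Runtime access; the model ID, region, and prompt are illustrative placeholders rather than anything specified in the conversation. The idea is that swapping in a distilled or fine-tuned variant changes only the model identifier, not the application code.

# Minimal sketch: calling Bedrock inference through one API and swapping models
# without touching application code. Assumes boto3 is installed and AWS
# credentials plus Bedrock model access are configured; the model ID is illustrative.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def ask(model_id: str, prompt: str) -> str:
    """Send one user turn to any Bedrock-hosted model via the Converse API."""
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 512, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]

# The same call works whether model_id points at a base foundation model or at a
# distilled or fine-tuned variant, which is how teams compare cost and latency.
print(ask("anthropic.claude-3-haiku-20240307-v1:0", "Summarize the announcement in one line."))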
>> By the way, I love the multi-agent collaboration feature. I thought that was pretty hot. The Automated Reasoning checks are nice too; that kind of flows into it. A lot of good stuff is coming in, and I love how RAG has evolved into not just search. There's a lot more going on.
>> And the GraphRAG.
>> Oh, yeah. Just point it at your graph data and your structured data too.
>> Well, that's another piece, right?
>> Yeah.
>> GraphRAG helps you connect various data sources together to augment your responses. There's the data automation piece that we launched, where you can take a corpus of unstructured data and create structured data with it. So we are pretty excited.
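As a rough sketch of the knowledge bases and RAG capabilities discussed above, the snippet below uses the Bedrock Agent Runtime RetrieveAndGenerate call, which retrieves from a knowledge base and generates an answer grounded in the retrieved passages. It assumes boto3 and an already-created knowledge base; the knowledge base ID, model ARN, and question are placeholders.

# Rough sketch of the retrieval-augmented generation (RAG) flow on Bedrock:
# retrieve from a knowledge base and generate a grounded answer in one call.
# The knowledge base ID and model ARN below are placeholders.
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

def rag_answer(question: str) -> str:
    response = agent_runtime.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": "YOUR_KB_ID",
                "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
            },
        },
    )
    # The response also carries citations back to the retrieved source chunks,
    # which is what keeps the generated answer grounded in your data.
    return response["output"]["text"]

print(rag_answer("Which announcements reduce inference cost?"))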
>> Before we get any further, I want to go down the rabbit hole a little bit with you on this, because I want you to explain first what you do. You're a VP at Amazon, which means you've got a lot of responsibility. Are you building the products? Are you building products and engineering? What specific things are you working on? What services? What's your focus?
>> Oh, great question. The way I look at my charter is I'm responsible for Amazon SageMaker. I'm responsible for Bedrock. I'm also responsible for the data processing capabilities, like AWS's Athena, EMR and Glue, and then how do you actually bring the data->> You get all the jewels. You got all the good stuff.
>> How do you bring them and put them all together so that the data, the ML and the AI pieces all stand as one unified piece, which is what our customers have been asking for.
>> You are looking at a major part of the stack. Dave Brown's got what's below you, so he's got the compute side. What I'm really interested in, and what I was fascinated by in my pre-interview with Matt Garman, although he didn't give me the scoop, is that he was pretty clear, and I thought I nailed it with the inference piece. Because what I heard there, and Werner was celebrating Lambda's 10th anniversary a couple weeks ago, and that's the serverless revolution. We saw that, and when Matt said the databases and that inference is going to change the way software's written, the dots connected. Oh, Werner's celebrating serverless. Oh, serverless abstracted away servers for developers with functions and Lambda. Oh, great stuff. Okay, inference is going to abstract the database, because we've been saying on theCUBE for almost eight years, "Dave, there's too many databases, but databases are great. You can have a lot of different data." Time series here, you've got a columnar store there, you've got a relational database, whatever database to do the job. Not one database to rule the world. Does that sound familiar?
>> Yeah.
>> Inference now is the interface to the developer.
>> Yes.
>> That will have a breakthrough for capabilities just like serverless. No one knew what was going to happen next with serverless. So if you just take the history of serverless and say, "Okay, put that into the inference side," that means every developer that's not 20 or 15 today has lived with a database. You don't write code without databases, right?
>> Yes.
>> We all know that. Now that's changed. They're living with inference.
>> Yes.
>> What is your vision on what that turns into? Because the game will change. It's still the same game. You're storing data somewhere, but it's being done for you; it's databaseless, I guess. But something's happening. Tell us, what's your vision?
>> Yeah, great question. Let me split it into two parts. There's one where you have structured data. As you rightly called out, if I'm a gen AI app developer, I want the easiest way to take my data and create a gen AI application. I don't want to worry about whether it's structured or unstructured, what do I do about unstructured, how do I manage the corpus of data that I have? So if I have structured data, the structured data can also be made useful and integrated into Bedrock for your gen AI inference. That's number one. Number two is I have unstructured data. Think of it as tables, images, videos. I have documents that have tables embedded in them. How do I take this large corpus of data, transform it systematically into something that's valuable, and then integrate it into my application without having to worry about how I actually go build my pipeline to do it? If you remember from the database world, people talk about ETL. How do you do an ETL equivalent for gen AI, without having to do the heavy lifting that's required to move from unstructured to structured? This is where Bedrock Data Automation is going to play a very critical role, because with a unified API call, you can now transform a corpus of this unstructured data into valuable data that you can integrate very easily with inference. And the point that you started out with, which is inference: think of it as the pinnacle that combines the work that you've done with data, the work you've done with models, and the work you've done with RAG and knowledge bases. All of that comes together, and how do you make it easy through APIs so that enterprise customers who want to develop gen AI applications don't have to do the heavy lifting? Very excited by these announcements.
>> Yeah, and it's interesting too, because you think about infrastructure as code. Go back to the early days of the cloud. That was DevOps. I want to control the production environment because I'm building it. That created a revolution. Then it throws security in there, DevSecOps. And then you had the complexity of all the servers and serverless. Now you have all this complexity around data. You mentioned formats, but this database piece goes away from the developer. Now I'm the developer, I'm just coding. So you're going to have business as code, data as code, so the relationship to the developer becomes paramount. It was nice to see Matt Garman's first slide be about the heroes, the community. But the second sequence of his keynote was developers. Again, another little tell there of where the mindset is. This is going to have a huge impact on developers. Any vision for developers out there thinking about this? What could this turn into? Because this pinnacle is going to also be a service agent to the developer.
>> That's a fantastic segue. I love the way you got to this piece. One of the things that I think is now becoming an inflection point in the industry is there's a realization that there will be agents which are very specifically designed to perform a single task. And you will have a multitude of these agents, and then you want to have an orchestrator who can take these agents and orchestrate them to produce the final response to the customer. This is one more level up than me just writing a gen AI application. Think of each agent as an application. Now you have multiple of these agents. How do you put them together? Again, with Bedrock's multi-agent collaboration, we are making it extremely easy for customers to write multi-agent collaboration systems without having to do the heavy lifting. And the interesting piece here is each agent is specifically designed to do a single task. But even here, you want to make it extremely easy for customers to do it. We don't want them to be writing a lot of code just to get an agent out. So how do you make it easy and scalable?
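The orchestrator idea described here, many single-purpose agents coordinated by a supervisor, can be illustrated with a plain-Python sketch. This is only a conceptual illustration of the pattern with made-up agents; it does not use the actual Bedrock multi-agent collaboration API.

# Conceptual sketch of the multi-agent pattern: single-purpose agents plus a
# supervisor that routes sub-tasks and composes the final response.
# Not the Bedrock multi-agent collaboration API; the agents here are made up.
from typing import Callable, Dict

Agent = Callable[[str], str]

def flight_agent(request: str) -> str:
    # A real agent would call a booking system; this one just echoes its task.
    return f"[flight-agent] found a flight for: {request}"

def hotel_agent(request: str) -> str:
    return f"[hotel-agent] reserved a room for: {request}"

class Supervisor:
    """Routes each sub-task to the single-purpose agent registered for it."""

    def __init__(self, agents: Dict[str, Agent]):
        self.agents = agents

    def handle(self, task: Dict[str, str]) -> str:
        results = []
        for kind, request in task.items():
            agent = self.agents[kind]       # pick the specialist agent
            results.append(agent(request))  # each agent does one thing well
        return "\n".join(results)           # supervisor composes the reply

supervisor = Supervisor({"flight": flight_agent, "hotel": hotel_agent})
print(supervisor.handle({
    "flight": "Las Vegas to New York, December 6",
    "hotel": "two nights in Manhattan",
}))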
>> We're entering an era of personalization, so quick question for you. Within a year or two years, say two years, will agents be able to book my flight and my hotel going to New York and make the transaction for me? Under two years or greater than two years?
>> I think it'll be under two years.
>> Even with payments involved?
>> I think there will be-
>> Reservations, maybe.
>> Yes.
>> Payments?
>> Yes. I think the pace at which the industry is rapidly approaching production scale for agents will get us there. Because if you look at it, if an agent can impersonate a user under authorized conditions, then it can actually do the user's task. And we are already there with some portions of it; there are sparks of it with the computer use capability that we launched recently in collaboration with Anthropic. These are all trends moving toward the place you described.
>> Talk about the Anthropic relationship with the infrastructure, because with what's coming out of Trainium2 and the UltraServers being implemented, they're doing their job in Dave Brown's department.
>> Yes.
>> I've got to say they're doing good work.
>> Great work.
>> But people are writing down to the hardware level. For the first time on theCUBE, in the 15 years we've been doing this, this is the only year, besides me saying the words assembler code, I've heard people talk about, "I'm writing assembler, I'm at the kernel level." These aren't hardware engineers. These aren't ISV types, these are software developers. They're going down for value extraction. So you see SageMaker at this point of value, because you've got orchestration, you've got the interface to infrastructure. SageMaker is becoming that interface to everything below Bedrock. So that's going to be a nice spot to do some innovation.
>> Correct.
>> What's the developer angle on that? What's the focus for me if I want to extract value? I know Perplexity is going down to the low levels, squeezing every piece of performance and value out of that.
>> Yeah, great question. I think the reason why we actually get down to the kernel level, for example, and even assembly, is you want to extract every bit of mileage that you can to reduce the cost of training or running inference on these workloads. That's the primary reason. And if you look at the SageMaker Unified Studio, the Unified Studio is essentially a unified environment, as I mentioned multiple times, which helps you move from data to developing your ML model or your gen AI model to building your gen AI application. And that, I think, is basically how we are lifting up the abstraction level so that it's easier for developers to use this unified environment. For example, the Bedrock IDE is now integrated into the SageMaker Unified Studio. That's a pretty interesting development.
>> Bhaskar, you've got a hard stop. I could chew your ear off for another hour.
We'll definitely pick this up, maybe in Seattle. You're in Seattle?
>> Yes, I'm based out of Seattle.
>> You guys have a nice studio I use up there, a good podcast studio. Let's pick it up there, but just end the segment with your vision for ML services at the infrastructure level. What's the biggest change? Obviously SageMaker, people may or may not know how it's been positioned, but what's the vision of where the value is, how people can interface, or what they should be doing? What should they be looking at? Share with us your vision.
>> Let me start with where we want to be. We want AWS to be the best place for customers to build gen AI applications. We want SageMaker, we want Bedrock, and we want the data processing capabilities that we launched to be the easiest and most scalable way for you to get to market really quickly and very efficiently. Whether you're trying to process data, or you're trying to train and build a model, or you're running inference, that is the vision.
>> Awesome. Well, thanks for coming on theCUBE. I know you're super busy. Thanks for taking time out of your busy schedule.
>> Absolutely.
>> Welcome to theCUBE for the first time, now part of theCUBE alumni family.
>> Yes.
>> Appreciate it.
>> Thank you. And as I said, I'm a fan of your show.
>> Thank you. All right.
>> Thank you.
>> Well, we're bringing all the action here. This is theCUBE factory here. We're bringing all the action here in Las Vegas, our 12th re:Invent. We've been covering the journey, and it's like a documentary writing itself. So all the innovations are here. I'm John Furrier with theCUBE. Thanks for watching.