TheCUBE covers AWS re:Invent 2024 with host John Furrier showcasing cloud technology and generative AI driving data engineering value. Pete DeJoy, SVP of product and co-founder of Astronomer, discusses the shift to operating businesses with data infrastructure. Apache Airflow gains industry momentum, with partnerships with AWS and products used by companies like Marriott and Ford. The differences between Airflow and Astronomer's commercial product, Astro, are explored, emphasizing enterprise support and data observability tools. The discussion highlights the ...
>> Welcome back everyone to theCUBE's coverage here at AWS re:Invent 2024. I'm John Furrier, host of theCUBE, our 12th year covering re:Invent. It's been a whirlwind watching the journey, it's like a documentary watching the innovation cycle, just how cloud has just kind of created a lot of value, but now going next level, you're seeing a whole other dimension of value creation. The participants, the brands, the people are bringing a lot to the table and of course, generative AI is driving a lot of value. That's data, data engineering. And back on theCUBE is Pete DeJoy, SVP of product and co-founder of Astronomer. Great to have you back on. Good to see you. Last time we were at the New York Stock Exchange.
Pete DeJoy
>> That's right. That's right. It's good to see you at a new venue, John.>> So re:Invent is popping. Obviously it's packed, they extend out to the Wynn. But it's the same game here every year, for 12 years it's gotten bigger and better, but we've noticed that the pattern is clear. It's a lot of practitioners, a lot of smart people in industry and entrepreneurs coming together. They're learning and it's like a gift of new stuff. It's like holiday season, new services, and there's a lot of stuff. Some stuff hits and stays, some goes away, but it's all about throwing stuff out there and Amazon's doing their part as a major cloud provider. They've got big customers. We had JP Morgan Chase on, Poolside just came on, you guys are on. This next level of startups, as a founder, as an entrepreneur, you are in the new class, what I call the new brands emerging that have a clear view of this new generation. And you see a lot of opportunities. So I've got to ask you, as you look at your venture, and by the way, huge chops on open source because you guys were powering the Airflow project. So you're in open source, but you have the view in front of you and the world has changed how software's being written, the role of data clearly with inference becoming a building block, which is to me the most significant announcement here. And if you look at the relationship between developers and databases, well before serverless, I had a relationship with servers too when I was a developer back in the old days. But serverless changes that. Now with inference, software's changing. As an entrepreneur leading your company, you're in the new class of brands that are emerging out of this wave. I mean, give us your view. How do you look at this current world? You're building products, it's highly accelerated, the pace of play is fast. What's your view, Pete?
Pete DeJoy
>> Yeah, look, John, I think first of all, it's such an incredible time to be working in data and data infrastructure. I mean, could you pick a better discipline to focus on in this age of large language models and generative AI, and companies really starting to lean into leveraging AI and data as a strategic advantage? And look, what an exciting conference as well. I'm coming off of a couple of days of long, amazing customer meetings and sales meetings, and really just chatting with folks about all of their hairy data platform and data infrastructure problems. And look, we've been doing this for about 10 years now, and when we first started Astronomer and started working with data teams, a lot of the discussion of the day was about analytics and actually going and using data to drive insight about how the business is operating right now. And that was a great business to be in as somebody that is deeply passionate about data infrastructure. That was very exciting because of the very deep technical challenges, and you're driving outcomes for these customers in the form of a very deep analysis on the state of their business. Now, we've actually seen the secular trend flip over the last really three or four years, where companies have moved from using data to analyze their business to using data to operate their business. And AI has obviously accelerated that tenfold, but now not only are they operating their business with data, they're differentiating their business with data. And as a result, there's been this huge focus on data infrastructure and data foundation. And we've actually felt that momentum in our core business. We've been building Airflow for a long time. You can even see over the last five years, Airflow downloads and contributions have spiked dramatically. We've seen orders of magnitude more folks involved with our open source community, and our business has followed suit.
We have some of the largest players in the world using our commercial products at this point, really to drive their next generation of data strategy. And that's a very exciting thing for us.>> Yeah, and I want to get into some of the Airflow momentum and the business momentum, but we had Bhaskar on, he's a VP of ML Services for infrastructure. We were talking about how SageMaker has dropped down kind of the infrastructure layer and how Bedrock has emerged as more of that kind of model layer. And I was telling him, we were at that awkward time a few years ago where it's like, am I a model person or am I a platform engineer? And what he highlighted, what I thought was interesting, I want to get your reaction to this, is that a lot of stuff was in silos around analytics and people didn't have that horizontal view. And that's why they moved SageMaker down, because it's really targeted at people who think about horizontal execution, not just siloed analytics, but analytics is in there too. So you have now this horizontal, and cloud's been horizontally scalable, one of the benefits of cloud. So what's your reaction to that? Because now the real SageMaker value, as a product, was about running and orchestrating those jobs. Job completion is the number one KPI we're hearing in generative AI.
Pete DeJoy
>> A hundred percent.>> So take us through your reaction to that because that's where you move from siloed department to infrastructure.
Pete DeJoy
>> That's right. And look, I'm a chemist and physicist by education. So I like using chemical terms, but we view data as a high entropy problem, right? It's naturally increasing in complexity and sophistication over time. And as a result that entropy means that heterogeneity and messiness, it just comes with the territory. So all of these silo problems that plagued data teams in the era of analytics are still plaguing data teams today in the era of AI. And data's still in silos and companies are still trying to figure out how they're going to get data out of silos, how they're going to centralize and how they're going to use domain specific data that is really IP in this age, where they're using these things to fine-tune and train models to drive outcomes to their advantage. And again, all of that's a data infrastructure problem. You made a great point there that really this foundation needs to be in place in order to actually drive contextual outcomes for the business.>> Share your momentum with Astronomer. Again, this comes back to why I think you guys are doing so well, because you're targeting that group of people that are solving that problem, because that's foundational. You're enabling that. Talk about the business momentum and how Airflow is moving and more about the OSS side of how all this kind of comes together.
Pete DeJoy
>> That's right. So first and foremost, we built Apache Airflow, and the Airflow community has really taken off. We've actually built a great partnership with AWS and other public clouds in making sure that Airflow is the de facto standard for data orchestration. And again, the community momentum has been crazy. We had our first Airflow Summit last year and have another one coming up this year. And the attendance was really, really encouraging to see. Now on the commercial side, we're mostly working with high-scale, sophisticated data engineering and data platform teams that are building shared services for Airflow internally. One of our big kind of shared customers with AWS is Marriott. I am staying in a Marriott hotel this week.>> Good for you. Always stay with the loyalty.
Pete DeJoy
>> They're using Airflow worldwide at this point on Astro, our SaaS product, to go actually empower their downstream data teams to build these data foundations for their outcomes. Whether that's telling you how many Bonvoy points you have in your app when you open your Marriott app, or actually doing dynamic pricing on rooms based on seasonality and surges in demand. All of that's running through our platform at this point, and that's something that we're very excited about. On the other side, last time we sat down, we talked about Ford, who's using our product across their entire organization, doing very, very advanced things, from traditional analytics work to training their self-driving AI models based on radar and lidar input that they're getting from their remote vehicles. And all this stuff has just signaled that we feel very confident that we can work with some of the biggest and hairiest data problems in the world, and we're just continuing to do so. Again, we spent the last couple of days sitting down with some very major customers and folks that we're starting to work with, speccing out how we can help them drive improvements to their data strategy.>> And also I would point out that for you guys, having that open source project DNA is super valuable. Astro, your product. Talk about the difference between Astro and Airflow, obviously open source, commercial version of it, classic open source business model. Talk about the differences and how people should understand the two and when to engage one or the other or both.
Pete DeJoy
>> Yeah, absolutely. Airflow has always been our bread and butter and always will be. As with any commercial provider of an open source project, the first order of value that we deliver is enterprise support and expertise around the open source project. Now Airflow has, again, broad adoption in the market at this point. So providing that level of support is a big differentiator for us. On top of that, we build a bunch of really great software too though, John. So we talk about our product in three categories: build, run, and observe. On the build side, we have a bunch of really great developer experience abstractions that help people move more productively. That's kind of the set of tooling that you'd expect for writing, testing, deploying code, going from development to production with ease. And our customers really fall in love with us for that. On the run side, we've actually built a bunch of IP into the way we manage and run Airflow for customers. That actually makes the unit economics of running Airflow with Astronomer better than anywhere else in the world. We have better auto-scaling capabilities, especially at significant levels of scale. We have a lot of leverage. And on the observe side, we have a full suite of data observability tooling that gives you data lineage across your whole data platform and proactive alerting for potential bottlenecks in the supply chain of your critical data products.>> So classic use case of where, hey, I can play with open source first. You know open source, let me go play with it first. And then they check it out and they go, okay, now I want to roll this out. That's where Astro comes in.
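The "observe" side Pete describes rests on a simple mechanic: a lineage graph recording which datasets feed which, so that when an upstream dataset is late or broken, everything transitively downstream can be alerted proactively. A minimal stdlib Python sketch of that idea (the dataset names like `raw_bookings` are hypothetical, and this is not Astronomer's actual observability API):

```python
# Toy lineage graph: edges point from an upstream dataset to its consumers.
# Hypothetical dataset names; a sketch of the lineage/alerting concept only.
from collections import defaultdict


class LineageGraph:
    def __init__(self):
        self.downstream = defaultdict(set)  # producer -> set of consumers

    def add_edge(self, producer: str, consumer: str) -> None:
        self.downstream[producer].add(consumer)

    def impacted_by(self, dataset: str) -> set:
        """Every dataset transitively downstream of `dataset` -- the set a
        proactive alert would cover if `dataset` is late or broken."""
        seen, stack = set(), [dataset]
        while stack:
            for child in self.downstream[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return seen


g = LineageGraph()
g.add_edge("raw_bookings", "cleaned_bookings")
g.add_edge("cleaned_bookings", "pricing_features")
g.add_edge("pricing_features", "dynamic_pricing_model")

# A failure in raw_bookings would page the owners of all three downstream datasets.
print(sorted(g.impacted_by("raw_bookings")))
# -> ['cleaned_bookings', 'dynamic_pricing_model', 'pricing_features']
```

A real platform would attach freshness SLAs and alert routing to each node, but the transitive-closure walk above is the core of "lineage plus proactive alerting."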
Pete DeJoy
>> That's where Astro comes in.>> Got it, okay. So let's get into the App-Dev category. I wanted to ask you this because I didn't have time last time, but App-Dev is this categorical term. "Oh, App-Dev." And is it developers? Is it top of the stack? But you kind of talked about some of those things, run, build, observe, that's an App-Dev environment.
Pete DeJoy
>> That's right.>> So App-Dev has multiple meanings. How would you parse the App-Dev segment of the market? Is it platform apps or is it just more dev? Is App-Dev just DevOps? People always ask me that question, I don't always have a clear answer.
Pete DeJoy
>> Yeah, I think it's a really good question, John. So what we're observing I think in the data space is a very similar shift to what we observed in kind of the DevOps space 10 years ago, when DevOps really revolutionized software engineering. Again, when we started the company, we had a lot of discussions about data pipelines. Now we're talking about data products for people. And when you start thinking about data products, that looks like App-Dev over Infrastructure-Dev at the end of the day. Because what people are doing is they're building highly contextual outcomes that solve a specific problem. For example, when you open your banking app, you check your balance, that's actually a data product. That balance is being serviced by a bunch of very complex infrastructure under the hood, that tells you how much money you have in your checking account when you open your app. And at the end of the day, when data teams move from thinking about their infrastructure as tooling or just pure pipelines, to the actual outcome and products, their development workflows change too. And that's where we get into this whole App-Dev conversation for data teams. And I think that's what we're in the middle of.>> So there's a little bit of overlap there in terms of the category, I would say what you're saying is that it overlaps depending upon what the outcome is.
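Pete's banking-app example captures what he means by a data product: the consumer sees one contextual answer (a balance), while pipeline code maintains the serving state underneath. A toy Python sketch of that split (the ledger events and class name are illustrative, not any real banking or Astronomer API):

```python
# Illustrative "data product": a user-facing answer backed by pipeline state.
# All names and numbers here are made up for the sketch.
from dataclasses import dataclass, field


@dataclass
class BalanceProduct:
    _balances: dict = field(default_factory=dict)

    def ingest(self, account: str, amount: float) -> None:
        # Pipeline side: fold raw ledger events into serving state.
        self._balances[account] = self._balances.get(account, 0.0) + amount

    def get_balance(self, account: str) -> float:
        # App side: the product is this contextual answer, not the
        # infrastructure that produced it.
        return self._balances.get(account, 0.0)


ledger = [("acct-1", 250.0), ("acct-1", -40.0), ("acct-2", 100.0)]
product = BalanceProduct()
for account, amount in ledger:
    product.ingest(account, amount)

print(product.get_balance("acct-1"))  # -> 210.0
```

The App-Dev shift Pete describes is owning `get_balance` as an outcome with an SLA, rather than owning the ingest pipeline as mere tooling.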
Pete DeJoy
>> That's right.>> So you're doing development with data basically.
Pete DeJoy
>> Yep, a hundred percent. A hundred percent. And again, I think, we've had these discussions internally as our data team has started delivering customer-facing features in our products, our data team is now servicing tables from our Snowflake warehouse, inside our product, to deliver differentiation at that observe level for our SaaS product. And their accountability looks much like a software engineer's accountability. Because if those tables don't get updated, they're on call, right? They get paged.>> I asked the question because one, I get that question a lot, I don't have a clean answer. And also it's changing in the way we're seeing it, but also here at the show at Amazon re:Invent, you've got Q for developer, I heard you and George riffing about, "Oh yeah, generate pipelines." I think you guys are going back and forth, love the toe-to-toe there. But what that teases out is the trend of non-technical users getting data stories and data price. Because I ultimately, yeah, show me my balance. So you're starting to see non-technically, but the programs and they're going down the stack. So you're starting to see migration to other value spots for coding. And so if you're going to have a non-technical layer, whether it's Q for developer or other code assistants, code generation, smart synthetic data doing stuff, the developer's still got to find a home to code the value,
Pete DeJoy
>> That's right.>> So just the data point shifts. What's your reaction to that?
Pete DeJoy
>> Yeah, I think we've been hearing about the citizen data scientists or the citizen data practitioner for a very long time. And I think we're entering a phase where that's actually starting to become very real and the tooling is ready to be in a state where folks that might not have all of the know-how or traditional expertise around data infrastructure and data management are able to actually use these data platforms to get leverage for their day job. Whether that's an application developer that's building a feature that wants to embed a model or a data set that the data team's delivering, or it's a business user that wants to ask a plain text question about the performance of their business quarter over quarter. We're seeing data make its way into the native workflows of professionals who otherwise prior had to go to a data team task for answers.>> And Pete, I love the point you made earlier because you go where the complexity is, the app developer has to go where it's hard and you make the outcome work. If you have this kind of query prompt or whatever, citizen level, I'm going to go down and solve that problem there. So if that's in the data layer, so you see that at platform engineering, you start to see that data pipeline. So it's just interesting to see that App-Dev is just, you're building the app. If the data is a key component, that's just the input.
Pete DeJoy
>> That's right. But the key piece of all of this, John, is behind that prompt layer and behind that developer interface layer, the data infrastructure in this box of data platform messiness needs to be handled really, really surgically by these teams. Because when that question gets asked, we need to be confident that you're going to get a good answer back.>> I think last time we chatted, we riffed on the whole data engineering persona. You guys talk a lot about data ops. I still love AI ops as a topic, even though it's kind of like an older word. But we are living in an AI Ops world where there's engineering that has to get done. I mean, you're not going to be able to just throw AI at things to solve it, and you've got to manage the data, because it will be in silos. That's why I like this inference message because if you can abstract away and make this the pinnacle, inference the pinnacle of a lot of other stuff under the covers, that's a DevOps-like mindset. That's an engineering thinking. And I think you guys are doing this aggressively.
Pete DeJoy
>> That's right.>> But still, that's got to get done when you go down to an edge product.
Pete DeJoy
>> Yeah.>> What's the data strategy there? You've got to engineer that in to the system.
Pete DeJoy
>> Yeah, it's really a stepwise thing. I mentioned earlier this shift from analytics to operations. So many of our customers come to us for a traditional ETL analytics use case and very quickly move on to MLOps, like traditional MLOps. Now we're at a point where 50% of the workloads that we're running across our commercial products are MLOps pipelines. And now we're really starting to see the emergence of AIOps and LLMOps as really maybe a branch of MLOps, but very specific to large language models and the things that you need to actually productionize large language models and fine-tuning.>> And I think you guys sit in a nice layer in the stack too, where SageMaker sits, it's a layer underneath the model engines and the developers, and you're now orchestrating and supporting the infrastructure, in this case data. I know the chips are involved too, of course, but that data infrastructure, it looks a lot like infrastructure.
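The ETL-to-MLOps pipelines Pete mentions are, at bottom, tasks with declared dependencies that an orchestrator runs in order — the core idea behind an Airflow DAG. A hand-rolled stdlib sketch of that ordering (the task names are made up, and real Airflow uses its own DAG and task APIs rather than this toy scheduler):

```python
# Toy dependency-ordered pipeline using the stdlib topological sorter.
# Task names are hypothetical; this only illustrates the DAG-ordering idea.
from graphlib import TopologicalSorter


def run_order(dependencies: dict) -> list:
    """dependencies maps each task to the set of tasks that must finish first."""
    return list(TopologicalSorter(dependencies).static_order())


mlops_pipeline = {
    "extract_events": set(),
    "build_features": {"extract_events"},
    "train_model": {"build_features"},
    "evaluate": {"train_model"},
    "deploy": {"evaluate"},
}

order = run_order(mlops_pipeline)
# A pure chain has exactly one valid order:
print(order)  # -> ['extract_events', 'build_features', 'train_model', 'evaluate', 'deploy']
```

An orchestrator adds scheduling, retries, and parallelism on top of this ordering, which is where the run-side unit economics Pete described earlier come in.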
Pete DeJoy
>> Yeah, and we're really getting dragged forward by our customers on this, John, honestly. I mentioned the MLOps use case earlier, we'll often get deployed at a big account and very soon thereafter have somebody banging down the door and saying, "Hey, I really would like to use Astro for this new use case that I'm working on." Whether it's some new agentic workflow or some new AIOps pattern that we haven't seen before. And we're really just trying to work very closely with folks that are on the bleeding edge here and make sure that we're servicing them in the right way.>> Yeah, I've heard a lot on theCUBE this week around AI centers of excellence, which are kind of like a lab, not really showpieces like a briefing center, but a place to get your hands dirty with AI. And they create what they call landing zones, where you can basically play with the data. I bring that up because I want to get your thoughts on this whole idea that, if I'm going to have a data ops network or organization, how do I set this up? And so I was going to ask you, what conversations are you having here? You mentioned a lot of customer calls here at re:Invent, what are the conversations you're having? Are they around thinking about how to put applications in experimentation mode? Is it about scaling? What are some of the calls you've been having? What have been some of the conversations here this year at re:Invent that you've been having?
Pete DeJoy
>> Yeah, so much of it is around how to actually accelerate the development and delivery of differentiating data products. Now, I know that's a bit of a mouthful. What it really means is, hey, we've invested in all this foundation and we spent the last, in many cases, 10 years either moving to the cloud or building these foundations for the next gen of our data platform. Now how do we actually start seeing results? How do we actually take that infrastructure that we've built and use it to drive the next generation of differentiating features for our company? And that's a very high context discussion, right? Because what a differentiating data product is for Marriott is very different than what a differentiating data product is for Ford. And at the end of the day, we're here to support all these folks and we want to make sure that we can support->> You guys have to get into the conversations. I mean, beauty's in the eye of the beholder, as I always say. So depending on the enterprise, their data will define the conversations.
Pete DeJoy
>> That's right. I'll give you another example, John. The Texas Rangers are another big shared customer between us and AWS that we've chatted about quite a lot at this conference. And they are currently using our product to do all of their in-game predictive analytics. So they won the World Series last year. I'm a big baseball fan. I'm a New York Mets fan.>> Thanks. It's okay. I'm a Red Sox fan, but that's in the eighties. We won't go back there.
Pete DeJoy
>> Yeah, the Mets painful existence. The Rangers won the World Series last year and they actually walked us through how they're using our product to actually do in-game predictive analytics, and the amount of data that they ingest in these professional sports organizations that help inform where the batter should be positioned, what they should be looking for on the next pitch, how they can position their fielders to get a competitive advantage. Now that's a data product, that's like helping them really win in the definitive sense. But very contextual to what they're trying to accomplish as an organization.>> I think you're onto something big with these data products, that's going to be more of the norm. Having these data products, integrating with other data products, fusing them together. I think that's why I like the models, will come together as they can get exposure. I mean, the model can only work on the data that they're exposed to. And so you keep doing that great work. By the way, the Texas Rangers CIO is a CUBE alumni, so he's been on theCUBE many times. Actually, he actually showed me the ring on theCUBE, didn't let me wear, the Chicago Cubs CIO let me actually wear the ring. I wore the Red Sox ring and the Cubs ring, on theCUBE. Couldn't get the Texas Rangers ring, he wouldn't take it off his finger.
Pete DeJoy
>> That's great.>> Closing thoughts, observations in the industry that you're seeing. Again, you're at the front lines, cutting edge area. Pete, what are you seeing?
Pete DeJoy
>> No, I think we covered most of it, John. A lot of very exciting innovation coming out of AWS re:Invent, new technology, new ways of managing data, new ways of leveraging generative AI. I think maybe my final closing thought, there's obviously a lot of discussion around agentic workflows and what that means for the next generation, at least in my world, the way the problems I like to think about, or what that means for developers and data practitioners at the end of the day. And I think there's a lot of very interesting things that are going to come in the next 12 to 24 months.>> Like what?
Pete DeJoy
>> So I think a lot of people assume generative AI is a means for developer experience, like kind of code generation. I think there's actually a lot of leverage to be developed in the knowledge of the platform itself. So you can think about high context developer experience, wherein your IDE assistant knows what databases you have access to, what schemas you have access to, where you should be reading from, where you should be writing to, what the API schema that you're trying to pull from looks like. And all of that deeply embedded into really the core developer experience. And I think that's very interesting. Really on the other side of the house, we're very interested in how we can use agents to actually go do auto remediation at the data infrastructure level, and help people save money. I think there's a very rapid cost attribution and cost expansion space that's heating up in the era of the lake house. And we very much view that as a big opportunity.>> And efficiency around data movement, efficiency, what context, when, where, all relevant. Been great to see you again. Thanks for coming on theCUBE here at re:Invent. Good to see you.
Pete DeJoy
>> Thanks John. Yeah, thank you for having me.>> All right, CUBE coverage here. Again, wall to wall, four days of CUBE coverage. We'll do 40 interviews. We'll do our part to get the story out to you. It's a data story, all theCUBE's data is out there. I'll be right back after this short break.