Christopher Sullivan
Director of Research and Academic Computing for the College of Earth, Ocean and Atmospheric Sciences, Oregon State University
Anthony Dina
Director of System Engineering, Dell Technologies
Joe Steiner
Principal System Engineer, Dell Technologies
Dave Vellante
Co-Founder & Co-CEO, SiliconANGLE Media, Inc.
Oceans of Data: Tackling Climate Research With Data and AI, Americas
February 25, 2025 | 6:00 PM - 6:35 PM UTC

This program will be available on-demand after the conclusion of the Oceans of Data: Tackling Climate Research With Data and AI, Asia-Pacific broadcast. 

Join theCUBE and Dell for the exclusive event, "Oceans of Data: Tackling Climate Research With Data and AI." Discover how a robust data foundation is essential for maximizing AI's potential in climate research. Dell experts, along with Chris Sullivan from Oregon State University (OSU), will delve into managing large data volumes, integrating AI solutions, and handling unstructured storage. Learn how Dell PowerScale and NVIDIA infrastructure support scalable AI adoption, offering actionable strategies to optimize data management, accelerate AI workflows, and future-proof AI initiatives across diverse applications, driving impactful results in climate research and beyond.

Chris Sullivan, director of research and academic computing at the College of Earth, Ocean and Atmospheric Sciences, Oregon State University, joins theCUBE panel with Dell’s Anthony Dina, director of system engineering, and Joe Steiner, principal system engineer, to discuss AI’s role in ocean and earth research. Led by theCUBE Research analyst Dave Vellante, the discussion sheds light on how AI and Dell infrastructure, such as PowerScale, facilitate groundbreaking advancements in oceanic studies, critically enhancing research efficiency and outcomes.
Dave Vellante

>> Hello and welcome to this special CUBE Power Panel, where we delve into the transformative impact of AI on oceanic and earth research, showcasing how Oregon State University leverages AI for groundbreaking advancements and the role infrastructure, such as Dell PowerScale and other AI infrastructure, plays in supporting its mission. With me are three guests on our panel today. Chris Sullivan is the Director of Research and Academic Computing for the College of Earth, Ocean and Atmospheric Sciences at Oregon State University. And two folks from Dell System Engineering, Anthony "Ant" Dina and Joe Steiner. Gentlemen, welcome to theCUBE. It's great to have you.
Joe Steiner

>> Thanks for having us.
Anthony Dina

>> Yeah, great to be here.
Dave Vellante

>> Yeah, you bet. All right, Chris, let's start with you.
Christopher Sullivan

>> Yeah.
Dave Vellante

>> You're helping do some amazing things. Help our audience understand your mission, your role as Director of Research and Academic Computing at OSU, and your focus on ocean and atmospheric sciences.
Christopher Sullivan

>> Yeah, so I've been here at Oregon State for over 25 years as a researcher and computational scientist, and as the director in this college, the College of Earth, Ocean and Atmospheric Sciences, we really do focus on how the planet is being changed by the things that we do and the way we interact with it. And we use a lot of data, obviously, and a lot of computational methods to help us do that. When we look at the ocean side, we are working with plankton and crabs and seabirds and krill and all kinds of different species from the ocean. We also have a lot of work happening on the land. We are working with the United States Forest Service on the spotted owl and all the other animals in the forest, and even freshwater plankton. So we're really trying to marry both of those pieces together and understand things like the carbon cycle and how we're changing those things, to understand our planet and how we can interact with it better.
Dave Vellante

>> So it's awesome that you've been there for a couple of decades now. And the reason I want to follow up on that is because you have seen AI evolve, from before the AI moment heard around the world through, of course, the gen AI moment. How has AI enhanced what you're doing with ocean and earth sciences research? Specifically, I'd be interested in your AI journey before and after that gen AI awakening.
Christopher Sullivan

>> Yeah, ultimately we have computation and just a gamut of data coming through. Before we had AI, I really want you to appreciate that we were data rich and information poor, and one of the greatest advents for AI was the cheap hard drive. It was about data. AI is data-driven and it's married to that data. Before that, as we started storing the data, the CPUs and some of the technologies had difficulty working through the quantities of data that scientists would love to put in. If we look at things statistically, when I use a small number of data points, the statistics are always going to be biased or skewed, and increased amounts of data help us get more statistical relevance. And so moving through different technologies, CPUs and then ultimately GPUs, really changed our interaction with that data and allowed us to get those statistical numbers up, to make the information more relevant and valuable.
Dave Vellante

>> Thank you for that. Anthony, I want to bring you into the conversation. I remember the early days of big data. It was awesome and it was exciting, but then we were drowning in data. We didn't have the tools or the horsepower, the infrastructure, the processes, the skill sets to deal with it, and that's completely changed. So I'm curious, in your work addressing unstructured data specifically and the challenges that you and your customers see there, how do you see AI generally, and maybe gen AI specifically, changing the way organizations are managing their information silos? And what role do you see things like governance playing in that process? Give us your thoughts.
Anthony Dina

>> Yeah, in fact, three things come to mind. Just to give you a data point to anchor exactly what's going on: the world of data is ever-increasing. A big part of it is, let's call it, the digital trail from sensors and other scientific equipment. It's also the result of people who create data, create documents, replicate documents, and those kinds of things. A hospital alone will generate 50 petabytes worth of data in a single institution. That's just one hospital. As we move from the science, from our understanding of the universe and how things work, to the intervention of patient and bedside care, we are collecting a large amount. And the challenges come down to three basic things. Number one, even though we're talking about bits, they're really atoms. So the physics of storing data really, really matters. The energy, the space, the consumption, the organization really need to come into play. And it's a cost not just in terms of dollars spent, but in terms of the fundamental resources that are then unavailable for other things. The second thing that comes to mind is that from the early days of the fifties through the eighties, we historically looked at data as a structured table format. As we added scientific equipment that could scrape data, and we would take these analog signals and record them, we moved into semi-structured data, and the world of consumption increased and so did the complexity. Generative AI has made everybody a programmer. English is the most common programming language. And so with that, there's been a huge demand for providing structure to unstructured elements. We need to describe the image, we need to annotate the sounds that we hear so that we can find it, process it, curate it, and cultivate it. And then the last piece is around governance, and this is a really important attribute of the challenges we face.
Not only do we need to make sure the right people have access in the process, but we have to prevent intrusion from cyber attacks. And so we have to think about chain of custody in a whole new way. We have to think about it not just at a file level, but at a data level. And these are responsibilities that include policies and upfront thinking, as well as core technologies.
Dave Vellante

>> Great, thank you for that. A couple of things. 50 petabytes just in a single hospital; I love the stats. Alex Wang actually reported that JPMorgan Chase has 150 petabytes of data, and JPMC told me, oh, it's way bigger than that. And so we get kind of excited about the cloud and all the experimentation going on there, but there's tons of data, particularly at the edge, and obviously we're going to talk a little bit about that; I'm super excited about it. Did you have a thought on this, Chris?
Christopher Sullivan

>> Yeah, I want to add to that really quickly. It's really important for people to understand: they think the cloud is where we do our work, but we have too much data to be putting it in the cloud and getting that work done. We need to have on-prem, and we need to own that on-prem and move data around between our different disciplines and domains very rapidly. And Dell really did help us recently look at how to do that and change the way that we interact on-prem with our data, by helping us look at 800 gig networking between our data centers. That really changes the way that we can interact with our data. And we are talking about petabytes. Trying to move petabytes into the cloud and then move them back out is prohibitive.
Dave Vellante

>> Yeah, absolutely. I mean, virtually every customer I talk to is saying, yeah, they're doing some stuff in the cloud, but they're also building their own on-prem AI. They have to because of, well, the things that Anthony just mentioned, governance, but also just moving data. It's too darn expensive. So Joe, I wanted to bring you into the conversation. We talk about repatriation; that's sort of a bromide. I think it's really all about bringing AI to the data. I think that's a better tagline, and I think it's going to really take off in 2025. And the key to that is collaboration. You've got to have partnerships. I know choice is a central theme in driving AI forward, and it's fundamental to Dell's strategy. I wonder if you could give us your thoughts on choice, and maybe an example of how your partnerships, whether with academic institutions like OSU or technology providers like Nvidia, are shaping the next generation of these AI transformations.
Joe Steiner

>> Thank you. It's an amazing time in all of our lives. Dell Technologies has been helping customers build out world-class internal clouds for years, and we're going to continue to do that. We're doing amazing things with compute at the edge, but we also started looking at how customers are working with data all the way down to the infrastructure. And that enables people like Chris to use incredible tools. Dell's approach is to create an open ecosystem, or embrace the open ecosystem, and that means we want to give our customers choice. So as we look at the various tools that are available for our customers to work with data down through the infrastructure, it comes back to some of the things that Ant was talking about with security and governance: can we provide a common control plane, or can we enable Chris to leverage more network connectivity, to process information, and to store the metadata and catalog information from the various structured sources around their environment?
I think that open nature is going to play a really strategic role going forward, whether it's Chris's team and the amazing things he's doing there, or what we're doing in hospital systems, working with customers that have silos of data across their hospital systems and linking them together. Or in other spaces, retail for example, we're able to help customers have a better view of their customers. When they try to build a customer 360 view, they have for years created data in silos. With our open ecosystem and our software stack, enabling them to have connectivity into those structured sources and to streamline how they get access to that customer 360 view is meaningful to their organizations.
Dave Vellante

>> I mean, it's all happening. We were just down at NRF recently doing a special series, thinking about those silos and breaking them down, from things like supply chain, which is fundamental. And I'm glad you brought up metadata. We've been trying to harness metadata for many years, and I feel we're finally there. But I wonder, Chris, from your perspective, maybe we can get into some of the examples. Start with the hurdles that you face when you're trying to integrate AI methodologies with these diverse data sets. Then I'd love for you to get into some of the work you've done, whether it's real-time plankton research that might be particularly relevant, and how the technology infrastructure, whether it's PowerScale or AI servers, data management, reference architectures, is helping you deal with the challenges. What role does it play?
Christopher Sullivan

>> I'm going to give you two examples, and I really like the way that you said bringing the AI to the data, because in the end, shipping data around, whether through networks or FedEx, really is prohibitive and delays the temporal aspect of getting information out of that data. I have projects where we're trying to monitor the forest for endangered species like the spotted owl, and that data brings in hundreds and hundreds of terabytes every month. Ultimately, trying to bring that back from the forest is very prohibitive. If we were able to process it at the edge and send just CSV output through the cloud systems or through the cellular networks, we would really dramatically change the way that we can look at the forest, and we could enable the forestry groups and lumber companies to get out there and take advantage of the forest that we have, the timber and things like this. So it's really about enabling the commercial side and protecting the endangered species by leveraging edge AI and cloud services to help us. Now, we still have to train the models locally on large data sets, and at some point we bring some of that data back to retrain. That's where we use, again, a lot of our on-prem technology, but we're really getting that technology out there and working with ranger stations where we can put a server or some workstation-class pieces of equipment to actually do some of that. When we look at the oceans, if I run a ship out into the ocean, let's say the new ship that we're building to do this specifically, called the Taani, that will cost me a million dollars just for 10 days in the ocean. And those 10 days in the ocean will generate, just for one project that we're running on that cruise, 100 to 200 terabytes of output as we collect that.
And the problem is, if I don't actually do something at the edge, I may have collected meaningless data. I may have just put the cameras in the water and gotten nothing back for it, and I've spent the million dollars. So by having some edge AI and some technology actually built into the ship, we can show that we're able to get data, that we're in the right spots, and that we're able to understand what's going on. And when we look at the plankton monitoring, plankton produces 50% of the oxygen we breathe on this earth. If we use it as a monitoring tool, it's really like the canary in the coal mine for helping us understand how we're changing the planet. It's also 17 to 20% of the food per capita on this planet, because it is the basis of our food web. So we really want to be out there monitoring the plankton, and we really want to be doing it in real time, because if it takes us too long, the data becomes meaningless. This is how we've started to put Dell Technologies onto the ship, to help us do it in real time or semi real time. And one of the reasons we had to pick Dell Technologies was because of compliance and governance. We have lots of different groups coming onto that ship, government groups, different agencies, and all of them have to meet compliance. If I pick different random hardware, I may struggle to get those pieces of hardware to meet compliance across a gamut of different government agencies and granting groups. Dell allows us to meet all of those compliance requirements, get real time, put it out on a ship, and build a stack that really is heterogeneous and has a lot of different pieces to it, not just one piece of technology.
Dave Vellante

>> So speed and time to results really matter for you because, what did you say? You're spending a million dollars for 10 days, was that the stat?
Christopher Sullivan

>> Yeah, I mean, that's what it costs me. It costs me a million dollars to run the ship for every 10 days, and if I come back with no data, that's not okay. We did this as a test on our ship that we're currently running, and we were running 40 gig cables down the halls and all kinds of stuff, and it was chaos. The new Taani has a data center built for this, and I put a hundred gig network down the dock to that ship so we can offload the data from that data center. So we have an entire stack that will be going into that data center, which includes file space, PowerScale, PowerEdge processing machines, GPUs, all of those pieces, so that we can flex any type of algorithm out there at sea while we're spending those alarming amounts of dollars to be out there.
Dave Vellante

>> Yeah, so much more value out of that million-dollar spend. Anthony, I want to bring you into the conversation again. When you think about managing unstructured data and the silo effect across many industries, you work not only with organizations like OSU, but you're in healthcare, you're in manufacturing. What strategies are you seeing that are working for customers in maintaining data integrity and security? When you think about the heightened risks around AI and evolving privacy regulations and ethical standards, what are you seeing there that works?
Anthony Dina

>> Well, look, there are lots of different industry-led use cases, and Chris gave us a really rich perspective on what it takes to conduct the science, to understand the fundamentals of what's happening on our planet. But that level of collaboration amongst the science community is not just for that particular domain. The challenge we see with data silos is that they're actually sourced from human silos or organizational silos, and this is an important factor. The structure and format of data that gets generated is oftentimes the result of a very specialized business process, and each process has a different schema. As we try to understand the fundamentals of an operating environment in a corporation, or the operating environment in a planetary system, we have to go beyond those individual silos. We have to link together the elements that are going to create a unified perspective. A big piece of doing this is being able to put in place the triggers to identify the data source, who owns it, how it is used, and to track that chain of custody along its workflow. This is how we overcome the differences. Now, we've also seen some innovative approaches. We've seen distributed mechanisms, putting AI at the edge or putting compute close to the data. We've also seen possibilities of federating that data. In other words, I will only fetch in real time the rows and columns which I really need. I don't download the entire table; I download a fraction of it. These approaches allow us to be a lot more buoyant, allow us to be more adaptive. Now, when I think about being adaptive, I think about other industries like media and entertainment. If you look at how stories are produced, they begin with a script. They go through a funding process; large numbers of small contractors are put in place to actually put the finished product together. It is a thoughtful, plan-forward approach.
But if you look at sports, sports is very dynamic. Things happen on the field we didn't possibly expect. And so what I'm seeing is a shift to take game console technology and apply it to media production. We need to be adaptive and responsive, and you can carry that metaphor into hospital systems. We want children to get out of the hospital bed faster because we've now incorporated electronic health records, PACS imaging, and doctors' notes in a fundamental way that delivers a recommendation so that the kids can move out and grow up in their own way.
Dave Vellante

>> Joe, I want to stay on the topic of unstructured data for a minute, and scaling AI. We all know data is the lifeblood of good AI. How do your products, whether it's again PowerScale or whatever products are relevant here, drive efficiency for your customers in terms of how they handle persistency? What have you seen with regard to customers unlocking any new patterns or breakthroughs that excite you?
Joe Steiner

>> Yeah, I'll subdivide it into three areas. The first one is how customers land data. As we talk about the network improvements being made and the protocols that we support in our PowerScale platform, we want to make it as open and as rich as possible for organizations to transmit data to us, and for us to land that data in a format that is secure and allows them to then pick up and process it. So we think about the various processing engines on our PowerScale platform being able to interact with engines that are commonly used across companies, things like query engines, logging and event engines, or data protection engines. The scale-out nature of PowerScale, with each node having CPU, memory, disk cache, capacity, and networking, means we're able to provide massive amounts of bandwidth and performance to those engine types that are processing data for customers. Whether that's an application living in an HCI environment that's able to reach across the network and get access to the content it landed on our platform, an amazing thing we're able to help our customers do is process data in place without propagating it across the organization. Unfortunately, data is often copied and then moved around as part of the pipeline process flow as customers ETL or modify data, and it gets copied around 10, 12, 15 times. Our goal is to enable our customers to land that data, process it with performance capabilities like the NVIDIA SuperPOD certification that is part of our PowerScale platform, and give our customers choice as to what node type and performance attributes they need for those engines. And the third element is protecting it and retaining it for long periods of time.
We've built into our platform a rich array of security features, from cyber resiliency tools to WORM and retention capabilities, to give our customers the flexibility to move data from one tier to the next, driving down cost to serve. But more so, having a namespace in which they're able to manage data over 50- or 100-year time periods, without needing human intervention to move data over the course of that time, keeping that namespace alive and archiving that content, while making it available for Chris and other customers to do amazing things with the data and apply data science techniques across the various verticals that choose to buy our platforms.
Dave Vellante

>> But first, how do you plan for the future? Thinking about data frameworks, how do you make them resilient? What strategies do you recommend to customers to balance performance, scalability, and security in industries that are so varied: healthcare, manufacturing, sciences, entertainment? How do you handle that?
Anthony Dina

>> Well, at the risk of sounding like a humanist masquerading as a technologist, I feel like there are three fundamental areas that are going to make it a durable or sustainable practice for any industry. The first is we have to recognize that we've deployed these AI algorithms in some sort of business process or scientific discovery process to enable a human decision. There is a human at the center of this activity. And so we've got to shift the frame from being so task-focused to team-focused. We've got to think about the implications for us all. So that's the first thing. The second thing I think about is the ethics. Are we collecting data we should? Are we processing it because we can, or because we should? Those fundamental ethics-oriented questions are, I think, needed in order to avoid the negative externalities of those choices. I think about other institutions like Case Western, which has a multi-disciplinary process to discover and deploy technology. I love how they do training in different ways, and they are building a brand new building where the ethics department will be a part of their AI function. And so I think about this idea of collaboration across disciplines. It's this kind of fundamental construct: when we take different points of view and put them in the same room, looking at the same object or the same process, we really have a more durable and sustainable process. So cross-discipline, ethics-oriented, and fundamentally never forgetting the first principles, which is that this is all about human progress, human understanding, and our human relationship to this world.
Dave Vellante

>> So I wanted to actually close on this, and you touched on it: what the future looks like. Chris, we went from the days of pulling up nets, putting material under a microscope, and logging with pen and paper, to sneakernet with hard drives, and now you're in real time. And you just talked about the future, which is an observation space vastly increased by orders of magnitude. So I wonder if we could close with your thoughts on what you expect to see, what you'd like to see, even going out on a longer-term horizon: three, five, even 10 years.
Christopher Sullivan

>> So when we look at this over the long term, we have projects like the Ocean Observatories Initiative, which is a 25-year grant, and things like this. Really what we're all talking about is the carbon cycle and monitoring the carbon cycle, because that's really what affects the world around us. What we're trying to do is understand and get data from any and every aspect of that carbon cycle that we can. All the organisms on the planet, whether plants or animals, interact with that carbon cycle. We interact with that carbon cycle. And so the question becomes, over the long haul, how does all the science and all the data translate into this larger world that we look at? When we start to be able to look at Nvidia and things like the Omniverse, and to take all those data points, combine them together, and visualize them to understand the world, we will be able to start projecting, using those technologies, where we're going to be in five, 10, 15, or 20 years. The Ocean Observatories Initiative and the data that we're collecting is right now at minute intervals. And so we're going to be able to start incorporating that, not only to project, but also, as we make changes now, to see what those projections will mean.
Dave Vellante

>> Interesting. I mean, today we store things in strings that databases understand, and now we're speaking programming in human language, and we're building representations of organizations and ecosystems that are people, places, and things. So I wanted to close with Anthony, and then Joe, you bring us home. Anthony, your thoughts on the future, where you'd like to see the industry headed, and Dell's role in that future?
Anthony Dina

>> Well, look, Dell's role in everything we do is around trying to empower our clients, our customers, with choice. And we think an ecosystem really matters. We think that things can come and go, and if we can drive a level of efficiency, if we can create a bigger impact, then that's where we will invest.
Dave Vellante

>> Nice. Hey, Joe, how about you bring us home?
Joe Steiner

>> Yeah, I agree. You can count on Dell to give you the technology capabilities to interface with. And I can't wait to see, as we work with customers like Chris and others, we are limited only by their creativity. As customers work with data all the way down to that infrastructure, it's going to be amazing, the things that they do through their creative minds that really change the way we view this world. So I appreciate the opportunity, and Chris, thank you for all you have done and continue to do in your space. Appreciate your time.
Christopher Sullivan

>> Yeah, well, we really do appreciate the partnership with Dell and the opportunities you provide us. So thank you.
Dave Vellante

>> Hey guys, I want to thank you for your time. You've really been generous, and it was a wonderful conversation; it was fantastic having you. I hope we can see each other at the various events, and of course, online as well. So thank you very much for your insights.
Christopher Sullivan

>> Yeah.
Joe Steiner

>> Thank you.
Anthony Dina

>> Thanks, everybody.
Dave Vellante

>> Okay, I'm just going to close out here. We've talked about data-driven AI growth and the importance of partnerships and collaborative strategies to drive innovation, whether it's AI or other related technologies. It's all a mosaic that comes together. And look, you've got to pick good partners so you don't have to worry about all the mundane heavy lifting and you can focus on your mission. You want partners like our sponsors, Dell and Nvidia, who made this conversation possible. Thank you. And look, you want to refine and optimize your data and AI strategies. It starts there, and you've got to keep your options open because things are moving so fast. And don't forget, you want to visit Dell at Nvidia GTC, which last year, in my opinion, was the most important conference in the history of the industry. It's March 17th to the 21st at the San Jose Convention Center. It was unbelievable last year; the place was packed, and it'll be fire-marshal full. I remember last year, Jensen went by the Dell booth. There was all kinds of noise. We looked up; we thought it was Taylor Swift in the house. Well, no, it was Jensen. So check that out. Go to the Dell booth. Check out Dell and Nvidia's latest innovations and solutions. And thank you for watching this special CUBE Conversation and our power panel. I'm Dave Vellante, and we'll see you next time.