We just sent you a verification email. Please verify your account to gain access to SC24. If you don't think you received an email, check your spam folder.
In order to sign in, enter the email address you used to register for the event. Once completed, you will receive an email with a verification link. Open this link to automatically sign in to the site.
Register For SC24
Please fill out the information below. You will receive an email with a verification link confirming your registration. Click the link to automatically sign in to the site.
You’re almost there!
We just sent you a verification email. Please click the verification button in the email. Once your email address is verified, you will have full access to all event content for SC24.
I want my badge and interests to be visible to all attendees.
Checking this box will display your presence on the attendees list, let other attendees view your profile, and allow them to contact you via 1-1 chat. Read the Privacy Policy. At any time, you can choose to disable this preference.
Select your Interests!
add
Upload your photo
Uploading…
OR
Connect via Twitter
Connect via LinkedIn
EDIT PASSWORD
Share
Forgot Password
Almost there!
Sign in to gain access to SC24
Please sign in with LinkedIn to continue to SC24. Signing in with LinkedIn ensures a professional environment.
Supercomputing 24 is now focused on AI infrastructure as well as supercomputing for HPC. Scott Bils, VP of professional services at Dell Technologies, emphasizes the importance of understanding and managing data for AI applications. Organizations need to identify the right data sets, classify them, and ensure high-quality, compliant data for model training and deployment. Automated solutions and AI-optimized data pipelines help organizations streamline data integration and meet AI system demands.
Keep Exploring
What are some important considerations when managing massive volumes of AI-specific data in order to reduce complexity and maintain performance?
What factors are important to consider when orchestrating data pipelines for different use cases?
What are the key differentiators of Dell's AI and data management solutions, and how can they be beneficial to various industries?
>> Hello and welcome to the Cube's coverage of Supercomputing 24, where we're live on the floor in Atlanta with even more coverage coming from the Cube Studios, where I'm at. SC24 is now about more than just supercomputing for HPC; AI infrastructure is really taking a central theme down there in Atlanta. And with that, I really want to dive into a discussion about how organizations need to evolve their data strategies to build these new AI-enabled applications. Right now I'm joined by Scott Bils, who's the VP of professional services at Dell Technologies. Welcome on board, Scott. Both of us were just in Austin having these conversations about how AI is transforming things and how AI is really fed by data, so I'm glad to have you on board.
Scott Bils
>> Yeah, glad to be here. Excited to talk about data, which we view as a critical enabler to driving value from gen AI. So excited to talk about it.>> Yeah, I agree. And I think when you start to look at how organizations can effectively manage massive, diverse volumes of AI-specific data to reduce complexity and maintain performance, what are some of the things that you're talking to customers about to get that management and really get a better understanding and handle on that diverse data?
Scott Bils
>> Well, it's a critical first step, and as we talked about, data is the critical driver of getting value from AI and gen AI. And typically the first step in getting that value is to understand what are the data sources that are going to enable that value. That requires you to A, know what the right data sets are, B, where they're at, and C, how you need to think about classifying them and labeling them. Not every data set, not every file, not every document is going to be appropriate from a sensitivity point of view for every AI use case, every model, and you need to think through that comprehensively and have a structured approach toward data if you want to scale AI and increase use case throughput in the enterprise.>> Yeah, I couldn't agree more. And I think one of the things, when we look at the strategies behind this, is really ensuring high-quality, compliant, governed data that is brought into AI reliably for model training and deployment. How are you really helping organizations with that? Because garbage in, garbage out, and compliance, governance, and high quality are kind of the themes there. What are you doing with organizations to help them with that?
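The sensitivity gate described here — not every data set is appropriate for every use case — can be sketched in a few lines. Everything below is a hypothetical illustration (the class, the labels, and the catalog entries are invented for the example), not any Dell API.

```python
# Hypothetical sketch: each data set carries a sensitivity label, and a use
# case only sees data sets it is cleared for. Names are illustrative.
from dataclasses import dataclass

@dataclass
class DataSet:
    name: str
    location: str          # where the data set lives (A: what, B: where)
    sensitivity: str       # C: its classification label

# Ordered from least to most sensitive.
LEVELS = ["public", "internal", "restricted"]

def eligible_datasets(datasets, max_sensitivity):
    """Return only the data sets a use case is cleared to use."""
    ceiling = LEVELS.index(max_sensitivity)
    return [d for d in datasets if LEVELS.index(d.sensitivity) <= ceiling]

catalog = [
    DataSet("support-tickets", "s3://corp/tickets", "internal"),
    DataSet("product-docs", "s3://corp/docs", "public"),
    DataSet("hr-records", "s3://corp/hr", "restricted"),
]

# A customer-facing chatbot should only see public data.
print([d.name for d in eligible_datasets(catalog, "public")])   # → ['product-docs']
```

The point of the sketch is the structured approach Scott describes: the gate is data-driven, so adding a new use case is a policy decision, not a code change.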
Scott Bils
>> Really it's from beginning to end. It's helping them as they're thinking about their use cases and where they're going to get value from AI. Helping them identify and assess the right data sets, helping them develop a centralized catalog, helping them classify and group the data according to applicability and how and where it can be used. It's helping them then integrate the data into the appropriate use cases and then automate and orchestrate that to ensure that you have the right velocity, including the right access to the right data sets to support the use cases. But it's really that life cycle view all the way from identifying the data sources, classifying, curating, cleansing, and then automating ingestion and scaling. And helping provide that end-to-end platform is really what we do and what organizations are going to have to do comprehensively to enable the AI opportunity.>> Yeah. And I would assume that a big piece of that is really helping them, like you said, with data quality. And I would assume that customers are saying, Hey, we want to reduce our risks and make sure we're compliant there as well.
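The life cycle described here — classify, curate and cleanse, then automate ingestion — chains naturally into a pipeline. The stage functions below are hypothetical stand-ins invented for illustration, not Dell tooling; the point is the end-to-end shape.

```python
# Minimal sketch of the life cycle: classify -> cleanse -> ingest.
# All stage logic is illustrative.

def classify(records):
    # tag each record by sensitivity so downstream use cases can filter on it
    return [{**r, "label": "sensitive" if r.get("contains_pii") else "general"}
            for r in records]

def cleanse(records):
    # curate: drop records missing the field the model will consume
    return [r for r in records if r.get("text")]

def ingest(records):
    # stand-in for loading into the training store; returns a count
    return len(records)

def pipeline(records):
    # the automated end-to-end flow
    return ingest(cleanse(classify(records)))

sample = [
    {"text": "ticket body", "contains_pii": True},
    {"text": "product doc"},
    {"contains_pii": False},            # no text: dropped at the cleanse stage
]
print(pipeline(sample))                 # → 2
```

In a real deployment each stage would be an orchestrated job rather than a function call, but the contract between stages — records in, records out — is the same.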
Scott Bils
>> Absolutely. So it's understanding the data sets you have, doing the cleansing and curation up front, but then also, from an automation standpoint, making sure you have pipelines and orchestration set up to ensure that data quality over time, and that it's a scalable model and approach to that issue.>> Yeah. No, I think that's key. And actually it leads me right into the next question, which is really around automated solutions that you can provide to help address inefficiencies in the traditional data integration process, because you're really trying to meet AI systems' speed and volume demands. What are you doing to help organizations really address that part?
Scott Bils
>> It's really not just the integration piece, but also leveraging tools, technologies, and platforms to orchestrate data pipelines for different use cases. Different use cases will access different data sets. It's incredibly important to understand, based on the data types, based on the characteristics of the model, the LLM, the use case, the type of performance you need. And to ensure that the data throughput, the way you've automated and orchestrated that model, is going to drive the scale, the performance, the responsiveness you need to match the outcome and deliver the value.>> And a lot of that ties back to the other question, the governance aspect of it, and tying the pipelines to who can see the data and getting it to the right persona on time.
Scott Bils
>> Yeah, and it's interesting. That's another important part of the data catalog. We talked about the importance of classification and having a broad, gated governance model. Another important part of a catalog is to provide improved discoverability for folks that are out deploying and defining new use cases in the organization, to be able to identify data sources that are going to be of the most value to them, and improving their ability to identify, access, and integrate data sources that they may not have been aware of across the enterprise.>> Yeah, I think you hit on another great topic, which is AI-specific data catalogs, which are really needed to enhance data set management, discoverability, and compliance. That has to also factor into a lot of what organizations are looking to do when they want to know something as simple as the lineage of what's going on.
Scott Bils
>> Yeah, no, absolutely, traceability. It gets back to the data quality issues, being able to track that lineage and who's touched that data. And then the metadata as well. As we think about data catalogs, an incredibly important part is the metadata about the file itself, but also the metadata about the content that sits in the file or the object. And when you think about a comprehensive data catalog, it provides that level of detail over and above the lineage, but other critical metadata items that help you improve discoverability and address the governance and compliance issues around data and AI.>> Yeah, and I know you guys are doing stuff at the actual file and object layer, in particular in PowerScale, to be able to expose that metadata directly to the Dell Data Lakehouse as well, which is, again, how do you help people shortcut that? But one thing I wanted to dive a little deeper into, because we kind of danced around it a little bit, is the role of AI-optimized data pipelines and how they play in automating, scaling, and enabling real time, and how they empower dynamic AI applications. What are you guys doing to help organizations build the right pipelines at the right point and in the right way?
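The two metadata layers mentioned here — metadata about the file itself versus metadata about the content inside it — plus a lineage trail can be pictured as a single catalog entry. All field names and values below are illustrative assumptions, not a real catalog schema.

```python
# Hypothetical catalog entry showing file-level metadata, content-level
# metadata, and a lineage trail for traceability. Values are made up.
entry = {
    "path": "s3://corp/docs/faq.pdf",
    "file_metadata": {"size_bytes": 48213, "format": "pdf", "owner": "docs-team"},
    "content_metadata": {"topics": ["support", "billing"], "language": "en",
                         "contains_pii": False},
    "lineage": [
        {"step": "ingested",   "by": "crawler-01", "at": "2024-11-18"},
        {"step": "classified", "by": "labeler-02", "at": "2024-11-19"},
    ],
}

def who_touched(entry):
    """Traceability: everyone who has touched this data set, in order."""
    return [e["by"] for e in entry["lineage"]]

print(who_touched(entry))   # → ['crawler-01', 'labeler-02']
```

Discoverability queries ("find English-language, PII-free sources about billing") run against the content-level layer, while governance questions run against the lineage trail.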
Scott Bils
>> Yeah, a lot of it's understanding, based on the use case requirements and the platform requirements they have, what are the performance levels, what's the throughput they're looking for, time to first token. Think about all the critical elements and performance dimensions around LLM and use case performance, and ensuring that the customer has the right tools and automation orchestration in the back end to support and hit those metrics. It's not a one-size-fits-all approach in terms of pipeline orchestration and automation. And understanding the use cases, the performance requirements, and the dimensions around that is critical to coming up with the right answer, the right solution. Again, it's not one size fits all. It depends on the use case, the customer, and what they're looking to drive.>> Yeah, I think that is so critically important, because everybody is looking for the easy button, especially in data, and the data stack nowadays is so complicated, and I think there are so many moving parts. We used to have islands of automation; now we have silos of data all over the place. And bringing that together really is one of the things that you're really focused on, and I think you have some services around helping organizations get their data cataloging and pipelines implemented correctly to streamline that integration. Why don't you help us understand some of those, what you're doing in that space as well?
Scott Bils
>> Yeah, it's really identifying and standing up or deploying the actual catalog for customers, helping them roll up their sleeves and actually working with them on the classification and labeling exercise, helping them think through what's the right framework, how they should apply it, and how they should integrate that into their governance processes. For specific use cases, identifying the best and most appropriate data sources. Helping to initially curate and cleanse the data for that first run, test run of the model, and then helping to build that into the overall orchestration pipelines. So it's really diving in at two levels. One is helping to develop the overall data catalog applicable across multiple use cases, but then on a use-case-by-use-case basis, helping to implement pipelines, orchestrate, automate, and ensure the data quality around that.
The other piece I wanted to mention too is the work we do around the Dell Data Lakehouse, where a lot of customers are leveraging that platform as a foundational element of the AI factory. We bring to bear a lot of our professional services capability to stand up that platform and then help implement some of the capabilities we talked about, leveraging third-party tools as well.>> So I think that's key, because being able to put this all together and really nail it, because data is the lifeblood of AI, makes a lot of sense. From your perspective at Dell, what really differentiates Dell's AI and data management solutions and the services that you wrap around them? And how can that be beneficial to various different industries? Because not every industry is the same, and they may say, Hey, my view is I'm a snowflake over here. How do you help organizations across industries with the solutions and services you're providing, and why is Dell uniquely positioned for that?
Scott Bils
>> Yeah. Look, when you take a look, I think 40% of the world's data is stored on Dell storage and infrastructure. So we're uniquely positioned to understand the challenges around data management, data engineering, and managing data fabrics. A lot of the differentiation just comes from our lineage and the strength we have around storage and data infrastructure, and from working with customers to develop and deploy platforms around that to optimize the outcomes they get from their data. But a lot of the differentiation is really tied back to that and the unique set of services we've built over time, well before AI, to help them. You may recall the era of big data and data analytics, and then AI, and now gen AI. We've provided capabilities across that whole journey, built over decades, and we're excited to bring those to bear now in helping customers with their AI journey.>> Like we like to say, everything that's old is new again. Nothing ever dies, it just reinvents itself in a different way. And like you said, big data. I mean, I don't know what's bigger. We're talking exabytes of data now and having to deal with that for AI. So I think, again, just killer in that way. So, last word: I know organizations are always hungry to read more, especially when we get into the technology just enough here, but people want a deep dive. Where should people go to really get a better understanding of what you can offer and how it ties together services and solutions?
Scott Bils
>> Yeah. Look, we have great content on Dell.com that provides overviews of our newly announced data management services. I would also encourage everyone to go take a look at a blog we published this week in conjunction with our announcement, by Paul Taylor on our services portfolio team, that provides a little bit more depth and detail around the exciting new services we've launched this week.>> Yeah, I love it, Scott, because having you on, it's about not just the tech but the people aspect of it, which is so key as well. And I think your services portfolio, helping people get to that level and be able to take it on themselves, is just, I think, killer. I love the mantra there. So thanks for coming on board and explaining all this today.
Scott Bils
>> Yeah. Yeah, no problem. Thank you. Enjoyed the conversation.>> And thank you for watching this segment live from the floor of SC24 and the Cube Studios, where I'm at. Stay tuned for more SC24 coverage on the Cube, the leader in tech news and analysis.