Data Protection & AI Summit | Mitch Seigle, Spectra Logic

Christophe Bertrand

Principal Analyst

SiliconANGLE & theCUBE

Mitch Seigle

CMO

Spectra Logic

Spectra Logic brings tape technology into the modern era

Tape drives might seem like old news, but according to Spectra Logic Corp., they’re a part of the modern data solution. Now 45 years old, the company focuses on tape-based solutions with an eye toward current cloud and artificial intelligence-based trends. “Tape is 75 years old,” said Mitch Seigle (pictured), chief marketing officer of Spectra Logic. “It’s had an incredibly long run, but it’s far from archaic.”

In this Data Protection + AI Summit segment, theCUBE’s Christophe Bertrand talks with Mitch Seigle, CMO of Spectra Logic, about why modern tape is at the heart of AI-ready data protection. Seigle outlines Spectra Logic’s 45-year focus on storage, from the modular Spectra Stack to the multi-exabyte TFinity Plus, showing how high-performance computing sites like SLAC rely on tape for archiving petabyte-to-exabyte data sets.

The discussion explains how object storage front ends such as BlackPearl make tape as simple to consume as S3, while the new Tape A...

Christophe Bertrand

>> Hello and welcome back to the Data Protection and AI Summit. Very pleased to be with Mitch Seigle here, CMO of Spectra Logic. We're going to be talking about Spectra Logic and about tape. So let's get into introductions. Tell us about yourself, Mitch, and tell us about Spectra Logic.

Mitch Seigle

>> Well, thanks, Christophe. It's a pleasure to be here with you on theCUBE. First, a little bit of my background. I'm the CMO here at Spectra Logic. I'm a storage industry veteran. I've been around the industry for more than 30 years, working across all of the various segments of storage, including software and hardware and in the disk, the flash, and now ultimately in the tape space. So seen it all, been there, done that, watched all the trends, and really, I'm excited because we're sort of entering a new era here around the intersection of data protection and AI, and in particular the role that storage plays to enable all of that growth. A little bit about Spectra Logic, if you're not familiar with the company, we're 45 years old. We are uniquely dedicated to storage, so we're deep experts rather than generalists in the space. We were founded in 1979. Sort of a parallel story that may sound familiar to you. Our founder started the business in his college apartment while he was still a student at CU Boulder. And the company's headquartered today in Boulder, Colorado, where we exclusively focus on the development of tape and tape-based solutions, and in particular those that are object-based now to interface very easily into all of the infrastructure that's so modern today. So despite being 75 years old, the technology is as fresh and vibrant as ever and actually is continuing to see innovation on a regular basis.

Christophe Bertrand

>> All right. And if I may quote you, I think you said, "Well, there are two types of IT person. The people who know they use tape and the people who don't know they use tape." So I'll give the answer, which is, if you don't know that you're using tape, it's because you're using cold storage in cloud. Actually, if you use cloud, very likely some of the data you're leveraging is at some point or has been on tape or will be on tape.

Mitch Seigle

>> Well, that's absolutely true. And so thanks for the little pun there. It's a little bit tongue in cheek, but the reality is that cloud-based cold storage services have a very large tape tier behind them. And so there are many users today that are archiving or backing up to the cloud. I'll use Amazon S3 Glacier as an example. Most people are familiar with that service, which is very heavily backed by tape. And so there are a lot of reasons for that, and we'll probably talk a little bit more about that through the course of the conversation.

Christophe Bertrand

>> Absolutely. And we'll also connect the dots for our viewers. Tape and AI and data protection, why do we have those sort of three terms in the same sentence? And actually, there's a very, very good reason. And one, I'll give away one hint, which is, well, one of them is scale, but we'll talk about use cases also in a conversation in another session we'll have with you and a couple of colleagues and ex-colleagues. So let's talk about your customers because I think that's the most important thing to position how much work you've done in the past 45 years with what is actually a technology that is full of innovation with one of the most interesting roadmaps with a lot of capacity and scale projected into the future, but let's talk about customers first. You brought us what I call the wall of logos. So maybe we can just show that here on the screen, and you have lots of great customers. Maybe you can point to some of those logos and what you do for them and with them and give us a sense for maybe some AI initiatives that you've already observed.

Mitch Seigle

>> Well, we're very privileged to have some of the world's most impressive customers in a variety of different what we've considered to be vertical markets. We generally serve the high-performance computing community, and that's a key call-out here for folks that are thinking about artificial intelligence, where much of the infrastructure that's used today to deliver that is built on top of high-performance computing environments that have been well-established and well-proven. And of course, our tape technology forms a very large archival backbone in those environments. As you can see across some of these logos, we have folks that are in the National Laboratory space, Lawrence Livermore, for example, SLAC National Accelerator Laboratory, which is just up the street from the studios here in Palo Alto, California, and has just this week been in the news. I don't know if you happened to see this, it was carried in the New York Times as well as many other outlets. The new Rubin Observatory in Chile has sent back the first images, and that data is actually transmitted from the telescope site in Chile to SLAC in Palo Alto, where ultimately the images will be stored on Spectra Logic TFinity Libraries that are installed at SLAC. And I kind of thought about this a little bit today because we're very familiar with the idea that many of our customers store the world's most important data. So data of global impact or global importance, weather data, for example, climate studies, scientific research, physics experiments, and so on. But now with the data that we're seeing of so many galaxies with such resolution that we've never seen before, I would venture to say that we've gone from global importance to galactic significance.

Christophe Bertrand

>> That's a great quote. We may have to use that as the subtitle of this session, but it is true. I mean, the wall of logos, as I like to call these types of slides, is very impressive for a variety of reasons. Number one, clearly these are very large-scale data users, and scale is the name of the game. And for AI, scale is and will be the name of the game for sure. So you can start seeing the connections here, number one. Number two, it's a variety of industries that you serve. There are lots of logos I know you cannot put up because as you can imagine, there are lots of big users of data that you can't really talk about. More importantly, some of the users have what we call HPC processes in place, high-performance computing. So let's maybe talk quickly about what is an HPC process in general, and then we'll get into more of the product portfolio. But let's start with that because SLAC is typically used as your HPC example. So what is HPC? Why should people care and how does it connect to AI?

Mitch Seigle

>> Well, high-performance computing in simple terms is really an infrastructure that is purpose-built for processing power, performance, scale, and capacity, and also, importantly, for the highest levels of reliability and resilience at the lowest cost. When you talk about data at scale, just that term "at scale" means different things to different people. So let me give you an example. When we're talking about the data in a high-performance computing environment today, we have many customers that are archiving data onto our tape technology that are in the hundreds of petabytes, and we actually have customers that are now at the one exabyte or more level, including a single library that's now at the exabyte level. So this idea which folks may have heard of around petabyte-class storage, I would beg to differ: it really crossed that boundary quite a while ago, in that we now have exabyte-class storage. Spectra libraries are capable on the upper end of as much as six and a half exabytes of capacity compressed within a single system and raw capacity above two petabytes. Sorry, two exabytes.

Christophe Bertrand

>> Exabytes, right. So actually, we should just take a look at the portfolio real quick. You have a pretty interesting lineup from the Spectra Stack to the TFinity Plus that you mentioned earlier, and in the context of HPC but also other workflows. The idea is that you want to be able to ingest quickly, you want to be able to provide the data back quickly at scale to various processes in a number of industries. And again, I can't stress enough, it's not just HPC. It could be media. It could be in any industry as a matter of fact. So let's talk about your portfolio, walk us through the entry level all the way to the enterprise library.

Mitch Seigle

>> Sure. Well, thanks for that. And I think we have a slide up here, and you can see on the left that we have the Spectra Stack that is basically a rack-scale type of an environment in which this is modular. It allows you to start very small in terms of your capacity and grow to as large as about 17 petabytes within a single data center rack.

Christophe Bertrand

>> If I may interject here, I like that you say 17 petabytes is small.

Mitch Seigle

>> Well, it is in the context of what we're talking about here. When you think about media and entertainment customers that are storing entire film libraries that need to be preserved forever, so think about the idea that tape really becomes the final destination in which we're preserving, protecting and defending this data for all time. It's this idea that data's become forever and so that you have to have that scale. 17 petabytes is actually relatively small in the grand scheme of things, but for many organizations, it's certainly adequate. And as you kind of move through the product line, we also then have mid-range libraries that will scale up to 50 petabytes, which we have research universities where a single cube library, for example, would be well situated to handle all of their archival and backup needs in terms of storing that. Our Spectra T950, which we show a single frame on the slide, but that actually scales out horizontally to eight frames and up to 300 petabytes. And then of course the flagship, which is the TFinity Plus, that library actually is really designed for longevity. We first put our TFinity series into the market more than 15 years ago, and we have many of them that are still running. So we like to say that we're the oldest thing in the data center, and we're proud of that because we designed for longevity, which is a very different design center than storage that typically is replaced on a three to five-year cycle. Those large TFinity systems, as I said, will scale up to as many as 168 tape drives. So you get massive parallelism in the library, close to 57,000 slots available in those libraries to store your actual tapes, and they'll scale out to 45 frames, give you multi-exabyte capacity. So we're talking about the world's largest storage systems of any type are actually Spectra libraries.
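As a rough check on the scale described here, the arithmetic can be sketched in a few lines of Python. This is an illustration only, not a Spectra Logic sizing tool: the slot count is the "close to 57,000" figure quoted above, the per-cartridge capacities are the 18 TB (LTO-9) and 30 TB (LTO-10) figures mentioned later in this conversation, and the function names are made up for the example. It also assumes every slot holds one cartridge of the same generation.

```python
import math

TB_PER_EB = 1_000_000  # decimal units: 1 EB = 1,000,000 TB


def library_capacity_tb(slots: int, cartridge_tb: float) -> float:
    """Native (uncompressed) capacity if every slot holds one cartridge."""
    return slots * cartridge_tb


def cartridges_needed(data_tb: float, cartridge_tb: float) -> int:
    """Whole cartridges required to hold data_tb, rounding up."""
    return math.ceil(data_tb / cartridge_tb)


slots = 57_000  # approximate slot count quoted in the interview
for generation, native_tb in (("LTO-9", 18), ("LTO-10", 30)):
    total_eb = library_capacity_tb(slots, native_tb) / TB_PER_EB
    print(f"{generation}: about {total_eb:.2f} EB native across {slots:,} slots")

# Cartridges needed for the 17 PB (17,000 TB) rack-scale figure mentioned earlier
print(cartridges_needed(17_000, 18), "LTO-9 cartridges")
```

Under these assumptions a fully populated library lands at roughly 1 to 1.7 EB native, which is consistent with the "exabyte class" description. It does not reproduce the "above two exabytes" raw figure, which presumably rests on denser media or different assumptions than the ones used here.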

Christophe Bertrand

>> Right. And in the context of AI, we know that there will be a lot of or massive creation of data moving forward in the future, very likely because of AI-related processes. We're talking zettabytes globally of new data. So clearly, it's going to have to go somewhere, but if you think about AI now, there are very good use cases to have maybe high-speed disks or memory type of storage systems with compute close to each other to serve certain processes. But for a lot of processes that are AI processes today already, you can't afford for a variety of reasons to have the data on disk or in cloud, even on tape in cloud. So before we get into maybe more of the technology, real quick, what are the top three or four use cases you can think of today for AI that leverage tape and will be leveraging tape?

Mitch Seigle

>> Well, that's a great question. I think there are a few key use cases here. Certainly during the training phase or retraining phase, as you're developing LLMs, there's a frequent need to do entire trainings, and you need to store a massive amount of that data so that you essentially have a rollback point, and you need to be able to store that effectively. So imagine if you've got a multi-petabyte LLM, and you need to have multiple versions of that. The online storage of that is not only expensive, but it is very energy consumptive. It requires a lot of footprint, whereas tape consumes no energy when the tape's not being used. So it's very, very energy efficient, the most energy efficient of all the storage technologies. It's extremely dense. And so if you think about this idea of cold data representing 70 or 80% of all data in an environment, the more of that data that you can move from online storage when it is cold or cooling down to tape, the more you can free up your environment to provide more AI processing capacity, and so it's kind of the best of both worlds. It's low cost, it's highly sustainable, it's the most energy efficient of all the storage media that's available today. And of course, it takes care of sovereignty issues, control issues, which are of critical importance, especially in the AI era, where having control, knowing where your data sleeps at night is a very, very important thing. I think there's one more aspect that we need to think about in use cases in general. The use cases today are all about building out this environment and beginning to exploit it. And there are many different variations on that. I think there will be additional use cases that will be emergent that are even larger because with a technology like AI, we tend to see that regulation follows. And why? Because it has enormous potential, but it also carries enormous risk. Wherever there is potential and risk, governments tend to want to control that.
There is certainly every expectation. We're beginning to see the first laws emerging around all of this compliance. And so imagine that not only will you develop your LLMs and want to archive those and provide various levels of protection, think about things like signals intelligence for national security purposes, where we'll want to potentially capture every query issued against an AI to look for patterns or so-called noise. If you recall how we typically do this by looking at signals that are intercepted, people will be looking for those queries or conversations that are going on in AI. They'll probably want to capture all of the results for litigation purposes or discovery purposes. So it's just simply an extraordinary amount of data. And then I think one of the most important observations is that because AI has the ability to look at volumes of data at a scale beyond human capacity to analyze and detect patterns, we will now want to store data at the most atomic level possible. And when I say atomic, it's down to the finest level of granularity, and we'll want to, if it's affordable and scalable, keep everything forever. Because as AI continues to advance in the future, it will have a resolution and an ability to detect patterns that humans are not capable of because of the scale of the data.

Christophe Bertrand

>> And that's a very good point. At the same time, I think there's also a conversation around eliminating data that may be redundant because obviously you don't want to have things twice. Maybe you archive it, which is another use case discussion we had with another company that you know well, Congruity 360 very recently, but that combination is very powerful. And to your point, I think you're correct. You don't know what the future holds. The algorithms are only going to get better. So five years from now, 10 years from now, you may want to go back to re-inject a bunch of collected data to build a new model or improve on a model. And actually, that's something that already happens. You mentioned some of the national labs and what they do, well, that's pretty much how they do simulations and keep improving on their simulations. Let's get into the technology for a second. We talked about the portfolio. Let's talk about object storage a little bit, how it fits with the tape constructs here and in a cloud context as well because a lot of people think of S3 and cloud and okay, well, turns out object storage, S3, is actually on tape too. Let's take a look at another slide here real quick that talks about some of your partnerships in the context of backup and recovery, but also in archiving, but also more broadly, let's talk about object.

Mitch Seigle

>> Sure. Well, I think there is sort of a new dawn, if you will, a modernization of tape. As you know, tape is 75 years old. It's probably the longest-lived technology that's still in active use today in the world of IT. So it's had an incredibly long run, but it's far from archaic. It has actually gone through a transformation from most of its lifetime where it required large organizations to be able to integrate it and custom coding for applications to write data to tape. And so while it was very powerful and has always offered incredible benefits in terms of sustainability and cost, long before we even thought about those things, it wasn't the easiest to integrate into an organization. And so now that we are about 20 years into the object revolution, if you want to call it that, in which object storage has evolved, we're now seeing different forms of object storage emerge from what was once disk-based to now things that are even based in flash and now ultimately in tape. And so what we've been focused on is building interfaces that front end those libraries and take care of the abstraction from common cloud access protocols, the de facto being the Amazon S3 and S3 Glacier APIs, which are out there that people have applications that write to. And we provide that layer in the form of our BlackPearl product platform, which is essentially software-defined storage that sits in front of the tape library and does that translation, if you will, to what goes on to tape. And so it's a very simple connection that makes tape as simple to consume as cloud storage.
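To make the idea of an object front end concrete, here is a minimal toy sketch in Python. It is not BlackPearl's design or API, and none of the class or method names come from Spectra Logic; they are hypothetical stand-ins. The sketch only illustrates the abstraction described above: callers work with buckets and keys, while a thin layer decides which cartridge holds each object and keeps an index so it can be found again.

```python
from dataclasses import dataclass, field


@dataclass
class Cartridge:
    """Hypothetical stand-in for one tape cartridge."""
    barcode: str
    capacity_bytes: int
    used_bytes: int = 0
    blobs: dict = field(default_factory=dict)  # (bucket, key) -> bytes

    def free_bytes(self) -> int:
        return self.capacity_bytes - self.used_bytes


class TapeObjectStore:
    """Toy S3-style facade: callers see buckets and keys, never cartridges.

    Overwrites and deletes are deliberately not handled in this sketch.
    """

    def __init__(self, cartridges):
        self.cartridges = list(cartridges)
        self.index = {}  # (bucket, key) -> barcode of the cartridge holding it

    def put_object(self, bucket: str, key: str, body: bytes) -> str:
        # First-fit placement: use the first cartridge with enough free space.
        for cart in self.cartridges:
            if cart.free_bytes() >= len(body):
                cart.blobs[(bucket, key)] = body
                cart.used_bytes += len(body)
                self.index[(bucket, key)] = cart.barcode
                return cart.barcode
        raise RuntimeError("no cartridge has enough free space")

    def get_object(self, bucket: str, key: str) -> bytes:
        barcode = self.index[(bucket, key)]  # KeyError if the object is unknown
        cart = next(c for c in self.cartridges if c.barcode == barcode)
        return cart.blobs[(bucket, key)]


# Two tiny 10-byte "cartridges" keep the example easy to follow.
store = TapeObjectStore([Cartridge("A00001", 10), Cartridge("A00002", 10)])
store.put_object("archive", "a.bin", b"12345678")  # fits on A00001
store.put_object("archive", "b.bin", b"abcdef")    # needs 6 bytes, spills to A00002
print(store.index)
```

A real front end would also have to deal with drive scheduling, sequential access, caching, and media failures, none of which this sketch attempts.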

Christophe Bertrand

>> Right. Because in this day and age, it's all about having APIs, common interfaces. You can't keep architecting and re-architecting with multiple layers and layers, and this is a great way of doing it and to be able to manage in an optimized fashion what will be, again, exabytes of data in some cases in your environment. Well, let's talk a little bit about how I could use this as a service. I think that's one of your initiatives. Tape is great. I see how, of course, you can deploy it and you should deploy it in your environment, on-prem. It's going to be needed for AI. There's no question around compliance, governance. There are many reasons why governments will want to regulate this as well in terms of where it is located, where the data is located and processed. By the way, I think there are 75 regulations today already covering AI globally. One of our colleagues, Scott Heppner, is covering this topic, and that's going to be another summit that we'll be running in the next few months on data governance and compliance in the age of AI. And the regulations are already there. But let's talk about, again, could I use this as a service? Because to me, I think there's a way to maybe get into the technology, but consuming as a service is also a very popular way of consuming technology in general terms. So where are you with it? What initiatives do you have?

Mitch Seigle

>> Well, first of all, the short answer is yes. Can you consume it as a service? Now, that wasn't always a given. The first problem, of course, was to modernize tape, which we've done to abstract it to object storage. Once that's done, that could potentially be deployed anywhere. Today, we think of that largely as on-premises with hybrid cloud extensibility. All of those are functionalities that we provide in our object-based tape platform. The natural evolution of that was to go the next step and remove the physical idea of it being on the customer's premises to being on anyone's premises, whether that was at a managed service provider, a colo or virtually in the cloud services sector. And so we've actually created what we call the Tape Archival Platform As a Service, TAPAS. So we were a little hungry that day. It was getting close to half the hour when we coined that phrase for memorability. And we partnered with a startup, Geyser Data, which actually provides that as a service layer around the tape abstracted object storage. We stand that up in public cloud data centers using large data center providers. I won't name them. There are many of them, but essentially, it makes it as simple as a customer that's operating in a totally cloud-native or a cloud-enabled environment without any on-premises facilities where they don't want to manage tape because they don't have the skills in place to deal with it, that they can consume this on a purchased basis. And what's different here is we're not really selling you or renting you the storage, we're actually creating a physical tape library in the cloud that you access via S3 and S3 Glacier APIs. You actually own tapes that are in a physical library in a cloud service center. You can choose the locality of that for sovereignty and residency purposes. So it helps greatly with that.
You can retrieve those tapes physically if you'd ever like to take possession of them, but what we've done is taken out all of the tape handling, if you will, no pun intended, from that particular model, and made it as simple as a cloud interface. Customers that have tried this can actually log in, use a credit card if they wish, and within less than 10, 15, or 20 minutes, stand up their very first storage on tape in the cloud. Now, the advantage is this. With the cloud services, you actually don't know where the data resides physically because you don't get to select that; you don't know what's on tape, you don't know what's on disk, and you're paying egress fees and API call fees that all add up with very large implications. There are no access charges here. You simply pay for the tape cartridges that you consume in the cloud on a monthly basis. When you're done with those, they can be returned to you or they can be destroyed, your option. It makes it very simple to stand up tape in the cloud as a backup or archival target or as a secondary copy if you are wanting an additional copy stored offsite in the cloud at a much lower price than the traditional cold storage services in the cloud.
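The two billing shapes contrasted here can be sketched as simple formulas. The Python below is a hedged illustration only: the function names are invented for the example, and every price in it is a placeholder rather than a quote from Geyser Data, Spectra Logic, or any cloud provider. Which model comes out cheaper depends entirely on the numbers plugged in.

```python
import math


def cartridge_model_monthly(data_tb, cartridge_tb, price_per_cartridge):
    """Pay per cartridge held each month, with no retrieval or API charges."""
    return math.ceil(data_tb / cartridge_tb) * price_per_cartridge


def metered_model_monthly(data_tb, price_per_tb, retrieved_tb, egress_per_tb,
                          requests, price_per_1k_requests):
    """Pay per TB stored, plus egress on retrieved data and per-request fees."""
    return (data_tb * price_per_tb
            + retrieved_tb * egress_per_tb
            + requests / 1000 * price_per_1k_requests)


# Placeholder inputs, chosen only to exercise the formulas.
data_tb = 500
per_cartridge = cartridge_model_monthly(data_tb, cartridge_tb=18,
                                        price_per_cartridge=40)
metered = metered_model_monthly(data_tb, price_per_tb=2, retrieved_tb=50,
                                egress_per_tb=90, requests=100_000,
                                price_per_1k_requests=1)
print(f"per-cartridge model: {per_cartridge} per month")
print(f"metered model:       {metered} per month")
```

The structural point is that the per-cartridge bill depends only on how much is stored, while the metered bill also grows with how much is read back and how many requests are made.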

Christophe Bertrand

>> Right. And this is really the best of both worlds because you get... I think compliance could not be more important, not just in the context of AI, but in general. So I think that's definitely a great solution. A lot of people, as we said earlier, may not know tape, may not be PhDs in handling tapes, and therefore, this gives them a very easy way to consume the technology with all of its benefits and the cost benefit being one of them without having to learn anything or get the skill set, which can also be costly. So really the best of both worlds. Look, we've covered a lot of ground, Mitch, thank you so much for joining us. We're going to be continuing the conversation with one of your colleagues and one of my ex-colleagues who's an expert on the topic as well because I think there's so much to discuss for tape and AI, a lot of innovation going into tape. Maybe your last thoughts on where this is going from a scale standpoint. There's a roadmap for LTO, which is one of the standards, but LTO-9, I believe, today. I think it's a pretty good amount per cartridge. It's 45 terabytes compressed or something like that? I mean, I get lost in the numbers because it changes all the time. But there is a roadmap. This is maybe one of the only technologies that actually has a roadmap with capacity that is published years ahead, which is fascinating. Where do you see this going overall and what are maybe your closing thoughts for our viewers?

Mitch Seigle

>> Well, thanks for that, and thank you for the opportunity to talk about tape technology and where it fits into the spectrum of AI and data protection. I would say that first on the roadmap, we're actually now just recently at LTO-10.

Christophe Bertrand

>> Oh, 10.

Mitch Seigle

>> The 10th generation and 25th year. So the 25th anniversary has just occurred. The LTO-10 technology stores 30 terabytes per cartridge. And so once again, in LTO-9, we were at 18 terabytes, we're now at 30. And the roadmap for LTO, which is a public roadmap published years in advance, probably the only technology that does publish that roadmap years in advance, and so take that for what it will. Can you predict with pinpoint accuracy a decade in advance on any other technology? No. Are we perfect on the LTO roadmap? We're pretty darn close. So I think everybody's proud of that roadmap, but it gives people assurances. I think the key thing is that tape is not stagnant; like any other technology, there's ongoing research and development, continuous innovation, not just in the tape drives and the media, but in the library functionality and capabilities of those libraries. And of course, in the way in which we package it and deliver those solutions to make them easily consumed, highly scalable, highly cost efficient. So I think you can look for continuous innovation in that space, and more specific functionality to facilitate AI use cases is likely to occur. We're sort of in the infancy of that, but I would expect at some point that you would see AI-enabled interfaces that are aware of tape rather than it being abstracted so that they don't have awareness. They'll become very tape aware and take full advantage of that scalability and energy efficiency to help manage the footprint cost effectively and, most importantly, securely.

Christophe Bertrand

>> Yes. And as an analyst, I have to make predictions, but I think it's a pretty obvious one to me, given all of the advantages and the scale questions, the performance of the devices, I think you cannot have an AI infrastructure, an information architecture that does not include tape in it. I think that's essentially what the future is. And if I had said this a few years ago, people would've thought, "Okay, you're crazy. Tape is dead. It's an old form of storage." Well, it's not. It's not. It's pervasive actually. As we said, you are using it even without knowing you're using it in many cases. AI is going to be a pretty amazing accelerant, in my opinion, for tape. There's no question. With the modalities that you've explained, and for certain use cases, not for everything, but for plenty. And then I also believe cyber resilience is driving a lot of the use of tape for other reasons such as protecting the data because there are so many good reasons. And remember, compliance, governance, those are not going away either. So Mitch, great future ahead for tape, tape and AI and the protection of data assets. Thank you so much for joining us today.

Mitch Seigle

>> Thank you, Christophe.

Christophe Bertrand

>> And to our viewers, thank you so much. My name is Christophe Bertrand, principal analyst here at theCUBE Research. Stay tuned for more on data protection and AI.