Mitch Seigle of Spectra Logic, a leading company in the storage industry, discusses the convergence of data protection and artificial intelligence at the Data Protection & AI Summit. With over three decades of experience, Seigle provides insights into Spectra Logic's role in this evolving landscape.
He elaborates on the company's 45-year history and innovations, emphasizing its dedication to developing modern tape storage solutions and its significance in high-performance computing environments. Spectra Logic, based in Boulder, Colorado, is renowned for expertise in tape and tape-based solutions. The conversation, guided by hosts from CUBE Research, explores how the company adapts to the demands of AI and data protection.
Key insights include the benefits of tape storage in AI processes, particularly in sustainability, cost-efficiency, and scalability. Seigle explains how tape storage is essential for managing the large datasets needed for AI training and archiving. He highlights its importance in global-scale applications, with examples from national laboratories and scientific research facilities. The discussion also covers Spectra Logic's Tape Archival Platform As a Service, a cloud-based storage solution.
Mitch Seigle, Spectra Logic
In this Data Protection + AI Summit segment, theCUBE’s Christophe Bertrand talks with Mitch Seigle, CMO of Spectra Logic, about why modern tape is at the heart of AI-ready data protection. Seigle outlines Spectra Logic’s 45-year focus on storage, from the modular Spectra Stack to the multi-exabyte TFinity Plus, showing how high-performance computing sites like SLAC rely on tape for archiving petabyte-to-exabyte data sets.
The discussion explains how object storage front ends such as BlackPearl make tape as simple to consume as S3, while the new Tape...
>> Hello and welcome back to the Data Protection and AI Summit. Very pleased to be with
Mitch Seigle here, CMO of Spectra Logic. We're going to be talking
about Spectra Logic and about tape. So let's get into introductions. Tell us about yourself, Mitch, and tell us about Spectra Logic.
Mitch Seigle
>> Well, thanks,
Christophe. It's a pleasure to be here with you on theCUBE. First, a little bit of my background. I'm the CMO here at Spectra Logic. I'm a storage industry veteran. I've been around the industry
for more than 30 years, working across all of the
various segments of storage, including software and hardware
and in the disk, the flash, and now ultimately in the tape space. So seen it all, been there, done that, watched all the trends,
and really, I'm excited because we're sort of
entering a new era here around the intersection
of data protection and AI, and in particular the
role that storage plays to enable all of that growth. A little bit about Spectra
Logic, if you're not familiar with the company, we're 45 years old. We are uniquely dedicated to storage, so we're deep experts rather
than generalists in the space. We were founded in 1979. Sort of a parallel story that
may sound familiar to you. Our founder started the business
in his college apartment while he was still a
student at CU Boulder. And the company's headquartered
today in Boulder, Colorado, where we exclusively focus
on the development of tape and tape-based solutions, and in particular those
that are object-based now to interface very easily into
all of the infrastructure that's so modern today. So despite being 75 years old,
the technology is as fresh and vibrant as ever and actually is continuing to see innovation on a regular basis.
Christophe Bertrand
>> All right. And if I may
quote you, I think you said, "Well, there are two types of IT person. The people who know they use tape and the people who don't
know they use tape." So I'll give the answer,
which is, if you don't know that you're using tape,
it's because you're using cold storage in cloud. Actually, if you use
cloud, very likely some of the data you're
leveraging is at some point or has been on tape or will be on tape.
Mitch Seigle
>> Well, that's absolutely true. And so thanks for the little pun there. It's a little bit tongue in cheek, but the reality is that cloud-based cold storage
services have a very large tape tier behind them. And so there are many users
today that are archiving or backing up to the cloud on things. And I'll use Amazon S3
Glacier as an example. Most people are familiar with that service that is very heavily backed by tape. And so there are a lot
of reasons for that, and we'll probably talk a
little bit more about that through the course of the conversation.
Christophe Bertrand
>> Absolutely. And we'll
also connect the dots for our viewers. Tape and AI and data protection, why
do we have those sort of three terms in the same sentence? And actually, there's a
very, very good reason. And one, I'll give away one
hint, which is, well, one of them is scale, but we'll
talk about use cases also in a conversation in another
session we'll have with you and a couple of colleagues
and ex-colleagues. So let's talk about your customers because I think that's the most
important thing to position how much work you've done
in the past 45 years with what is actually a technology
that is full of innovation with one of the most
interesting roadmaps with a lot of capacity and scale
projected into the future, but let's talk about customers first. You brought us what I
call the wall of logos. So maybe we can just show
that here on the screen, and you have lots of great customers. Maybe you can point to some of those logos and what you do for them and with them and give us a sense for
maybe some AI initiatives that you've already observed.
Mitch Seigle
>> Well, we're very privileged to have some of the world's most impressive
customers in a variety of different what we've
considered to be vertical markets. We generally serve the
high-performance computing community, and that's a key call-out here for folks that are thinking about artificial
intelligence, where much of the infrastructure
that's used today to deliver that is built on top of high-performance computing
environments that have been well-established and well-proven. And of course, our tape
technology forms a very large archival backbone in those environments. As you can see across some
of these logos, we have folks that are in the National Laboratory space, Lawrence Livermore, for example, SLAC National Accelerator Laboratory, which is just up the street
from the studios here in Palo Alto, California, and has just
this week been in the news. I don't know if you happened to see this, it was carried in the
New York Times as well as many other outlets. The new Rubin Observatory in
Chile has sent back the first images, and that data is actually transmitted
from the telescope site in Chile to SLAC in Palo Alto, where ultimately the images
will be stored on Spectra Logic TFinity Libraries that
are installed at SLAC. And I kind of thought about
this a little bit today because we're very familiar
with the idea that many of our customers store the
world's most important data. So data of global impact or global importance,
weather data, for example, climate studies, scientific research, physics experiments, and so on. But now with the data that
we're seeing of so many galaxies with such resolution that we've never seen before, I would venture to say that we've gone from global importance to galactic significance.
Christophe Bertrand
>> That's a great quote.
We may have to use that as the subtitle of this
session, but it is true. I mean, the wall of logos as
I like to call these types of slides is very impressive
for a variety of reasons. Number one, clearly these are
very large scale data users, and scale is the name of the game. And for AI, scale is and will be the name of the game for sure. So you can start seeing the
connections here, number one. Number two, it's a variety
of industries that you serve. There are lots of logos
I know you cannot put up because as you can imagine,
there are lots of big users of data that you can't really talk about. More importantly, some of
the users have what we call HPC processes in place,
high-performance computing. So let's maybe talk quickly about what an HPC process is in general, and then we'll get into more
of the product portfolio. But let's start with that because SLAC is typically used as your HPC example. So what is HPC? Why should people care and
how does it connect to AI?
Mitch Seigle
>> Well, high-performance computing
in simple terms is really an infrastructure that is purpose-built for processing power,
performance, scale, capacity, and also importantly at the
highest levels of reliability, resilience, and lowest cost. When you talk about data at scale, just that term at scale means different things to different people. So let me give you an example that when we're talking about
the data in a high performance computing environment,
today, we have many customers that are archiving data
onto our tape technology that are in the hundreds of petabytes, and we actually have
customers that are now at the one exabyte or more level including a single library that's now at the exabyte level. So this idea which folks may have heard of around petabyte class
storage, I would beg to differ. It really crossed that
boundary quite a while ago, in that we now have exabyte-class storage. Spectra libraries are capable
on the upper end of as much as six and a half exabytes of capacity compressed
within a single system and raw capacity above two petabytes. Sorry, two exabytes.
Christophe Bertrand
>> Exabytes, right. So actually, we should just take a look
at the portfolio real quick. You have a pretty interesting
lineup from the Spectra Stack to the TFinity Plus that
you mentioned earlier, and in the context of HPC
but also other workflows. The idea is that you want to
be able to ingest quickly, you want to be able to
provide the data back quickly at scale to various processes in a number of industries. And again, I can't stress
enough, it's not just HPC. It could be media. It could be in any industry as a matter of fact. So let's talk about
your portfolio, walk us through the entry level all the way to the enterprise library.
Mitch Seigle
>> Sure. Well, thanks for that. And I think we have a slide up here, and you can see on the left
that we have the Spectra Stack that is basically a rack-scale type of an environment in
which this is modular. It allows you to start very
small in terms of your capacity and grow to as large as about 17 petabytes within
a single data center rack.
Christophe Bertrand
>> If I may interject
here, I like that you say 17 petabytes is small.
Mitch Seigle
>> Well, it is in the context of what we're talking about here. When you think about media and entertainment customers that are storing entire
film libraries that need to be preserved forever, so
think about the idea that tape really becomes the final
destination in which we're preserving, protecting and
defending this data for all time. It's this idea that data's become forever and so that you have to have that scale. 17 petabytes is actually
relatively small in the grand scheme of things, but
for many organizations, it's certainly adequate. And as you kind of move
through the product line, we also then have mid-range
libraries that will scale up to 50 petabytes, and we
have research universities where a single Cube library, for example, would be well situated to
handle all of their archival and backup needs in terms of storing that. Our Spectra T950, of which we show
a single frame on the slide, actually scales out
horizontally to eight frames and up to 300 petabytes. And then of course the flagship,
which is the TFinity Plus, that library actually is
really designed for longevity. We first put our TFinity series
into the market more than 15 years ago, and we have many of
them that are still running. So we like to say that we're
the oldest thing in the data center, and we're proud of that because we designed for longevity, which is a very different
design center than storage that typically is replaced on
a three to five-year cycle. Those large TFinity systems,
as I said, will scale up to as many as 168 tape drives. So you get massive parallelism
in the library, close to 57,000 slots available
in those libraries to store your actual tapes, and they'll scale out to 45 frames, give you multi-exabyte capacity. So we're talking about the
world's largest storage systems of any type are actually
Spectra libraries.
Christophe Bertrand
>> Right. And in the context of AI, we know that there will be a lot of or massive creation of data
moving forward in the future, very likely because of
AI-related processes. We're talking zettabytes
globally of new data. So clearly, it's going
to have to go somewhere, but if you think about AI now,
there are very good use cases for having maybe high-speed disks or memory types of storage systems with compute close to each other
to serve certain processes. But for a lot of processes that are AI processes today
already, you can't afford for a variety of reasons
to have the data on disk or in cloud, even on tape in cloud. So before we get into maybe
more of the technology, real quick, what are the top three or four use cases you
can think of today for AI that leverage tape and
will be leveraging tape?
Mitch Seigle
>> Well, that's a great question. I think there are a few key
use cases here where certainly during the training phase or retraining phase, as
you're developing LLMs, there's a frequent need
to do entire trainings and that you need to store a
massive amount of that data so that you essentially
have a rollback point, and you need to be able
to store that effectively. So imagine if you've got
a multi-petabyte LLM, and you need to have
multiple versions of that, the online storage of that,
not only is it expensive, but it is very energy consumptive. It requires a lot of footprint, whereas tape consumes no energy when the tape's not being used. So it's very, very energy efficient, the most energy efficient of
all the storage technologies. It's extremely dense. And so if you think about this idea of cold data representing 70 or 80% of all data in an
environment, the more of that data that you can move from online
storage when it is cold or cooling down to tape, the more you can free up your environment to provide more AI processing capacity, and so it's kind of the
best of both worlds. It's low cost, it's highly sustainable, it's the most energy efficient
of all the storage media that's available today. And of course, it takes
care of sovereignty issues, control issues, which are
of critical importance, especially in the AI era,
where having control, knowing where your data sleeps at night is a very, very important thing. I think there's one
more aspect that we need to think about in use cases in general. The use cases today are
all about building out this environment and beginning to exploit it. And there are many different
variations on that. I think there will be additional use cases that will be emergent that are even larger because with a technology
like AI, we tend to see that regulation follows. And why? Because it has enormous potential, but it also carries enormous risk. Wherever there is potential and risk, governments tend
to want to control that. There is certainly every expectation. We're beginning to see the
first laws emerging around all of this compliance. And so imagine that not only
will you develop your LLMs and want to archive those and provide various levels of protection, think about things like
signals intelligence for national security
purposes, where we'll want to potentially capture every query issued against
an AI to look for patterns or so-called noise. If you recall how we typically
do this by looking at signals that are intercepted,
people will be looking for those queries or conversations that are going on in AI. They'll probably want to
capture all of the results for litigation purposes
or discovery purposes. So it's just simply an
extraordinary amount of data. And then I think one of the
most important observations is that because AI has the
ability to look at volumes of data in scale beyond
human capacity to analyze and detect patterns, we will now want to store data at the most
atomic level possible. And when I say atomic, it's
down to the finest level of granularity and we'll
want to, if it's affordable and scalable, keep everything forever. Because as AI continues
to advance in the future, it will have a resolution and an ability to detect patterns that humans are not capable of because of the scale of the data.
Christophe Bertrand
>> And that's a very good point. At the same time, I think
there's also a conversation around eliminating data
that may be redundant because obviously you don't
want to have things twice. Maybe you archive it, which is another use case
discussion we had with another company that you know well, Congruity 360 very recently, but that combination is very powerful. And to your point, I think you're correct. You don't know what the future holds. The algorithms are only
going to get better. So five years from now, 10
years from now, you may want to go back to re-inject
a bunch of collected data to build a new model
or improve on a model. And actually, that's something
that already happens. You mentioned some of the national labs and what they do, well, that's pretty much how they do simulations and keep improving on their simulations. Let's get into the
technology for a second. We talked about the portfolio. Let's talk about object storage
a little bit, how it fits with the tape constructs here and in a cloud context as well because a lot of people
think of S3 and cloud and okay, well, turns out object storage S3 is actually on tape too. Let's take a look at another
slide here real quick that talks about some of
your partnerships in context of backup and recovery,
but also in archiving, but also more broadly,
let's talk about object.
Mitch Seigle
>> Sure. Well, I think there is sort of a new dawn, if you will, a modernization of tape. As you know, tape is 75 years old. It's probably the longest-lived technology that's still in active use
today in the world of IT. So it's had an incredibly long run, but it's far from archaic. It has actually gone through
a transformation from most of its lifetime where it
required large organizations to be able to integrate it and custom coding for applications
to write data to tape. And so while it was very powerful and has always offered
incredible benefits in terms of sustainability cost, long before we even thought about those things, it wasn't the easiest to
integrate into an organization. And so now that we are about
20 years into the object revolution, if you want to call it, in which object storage has evolved, we're now seeing different forms of object storage emerge
from what was once disk-based to now things that are even based in flash and now ultimately in tape. And so what we've been focused
on is building interfaces that front end those libraries and take care of the abstraction
from common cloud access protocols, the de facto
being the Amazon S3 and S3 Glacier APIs, which are out there that people have
applications that write to. And we provide that layer in the form of our BlackPearl product platform, which is essentially
software-defined storage that sits in front of the tape library and does that translation, if you will, to what goes on to tape. And so it's a very simple
connection that makes tape as simple to consume as cloud storage.
Christophe Bertrand
>> Right. Because in this day and age, it's all about having
APIs, common interfaces. You can't keep architecting and re-architecting with
multiple layers and layers, and this is a great way of doing it and to be able to manage
in an optimized fashion what will be, again, exabytes of data in some cases in your environment. Well, let's talk a little bit about how I could use this as a service. I think that's one of your
initiatives. Tape is great. I see how, of course, you can deploy it and you should deploy it in
your environment, on-prem. It's going to be needed for AI. There's no question around
compliance, governance. The many reasons why governments
will want to regulate this as well in terms of where it is located, where the data is located and processed. By the way, I think there are
75 regulations today already covering AI globally. One of our colleagues, Scott
Heppner, is covering this topic, and that's going to be another summit that we'll be running in
the next few months on data governance and compliance
in the age of AI. And the regulations are already there. But let's talk about, again,
could I use this as a service? Because to me, I think there's a way to maybe get into the technology, but consuming as a service
is also a very popular way of consuming technology in general terms. So where are you with it?
What initiatives do you have?
Mitch Seigle
>> Well, first of all,
the short answer is yes. Can you consume it as a service? Now, that wasn't always a given. The first problem, of course,
was to modernize tape, which we've done to abstract
it to object storage. Once that's done, that could potentially
be deployed anywhere. Today, we think of that
largely as on-premises with hybrid cloud extensibility. All of those are functionalities that we provide in our
object-based tape platform. The natural evolution of
that was to go the next step and remove the physical idea of it being on the customer's premises to being on anyone's premises, whether that was at a managed
service provider, a colo or virtually in the cloud services sector. And so we've actually created what we call the Tape Archival
Platform As a Service, TAPAS. So we were a little hungry that day. It was getting close to
happy hour when we coined that phrase for memorability. And we partnered with
a startup, Geyser Data, which actually provides
that as a service layer around the tape abstracted object storage. We stand that up in public
cloud data centers using large data center providers. I won't name them. There are many of them, but essentially, it makes
it as simple as a customer that's operating in a totally cloud-native or a cloud-enabled
environment without any on-premises facilities where
they don't want to manage tape because they don't have the
skills in place to deal with it, that they can consume
this on a purchased basis. And what's different here is
we're not really selling you or renting you the storage, we're actually creating a
physical tape library in the cloud that you access via S3
and S3 Glacier APIs. You actually own tapes that are in a physical library
in a cloud service center. You can choose the locality
of that for sovereignty and residency purposes. So it helps greatly with that. You can retrieve those tapes
physically if you'd ever like to take possession of them, but what we've done is
taken out all of the tape handling, if you will,
no pun intended from that particular model, and made it as simple
as a cloud interface. Customers that have tried
this can actually log in, use a credit card if they wish,
and within less than 10, 15, or 20 minutes, stand up
their very first storage on tape in the cloud. Now, the advantage is that
unlike the cloud services where you actually don't know where the data resides physically because you don't get to select that and you don't know what's on tape, you don't know what's on disk,
and you're paying egress fees and API call fees that all add up with very large implications. There are no access charges here. You simply pay for the tape cartridges that you consume in the
cloud on a monthly basis. When you're done with those,
they can be returned to you or they can be destroyed, your option. It makes it very simple to stand up tape in the cloud
as a backup or archival target or as a secondary copy if you
are wanting an additional copy stored offsite in the cloud
at a much lower price than the traditional storage services
for cold storage in the cloud.
Christophe Bertrand
>> Right. And this is really
the best of both worlds because you get... I think compliance could
not be more important, not just in the context
of AI, but in general. So I think that's
definitely a great solution. A lot of people, as we said
earlier, may not know tape, may not be PhDs in handling tapes, and therefore, this gives them a very easy way to consume the technology
with all of its benefits and the cost benefit being
one of them without having to learn anything or get the skill set,
which can also be costly. So really the best of both worlds. Look, we've covered a lot
of ground, Mitch, thank you so much for joining us. We're going to be continuing
the conversation with one of your colleagues and one of my ex-colleagues who's an expert
on the topic as well because I think there's so
much to discuss for tape and AI, a lot of
innovation going into tape. Maybe your last thoughts on where this is going
from a scale standpoint. There's a roadmap for LTO,
which is one of the standards, and it's LTO-9, I believe, today. I think it's a pretty
good amount per cartridge. Is it 45 terabytes compressed
or something like that? I mean, I get lost in the numbers because it changes all the time. But there is a roadmap. This is maybe one of the only technologies
that actually has a roadmap with capacity that is
published years ahead, which is fascinating. Where do you see this going overall and what are maybe your closing
thoughts for our viewers?
Mitch Seigle
>> Well, thanks for that, and
thank you for the opportunity to talk about tape technology and where it fits into the spectrum of AI and data protection. I would say that first on the roadmap, we're actually now just
recently at LTO-10.
Christophe Bertrand
>> Oh, 10.
Mitch Seigle
>> The 10th generation and 25th year. So the 25th anniversary has just occurred. The LTO-10 technology stores
30 terabytes per cartridge. So once again, in LTO-9 we were at 18 terabytes, and we're now at 30. And the roadmap for LTO, which is a public roadmap
published years in advance, probably the only technology
that does publish that roadmap years in advance, and so
take that for what it will. Can you predict with precision
point accuracy a decade in advance on any other technology? No. Are we perfect on the LTO roadmap? We're pretty darn close. So I think everybody's
proud of that roadmap, but it gives people assurances. I think the key thing is that
tape is not stagnant. Like any other technology,
there's ongoing research and development, continuous innovation, not just in the tape drives and the media, but in the library functionality and capabilities of those libraries. And of course, in the way
in which we package it and deliver those solutions
to make them easily consumed, highly scalable, highly cost efficient. So I think you can look for
continuous innovation in that space, and more specific functionality to facilitate AI use
cases is likely to occur. We're sort of in the infancy of that, but I would expect at some
point that you would see AI-enabled interfaces that are aware of tape rather than it being abstracted, where they don't have awareness. They'll become very tape aware and take full advantage
of that scalability and energy efficiency to help manage the
footprint cost effectively and most importantly securely.
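Editor's note: the per-cartridge figures quoted in this answer (18 terabytes for LTO-9, 30 terabytes for LTO-10) lend themselves to a quick back-of-the-envelope calculation. The sketch below is illustrative only and is not a Spectra Logic tool. It assumes the quoted numbers are native (uncompressed) capacities and uses decimal units (1 PB = 1000 TB); the function name `cartridges_needed` and the 17-petabyte example size are hypothetical choices made for this illustration.

```python
import math

# Per-cartridge capacities quoted in the conversation, in terabytes.
# Assumption: these are native (uncompressed) figures.
NATIVE_TB = {"LTO-9": 18, "LTO-10": 30}


def cartridges_needed(dataset_pb: float, generation: str) -> int:
    """Return how many cartridges hold dataset_pb petabytes (hypothetical helper)."""
    dataset_tb = dataset_pb * 1000  # decimal units: 1 PB = 1000 TB
    # Round up: a partially filled cartridge still occupies a slot.
    return math.ceil(dataset_tb / NATIVE_TB[generation])


if __name__ == "__main__":
    # Compare generations for an example 17 PB archive.
    for gen in NATIVE_TB:
        print(gen, cartridges_needed(17, gen))
```

Under these assumptions, a 17-petabyte archive needs 945 LTO-9 cartridges but 567 LTO-10 cartridges, which is one way to see why each new generation frees up library slots for the same amount of data.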
Christophe Bertrand
>> Yes. And as an analyst,
I have to make predictions, but I think it's a pretty
obvious one to me, given all of the advantages and the scale
questions, the performance of the devices, I think you cannot have
an AI infrastructure, an information architecture that does not include tape in it. I think that's essentially
what the future is. And if I had said this a few years ago, people would've thought,
"Okay, you're crazy. Tape is dead. It's an old form of storage." Well, it's not. It's not.
It's pervasive actually. As we said, you are using it
even without knowing you're using it in many cases. AI is going to be a
pretty amazing accelerant, in my opinion, for tape. There's no question. With the modalities that you've explained, and
for certain use cases, not for everything, but for plenty. And then I also believe cyber
resilience is driving a lot of the use of tape for other reasons such as protecting the data because there are so many good reasons. And remember, compliance, governance, those are not going away either. So Mitch, great future
ahead for tape, tape and AI and the protection of data assets. Thank you so much for joining us today.
Mitch Seigle
>> Thank you, Christophe.
Christophe Bertrand
>> And to our viewers, thank you so much. My name is Christophe Bertrand, principal analyst here
at theCUBE Research. Stay tuned for more on
data protection and AI.