This on-stage interview, recorded at NVIDIA GTC 2026, features host Gemma Allen of theCUBE and NYSE Wired in conversation with CT Sun, chief technology officer and vice president of engineering at AIC, and Pompey Nagra, product and ecosystem marketing at Solidigm. The discussion focuses on storage architecture and solid-state drive innovation for artificial intelligence inference, including key-value cache, disaggregated architectures and liquid cooling.
CT Sun of AIC outlines AIC’s 30-year platform experience and a multi-tier storage model running from graphics processing unit high-bandwidth memory through local solid-state drives to content-extended memory. Sun highlights accelerating demand for key-value cache and multi-tier storage and warns that capacity gaps may persist for years. Sun advocates disaggregated architectures that leverage data processing units such as NVIDIA BlueField to address scale and performance requirements.
Pompey Nagra of Solidigm explains solid-state drive form factors, low-latency performance and integration into AI servers. Nagra emphasizes executing on SSD roadmaps, adopting PCIe Gen6 and Gen7 interfaces, preparing for broader liquid-cooling adoption, and prioritizing delivery and supply chain execution to meet high-throughput inference workloads. Nagra also addresses operational factors for large-scale deployment.
This conversation provides technical insight into AI storage strategy, SSD innovation, KV cache adoption and infrastructure choices that affect inference performance and data center efficiency.
CT Sun, AIC & Pompey Nagra, Solidigm
In this interview from the NVIDIA GTC AI Conference and Expo in San Jose, CT Sun, vice president of engineering and chief technology officer of AIC, joins Pompey Nagra, AI marketing ecosystem manager at Solidigm, to talk with theCUBE + NYSE Wired's Gemma Allen about how the inference era is driving explosive demand for AI storage across every tier of the data hierarchy. Sun walks through AIC's storage tiering framework, from GPU-local SSDs at G3 to a new G3.5 "content memory extension" layer designed to absorb overflowing KV caches, explaining why each infer…
>> Welcome back to theCUBE here on the ground in San Jose. It's NVIDIA GTC 2026, and I'm here at the AIC booth. And joining me now for a conversation on all things storage is CT Sun, VP of engineering at AIC, and Pompey Nagra, AI marketing ecosystem manager at Solidigm. Welcome, guys.
CT Sun
>> Thank you.
Pompey Nagra
>> Thank you
CT Sun
>> Thank you for having us.
Gemma Allen
>> So this is certainly an interesting booth from the perspective of what's happening here behind me, right? It's a mixture of tech meets a beautiful mind. But before we get into that, maybe you could both just share a little bit about your companies, what it is that you do for those who might not be familiar.
CT Sun
>> Okay. So start with you.
Pompey Nagra
>> Start with you.
CT Sun
>> For me? Okay. So AIC, we've been in the industry for 30 years. We focus a lot on storage, but not only storage, also a lot of server- and storage-related products, AI servers. And recently, well, not just recently, for the last five or six years, we've focused more on AI storage. And right now it looks like the AI storage demand is huge. So we've been working with Solidigm from the very first day of the AI journey.
Gemma Allen
>> Wow.
Pompey Nagra
>> Yeah. Solidigm is an SSD manufacturer and designer. So we build various different form factors for our partners, AIC and others, such as E1.S and E3.S, which are used in different storage and server platforms to store AI data at the fast rates that's required today.
Gemma Allen
>> Okay. So let's just break this down for a second. NVIDIA supply the GPU, you supply the flash drive, you provide the rack, right? But we know-
CT Sun
>> We provide the platform, yes.
Gemma Allen
>> The platform. It's about a lot more right now than just space, right? It's about so many things, especially in this inference era. It's a hot word on the ground here this week. Jensen talked a lot about it. We're moving to execution mode. What does that mean for your businesses? How is that driving change opportunity across both of your spaces?
CT Sun
>> Right. Actually, in the booth, we demonstrate the whole AI data journey from the creation of the data. And in AI storage, what you need is not only to deal with latency, because you need to feed a lot of data to the GPU, but also to have enough capacity, because AI-generated data is more than the data you get from the natural world. That is one. That is a huge, huge opportunity out there with this AI boom, right? So that is actually at G4. But at the same time, in inferencing, you can see all the KV cache demand driving even more storage. So every tier probably ... Can I spend more time on this?
Gemma Allen
>> Sure, please do.
CT Sun
>> So G1 is actually inside the GPU; it's the HBM. And G2 is system memory. G3 is the local SSD. So this alone is inside the GPU box. But the KV cache, when you create that content, it will keep increasing and increasing. And one day it overflows, and you still need something to keep it. So you have G3.5; we call it content-extended memory storage.
Pompey Nagra
>> Content memory extension.
CT Sun
>> Yeah. So you have this storage; that is G3.5. And then these are the ... In the beginning we talked about AI storage, right? So these are all for dealing with KV cache for inferencing.
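The tier spillover CT describes, with KV cache overflowing from a fast tier down toward G3.5, can be illustrated with a toy LRU hierarchy. This is a sketch only; the class name, tier sizes, and eviction policy are assumptions for illustration, not AIC's actual implementation.

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy LRU hierarchy: when a fast tier fills, the least-recently-used
    entry spills down to the next (larger, slower) tier."""

    def __init__(self, capacities):
        # capacities: max entries per tier, fastest tier first
        self.capacities = capacities
        self.tiers = [OrderedDict() for _ in capacities]

    def put(self, key, value, tier=0):
        if tier >= len(self.tiers):
            raise RuntimeError("all tiers full: entry would be dropped")
        t = self.tiers[tier]
        t[key] = value
        t.move_to_end(key)                 # mark as most recently used
        if len(t) > self.capacities[tier]:
            lru_key, lru_val = t.popitem(last=False)
            self.put(lru_key, lru_val, tier + 1)   # spill downward

    def get(self, key):
        # returns (tier_index, value), or None on a full miss
        for i, t in enumerate(self.tiers):
            if key in t:
                return i, t[key]
        return None
```

With capacities `[2, 4]`, inserting a third entry into the fast tier pushes the oldest entry down one level, which is the "overflow to G3.5" behavior in miniature.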
Gemma Allen
>> Wow.
Pompey Nagra
>> So as CT was explaining, the vast amounts of data that AI is producing need to be captured, stored, manipulated, and moved. Solidigm, the leading provider of SSDs, is able to take that data and store it in an AIC platform so that the GPUs can take that data, manipulate it, calculate, and do the next level of processing through the AI systems that we've powered with our SSDs.
Gemma Allen
>> And what are the competitive dynamics around flash drives versus DRAM or pure cloud storage? How does this shape the inference requirements across the board?
Pompey Nagra
>> So from a storage perspective, there's a need to store data at low latency, but the storage needs to be local, so high capacity. These, combined with the next-generation technologies such as PCIe Gen6 and the PCIe Gen7 that's coming, allow us to take that data at very high speed and store it so the GPUs and CPUs can write to and read from the disk and manipulate the data through the systems delivered by AIC.
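The interface speeds Nagra mentions can be put in rough numbers. PCIe Gen6 runs at 64 GT/s per lane and Gen7 at 128 GT/s; the encoding-efficiency factor in this sketch is an assumed round figure for FLIT/protocol overhead, not a spec value.

```python
def pcie_bandwidth_gbs(gigatransfers_per_s, lanes, efficiency=0.95):
    """Rough usable one-direction bandwidth in GB/s for a PCIe link.
    One transfer carries one bit per lane; the efficiency factor is an
    assumed round number for protocol overhead, not a spec value."""
    return gigatransfers_per_s * lanes * efficiency / 8

# Per-lane rates from the PCIe roadmap: Gen6 = 64 GT/s, Gen7 = 128 GT/s.
gen6_x4 = pcie_bandwidth_gbs(64, 4)    # roughly 30 GB/s for a Gen6 x4 drive
gen7_x4 = pcie_bandwidth_gbs(128, 4)   # Gen7 doubles the rate again
```

Each PCIe generation doubles the per-lane transfer rate, which is why a single x4 drive slot keeps pace with the GPU-feeding rates discussed here.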
CT Sun
>> Yeah. So let me put more color on this. Of course, Solidigm plays a very crucial part. You provide very low-latency SSDs that at the same time come in different levels of high capacity.
Pompey Nagra
>> Correct.
CT Sun
>> Right? But for the platform we provide to the market, we need to accommodate these drives. At the same time, we need to deal with the outside storage network, providing 200-gig networking, 400-gig, and now we are talking about 800-gig. Dealing with this, the infrastructure won't be the same as it is right now. So a lot of the time we deal with disaggregation in the storage architecture, right? We also provide the DPU, using NVIDIA BlueField-1 and BlueField-3, and we will move to BlueField-4, in this disaggregated storage architecture. At the same time, we deal with latency, because you need a very flexible architecture to build up the storage tiers so that BlueField can offload everything to a disaggregated data box, and our box is high availability. We make sure all the content we keep in the box won't go away, because the system will keep running. We have multiple controllers keeping the data alive. If one controller dies, the other controller can just take over. Because AI storage is actually enterprise-level storage. So there is a lot of game change here in the AI era.
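The dual-controller takeover CT describes can be sketched in a few lines. The class and method names below are hypothetical; this only illustrates the failover idea, not AIC's controller firmware.

```python
class DualControllerBox:
    """Two controllers front the same drives; if the active one dies,
    the surviving controller takes over and keeps the data path alive."""

    def __init__(self):
        self.healthy = {"A": True, "B": True}
        self.active = "A"

    def fail(self, name):
        self.healthy[name] = False
        if name == self.active:
            survivors = [c for c, ok in self.healthy.items() if ok]
            if not survivors:
                raise RuntimeError("both controllers down: data path lost")
            self.active = survivors[0]     # standby takes over

    def serve(self, request):
        return f"{request} handled by controller {self.active}"
```

The point of the design is that a single controller failure changes which path serves I/O, but never whether I/O is served, which is what makes the disaggregated box enterprise-grade.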
Gemma Allen
>> Let's stay on BlueField for a second because it did get some press yesterday, right?
CT Sun
>> Yes.
Gemma Allen
>> From the perspective of NVIDIA, which is right here, okay? So middle layer. Talk to me a little bit about how yesterday's announcements impact your business specifically, CT. How do you expect this to play into your model for the ...?
CT Sun
>> Okay. So we've been working with NVIDIA BlueField before they launch.
Gemma Allen
>> Oh, wow. Okay.
CT Sun
>> So BlueField-1, 2017, when they launched, we already had a box ready for it. Nobody knew what DPU storage was, and right now everybody talks about it a lot, but we've been shipping a lot.
Gemma Allen
>> Yeah.
CT Sun
>> Okay. So we've been shipping on G4 because right now, with the DPU and NVMe over Fabrics, you can move data; you have a lot of flexibility from a storage architecture point of view. But for processing power, BlueField-4 changes to different processing power; they're going to integrate Vera with ConnectX-9. With Vera, because you have the Arm cores inside, right, you have enough computing power. At the same time, with LPDDR5X, you have very high-throughput, low-latency memory. That is actually crucial for storage applications. So I believe BlueField-4 will change the landscape of storage faster than we think, in the very near term, because tiers like G3.5 and G4 will move to BlueField-4 very fast.
Gemma Allen
>> So speak ...
Pompey Nagra
>> So if you take a look at what BlueField is really doing, to CT's point, you've got 800 gigs of traffic coming into the system. And so with the ability to store that traffic, you need to not just store that traffic, but make sure it's highly available should the system go down. So the amount of storage capacity at low latency that's required in any one system, or any one tier, grows exponentially.
Gemma Allen
>> Sure.
Pompey Nagra
>> And so if you take a look at the vast size of an AI system, a SuperPOD from NVIDIA, the amount of storage and processing that's needed grows exponentially. And that's why we're here and excited to be here.
CT Sun
>> That's why Solidigm right now selling very ... Yeah.
Gemma Allen
>> That's why they're making so much ... They're making so much money off it.
CT Sun
>> Exactly. Even Jensen say, "Oh, you guys are rich."
Gemma Allen
>> I love it.
Pompey Nagra
>> I think he said you're richer, didn't he?
CT Sun
>> No, no, no, no.
Gemma Allen
>> Okay. So drinks are on both of you guys tonight.
Pompey Nagra
>> There you go.
Gemma Allen
>> Let's talk about the world of ODM for a second, because it is a very interesting time from the perspective of customer expectations, market narrative and hype, and I'm sure reality on the ground, right? It's also a perfect storm. We've seen a lot of supply chain challenges, a lot of geopolitical strife, which of course impact your businesses and your ability to meet supply and demand. How are things shifting month to month from the perspective of what's actually happening in your factories?
Pompey Nagra
>> So from a storage perspective, obviously, as the market's seen, there's a supply shortage that's impacting a lot of customers. We have to prioritize with the bits that we get and deliver what we can as soon as we can. So it's a matter of execution, and delivering on the promises we make to our partners and our customers.
Gemma Allen
>> So it's almost like a tiering model, like you have no other choice, right? I'm sure.
Pompey Nagra
>> Absolutely. Right. We have to just execute with what we have, deliver what we have the best way we can do it. And Solidigm is meeting the customer expectations as best and as fast as we can.
Gemma Allen
>> And for you, CT, I mean, Taiwan gets a lot of news, right? It gets a lot of fear stirs from time to time.
CT Sun
>> Yes.
Gemma Allen
>> So what are your thoughts? What's it like for you from a supply and demand perspective?
CT Sun
>> Well, as a Taiwanese, right, we stand very strong.
Gemma Allen
>> Absolutely.
CT Sun
>> And we are life as usual.
Pompey Nagra
>> Absolutely.
CT Sun
>> We are working with all the industry, trying to supply all the demand and understand what this market needs, trying to keep things normal. But yeah, right now, because the demand is too high, everything has changed, right? The demand-and-supply gap is getting bigger and bigger. So in the coming few quarters ... Well, the demand-and-supply gap might continue for another two or three years. At the same time, I think we need Solidigm to help us get some allocation. Otherwise, I think this is a big time.
Gemma Allen
>> So one thing we know about supply and demand challenges is often it turns out to be a cost challenge too, right?
CT Sun
>> Yes.
Gemma Allen
>> Costs get crunched and customers often suffer the impacts of those. How do you continue to manage expectations from the perspective of scale, speed, cost? How do you weigh that up?
CT Sun
>> That is his portion. For our portion, we always try to be very cheap. We have good product, for sure.
Gemma Allen
>> But it's cost, I guess in this moment we're in, where it's a speed race, it's a scale race.
Pompey Nagra
>> Right.
Gemma Allen
>> Is it still a cost race?
Pompey Nagra
>> It's always a cost race because time equals cost, right? People want product instantly. They can't grow the data centers fast enough. They want to move, move, move. And that's a good place to be. Unfortunately, just because of the industry dynamics, we can only deliver what we can deliver when we can deliver it. It's a hard thing to say, but that's the situation. We'd love to serve our customers. We love our customers and our partners, and we want to give them the best solution possible at the rate they want it.
CT Sun
>> Thank you in advance.
Pompey Nagra
>> Thank you. You're welcome.
Gemma Allen
>> So it's Monday. We're all back to work. NVIDIA GTC is behind us for another year. What's ahead for both of you? What are the key priorities between now and the end of 2026?
CT Sun
>> Well, in 2026, 2027, 2028, we'll keep working on this pyramid, trying to fulfill it. I did not talk about this tier, because it is also very special; this is a tier to close the gap. These are all memory tiers, right? And memory tiers always have different levels of gap, because from here to here is a one with five zeros after it, right? That is a huge gap. So we need to invent, or work with the industry leaders, to close this gap. So this year, next year, and in the coming few years, despite all the pricing stuff, we still want to provide products for customers so they can close the gap.
Gemma Allen
>> And the pressure from context memory will only further widen that gap?
CT Sun
>> That is true, because remember how many GPUs NVIDIA is selling. Jensen mentioned 100 million, and those will go to inference, and then those inference GPUs will create content, and then it goes down to G3, G3.5, G4. For KV cache, they say they will need 16 terabytes of G3 capacity, and that is shared storage for everybody, including the application, right? For KV cache alone, it's four terabytes.
Gemma Allen
>> Wow.
CT Sun
>> So you can do the mathematics. It's 100 million GPUs times this. That alone is probably 10X the storage we have right now.
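The arithmetic CT invites ("you can do the mathematics") works out roughly as follows, taking the figures quoted in the interview at face value; this is a back-of-envelope sketch, not a market forecast.

```python
# Figures quoted in the interview, taken at face value:
gpus = 100_000_000            # "Jensen mentioned 100 million" inference GPUs
kv_cache_tb_per_gpu = 4       # KV cache share of the 16 TB G3 tier

total_tb = gpus * kv_cache_tb_per_gpu
total_eb = total_tb / 1_000_000          # 1 EB = 1,000,000 TB
print(f"{total_eb:.0f} EB of KV-cache-driven storage demand")  # prints "400 EB ..."
```

Four hundred exabytes from KV cache alone is the kind of number behind the "10X storage" claim above.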
Gemma Allen
>> Well, it's a great time to have an offload option, right? So from your perspective, this is very opportunistic, I'm sure.
Pompey Nagra
>> Absolutely.
Gemma Allen
>> What's ahead for you and the team at Solidigm?
Pompey Nagra
>> So really it's, again, delivering on the needs of the market, understanding the industry needs. So KV cache, obviously we're playing in that space. We're looking at what the needs and requirements are, executing to our roadmap. In addition to this, the framework of the server is also changing. It's going from an air-cooled device to a liquid-cooled device. The E1.S we introduced last year at GTC is a liquid-cooled SSD with cold plates that are all serviceable in an NVIDIA environment, if that were to be the case. So the VR platform, the Vera Rubin platform, is all liquid-cooled. And those liquid-cooled platforms, as well as the KV cache platforms, the STX and the CMX powered by AIC, all require that level of ability to be cooled while delivering on capacity with low latency, and we're excited to be able to deliver those products.
Gemma Allen
>> And do you envision far more liquid cooling as a core model? Do you think you're going to see more and more of this? I mean, it's expensive, right? It has its challenges too. It's difficult to plug. What are your thoughts on that?
Pompey Nagra
>> So I think if you take a look at the physics of the environment, the servers are getting faster, the servers are getting more powerful, which means they're getting hotter. They need to be cooled. A lot of the servers today are liquid-cooled, but they need to extend that to the memory, which is being done, as well as to the storage. So I think liquid cooling becomes a paramount requirement in the next generation of servers.
Gemma Allen
>> Wow. Well, exciting times-
Pompey Nagra
>> And storage.
Gemma Allen
>> Exciting times, especially to be a company that's been in the market for 30 years. I'm sure you're seeing a lot of change, a lot of excitement and opportunity ahead for both of you.
Pompey Nagra
>> Absolutely. Yeah. For the future.
Gemma Allen
>> Thanks so much for joining us on theCUBE.
CT Sun
>> Thank you.
Pompey Nagra
>> Thanks much. Bye-bye.
Gemma Allen
>> I'm Gemma Allen. It's GTC 2026. We're here at the AIC booth talking all things storage. Thanks so much for watching.