In this interview from CES 2026, John Mao, vice president of global strategic alliances at VAST Data, joins theCUBE’s Rob Strechay to unpack VAST's pivotal role in NVIDIA’s Vera Rubin system announcement. The discussion centers on the reinvention of the AI stack, specifically the evolution of KV cache storage to support larger models and longer reasoning capabilities. Mao explains how VAST is moving beyond the limitations of local high-bandwidth memory by utilizing NVIDIA’s BlueField-4 DPUs and Spectrum-X networking to create an infinitely scalable pool of NVMe storage. This architecture enables context memory to extend across the network with high bandwidth and low latency via RDMA, fundamentally changing how data feeds the GPU.
The conversation also explores the broader implications of these infrastructure advancements for the "AI Everywhere" era, bridging the gap between data center innovation and consumer applications. Mao highlights how this shared-everything architecture impacts industries ranging from sports and media entertainment to robotics and physical AI, allowing for the democratization of unstructured data analysis. Additionally, they touch upon the manufacturing and packaging simplifications of the new supercomputing generation, underscoring how these developments are accelerating enterprise adoption of AI in production environments.
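The idea of extending KV cache beyond local HBM into a network-attached pool can be illustrated with a minimal sketch. This is a conceptual toy only: the class and method names are this article's illustrative assumptions, not VAST or NVIDIA APIs, and the "remote" tier here is a plain dictionary standing in for a shared NVMe pool reached over RDMA.

```python
# Conceptual sketch of tiered KV-cache placement: a small "local HBM" tier
# spills least-recently-used entries to a larger "network-attached NVMe" tier
# instead of discarding them, so long contexts survive beyond local memory.
# All names here are illustrative assumptions, not VAST or NVIDIA APIs.
from collections import OrderedDict
from typing import Optional


class TieredKVCache:
    def __init__(self, local_capacity: int):
        self.local_capacity = local_capacity  # stand-in for limited HBM
        self.local = OrderedDict()            # hot tier, kept in LRU order
        self.remote = {}                      # stand-in for shared NVMe pool

    def put(self, token_id: int, kv_block: bytes) -> None:
        self.local[token_id] = kv_block
        self.local.move_to_end(token_id)      # mark as most recently used
        # Evict LRU blocks "over the network" rather than dropping them.
        while len(self.local) > self.local_capacity:
            old_id, old_block = self.local.popitem(last=False)
            self.remote[old_id] = old_block

    def get(self, token_id: int) -> Optional[bytes]:
        if token_id in self.local:
            self.local.move_to_end(token_id)
            return self.local[token_id]
        if token_id in self.remote:
            # Simulated "RDMA fetch": pull the block back into the hot tier.
            self.put(token_id, self.remote.pop(token_id))
            return self.local[token_id]
        return None


cache = TieredKVCache(local_capacity=2)
for i in range(4):
    cache.put(i, f"kv{i}".encode())
# Only the two newest blocks remain "local"; older ones spilled to the pool.
```

The real system described in the interview replaces the `remote` dictionary with NVMe SSDs behind BlueField-4 DPUs on a Spectrum-X fabric, with RDMA moving blocks back into GPU HBM; this sketch only shows the placement logic.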
John Mao, VAST Data | CES 2026
>> Hello and welcome back to CES 2026. We're here live from Las Vegas, really diving into some of the new and innovative things announced live at the NVIDIA keynote at the Fontainebleau. And I'm joined by one of my friends, John Mao, VP of Global Strategic Alliances for VAST Data. You were in there, your logo was up on stage, and there was some really interesting stuff when they got into the entire Vera Rubin, I would say, system. Because I wouldn't call it a chip; it's many chips and many systems. Why don't you help us unpack how you're playing with the context storage portion of Vera Rubin?
John Mao
>> Sure. Yeah. I think Jensen did a great job explaining how they had to reinvent the entire system. It's not just a new GPU, but six different chips, as you mentioned. And a lot of that spills into the rest of the stack. The rest of the stack, in this context, no pun intended, is reinventing how you do things like KV cache: how do we evolve when models get bigger, when longer reasoning starts to happen, when there are more turns happening in inferencing? That means different paradigms are required for storing KV cache. KV cache used to be very local to the GPU, to the high-bandwidth memory, HBM, on a particular node. But obviously that's not good enough if you're trying to store very long conversations. If you're trying to grow that context over time, you need a different method. So a lot of the development VAST has been doing with NVIDIA is in how we build and re-architect that part of the stack for these new systems going into deployment.
Rob Strechay
>> Yeah, which makes total sense with the VAST OS, the AIOS, and where you're going with bringing things together. But as you hit on, people have been really investing in KV cache and using things like NVMe inside the servers. Help people understand how this is different and what they can gain, because Jensen showed some pretty impressive reasoning gains with Vera Rubin.
John Mao
>> Yeah. Using local NVMe SSDs as part of a GPU server is one way to extend and grow that KV cache. But the other way to do it, especially given all the innovation NVIDIA is doing on faster networks, innovations like Spectrum-X networking and DPUs like the BlueField-4, which is a cornerstone of today's announcement, is to ask: could we build an even more scalable context cache outside the physical GPU server? Spilling out across the network, and doing that in a high-bandwidth, low-latency way, is instrumental. So yes, local NVMe is good, but imagine a world where we had an infinitely scalable pool of NVMe across a very fast fabric. That's part of the announcement today.
Rob Strechay
>> Yeah, and helping feed those GPUs, because as we know, AI only goes as fast as the data feeding it. You hit on the BlueField-4 and things of that nature, and moving that data very quickly has to be a piece of it. And you have your shared-everything architecture that really scales that way. Is that what you've been hearing from your customers?
John Mao
>> Yeah. We've been partnering with NVIDIA on BlueField for, gosh, many years now. We've been using BlueField in our designs and our systems since the first generation, the BlueField-1s, back in the day. Part of what was really cool about today's announcement is not only the direction this is going, but I think we're the first to actually validate an end-to-end BlueField solution for context memory, to extend that context memory across the wire, across the network. So BlueField-4 becomes an end-to-end solution for us: not only to house the NVMe SSDs on the other end of the network, but also to run a lot of the VAST AIOS software inside the GPU machine, while giving you crazy good bandwidth using RDMA for low-latency access straight back into GPU HBM.
Rob Strechay
>> Yeah. Obviously this is the Consumer Electronics Show, CES, and you're going to be on a panel on Wednesday. Talk a little to that, because you have so many customers in so many different fields that actually touch the consumer.
John Mao
>> I saw somewhere, I forgot who posted it, someone from NVIDIA: yes, this is a consumer electronics show, but the data center touches consumers indirectly when it comes to AI. We're seeing AI, as everyone knows, infiltrate every single industry, right? Sports and media is obviously a very big market when it comes to unstructured data around video. Leveraging AI to understand what's happening in video, and democratizing that access, is game-changing for a lot of organizations, including in media and entertainment. So yeah, we're super excited. We're working at VAST not only in entertainment but, you name it, right? Whether it's in robotics and physical AI, which is becoming a very, very ... I mean, we saw Jensen talk about it for the first 45 minutes today, right?
Rob Strechay
>> Right.
John Mao
>> Becoming a material part of the AI story in the market. But also your classic enterprises are starting to move AI into production more and more. So things like KV cache, again, coming full circle, become an instrumental part of that design moving forward.
Rob Strechay
>> Absolutely. To me, the biggest announcement out of Vera Rubin was the context storage: being able to bring it together, have it more centralized, and feed it out over super low-latency, high-bandwidth connections. I think that was great. Any last parting thought?
John Mao
>> I haven't been to CES in quite some time, so I'm just very excited to check it out. Jensen always does a bang-up job. One thing that was maybe skirted over, at least for me in the past, was the improvement in the physical packaging across generations of their supercomputing, from Grace Hopper and Grace Blackwell to Vera Rubin: the simplification going from two hours of manufacturing to five minutes. I'm super excited to see this progress, because it's going to be an accelerant for the industry to adopt some of these supercomputer technologies moving forward.
Rob Strechay
>> Yeah. Like you said, completely water-cooled, and they brought the water with them from California, apparently. So I was having a little bit of trouble moving around. But yeah, great having you on, John. Thanks for coming by.
John Mao
>> Appreciate it.
Rob Strechay
>> And thank you for watching this segment. We'll be back with more from CES 2026. Stay tuned.