In this interview from CES 2026, John Mao, vice president of global strategic alliances at VAST Data, joins theCUBE’s Rob Strechay to unpack VAST's pivotal role in NVIDIA’s Vera Rubin system announcement. The discussion centers on the reinvention of the AI stack, specifically the evolution of KV cache storage to support larger models and longer reasoning capabilities. Mao explains how VAST is moving beyond the limitations of local high-bandwidth memory by utilizing NVIDIA’s BlueField-4 DPUs and Spectrum-X networking to create an infinitely scalable pool of NVMe storage. This architecture enables context memory to extend across the network with high bandwidth and low latency via RDMA, fundamentally changing how data feeds the GPU.
The conversation also explores the broader implications of these infrastructure advancements for the "AI Everywhere" era, bridging the gap between data center innovation and consumer applications. Mao highlights how this shared-everything architecture impacts industries ranging from sports and media entertainment to robotics and physical AI, allowing for the democratization of unstructured data analysis. Additionally, they touch upon the manufacturing and packaging simplifications of the new supercomputing generation, underscoring how these developments are accelerating enterprise adoption of AI in production environments.
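The idea of extending KV cache beyond local HBM into a network-attached pool can be illustrated with a minimal sketch. This is a conceptual toy only: the class and method names are this article's illustrative assumptions, not VAST or NVIDIA APIs, and the "remote" tier here is a plain dictionary standing in for a shared NVMe pool reached over RDMA.

```python
# Conceptual sketch of tiered KV-cache placement: a small "local HBM" tier
# spills least-recently-used entries to a larger "network-attached NVMe" tier
# instead of discarding them, so long contexts survive beyond local memory.
# All names here are illustrative assumptions, not VAST or NVIDIA APIs.
from collections import OrderedDict
from typing import Optional


class TieredKVCache:
    def __init__(self, local_capacity: int):
        self.local_capacity = local_capacity  # stand-in for limited HBM
        self.local = OrderedDict()            # hot tier, kept in LRU order
        self.remote = {}                      # stand-in for shared NVMe pool

    def put(self, token_id: int, kv_block: bytes) -> None:
        self.local[token_id] = kv_block
        self.local.move_to_end(token_id)      # mark as most recently used
        # Evict LRU blocks "over the network" rather than dropping them.
        while len(self.local) > self.local_capacity:
            old_id, old_block = self.local.popitem(last=False)
            self.remote[old_id] = old_block

    def get(self, token_id: int) -> Optional[bytes]:
        if token_id in self.local:
            self.local.move_to_end(token_id)
            return self.local[token_id]
        if token_id in self.remote:
            # Simulated "RDMA fetch": pull the block back into the hot tier.
            self.put(token_id, self.remote.pop(token_id))
            return self.local[token_id]
        return None


cache = TieredKVCache(local_capacity=2)
for i in range(4):
    cache.put(i, f"kv{i}".encode())
# Only the two newest blocks remain "local"; older ones spilled to the pool.
```

The real system described in the interview replaces the `remote` dictionary with NVMe SSDs behind BlueField-4 DPUs on a Spectrum-X fabric, with RDMA moving blocks back into GPU HBM; this sketch only shows the placement logic.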
John Mao, VAST Data | CES 2026
>> Hello and welcome back to CES 2026. We're here live from Las Vegas, really diving into some of the new and innovative things announced live at the NVIDIA keynote at the Fontainebleau. And I'm joined by one of my friends, John Mao, VP of Global Strategic Alliances for VAST Data. You were in there, your logo was up on stage, and there was some really interesting stuff when they got into the entire Vera Rubin, I would say, system. Because I wouldn't call it a chip; it's many chips and many systems. Why don't you help us unpack how you're playing with the context storage portion of Vera Rubin?
John Mao
>> Sure. Yeah. I think Jensen did a great job explaining how they had to reinvent the entire system. It's not just a new GPU, but six different chips, as you mentioned. And a lot of that spills into the rest of the stack. The rest of the stack, in this context, no pun intended, is reinventing how you do things like KV cache: how do we evolve when models get bigger, when longer reasoning starts to happen, when there are more turns happening in inferencing? That means different paradigms are required for storing KV cache. KV cache used to be very local to the GPU, to the high-bandwidth memory, HBM, on a particular node. But obviously that's not good enough if you're trying to store very long conversations. If you're trying to grow that context over time, you need a different method. So a lot of the development VAST has been doing with NVIDIA is in how we build and re-architect that part of the stack for these new systems going into deployment.
Rob Strechay
>> Yeah, which makes total sense with the VAST OS, the AIOS, and where you're going with bringing things together. But as you hit on, people have been really investing in KV cache and using things like NVMe inside the servers. Help people understand how this is different and what they can gain, because Jensen showed some pretty impressive reasoning gains with Vera Rubin.
John Mao
>> Yeah. Using local NVMe SSDs as part of a GPU server is one way to extend and grow that KV cache. But the other way to do it, especially given all the innovation NVIDIA is doing on faster networks, innovations like Spectrum-X networking and DPUs like the BlueField-4, which is a cornerstone of today's announcement, is to ask: could we build an even more scalable context cache outside the physical GPU server? Spilling out across the network, and doing that in a high-bandwidth, low-latency way, is instrumental. So yes, local NVMe is good, but imagine a world where we had an infinitely scalable pool of NVMe across a very fast fabric. That's part of the announcement today.
Rob Strechay
>> Yeah, and helping feed those GPUs, because as we know, AI only goes as fast as the data feeding it. You hit on the BlueField-4 and things of that nature, and moving that data very quickly has to be a piece of it. And you have your shared-everything architecture that really scales that way. Is that what you've been hearing from your customers?
John Mao
>> Yeah. We've been partnering with NVIDIA on BlueField for, gosh, many years now. We've been using BlueField in our designs and our systems since the first generation, the BlueField-1s, back in the day. Part of what was really cool about today's announcement is not only the direction this is going, but I think we're the first to actually validate an end-to-end BlueField solution for context memory, to extend that context memory across the wire, across the network. So BlueField-4 becomes an end-to-end solution for us: not only to house the NVMe SSDs on the other end of the network, but also to run a lot of the VAST AIOS software inside the GPU machine, while giving you crazy good bandwidth using RDMA for low-latency access straight back into GPU HBM.
Rob Strechay
>> Yeah. Obviously this is the Consumer Electronics Show, CES, and you're going to be on a panel on Wednesday. Talk a little to that, because you have so many customers in so many different fields that actually touch the consumer.
John Mao
>> I saw somewhere, I forgot who posted it, someone from NVIDIA: yes, this is a consumer electronics show, but the data center touches consumers indirectly when it comes to AI. We're seeing AI, as everyone knows, infiltrate every single industry, right? Sports and media is obviously a very big market when it comes to unstructured data around video. Leveraging AI to understand what's happening in video, and democratizing that access, is game-changing for a lot of organizations, including in media and entertainment. So yeah, we're super excited. We're working at VAST not only in entertainment but, you name it, right? Whether it's in robotics and physical AI, which is becoming a very, very ... I mean, we saw Jensen talk about it for the first 45 minutes today, right?
Rob Strechay
>> Right.
John Mao
>> Becoming a material part of the AI story in the market. But also your classic enterprises are starting to move AI into production more and more. So things like KV cache, again, coming full circle, become an instrumental part of that design moving forward.
Rob Strechay
>> Absolutely. To me, the biggest announcement out of Vera Rubin was the context storage: being able to bring it together, have it more centralized, and feed it out over super low-latency, high-bandwidth connections. I think that was great. Any last parting thought?
John Mao
>> I haven't been to CES in quite some time, so I'm just very excited to check it out. Jensen always does a bang-up job. One thing that was maybe skirted over, at least for me in the past, was the improvement in the physical packaging across generations of their supercomputing, from Grace Hopper and Grace Blackwell to Vera Rubin: the simplification going from two hours of manufacturing to five minutes. I'm super excited to see this progress, because it's going to be an accelerant for the industry to adopt some of these supercomputer technologies moving forward.
Rob Strechay
>> Yeah. Like you said, completely water-cooled, and they brought the water with them from California, apparently. So I was having a little bit of trouble moving around. But yeah, great having you on, John. Thanks for coming by.
John Mao
>> Appreciate it.
Rob Strechay
>> And thank you for watching this segment. We'll be back with more from CES 2026. Stay tuned.