Betsy Chernoff, WEKA, & Ace Stryker, Solidigm
In this interview from the Nvidia GTC AI Conference and Expo, Betsy Chernoff, principal AI and product marketing manager at WEKA, joins Ace Stryker, director of AI marketing and ecosystem at Solidigm, to talk with theCUBE + NYSE Wired's Gemma Allen about why exploding context memory demands are creating an entirely new tier of storage in AI infrastructure. Stryker explains how Nvidia's CMX platform reflects a fundamental shift: storage has moved beyond feeding GPUs and housing shared data, and now also hosts dedicated nodes for petabytes of KV cache. Chernoff highlights how WEKA's Augmented Memory Grid was built for persistent KV cache storage, complementing both STX and CMX across current and next-generation Nvidia platforms, including Vera Rubin and BlueField-4.

The conversation also explores the practical economics of keeping GPUs running at full utilization. Stryker points to MLPerf Storage benchmark data showing a direct one-to-one correlation between storage bandwidth and the number of GPUs a system can keep busy, and notes that real-world utilization often falls between 50% and 80%, which leaves significant untapped capacity in a supply-constrained environment. Chernoff shares results from a production-grade proof of concept with Firmus that delivered a 6x improvement in tokens per second using WEKA's Augmented Memory Grid, demonstrating how persistent KV cache storage translates directly into throughput gains.

The discussion also touches on the global NAND shortage and the evolving shape of buyer personas as AI clouds, model builders and enterprises converge around similar infrastructure challenges. From squeezing more cycles out of landed GPU capacity to building software-defined platforms that bridge today's hardware to tomorrow's AI factories, both guests provide a practical roadmap for navigating the memory-constrained AI era.