Val Bercovici, WEKA & Daniel Kearney, Firmus
In this interview from the Nvidia GTC AI Conference and Expo, Val Bercovici, chief AI officer of WEKA, joins Daniel Kearney, chief technology officer of Firmus, to talk with theCUBE + NYSE Wired's Gemma Allen about how memory is emerging as the critical bottleneck — and competitive advantage — in the shift from chatbots to agentic AI. Bercovici explains how insatiable agent-driven token demand has transformed KV cache from a background concern into a keynote-level topic in just twelve months.

A joint proof of concept between WEKA and Firmus demonstrated that arbitraging storage for memory can yield 6.5 times more tokens from the same GPUs and energy budget — the equivalent of creating five and a half new data centers out of thin air. The conversation also explores how the rise of LPUs alongside GPUs is creating a heterogeneous inference architecture in which prefill, context memory and decode each occupy distinct layers of the stack.

Kearney details Firmus's "model to grid" philosophy: designing AI factories where every watt counts, from accelerated compute and thermal management to grid integration. He highlights the company's expansion into Australia through Project Southgate, which will deploy up to 2.7 gigawatts of AI capacity over the next two to three years, alongside sovereign cloud deployments in Singapore. From engineering out hardware obsolescence to enabling sovereign nations to maximize token output within finite energy budgets, both leaders outline why efficiency-first infrastructure will separate the winners from the losers in the agentic era.