Andy Pernsteiner, VAST Data
Andy Pernsteiner of VAST Data speaks with theCUBE Research hosts at NVIDIA GTC '26 in the VAST Data VIP Lounge about the role of storage in artificial intelligence infrastructure. Pernsteiner outlines VAST Data's integration with GPUDirect Storage, its work on Dynamo and KV cache tuning for inference, and its open source foundation stacks and deployment playbooks, and explains how these capabilities support training, multimodal pipelines and agentic workflows at scale. The speakers emphasize that storage is central to maximizing GPU utilization and minimizing idle time. Pernsteiner says that offloading large language model attention state to a KV cache yields a tenfold increase in inference throughput. The speakers also describe how VAST Data's global namespace, data-engine orchestration and integrated policy model enable organizations to scale pilots into production while meeting security, cost and performance targets. The conversation closes on practical considerations for adoption, including deployment playbooks, open source integration and optimizations for inference and training workloads that improve scalability and system efficiency.