Waleed Atallah, Makora
Waleed Atallah, co-founder and chief executive officer of Makora, joins theCUBE Research hosts Gemma Allen and John Furrier at NYSE Wired to discuss artificial intelligence (AI) factories. Atallah brings deep expertise in GPU and TPU kernel optimization, performance engineering and serving open-source models. The conversation examines GPU supply constraints, the role of kernels versus the CUDA moat, hardware-agnostic strategies and Makora's approach to delivering fast, cost-efficient inference across diverse accelerators. Atallah emphasizes that small kernel and performance gains scale to substantial cost savings. The discussion notes that a 2–3% utilization improvement can free the equivalent capacity of thousands of GPUs in large clusters. It also highlights Makora's inference platform, which serves open-source models on both GPUs and TPUs and delivers faster, lower-cost tokens. The participants predict consolidation or strategic partnerships as GPU supply constraints push providers toward acquisitions or alliances. The discussion also addresses data center compute, AI infrastructure, CUDA performance considerations and tokenomics for model serving. This episode offers actionable insights for data center operators, performance engineers and AI infrastructure teams seeking to maximize inference throughput and cost efficiency across accelerators.