The Emerging AI Procurement Triangle: AMD, ByteDance, and CoreWeave

NVIDIA’s announcement that its next-generation Vera Rubin AI platform has entered full production signals a pivotal shift in the AI infrastructure race, which is now dominated by cost-per-token economics. NVIDIA claims tenfold throughput gains over Grace Blackwell at one-tenth the cost per token, and it has engineered Rubin for 'agentic AI' workloads in which systems autonomously reason and execute complex tasks. While this leap appears to cement NVIDIA’s dominance, it also exposes the fragility of its walled-garden ecosystem. As compute becomes commoditized, customers are actively seeking alternatives. Into this breach steps an unconventional procurement triangle of AMD, ByteDance, and CoreWeave, which is quietly reshaping the power dynamics of global AI infrastructure.

ByteDance is no stranger to diversifying away from NVIDIA. As early as 2023, the company deployed AMD MI300X-based training clusters outside mainland China to support its large language model development. Internal benchmarks reportedly showed MI300X achieving 90% of A100’s performance on specific NLP tasks at roughly 35% lower acquisition cost. Crucially, while AMD’s ROCm software stack still lags behind CUDA, its integration with PyTorch and other mainstream frameworks has reached production-grade viability. For a company with thousands of AI engineers, the migration overhead is substantial but strategically justified: it avoids vendor lock-in and gives ByteDance leverage in procurement negotiations.

CoreWeave plays an even more disruptive role. Founded by former Wall Street traders, the cloud provider has specialized in GPU-as-a-Service since 2020, deliberately avoiding head-on competition with AWS or Azure by targeting AI-native startups. Its model hinges on rapid deployment of cutting-edge GPUs with flexible pricing. But with H100 cards reportedly changing hands for as much as $40,000 apiece amid persistent supply constraints, CoreWeave turned to AMD.
In early 2025, it struck a strategic partnership to deploy over 100,000 MI300-series chips by 2026, which would make it the largest non-NVIDIA AI cloud in the world. This move not only mitigates supply risk but also unlocks pricing innovation, such as token-based billing that directly counters Rubin’s cost advantage.

System vendors like Dell, HPE, Lenovo, and Supermicro are also shifting stance. After two years of near-exclusive reliance on NVIDIA, they now face pressure to diversify. AMD’s MI300X shipments in Q4 2025 rose more than 300% year over year, compelling OEMs to act. Supermicro has rolled out a full MI300X server lineup; Dell and HPE have added AMD options to their PowerEdge and Cray EX platforms, respectively. Oracle, though publicly silent, reportedly has internal data showing MI300X outperforming L40S in database acceleration, a sign that hardware diversification is gaining traction beyond inference workloads.

Intel remains conspicuously sidelined. Despite Gaudi 3’s respectable benchmark scores, its thin software ecosystem prevents mainstream adoption in training. More critically, process node delays mean Intel’s next-generation AI chip won’t ramp before 2026, missing a crucial window. AMD, by contrast, relies on TSMC’s 5nm and 6nm nodes to ensure stable MI300 supply, the physical foundation for competing with NVIDIA.

My assessment is that the AI compute market is transitioning from winner-takes-all to multipolar coexistence. NVIDIA’s Rubin may be formidable, but its high cost and closed architecture are catalyzing counter-forces. ByteDance embodies end-user demand for supply chain resilience; CoreWeave represents cloud providers’ need for differentiation; AMD delivers technical feasibility. Together, they form a credible alternative path that may not dethrone CUDA but will force NVIDIA to open its interfaces and reduce licensing fees, ultimately steering the industry toward healthier competition.

The critical question ahead: if AI training costs drop tenfold, lowering innovation barriers, does that erode the technological moats of Big Tech? Compute democratization could spur creativity, or it could trigger inefficient fragmentation. The real battleground may soon shift from raw chip performance to data quality and algorithmic efficiency.
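
For readers who want to check the cost arithmetic behind these claims, here is a minimal Python sketch. It is an illustration under stated assumptions, not anyone's actual pricing model. The function names are made up for this example, the 90% and 35% figures are the ByteDance numbers reported above, and the hourly cost and throughput values are hypothetical placeholders rather than measured data.

```python
def perf_per_dollar_ratio(relative_perf: float, relative_cost: float) -> float:
    """Performance per dollar of an alternative, relative to a baseline of 1.0."""
    if relative_cost <= 0:
        raise ValueError("relative_cost must be positive")
    return relative_perf / relative_cost


def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Dollars spent to generate one million tokens at a steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Figures quoted in the article: 90% of baseline performance at 35% lower cost.
ratio = perf_per_dollar_ratio(relative_perf=0.90, relative_cost=1.0 - 0.35)
print(f"perf per dollar vs. baseline: {ratio:.2f}x")

# Hypothetical placeholder numbers (assumptions, not vendor data):
# the same hourly system cost, with throughput raised tenfold.
baseline = cost_per_million_tokens(hourly_cost_usd=4.0, tokens_per_second=1_000.0)
faster = cost_per_million_tokens(hourly_cost_usd=4.0, tokens_per_second=10_000.0)
print(f"baseline: ${baseline:.3f}/M tokens, 10x throughput: ${faster:.3f}/M tokens")
```

On the quoted figures, the cheaper part delivers roughly 1.38 times the performance per dollar despite being slower in absolute terms, which is the trade the procurement argument rests on. The second calculation shows that tenfold throughput yields one-tenth the cost per token only if the hourly cost of the system stays about the same; any price premium on the newer hardware eats into that gain proportionally.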