Nvidia BlueField-4 STX adds a context memory layer to storage to close the agentic AI throughput gap

Last Updated: March 17, 2026


When an AI agent loses context mid-task because traditional storage can't keep pace with inference, it isn't a model problem; it's a storage problem. At GTC 2026, Nvidia introduced BlueField-4 STX, a modular reference architecture that inserts a dedicated context memory layer between GPUs and traditional storage, claiming 5x the token throughput, 4x the energy efficiency and 2x the data ingestion speed of conventional CPU-based storage.

The bottleneck STX targets is key-value cache data. KV cache is the stored record of what a model has already processed: the intermediate calculations an LLM saves so it doesn't have to recompute attention across the entire context on every inference step. It's what allows an agent to maintain coherent working memory across sessions, tool calls and reasoning steps. As context windows grow and agents take more steps, that cache grows with them. When it has to traverse a traditional storage path to get back to the GPU, inference slows and GPU utilization drops.
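The mechanics behind that paragraph can be sketched in a few lines of NumPy. This is an illustrative toy, not Nvidia's or any framework's implementation: the dimensions and random key/value tensors are arbitrary assumptions. The point it shows is that decoding appends one key/value pair per step instead of recomputing them for the whole context, so the cache grows linearly with context length and has to live somewhere fast.

```python
import numpy as np

def attention(q, K, V):
    # Scaled dot-product attention for a single query vector
    # against all cached keys and values.
    scores = q @ K.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

rng = np.random.default_rng(0)
d = 8  # toy head dimension

# KV cache: keys and values for tokens already processed.
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))

for step in range(100):  # each decode step adds one token
    k_new = rng.standard_normal((1, d))
    v_new = rng.standard_normal((1, d))
    # Append the new token's key/value rather than recomputing
    # K and V for the entire context from scratch.
    K_cache = np.vstack([K_cache, k_new])
    V_cache = np.vstack([V_cache, v_new])
    q = rng.standard_normal(d)
    out = attention(q, K_cache, V_cache)

# The cache grows linearly with the number of tokens processed.
print(K_cache.shape)
```

In a real deployment each layer and attention head keeps its own cache, so the total footprint multiplies quickly with context length, which is exactly the storage-traversal problem STX is aimed at.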

STX isn't a product Nvidia sells directly. It's a reference architecture the company is distributing to its storage partner ecosystem so vendors can build AI-native infrastructure around it.

STX puts a context memory layer between GPU and disk

The architecture is built around a new storage-optimized BlueField-4 processor that combines Nvidia's Vera CPU with the ConnectX-9 SuperNIC. It runs on Spectrum-X Ethernet networking and is programmable through Nvidia's DOCA software platform.

The first rack-scale implementation is the Nvidia CMX context memory storage platform. CMX extends GPU memory with a high-performance context layer designed specifically for storing and retrieving KV cache data generated by large language models during inference. Keeping that cache accessible without forcing a round trip through general-purpose storage is what CMX is designed to do.

"Conventional knowledge facilities present high-capacity, general-purpose storage, however typically lack the responsiveness required for interplay with AI brokers that must work throughout many steps, instruments and totally different periods," Ian Buck, Nvidia's vp of hyperscale and high-performance computing mentioned in a briefing with press and analysts.

In response to a question from VentureBeat, Buck confirmed that STX also ships with a software reference platform alongside the hardware architecture. Nvidia is expanding DOCA to include a new component referred to in the briefing as DOCA Memo.

"Our storage suppliers can leverage the programmability of the BlueField-4 processor to optimize storage for the agentic AI manufacturing unit," Buck mentioned. "Along with having a reference rack structure, we're additionally offering a reference software program platform for them to ship these improvements and optimizations for his or her prospects."

Storage partners building on STX get both a hardware reference design and a software reference platform: a programmable foundation for context-optimized storage.

Nvidia's partner list spans storage incumbents and AI-native cloud providers

Storage providers co-designing STX-based infrastructure include Cloudian, DDN, Dell Technologies, Everpure, Hitachi Vantara, HPE, IBM, MinIO, NetApp, Nutanix, VAST Data and WEKA. Manufacturing partners building STX-based systems include AIC, Supermicro and Quanta Cloud Technology.

On the cloud and AI side, CoreWeave, Crusoe, IREN, Lambda, Mistral AI, Nebius, Oracle Cloud Infrastructure and Vultr have all committed to STX for context memory storage.

That mix of enterprise storage incumbents and AI-native cloud providers is the signal worth watching. Nvidia isn't positioning STX as a specialty product for hyperscalers. It's positioning it as the reference standard for anyone building storage infrastructure that has to serve agentic AI workloads, which, within the next two to three years, is likely to include most enterprise AI deployments running multi-step inference at scale.

STX-based platforms will be available from partners in the second half of 2026.

IBM shows what the data layer problem looks like in production

IBM sits on both sides of the STX announcement. It's listed as a storage provider co-designing STX-based infrastructure, and Nvidia separately confirmed that it has chosen IBM Storage Scale System 6000, certified and validated on Nvidia DGX platforms, as the high-performance storage foundation for its own GPU-native analytics infrastructure.

IBM also announced a broader, expanded collaboration with Nvidia at GTC, including GPU-accelerated integration between IBM's watsonx.data Presto SQL engine and Nvidia's cuDF library. A production proof of concept with Nestlé put numbers on what that acceleration looks like: a data refresh cycle across the company's Order-to-Cash data mart, covering 186 countries and 44 tables, dropped from 15 minutes to three minutes. IBM reported 83% cost savings and a 30x price-performance improvement.
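The reported figures hang together under one simple assumption: if price-performance is speedup divided by relative cost, then the 15-to-3-minute runtime drop and the 83% cost savings imply roughly the claimed 30x. A quick check (the input numbers come from the article; the price-performance formula is my assumption, not IBM's stated methodology):

```python
# Figures reported for the Nestlé proof of concept.
baseline_minutes = 15
accelerated_minutes = 3
cost_savings = 0.83  # 83% cost savings

speedup = baseline_minutes / accelerated_minutes  # runtime improvement: 5x
relative_cost = 1 - cost_savings                  # new cost is 17% of baseline

# Assumed definition: performance gain per unit of cost.
price_performance = speedup / relative_cost       # about 29.4x, i.e. ~30x

print(f"{speedup:.1f}x faster, {price_performance:.1f}x price-performance")
```

Under that reading, the 30x headline is the 5x speedup compounded with the 83% cost reduction, not an independent third measurement.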

The Nestlé result is a structured analytics workload. It doesn't directly demonstrate agentic inference performance. But it makes IBM and Nvidia's shared argument concrete: the data layer is where enterprise AI performance is currently constrained, and GPU-accelerating it produces material results in production.

Why the storage layer is becoming a first-class infrastructure decision

STX is a signal that the storage layer is becoming a first-class concern in enterprise AI infrastructure planning, not an afterthought to GPU procurement.

General-purpose NAS and object storage weren't designed to serve KV cache data at inference latency requirements. STX-based systems from partners including Dell, HPE, NetApp and VAST Data are what Nvidia is putting forward as the practical alternative, with the DOCA software platform providing the programmability layer to tune storage behavior for specific agentic workloads.

The performance claims (5x token throughput, 4x energy efficiency, 2x data ingestion) are measured against traditional CPU-based storage architectures. Nvidia has not specified the exact baseline configuration for those comparisons. Before the numbers drive infrastructure decisions, the baseline is worth pinning down.

Platforms are expected from partners in the second half of 2026. Given that most major storage vendors are already co-designing on STX, enterprises evaluating storage refreshes for AI infrastructure in the next 12 months should expect STX-based options to be available from their current vendor relationships.

