When NVIDIA announced the BlueField-4-powered Inference Context Memory Storage Platform at CES in January, we immediately recognized its significance. As we wrote in our January blog post, the KV cache represents a paradigm shift — a new class of ephemeral AI data that demands purpose-built storage infrastructure optimized for speed and efficiency rather than traditional redundancy. That announcement validated what we had been building toward with our RDMA for S3-compatible storage capabilities.

[Image: Vera Rubin storage architecture]

Today at GTC, NVIDIA is expanding that vision significantly with NVIDIA STX — a modular, rack-scale reference architecture that accelerates not just context memory, but the entire AI storage stack. Cloudian is proud to be a launch partner for NVIDIA STX, and we’re committed to delivering integrated solutions built on this architecture.

What Is NVIDIA STX?

NVIDIA STX is a composable reference architecture purpose-built for the AI factory. Where the Context Memory Storage Platform addressed a single — albeit critical — layer of the inference stack, STX provides the architectural blueprint for the full data lifecycle of enterprise AI. Think of it as the universal data engine for AI-native storage, spanning model training, real-time analytics, and agentic inference.

The architecture is built on three modular systems, each powered by next-generation NVIDIA silicon:

Storage Frontend System — Built on NVIDIA Vera CPUs, this layer accelerates partner data and storage software, enabling partners like Cloudian to run high-performance storage services on purpose-built AI infrastructure rather than legacy controllers.

AI Data Platform System — Powered by NVIDIA Rubin GPUs, this system transforms passive data lakes into high-speed knowledge engines for RAG pipelines, multimodal indexing, and enterprise AI data processing.

Storage Backend System — Driven by NVIDIA BlueField-4 DPUs and Vera CPUs, this is where the Context Memory Storage Platform lives. It accelerates context storage, large-scale indexing, and KV cache management for agentic AI workloads — delivering up to 5x higher tokens-per-second and 5x greater power efficiency than traditional storage.

All three systems are interconnected through NVIDIA Spectrum-X Ethernet networking, providing the AI-optimized RDMA fabric that links storage and compute with predictable, low-latency, high-bandwidth connectivity. The entire design is air-cooled with optional liquid cooling, and can be configured rack by rack to match specific workload requirements.

From Context Memory to Full-Stack AI Storage

If you read our CES blog post on ephemeral AI storage, you’ll recall the core insight: as agentic AI systems move from single-prompt queries to hours-long reasoning sessions, KV caches grow exponentially. When this context exceeds local GPU memory, performance collapses. The Context Memory Storage Platform (or “CMX”) — now a core component within the STX architecture — solves this by creating an entirely new storage tier specifically engineered for KV cache data.
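The memory pressure behind this insight is easy to quantify. As a rough sketch, here is how KV cache size scales with context length; the model dimensions below are illustrative assumptions loosely modeled on a 70B-class transformer with grouped-query attention, not figures from NVIDIA or Cloudian:

```python
# Back-of-the-envelope KV cache sizing for a long-running agent session.
# All model dimensions are illustrative assumptions, not vendor specs.

def kv_cache_bytes(tokens: int,
                   num_layers: int = 80,      # assumed layer count
                   num_kv_heads: int = 8,     # assumed grouped-query KV heads
                   head_dim: int = 128,       # assumed per-head dimension
                   dtype_bytes: int = 2) -> int:  # fp16/bf16 storage
    """Bytes of KV cache: two tensors (K and V) per layer, per token."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    return tokens * per_token

print(f"{kv_cache_bytes(1) / 1024:.0f} KiB per token")            # 320 KiB
# An hours-long agent session accumulating 128k tokens of context:
print(f"{kv_cache_bytes(128_000) / 2**30:.1f} GiB at 128k tokens")  # ~39 GiB
```

Even with grouped-query attention shrinking the KV head count, a single long session can consume tens of gigabytes, and concurrent sessions multiply that, which is why spilling context to a dedicated fast tier rather than recomputing it becomes the economical option.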

This is not an extension of traditional storage. It is a fundamentally new layer in the data center — purpose-built infrastructure that sits between GPU memory and persistent network storage, optimized for the ephemeral, latency-sensitive nature of inference context. It demands its own hardware, its own software stack, and its own operational model.

STX takes this further by recognizing that context memory is just one part of a much larger challenge. Enterprises running AI at scale also need high-performance data access for model training, the ability to transform unstructured business data into AI-ready knowledge, and a storage backend that can sustain the throughput demands of massive GPU clusters. STX addresses all of these requirements in a single, modular reference design.

In NVIDIA’s own framing: STX transforms passive storage into a high-speed knowledge retrieval engine — the foundational data engine for the era of agentic AI.

Why This Matters for Cloudian Customers

Cloudian has been building toward this moment through deep investment in NVIDIA’s AI storage ecosystem:

RDMA for S3-Compatible Storage — Cloudian is a lead partner in NVIDIA’s effort to standardize RDMA for S3-compatible storage, enabling direct data transfers between object storage and GPU memory that bypass the host CPU entirely. This high-performance data transport layer delivers over 200 GB/s sustained throughput — 3x faster than non-RDMA flash — while reducing GPU server CPU utilization by 45%. It is foundational to how Cloudian integrates with the STX architecture.

AI Data Platform — Our HyperScale AI Data Platform, built on the NVIDIA AI Data Platform reference design, already delivers enterprise document RAG and multimodal AI capabilities — aligning directly with the AI Data Platform System within STX.

Context Memory Storage — The ephemeral storage tier within STX represents an entirely new class of infrastructure, distinct from persistent enterprise storage. Cloudian is building dedicated support for this new layer, enabling organizations to deploy high-speed KV cache infrastructure alongside — but architecturally separate from — their persistent data platforms.

With STX, these capabilities come together within a unified, NVIDIA-validated architecture. Cloudian customers deploying AI factories will be able to leverage STX-based configurations that span the full spectrum — from high-performance training data access through enterprise RAG to real-time agentic inference.
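To ground the RDMA figures above, here is a quick sanity check using only the numbers quoted in this post; the 1 TB checkpoint is a hypothetical workload chosen for illustration, not a benchmark result:

```python
# Transfer-time arithmetic from the quoted figures: 200 GB/s sustained
# over RDMA, stated as 3x faster than non-RDMA flash (~67 GB/s).
# The 1 TB payload is a hypothetical example workload, not a benchmark.

RDMA_GBPS = 200                      # GB/s, per the figures above
NON_RDMA_GBPS = RDMA_GBPS / 3        # implied non-RDMA baseline

payload_gb = 1000                    # hypothetical 1 TB model checkpoint

t_rdma = payload_gb / RDMA_GBPS
t_flash = payload_gb / NON_RDMA_GBPS
print(f"RDMA:     {t_rdma:.0f} s")   # 5 s
print(f"non-RDMA: {t_flash:.0f} s")  # 15 s
```

A ten-second difference per checkpoint load sounds small until it is multiplied across thousands of GPUs repeatedly pulling training data, model weights, and context, which is where the sustained-throughput and CPU-offload claims compound.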

Expanding HyperScale AIDP: Support for RTX PRO 4500 Blackwell

Alongside STX, NVIDIA is also introducing a new GPU designed for mainstream enterprise AI: the RTX PRO 4500 Blackwell Server Edition, which Cloudian will support as part of our HyperScale® AI Data Platform product family. The RTX PRO 4500 Blackwell is a power-efficient, single-slot GPU based on the NVIDIA Blackwell architecture, delivering breakthrough performance for AI inference, data processing, and visual computing workloads in enterprise data center and edge deployments.

With fifth-generation Tensor Cores, 32GB of GDDR7 memory, and a compact 165W form factor, the RTX PRO 4500 Blackwell is well-suited for enterprise AI workloads including agentic AI applications, RAG pipelines, vector search, and AI-enabled video — making it a natural complement to the HyperScale AI Data Platform’s on-premises, NVIDIA-validated reference design.
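As a rough illustration of what 32GB of GDDR7 accommodates for inference, the sketch below checks weight footprints against the card's memory; the model sizes and quantization choices are generic assumptions for illustration, not NVIDIA sizing guidance:

```python
# Rough fit check: which model footprints leave headroom on a 32 GB card.
# Parameter counts and precisions are generic illustrative assumptions.

VRAM_GB = 32  # RTX PRO 4500 Blackwell memory, per the spec above

def weights_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB (ignores activations and KV cache)."""
    return params_billions * bytes_per_param

for name, params, bpp in [("7B @ fp16", 7, 2.0),
                          ("13B @ int8", 13, 1.0),
                          ("34B @ 4-bit", 34, 0.5)]:
    gb = weights_gb(params, bpp)
    print(f"{name}: ~{gb:.0f} GB weights, ~{VRAM_GB - gb:.0f} GB headroom")
```

The headroom is what matters for the workloads named above: it is the space left for activations and the KV cache during RAG and agentic inference, which connects this GPU class back to the context-memory tier discussed earlier.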

What Comes Next

We’re actively working with NVIDIA to build and validate Cloudian solutions on the STX reference architecture. Our goal is to deliver integrated, production-ready configurations that allow enterprises to deploy AI-native storage infrastructure across all three STX system layers.

The storage industry is undergoing its most significant architectural shift in decades. The emergence of agentic AI has created entirely new infrastructure requirements — not incremental improvements to existing storage, but fundamentally new storage tiers and data engines that didn’t exist before. NVIDIA STX provides the blueprint. Cloudian is building on it.

To learn more about how Cloudian and NVIDIA are advancing AI storage infrastructure, contact us to speak with our team.
