Milvus is an open-source vector database engineered for managing, indexing, and searching massive collections of vector embeddings efficiently. It’s developed to handle complex similarity search workloads that arise with machine learning and artificial intelligence applications.
Unlike traditional relational databases that are optimized for structured data and tabular queries, Milvus is built to work with high-dimensional data, which is common in images, text, and audio used in AI tasks. Its architecture provides scalable, low-latency retrieval over billions of vectors, making it a go-to solution when you need real-time or near real-time search performance.
The database natively supports various vector indexing algorithms such as IVF, HNSW, and ANNOY, catering to diverse accuracy and latency demands. It also includes integration hooks for ML frameworks and supports multimodal data. Milvus is designed for cloud-native deployment, enabling elasticity, fault tolerance, and simple scaling across environments.
Developers can easily interact with Milvus through its RESTful API, SDKs, and integrations with upstream tools, making it suitable for building production-grade similarity search and retrieval systems in enterprise or research settings.
This is part of a series of articles about AI infrastructure.
Retrieval Augmented Generation (RAG) is a natural use case for Milvus, as it involves integrating an external knowledge base with a large language model (LLM). In this setup, Milvus stores contextual vector embeddings of documents, articles, or other data you want the LLM to reference.
At query time, embeddings of the user input are generated and matched against the stored vectors to retrieve relevant contexts, which are fed to the LLM for more informed responses. This approach dramatically improves the accuracy and relevance of generated answers, especially when the base model alone lacks up-to-date or domain-specific knowledge.
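The retrieval step described above can be sketched in a few lines of plain Python. The `knowledge_base` dictionary and hand-written `cosine` function below are illustrative stand-ins for a real embedding model and a Milvus collection, not part of the Milvus API:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy stand-ins: in a real RAG pipeline these would be an embedding
# model and a Milvus collection populated with document vectors.
knowledge_base = {
    "Milvus is an open-source vector database.": [0.9, 0.1, 0.0],
    "Paris is the capital of France.": [0.0, 0.2, 0.9],
}

def retrieve(query_vector, top_k=1):
    # Rank stored documents by similarity to the query vector.
    ranked = sorted(knowledge_base.items(),
                    key=lambda kv: cosine(query_vector, kv[1]),
                    reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

def build_prompt(question, query_vector):
    # Prepend the retrieved context to the user question before
    # sending the combined prompt to an LLM.
    context = "\n".join(retrieve(query_vector))
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What is Milvus?", [1.0, 0.0, 0.0]))
```

In production, the brute-force ranking loop is replaced by a Milvus similarity search, but the retrieve-then-prompt flow is the same.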
Milvus is well-suited for powering recommendation systems that rely on high-dimensional vector similarity search. In such systems, user profiles, items, and contextual signals are embedded into vectors capturing preferences and item attributes.
Milvus efficiently matches user vectors against item vectors using nearest neighbor search, surfacing recommendations that align closely with the user’s past behavior or stated preferences. The high-throughput and low-latency search performance make it suitable for real-time recommendation serving in eCommerce, streaming services, and online platforms.
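As a minimal illustration of this matching step, the sketch below scores hypothetical item vectors against a user vector with an inner-product similarity. The item names and vectors are placeholders; a real system would delegate the ranking to Milvus's approximate nearest-neighbor search rather than the brute-force loop shown here:

```python
# Toy item catalog: each item is represented by an embedding vector.
# These vectors and names are illustrative placeholders.
items = {
    "action_movie": [0.9, 0.1],
    "romance_movie": [0.1, 0.9],
    "thriller_movie": [0.8, 0.3],
}

def recommend(user_vector, top_k=2):
    # Score every item by inner product with the user's preference
    # vector, then return the highest-scoring item names. Milvus
    # performs the same ranking with ANN indexes at scale.
    scores = {
        name: sum(u * v for u, v in zip(user_vector, vec))
        for name, vec in items.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

print(recommend([1.0, 0.2]))  # a user whose profile leans toward action content
```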
Milvus is also used in anomaly and fraud detection by enabling fast, scalable search for unusual patterns in large-scale vectorized data. Here, events (e.g., transactions, log entries, sensor readings) are converted into vectors via feature engineering or through learned embeddings.
The core task is to identify vectors that are distant from clusters of normal behavior, indicating potential anomalies or fraudulent activity. Milvus’s indexing and query capabilities allow organizations to perform these searches quickly, even as data scale grows.
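The "distant from normal behavior" idea can be sketched as follows. The feature vectors and the single-centroid comparison are simplifying assumptions for illustration; in practice, Milvus's nearest-neighbor search finds the closest known-normal vectors instead of comparing against one centroid:

```python
import math

def centroid(vectors):
    # Mean of a set of equal-length vectors.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def distance(a, b):
    # Euclidean distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical feature vectors for "normal" transactions.
normal = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9]]
center = centroid(normal)

def is_anomalous(vector, threshold=1.0):
    # Flag vectors far from the centroid of normal behavior.
    return distance(vector, center) > threshold

print(is_anomalous([1.0, 1.0]))   # close to the normal cluster
print(is_anomalous([5.0, 5.0]))   # far from the normal cluster
```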

The access layer consists of a group of stateless proxy nodes that serve as the entry point for all client requests. These proxies are responsible for validating and pre-checking incoming requests, forwarding them to the appropriate worker nodes, and aggregating and returning the results to clients.
Since proxies are stateless, they can be scaled horizontally without coordination overhead and can quickly recover from failures. They use load balancing mechanisms such as Nginx, Kubernetes Ingress, NodePort, and LVS to provide a unified service endpoint.
Milvus follows a massively parallel processing (MPP) model. Each proxy coordinates distributed execution by dispatching tasks to various worker nodes, collecting their responses, and then merging and refining the output before responding to the client.
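The gather-and-merge step of this scatter-gather pattern can be sketched as a merge of sorted partial results. The worker shards and their result lists below are illustrative, not Milvus internals:

```python
import heapq

# Each worker returns its local top-k hits as (distance, id) pairs,
# sorted ascending by distance (smaller = more similar).
worker_results = [
    [(0.1, "a"), (0.4, "c")],   # partial result from worker 1
    [(0.2, "b"), (0.5, "d")],   # partial result from worker 2
    [(0.3, "e"), (0.9, "f")],   # partial result from worker 3
]

def merge_top_k(partials, k):
    # The proxy merges the sorted partial results and keeps the
    # global top-k, mirroring the gather-and-refine step of an
    # MPP-style distributed query.
    return list(heapq.merge(*partials))[:k]

print(merge_top_k(worker_results, 3))
```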
The Coordinator is the control plane and the central decision-maker of the Milvus cluster. Only one coordinator is active at any given time, ensuring a single source of truth for managing system operations. Its responsibilities include handling DDL and DCL requests (such as creating collections and managing users), allocating timestamps and IDs, managing data placement, and balancing load across worker nodes.
This layer ensures consistency, system-wide coordination, and robust task scheduling across the Milvus ecosystem.
Worker nodes are the execution units that handle real-time ingestion, querying, and offline processing. They are stateless, which allows them to scale easily and restart without risk of data loss, thanks to Milvus's separation of storage and computation. There are three distinct types of worker nodes: query nodes, which serve search and query requests; data nodes, which process and persist incoming data; and index nodes, which build vector indexes.
Each worker node operates independently, following instructions from the Coordinator, which enables scalable parallel processing and fault isolation.
The storage layer provides persistent, distributed storage for all Milvus data, including metadata, logs, vectors, indexes, and intermediate query results. It consists of three main components: meta storage (typically etcd) for metadata and service discovery, a log broker (such as Pulsar or Kafka) for durable streaming data, and object storage (such as S3-compatible stores or MinIO) for log snapshots, vector data, and index files.
Milvus and Pinecone differ significantly in architecture, deployment options, and feature depth. Milvus offers flexible deployment modes including local, standalone, cluster-based, and managed cloud (Zilliz Cloud), whereas Pinecone is restricted to a SaaS-only model. This gives Milvus more versatility for on-prem, hybrid, or BYOC (Bring Your Own Cloud) use cases.
Milvus supports a broader set of SDKs (Python, Java, Node.js, Go, C#, Rust) along with a RESTful API, while Pinecone is limited to Python and JavaScript/TypeScript. Milvus also supports GPU acceleration and more indexing options such as IVF, HNSW, SCANN, and GPU-accelerated indexes, offering better performance tuning and faster retrieval at scale. Pinecone lacks GPU support.
In terms of scalability, Milvus supports both scale-out and scale-up architectures with distributed compute and storage layers, making it suitable for workloads exceeding 10 billion vectors. Pinecone uses a pod-based architecture that only allows vertical scaling.
Data modeling in Milvus is more advanced, with support for multiple vector fields, structured scalar fields (including JSON), and strict or flexible schema modes. Pinecone supports only flat metadata with limited types and a 40KB metadata size cap. Milvus also includes tools for observability, backup, and integration such as Attu, Birdwatcher, CLI, and CDC connectors.
Milvus is focused on high-performance, large-scale vector search through a purpose-built distributed engine. It separates compute and storage, supports stateless nodes, and scales horizontally across billions of vectors with low latency. Weaviate, while also scalable, takes a modular approach with pluggable components and emphasizes semantic search via tight integration with transformers and vectorizers.
Milvus supports a wide variety of index types, including IVF, HNSW, DiskANN, and GPU-accelerated indexes, giving developers more control over latency-accuracy tradeoffs. Weaviate uses HNSW and Flat indexes but offers fewer performance-tuning knobs in comparison.
In terms of deployment, Milvus can run in standalone, clustered, or cloud-native environments (via Zilliz Cloud or BYOC), whereas Weaviate supports self-hosting and cloud deployments through its own managed service. Both systems provide RESTful APIs and SDKs, but Milvus supports more languages and tools, including robust connectors for Spark and Kafka.
On the data modeling side, Milvus offers structured schemas with support for multiple vector fields, scalar fields, and complex JSON types. Weaviate leans towards a semantic schema model, integrating vectorization at ingestion, which simplifies setup but can reduce flexibility in some advanced workflows.
Learn more in our detailed guide to vector databases (coming soon)
Milvus Lite makes it easy to get started with vector search using just Python, without the need for external services or infrastructure. This quick tutorial walks through setting up a local Milvus instance, inserting data, and running semantic search queries. Instructions are adapted from the Milvus documentation.
First, ensure you have Python 3.8 or higher. Install the pymilvus package, which includes both the Python client and Milvus Lite:
pip install -U pymilvus
To generate vector embeddings from text, you’ll also need the optional model package:
pip install "pymilvus[model]"
Milvus Lite stores data in a local file. Instantiate a client and create a new collection:
from pymilvus import MilvusClient
client = MilvusClient("milvus_demo.db")
if client.has_collection("demo_collection"):
    client.drop_collection("demo_collection")
client.create_collection(collection_name="demo_collection", dimension=768)
This creates a collection with a vector field of 768 dimensions. The primary key and metric type (COSINE) are set by default.
Use the built-in model utility to generate vector representations for your text documents:
from pymilvus import model
embedding_fn = model.DefaultEmbeddingFunction()
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]
vectors = embedding_fn.encode_documents(docs)
data = [
    {"id": i, "vector": vectors[i], "text": docs[i], "subject": "history"}
    for i in range(len(vectors))
]
client.insert(collection_name="demo_collection", data=data)
To search semantically, embed your query text and pass it to the search() method:
query_vectors = embedding_fn.encode_queries(["Who is Alan Turing?"])
res = client.search(
    collection_name="demo_collection",
    data=query_vectors,
    limit=2,
    output_fields=["text", "subject"],
)
print(res)
Results include matching documents, distances, and metadata fields.
You can insert additional documents with different metadata and perform filtered searches:
# Insert more documents under the "biology" subject
bio_docs = [
    "Machine learning has been used for drug design.",
    "Computational synthesis with AI algorithms predicts molecular properties.",
    "DDR1 is involved in cancers and fibrosis.",
]
bio_vectors = embedding_fn.encode_documents(bio_docs)
bio_data = [
    {"id": 3 + i, "vector": bio_vectors[i], "text": bio_docs[i], "subject": "biology"}
    for i in range(len(bio_vectors))
]
client.insert(collection_name="demo_collection", data=bio_data)
# Filtered search: only return biology-related results
res = client.search(
    collection_name="demo_collection",
    data=embedding_fn.encode_queries(["tell me AI related information"]),
    filter="subject == 'biology'",
    limit=2,
    output_fields=["text", "subject"],
)
print(res)
You can also query by metadata or ID, and delete entities:
# Query by filter
res = client.query(
    collection_name="demo_collection",
    filter="subject == 'history'",
    output_fields=["text", "subject"],
)
# Delete by ID
client.delete(collection_name="demo_collection", ids=[0, 2])
# Delete by filter
client.delete(collection_name="demo_collection", filter="subject == 'biology'")
All data is stored in a local file (e.g., milvus_demo.db). Reconnect to the same database later by reusing the file path:
client = MilvusClient("milvus_demo.db")
To remove all data in a collection:
client.drop_collection("demo_collection")
This minimal setup is useful for prototyping and testing AI applications that require fast, local vector search. For production, Milvus supports scalable deployments on Docker, Kubernetes, and cloud environments.
Milvus leverages S3-compatible object storage as a persistent storage backend for vector data, index files, and metadata, making Cloudian’s object storage platforms an ideal foundation for Milvus deployments at any scale. In Milvus’s architecture, object storage serves as the durable layer for all vector embeddings, indexes, and binlogs, while the stateless query and data nodes access this data through the S3 API. Cloudian’s full S3 compatibility ensures seamless integration with Milvus’s storage layer, allowing organizations to deploy Milvus with enterprise-grade storage infrastructure that remains entirely on-premises or in private cloud environments.
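As an illustration, pointing Milvus at an S3-compatible endpoint is done in the `minio` section of the `milvus.yaml` configuration file. The endpoint, credentials, and bucket name below are placeholders to be replaced with your Cloudian HyperStore values:

```yaml
# milvus.yaml -- object storage backend (values are placeholders)
minio:
  address: cloudian.example.internal   # S3-compatible endpoint
  port: 443
  accessKeyID: <access-key>
  secretAccessKey: <secret-key>
  useSSL: true
  bucketName: milvus-data
```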
Performance and Scalability Benefits
Cloudian HyperStore provides the high-throughput, low-latency object storage required for Milvus’s read-intensive workloads, particularly during vector similarity searches and index loading operations. The platform’s ability to scale horizontally across multiple nodes ensures that storage performance grows alongside Milvus clusters, supporting deployments from millions to billions of vectors without storage bottlenecks. For organizations running demanding AI workloads, Cloudian’s HyperScale AIDP with S3 RDMA technology delivers exceptional performance for Milvus deployments integrated with GPU-accelerated inference pipelines, minimizing latency between vector retrieval and downstream AI processing.
Operational Advantages
Using Cloudian as Milvus’s object storage backend provides several operational benefits critical for production AI systems. Cloudian’s multi-site replication and erasure coding ensure vector data durability and availability across geographic locations, protecting against data loss while enabling disaster recovery strategies. The platform’s support for versioning and lifecycle management allows organizations to implement sophisticated data retention policies for vector embeddings and indexes, archiving historical versions or managing storage costs for large-scale deployments. Additionally, Cloudian’s enterprise security features—including encryption at rest, access controls, and audit logging—ensure that sensitive embedding data and proprietary knowledge assets stored in Milvus remain protected and compliant with regulatory requirements.
Cost and Sovereignty
For organizations managing large vector databases, Cloudian’s economics provide significant advantages over cloud-based object storage. The platform’s unlimited scalability without egress fees or API charges makes it cost-effective for Milvus deployments with frequent read operations and large-scale vector ingestion workflows. Perhaps most importantly, Cloudian enables complete data sovereignty for Milvus deployments, allowing organizations to maintain full control over their vector embeddings and semantic search infrastructure—critical for enterprises in regulated industries or those building proprietary AI systems where data cannot be stored in public cloud environments.