Milvus is an open-source vector database engineered for managing, indexing, and searching massive collections of vector embeddings efficiently. It’s developed to handle complex similarity search workloads that arise with machine learning and artificial intelligence applications.
Unlike traditional relational databases that are optimized for structured data and tabular queries, Milvus is built to work with high-dimensional data, which is common in images, text, and audio used in AI tasks. Its architecture provides scalable, low-latency retrieval over billions of vectors, making it a go-to solution when you need real-time or near real-time search performance.
The database natively supports various vector indexing algorithms such as IVF, HNSW, and ANNOY, catering to diverse accuracy and latency demands. It also includes integration hooks for ML frameworks and supports multimodal data. Milvus is designed for cloud-native deployment, enabling elasticity, fault tolerance, and simple scaling across environments.
Developers can easily interact with Milvus through its RESTful API, SDKs, and integrations with upstream tools, making it suitable for building production-grade similarity search and retrieval systems in enterprise or research settings.
This is part of a series of articles about AI infrastructure.
Retrieval Augmented Generation (RAG) is a natural use case for Milvus, as it involves integrating an external knowledge base with a large language model (LLM). In this setup, Milvus stores contextual vector embeddings of documents, articles, or other data you want the LLM to reference.
At query time, embeddings of the user input are generated and matched against the stored vectors to retrieve relevant contexts, which are fed to the LLM for more informed responses. This approach dramatically improves the accuracy and relevance of generated answers, especially when the base model alone lacks up-to-date or domain-specific knowledge.
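The retrieval step described above can be sketched in a few lines of plain Python. The `knowledge_base` dictionary and hand-written `cosine` function below are illustrative stand-ins for a real embedding model and a Milvus collection, not part of the Milvus API:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy stand-ins: in a real RAG pipeline these would be an embedding
# model and a Milvus collection populated with document vectors.
knowledge_base = {
    "Milvus is an open-source vector database.": [0.9, 0.1, 0.0],
    "Paris is the capital of France.": [0.0, 0.2, 0.9],
}

def retrieve(query_vector, top_k=1):
    # Rank stored documents by similarity to the query vector.
    ranked = sorted(knowledge_base.items(),
                    key=lambda kv: cosine(query_vector, kv[1]),
                    reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

def build_prompt(question, query_vector):
    # Prepend the retrieved context to the user question before
    # sending the combined prompt to an LLM.
    context = "\n".join(retrieve(query_vector))
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What is Milvus?", [1.0, 0.0, 0.0]))
```

In production, the brute-force ranking loop is replaced by a Milvus similarity search, but the retrieve-then-prompt flow is the same.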
Milvus is well-suited for powering recommendation systems that rely on high-dimensional vector similarity search. In such systems, user profiles, items, and contextual signals are embedded into vectors capturing preferences and item attributes.
Milvus efficiently matches user vectors against item vectors using nearest neighbor search, surfacing recommendations that align closely with the user’s past behavior or stated preferences. The high-throughput and low-latency search performance make it suitable for real-time recommendation serving in eCommerce, streaming services, and online platforms.
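As a minimal illustration of this matching step, the sketch below scores hypothetical item vectors against a user vector with an inner-product similarity. The item names and vectors are placeholders; a real system would delegate the ranking to Milvus's approximate nearest-neighbor search rather than the brute-force loop shown here:

```python
# Toy item catalog: each item is represented by an embedding vector.
# These vectors and names are illustrative placeholders.
items = {
    "action_movie": [0.9, 0.1],
    "romance_movie": [0.1, 0.9],
    "thriller_movie": [0.8, 0.3],
}

def recommend(user_vector, top_k=2):
    # Score every item by inner product with the user's preference
    # vector, then return the highest-scoring item names. Milvus
    # performs the same ranking with ANN indexes at scale.
    scores = {
        name: sum(u * v for u, v in zip(user_vector, vec))
        for name, vec in items.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

print(recommend([1.0, 0.2]))  # a user whose profile leans toward action content
```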
Milvus is also used in anomaly and fraud detection by enabling fast, scalable search for unusual patterns in large-scale vectorized data. Here, events (e.g., transactions, log entries, sensor readings) are converted into vectors via feature engineering or through learned embeddings.
The core task is to identify vectors that are distant from clusters of normal behavior, indicating potential anomalies or fraudulent activity. Milvus’s indexing and query capabilities allow organizations to perform these searches quickly, even as data scale grows.
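The "distant from normal behavior" idea can be sketched as follows. The feature vectors and the single-centroid comparison are simplifying assumptions for illustration; in practice, Milvus's nearest-neighbor search finds the closest known-normal vectors instead of comparing against one centroid:

```python
import math

def centroid(vectors):
    # Mean of a set of equal-length vectors.
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def distance(a, b):
    # Euclidean distance between two vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical feature vectors for "normal" transactions.
normal = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9]]
center = centroid(normal)

def is_anomalous(vector, threshold=1.0):
    # Flag vectors far from the centroid of normal behavior.
    return distance(vector, center) > threshold

print(is_anomalous([1.0, 1.0]))   # close to the normal cluster
print(is_anomalous([5.0, 5.0]))   # far from the normal cluster
```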

The access layer consists of a group of stateless proxy nodes that serve as the entry point for all client requests. These proxies are responsible for validating and pre-checking incoming requests, forwarding them to the appropriate worker nodes, and aggregating and returning the results to clients.
Since proxies are stateless, they can be scaled horizontally without coordination overhead and can quickly recover from failures. They use load balancing mechanisms such as Nginx, Kubernetes Ingress, NodePort, and LVS to provide a unified service endpoint.
Milvus follows a massively parallel processing (MPP) model. Each proxy coordinates distributed execution by dispatching tasks to various worker nodes, collecting their responses, and then merging and refining the output before responding to the client.
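The gather-and-merge step of this scatter-gather pattern can be sketched as a merge of sorted partial results. The worker shards and their result lists below are illustrative, not Milvus internals:

```python
import heapq

# Each worker returns its local top-k hits as (distance, id) pairs,
# sorted ascending by distance (smaller = more similar).
worker_results = [
    [(0.1, "a"), (0.4, "c")],   # partial result from worker 1
    [(0.2, "b"), (0.5, "d")],   # partial result from worker 2
    [(0.3, "e"), (0.9, "f")],   # partial result from worker 3
]

def merge_top_k(partials, k):
    # The proxy merges the sorted partial results and keeps the
    # global top-k, mirroring the gather-and-refine step of an
    # MPP-style distributed query.
    return list(heapq.merge(*partials))[:k]

print(merge_top_k(worker_results, 3))
```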
The Coordinator is the control plane and the central decision-maker of the Milvus cluster. Only one coordinator is active at any given time, ensuring a single source of truth for managing system operations. Its responsibilities include handling DDL and DCL requests (such as creating collections and managing users), allocating timestamps and IDs, managing data placement, and balancing load across worker nodes.
This layer ensures consistency, system-wide coordination, and robust task scheduling across the Milvus ecosystem.
Worker nodes are the execution units that handle real-time ingestion, querying, and offline processing. They are stateless, which allows them to scale easily and restart without risk of data loss, thanks to Milvus's separation of storage and computation. There are three distinct types of worker nodes: query nodes, which serve search and query requests; data nodes, which process and persist incoming data; and index nodes, which build vector indexes.
Each worker node operates independently, following instructions from the Coordinator, which enables scalable parallel processing and fault isolation.
The storage layer provides persistent, distributed storage for all Milvus data, including metadata, logs, vectors, indexes, and intermediate query results. It consists of three main components: meta storage (typically etcd) for metadata and service discovery, a log broker (such as Pulsar or Kafka) for durable streaming data, and object storage (such as S3-compatible stores or MinIO) for log snapshots, vector data, and index files.
Milvus and Pinecone differ significantly in architecture, deployment options, and feature depth. Milvus offers flexible deployment modes including local, standalone, cluster-based, and managed cloud (Zilliz Cloud), whereas Pinecone is restricted to a SaaS-only model. This gives Milvus more versatility for on-prem, hybrid, or BYOC (Bring Your Own Cloud) use cases.
Milvus supports a broader set of SDKs (Python, Java, Node.js, Go, C#, Rust) along with a RESTful API, while Pinecone is limited to Python and JavaScript/TypeScript. Milvus also supports GPU acceleration and more indexing options such as IVF, HNSW, SCANN, and GPU-accelerated indexes, offering better performance tuning and faster retrieval at scale. Pinecone lacks GPU support.
In terms of scalability, Milvus supports both scale-out and scale-up architectures with distributed compute and storage layers, making it suitable for workloads exceeding 10 billion vectors. Pinecone uses a pod-based architecture that only allows vertical scaling.
Data modeling in Milvus is more advanced, with support for multiple vector fields, structured scalar fields (including JSON), and strict or flexible schema modes. Pinecone supports only flat metadata with limited types and a 40KB metadata size cap. Milvus also includes tools for observability, backup, and integration such as Attu, Birdwatcher, CLI, and CDC connectors.
Milvus is focused on high-performance, large-scale vector search through a purpose-built distributed engine. It separates compute and storage, supports stateless nodes, and scales horizontally across billions of vectors with low latency. Weaviate, while also scalable, takes a modular approach with pluggable components and emphasizes semantic search via tight integration with transformers and vectorizers.
Milvus supports a wide variety of index types, including IVF, HNSW, DiskANN, and GPU-accelerated indexes, giving developers more control over latency-accuracy tradeoffs. Weaviate uses HNSW and Flat indexes but offers fewer performance-tuning knobs in comparison.
In terms of deployment, Milvus can run in standalone, clustered, or cloud-native environments (via Zilliz Cloud or BYOC), whereas Weaviate supports self-hosting and cloud deployments through its own managed service. Both systems provide RESTful APIs and SDKs, but Milvus supports more languages and tools, including robust connectors for Spark and Kafka.
On the data modeling side, Milvus offers structured schemas with support for multiple vector fields, scalar fields, and complex JSON types. Weaviate leans towards a semantic schema model, integrating vectorization at ingestion, which simplifies setup but can reduce flexibility in some advanced workflows.
Learn more in our detailed guide to vector databases (coming soon)
Milvus Lite makes it easy to get started with vector search using just Python, without the need for external services or infrastructure. This quick tutorial walks through setting up a local Milvus instance, inserting data, and running semantic search queries. Instructions are adapted from the Milvus documentation.
First, ensure you have Python 3.8 or higher. Install the pymilvus package, which includes both the Python client and Milvus Lite:
pip install -U pymilvus
To generate vector embeddings from text, you’ll also need the optional model package:
pip install "pymilvus[model]"
Milvus Lite stores data in a local file. Instantiate a client and create a new collection:
from pymilvus import MilvusClient
client = MilvusClient("milvus_demo.db")
if client.has_collection("demo_collection"):
    client.drop_collection("demo_collection")
client.create_collection(collection_name="demo_collection", dimension=768)
This creates a collection with a vector field of 768 dimensions. The primary key and metric type (COSINE) are set by default.
Use the built-in model utility to generate vector representations for your text documents:
from pymilvus import model
embedding_fn = model.DefaultEmbeddingFunction()
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]
vectors = embedding_fn.encode_documents(docs)
data = [
    {"id": i, "vector": vectors[i], "text": docs[i], "subject": "history"}
    for i in range(len(vectors))
]
client.insert(collection_name="demo_collection", data=data)
To search semantically, embed your query text and pass it to the search() method:
query_vectors = embedding_fn.encode_queries(["Who is Alan Turing?"])
res = client.search(
    collection_name="demo_collection",
    data=query_vectors,
    limit=2,
    output_fields=["text", "subject"],
)
print(res)
Results include matching documents, distances, and metadata fields.
You can insert additional documents with different metadata and perform filtered searches:
# Insert more documents under the "biology" subject
bio_docs = [
    "Machine learning has been used for drug design.",
    "Computational synthesis with AI algorithms predicts molecular properties.",
    "DDR1 is involved in cancers and fibrosis.",
]
bio_vectors = embedding_fn.encode_documents(bio_docs)
bio_data = [
    {"id": 3 + i, "vector": bio_vectors[i], "text": bio_docs[i], "subject": "biology"}
    for i in range(len(bio_vectors))
]
client.insert(collection_name="demo_collection", data=bio_data)
# Filtered search: only return biology-related results
res = client.search(
    collection_name="demo_collection",
    data=embedding_fn.encode_queries(["tell me AI related information"]),
    filter="subject == 'biology'",
    limit=2,
    output_fields=["text", "subject"],
)
print(res)
You can also query by metadata or ID, and delete entities:
# Query by filter
res = client.query(
    collection_name="demo_collection",
    filter="subject == 'history'",
    output_fields=["text", "subject"],
)
# Delete by ID
client.delete(collection_name="demo_collection", ids=[0, 2])
# Delete by filter
client.delete(collection_name="demo_collection", filter="subject == 'biology'")
All data is stored in a local file (e.g., milvus_demo.db). Reconnect to the same database later by reusing the file path:
client = MilvusClient("milvus_demo.db")
To remove all data in a collection:
client.drop_collection("demo_collection")
This minimal setup is useful for prototyping and testing AI applications that require fast, local vector search. For production, Milvus supports scalable deployments on Docker, Kubernetes, and cloud environments.
Milvus leverages S3-compatible object storage as a persistent storage backend for vector data, index files, and metadata, making Cloudian’s object storage platforms an ideal foundation for Milvus deployments at any scale. In Milvus’s architecture, object storage serves as the durable layer for all vector embeddings, indexes, and binlogs, while the stateless query and data nodes access this data through the S3 API. Cloudian’s full S3 compatibility ensures seamless integration with Milvus’s storage layer, allowing organizations to deploy Milvus with enterprise-grade storage infrastructure that remains entirely on-premises or in private cloud environments.
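As an illustration, pointing Milvus at an S3-compatible endpoint is done in the `minio` section of the `milvus.yaml` configuration file. The endpoint, credentials, and bucket name below are placeholders to be replaced with your Cloudian HyperStore values:

```yaml
# milvus.yaml -- object storage backend (values are placeholders)
minio:
  address: cloudian.example.internal   # S3-compatible endpoint
  port: 443
  accessKeyID: <access-key>
  secretAccessKey: <secret-key>
  useSSL: true
  bucketName: milvus-data
```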
Performance and Scalability Benefits
Cloudian HyperStore provides the high-throughput, low-latency object storage required for Milvus’s read-intensive workloads, particularly during vector similarity searches and index loading operations. The platform’s ability to scale horizontally across multiple nodes ensures that storage performance grows alongside Milvus clusters, supporting deployments from millions to billions of vectors without storage bottlenecks. For organizations running demanding AI workloads, Cloudian’s HyperScale AIDP with S3 RDMA technology delivers exceptional performance for Milvus deployments integrated with GPU-accelerated inference pipelines, minimizing latency between vector retrieval and downstream AI processing.
Operational Advantages
Using Cloudian as Milvus’s object storage backend provides several operational benefits critical for production AI systems. Cloudian’s multi-site replication and erasure coding ensure vector data durability and availability across geographic locations, protecting against data loss while enabling disaster recovery strategies. The platform’s support for versioning and lifecycle management allows organizations to implement sophisticated data retention policies for vector embeddings and indexes, archiving historical versions or managing storage costs for large-scale deployments. Additionally, Cloudian’s enterprise security features—including encryption at rest, access controls, and audit logging—ensure that sensitive embedding data and proprietary knowledge assets stored in Milvus remain protected and compliant with regulatory requirements.
Cost and Sovereignty
For organizations managing large vector databases, Cloudian’s economics provide significant advantages over cloud-based object storage. The platform’s unlimited scalability without egress fees or API charges makes it cost-effective for Milvus deployments with frequent read operations and large-scale vector ingestion workflows. Perhaps most importantly, Cloudian enables complete data sovereignty for Milvus deployments, allowing organizations to maintain full control over their vector embeddings and semantic search infrastructure—critical for enterprises in regulated industries or those building proprietary AI systems where data cannot be stored in public cloud environments.