How to Integrate Apache Druid with Cloudian

Posted by Van Flowers on June 5, 2024

This post outlines the steps to integrate Imply Druid with Cloudian Object Storage and configure Druid to use Cloudian as its deep storage.

Apache Druid is a high-performance, real-time analytics database designed for fast slice-and-dice analytics on large datasets. It supports both streaming and batch ingestion, so users can explore and analyze data with low latency. Druid keeps its data segments in deep storage, and an S3-compatible object store such as Cloudian can fill that role, providing scalable, durable storage for those segments.

Here is a step-by-step guide to integrating Druid with a Cloudian AI data lake:

Step 1: Prepare Cloudian Object Storage

Create a Bucket: In the Cloudian console, create a bucket that will be used by Druid for deep storage.
Access Credentials: Ensure you have the access key and secret key for accessing the Cloudian bucket.

Step 2: Configure Druid to Use Cloudian

Modify the Configuration File: Edit common.runtime.properties in your Druid cluster to include the Cloudian configuration.
Add S3 Extension: Ensure that druid-s3-extensions is included in druid.extensions.loadList.
Configure Deep Storage: Add the following configuration to common.runtime.properties:

# Extensions to load
druid.extensions.loadList=["druid-s3-extensions"]

# Deep storage type
druid.storage.type=s3

# Cloudian (S3-compatible) settings
druid.s3.accessKey=<your-access-key>
druid.s3.secretKey=<your-secret-key>
druid.storage.bucket=<your-bucket-name>
druid.storage.baseKey=druid
druid.s3.endpoint.url=<your-cloudian-endpoint>

Replace these parameters with your Cloudian details:

<your-access-key>, <your-secret-key>, <your-bucket-name>, <your-cloudian-endpoint>
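Before restarting anything, it can help to confirm that no placeholder was left behind in the file. The following is a minimal Python sketch of such a check, not part of Druid or Cloudian. The helper names parse_properties and unfilled_keys are hypothetical, and the parser is simplified: it handles only plain key=value lines and # comments, not the full Java properties syntax.

```python
import re

# Matches values that still contain an unfilled <placeholder> token.
PLACEHOLDER = re.compile(r"<[^<>]+>")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def unfilled_keys(props):
    """Return the keys whose values still look like <placeholders>."""
    return sorted(k for k, v in props.items() if PLACEHOLDER.search(v))


# Illustrative input only; the bucket name is made up.
sample = """
druid.storage.type=s3
druid.storage.bucket=druid-deep-storage
druid.s3.accessKey=<your-access-key>
"""

print(unfilled_keys(parse_properties(sample)))
```

In practice you would read the text from your own common.runtime.properties and expect an empty list before moving on to Step 3.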

Step 3: Validate the Configuration

Restart Druid Services: Restart all the Druid services to apply the new configurations.
Check Logs: Verify that there are no errors related to deep storage in the Druid logs.
Ingest Data: Try ingesting some data and ensure it gets stored in the Cloudian bucket.
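To confirm the last check without opening the Cloudian console, you can list the objects under the deep storage prefix. The sketch below assumes a client that behaves like a boto3 S3 client (its list_objects_v2 call) pointed at your Cloudian endpoint. The function name list_segment_keys, the FakeS3Client stand-in, and the sample keys are all hypothetical and exist only to show the call shape without a network connection.

```python
def list_segment_keys(s3_client, bucket, base_key="druid"):
    """Return all object keys under the deep storage prefix.

    s3_client is assumed to behave like a boto3 S3 client, for example
    boto3.client("s3", endpoint_url=..., aws_access_key_id=...,
    aws_secret_access_key=...) configured for the Cloudian endpoint.
    """
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": base_key.rstrip("/") + "/"}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        # Follow pagination when the listing spans several pages.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


class FakeS3Client:
    """Stand-in for a real client, used only for this demonstration."""

    def __init__(self, keys):
        self._keys = keys

    def list_objects_v2(self, Bucket, Prefix, ContinuationToken=None):
        matching = [k for k in self._keys if k.startswith(Prefix)]
        return {"Contents": [{"Key": k} for k in matching], "IsTruncated": False}


# Made-up keys: one under the Druid prefix, one unrelated.
fake = FakeS3Client(["druid/wikipedia/index.zip", "other/readme.txt"])
print(list_segment_keys(fake, "druid-deep-storage"))
```

An empty result after a successful ingestion task would suggest that segments are going somewhere other than the configured bucket and base key.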

Step 4: Monitor and Troubleshoot

Cloudian Console: Monitor the Cloudian console to ensure objects are being created as expected.
Druid Console: Use the Druid console to monitor tasks and data availability.
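Task monitoring can also be scripted. The sketch below assumes the Overlord task list API at /druid/indexer/v1/tasks, reachable directly or through a Router, and assumes each returned task carries id and statusCode fields. Check both assumptions against your Druid version. The helper names fetch_tasks and summarize_tasks and the sample tasks are hypothetical.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_tasks(base_url):
    """GET the task list from the Overlord (or a Router proxying to it)."""
    url = base_url.rstrip("/") + "/druid/indexer/v1/tasks"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_tasks(tasks):
    """Count tasks per status code and collect the IDs of failed tasks."""
    counts = Counter(t.get("statusCode", "UNKNOWN") for t in tasks)
    failed = [t["id"] for t in tasks if t.get("statusCode") == "FAILED"]
    return dict(counts), failed


# Made-up tasks standing in for a real API response.
sample_tasks = [
    {"id": "task-a", "statusCode": "SUCCESS"},
    {"id": "task-b", "statusCode": "FAILED"},
    {"id": "task-c", "statusCode": "RUNNING"},
]

counts, failed = summarize_tasks(sample_tasks)
print(counts, failed)
```

Against a live cluster you would pass the result of fetch_tasks, called with your own Overlord or Router address, to summarize_tasks and look at the failed IDs first.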

Additional Tips

IAM Policies: If using IAM roles/policies, ensure they have the necessary permissions to access the Cloudian bucket.
Security: Use secure connections (HTTPS) for accessing the Cloudian endpoint.
Performance: Fine-tune configurations such as connection timeout, retries, etc., based on your network and performance requirements.
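The HTTPS tip is easy to check mechanically. The sketch below is one possible check on the endpoint value before it goes into the configuration, using only Python's standard library. The function name endpoint_warnings and the example hostnames are hypothetical.

```python
from urllib.parse import urlsplit


def endpoint_warnings(endpoint):
    """Return a list of warnings for an S3 endpoint string."""
    warnings = []
    # Prefix "//" so a bare host such as "s3.example.com" parses as a netloc.
    parts = urlsplit(endpoint if "://" in endpoint else "//" + endpoint)
    if parts.scheme == "":
        warnings.append("no scheme given; confirm which protocol the client will use")
    elif parts.scheme != "https":
        warnings.append(f"scheme is {parts.scheme!r}, not https")
    if not parts.hostname:
        warnings.append("no hostname found")
    return warnings


for candidate in ("https://s3.cloudian.example", "http://s3.cloudian.example"):
    print(candidate, endpoint_warnings(candidate))
```

An empty list means the endpoint names a host and uses HTTPS; anything else is worth a second look before restarting the cluster.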

By following these steps, you can successfully integrate Imply Druid with Cloudian Object Storage for efficient and scalable deep storage.

Learn more at cloudian.com

Or, sign up for a free trial
