This blog outlines the steps to integrate Imply Druid with Cloudian Object Storage and configure Druid to use Cloudian as its deep storage.

Apache Druid is a high-performance, real-time analytics database designed for fast slice-and-dice analytics on large datasets. It supports both streaming and batch ingestion, so users can explore and analyze data with low latency. Druid uses object storage such as Cloudian as its deep storage: the scalable, durable layer where ingested segments are kept for retrieval and recovery.

Here is a step-by-step guide to integrating Druid with a Cloudian AI data lake:

Step 1: Prepare Cloudian Object Storage

  • Create a Bucket: In the Cloudian console, create a bucket that will be used by Druid for deep storage.
  • Access Credentials: Ensure you have the access key and secret key for accessing the Cloudian bucket.
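The bucket can also be prepared from a script instead of the console. The following is a minimal sketch, assuming the Cloudian endpoint accepts standard S3 API calls and that the third-party boto3 library is installed; the helper names `ensure_bucket` and `make_client` are hypothetical and not part of Druid or Cloudian.

```python
"""Sketch: make sure the deep-storage bucket exists on an S3-compatible endpoint."""


def ensure_bucket(client, bucket):
    """Return "exists" if the bucket is reachable, else create it and return "created".

    `client` is any object exposing the boto3 S3 client methods
    head_bucket and create_bucket.
    """
    try:
        client.head_bucket(Bucket=bucket)
        return "exists"
    except Exception:
        # With boto3 this is a botocore ClientError (bucket missing or not
        # accessible). If the cause is a permissions problem, the create call
        # below fails as well and surfaces the real error.
        client.create_bucket(Bucket=bucket)
        return "created"


def make_client(endpoint_url, access_key, secret_key):
    """Build a boto3 S3 client pointed at the Cloudian endpoint instead of AWS."""
    import boto3  # third-party; imported lazily so ensure_bucket has no hard dependency

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
```

A typical call would be `ensure_bucket(make_client(endpoint, access_key, secret_key), "<your-bucket-name>")`, using the same endpoint and keys that Druid will be given in Step 2.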

Step 2: Configure Druid to Use Cloudian

  • Modify the Configuration Files: Edit the common.runtime.properties file in your Druid cluster to include the Cloudian configuration.
  • Add S3 Extension: Ensure that druid-s3-extensions is included in druid.extensions.loadList.
  • Configure Deep Storage: Add the following configurations to common.runtime.properties:

# Extensions to load

druid.extensions.loadList=["druid-s3-extensions"]

# Deep storage type

druid.storage.type=s3

# Cloudian (S3 compatible) settings

druid.s3.accessKey=<your-access-key>

druid.s3.secretKey=<your-secret-key>

druid.storage.bucket=<your-bucket-name>

druid.storage.baseKey=druid

druid.s3.endpoint.url=<your-cloudian-endpoint>

Replace these parameters with your Cloudian details:

<your-access-key>, <your-secret-key>, <your-bucket-name>, <your-cloudian-endpoint>
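Before restarting anything, it can help to confirm that no placeholder was left in the file. The following is a minimal sketch of such a check; the function names are hypothetical, and the parser is deliberately simplified (it handles only `key=value` lines and `#` comments, not the full Java properties syntax).

```python
import re

# Keys taken from the configuration shown above.
REQUIRED_KEYS = [
    "druid.storage.type",
    "druid.storage.bucket",
    "druid.storage.baseKey",
    "druid.s3.accessKey",
    "druid.s3.secretKey",
]
PLACEHOLDER = re.compile(r"<your-[^>]+>")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_deep_storage(text):
    """Return a list of human-readable problems; an empty list means none found."""
    props = parse_properties(text)
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in props]
    if not any(key.startswith("druid.s3.endpoint") for key in props):
        problems.append("missing S3 endpoint setting")
    if "druid-s3-extensions" not in props.get("druid.extensions.loadList", ""):
        problems.append("druid-s3-extensions not in druid.extensions.loadList")
    if props.get("druid.storage.type") not in (None, "s3"):
        problems.append("druid.storage.type is not s3")
    for key, value in props.items():
        if PLACEHOLDER.search(value):
            problems.append(f"placeholder left in {key}")
    return problems
```

Running `check_deep_storage(open("common.runtime.properties").read())` returns an empty list when the required keys are present and every placeholder has been replaced. It does not verify that the credentials or endpoint actually work.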

Step 3: Validate the Configuration

  1. Restart Druid Services: Restart all the Druid services to apply the new configurations.
  2. Check Logs: Verify that there are no errors related to deep storage in the Druid logs.
  3. Ingest Data: Try ingesting some data and ensure it gets stored in the Cloudian bucket.
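The last check above can be scripted by listing what Druid has written under its base key. This is a minimal sketch, assuming a boto3-style S3 client configured with the Cloudian endpoint and credentials; the function name `list_deep_storage_keys` is hypothetical, and the exact object names Druid writes are not assumed here.

```python
def list_deep_storage_keys(client, bucket, base_key, limit=20):
    """Return up to `limit` object keys stored under the Druid base key.

    `client` is any object exposing the boto3 S3 client method list_objects_v2.
    """
    prefix = base_key.rstrip("/") + "/"
    response = client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=limit)
    # "Contents" is absent from the response when nothing matches the prefix.
    return [obj["Key"] for obj in response.get("Contents", [])]
```

With the settings from Step 2, `list_deep_storage_keys(client, "<your-bucket-name>", "druid")` should return a non-empty list once an ingestion task has finished publishing its segments.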

Step 4: Monitor and Troubleshoot

  1. Cloudian Console: Monitor the Cloudian console to ensure objects are being created as expected.
  2. Druid Console: Use the Druid console to monitor tasks and data availability.
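Alongside the consoles, each Druid process exposes a `/status/health` endpoint that can be polled from a script. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and it assumes the endpoint is reachable without authentication.

```python
import json
from urllib.request import urlopen


def health_url(base_url):
    """Build the health-check URL for a Druid process, e.g. a Router or Coordinator."""
    return base_url.rstrip("/") + "/status/health"


def _http_get(url, timeout):
    """Fetch a URL and return the raw response body."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()


def is_healthy(base_url, fetch=None, timeout=5):
    """Return True only if the process answers its health check with JSON true."""
    fetch = fetch or (lambda url: _http_get(url, timeout))
    try:
        return json.loads(fetch(health_url(base_url))) is True
    except (OSError, ValueError):
        # Connection failures (URLError is an OSError) or a non-JSON reply.
        return False
```

For example, `is_healthy("http://localhost:8888")` checks a Router on the local machine; the host and port are assumptions and should be replaced with your own. The `fetch` parameter exists only so the function can be exercised without a running cluster.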

Additional Tips

  • IAM Policies: If using IAM roles/policies, ensure they have the necessary permissions to access the Cloudian bucket.
  • Security: Use secure connections (HTTPS) for accessing the Cloudian endpoint.
  • Performance: Fine-tune configurations such as connection timeout, retries, etc., based on your network and performance requirements.

By following these steps, you can successfully integrate Imply Druid with Cloudian Object Storage for efficient and scalable deep storage.
