This blog discusses how to configure Cloudian HyperStore with Dremio’s Unified Lakehouse. Dremio helps data analysts run exploratory analysis and visualization with fast query responses, and it streamlines the workflow for data engineers by letting them manage data in place within the data lake.

This example of connecting Dremio to a Cloudian HyperStore bucket uses the following:

  • Dremio (This How-To leverages Community version 25.0)
  • Cloudian HyperStore (Version 7.5.3 or newer)
  • Major League Baseball Data in CSV format

Prepare HyperStore User and Bucket for Dremio

  • Log into the HyperStore Cloudian Management Console (CMC) and create the appropriate group/user and target bucket. Note the user’s Access Key and Secret Key.

Connect Dremio to HyperStore bucket(s)

From the Dremio main console, under Sources click “+”

Then select “Amazon S3”:

On the Source Setting panel, select a name for this connection.

Enter the HyperStore user’s Access Key and Secret Key in the AWS Access Key and Secret Key fields, then select “Encrypt Connection” to connect over HTTPS.

Under Public Buckets, enter the desired bucket name.

Add two properties:

  • Set “fs.s3a.path.style.access” to “true”
  • Set “fs.s3a.endpoint” to your HyperStore S3 endpoint, e.g. s3-west.cloudian.com, without the http:// or https:// prefix

In this example we used the IP address of a HyperStore node and connected over HTTP (rather than HTTPS).
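
As a rough illustration of the two properties above, the following Python sketch strips the scheme from an endpoint and assembles the property values. The helper names `s3a_endpoint` and `connection_properties` are made up for this post and are not part of Dremio or HyperStore; the property names and the sample hostname come from the steps above.

```python
from urllib.parse import urlsplit


def s3a_endpoint(value: str) -> str:
    """Return host[:port] with any http:// or https:// prefix removed."""
    value = value.strip()
    # urlsplit only fills in netloc when the string has a "//" marker,
    # so add one for bare "host" or "host:port" input.
    parts = urlsplit(value if "://" in value else "//" + value)
    return parts.netloc


def connection_properties(endpoint: str) -> dict:
    """Build the two connection properties described in this how-to."""
    return {
        "fs.s3a.path.style.access": "true",
        "fs.s3a.endpoint": s3a_endpoint(endpoint),
    }


if __name__ == "__main__":
    print(connection_properties("https://s3-west.cloudian.com"))
```

Passing a value such as `s3-west.cloudian.com:80` keeps the port, which is the form used in the HTTP setup described in the Troubleshooting section.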

Under “Allowlisted buckets”:

  • Select the desired HyperStore buckets

In this how-to we are using the buckets “dremio” and “dremiocloudian”.

Click “Save”.
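
For readers who prefer to script this step, the settings above can be collected into a single source definition. The sketch below only builds and prints a JSON body; it does not contact Dremio. The field names (`accessKey`, `accessSecret`, `secure`, `propertyList`, `whitelistedBuckets`, `credentialType`) reflect our understanding of Dremio's catalog REST API for S3 sources and should be treated as assumptions to verify against the API reference for your Dremio version. The source name "cloudian" and the key placeholders are hypothetical.

```python
import json


def s3_source_payload(name, access_key, secret_key, endpoint, buckets, encrypt=True):
    """Build a JSON-serialisable description of an S3-type source.

    Field names are assumptions based on Dremio's catalog API; check them
    against the REST API reference before use.
    """
    return {
        "entityType": "source",
        "name": name,
        "type": "S3",
        "config": {
            "credentialType": "ACCESS_KEY",
            "accessKey": access_key,
            "accessSecret": secret_key,
            "secure": encrypt,  # corresponds to "Encrypt Connection"
            "propertyList": [
                {"name": "fs.s3a.path.style.access", "value": "true"},
                {"name": "fs.s3a.endpoint", "value": endpoint},
            ],
            "whitelistedBuckets": list(buckets),  # "Allowlisted buckets"
        },
    }


payload = s3_source_payload(
    name="cloudian",              # hypothetical source name
    access_key="<ACCESS_KEY>",    # placeholder, not a real key
    secret_key="<SECRET_KEY>",    # placeholder, not a real key
    endpoint="s3-west.cloudian.com",
    buckets=["dremio", "dremiocloudian"],
)
print(json.dumps(payload, indent=2))
```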

Select the Cloudian data source under Sources -> Object Storage.

For this example, we are using a CSV file of Major League Baseball players.

Select the .csv file and Dremio presents a data Format tab for it.

In this instance, we selected “Extract Column Names” and “Skip First Line”, then clicked “Save”.

Select “SQL Runner” from the Dremio menu, then click “Run” to output the contents of the CSV file.
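
The query run in this step generally takes the form of a SELECT against the source, bucket, and file path. The Python sketch below builds such a statement as a string, assuming each path component is wrapped in double quotes and separated by periods. The source name "cloudian" and the file name "players.csv" are hypothetical; "dremio" is one of the buckets used in this how-to.

```python
def quote_identifier(part: str) -> str:
    """Wrap one path component in double quotes, doubling any embedded quotes."""
    return '"' + part.replace('"', '""') + '"'


def dataset_path(*parts: str) -> str:
    """Join quoted components with periods: source, bucket, then file."""
    return ".".join(quote_identifier(p) for p in parts)


def preview_query(source: str, bucket: str, file_name: str, limit: int = 10) -> str:
    """Return a SELECT statement that previews the first rows of a file."""
    return f"SELECT * FROM {dataset_path(source, bucket, file_name)} LIMIT {int(limit)}"


if __name__ == "__main__":
    # "cloudian" and "players.csv" are placeholder names for illustration.
    print(preview_query("cloudian", "dremio", "players.csv"))
```

The resulting string can be pasted into the SQL Runner once the CSV format settings have been saved.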

We have successfully connected Dremio to a HyperStore bucket!

Troubleshooting:

HyperStore does not use a CA-verified certificate

If the target HyperStore instance does not use a certificate verified by a CA, Dremio will not be able to connect via HTTPS.

To remedy this, deselect “Encrypt Connection” on the Source Settings -> General tab.

Under the Advanced Options tab, enter the S3 endpoint value and specify port 80.

Port 80 is the default HTTP connection port for HyperStore.
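
Before falling back to HTTP, it can help to confirm that certificate verification is really the problem. The following is a minimal sketch using only Python's standard library; the function name `endpoint_has_trusted_certificate` is our own. It validates against Python's default trust store, which may differ from the Java truststore Dremio uses, so treat the result as a hint rather than a definitive answer.

```python
import socket
import ssl


def endpoint_has_trusted_certificate(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TLS handshake with certificate verification succeeds.

    Any connection failure, timeout, or certificate error returns False.
    """
    context = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:
        # ssl.SSLError (including verification failures) is a subclass of OSError.
        return False


if __name__ == "__main__":
    # s3-west.cloudian.com is the sample endpoint name used earlier in this post.
    print(endpoint_has_trusted_certificate("s3-west.cloudian.com"))
```

A False result from a host that is otherwise reachable suggests the certificate is not trusted, in which case the HTTP settings described above apply.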
