This post shows how to configure Cloudian HyperStore as a data source for Dremio’s Unified Lakehouse. Dremio gives data analysts fast query responses for exploratory analysis and visualization, and it streamlines the workflow for data engineers by letting them manage data in place within the data lake.
This example of connecting Dremio to a Cloudian HyperStore bucket uses the following:
- Dremio (This How-To leverages Community version 25.0)
- Cloudian HyperStore (Version 7.5.3 or newer)
- Major League Baseball Data in CSV format
Prepare HyperStore User and Bucket for Dremio
- Log into the HyperStore Cloudian Management Console (CMC) and create the appropriate group, user, and target bucket. Note the user’s Access Key and Secret Key.
Connect Dremio to HyperStore bucket(s)
From the Dremio main console, under Sources click “+”
Then select “Amazon S3”:
On the Source Settings panel, enter a name for this connection.
Enter the HyperStore user’s Access Key and Secret Key in the AWS Access Key and Secret Key fields, then select “Encrypt Connection”.
Under Public Buckets enter the desired bucket name.
Under Advanced Options, add two connection properties:
- Set “fs.s3a.path.style.access” to “true”
- Set “fs.s3a.endpoint” to your HyperStore S3 endpoint, e.g. s3-west.cloudian.com, without the http:// or https:// prefix
In this example we used the IP address of a HyperStore node and connected over HTTP rather than HTTPS.
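Because the endpoint property must not carry a scheme, it can help to normalize the value before pasting it into Dremio. The following is a minimal Python sketch of that idea; the helper names `normalize_endpoint` and `s3a_properties` are hypothetical and not part of Dremio or HyperStore, and only the two property names come from the steps above.

```python
from urllib.parse import urlsplit


def normalize_endpoint(raw: str) -> str:
    """Return host[:port] with any http:// or https:// prefix removed."""
    raw = raw.strip()
    if "://" in raw:
        # urlsplit().netloc keeps the host and optional port, drops scheme and path
        return urlsplit(raw).netloc
    return raw.rstrip("/")


def s3a_properties(endpoint: str) -> dict:
    """Build the two connection properties described in this how-to."""
    return {
        "fs.s3a.path.style.access": "true",
        "fs.s3a.endpoint": normalize_endpoint(endpoint),
    }


# Print the name/value pairs to enter as connection properties.
for name, value in s3a_properties("https://s3-west.cloudian.com/").items():
    print(f"{name} = {value}")
```

The printed pairs are what you would type into the property name and value fields; the script itself does not talk to Dremio.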
Under “Allowlisted buckets”:
- Select the desired HyperStore buckets
In this how-to we are using buckets “dremio” and “dremiocloudian”
Click “Save”.
Select the Cloudian data source under Sources -> Object Storage.
For this example, we are using a CSV file of Major League Baseball players.
Select the .csv file and Dremio will present its data Format tab.
In this instance, we selected “Extract Column Names” and “Skip First Line”, then clicked “Save”.
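To see what treating the first line as column names means for a CSV file, here is a rough local illustration using Python's standard `csv` module. It does not reproduce Dremio's parser, and the sample columns and rows are hypothetical placeholders rather than the actual baseball dataset.

```python
import csv
import io

# Hypothetical sample; the real file's columns will differ.
SAMPLE = "name,team,position\nPlayer A,Team X,P\nPlayer B,Team Y,C\n"


def read_with_header(text: str):
    """Use the first line as column names and return the remaining lines as rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first line supplies the column names
    rows = [dict(zip(header, row)) for row in reader]  # header line is not a data row
    return header, rows


header, rows = read_with_header(SAMPLE)
print(header)
for row in rows:
    print(row)
```

Without this handling, the header line would show up as an ordinary data row and the columns would get generic names.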
Select “SQL Runner” from the Dremio menu, then click “Run” to output the contents of the CSV file.
We have successfully connected Dremio to a HyperStore bucket!
Troubleshooting:
HyperStore does not use a CA-signed certificate
If the target HyperStore instance does not use a certificate signed by a verified CA, Dremio will not be able to connect via HTTPS.
To remedy this, deselect “Encrypt Connection” on the Source Settings -> General tab.
Under the Advanced Options tab, enter the S3 endpoint value with port 80 specified.
Port 80 is the default HTTP connection port for HyperStore.
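Before changing Dremio settings, it may be worth confirming that the HyperStore endpoint accepts TCP connections on port 80 from the Dremio host. Below is a small Python sketch using only the standard library; the function name `can_connect` is hypothetical, and a successful connection only shows that the port is reachable, not that the S3 service or credentials are valid.

```python
import socket


def can_connect(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and name resolution failures
        return False
```

For example, `can_connect("s3-west.cloudian.com", 80)` run on the Dremio host would return `False` if a firewall blocks the port or the endpoint name does not resolve; substitute your own endpoint or node IP address.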