Multi-Cloud S3 Configuration: An Introduction
A multi-cloud architecture lets you create a single storage environment that spans two or more clouds. Those clouds could be public clouds (AWS, GCP, Azure), or a mix of public and private clouds. Either way, you can create a multi-cloud setup that is accessible and searchable as if it were a single cloud.
This post walks through the high-level infrastructure for deploying Cloudian HyperStore S3 object storage in a multi-cloud environment, including the components required and how they communicate to create a single namespace spanning all the hyperscalers.
The S3 Software: Cloudian HyperStore
For this setup, we will use Cloudian HyperStore, which is S3 API-compatible software-defined storage. HyperStore consists of a single software image that can be deployed on any of the public clouds, on industry-standard servers, in containers, or on VMs. With HyperStore, data can be accessed and managed using the same S3 API and management tools regardless of the platform.
Each HyperStore software instance becomes a cluster node. The end result is a collection of S3 API endpoints that are fully compatible with each other and with software written for the S3 API.
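Because every node exposes the same S3 API, client code should only need a different endpoint URL for each cloud. The sketch below is a minimal illustration of that idea using only the Python standard library. The endpoint hostnames, the region label and the credentials are hypothetical placeholders, not values from this deployment. The dictionary keys match the keyword arguments accepted by `boto3.client("s3", ...)`, one commonly used S3 SDK, but any S3-compatible client could be configured the same way.

```python
from urllib.parse import quote

# Hypothetical endpoints, one per cloud; substitute your own hostnames.
ENDPOINTS = {
    "aws": "https://s3-aws.example.com",
    "gcp": "https://s3-gcp.example.com",
    "azure": "https://s3-azure.example.com",
}


def s3_client_kwargs(cloud, access_key, secret_key, region="us-west"):
    """Build keyword arguments for an S3 client pointed at one cloud.

    Only the endpoint URL differs between clouds. Sharing the same
    credentials and region label across clouds is an assumption here.
    """
    return {
        "endpoint_url": ENDPOINTS[cloud],
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
        "region_name": region,
    }


def object_url(cloud, bucket, key):
    """Return the path-style URL of an object: <endpoint>/<bucket>/<key>."""
    return f"{ENDPOINTS[cloud]}/{quote(bucket)}/{quote(key)}"
```

With an SDK installed, the result could be passed straight through, for example `boto3.client("s3", **s3_client_kwargs("gcp", key_id, secret))`, and the rest of the application code would stay the same whichever cloud serves the request.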
Consistent Naming Convention
Each cloud platform has its own idiosyncrasies and naming conventions. Although the IT industry is known for the excessive use of acronyms, the cloud takes this to new heights. The same names and acronyms are often used for subtly different products. Running Cloudian HyperStore in the cloud gives you consistent naming and management, but it is recommended that you have experience with each cloud before deploying a cluster.
Multi-Cloud Setup for Cloudian HyperStore
For this deployment, Cloudian used three hyperscalers (AWS, GCP, and Azure) as the cloud providers, demonstrating that the same object storage platform can run on each of them and behave consistently across all three.
There are many considerations when using distributed storage across multiple clouds. (For an in-depth look, please download our Multi-Cloud Technical Guide.) Latency is one such consideration. To minimise latency, we chose data centres entirely on the West Coast of the US for this configuration. Lower latency between sites helps sustain higher throughput, so keeping it as low as possible delivers a more performant system. If you are protecting large datasets across multiple locations, then you need bandwidth and lots of it! When it is not possible to minimise latency, there are data protection methods that are less latency-sensitive. I will cover those in another blog post.
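One way to see why latency and throughput are linked: a single TCP stream can have at most one window of data in flight per round trip, so its throughput is bounded by window size divided by round-trip time. The sketch below works through that arithmetic. The 4 MiB window and the round-trip times are assumed values chosen for illustration, not measurements from this deployment.

```python
def max_stream_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound for one TCP stream: window size / round-trip time."""
    bits_per_second = window_bytes * 8 / (rtt_ms / 1000)
    return bits_per_second / 1_000_000


WINDOW = 4 * 1024 * 1024  # 4 MiB window (assumed for illustration)

for rtt_ms in (5, 20, 80):  # illustrative round-trip times
    mbps = max_stream_throughput_mbps(WINDOW, rtt_ms)
    print(f"RTT {rtt_ms:>2} ms -> at most {mbps:,.0f} Mbit/s per stream")
```

The bound is inversely proportional to the round-trip time: quadrupling the latency cuts the per-stream ceiling to a quarter. Parallel streams raise the aggregate, but each one is still subject to the same limit.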
5-minute Demo of a Cloudian Multi-Cloud S3 Configuration
Video: https://www.youtube-nocookie.com/embed/-mu8YVGNu2c
Nodes Deployed in This Example Multi-Cloud Configuration
For this configuration, a total of nine Cloudian HyperStore nodes were deployed, with three nodes on each of the hyperscaler platforms. Nine nodes is a useful number for demonstration purposes, as it allows maximum flexibility in terms of storage policies. Each node was configured with a mixture of SSDs and HDDs: the SSDs held the metadata database and the HDDs held the objects. For load balancing among the nodes, an instance of Cloudian HyperBalance was also deployed on each platform.
The nodes communicate with each other over high-availability (HA) VPN tunnels between the clouds. External traffic is routed through the load balancers.
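To illustrate why nine nodes across three clouds leave room to experiment with storage policies, the sketch below checks generic data protection schemes against that layout. It treats replication as a special case of k+m protection (k fragments needed to read, m spare) and assumes fragments are spread evenly across the clouds. This is generic arithmetic under those assumptions, not HyperStore's placement logic, and the schemes tested are examples rather than a list of supported policies.

```python
CLOUDS = ("aws", "gcp", "azure")
NODES_PER_CLOUD = 3  # nine nodes in total, as in this configuration


def fragments_per_cloud(total_fragments):
    """Spread fragments round-robin across clouds; return count per cloud."""
    counts = {cloud: 0 for cloud in CLOUDS}
    for i in range(total_fragments):
        counts[CLOUDS[i % len(CLOUDS)]] += 1
    return counts


def fits_cluster(total_fragments):
    """True if every fragment can be placed on its own node."""
    counts = fragments_per_cloud(total_fragments)
    return max(counts.values()) <= NODES_PER_CLOUD


def survives_cloud_loss(k, m):
    """True if a k+m scheme stays readable after losing any one cloud.

    At least k fragments must remain once the worst-hit cloud is gone.
    """
    counts = fragments_per_cloud(k + m)
    return (k + m) - max(counts.values()) >= k


# (k, m) pairs: 1+2 models three full replicas, the rest are erasure codes.
for k, m in ((1, 2), (4, 2), (6, 3), (7, 2)):
    print(f"{k}+{m}: fits={fits_cluster(k + m)}, "
          f"survives one cloud down={survives_cloud_loss(k, m)}")
```

Under these assumptions, a scheme that places one fragment on every node needs at least three spare fragments to ride out the loss of a whole cloud, which is why the 7+2 example fits the cluster but does not survive a cloud outage.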
Load Balancing for Multi-Cloud
The Cloudian HyperBalance load balancers ensured that traffic was directed to the correct nodes and data centres. They also provide high availability for the storage through a combination of global server load balancing (GSLB) and Layer 7 (L7) load balancing. The diagram below shows the load balancer configuration at a high level. Learn more about Cloudian HyperBalance.
This diagram further illustrates how DNS delegation was configured.
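The two load-balancing layers can be pictured as a two-step decision: first choose a site (the GSLB role), then choose a node within that site (the L7 role). The sketch below is a simplified model of that idea. The site table, latency figures, node names and selection rules are all hypothetical, and this is not how HyperBalance is actually implemented or configured.

```python
from itertools import cycle

# Hypothetical site table: client-observed latency and per-node health.
SITES = {
    "aws": {"latency_ms": 12,
            "nodes": {"aws-1": True, "aws-2": True, "aws-3": False}},
    "gcp": {"latency_ms": 18,
            "nodes": {"gcp-1": True, "gcp-2": True, "gcp-3": True}},
    "azure": {"latency_ms": 25,
              "nodes": {"azure-1": True, "azure-2": True, "azure-3": True}},
}


def pick_site(sites):
    """Site-level step: lowest-latency site with at least one healthy node."""
    candidates = [
        (info["latency_ms"], name)
        for name, info in sites.items()
        if any(info["nodes"].values())
    ]
    if not candidates:
        raise RuntimeError("no healthy site available")
    return min(candidates)[1]


def node_rotation(sites, site):
    """Node-level step: round-robin iterator over the site's healthy nodes."""
    healthy = [name for name, ok in sites[site]["nodes"].items() if ok]
    return cycle(healthy)


site = pick_site(SITES)
rotation = node_rotation(SITES, site)
print(site, [next(rotation) for _ in range(3)])
```

If every node at the preferred site is marked unhealthy, `pick_site` falls through to the next-closest site, which is the failover behaviour that the combination of the two layers is meant to provide.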
Multi-Cloud Management and Analytics
Cloudian provides a GUI called the Cloudian Management Console (CMC). This is used for management operations and basic reporting. Its straightforward design means a short learning curve for administrators coming to the system for the first time, whilst hiding the complexities of these advanced, distributed, scale-out clusters.
Working in conjunction with the CMC is Cloudian HyperIQ Observability and Analytics software, which offers rich analytics of both the system and user behaviour. Capable of easily integrating with both HyperStore and HyperBalance, it gives a clear picture of the entire storage stack, making troubleshooting far simpler and giving valuable insights into user behaviour.