How to Use HyperStore S3 as a Velero Storage Target

by Gary Ogasawara, CTO, Cloudian

Introduction

Velero is open-source software to back up and restore Kubernetes cluster resources. Cloudian HyperStore is S3-compatible object storage software with an elastic, scalable architecture. Velero uses object storage as the storage target for backup data, writing data for new backups and reading data to restore a backup. Using the AWS S3 plugin with no changes, Velero can use HyperStore as the object storage layer. In contrast to AWS S3, HyperStore can be deployed on-premises or on public clouds, in many cases offering cost savings and operational flexibility. In the following sections, we provide a simple how-to example of using Velero (both v1.2.0 and v1.3.1 were tested) with HyperStore v7.2.

Setup

The following steps are prerequisites to using Velero and HyperStore for backup/restore.

1. HyperStore installed using the included installation wizard on top of CentOS. HyperStore can run on VMs, bare metal, or Cloudian appliances, and a free trial is available. Typically, HyperStore is installed on multiple nodes and in multiple data centers. Once installed, S3 requests can be sent to any node. In our example, we send the S3 requests to the HyperStore node at 10.10.3.102.

2. Kubernetes installed. We used minikube v1.7.2 on CentOS 8 (instructions) with Kubernetes v1.17.2.
[Figure: velero setup 1]

3. A DNS server on the cluster with the IP address of the S3 server and the S3 URL set in the ConfigMap.
[Figure: velero setup 2]

Add the following code after the "kubernetes cluster.local in-addr.arpa ip6.arpa" block:
[Figure: velero setup 3]

For example, in the figure below, 10.10.3.102 is the HyperStore IP address and s3-region1.geminimobile.com is the S3 URL.
[Figure: velero setup 4]
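As a rough sketch of step 3, the snippet below renders a DNS stanza that maps the S3 URL to the HyperStore node. It assumes the cluster DNS is CoreDNS and that its hosts plugin is used for the mapping, which the text above does not confirm; render_hosts_block is a hypothetical helper, not part of any tool.

```python
def render_hosts_block(ip_address: str, hostname: str, indent: str = "    ") -> str:
    """Return a CoreDNS `hosts` stanza mapping hostname to ip_address.

    Assumption: the cluster DNS is CoreDNS and the stanza is pasted into
    the Corefile held in the DNS ConfigMap.
    """
    lines = [
        "hosts {",
        f"{indent}{ip_address} {hostname}",
        # Pass every other name on to the next plugin instead of failing.
        f"{indent}fallthrough",
        "}",
    ]
    return "\n".join(lines)


# Values taken from the example in the text.
print(render_hosts_block("10.10.3.102", "s3-region1.geminimobile.com"))
```

The printed stanza would then be pasted into the ConfigMap by hand at the position described above.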

4. Create an S3 bucket in HyperStore where the Velero data will be stored. The bucket name is configurable, and we’ll use the name “velero”. There are multiple S3 tools that can be used to do S3 operations. In this case, we are using the AWS Command Line Interface v2.

[Figure: velero setup 5]
[Figure: velero setup 6]
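A minimal sketch of the bucket-creation step, assuming the AWS CLI is pointed at HyperStore with its --endpoint-url option and that the endpoint is reached over plain HTTP (the scheme is an assumption; the original does not state it). The helper make_bucket_command is hypothetical and only assembles the command line; it does not run it.

```python
import shlex
from typing import List


def make_bucket_command(bucket: str, endpoint_url: str) -> List[str]:
    """Build the argv for creating a bucket with the AWS CLI (`aws s3 mb`)."""
    return ["aws", "--endpoint-url", endpoint_url, "s3", "mb", f"s3://{bucket}"]


# Assumption: HyperStore answers on http:// at the S3 URL from the DNS step.
command = make_bucket_command("velero", "http://s3-region1.geminimobile.com")
print(shlex.join(command))
```

The AWS CLI must already be configured with the HyperStore access key and secret key before the printed command is run.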

Velero Installation

Now that HyperStore and Kubernetes are configured, Velero can be installed and configured.

1. Install Velero using the “Basic Install” instructions:
[Figure: velero setup 7]

2. Create a Velero-specific credentials file (credentials-velero) in the Velero directory (e.g., ~/opt/velero-v1.3.1-linux-amd64/).
[Figure: velero setup 8]

The HyperStore access key and secret key can be retrieved using the Cloudian Management Console (CMC) menus: User -> Security Credentials -> S3 Access Credentials.
[Figure: velero setup 9]

3. Then run the Velero “install” command using the AWS S3 plugin.
[Figure: velero setup 10]
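The credentials file can be sketched as below, assuming it follows the AWS shared-credentials layout (a [default] section with aws_access_key_id and aws_secret_access_key) that the Velero AWS plugin reads. The helper names and the placeholder key values are hypothetical; substitute the keys shown in the CMC.

```python
import os


def render_credentials(access_key: str, secret_key: str) -> str:
    """Return the text of a credentials-velero file in AWS credentials format."""
    return (
        "[default]\n"
        f"aws_access_key_id={access_key}\n"
        f"aws_secret_access_key={secret_key}\n"
    )


def write_credentials(path: str, access_key: str, secret_key: str) -> None:
    """Write the file with owner-only permissions, since it holds a secret."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(render_credentials(access_key, secret_key))


# Placeholder values only; real keys come from the CMC.
print(render_credentials("<HYPERSTORE_ACCESS_KEY>", "<HYPERSTORE_SECRET_KEY>"))
```

Calling write_credentials("credentials-velero", ...) from the Velero directory would produce the file referenced in step 2.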

Options:

--provider. “aws” is used for HyperStore.

--plugins. “velero-plugin-for-aws” is designed for AWS S3, but works with HyperStore without changes.

--bucket. The name of the bucket previously created in HyperStore where backups will be stored.

--secret-file. The file path of the AWS credentials (access key, secret key) created earlier.

--use-volume-snapshots. The default is true, so set it to false to avoid triggering persistent volume snapshots.

--backup-location-config. Key-value pairs that are specific to the AWS plugin.
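To show how the options above fit together, here is a sketch that assembles a velero install command line. The region name, the plugin image tag, and the http:// endpoint are assumptions chosen for illustration, as is the use of the AWS plugin's region, s3ForcePathStyle, and s3Url settings for --backup-location-config; velero_install_command is a hypothetical helper that only builds the command.

```python
import shlex
from typing import List


def velero_install_command(
    bucket: str,
    secret_file: str,
    s3_url: str,
    region: str,
    plugin_image: str,
) -> List[str]:
    """Build the argv for `velero install` against an S3-compatible endpoint."""
    # Path-style addressing keeps the bucket name out of the hostname,
    # which is the usual choice for non-AWS S3 endpoints.
    location_config = f"region={region},s3ForcePathStyle=true,s3Url={s3_url}"
    return [
        "velero", "install",
        "--provider", "aws",
        "--plugins", plugin_image,
        "--bucket", bucket,
        "--secret-file", secret_file,
        "--use-volume-snapshots=false",
        "--backup-location-config", location_config,
    ]


command = velero_install_command(
    bucket="velero",
    secret_file="./credentials-velero",
    s3_url="http://s3-region1.geminimobile.com",  # assumption: plain HTTP
    region="region1",                             # assumption: region name
    plugin_image="velero/velero-plugin-for-aws:v1.0.1",  # assumption: tag
)
print(shlex.join(command))
```

The plugin image tag should be matched to the Velero release in use; the value here is only a placeholder.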

Test Backup and Restore

The Velero tarball includes a useful examples/nginx-app directory to do a basic test of the backup and restore procedure.

1. Deploy the example nginx application.
[Figure: velero install 11]

2. Confirm that both the velero and nginx-example deployments are successfully created:
[Figure: velero install 12]

3. Create a backup for any object that matches the app=nginx label selector:
[Figure: velero install 13]

The above figure shows a Wireshark view of the S3 PUT requests that are writing backup data to HyperStore.

After running this command, a folder named backups containing the backup data appears in the HyperStore velero bucket.

[Figure: velero setup 15]

Also, you can use the Velero CLI to view the backups.
[Figure: velero setup 16]

4. To test a restore from a backup, we first delete the existing nginx-example namespace:
[Figure: velero setup 17]
and then restore from the previously created backup:
[Figure: velero setup 19]
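The four test steps above can be sketched as an ordered list of commands. The backup name nginx-backup and the manifest file name base.yaml inside examples/nginx-app are assumptions; the helpers are hypothetical, and run_steps only prints the commands unless dry_run is switched off.

```python
import shlex
import subprocess
from typing import List

BACKUP_NAME = "nginx-backup"  # assumption: any valid backup name works


def backup_restore_steps(backup_name: str = BACKUP_NAME) -> List[List[str]]:
    """Return the commands for the deploy, backup, delete, restore cycle."""
    return [
        # 1. Deploy the example application (manifest name is an assumption).
        ["kubectl", "apply", "-f", "examples/nginx-app/base.yaml"],
        # 2. Confirm both deployments exist.
        ["kubectl", "get", "deployments", "--namespace=velero"],
        ["kubectl", "get", "deployments", "--namespace=nginx-example"],
        # 3. Back up everything labelled app=nginx, then list the backups.
        ["velero", "backup", "create", backup_name, "--selector", "app=nginx"],
        ["velero", "backup", "get"],
        # 4. Delete the namespace and restore it from the backup.
        ["kubectl", "delete", "namespace", "nginx-example"],
        ["velero", "restore", "create", "--from-backup", backup_name],
    ]


def run_steps(steps: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it as well when dry_run is False."""
    for argv in steps:
        print("$ " + shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


run_steps(backup_restore_steps())
```

In a real run it is worth waiting for the backup to report completion before deleting the namespace.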

Use Cases

The primary use cases for the combination of Velero and HyperStore are the same as with any object storage: backing up and restoring data, and migrating data from one Kubernetes cluster to another cluster at a different location. In addition, some different use cases result from HyperStore being able to run as on-premises software; for example, being on-premises is important if data must be kept private and secure. We are also investigating combining Velero with additional HyperStore S3 features such as access control by bucket or IAM policies, Object Lock, encryption, and lifecycle management, including tiering objects to another storage system and automatically deleting old objects.

A Century of Healthcare Data

The need for a lasting storage infrastructure

By Neil Stobart, CTO, Cloudian

This post was originally written for HPE TECHnative.

The ongoing data deluge is affecting virtually all industries, but perhaps nowhere is this issue having more impact than in healthcare.

Due to a combination of factors, the use of and reliance on data is growing exponentially within the healthcare industry. The rise of the Internet of Things (IoT) is proving to be one of the biggest drivers, with connected devices now being used to collect patient data like never before in order to provide improved diagnoses, specialist treatments, and proactive care.

And things aren’t going to slow down – advanced modes of medical care continue to become more prominent and larger file types are becoming the norm. A 2018 report from analyst firm IDC predicts that the volume of data being collected will increase faster in healthcare than in any other sector, with the ‘datasphere’ of the healthcare sector set to experience a Compound Annual Growth Rate (CAGR) of 36% through to 2025. In comparison, manufacturing, financial services, and media and entertainment are expected to grow at 30%, 26% and 25% respectively.

With this all in mind, it’s clear that the collection, storage, and management of data will be central to the industry’s future. But what issues will this present and how can healthcare institutions ensure data is stored in an efficient and cost-effective way?

100 years of data

In addition to the sheer volume of data being generated, healthcare organisations are likely to come up against a number of other related challenges over the next few years and beyond.

Firstly, there is the length of time that patient data will have to be preserved. People are now living longer than ever before, and current UK legislation states that GP records must be retained for ten years after the death of a patient. This means healthcare data being created today may need to be kept on file for up to 100 years or more.

Secondly, the rate of technological development means this data may also have to be migrated between formats multiple times over its lifespan – which is both labour-intensive and expensive. Data storage technology and organisational priorities will continue to evolve, while the data itself will typically come from various sources. As a result, healthcare organisations will face a huge amount of complexity when it comes to preserving data and making it accessible, on top of the growing costs involved as data scales.

Medical organisations, therefore, need to ensure that their storage infrastructure provides the highest possible scalability, flexibility and portability – especially with data volumes becoming so vast that just migrating data from one format or provider to another can require significant investment. Healthcare providers need to take their long-term data requirements into consideration to ensure they put themselves in the best position – both financially and operationally – to respond to future changes.

Time for a storage health exam

What’s becoming apparent is that modern healthcare organisations require next-generation storage solutions that can connect to various systems in a flexible, scalable and cost-effective manner in order to future-proof their infrastructures.

This is where object storage makes all the difference. Modular object storage platforms provide the seamless scalability required in today’s data-driven world, enabling healthcare organisations to accommodate hundreds of petabytes of unstructured data by adding nodes whenever extra capacity is needed – all without impacting management complexity. Organisations can start small and grow across on-premises or private cloud infrastructure as required, without compromising operations or incurring the dramatic cost increases that can arise with accessing data from the public cloud.

These capabilities will be vital for healthcare organisations in the years to come, enabling them to make use of increasing data volumes from the likes of PET scans, MRIs and X-rays while gaining cost-efficiencies.

Object storage also has a key role to play in the long-term preservation of data through user-defined metadata tagging. Whereas traditional block storage has very limited metadata capabilities, object storage enables users to add rich metadata tags, making it much easier to organise, identify and retrieve data. Metadata adds structure to previously unstructured data, which makes it less time-consuming for healthcare professionals to manage, search and share the information they collect.

Another key consideration for the highly regulated healthcare industry is data protection. Object storage systems are designed to offer greater levels of data resiliency and redundancy than traditional block and file storage, without impacting cost-efficiency. Encryption keeps the data protected against unauthorised access, helping to minimise the risk of a data breach and supporting regulatory compliance.

A final consideration is compatibility with the Amazon Simple Storage Service (S3) API. S3-compatible storage has quickly become the de-facto standard for not only public cloud but also private cloud deployments, offering significant cost savings and providing the low latency, high bandwidth performance now required. Deploying an S3-compatible object storage solution will, therefore, boost interoperability and futureproof healthcare providers as storage technologies and requirements continue to evolve.

Staying on top of data storage trends is clearly no mean feat – particularly in the healthcare sector where the data revolution is in full swing. As the volume of data being collected continues to grow and its strategic value continues to increase, healthcare organisations need a storage platform that puts scalability, portability and cost-efficiency front and centre.

Learn more about Cloudian healthcare solutions

 

VMware Cloud Director and Cloudian: A Closer Look at the Integration

Cloudian and VMware now offer an integrated solution that gives all vCloud Director service providers and their customers/tenants a seamless way to use Cloudian HyperStore object storage.

Read the overview

Read the datasheet

View the VMware lightboard video

The integrated solution for the first time brings S3 API support and Cloudian object storage to VMware vCloud Director environments.

The solution combines the power of:

  • VMware Cloud Director — a leading cloud service-delivery platform used by thousands of cloud providers to operate and manage successful cloud-service businesses
  • Cloudian HyperStore — an S3 API-based, infinitely scalable, durable and multi-tenant cloud object storage platform used by customers worldwide to address their ever-growing storage capacity needs

Now, cloud providers can deliver new S3-compatible storage and other high-value services to enterprises and IT teams across the world.

Under the Hood

So let’s dig a little deeper to better understand what this partnership and integrated solution offer. Every IT team has cloud on their mind and with vCloud Director, VMware is leading the charge by powering a network of thousands of cloud providers who guide their customers’ journey from on-premises to private cloud, hybrid cloud, or even multi-cloud roll out.

What was missing was a scalable, cost-effective storage layer. This is now addressed with the release of Object Storage Extension (OSE) and the integration of Cloudian HyperStore with VMware Cloud Director. The VMware Cloud Director admin can install OSE — just like they would install any other extension — which allows them to integrate and manage Cloudian HyperStore via the VMware Cloud Director admin portal. The VMware Cloud Director admin can also leverage SSO to sign on to the Cloudian management console to set up and configure a Cloudian HyperStore cluster.

VMware Cloud Director creates virtual data centers with elastic pools of cloud resources that are seamless to provision and easy to consume. It creates a fluid hybrid cloud fabric between an on-premises infrastructure and a cloud service provider, offering a best-in-class private/hybrid cloud with on-demand elasticity, streamlined on-ramp, native security, and hybridity.

Deep Integration for Seamless Management

This integration is not just about offering S3 API-based storage. It’s fully integrated management. Now, a VMware Cloud Director admin can centrally manage, monitor and consume Cloudian HyperStore just like they would any other storage resource, such as vSAN. This integration covers three areas:

  1. Data APIs: S3 APIs have become the de facto language of cloud storage. Cloudian has a fully native implementation of the S3 APIs, which means we have the industry’s most compliant S3 API solution. This is key because a service provider that wants to build services on the S3 API needs to support the full range of S3 API features, such as multipart upload (MPU), Signature Version 4 (Sig V4), and tagging. Cloud service providers don’t have visibility into customers’ applications or the S3 API calls they use. Not supporting certain S3 API operations results in poor customer satisfaction and higher support costs, thereby impacting profit. Cloudian offers the highest S3 API support, ensuring the best customer experience.
  2. Object Storage Features: VMware Cloud Director is a multi-tenant framework, a key component of a VMware Cloud Provider platform. So, for a storage solution to seamlessly fit into that framework it must be securely sharable, and limitlessly scalable. Cloudian is a scale-out platform that offers multi-tenancy, QoS, geo-distribution, global namespace, integrated billing and reporting. It is cloud provider-ready.
  3. Control Plane APIs: Most important are the Control Plane APIs that allow the VMware Cloud Director admin to seamlessly manage, operate and report from a central VMware Cloud Director portal. It allows VMware Cloud Director tenants to self-service their environment – create users, buckets, assign policies and provide reports at a granular level.

With these, cloud providers can deploy and manage profitable, high-value services in use cases such as:

  • Storage-as-a-Service (STaaS)
  • Backup-as-a-Service (BaaS)
  • Archive-as-a-Service (AaaS)
  • Disaster-Recovery-as-a-Service (DRaaS)
  • Big Data-as-a-Service (BDaaS)
  • Containers-as-a-Service (CaaS)
  • Software Test/Dev

View the demo

S3 Compatible Storage Solutions Compared

S3 Compatible Storage, On-Prem

Today’s emerging on-prem enterprise storage medium is S3 compatible storage. Initially used only in the cloud, S3 storage is now being extended to on-prem and private cloud deployments.

The term “S3 compatible” means that the storage employs the S3 API as its “language.” Applications that speak the S3 API should be able to plug and play with S3 compatible storage.

A growing number of applications now support this storage type, thus benefitting from its unique attributes:

  • Scale: Designed to grow limitlessly within a single namespace
  • Geo-distribution: A single storage system can span multiple sites
  • Cost: Purpose-built to run on industry-standard servers, thus benefitting from the volume and efficiencies of that industry
  • Reliable data transport: The only storage type invented in the age of the Internet, S3-compatible storage is built to manage and move massive data volumes over WANs

Cloudian specializes in S3-compatible storage, but other examples of applications and devices that now employ S3 are Rubrik, Veeam, Commvault, Splunk, Pure Storage, Adobe, VERITAS, Hadoop, NetApp, EMC, Komprise, and more.

This is part of an extensive series of articles about S3 Storage.

Clarifying the Terms

But what is S3-compatible storage? This storage type goes by multiple names and can also be called:

Object storage: The underlying technology for S3 compatible storage is object storage. Over the years, multiple APIs have been used to access object storage, but the S3 API is now the most common.

Cloud storage: Most large-scale cloud storage today is object storage, and most of it employs the S3 API. There are multiple ways of referring to essentially the same thing: S3-compatible storage.

Benefits of S3 Compatible Storage On-Prem

There are 5 key reasons to deploy S3 compatible storage in your data center:

  1. Scale: S3-compatible solutions are designed to scale in a single namespace, and without disruption, to an exabyte. Grow your storage without adding workload.
  2. 70% less cost than public cloud: With industry-standard hardware, these solutions deliver the greatest value: less cost per GB and higher density. Also, no ingress/egress fees.
  3. Performance: Hardware is in your data center for low latency and high bandwidth.
  4. Control: Data is behind your firewall, so you consistently apply security and control access.
  5. Cloud compatibility: S3 is compatible with cloud storage, so you can employ cloud when you need it, without disruption. Capitalize on the growing ecosystem of S3 compatible applications. Seamlessly move data and applications from on-prem to cloud.

The S3 API

S3 compatible storage is built on the Amazon S3 Application Programming Interface, better known as the S3 API, the most common way in which data is stored, managed, and retrieved by object stores. Originally created for the Amazon Simple Storage Service, or S3 (read about the API here), the widely adopted S3 API is now the de facto standard for object storage, employed by vendors and cloud providers industry-wide.

Not All S3 Compatible Storage APIs Are Equal

Compared with established file protocols such as NFS, the S3 API is relatively new and rapidly evolving. Among object storage vendors, S3 API compliance varies from below 50% to over 90%. This difference becomes material when an application, or an updated version of that application, fails due to S3 API incompatibility.

Cloudian is the only object storage solution to exclusively support the S3 API. Launched in 2011, Cloudian’s many years of S3 API development translate to the industry’s highest level of compliance.

Employing the S3 API makes an object storage solution flexible and powerful for three reasons:

1) Standardization in S3 Compatible Storage

With Cloudian, any object written using the S3 API can be used by other S3-enabled applications and object storage solutions; the existing code works out of the box.

S3 compatible storage software

2) Maturity 

The S3 API provides a wide variety of features that meet virtually every need for an object store. End users planning to deploy object stores can access the plentiful resources of the S3 community — both individuals and companies.

3) Rich Feature Set

The S3 API is the only storage “language” created in the era of the internet. The other common storage protocols (SMB and NFS) were created prior to the internet’s meteoric growth, and therefore did not factor in the needs of this infrastructure. As a result, only the S3 API includes features such as multi-part upload that make it easy to reliably transfer large files over dodgy WAN links.
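To make the multi-part upload idea concrete, the sketch below splits an object into numbered parts that could each be sent, and retried, independently. The 5 MiB minimum part size and 10,000-part ceiling are the limits published for Amazon S3; whether a given S3-compatible store enforces the same limits is an assumption, and plan_parts is a hypothetical helper that plans the parts without uploading anything.

```python
from typing import List, Tuple

MIN_PART_SIZE = 5 * 1024 * 1024  # Amazon S3 minimum for every part but the last
MAX_PARTS = 10_000               # Amazon S3 maximum number of parts per upload


def plan_parts(object_size: int, part_size: int = 8 * 1024 * 1024) -> List[Tuple[int, int, int]]:
    """Return (part_number, offset, length) tuples covering object_size bytes."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    # Double the part size until the object fits within the part limit.
    while -(-object_size // part_size) > MAX_PARTS:
        part_size *= 2
    parts = []
    offset = 0
    number = 1
    while offset < object_size:
        length = min(part_size, object_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


# A 20 MiB object with the default 8 MiB part size.
for part in plan_parts(20 * 1024 * 1024):
    print(part)
```

Because each part is uploaded separately, a failure on an unreliable link costs only one part's worth of retransmission rather than the whole file.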

 

The Cloudian Difference

Among the S3 compatible storage vendors, only Cloudian HyperStore was built from the start on the S3 API.

Cloudian S3 compatible storage API is designed into the Cloudian storage layer

 

Translation Layers Introduce Potential Compatibility Challenges

Competitive solutions employ a translation layer (or some sort of “access layer” or software gateway), which introduces the risk of compatibility challenges. Cloudian has no translation layer, hence we refer to it as “S3 Native.”

Translation layer leads to incompatibility

Cloud Storage in the Data Center

The combination of object storage and the de facto language standard now creates the option for cloud-connected storage in the data center. For the cloud, AWS has set the standard with the S3 Storage Service. Now data center managers can capitalize on that identical set of capabilities in their own data center with Cloudian S3 compatible storage.

See the S3 API at Work

The City of Montebello uses the S3 API as a mechanism for streaming live video from buses to a central monitoring facility where it is recorded and stored with metadata to assist with search.

S3 API & Extensions for Enterprise Object Storage

Amazon’s S3 API is the de facto standard for object storage APIs. Having multiple service providers, software providers, and applications standardize on S3 has made it easier to interchange between them and rapidly stand up new uses for object storage. But there are different grades of S3 compatibility. Some software and solutions provide only the basic CRUD (create, read, update, delete) functions. At the other end is Cloudian’s HyperStore, committed to providing the highest fidelity S3 compatibility backed by a guarantee.

The S3 API is an HTTP/S REST API where all operations are via HTTP PUT, POST, GET, DELETE, and HEAD requests. Each object is stored in a bucket. Beyond the basic object CRUD operations provided by S3, there are many advanced APIs like versioning, multi-part upload, access control list, and location constraint. There are multiple options for encryption including (1) server-side encryption where the server manages encryption keys, (2) server-side encryption with customer keys, and (3) client-side encryption where the data is encrypted/decrypted at the client side. Though no single S3 user is likely to use all of the advanced APIs, the union of APIs used by different users quickly covers them all. The table below highlights some advanced object storage APIs supported by S3:

S3 Feature                                  Azure   Google Cloud   OpenStack Swift
Object versioning                           No      Yes            Yes
Object ACL                                  No      Yes            No
Bucket Lifecycle Expiry                     No      Yes            Yes
Multi-object delete                         No      Yes            Yes
Server-side encryption                      No      Yes            Yes
Server-side encryption with customer keys   No      No             No
Cross-region replication                    Yes     No             Yes
Website                                     No      No             No
Bucket logging                              No      No             No
POST object                                 No      No             No

Table 1 – Comparison of some S3 advanced object storage APIs[1]
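As an illustration of the first two encryption options described above, the sketch below builds the request headers that Amazon S3 documents for server-side encryption with server-managed keys and with customer-provided keys. The helper names are hypothetical, and support for these headers on any particular S3-compatible store is an assumption to verify. Client-side encryption, the third option, needs no special headers because the data is already encrypted before the request is made.

```python
import base64
import hashlib
import os
from typing import Dict


def sse_s3_headers() -> Dict[str, str]:
    """Headers asking the server to encrypt with keys it manages itself."""
    return {"x-amz-server-side-encryption": "AES256"}


def sse_c_headers(key: bytes) -> Dict[str, str]:
    """Headers for server-side encryption with a customer-provided key."""
    if len(key) != 32:
        raise ValueError("a 256-bit (32-byte) key is expected")
    encoded_key = base64.b64encode(key).decode("ascii")
    # The MD5 digest lets the server detect a key corrupted in transit.
    key_digest = base64.b64encode(hashlib.md5(key).digest()).decode("ascii")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": encoded_key,
        "x-amz-server-side-encryption-customer-key-MD5": key_digest,
    }


# A random key for demonstration; a real key must be stored by the customer,
# because the server does not keep it.
for name, value in sse_c_headers(os.urandom(32)).items():
    print(f"{name}: {value}")
```

These headers would accompany a PUT or GET object request sent over HTTPS.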

S3 API compatibility is a prerequisite, but not sufficient to provide object storage for enterprises. There are 4 additional areas that Cloudian has added to make S3 object storage enterprise-ready.

 

  1. Software or Appliance, not a service. The software-only package includes a Puppet-based installer with a wizard-style interface. It runs on commodity software (CentOS/Red Hat) and commodity hardware. The appliances come in a few fixed models ranging from a 1U model (24 TB) to the PB-scale FL3000 series in an 8U form factor.
  2. APIs for all functions
    • Configuration
    • Multi-Tenancy: User/Tenant provisioning
    • Quality of Service (QoS)
    • Reporting
    • S3 Extensions: Compression, Metadata APIs, Per-bucket Protection Policies.

    Highlighting the per-bucket protection policies feature, each bucket can have its own protection policy. For example, a “UK3US2” policy can be defined as a UK data center with 3 replicas and a US data center with 2 replicas. Another example is an “ECk6m2” policy, defined as DC1 using erasure coding with 6 data and 2 coding fragments. As buckets are created, they can be assigned a policy.

Bucket
Figure 1 – Per-bucket protection policies example
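The raw-capacity cost of the two example policies can be worked out with a little arithmetic. The sketch below assumes that a replication policy stores one full copy per replica and that an erasure-coding policy with k data and m coding fragments stores (k + m) / k bytes per logical byte; the function names are hypothetical.

```python
from typing import Dict


def replication_overhead(replicas_per_dc: Dict[str, int]) -> float:
    """Raw bytes stored per logical byte under a replication policy."""
    return float(sum(replicas_per_dc.values()))


def erasure_coding_overhead(data_fragments: int, coding_fragments: int) -> float:
    """Raw bytes stored per logical byte under a k + m erasure-coding policy."""
    return (data_fragments + coding_fragments) / data_fragments


# "UK3US2": 3 replicas in the UK data center plus 2 in the US data center.
print(replication_overhead({"UK": 3, "US": 2}))   # 5.0

# "ECk6m2": 6 data fragments plus 2 coding fragments.
print(round(erasure_coding_overhead(6, 2), 2))    # 1.33
```

Under these assumptions, the replicated policy stores five times the logical data across two sites, while the erasure-coded policy stores about 1.33 times and can lose any 2 of its 8 fragments.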

  3. O&M tools to install, monitor, and manage. In addition to the installer, a single-pane, web-based Cloudian Management Console (CMC) handles system administration from the perspective of the system operator, a tenant/group administrator, and a regular user. It’s used to provision groups and users, view reports, and manage and monitor the cluster.

Cloudian Management Console

Figure 2 – CMC dashboard

  4. Integration with Other Products
    • NFS/CIFS file interface
    • OpenStack, CloudPlatform
    • Tiering to any S3 system (public or private).
    • Active Directory, LDAP

The opportunity and use case for enterprises and object storage has never been more compelling. Amazon S3 API compatibility ensures full portability of already working applications. Using Cloudian’s HyperStore platform instead of AWS, enterprise data can be brought on-premises for better data security and manageability at lower cost. For STaaS providers, S3 API compatibility, backed by a full guarantee, provides the same benefits of a fully controlled storage platform, and opens up a large range of compatible applications. Beyond the S3 API, Cloudian is committed to providing all operations by API and has added APIs to make the platform enterprise-ready, including multi-tenancy.

If you would like a technical overview, you can check out this webinar I recently presented, “S3 Technical Deep Dive” and make sure to check out more information on our S3 Guarantee…we’ll run all your S3 Apps anytime and anywhere – Guaranteed!

– Gary


[1] References:
http://docs.openstack.org/developer/swift/#object-storage-v1-rest-api-documentation
https://cloud.google.com/storage/docs/xml-api-overview
https://msdn.microsoft.com/en-us/library/azure/dd135733.aspx