Object Storage Bucket-Level Auto-Tiering with Cloudian

As discussed in my previous blog post, ‘An Introduction to Data Tiering’, there is huge value in using different storage tiers within a data storage architecture to ensure that your different data sets are stored on the appropriate technology. Now I’d like to explain how the Cloudian HyperStore system supports object storage ‘auto-tiering’, whereby objects can be automatically moved from local HyperStore storage to a destination storage system on a predefined schedule based upon data lifecycle policies.

Cloudian HyperStore can be integrated with any of the following destination cloud storage platforms as a target for tiered data:

  • Amazon S3
  • Amazon Glacier
  • Google Cloud Platform
  • Any Cloud service offering S3 API connectivity
  • A remotely located Cloudian HyperStore cluster

Granular Control with Cloudian HyperStore

For any data storage system, granularity of control and management is extremely important – data sets often have varying management requirements, with the need to apply different Service Level Agreements (SLAs) appropriate to the value of the data to an organisation.

Cloudian HyperStore provides the ability to manage data at the bucket level, giving granular control over SLAs and management (note: a “bucket” is an S3 data container, similar to a LUN in block storage or a file system in NAS systems). HyperStore provides the following control parameters at the bucket level (a brief API sketch follows the list):

  • Data protection – Select from replication or erasure coding of data, plus single or multi-site data distribution
  • Consistency level – Control of replication techniques (synchronous vs asynchronous)
  • Access permissions – User and group control access to data
  • Disaster recovery – Data replication to public cloud
  • Encryption – Data at rest protection for security compliance
  • Compression – Reduction of the effective raw storage used to store data objects
  • Data size threshold – Variable storage location of data based upon the data object size
  • Lifecycle policies – Data management rules for tiering and data expiration
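
Because these controls are exposed through the S3 API, many of them can also be driven programmatically. As a minimal sketch (the endpoint, credentials, and bucket name below are placeholders, and the same control is available through the HyperStore console), default encryption at rest might be enabled with the AWS SDK for Python (boto3) like this:

```python
import boto3

# Placeholder endpoint and credentials for a local HyperStore cluster,
# which exposes an S3-compatible API.
s3 = boto3.client(
    "s3",
    endpoint_url="https://hyperstore.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Require AES-256 server-side encryption for all new objects in the bucket.
s3.put_bucket_encryption(
    Bucket="projects",  # example bucket name
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    },
)
```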

Cloudian HyperStore manages data tiering via lifecycle policies.

Auto-tiering is configurable on a per-bucket basis, with each bucket allowed different lifecycle policies based upon rules (an example API call follows the list). These rules specify:

  1. Which data objects to apply the lifecycle rule to. This can include:
  • All objects in the bucket
  • Objects whose names start with a specific prefix (such as “Meetings/2015/”)
  2. The tiering schedule, which can be specified using one of three methods:
  • Move objects X number of days after they’re created
  • Move objects if they go X number of days without being accessed
  • Move objects on a fixed date (such as December 31, 2016)
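
In S3 API terms, a rule like this is expressed as a bucket lifecycle configuration. Below is a minimal sketch using boto3, assuming a hypothetical bucket named “projects” and a Glacier-style transition target (the exact storage class HyperStore expects depends on the tiering destination configured for the cluster):

```python
import boto3

# Placeholder endpoint for a local HyperStore cluster.
s3 = boto3.client("s3", endpoint_url="https://hyperstore.example.com")

# Tier objects under the "Meetings/2015/" prefix 90 days after creation.
s3.put_bucket_lifecycle_configuration(
    Bucket="projects",  # example bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-old-meetings",
                "Filter": {"Prefix": "Meetings/2015/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```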

When a data object is tiered, a small stub object is retained on the HyperStore cluster. The stub acts as a pointer to the actual data object, so the object still appears as if it’s stored in the local cluster. To the end user, there is no change in how data is accessed, but the object does display a special icon denoting that the data object has been moved.

For auto-tiering to a cloud provider such as Amazon or Google, an account is required along with the associated access credentials.

Accessing Data After Auto-Tiering

Objects that have been auto-tiered to public cloud services can be accessed either directly through the public cloud platform (using the applicable account and credentials) or via the local HyperStore system. There are three options for retrieving tiered data (a short code sketch of all three paths follows below):

  1. Restoring objects – When a user accesses a data object, they are directed to the local stub file held on HyperStore, which then redirects the request to the actual location of the data object on the tiered target platform.

A copy of the data object is restored from the tiered storage back to a local HyperStore bucket, and the user request is served from that local copy. A time limit can be set for how long the retrieved object is retained locally before it returns to the secondary tier.

This is considered the best option when data is accessed relatively frequently and you want to avoid both the performance impact of traversing the internet and any access/retrieval costs applied by service providers. Storage capacity must be managed on the local HyperStore cluster to ensure that there is sufficient “cache” for object retrievals.

  2. Streaming objects – Streams data directly to the client without first restoring it to the local HyperStore cluster. When the file is closed, any modifications are made to the object in situ in the tiered location. Any metadata modifications are updated both in the local HyperStore database and on the tiered platform.

This is considered the best option when data is accessed relatively infrequently and the storage capacity of the local HyperStore cluster is a concern. Performance will be lower, however, as data requests traverse the internet, and the service provider may apply access costs every time the file is read.

  3. Direct access – Objects auto-tiered to public cloud services can be accessed directly by another application or via your standard public cloud interface, such as the AWS Management Console. This method bypasses the HyperStore cluster entirely. Because objects are written to the cloud using the standard S3 API and include a copy of the object’s metadata, they can be referenced directly.

Storing objects in this openly accessible manner — with co-located rich metadata — is useful in several instances:

  1. A disaster recovery scenario where the HyperStore cluster is not available
  2. Facilitating data migration to another platform
  3. Enabling access from a separate cloud-based application, such as content distribution
  4. Providing open access to data, without reliance on a separate database to provide indexing
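
As referenced above, here is a minimal sketch of what the three retrieval paths might look like through the standard S3 API, using the AWS SDK for Python (boto3). All endpoint, bucket and key names are hypothetical, and this assumes the cluster honors the standard S3 restore operation; whether a plain GET triggers a restore or a streamed read is governed by the bucket’s configuration:

```python
import boto3

# Hypothetical endpoint, bucket, and key names; substitute your own.
local = boto3.client("s3", endpoint_url="https://hyperstore.example.com")

# 1) Restoring objects: ask the cluster to copy the tiered object back to
#    local storage and retain the restored copy for 7 days before it
#    returns to the secondary tier.
local.restore_object(
    Bucket="projects",
    Key="Meetings/2015/notes.docx",
    RestoreRequest={"Days": 7},
)

# 2) Streaming objects: a plain GET against the local endpoint; the cluster
#    streams the bytes through from the cloud tier without restoring them.
resp = local.get_object(Bucket="projects", Key="Meetings/2015/notes.docx")
with open("notes.docx", "wb") as f:
    for chunk in resp["Body"].iter_chunks(chunk_size=64 * 1024):
        f.write(chunk)

# 3) Direct access: bypass HyperStore entirely and read the tiered copy,
#    including its co-located metadata, straight from the public cloud.
aws = boto3.client("s3")  # standard AWS credentials and endpoint
head = aws.head_object(
    Bucket="tiered-destination-bucket",  # placeholder destination bucket
    Key="Meetings/2015/notes.docx",
)
print(head["Metadata"])
```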

HyperStore provides great flexibility for leveraging hybrid cloud deployments where you get to set the policy on which data is stored in a public or private cloud. Learn more about HyperStore here.

 

An Introduction to Data Tiering

Not all data is equal: frequency of access, security needs and cost considerations vary, so data storage architectures need to provide different storage tiers to address these varying requirements. Storage tiers may differ by disk drive type, RAID configuration or even completely different storage sub-systems, each offering a different I/O profile and cost impact.

Data tiering allows the movement of data between different storage tiers, which allows an organization to ensure that the appropriate data resides on the appropriate storage technology. In modern storage architectures, this data movement is invisible to the end-user application and is typically controlled and automated by storage policies. Typical data tiers may include:

  1. Flash storage – High value, high-performance requirements, usually smaller data sets; cost is less important compared to the required performance Service Level Agreement (SLA)
  2. Traditional SAN/NAS storage arrays – Medium value, medium performance, medium cost sensitivity
  3. Object storage – Less frequently accessed data with larger data sets; cost is an important consideration
  4. Public cloud – Long-term archival for data that is rarely, if ever, accessed

Typically, structured data sets belonging to applications/data sources such as OLTP databases, CRM, email systems and virtual machines are stored on tiers 1 and 2 above. Unstructured data is increasingly moving to tiers 3 and 4, as these are typically much larger data sets where performance is less critical and cost becomes a more significant factor in management and purchasing decisions.

Some Shortcomings of Data Tiering to Public Cloud

Public cloud services have become an attractive data tiering solution, especially for unstructured data, but there are considerations around public cloud use:

  1. Performance – Public network access will typically be a bottleneck when reading and writing data to public cloud platforms, as will data retrieval times (based on the SLA provided by the cloud service). For backup data especially, backup and recovery windows remain critically important, so it is worth holding the most relevant backup sets onsite and archiving only older backup data to the cloud.
  2. Security – Certain data sets/industries have regulations stipulating that data must not be stored in the cloud. Being able to control what data is sent to the cloud is of major importance.
  3. Access patterns – Data that is re-read frequently may incur additional network bandwidth costs imposed by the public cloud service provider (for example, at an illustrative egress price of $0.09/GB, re-reading 10 TB each month would add roughly $900 per month). Understanding how your data is used is vital to controlling the costs associated with data downloads.
  4. Cost – As well as the bandwidth costs associated with reading data, storing large quantities of data in the cloud may not make the most economical sense, especially when compared to the economics of on-premises cloud storage, so the two should be evaluated against each other.

Using Hybrid Cloud for a Balanced Data Tier Strategy

For unstructured data, a hybrid approach to data management is key; an automation engine, data classification and granular control of data are the necessary requirements to really deliver on this premise.

With a hybrid cloud approach, you can push any data to the public cloud while also affording you the control that comes with on-premises storage. For any data storage system, granularity of control and management is extremely important as different data sets have different management requirements with the need to apply different SLAs as appropriate to the value of the data to an organization.

Cloudian HyperStore is a solution that gives you that flexibility for easily moving between data tiers 3 and 4 listed earlier in this post. Not only do you get the control and security from your data center, you can integrate HyperStore with many different destination cloud storage platforms, including Amazon S3/Glacier, Google Cloud Platform, and any other cloud service offering S3 API connectivity.

Learn more about our solutions today.


Better Backup With the Software You Already Have

You know the challenges of the backup process. Veritas and Commvault are good products, but backup is still a chore. Your three choices for a backup target all have challenges: Tape is troublesome, disk is expensive, and backup to the cloud is slow.

How to save cost, reduce stress, and keep using the software you already know

The New Backup Target: Hybrid Cloud

As an IT manager, you pick the best solution you can afford, but you’re often forced to make compromises along the way. Too often, the result is busted backup windows and unmet RTO and RPO SLAs, not to mention hours of wasted time and accumulated stress.

Now there’s a fourth backup target option: hybrid cloud (see the Backup Solutions Note).

Hybrid cloud as a target gives you a faster, more reliable, lower cost process — free of capacity constraints. It works right now with the software you already know. And you can get started at zero upfront cost.

How the Hybrid Cloud Helps

Hybrid cloud integrates an on-premises disk-based target with a cloud-based target. Both the on-prem storage and cloud storage use the same interface and are managed as a single storage pool.

Their respective functions are:

  • On-prem target: Fast disk backup. Provides predictable backup times and ensures immediate access to meet RTO/RPO SLAs
  • Public cloud target: DR repository; low-cost and offsite, it provides the ideal long-term archive, plus overflow capacity for limitless scalability

Works with Existing Backup Software

Backup procedures are proven through years of development. And you know well the software you have. The hybrid cloud approach leverages all of that investment and learning by preserving your existing processes.

To the backup software, the hybrid cloud appears exactly as cloud storage. (Connectors to Amazon S3 and other services are now available with most popular backup software.)

With hybrid cloud, that connector is simply directed at the on-prem storage. The on-prem storage then connects to the cloud. The two are managed as a single, limitlessly scalable storage pool.

The on-prem S3-compatible storage is then directed at the S3 public cloud for data tiering purposes. The most recent backups — i.e., the ones you’re most likely to use — are kept on-prem. The older copies are migrated to the cloud.
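
To make the wiring concrete, here is a minimal sketch (the endpoint and bucket names are hypothetical): the same S3 client code the backup software would use against the public cloud is simply pointed at the on-prem cluster instead, which is what makes the swap a drop-in change:

```python
import boto3

# The only change from "backup to cloud" is the endpoint the S3 connector
# points at (all names here are placeholders).
target = boto3.client("s3", endpoint_url="https://cloudian.backup.example.com")

# The backup image is written exactly as it would be to cloud storage;
# the cluster's tiering policy later migrates older copies to the public cloud.
with open("backup-2016-12-31.img", "rb") as f:
    target.put_object(Bucket="backups", Key="full/backup-2016-12-31.img", Body=f)
```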

The combined solution becomes a simple, drop-in replacement for existing backup target technologies. The result: on-site storage for fast access, and cloud storage for low-cost archive and DR.

In summary, hybrid combines a petabyte-scalable, high-performance on-premises backup target with seamless cloud storage integration. Together they let you retain a familiar workflow while ensuring success on the objectives that matter to you: backup window predictability, and repeatable RTO / RPO.

Start Small and Grow

Best of all, you can start with a small deployment, prove it out, and grow. On-prem S3 storage can be deployed on servers you already have, or deployed as preconfigured appliances.

There are even zero-upfront-cost options using Amazon metered-by-use software from the Amazon Marketplace.

Eight Ways Hybrid Cloud from Cloudian Makes Backup Better

Cloudian is the on-prem storage node in a hybrid storage configuration. It features the industry’s highest level of S3 compatibility, ensuring full interoperability with Veritas, Commvault, and Rubrik.

The Cloudian architecture is a scale-out storage cluster of shared-nothing storage nodes. Your media servers connect to the on-prem Cloudian cluster via Ethernet and communicate via an S3-compatible API. Your backup software views the cluster exactly as it views cloud storage, and stores data to Cloudian exactly as it would to cloud storage.

The difference between Cloudian and cloud alone is that all recent backups are stored locally for quick recovery when needed. Policy-based migration then allows older snapshots to be moved to the public cloud. This frees up local capacity and also provides an offsite copy for DR use.

Here are eight ways this helps:

1) Performance to handle the largest environments

Cloudian scales to petabytes with a scaling model that grows in both capacity and bandwidth. Predictable backup windows result from Cloudian’s high streaming bandwidth: writes in excess of 5,000 MB/s (18 TB per hour) can be achieved.

2) Petabyte-scalable

You can start small with just three nodes, and scale to petabytes simply by adding nodes. Scaling is seamless and does not require downtime.

3) 70% less cost than conventional disk

Built on industry-standard hardware, Cloudian drives down the cost of on-prem, disk-based storage to 1¢/GB/month or less, depending on capacity.

4) Manage one data pool

Cloudian maintains data in a single pool across all nodes. You get one-to-many auto-replication, enhancing data durability. No need to juggle what’s “active” or “passive,” create complex policies and snapshot management techniques, or track which sites are replicating to where.

5) Distributed architecture for global data protection

Enterprises struggle to manage backup at remote offices. With Cloudian, clustered nodes can be deployed globally and interconnected, thus allowing data to be automatically replicated across sites.  Because the nodes form a single namespace, you can implement policy-based data migration to the cloud for DR purposes. You get global data protection with fast local recovery, all managed from a single location.

6) Deploy as appliances, or on your own servers

Cloudian is built on industry-standard hardware. You have the flexibility to buy either pre-configured, fully supported appliances, or software for installation on the servers you choose. Either way, you benefit from the value of commodity hardware.

7) Drop-in integration

Cloudian can be immediately integrated with backup software packages that support cloud storage, including Veritas NetBackup, Veritas Backup Exec, Commvault Simpana and Rubrik. Cloudian is viewed exactly as cloud storage for both backup and recovery. For information that has been migrated to the cloud, Cloudian transparently retrieves that data and presents it to the media server.

8) Start small, even at zero upfront cost

Contact Cloudian to get started. We can even show you options that get you started at zero upfront cost, with Cloudian from the Amazon Marketplace.

For more information, read the Backup Solutions Note. Configuration Guides with specific data protection solutions are also available for:

  • Veritas
  • Commvault