Are all object storage solutions the same?
I am super excited about the MythBusters series we are running at Cloudian to debunk common myths about object storage and how it is used, ranging from the very simple to the hotly debated among industry pundits. The series also gives me an opportunity to talk about what Cloudian is doing to bring cloud technology to your data center.
I have spent 20+ years in storage and have talked to countless customers, analysts, CIOs, and others. I have seen waves of transition (distributed computing, network-attached storage, virtualization, convergence), and I believe we are on the cusp of the next big wave of IT transition: cloud storage and data management.
There is no argument that unstructured data is growing at an exponential rate. Nor is there any argument that traditional storage technologies such as SAN and NAS cannot scale to meet the storage needs created by this unprecedented data growth. CIOs and IT teams around the world are looking for a storage solution that gives them cloud-like scale and flexibility while meeting their enterprise requirements for security, compliance, workflow integration, and so on. Object storage offers a viable way to keep pace with this data growth, but the question is which object storage solutions meet the criteria for enterprise readiness. This brings us to our first MythBusters topic.
All object storage solutions are the same. Myth or Fact?
To answer this question, we should start by evaluating the facts and defining what constitutes an enterprise-grade object storage solution. Here are some facts we know to be true.
- Object storage has been around for 15+ years
Since the mid-2000s there have been many attempts to build standards around object storage, such as the XAM (eXtensible Access Method) standard and the Cloud Data Management Interface (CDMI), both developed by the Storage Networking Industry Association (SNIA). CDMI was released as a formal industry standard in 2011. Many vendors, including Dell EMC, NetApp, IBM, Red Hat, and Hitachi, built their object storage offerings around these standards.
- Object storage uses a flat addressing format to store and access data
Object storage uses a flat, non-hierarchical namespace, which makes it extremely scalable. There is no folder structure, but objects can be organized using key name prefixes on the object name (for example, "scans/2024/"). An object consists of the BLOB and its metadata. BLOB stands for Binary Large Object and is the actual data file. Metadata is descriptive, searchable information about that data object. Metadata is a big-ticket item for object storage: the ability to store extended, user-defined metadata goes well beyond the capabilities of traditional storage platforms, which provide only limited system metadata such as file name and file size.
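To make the flat namespace concrete, here is a minimal in-memory sketch in Python. It assumes nothing about any particular product: a bucket is modeled as one dictionary from key to (BLOB, metadata), and "folders" appear only when a listing groups keys by prefix and delimiter, in the style of S3 prefix/delimiter listings. The class and method names are hypothetical, and the metadata search is a naive scan purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes  # the BLOB itself
    metadata: dict[str, str] = field(default_factory=dict)  # user-defined metadata


class FlatBucket:
    """Illustrative bucket: a single flat key -> object map, with no directories."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, **metadata: str) -> None:
        self._objects[key] = StoredObject(data, dict(metadata))

    def list_objects(self, prefix: str = "", delimiter: str = "") -> tuple[list[str], list[str]]:
        """Return (keys, common_prefixes), mimicking a prefix/delimiter listing."""
        keys = []
        prefixes = set()
        for key in sorted(self._objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter and delimiter in rest:
                # Roll everything up to the next delimiter into a "virtual folder".
                prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(prefixes)

    def find_by_metadata(self, **criteria: str) -> list[str]:
        """Naive scan: keys whose metadata matches every key=value pair given."""
        return sorted(
            k for k, obj in self._objects.items()
            if all(obj.metadata.get(m) == v for m, v in criteria.items())
        )


bucket = FlatBucket()
bucket.put("scans/2024/patient-17.dcm", b"...", modality="MRI", department="radiology")
bucket.put("scans/2024/patient-18.dcm", b"...", modality="CT", department="radiology")
bucket.put("reports/q1.pdf", b"...", department="finance")

print(bucket.list_objects(prefix="scans/", delimiter="/"))  # -> ([], ['scans/2024/'])
print(bucket.find_by_metadata(department="radiology"))
```

The "folder" scans/2024/ is never stored anywhere. It is derived from key names at listing time, which is why adding billions of objects does not require maintaining a directory tree.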
- Object storage uses RESTful APIs for data and admin functionality
An object can be a file, an email, or anything else stored as binary data. A bucket is a collection of objects governed by user access permissions, similar to traditional storage. Every object has its own HTTP address (URL), which makes object storage extremely open and flexible for data workflows and application integration. Applications connect to and access object storage with details such as:
Endpoint – Bucket.hyperstore.com
Access key – unique key assigned to the user (identifies the caller)
Secret key – private key assigned to the user (used to sign requests)
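Those three details are all an S3 client needs. As a sketch of how they fit together, the Python below builds an AWS Signature Version 4 Authorization header for a simple GET of one object using only the standard library. The host, object key, credentials, and the "us-east-1" region are illustrative assumptions (the host is a placeholder following the bucket-endpoint pattern above), and S3-compatible systems differ in which regions and addressing styles they accept, so treat this as a sketch rather than a drop-in client.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_get_object(host: str, key: str, access_key: str, secret_key: str,
                    region: str = "us-east-1",
                    now: Optional[datetime] = None) -> dict[str, str]:
    """Return headers for a SigV4-signed GET of https://{host}/{key} (virtual-hosted style)."""
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    payload_hash = hashlib.sha256(b"").hexdigest()  # a GET has an empty body

    # 1. Canonical request: method, URI, query string, headers, signed header list, payload hash.
    canonical_uri = "/" + quote(key, safe="/~")
    headers = {"host": host, "x-amz-content-sha256": payload_hash, "x-amz-date": amz_date}
    signed_headers = ";".join(sorted(headers))
    canonical_headers = "".join(f"{name}:{headers[name]}\n" for name in sorted(headers))
    canonical_request = "\n".join(
        ["GET", canonical_uri, "", canonical_headers, signed_headers, payload_hash]
    )

    # 2. String to sign, scoped to date/region/service.
    scope = f"{date_stamp}/{region}/s3/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # 3. Derive the signing key from the secret key (it never leaves the client), then sign.
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_signing = _hmac(_hmac(_hmac(k_date, region), "s3"), "aws4_request")
    signature = hmac.new(k_signing, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    return {
        "x-amz-date": amz_date,
        "x-amz-content-sha256": payload_hash,
        "Authorization": (f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
                          f"SignedHeaders={signed_headers}, Signature={signature}"),
    }


fixed = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
headers = sign_get_object("my-bucket.hyperstore.example", "reports/q1.pdf",
                          "AKIDEXAMPLE", "example-secret-key", now=fixed)
print(headers["Authorization"])
```

The access key travels in the clear as part of the credential scope, while the secret key is only used to derive the HMAC signing key, which is why the two play such different roles.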
Now that we have established some facts about object storage, let’s dig a little deeper. What’s also true is:
- S3 APIs are the de facto standard for the data path, so S3-API compatibility is very important
AWS launched Amazon S3 cloud storage (object storage) in the United States on March 14, 2006, and in Europe in November 2007. The dominance of the S3-API has been driven by the success of the AWS S3 cloud service. The S3-APIs provide many benefits over traditional storage protocols, which were designed around simple read and write commands: they let applications embed advanced data management functionality directly into the application. It is also important to note that the S3-API supports over 400 verbs, but not all S3-API-based object storage solutions support all of them, which leads to application compatibility issues. This matters a great deal: you don’t want to build modern cloud-native applications only to learn later that the underlying object storage cannot run them because it does not support every S3-API verb they use.
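As one illustration of data management embedded in an application, the S3 API lets a client attach a lifecycle policy to a bucket. The sketch below builds the XML body of a PutBucketLifecycleConfiguration request with Python's standard library. The rule ID, prefix, storage class, and day thresholds are made-up values, and whether a given object store honors each element (for example, a particular Transition storage class) is exactly the kind of compatibility question raised above.

```python
import xml.etree.ElementTree as ET

S3_NS = "http://s3.amazonaws.com/doc/2006-03-01/"


def lifecycle_xml(rule_id: str, prefix: str, transition_days: int,
                  storage_class: str, expiration_days: int) -> str:
    """Build a PutBucketLifecycleConfiguration body containing a single rule."""
    root = ET.Element("LifecycleConfiguration", xmlns=S3_NS)
    rule = ET.SubElement(root, "Rule")
    ET.SubElement(rule, "ID").text = rule_id
    # Apply the rule only to keys under this prefix.
    ET.SubElement(ET.SubElement(rule, "Filter"), "Prefix").text = prefix
    ET.SubElement(rule, "Status").text = "Enabled"
    # Move matching objects to another storage class after N days...
    transition = ET.SubElement(rule, "Transition")
    ET.SubElement(transition, "Days").text = str(transition_days)
    ET.SubElement(transition, "StorageClass").text = storage_class
    # ...and delete them after M days.
    ET.SubElement(ET.SubElement(rule, "Expiration"), "Days").text = str(expiration_days)
    return ET.tostring(root, encoding="unicode")


body = lifecycle_xml("archive-logs", "logs/", 30, "STANDARD_IA", 365)
print(body)
```

An application would send this body in a PUT /?lifecycle request to the bucket, signed like any other S3 call, instead of relying on an administrator to configure retention out of band.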
- Underlying architecture dictates S3-API compatibility, performance, data durability and scale
One reason many prominent object storage products cannot support all the S3-API verbs/calls is their underlying legacy architecture. Without a native implementation of the S3-APIs, they must rely on an S3-API gateway that translates S3-API calls into their internal storage APIs. (Most of these products are based on the standards mentioned above, which were not widely adopted once the AWS S3-APIs arrived.)
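A toy model shows why a translation layer tends to leak. In the Python sketch below, a hypothetical gateway maps incoming S3 operation names onto an equally hypothetical internal API, and any operation without a mapping has no native equivalent and must be rejected. The operation names used as keys (PutObject, GetObject, PutObjectRetention, and so on) are real S3 API actions; the internal functions, and the supported subset itself, are invented for illustration.

```python
from typing import Callable


class NotImplementedByBackend(Exception):
    """Raised when an S3 operation has no equivalent in the internal API."""


# Hypothetical internal storage API of a legacy (pre-S3) object store.
def internal_write(bucket: str, key: str) -> str:
    return f"internal_write({bucket}, {key})"


def internal_read(bucket: str, key: str) -> str:
    return f"internal_read({bucket}, {key})"


def internal_delete(bucket: str, key: str) -> str:
    return f"internal_delete({bucket}, {key})"


# The gateway can only forward operations it has a translation for.
TRANSLATIONS: dict[str, Callable[[str, str], str]] = {
    "PutObject": internal_write,
    "GetObject": internal_read,
    "DeleteObject": internal_delete,
}


def handle(operation: str, bucket: str, key: str) -> str:
    translate = TRANSLATIONS.get(operation)
    if translate is None:
        # An S3 client would typically see an error such as 501 Not Implemented here.
        raise NotImplementedByBackend(operation)
    return translate(bucket, key)


def compatibility_gaps(required: list[str]) -> list[str]:
    """S3 operations an application needs that the gateway cannot translate."""
    return [op for op in required if op not in TRANSLATIONS]


app_needs = ["PutObject", "GetObject", "PutObjectRetention", "SelectObjectContent"]
print(compatibility_gaps(app_needs))  # -> ['PutObjectRetention', 'SelectObjectContent']
```

Running an application's list of required operations against a vendor's documented support matrix in this way is a cheap first-pass compatibility check before any deployment.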
With regard to scale and performance, a true distributed, scale-out object storage solution should be able to start small and keep scaling out to meet the application’s performance and capacity demands. With compute and drive technologies evolving rapidly, there should be no restrictions on node types, node capacity, underlying node hardware, and so on. To offer maximum performance, every node in the cluster should run the entire software stack and communicate with the other nodes in peer-to-peer mode, so that any node can receive and process application requests. Data should never be more than one hop away, to ensure optimal performance and meet the demands of modern cloud-native applications.
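One common way to let any node accept any request while keeping data a single hop away is consistent hashing: every node holds the same ring map, so whichever node receives a request can compute locally which peers own the object and forward the request directly. The Python below is a minimal sketch of that technique with virtual nodes. It is not a description of how any particular product, HyperStore included, places data; the node names and the 64 virtual nodes per server are arbitrary choices.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable 64-bit ring position (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring: every node can compute object placement locally."""

    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        # Each physical node appears at many ring positions to even out load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]
        self._distinct = {node for _, node in self._ring}

    def owners(self, key: str, replicas: int = 3) -> list[str]:
        """Walk clockwise from the key's position, collecting distinct physical nodes."""
        wanted = min(replicas, len(self._distinct))
        start = bisect.bisect(self._positions, _hash(key))
        owners: list[str] = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == wanted:
                    break
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.owners("scans/2024/patient-17.dcm"))
```

Because a new node only claims the ring segments just before its own virtual nodes, growing from four to five nodes shifts the primary owner of roughly a fifth of the keys, all to the new node, rather than reshuffling everything. That property is what lets a cluster start small and grow non-disruptively.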
The cluster should be deployable across multiple sites, so that customers can survive not only node failures but also the loss of an entire data center or site, with their data still available.
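Surviving the loss of a whole site comes down to replica placement: copies of each object must land in more than one site. The Python sketch below uses a hypothetical three-site topology and a simple rule (visit sites round-robin so that copies spread across sites before doubling up within any one site), then checks that an object still has a live copy when any single site goes offline. Real systems typically express this through per-site replication or erasure-coding policies; the site and node names here are invented.

```python
from itertools import cycle

# Hypothetical topology: three data centers with two nodes each.
TOPOLOGY = {
    "dc-east": ["east-1", "east-2"],
    "dc-west": ["west-1", "west-2"],
    "dc-central": ["central-1", "central-2"],
}


def place_replicas(topology: dict[str, list[str]], replicas: int) -> list[tuple[str, str]]:
    """Return (site, node) pairs, visiting sites round-robin so copies spread out first."""
    total = sum(len(nodes) for nodes in topology.values())
    if replicas > total:
        raise ValueError(f"cannot place {replicas} replicas on {total} nodes")
    next_index = {site: 0 for site in topology}
    placement: list[tuple[str, str]] = []
    for site in cycle(sorted(topology)):
        if len(placement) == replicas:
            break
        nodes = topology[site]
        if next_index[site] < len(nodes):
            placement.append((site, nodes[next_index[site]]))
            next_index[site] += 1
    return placement


def survives_site_loss(placement: list[tuple[str, str]], failed_site: str) -> bool:
    """True if at least one replica lives outside the failed site."""
    return any(site != failed_site for site, _ in placement)


placement = place_replicas(TOPOLOGY, replicas=3)
print(placement)
# -> [('dc-central', 'central-1'), ('dc-east', 'east-1'), ('dc-west', 'west-1')]
print(all(survives_site_loss(placement, site) for site in TOPOLOGY))  # -> True
```

By contrast, two replicas confined to a single site survive a node failure but not the loss of that site, which is the distinction the paragraph above draws.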
- Control plane (Admin) APIs allow for workflow/application integration
When we talk about integration, it’s not just about offering S3-API-based object storage for data management. It’s about an integrated experience in which an IT admin can centrally manage, monitor, and consume object storage just as they would any other storage resource in their infrastructure. It’s about control/admin APIs that let users integrate object storage into their operational frameworks to seamlessly manage, operate, and report from a central portal. The Admin APIs should let storage tenants self-service their environment: create users and buckets, assign policies, and generate reports at a granular level.
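The self-service workflow described above usually reduces to a handful of authenticated REST calls. The Python sketch below only builds (and does not send) such requests with urllib.request. The base URL, paths, and JSON fields are hypothetical placeholders, not any vendor's actual Admin API, whose real endpoints you would take from that product's documentation.

```python
import json
from typing import Optional
from urllib.request import Request

ADMIN_BASE = "https://admin.example.internal"  # hypothetical admin endpoint


def admin_request(method: str, path: str, body: Optional[dict] = None) -> Request:
    """Build a JSON admin API request; the paths and fields are illustrative only."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(ADMIN_BASE + path, data=data, method=method,
                   headers={"Content-Type": "application/json"})


def onboard_tenant_user(tenant: str, user: str, bucket: str, quota_gb: int) -> list[Request]:
    """Steps a tenant admin might script: create a user, a bucket, a policy, then report."""
    return [
        admin_request("POST", f"/tenants/{tenant}/users", {"userId": user}),
        admin_request("POST", f"/tenants/{tenant}/buckets", {"name": bucket, "owner": user}),
        admin_request("PUT", f"/tenants/{tenant}/buckets/{bucket}/policy",
                      {"quotaGB": quota_gb, "versioning": "Enabled"}),
        admin_request("GET", f"/tenants/{tenant}/reports/usage?granularity=bucket"),
    ]


for req in onboard_tenant_user("acme", "alice", "acme-analytics", 500):
    print(req.get_method(), req.full_url)
```

Because each step is an ordinary HTTP request, the same calls can be driven from an orchestration tool, a ticketing workflow, or a central portal rather than a vendor-specific console.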
Based on the facts captured above, we can confidently say that not all object storage solutions are the same. The claim that they are is a MYTH, and I am happy that we BUSTED it!
Now that we have established that not all object storage solutions are the same, let’s define what enterprise-grade object storage looks like. What features and functionality are absolutely a must for enterprises to adopt object storage as their storage platform of choice for building their own enterprise private cloud?
Here is the checklist you should use to evaluate object storage offerings:
- Data and Control APIs:
  - Does the object storage solution offer a native implementation of the S3-APIs?
  - What level of S3-API compatibility does it have?
  - Does the vendor guarantee S3-API compatibility?
- True Scale-out Architecture:
  - What does the underlying object storage architecture look like?
  - Does it have a peer-to-peer, scale-out architecture?
  - Does it support multi-site, multi-tenant deployments?
  - Can deployments start small and grow seamlessly and non-disruptively to hundreds of nodes?
  - Can you mix nodes of different capacities?
  - Can you mix different hardware technologies (compute and disk)?
- Data Management:
  - Can you define granular data management policies (at the bucket level)?
  - Can you deploy the cluster in a hybrid or multi-cloud configuration?
  - Can you tier data to any public cloud and keep it available there in its native format (avoiding cloud lock-in)?
  - Does it offer tools for intelligent monitoring and predictive maintenance?
- Data Protection and Security:
  - Can you deploy the solution to deliver whatever number of 9’s of data durability you require?
  - Does the solution support encryption with advanced key management?
  - Does the solution support data immutability through WORM?
  - Does the solution have federal certifications (FIPS 140-2, Common Criteria, etc.)?
  - Does the solution have compliance certifications (SEC 17a-4, etc.)?
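The durability item on this checklist is partly arithmetic. As a rough illustration, the Python below compares replication with two erasure-coding layouts on storage overhead and on a deliberately naive loss estimate that assumes each fragment is lost independently, with a made-up probability p_fail, before repair completes. The layouts and the 0.001 figure are assumptions for illustration, not vendor specifications; real durability figures come from far more detailed models.

```python
from math import comb, log10


def scheme_summary(data: int, parity: int, p_fail: float) -> dict[str, float]:
    """Overhead, tolerated failures, and a naive loss estimate for a data+parity layout.

    Replication with r copies is the special case data=1, parity=r-1.
    p_fail is an assumed, independent probability that any one fragment is lost
    before repair completes; real models also account for correlated failures,
    repair bandwidth, and rebuild time.
    """
    n = data + parity
    # Data is lost only if more than `parity` of the n fragments fail together.
    p_loss = sum(comb(n, i) * p_fail**i * (1 - p_fail) ** (n - i)
                 for i in range(parity + 1, n + 1))
    return {
        "raw_per_usable": n / data,
        "tolerated_failures": parity,
        "p_loss": p_loss,
        "nines": float("inf") if p_loss == 0 else -log10(p_loss),
    }


for name, (k, m) in {"3x replication": (1, 2), "EC 4+2": (4, 2), "EC 8+3": (8, 3)}.items():
    s = scheme_summary(k, m, p_fail=0.001)
    print(f"{name}: {s['raw_per_usable']:.2f}x raw, tolerates {s['tolerated_failures']} "
          f"failures, about {s['nines']:.1f} nines (toy model)")
```

Even this toy model shows the trade-off behind the question: an 8+3 layout tolerates one more simultaneous failure than 3x replication while storing less than half as much raw data per usable byte, so "any 9’s" depends on being able to choose the protection scheme per deployment.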
Cloudian HyperStore is the only complete enterprise-grade object storage solution available in the market today. At Cloudian, our focus is to become the de facto storage and data management platform for today’s burgeoning volumes of data. The Cloudian product portfolio is focused on capitalizing on key storage trends where existing legacy storage systems are falling short, by bringing cloud technology to your data center.