On-Premises Object Storage: Building S3 in Your Backyard
Object storage is an approach to storing data that underpins massively scalable, elastic cloud storage services. However, object storage is not limited to the public cloud. On-premises object storage solutions are readily available, and many organizations are realizing the benefits of creating elastic storage pools in their local data centers.
In fact, you can create your very own local version of Amazon S3 and get the benefits of elastic scalability, redundancy and resilience without paying a monthly fee for each GB of data.
What is Object Storage?
Object storage, also known as object-based storage, is a technique that manages and manipulates data storage using objects stored in a central location, not structured as files within folders. Object storage pulls together all the bits of information that constitute a file, adds relevant metadata, and attaches a unique object identifier.
Object storage systems use comprehensive metadata schemes that allow users to organize data without a tiered file structure. Everything is placed into a flat address space, called a storage pool. Metadata is critical to object storage, because it enables the data system and its users to understand the context (for example, what the data is about) and lifecycle attributes (for example, how old is the data) of data in the storage pool.
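The object model described above can be illustrated with a minimal in-memory sketch. It is not how any particular product is implemented: the class names (StoredObject, StoragePool), the content-hash ID scheme, and the sample metadata keys are all assumptions chosen for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: the raw bytes plus descriptive metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class StoragePool:
    """A flat address space: object ID -> object, with no folders."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        # Hypothetical ID scheme: a SHA-256 digest of the content.
        object_id = hashlib.sha256(data).hexdigest()
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> StoredObject:
        return self._objects[object_id]

    def find(self, **criteria) -> list:
        """Return IDs of objects whose metadata matches every criterion."""
        return [
            oid for oid, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


pool = StoragePool()
scan_id = pool.put(b"...image bytes...", kind="mri-scan", year=2021)
log_id = pool.put(b"...log lines...", kind="log", year=2021)
matches = pool.find(kind="log")  # lookup by metadata, not by path
```

Note that objects are located either by their identifier or by querying metadata; there is no directory tree to traverse.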
4 Benefits of Object Storage
1. Scalability
Unlike block or file storage systems, object storage can scale out with practically no fixed limit by adding storage resources and distributing the storage pools across those resources (see our in-depth article on distributed storage).
2. Easier search and analysis
Because object storage is organized in one flat data layer, data can be retrieved much more quickly, and users can perform queries across very large volumes of data, supporting big data analytics scenarios.
3. Cost reduction
Object storage relies on commodity storage hardware, letting users add more inexpensive hardware units to the cluster to grow storage capacity.
4. High performance
Because object storage does not have a file hierarchy, and metadata is totally customizable, there are fewer constraints than with block or file storage.
Object Storage Use Cases
Object storage is typically used to support the following uses:
Disaster recovery—setting up a massive storage repository to back up an entire organization’s data to a disaster recovery site.
Static web hosting and content distribution—object storage is very useful for storing static files used to serve large-scale websites.
Data lakes—organizations are shifting from storing data in tightly structured data warehouses, to storing them in “data lakes” based on object storage. A data lake is a huge, elastically scalable data repository that enables fast, large-scale querying.
Rich media—object storage can be used to store large quantities of images, video, or audio files and serve them to an international audience.
Logs and Internet of Things (IoT) data—IT systems and IoT devices generate huge volumes of log and machine-to-machine data. Object storage is highly suitable for storing this unstructured data and enabling analysis, for example using AI algorithms.
Public Cloud vs On-Premise Object Storage: Key Considerations
The following are the most common considerations for choosing to consume object storage as a cloud service or deploying your own object storage system on-premises.
1. Cost and Ease of Deployment
The public cloud offers object storage on demand. It’s fast to set up, easy to use, and there is no need to buy, install, and maintain the physical infrastructure. Most public cloud services provide elastically scalable object storage starting from just a few dollars a month. They charge as you grow for additional storage capacity, usually only a few cents per gigabyte per month.
But as you scale up cloud storage to support big data, this cost model can become expensive fast. As organizations store rapidly growing data volumes, monthly storage bills can become large, and there are additional charges for operations performed on data, network egress, and more.
On-premises object storage solutions have an upfront infrastructure cost, but no ongoing costs for storage capacity and data usage. Object storage systems do not require special expertise to deploy and operate, and are not maintenance intensive—storage is redundant and when a storage unit fails, it can usually be replaced seamlessly with another unit.
An IDC report showed that the 5-year total cost of ownership (TCO) is 65% lower for on-premise object storage compared to public cloud storage.
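The two cost models can be compared with a back-of-the-envelope sketch like the one below. Every number in it (capacity, per-gigabyte prices, egress volume, hardware cost, yearly operating cost) is an invented placeholder, not a quote from any provider and not an attempt to reproduce the IDC figure. The yearly operating cost for on-premises (power, space, staff time) is also an assumption added for a fairer comparison.

```python
def cloud_cost(tb_stored, months, price_per_gb_month,
               egress_tb_per_month, egress_price_per_gb):
    """Pay-as-you-go: a monthly capacity charge plus a monthly egress charge."""
    storage = tb_stored * 1000 * price_per_gb_month * months
    egress = egress_tb_per_month * 1000 * egress_price_per_gb * months
    return storage + egress


def on_prem_cost(upfront, yearly_operations, months):
    """Upfront hardware purchase plus an assumed yearly operating cost."""
    return upfront + yearly_operations * months / 12


# Hypothetical scenario: 500 TB held for five years.
months = 60
cloud_total = cloud_cost(
    tb_stored=500, months=months,
    price_per_gb_month=0.02,      # placeholder price
    egress_tb_per_month=20,
    egress_price_per_gb=0.05,     # placeholder price
)
on_prem_total = on_prem_cost(
    upfront=250_000,              # placeholder hardware cost
    yearly_operations=30_000,     # placeholder operating cost
    months=months,
)
print(f"cloud: {cloud_total:,.0f}  on-premises: {on_prem_total:,.0f}")
```

Plugging in your own capacity, growth, and pricing figures is the point of the exercise; the outcome depends entirely on those inputs.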
2. Security and Compliance
When employing any storage system, organizations need to safeguard data against accidental loss or malicious cyber-attack. In many environments, there are regulations or compliance standards, such as HIPAA, SOX or GDPR, that specify how sensitive or private data should be stored and protected.
Public cloud object storage services provide robust information security and reliability capabilities and may be able to meet the requirements of your compliance standards. Public cloud services also provide the ability to specify in which geographical region the data should be stored (data sovereignty), which is a requirement in some standards, provided that the service is offered in that region.
Investigate whether your cloud of choice supports the relevant standards and, more importantly, what is required on your end to ensure compliance. Remember that responsibility for security and compliance is divided between cloud providers and users. The cloud provider only supplies the tools; the ultimate responsibility for passing the compliance audit is yours.
On-premises object storage allows you to support any compliance standard and host data in any location. Prefer object storage technology that already supports the compliance requirements, and plan to deploy object storage in an environment already secured and prepared for compliance needs.
3. Availability and Reliability
On the public cloud, storage services typically provide a high level of reliability through redundancy. Data is replicated to two, three or more physical locations across the cloud provider’s different data centers. For example, Amazon S3 is designed for 11 nines of durability (99.999999999%) of objects over a given year.
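To get a feel for what 11 nines means, the short calculation below converts the figure into an expected number of lost objects per year. It is a simplification that assumes each object has the same, independent annual loss probability; the function name is illustrative.

```python
from fractions import Fraction


def expected_annual_losses(object_count: int, nines: int) -> Fraction:
    """Expected objects lost per year at a durability of `nines` nines."""
    loss_probability = Fraction(1, 10 ** nines)
    return object_count * loss_probability


# 10 million objects stored at 11 nines of durability.
losses = expected_annual_losses(10_000_000, 11)
years_per_lost_object = 1 / losses
print(f"expected losses per year: {float(losses)}")
print(f"average years per lost object: {int(years_per_lost_object)}")
```

Under these assumptions, storing 10 million objects means losing on average one object every 10,000 years. The same function can be used to see what a lower durability target, such as 9 nines, implies for an on-premises cluster.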
On-premises you have to rely on object storage technology that supports reliability standards similar to those offered on the cloud. Advanced object storage systems provide automated replication and redundancy of storage across multiple nodes in a storage cluster, to achieve near-public-cloud standards.
However, on-premises you will typically host all your data in a single geographical location, which is still susceptible to a site-wide disaster. To mitigate this risk, you can deploy object storage across multiple geographical sites, for example, in your head office and a disaster recovery site, with replication between them. Another alternative is to use a hybrid system, with critical data replicated to the public cloud for additional reliability.
Best Practices for On-Premise Object Storage Deployment
If you choose to use object storage on-premises, use the following best practices to plan a successful deployment.
Identify workloads that make sense for object storage
Object storage is best for large-scale, data-intensive use cases, such as backup pools, data archives, IoT data, CCTV, voice records, log files, and media files.
Object storage is often combined with other storage technologies. Consider a tiered storage infrastructure that lets you move data from high-performance storage to lower-cost, lower-performance storage. Such a design combines object storage with classic disk arrays and SSDs, which may be more cost-effective for high-IOPS, low-latency applications that use smaller data sizes.
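A tiering decision of this kind can be expressed as a simple placement rule. The sketch below is only an illustration: the tier names, the 30-day threshold, and the DataSet fields are hypothetical and would need to reflect your own workloads and hardware.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    days_since_last_access: int
    needs_low_latency: bool


def choose_tier(ds: DataSet, hot_days: int = 30) -> str:
    """Pick a storage tier from latency needs and access recency."""
    if ds.needs_low_latency:
        return "ssd"             # high-IOPS, low-latency applications
    if ds.days_since_last_access <= hot_days:
        return "disk-array"      # recently used, moderate performance
    return "object-storage"      # large, rarely accessed data


datasets = [
    DataSet("orders-db", 0, True),
    DataSet("last-week-reports", 7, False),
    DataSet("cctv-archive-2019", 900, False),
]
placement = {ds.name: choose_tier(ds) for ds in datasets}
```

In practice the rule would run periodically so that data migrates to object storage as it cools.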
Beware of mega storage failure
High-density storage servers with over 1 petabyte in a single device are highly attractive, but this also creates a major risk for organizational IT. Carefully plan how to protect these mega storage devices from data loss.
Beyond that, plan for long rebuild times in case of disaster, when you need to recover from a backup. Transferring a petabyte or more of data can take days or weeks, depending on the available bandwidth, and failure during rebuild can be catastrophic. A way to mitigate this risk is to logically divide large servers into separate nodes, allowing you to recover critical systems more quickly, and reducing the damage caused by the failure of a specific restore operation.
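The rebuild-time risk is easy to quantify with a rough estimate. The sketch below assumes decimal units (1 PB = 10^15 bytes) and a link that is fully available to the restore; the efficiency parameter is a hypothetical knob for protocol overhead and contention.

```python
def transfer_days(data_bytes: int, link_bits_per_second: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move data_bytes over a link at the given efficiency."""
    seconds = data_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 86_400


PETABYTE = 10 ** 15
TERABYTE = 10 ** 12

days_10g = transfer_days(PETABYTE, 10 * 10 ** 9)         # 10 Gb/s link
days_1g = transfer_days(PETABYTE, 1 * 10 ** 9)           # 1 Gb/s link
days_node = transfer_days(100 * TERABYTE, 10 * 10 ** 9)  # one 100 TB node

print(f"1 PB at 10 Gb/s: {days_10g:.1f} days")
print(f"1 PB at 1 Gb/s: {days_1g:.1f} days")
print(f"100 TB at 10 Gb/s: {days_node:.1f} days")
```

At an ideal 10 Gb/s, a full petabyte takes a little over nine days to transfer, while a 100 TB logical node restores in under a day. This is the practical argument for dividing large servers into smaller nodes.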
Use multi-tenancy to combine different workloads on one device
A major advantage of object storage is the ability to simplify management by consolidating users and applications onto one system. Within the shared environment, the system must deliver service levels for different data consumers. Each organization or type of user needs a specific level of storage capacity, security, and performance.
To support multiple workloads with one object storage system, ensure that your system is configured using isolated storage domains, and select an object storage system that offers quality-of-service (QoS) controls.
Integrate data management into your application
On-premises object storage systems are standardizing around the Amazon S3 API. The S3 API is powerful and flexible, with over 400 operations that enable not only reading and writing but also management, reporting, and integration with other cloud services.
Prefer object storage systems that use the S3 API, and build S3 data management commands into your application. This will allow you to seamlessly switch to other S3-compatible local storage systems, and also use your application seamlessly with the Amazon S3 service itself.
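As a rough illustration of this portability, the sketch below uses boto3, the AWS SDK for Python, and changes only the endpoint to target a different S3-compatible system. The helper names (make_s3_client, archive_report) are hypothetical, and it assumes boto3 is installed and credentials are configured in the environment.

```python
def make_s3_client(endpoint_url=None):
    """Build an S3 client; pass endpoint_url to target an S3-compatible system."""
    import boto3  # third-party AWS SDK for Python, imported lazily
    return boto3.client("s3", endpoint_url=endpoint_url)


def archive_report(client, bucket, key, body, metadata):
    """Store an object with user metadata, then read the metadata back."""
    client.put_object(Bucket=bucket, Key=key, Body=body, Metadata=metadata)
    return client.head_object(Bucket=bucket, Key=key)["Metadata"]
```

With a placeholder on-premises endpoint such as make_s3_client("https://objects.example.internal"), the application code in archive_report stays the same; calling make_s3_client() with no argument targets the Amazon S3 service itself.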
Meet Cloudian: Low-Cost, Massively Scalable On-Premise Object Storage
Cloudian® HyperStore® is a massive-capacity object storage device that is fully compatible with Amazon S3. It can store up to 1.5 petabytes in a 4U chassis, allowing you to store up to 18 petabytes in a single data center rack. HyperStore comes with fully redundant power and cooling, and performance features including 1.92TB SSD drives for metadata and 10Gb Ethernet ports for fast data transfer.
HyperStore is an object storage solution you can plug in and start using with no complex deployment. It also offers advanced data protection features, supporting use cases like compliance, healthcare data storage, disaster recovery, ransomware protection, and data lifecycle management.