In the evolving landscape of enterprise storage, the distinction between scale-up and scale-out storage architectures remains a focal point. As organizations face exponential data growth, understanding the nuances of these architectures is crucial for efficient storage management and expansion.
Storage capacity is the primary benchmark for evaluating storage devices, closely followed by the ease of capacity expansion. The urgency of scaling is a critical concern for storage administrators, often requiring a choice between adding hardware to an existing system or architecting a more complex solution such as a new data center. The former, known as scale-up, and the latter, scale-out, are differentiated by their inherent architectural designs.
The Traditional Scale-Up Model
Scale-up storage has been the traditional approach. It typically involves a central pair of controllers overseeing multiple shelves of drives. Expansion is linear and limited; when space runs out, additional shelves of drives are integrated. The limitation of this model lies in the finite scalability of the storage controllers themselves.
As storage demands increase, the scale-up model encounters bottlenecks. New systems must be introduced to manage additional data, leading to increased complexity and isolated storage silos. This architecture also struggles with resource allocation inefficiency, as determining the optimal location for workloads becomes increasingly challenging.
RAID technology underpins drive failure protection in scale-up systems. However, RAID does not extend across multiple storage controllers, anchoring the drives to a specific controller and consequently cementing the scalability challenge of this architecture.
Figure 1 – Modular/Scale-up Storage Architecture
As an organization’s data volume grows, completely new systems need to be added to cope with the additional demands. Ultimately, this architecture becomes highly complex to manage. Inefficient resource allocation becomes an issue in deciding where workloads need to reside.
Figure 2 shows the potential for storage system sprawl.
Figure 2 – Modular/Scale-up Storage Silos
The Modern Scale-Out Strategy
In contrast, scale-out storage architectures, particularly those utilizing object storage, offer a dynamic alternative. These systems are built from industry-standard servers, with storage attached directly to each node, much like Direct Attached Storage (DAS). Object storage software on each node unifies the nodes into a single cluster, creating a pooled storage resource with a unified namespace accessible to users and applications.
Protection against drive failure in a scale-out environment is not reliant on RAID but on RAIN (Redundant Array of Independent Nodes), which offers data resilience across nodes. RAIN supports several data protection methods, including replicas and erasure coding, which mirror RAID’s data safeguarding principles but are optimized for multi-node environments.
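To make the idea of node-level protection concrete, here is a minimal sketch of the simplest possible erasure code: data fragments plus one XOR parity fragment, each imagined as living on a different node. It is an illustration only. Production systems typically use stronger codes (for example Reed-Solomon) that survive several simultaneous failures, and the function names and the four-fragment layout are assumptions made for this example.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    return fragments + [reduce(xor_bytes, fragments)]


def recover(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (None) by XOR-ing the survivors."""
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost fragment")
    if missing:
        survivors = [frag for frag in fragments if frag is not None]
        fragments[missing[0]] = reduce(xor_bytes, survivors)
    return fragments


payload = b"object data spread across nodes"
stored = encode(payload, k=4)   # 4 data fragments + 1 parity, one per hypothetical node
stored[2] = None                # simulate the loss of one node
restored = recover(stored)
recovered_payload = b"".join(restored[:4])[:len(payload)]
print(recovered_payload == payload)
```

Because any single fragment can be rebuilt from the other four, the cluster keeps serving the object while a failed node is replaced.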
Figure 3 – Object/Scale-out Storage Architecture
Scale-Out with Cloudian HyperStore
Cloudian HyperStore exemplifies the scale-out storage solution. HyperStore utilizes object storage technology to enable seamless scalability, providing a storage platform that expands horizontally by adding nodes. Each node addition enhances storage capacity, as well as compute and networking capabilities, ensuring that performance scales with capacity.
HyperStore’s architecture allows for simple integration of new nodes, which the system then incorporates into the existing cluster. Data is intelligently distributed across the new configuration, maintaining performance and reliability without the limitations of traditional scale-up architectures.
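The text does not describe how data is redistributed when a node joins, so the following is not HyperStore's algorithm. It is a minimal sketch of consistent hashing, one common technique scale-out systems use so that adding a node moves only a fraction of the existing objects. The class name, node names, and the choice of 100 virtual nodes per server are all assumptions for illustration.

```python
import bisect
import hashlib


def _token(value: str) -> int:
    """Map a string to a position on the hash ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node owns many small slices of the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_token(f"{node}#{i}"), node))

    def node_for(self, key):
        # The owner is the first ring entry clockwise from the key's token.
        idx = bisect.bisect_right(self._ring, (_token(key), "")) % len(self._ring)
        return self._ring[idx][1]


keys = [f"object-{i}" for i in range(1000)]
ring = HashRing(["node-1", "node-2", "node-3", "node-4"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-5")
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} objects moved, all to node-5")
```

Only the objects whose ring position now falls just before one of the new node's tokens change owner; everything else stays where it was, which is what keeps expansion cheap.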
In a multi-data center setup, Cloudian HyperStore’s geo-distributed capabilities shine. Nodes can be deployed across various geographical locations, and thanks to HyperStore’s geo-awareness, data can be strategically placed to optimize access speeds. Users access storage through a virtual address, with the system directing requests to the closest or most optimal node. This ensures fast response times and consistent data availability, irrespective of the user’s location.
HyperStore’s innovative approach not only addresses the immediate scalability challenges but also provides a future-proof solution that accommodates the ever-increasing volume and complexity of enterprise data. Its efficient use of resources, simplified management, and robust data protection mechanisms make it a compelling choice for enterprises looking to overcome the traditional hurdles of storage expansion.
In summary, the evolution from scale-up to scale-out storage, epitomized by solutions like Cloudian HyperStore, marks a significant transition in enterprise storage. Organizations can now address their data growth challenges more effectively, with architectures designed for the demands of modern data management.
What Is AWS S3 Bucket?
Amazon Simple Storage Service (Amazon S3) is an object storage solution that provides data availability, performance, security and scalability. Organizations from all industries and of every size may use Amazon S3 storage to safeguard and store any amount of information for a variety of use cases, including websites, data lakes, backup and restore, mobile applications, archives, big data analytics, IoT devices, and enterprise applications.
To store your data in Amazon S3, you use resources called objects and buckets. A bucket is a container that houses objects. An object contains a file and all metadata used to describe the file.
To store an object in Amazon S3, you create a bucket and upload the object into it. Once the object is in the bucket, you may move it, download it, or open it. When you no longer need the bucket or object, you can delete them to reduce your resource usage.
How to Use an Amazon S3 Bucket
An S3 customer starts by establishing a bucket in the AWS region of their choosing and assigns it a unique name. AWS suggests that customers select regions that are geographically close to them in order to minimize costs and latency.
After creating the bucket, the user chooses a storage tier based on the usage requirements for the data. There are various S3 tiers that differ in price, accessibility and redundancy. A single bucket can hold objects from different S3 storage tiers.
The user may then assign access privileges for the objects stored in the bucket using various mechanisms, including bucket policies, the AWS IAM service, and access control lists (ACLs).
An AWS customer may work with an Amazon S3 bucket via the APIs, the AWS CLI, or the AWS Management Console.
Before you can store content in S3, you need to create a new bucket, selecting a bucket name and Region. You may also wish to select additional storage management options for your bucket. Once you have created a bucket, you can’t modify the Region or bucket name.
The AWS account that created the bucket remains the owner. You may upload as many objects as you like to the bucket. By default, you can have as many as 100 buckets for each AWS account.
S3 lets you create buckets using the S3 Console or the API.
Keep in mind that buckets are priced according to the volume of data stored in them, among other criteria.
The bucket name must be unique, begin with a number or lowercase letter, be 3 to 63 characters long, and contain no uppercase characters.
4. Select the AWS Region for the bucket. Select a Region near you to keep latency and cost to a minimum and to address regulatory demands. Keep in mind there are special charges for moving objects outside a region.
5. In Bucket settings for Block Public Access, choose whether to block or allow public access to the bucket.
6. You can optionally enable the Object Lock feature in Advanced settings > Object Lock.
7. Select Create bucket.
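The naming rules listed in the steps above can be checked locally before a bucket is created. The sketch below encodes only what the text states (3 to 63 characters, starting with a lowercase letter or digit, no uppercase). Restricting the remaining characters to lowercase letters, digits, dots, and hyphens is an added assumption, and the function name is hypothetical. Uniqueness cannot be verified offline, so it is not checked here.

```python
import re

# First character: lowercase letter or digit.
# Remaining 2 to 62 characters: lowercase letters, digits, dots, hyphens (assumed set).
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{2,62}")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if the name satisfies the length and character rules."""
    return _BUCKET_NAME.fullmatch(name) is not None


for candidate in ["bucket-one", "Bucket-One", "ab", "-bucket"]:
    print(candidate, is_valid_bucket_name(candidate))
```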
What Is S3 Bucket Policy?
S3 provides the concept of a bucket policy, which lets you define access permissions for a bucket and the content stored in it. Technically, it is a resource-based AWS IAM policy, written in a JSON-based policy language.
For instance, policies permit you to:
Enable read access for anonymous users
Restrict a particular IP address from accessing the bucket
Restrict access to a particular HTTP referrer
Require multi-factor authentication
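As an illustration of the second item in the list above, here is a sketch of a bucket policy that denies all S3 actions to requests coming from one IP address. The bucket name and the address (taken from a documentation-only range) are placeholders, and the policy is built as a Python dictionary purely so it can be serialized to JSON; treat it as a starting point under those assumptions rather than a vetted production policy.

```python
import json

BUCKET = "bucket-one"            # hypothetical bucket name
BLOCKED_IP = "203.0.113.10/32"   # placeholder address from a documentation range

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyOneAddress",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            # Cover both the bucket itself and every object inside it.
            "Resource": [
                f"arn:aws:s3:::{BUCKET}",
                f"arn:aws:s3:::{BUCKET}/*",
            ],
            "Condition": {"IpAddress": {"aws:SourceIp": BLOCKED_IP}},
        }
    ],
}

policy_json = json.dumps(policy, indent=2)
print(policy_json)
```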
S3 Bucket URLs and Other Methods to Access Your Buckets
You can perform almost any operation using the S3 console, with no need for code. However, S3 also provides a powerful REST API that gives you programmatic access to buckets and objects. You can reference any bucket or the objects within it via a unique Uniform Resource Identifier (URI).
Amazon S3 provides support for path-style and virtual-hosted-style URLs to gain access to a bucket. Given that buckets are accessible to these URLs, it is suggested that you establish buckets with bucket names that are DNS-compliant.
Virtual-Hosted-Style Access
In a virtual-hosted-style request, the bucket name is a component of the domain name within the URL.
Amazon S3 virtual-hosted-style URLs employ this format:
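https://bucket-name.s3.Region.amazonaws.com/key-name
For example, if you name the bucket bucket-one, select the US East (Northern Virginia) Region, and use kitty.png as your key name, the URL will look as follows:
https://bucket-one.s3.us-east-1.amazonaws.com/kitty.png
Path-Style Access
In a path-style request, the bucket name is the first component of the URL path rather than part of the domain name.
Amazon S3 path-style URLs employ this format:
https://s3.Region.amazonaws.com/bucket-name/key-name
For example, if you created a bucket in the US East (Northern Virginia) Region and named it bucket-one, the path-style URL you use to access the kitty.jpg object in the bucket will look like this:
https://s3.us-east-1.amazonaws.com/bucket-one/kitty.jpg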
Certain AWS services need you to specify an Amazon S3 bucket via an S3:// address, in this format:
S3://bucket-name/key-name
Note that in this format the address does not include the AWS Region. For example, a bucket called bucket-one with a kitty.jpg key will look like this:
S3://bucket-one/kitty.jpg
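The three addressing formats described in this section can be generated with a few small helpers. This is a sketch: the function names are hypothetical, the bucket and key are the example values used in the text, and us-east-1 is the Region code for US East (Northern Virginia).

```python
from urllib.parse import quote


def virtual_hosted_url(bucket: str, region: str, key: str) -> str:
    """Bucket name appears in the host name."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def path_style_url(bucket: str, region: str, key: str) -> str:
    """Bucket name appears as the first path segment."""
    return f"https://s3.{region}.amazonaws.com/{bucket}/{quote(key)}"


def s3_address(bucket: str, key: str) -> str:
    """Region-free form used by some AWS services."""
    return f"S3://{bucket}/{key}"


print(virtual_hosted_url("bucket-one", "us-east-1", "kitty.jpg"))
print(path_style_url("bucket-one", "us-east-1", "kitty.jpg"))
print(s3_address("bucket-one", "kitty.jpg"))
```

Keys are percent-encoded with quote so that names containing spaces or other reserved characters still produce a valid URL.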
AWS provides various tools for Amazon S3 buckets. An IT specialist may enable versioning for an S3 bucket to retain every version of an object when an operation, such as a delete or copy, is carried out on it. This helps prevent IT specialists from accidentally losing an object. Similarly, when creating a bucket, a user can set up server access logs, tags, object-level API logs, and encryption.
S3 Transfer Acceleration can assist with the execution of secure and fast transfers from the client to an S3 bucket via AWS edge locations.
Amazon S3 supports several options for configuring your bucket. It uses subresources to store and manage the bucket configuration details. You can create and manage these subresources with the Amazon S3 API, the AWS SDKs, or the console.
These are known as subresources because they exist in the context of a specific bucket or object. The list below covers the subresources that let you manage bucket-specific configurations.
cors (cross-origin resource sharing): You may configure your bucket to permit cross-origin requests.
event notification: You may permit your bucket to alert you of particular bucket events.
lifecycle: You may specify lifecycle rules for objects within your bucket that have a well-defined lifecycle.
location: When you create a bucket, you choose the AWS Region where you want Amazon S3 to create it. Amazon S3 stores this in the location subresource and offers an API so you can retrieve the information.
logging: Logging lets you track requests for access to the bucket. Each access log record gives details about one access request, including bucket name, requester, request action, request time, error code, and response status.
object locking: Enables the object lock feature for a bucket. You may also configure a default retention period and mode that apply to new objects uploaded to the bucket.
policy and ACL (access control list): Both buckets and the objects stored within them are private, unless you specify otherwise. ACLs and bucket policies are two ways to grant permissions for an entire bucket.
replication: This option lets you automatically copy the contents of the bucket to one or more other buckets, in the same AWS Region or a different one. Replication is asynchronous.
requestPayment: By default, the AWS account that sets up a bucket also receives bills for requests made to the bucket. This setting lets the bucket creator pass on the cost of downloading data from the bucket to the account downloading the content.
tagging: This setting allows you to add tags to an S3 bucket. This can help you track and organize your costs on S3. AWS shows the tags on your charges allocation report, with costs and usage aggregated via the tags.
transfer acceleration: Transfer acceleration enables easy, secure and fast movement of files over long distances between your client and your S3 bucket. Transfer acceleration uses the globally distributed edge locations of Amazon CloudFront.
versioning: Versioning helps you recover from accidental deletes and overwrites.
website: You may configure the bucket for static website hosting.
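To show what one of these subresources looks like in practice, here is a sketch of a lifecycle configuration expressed as a Python dictionary, in the shape the S3 lifecycle API accepts as far as we understand it. The rule ID, prefix, and day counts are made-up values, and class_at_age is a hypothetical helper that only simulates the rule locally; it does not call AWS.

```python
lifecycle = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",   # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},      # hypothetical key prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def class_at_age(rule: dict, age_days: int):
    """Return the storage class an object would have at this age, or None once expired."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"
    for step in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= step["Days"]:
            current = step["StorageClass"]
    return current


rule = lifecycle["Rules"][0]
for age in (10, 45, 120, 400):
    print(age, class_at_age(rule, age))
```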
Best Practices for Keeping Amazon S3 Buckets Secure
AWS S3 Buckets may not be as safe as most users believe. In many cases, AWS permissions are not correctly configured and can expose an organization’s AWS S3 buckets or some of their content.
Misconfigured permissions are nothing new for many organizations, but one permission carries increased risk. If you allow objects to be public, you open a pathway for attackers to write to S3 buckets that they should not have permission to access. Misconfigured buckets are a root cause behind many well-known attacks.
To protect your S3 buckets, you should apply the following best practices.
Block Public S3 Buckets at the Organization Level
Designate specific AWS accounts for public S3 use and prevent all other S3 buckets from accidentally becoming public by enabling S3 Block Public Access. Use AWS Organizations service control policies (SCPs) to ensure that the Block Public Access setting cannot be changed. S3 Block Public Access provides protection at the account level and on individual buckets, including those you create in the future.
You can block existing public access, whether it was granted by a policy or an ACL, and make sure that public access is not granted to newly created items. This leaves only the designated AWS accounts with public S3 buckets and blocks public access in all other AWS accounts.
Implement Role-Based Access Control
Define roles that cover the access needs of users and objects. Make sure those roles have the least access needed to carry out the job so that if a user’s account is breached, the damage is kept to a minimum.
AWS security is founded on AWS Identity and Access Management (IAM) policies. A principal is an identity that can be authenticated, for example, with a password. Roles, users, applications, and federated users (from separate systems) may all be principals. When an authenticated principal requests an entity, resource, service, or other asset, authorization begins.
Authorization policies determine what access the principal has to the resource being requested. Access is granted through identity-based or resource-based policies. The request is evaluated against every policy that applies to the authenticated principal to determine whether it is permitted.
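The evaluation step described above can be illustrated with a toy model. The sketch below implements only two rules: a request is denied unless some statement allows it, and an explicit deny always wins. Real IAM evaluation involves additional layers that are left out here, and the statement format, principal names, and resource strings are all simplified assumptions.

```python
from fnmatch import fnmatchcase


def matches(statement: dict, principal: str, action: str, resource: str) -> bool:
    """True if the statement applies to this principal, action, and resource."""
    return (
        statement["principal"] in ("*", principal)
        and any(fnmatchcase(action, a) for a in statement["actions"])
        and any(fnmatchcase(resource, r) for r in statement["resources"])
    )


def is_allowed(statements: list, principal: str, action: str, resource: str) -> bool:
    """Explicit deny wins; otherwise at least one matching allow is required."""
    applicable = [s for s in statements if matches(s, principal, action, resource)]
    if any(s["effect"] == "Deny" for s in applicable):
        return False
    return any(s["effect"] == "Allow" for s in applicable)


statements = [
    {"effect": "Allow", "principal": "analyst",
     "actions": ["s3:GetObject"], "resources": ["tenant-a/*"]},
    {"effect": "Deny", "principal": "*",
     "actions": ["s3:*"], "resources": ["tenant-a/secret/*"]},
]

print(is_allowed(statements, "analyst", "s3:GetObject", "tenant-a/report.csv"))
print(is_allowed(statements, "analyst", "s3:GetObject", "tenant-a/secret/keys.txt"))
print(is_allowed(statements, "analyst", "s3:PutObject", "tenant-a/report.csv"))
```

Starting from "deny by default" is what makes least-privilege roles practical: a role can only do what its statements explicitly grant.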
Another data security methodology is splitting data across different buckets. For instance, a multi-tenant application could require separate Amazon S3 buckets for every tenant. You can also use another AWS tool, Amazon VPC, whose endpoints give you secure access to sections of your Amazon S3 buckets.
Encrypt Your Data
Even with your greatest efforts, it remains good practice to assume that information is always at risk of being exposed. Given this, you should use encryption to stop unauthorized individuals from using your information if they have managed to access it.
Make sure that the data in your Amazon S3 buckets is encrypted both in transit and at rest. If you just have a single bucket, this is likely not complex, but if buckets are being created dynamically, it may be difficult to keep track of them and manage encryption appropriately.
On the server side, Amazon S3 buckets support encryption, but this has to be enabled. Once encryption is turned on, the information is encrypted at rest. Encrypting the bucket means that anyone who manages to access the data will need the key to decrypt it.
For transport security, HTTPS is used to make sure that information is encrypted in transit. Each new version of Transport Layer Security (TLS) makes the protocol more secure and retires outdated, now insecure, encryption methods.
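One common way to enforce encryption in transit is a bucket policy that rejects any request not made over HTTPS. The sketch below builds such a policy as a Python dictionary. The bucket name is a placeholder, and the policy is an illustrative assumption about how a team might apply the advice above, not a configuration taken from the source.

```python
import json

BUCKET = "bucket-one"  # hypothetical bucket name

https_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
            # aws:SecureTransport is "false" when the request did not use HTTPS.
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ],
}

print(json.dumps(https_only_policy, indent=2))
```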
S3-Compatible Storage On-Premises with Cloudian
Cloudian® HyperStore® is a massive-capacity object storage device that is fully compatible with Amazon S3. It allows you to easily set up an object storage solution in your on-premises data center, enjoying the benefits of cloud-based object storage at much lower cost.
HyperStore can store up to 1.5 Petabytes in a 4U Chassis device, allowing you to store up to 18 Petabytes in a single data center rack. HyperStore comes with fully redundant power and cooling, and performance features including 1.92TB SSD drives for metadata, and 10Gb Ethernet ports for fast data transfer.
HyperStore is an object storage solution you can plug in and start using with no complex deployment. It also offers advanced data protection features, supporting use cases like compliance, healthcare data storage, disaster recovery, ransomware protection and data lifecycle management.
As your business expands, you have to manage isolated but rapidly growing pools of data from various sources, which are used for a variety of business processes and applications. Nowadays, many organizations grapple with a fragmented storage portfolio that slows down innovation and adds complexity to an organization’s applications. Object storage can help your organization break down these silos. It provides cost-effective, highly scalable storage that can retain any type of data in its original format.
Object storage is highly suitable for the cloud as it is flexible, elastic and can be more easily scaled into many petabytes to support indefinite data growth. The architecture manages and stores data as objects, as opposed to block storage, which handles data as blocks within logical volumes, and file storage, where data is stored in hierarchical files.
Let’s review the object storage offerings by some of the world’s leading cloud providers: Amazon Web Services, Microsoft Azure, Google Cloud, and IBM Cloud.
AWS Object Storage
AWS provides a variety of storage classes for different use cases. Amazon S3 is the main object storage platform of AWS, with S3 Standard-IA providing cool storage, and Glacier providing cold storage:
Amazon S3 Standard: the storage choice for data that is accessed often. It suits numerous use cases including dynamic websites, cloud applications, content distribution, data analytics and gaming, and delivers high throughput as well as low latency.
Amazon S3 Standard-Infrequent Access (Amazon S3 Standard-IA): a storage alternative for data that is accessed less often, such as disaster recovery data and long-term backups.
Amazon Glacier: this highly durable storage system is optimized for data that is rarely accessed, or “cold” data, such as end-of-lifecycle data kept for compliance and regulatory backup purposes. Data is archived for long-term storage, and is immutable and encrypted.
Azure Object Storage
Microsoft offers Azure Blob Storage for object storage in the cloud. Blob storage is suited to storing any form of unstructured data, such as binary or text. This includes videos, images, documents, audio and more. Azure storage offers high-quality data integrity, flexibility and mutability.
Blob storage is used for serving documents or images directly to a browser, retaining files for distributed access, streaming audio and video, writing to log files, storing data for backup and restore, disaster recovery, and archiving, and storing data so it can be analyzed by an Azure-hosted or on-premises service.
Azure has several storage tiers, including:
Hot access tier: for data that is in active use or expected to be, and for data staged for processing and subsequent migration to the Cool tier.
Cool access tier: for data that is intended to stay in the Cool tier for more than 30 days. This includes short-term backup and disaster recovery datasets, older media content that must still be immediately available when accessed, and large data sets.
Archive access tier: for data that will stay in the Archive tier for more than 180 days, and that can tolerate hours of retrieval latency.
Note: The Archive storage tier is not accessible at the storage account level, but only at the blob level. Azure also provides a Premium tier, which is for workloads that need consistent and fast response times.
Google Cloud Storage
Google Cloud Storage (GCS) provides unified object storage for all workloads. It has four classes covering high-performance object storage as well as backup and archival storage. All four classes provide high durability and low latency:
Hot (high-performance) storage: GCS provides regional and multi-regional storage for frequently accessed data.
Multi-regional storage: for data that is accessed often from around the world, such as streaming video, website content, or mobile and gaming applications.
Regional storage: for frequent access to data in the same region as the Google Compute Engine instances or Google Cloud Dataproc clusters that use it, for example for data analytics.
Nearline (cool) storage: for data that is accessed less than once a month, but possibly several times a year. Suitable for backups and long-tail multimedia content.
Coldline (cold) storage: for data that is accessed less than once a year. Suitable for archival data and disaster recovery.
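The access-frequency thresholds in the list above translate directly into a simple decision rule. The sketch below is a hypothetical helper that follows only those thresholds; a real choice would also weigh pricing, retrieval fees, and minimum storage durations, none of which are modeled here.

```python
def suggest_class(accesses_per_year: float, global_audience: bool = False) -> str:
    """Pick a storage class from expected access frequency, per the thresholds in the text."""
    if accesses_per_year < 1:
        return "Coldline"          # accessed less than once a year
    if accesses_per_year < 12:
        return "Nearline"          # accessed less than once a month
    # Frequently accessed data: choose by where the readers are.
    return "Multi-regional" if global_audience else "Regional"


print(suggest_class(0.5))
print(suggest_class(4))
print(suggest_class(500))
print(suggest_class(500, global_audience=True))
```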
IBM Cloud Object Storage
IBM Cloud provides scalable and flexible cloud storage with policy-driven archive abilities for unstructured data. This cloud storage service is intended for data archiving, for example for the long term retention of data that is infrequently accessed, including for mobile and web applications, and for backup and analytics.
IBM has four storage-class tiers integrated with an Aspera high-speed information transfer option. This allows for the easy transfer of data from and to Cloud Object Storage, and query-in-place functionality.
IBM Cloud Object Storage class tiers:
Standard storage—for active workloads that need high performance and low latency, and data that requires frequent and multiple access in a month. Usage scenarios are for example, active content repositories, analytics, mobile streaming and web content, collaboration and DevOps.
Vault storage—for less active workloads which need real-time, on-demand access but only infrequently, up to once a month. Use cases include digital asset retention and backup.
Cold vault—for cold workloads, where data needs on-demand, real-time access when needed but is mainly archived. For example, data that is accessed several times a year. Common use cases involve long-term backup, large data set preservation such as older media content and scientific data.
Flex storage—this class tier is utilized for dynamic workloads (combining cold and hot workloads) and data based on access patterns. Typical use cases include cognitive workloads, cloud-native analytics and user-generation applications.
Cloud Object Storage Pros and Cons
The following are some of the key advantages and disadvantages of object storage in the cloud.
Cloud Object Storage Pros
The key advantages of object storage include:
Data is highly distributed, which ensures it is more resilient to hardware failures or disasters. This way, it is available even if various nodes fail.
Objects are kept in a flat address space, which minimizes complexity and scalability issues.
Data protection is built into this architecture in the form of erasure coding or replication technology.
Object storage is most suitable for cloud storage and static data. Common use cases for object storage include archiving and cloud backup—the technology functions best with data that is more frequently read than written to.
Object storage has developed to the point where it scales to the exabyte level and holds trillions of objects. The use of VMs or commodity hardware enables nodes to be added easily, with the disk space being used more efficiently.
Object storage systems, through the use of object IDs (OIDs), can access any piece of data without knowing which physical storage device, directory, or file system it resides on. This abstraction lets object storage devices work with storage hardware configured in a distributed node architecture. This way, processing power can scale together with data storage capacity.
I/O requests don’t need to pass through a central controller, allowing for a true global storage system in which large amounts of data are managed as objects, physically kept anywhere, and retrieved over the internet or a WAN.
Cloud Object Storage Cons
The key disadvantages of object storage include:
Object storage systems are not well suited to real-time systems such as transactional databases. An environment or application with a high transaction rate is a poor fit for object storage.
Object storage doesn’t guarantee that read requests will produce the most up-to-date version of the data.
This technology isn’t always appropriate for applications that have high performance demands.
Cloud-based storage often ends up being more expensive because you need to pay for storage on an ongoing basis. With on-premises equipment you pay once and the storage is yours.
Object storage is a data storage architecture that stores and manages unstructured data in units called objects. Objects can be any size or format, and can include data, metadata, and a unique identifier.
Unlike other storage systems, object storage is not organized into folders or a hierarchical path. Objects are stored in a flat data environment and can be reached through multiple paths.
Objects can store photos, videos, emails, audio files, network logs, or any other type of structured or unstructured data. All of the major public cloud services, including Amazon, Google and Microsoft, employ object storage as their primary storage.
Object storage is a technology that manages data as objects. All data is stored in one large repository which may be distributed across multiple physical storage devices, instead of being divided into files or folders.
It is easier to understand object-based storage when you compare it to more traditional forms of storage – file and block storage.
File Storage
File storage stores data in folders. This method, also known as hierarchical storage, simulates how paper documents are stored. When data needs to be accessed, a computer system must look for it using its path in the folder structure.
File storage uses TCP/IP as its transport, and devices typically use the NFS protocol in Linux and SMB in Windows.
Block Storage
Block storage splits a file into separate data blocks, and stores each of these blocks as a separate data unit. Each block has an address, and so the storage system can find data without needing a path to a folder. This also allows data to be split into smaller pieces and stored in a distributed manner. Whenever a file is accessed, the storage system software assembles the file from the required blocks.
Block storage uses FC or iSCSI for transport, and devices operate as direct attached storage or via a storage area network (SAN).
Object Storage
In object storage systems, data blocks that make up a file or “object”, together with its metadata, are all kept together. Extra metadata is added to each object, which makes it possible to access data with no hierarchy. All objects are placed in a unified address space. In order to find an object, users provide a unique ID.
Object-based storage uses TCP/IP as its transport, and devices communicate using HTTP and REST APIs.
Metadata is an important part of object storage technology. Metadata is determined by the user, and allows flexible analysis and retrieval of the data in a storage pool, based on its function and characteristics.
The main advantage of object storage is that you can group devices into large storage pools, and distribute those pools across multiple locations. This not only allows unlimited scale, but also improves resilience and high availability of the data.
Object Storage Architecture: How Does It Work?
Anatomy of an Object
Object storage is fundamentally different from traditional file and block storage in the way it handles data. In an object storage system, each piece of data is stored as an object, which can include data, metadata, and a unique identifier, known as an object ID. This ID allows the system to locate and retrieve the object without relying on hierarchical file structures or block mappings, enabling faster and more efficient data access.
Objects can be any size or format, and can store photos, videos, emails, audio files, network logs, or any other type of structured or unstructured data.
Data Storage Layer: Flat Data Environment
The data storage layer is where the actual data objects are stored. Object storage is not organized into folders or a hierarchical path. Instead, objects are stored in a flat data environment and can be reached through multiple paths.
In an object storage system, data is typically distributed across multiple storage nodes to ensure high performance, durability, and redundancy. Each storage node typically contains a combination of hard disk drives (HDDs) and solid-state drives (SSDs) to provide the optimal balance between capacity, performance, and cost. Data objects are automatically replicated across multiple nodes, ensuring that data remains available and protected even in the event of hardware failures or other disruptions.
Metadata Index
The metadata index is a critical component of object storage architecture, as it maintains a record of each object’s unique identifier, along with other relevant metadata, such as access controls, creation date, and size. This information is stored separately from the actual data, allowing the system to quickly and efficiently locate and retrieve objects based on their metadata attributes. The metadata index is designed to be highly scalable, enabling it to support millions or even billions of objects within a single object storage system.
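One simple way to picture a metadata index kept apart from the data is an inverted index from attribute values to object IDs. The sketch below assumes exact-match lookups only; `MetadataIndex` and its methods are hypothetical names, and production systems use distributed databases rather than a single dictionary.

```python
from collections import defaultdict


class MetadataIndex:
    """Hypothetical inverted index: (attribute, value) -> set of object IDs."""

    def __init__(self):
        self._by_attr = defaultdict(set)
        self._records = {}

    def add(self, object_id, metadata):
        # The index stores metadata only; object payloads live elsewhere.
        self._records[object_id] = dict(metadata)
        for key, value in metadata.items():
            self._by_attr[(key, value)].add(object_id)

    def find(self, **criteria):
        """Return IDs of objects matching every attribute=value pair."""
        if not criteria:
            return set(self._records)
        matches = [self._by_attr.get((k, v), set()) for k, v in criteria.items()]
        return set.intersection(*matches)


index = MetadataIndex()
index.add("a1", {"owner": "radiology", "kind": "xray"})
index.add("b2", {"owner": "radiology", "kind": "mri"})
index.add("c3", {"owner": "billing", "kind": "invoice"})
xrays = index.find(owner="radiology", kind="xray")
```

Because queries touch only the index, the system can answer "which objects match these attributes" without reading any object data.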
API Layer
The API layer is responsible for providing access to the object storage system, allowing users and applications to store, retrieve, and manage data objects. Most object storage systems support a variety of standardized APIs, such as the Simple Storage Service (S3) API from Amazon Web Services (AWS), the OpenStack Swift API, and the Cloud Data Management Interface (CDMI). These APIs enable developers to easily integrate object storage into their applications, regardless of the underlying storage technology or vendor.
5 Expert Tips to Help You Better Optimize Your Object Storage
Jon Toor, CMO
With over 20 years of storage industry experience in a variety of companies including Xsigo Systems and OnStor, and with an MBA in Mechanical Engineering, Jon Toor is an expert and innovator in the ever growing storage space.
Leverage lifecycle policies to manage storage costs: Implement object lifecycle management to automatically transition objects between storage classes based on their age or access patterns. This can help you reduce storage costs by moving infrequently accessed data to colder storage tiers.
Optimize metadata for faster search and analytics: Invest time in designing your object metadata schema. Adding meaningful, searchable metadata can dramatically enhance retrieval speed and enable powerful analytics without needing to process the entire object.
Use erasure coding for efficient data protection: While replication is common, erasure coding provides more efficient storage utilization, especially in environments with large datasets. It offers high durability while using less storage space than simple replication.
Enable versioning for data integrity and compliance: Activate object versioning to protect against accidental overwrites or deletions. This is critical for compliance in industries where data integrity is required over long retention periods.
Implement policy-driven data tiering: Automate data movement between hot, warm, and cold storage using policy-based rules. This approach allows you to maximize cost efficiency by aligning storage costs with data value and access frequency.
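The lifecycle and tiering tips above come down to a rule that maps an object's age or last access time to a storage tier. A minimal sketch follows, assuming three tiers and thresholds of 30 and 365 days; these thresholds and the names `TIER_RULES` and `choose_tier` are illustrative assumptions, not defaults of any product.

```python
from datetime import datetime, timedelta

# Hypothetical rules: (minimum time since last access, tier), checked in order.
TIER_RULES = [
    (timedelta(days=365), "cold"),
    (timedelta(days=30), "warm"),
]
DEFAULT_TIER = "hot"


def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Pick the first tier whose age threshold the object has reached."""
    age = now - last_accessed
    for min_age, tier in TIER_RULES:
        if age >= min_age:
            return tier
    return DEFAULT_TIER


now = datetime(2024, 1, 1)
recent = choose_tier(now - timedelta(days=1), now)
stale = choose_tier(now - timedelta(days=90), now)
archived = choose_tier(now - timedelta(days=400), now)
```

Ordering the rules from oldest to newest means the coldest applicable tier wins, which keeps the policy easy to extend with further tiers.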
Object Storage Benefits
Exabyte Scalable
Unlike file or block storage, object storage services enable scalability that goes beyond exabytes. While file storage can hold many millions of files, you will eventually hit a ceiling. With unstructured data growing at 50+% per year, more and more users are hitting those limits, or they expect to in the future.
Scale Out Architecture
Object storage makes it easy to start small and grow. In enterprise storage, a simple scaling model is golden. And scale-out storage is about as simple as it gets: you simply add another node to the cluster and that capacity gets folded into the available pool.
HyperStore is an S3-compatible storage system. HyperFile is a connector that allows files to be stored on HyperStore.
Customizable Metadata
While file systems have metadata, the information is limited and basic (date/time created, date/time updated, owner, etc.). Object storage allows users to customize and add as many metadata tags as they need to easily locate the object later. For example, an X-ray could have information about the patient’s age and height, the type of injury, etc.
High Sequential Throughput Performance
Early object storage systems did not prioritize performance, but that’s now changed. Now, object stores can provide high sequential throughput performance, which makes them great for streaming large files. Also, object storage services help eliminate networking limitations. Files can be streamed in parallel over multiple pipes, boosting usable bandwidth.
Flexible Data Protection Options
To safeguard against data loss, most traditional storage options utilize fixed RAID groups (groups of hard drives joined together), sometimes in combination with data replication. The problem is, these solutions generally lead to one-size-fits-all data protection. You cannot vary the protection level to suit different data types.
Object storage solutions employ a flexible tool called erasure coding that is similar to old-fashioned RAID in some ways, but is far more flexible. Data is striped across multiple drives or nodes as needed to achieve the needed protection for that data type. Between erasure coding and configurable replication, data protection is both more robust and more efficient.
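The efficiency difference between replication and erasure coding is simple arithmetic. The sketch below assumes an erasure code that splits data into k data fragments plus m parity fragments and can rebuild from any k of them (as Reed-Solomon style codes do); the function names are illustrative.

```python
def replication_profile(copies: int) -> dict:
    """Raw-to-usable capacity ratio and tolerated losses for N full copies."""
    return {"overhead": float(copies), "tolerated_failures": copies - 1}


def erasure_profile(data_fragments: int, parity_fragments: int) -> dict:
    """Same figures for a k+m erasure code that survives any m lost fragments."""
    total = data_fragments + parity_fragments
    return {
        "overhead": total / data_fragments,
        "tolerated_failures": parity_fragments,
    }


three_copies = replication_profile(3)   # 3 full replicas
ec_4_2 = erasure_profile(4, 2)          # 4 data + 2 parity fragments
```

Under these assumptions both schemes survive two simultaneous failures, but three-way replication consumes 3.0 units of raw capacity per unit stored while the 4+2 code consumes 1.5.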
Support for the S3 API
Back when object storage solutions were launched, the interfaces were proprietary. Few application developers wrote to these interfaces. Then Amazon created the Simple Storage Service, or “S3”. They also created a new interface, called the “S3 API”. The S3 API has since become a de facto standard for object storage data transfer.
The existence of a de facto standard changed the game. Now, S3-compatible application developers have a stable and growing market for their applications. And service providers and S3-compatible storage vendors such as Cloudian have a growing user set deploying those applications. The combination sets the stage for rapid market growth.
Lower Total Cost of Ownership (TCO)
Cost is always a factor in storage. And object storage services offer the most compelling story, both in hardware/software costs and in management expenses. By allowing you to start small and scale, this technology minimizes waste, both in the form of extra headcount and unused space. Additionally, object storage systems are inherently easy to manage. With limitless capacity within a single namespace, configurable data protection, geo-replication, and policy-based tiering to the cloud, it’s a powerful tool for large-scale data management.
To learn more about Cloudian’s fully native S3-compatible storage in your data center, and how it can cut down your TCO, check out our free trial. Or visit cloudian.com for more information.
Object Storage Use Cases
There are numerous use cases for object storage, thanks to its scalability, flexibility, and ease of use. Some of the most common use cases include:
Backup and archiving
Object storage is an excellent choice for storing backup and archive data, thanks to its durability, scalability, and cost-effectiveness. The ability to store custom metadata with each object allows organizations to easily manage retention policies and ensure compliance with relevant regulations.
Big data analytics
The horizontal scalability and programmability of object storage make it a natural choice for storing and processing large volumes of unstructured data in big data analytics platforms. Custom metadata schemes can be used to enrich the data and enable more advanced analytics capabilities.
Media storage and delivery
Object storage is a popular choice for storing and delivering media files, such as images, video, and audio. Its scalability and performance make it well-suited to handling large volumes of media files, while its support for various data formats and access methods enables seamless integration with content delivery networks and other media delivery solutions.
Internet of Things (IoT)
As the number of connected IoT devices continues to grow, so too does the amount of data they generate. Object storage is well-suited to handle the storage and management of this data, thanks to its scalability, flexibility, and support for unstructured data formats.
How to Choose an Object-Based Storage Solution
When choosing an object storage solution, there are several factors to consider. Some of the most important factors include:
Scalability: One of the primary strengths of object storage is its ability to scale horizontally, so it’s essential to choose a platform that can grow with your organization’s data needs. Look for a solution that can easily accommodate massive amounts of data without sacrificing performance or manageability.
Data durability and protection: Ensuring the integrity and availability of your data is critical, so look for an object storage platform that offers robust data protection features, such as erasure coding, replication, or versioning. Additionally, consider the platform’s durability guarantees – how likely is it that your data will be lost or corrupted?
Cost: Cost is always a consideration when choosing a storage solution, and object storage is no exception. Be sure to evaluate the total cost of ownership (TCO) of the platform, including factors such as hardware, software, maintenance, and support costs. Additionally, if you’re considering a cloud-based solution, be sure to factor in the costs of data transfer and storage.
Performance: While object storage is not typically designed for high-performance, low-latency workloads, it’s still important to choose a platform that can deliver acceptable performance for your organization’s specific use cases. Consider factors such as throughput, latency, and data transfer speed when evaluating performance.
Integration and compatibility: The ability to integrate the object storage platform with your existing infrastructure and applications is essential. Look for a solution that supports industry-standard APIs and protocols, as well as compatibility with your organization’s preferred development languages and tools.
Mobile video surveillance can do a lot to ensure safety on transit systems. After all, bus and train operators must focus on operating their vehicles, not on policing riders.
Real-time mobile video surveillance would allow one staff member to monitor multiple vehicles, which could save cost and increase safety.
The problem is this: traditional technologies record video on the vehicle for later retrieval after the vehicle returns home. The obvious problem here is the lack of a real-time view. When an incident occurs, you can only see what happened after-the-fact.
Also, when an incident occurs, finding the relevant clip takes a long time. The manual process consumes expensive resources and slows the response.
A better video surveillance answer was devised by the City of Montebello. View this video to learn more.
The Challenges in Storing Video Surveillance
Montebello Bus Lines currently operates 72 buses that serve over 8 million passengers a year, and each bus houses five cameras and a recording system. All videos were only recorded locally on the buses. Transferring the data into the operations center at the end of the day took time.
Then, MBL had to manually locate clips using time codes. This made it difficult to follow up on reported incidents in a timely manner.
Another storage issue was budget. Budget limitations meant MBL couldn’t keep the video data for more than a few days. If someone filed a complaint after the video was deleted, the city of Montebello would face financial risk.
Finding the Answer in Object Storage
What MBL needed was the ability to wirelessly upload video in addition to storing the data locally. This would allow for immediate review by transit staff or law enforcement and would serve as an additional layer of backup to prevent data loss.
MBL first tried using a Network Attached Storage (NAS) system, but the problem with NAS is that the entry systems simply aren’t fast enough while the better performing systems are cost-prohibitive. Another challenge was the file structure, which did not allow graceful transfer over a wireless network. An interrupted transfer resulted in re-starting the process. Finally, NAS systems allowed limited metadata tagging, containing only the most basic information.
But this is where Cloudian steps in. With Cloudian and Transportation Security Systems (TSS) IRIS, MBL is now able to add metadata tagging on their videos. The metadata search also makes it easier to locate videos based on parameters such as time, location, vehicle, and more.
Large clips are broken into smaller pieces before being transferred concurrently, resulting in better reliability and successful use of wireless data transfers. Additionally, object storage is more cost-efficient, meaning it’s easy (and affordable) to scale up as more videos are stored.
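The chunked transfer described above can be sketched as follows. This is not the Cloudian or TSS implementation: the dictionary `remote` stands in for the storage endpoint, `upload_part` stands in for a network call, and all function names are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(data: bytes, part_size: int) -> list:
    """Cut the clip into fixed-size parts; the last part may be shorter."""
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def upload_part(part_number: int, part: bytes, remote: dict):
    # Stand-in for a network call. If it failed, only this part would need
    # to be resent, not the whole clip.
    remote[part_number] = part
    return part_number, hashlib.sha256(part).hexdigest()


def upload_clip(data: bytes, part_size: int, remote: dict, workers: int = 4) -> dict:
    """Upload all parts concurrently and return a checksum per part number."""
    parts = split_into_parts(data, part_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(upload_part, n, p, remote) for n, p in enumerate(parts)]
        return dict(f.result() for f in futures)


def assemble(remote: dict) -> bytes:
    """Rebuild the clip on the receiving side by part number."""
    return b"".join(remote[n] for n in sorted(remote))


clip = bytes(range(256)) * 40          # 10,240 bytes of sample data
remote = {}
receipts = upload_clip(clip, 1024, remote)
restored = assemble(remote)
```

Numbering the parts lets them arrive in any order, which is what makes concurrent transfer over an unreliable wireless link practical.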
David Tsuen, IT Manager for the City of Montebello, stated that “Cloudian and TSS together allowed us to solve a very challenging problem. We now have a path to significant cost savings for the City and a safer experience for our riders. That’s a genuine win-win.”
You can learn more about how we solved MBL’s challenges by reading our case study, or you can try Cloudian out for yourself with our free trial.
We just announced our Data Management Partners program to help our customers solve more capacity management problems in less time. The program combines technology, testing, and support to make it easy to put object storage to work. Inaugural members of this program are Rubrik, Komprise, Evolphin, and CTERA Networks.
Here’s why this program is exciting: object storage has the potential to solve many capacity management problems in the data center. It’s 2/3 less costly and infinitely scalable. In a recent survey, Gartner found that capacity management was the #1 concern of Infrastructure and Operations managers, so these are important benefits.
The question is how to get started with object storage? You can piece together solutions on your own, but that can be risky. We’ve done the homework for you and proved out these solutions.
The Solution for Unstructured Data Consolidation
These solutions solve capacity-intensive challenges where Cloudian’s scalability and cost benefits deliver huge savings. Cloudian consolidates data into one big storage pool, so you can add as many nodes as you want. With one set of users, groups, permissions, file structures, etc., storage managers still see only one thing to manage. This cuts management workloads by 90% and makes it possible to grow with less headache and cost.
Solution areas in this program include:
Data protection: Rubrik and Cloudian together unify and automate backup, instant recovery, replication, global indexed search, archival, compliance, and copy data management into a single scale-out fabric across the data center and public cloud.
Data lifecycle management: Komprise and Cloudian tackle one of the biggest challenges in the data center industry, unstructured data lifecycle management, with solutions that offload non-critical data that is typically 70%+ of the footprint from costly Tier-1 NAS to a limitless scalable storage pool.
Media active archiving: Evolphin and Cloudian help media professionals address capacity-intensive formats (e.g., 4k, 8k, VR/360) with the performance to handle time-pressed workflows.
File sync and share: CTERA Networks and Cloudian provide enterprises with tools for collaboration in capacity-rich environments.
Reducing Risk with Proven Partners
Every solution in this program is proven. All are deployed, with customers, in live production data centers, right now. They solve real capacity management problems and do not create new problems along the way.
Object storage is seeing rapid adoption. It costs significantly less than traditional storage and fixes the capacity problem with infinite scalability. If you’re looking into object storage, make sure you’re getting a complete solution, though. Learn more about our Data Management Partners today.
Not all data is equal: frequency of access, security needs, and cost considerations vary, so data storage architectures need to provide different storage tiers to address these requirements. Storage tiers differ by disk drive type, RAID configuration, or even completely different storage subsystems, each with a different I/O profile and cost impact.
Data tiering allows the movement of data between different storage tiers, which allows an organization to ensure that the appropriate data resides on the appropriate storage technology. In modern storage architectures, this data movement is invisible to the end-user application and is typically controlled and automated by storage policies. Typical data tiers may include:
Flash storage – High value, high-performance requirements, usually smaller data sets; cost is less important compared to the performance Service Level Agreement (SLA) required
Traditional SAN/NAS Storage arrays – Medium value, medium performance, medium cost sensitivity
Object Storage – Less frequently accessed data with larger data sets. Cost is an important consideration
Public Cloud – Long-term archival for data that is never accessed
Typically, structured data sets belonging to applications/data sources such as OLTP databases, CRM, email systems and virtual machines will be stored on data tiers 1 and 2 as above. Unstructured data is more commonly moving to tiers 3 and 4 as these are typically much larger data sets where performance is not as critical and cost becomes a more significant factor in management and purchasing decisions.
Some Shortcomings of Data Tiering to Public Cloud
Public cloud services have become an attractive data tiering solution, especially for unstructured data, but there are considerations around public cloud use:
Performance – Public network access will typically be a bottleneck when reading and writing data to public cloud platforms, along with data retrieval times (based on the SLA provided by the cloud service). Backup and recovery windows are still incredibly important, so for backup data it is worth holding the most relevant backup sets onsite and archiving only older backup data to the cloud.
Security – Certain data sets/industries have regulations stipulating that data must not be stored in the cloud. Being able to control what data is sent to the cloud is of major importance.
Access patterns – Data that is re-read frequently may incur additional network bandwidth costs imposed by the public cloud service provider. Understanding your use of data is vital to control the costs associated with data downloads.
Cost – As well as bandwidth costs associated with reading data, storing large quantities of data in the cloud may not make the most economic sense, especially when compared to the economics of on-premises cloud storage. Evaluate both options against your own capacity and access figures.
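The access-pattern and cost points above can be made concrete with a small calculation. The model below is deliberately simplified and rests on assumptions: cloud cost is storage plus egress only, on-premises cost is a single fixed monthly figure, and the numbers in the example are placeholders rather than real prices.

```python
def monthly_cloud_cost(stored_gb, read_gb, storage_price_per_gb, egress_price_per_gb):
    """Simplified model: pay for capacity held plus every GB read back."""
    return stored_gb * storage_price_per_gb + read_gb * egress_price_per_gb


def break_even_read_gb(stored_gb, storage_price_per_gb, egress_price_per_gb,
                       on_prem_monthly_cost):
    """Monthly read volume at which cloud cost equals a fixed on-prem cost."""
    headroom = on_prem_monthly_cost - stored_gb * storage_price_per_gb
    return max(headroom, 0.0) / egress_price_per_gb


# Placeholder prices chosen for easy arithmetic, not taken from any provider.
cloud = monthly_cloud_cost(stored_gb=1000, read_gb=200,
                           storage_price_per_gb=0.5, egress_price_per_gb=0.25)
threshold = break_even_read_gb(stored_gb=1000, storage_price_per_gb=0.5,
                               egress_price_per_gb=0.25, on_prem_monthly_cost=600.0)
```

With real figures substituted, the break-even value indicates how much data can be re-read each month before the cloud tier costs more than keeping the data on site.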
Using Hybrid Cloud for a Balanced Data Tier Strategy
For unstructured data, a hybrid approach to data management is key. An automation engine, data classification, and granular control of data are necessary requirements to really deliver on this premise.
With a hybrid cloud approach, you can push any data to the public cloud while retaining the control that comes with on-premises storage. For any data storage system, granularity of control and management is extremely important, because different data sets have different management requirements and need different SLAs applied according to the value of the data to the organization.
Cloudian HyperStore is a solution that gives you that flexibility for easily moving between data tiers 3 and 4 listed earlier in this post. Not only do you get the control and security from your data center, you can integrate HyperStore with many different destination cloud storage platforms, including Amazon S3/Glacier, Google Cloud Platform, and any other cloud service offering S3 API connectivity.
Internet Unie, a service provider in the Netherlands, has recently deployed an innovative hybrid cloud service, combining Cloudian object storage in their data center together with Amazon S3 storage.
The new service allows their colocation customers to employ local S3 storage in their data center, with additional capacity available in the AWS public cloud.
Why would a service provider launch a service that employs another service provider (in this case Amazon)?
The answer is simple: it fills a real business need and gives Internet Unie a competitive advantage.
By offering their customers this hybrid service, Internet Unie meets multiple possible requirements:
Performance: Local storage provides cloud-compatible capacity without the latency of a long network hop
Data governance: Locally stored data does not leave the data center
Capacity flexibility: Data can be tiered off to the cloud when desired, meaning capacity is always there
Disaster recovery: Backup information can be moved off site at any time
Cost: Locally stored information costs nothing to access, meaning that cloud storage invoices become far more predictable
Archival storage: Cloud archival services are very cost effective for rarely accessed information
Business simplicity: One invoice for both on prem and cloud storage, thanks to the Amazon Marketplace metered-by-use program
Internet Unie summed it up this way:
“This hybrid service opens up enormous possibilities for those using the AWS service cloud offerings and need to store certain data types in a private cloud, for reasons such as data governance policies. With Cloudian’s new offering on AWS, our customers can point their applications to either cloud storage or on-premises storage, and it’s completely transparent,” said Arvid Cauwels, Sales Director at Internet Unie. “With AWS metering now available for Cloudian storage, customers get one AWS invoice for both their public and private cloud storage usage.”
Cloudian is a natural fit due to our native support for the Amazon S3 API, which makes it easy to tier between a Cloudian storage system in the Internet Unie data center and AWS cloud storage. Additionally, Cloudian supports AWS metering, which pulls all usage and billing (for both public and private cloud) into a single monthly AWS invoice.
Hybrid cloud represents a ‘best of both worlds’ solution, giving customers extra flexibility and control while providing limitless scalability. Read our blog post to learn more about why you should consider a hybrid cloud solution.
The enterprise storage industry is going through a massive transformation, and over the last several years I’ve had the good fortune of being on the front lines. As founding CEO of Nexenta, I helped that company disrupt the storage industry by creating and leading the open storage market. These days I’m having a blast as a senior advisor and investor at companies including Cloudian, who is taking off as a leader in what is typically called “object storage”.
In this blog I’d like to share what I’m seeing – across the IT industry – and why change is only accelerating. The sources of this acceleration are much larger than any one technology vendor or, indeed, than the technology itself.
Let’s start at the top – the top of the stack, where developers and their users reside. From there we will dive into the details before summarizing the implications for the storage industry.
Software eats everything
What does “software eats everything” really mean? To me it means that more than ever start-ups are successfully targeting entire industries and transforming them through technology-enabled “full stack” companies. The canonical example is a few guys that thought about selling better software to taxi companies… and instead became Uber.
Look around and you’ll see multiple examples where software has consumed an industry. And today, Silicon Valley’s appetite is larger than it ever has been.
So why now? Why is software eating everything? A few reasons:
Cloud and AWS – When I started Clarus back in the early 2000s, it cost us at least $7 million to get to what we now would call a minimum viable product. These days, it costs perhaps 10% of that, largely thanks to the shift to the cloud. Maybe more importantly, thanks to SaaS and AWS, many users now see that cloud-hosted software is often safer than on-premises software.
SaaS and Cloud have enabled a profound trend: DevOps – DevOps first emerged in technology companies that deliver software via the cloud. Companies such as Netflix, Facebook, and GitHub achieve developer productivity that is 50-60x that of older non-DevOps approaches. Highly automated end-to-end deployment and operations pipelines allow innovation to occur massively faster – with countless low risk changes being made and reverted as needed to meet end user needs.
Pocket sized supercomputers – Let’s not forget that smartphones enable ubiquitous user interactions and also smart-sensing of the world – a trend that IoT only extends.
Open source and a deep fear of lock-in – Open source now touches every piece of the technology stack. There are a variety of reasons for this including the role that open source plays as a way for developers to build new skills and relationships. Another reason for the rise of open source is a desire to avoid lock-in. Enterprises such as Bank of America and others are saying they simply will *not* be locked in again.
Machine learning – Last but not least, we are seeing the emergence of software that teaches itself. For technology investors, this builds confidence since it implies a fundamental method of sustaining differentiation. Machine learning is turning out to be the killer app for big data. This has massive second-order effects that have yet to be fully considered. For example, how will the world change as weather prediction continues to improve? Or will self-driving cars finally lead to pedestrian-friendly suburban environments in the US?
Ok, so those are at least a few of the trends…let’s get more concrete now. What does software eating everything – and scaring the heck out of corporate America wrestling with a whole new batch of competitors – mean for storage?
Macro trends drive new storage requirements
Let’s hit each trend quickly in turn.
1) Shift to AWS
By now you probably know that Cloudian is by far the most compliant Amazon S3 storage. And this S3 compliance is not just about data path commands – it is also about the management experience such as establishing buckets.
What’s more, doubling down on this differentiation, Cloudian and Amazon recently announced a relationship whereby you can bill via Amazon for your on-premise Cloudian storage. In both cases Cloudian is the first solution with this level of integration and partnership.
2) DevOps
If you’re an enterprise doing DevOps, you should look at Cloudian. That’s because the automation that serves as the foundation for DevOps is greatly simplified by the API consistency that Cloudian delivers.
If your developers are on the front lines of responding to new full stack competitors, you don’t want them hacking together their own storage infrastructure. To deliver on the promise of “just like Amazon S3, on premise and hybrid”, Cloudian has to make distributed system management simple. This is insanely difficult.
In a recent A16Z podcast, Marc Andreessen commented that there are only a few dozen great distributed systems architects and operators in the world today. If you already employ a few of them, and they have time on their hands, then maybe you should just grab Ceph and attempt to roll your own version of what Cloudian delivers. Otherwise, you should be a Cloudian user.
3) Mobility
Architectures have changed with mobility in mind. User experience is now further abstracted from the underlying infrastructure.
In the old scale-up storage world, we worried a lot about IOPS for particular read/write workloads. But when RF is your bottleneck, storage latency is less of a concern. Instead, you need easy-to-use, massively scalable, geographically dispersed systems like object storage, S3, and Cloudian.
4) Open source and a fear of lock-in
Enterprises want to minimize their lock-in to specific service providers. The emergence of a de facto standard, Amazon S3, now allows providers and ISVs to compete on a level playing field. Google is one example: it now offers S3 APIs on its storage service offerings. If your teams need to learn a new API or even a new set of GUIs to go with a new storage vendor, then you are getting gradually locked in.
5) Machine learning
Machine learning may be the killer-app for big data. In general, there is one practical problem with training machine learning: That is, how do we get the compute to the data rather than the other way around?
The data is big and hard to move. The compute is much more mobile. But even then, you typically require advanced schedulers at the compute layer – which is the focus of entire projects and companies.
The effectiveness of moving the compute to the data is improved if information about the data is widely available as metadata. Employing metadata, however, leads to a new problem: it’s hard to store, serve, and index this metadata to make it useful at scale. It requires an architecture that is built to scale and to serve emerging use cases such as machine learning. Cloudian is literally years ahead of competitors and open source projects in this area.
For a real-world example, look no further than Cloudian’s work with advertising giant Dentsu to deliver customized ads to Tokyo drivers. Here, Cloudian demonstrates the kind of breakthrough applications that can be delivered, due in part to a rich metadata layer. Read more here, and see what is possible today with machine learning and IoT.
As I’ve written elsewhere, there is a lot to consider when investing in technology. You need companies that understand and can exploit relevant trends. But even more so, you need a great team. In Cloudian you’ve got a proven group that emphasizes product quality and customer success over big booths and 5 star parties.
Nonetheless, I thought it worth putting Cloudian’s accelerating growth into the context of five major themes. I hope you found this useful. I’d welcome any feedback in the comments below or via Twitter. I’m at @epowell101 and I’ll try to catch comments aimed at @CloudianStorage as well.