Object Storage Bucket-Level Auto-Tiering with Cloudian

As discussed in my previous blog post, ‘An Introduction to Data Tiering’, there is huge value in using different storage tiers within a data storage architecture to ensure that your different data sets are stored on the appropriate technology. Now I’d like to explain how the Cloudian HyperStore system supports object storage ‘auto-tiering’, whereby objects can be automatically moved from local HyperStore storage to a destination storage system on a predefined schedule based upon data lifecycle policies.

Cloudian HyperStore can be integrated with any of the following destination cloud storage platforms as a target for tiered data:

  • Amazon S3
  • Amazon Glacier
  • Google Cloud Platform
  • Any Cloud service offering S3 API connectivity
  • A remotely located Cloudian HyperStore cluster

Granular Control with Cloudian HyperStore

For any data storage system, granularity of control and management is extremely important –  data sets often have varying management requirements with the need to apply different Service Level Agreements (SLAs) as appropriate to the value of the data to an organisation.

Cloudian HyperStore provides the ability to manage data at the bucket level, providing flexibility at a granular level to allow SLA and management control (note: a “bucket” is an S3 data container, similar to a LUN in block storage or a file system in NAS systems). HyperStore provides the following as control parameters at the bucket level:

  • Data protection – Select from replication or erasure coding of data, plus single or multi-site data distribution
  • Consistency level – Control of replication techniques (synchronous vs asynchronous)
  • Access permissions – User and group control access to data
  • Disaster recovery – Data replication to public cloud
  • Encryption – Data at rest protection for security compliance
  • Compression – Reduction of the effective raw storage used to store data objects
  • Data size threshold – Variable storage location of data based upon the data object size
  • Lifecycle policies – Data management rules for tiering and data expiration

Cloudian HyperStore manages data tiering via lifecycle policies as can be seen in the image below:

Auto-tiering is configurable on a per-bucket basis, with each bucket allowed different lifecycle policies based upon rules. Examples of these include:

  1.      Which data objects to apply the lifecycle rule to. This can include:
  • All objects in the bucket
  • Objects for which the name starts with a specific prefix (such as prefix “Meetings/2015/”)
  2.      The tiering schedule, which can be specified using one of three methods:
  • Move objects X number of days after they’re created
  • Move objects if they go X number of days without being accessed
  • Move objects on a fixed date — such as December 31, 2016
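The rule options above can be modeled in a few lines of Python. This is only a sketch to make the logic concrete: `TieringRule`, `ObjectInfo` and `is_due_for_tiering` are hypothetical names invented for illustration, not part of the HyperStore API, and the sketch assumes each rule uses exactly one of the three schedule methods.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class TieringRule:
    """Hypothetical model of one bucket lifecycle rule (not a HyperStore type)."""
    prefix: str = ""                           # "" matches all objects in the bucket
    days_after_creation: Optional[int] = None  # move X days after creation
    days_without_access: Optional[int] = None  # move after X days without access
    fixed_date: Optional[date] = None          # move on a fixed date


@dataclass
class ObjectInfo:
    """Minimal object description needed to evaluate a rule (hypothetical)."""
    key: str
    created: date
    last_accessed: date


def is_due_for_tiering(rule: TieringRule, obj: ObjectInfo, today: date) -> bool:
    """Return True when the object is in scope and the rule's schedule is reached."""
    if not obj.key.startswith(rule.prefix):
        return False  # object name does not match the rule's prefix
    if rule.days_after_creation is not None:
        return today >= obj.created + timedelta(days=rule.days_after_creation)
    if rule.days_without_access is not None:
        return today >= obj.last_accessed + timedelta(days=rule.days_without_access)
    if rule.fixed_date is not None:
        return today >= rule.fixed_date
    return False  # no schedule configured, so nothing is tiered


# Example: tier everything under "Meetings/2015/" 30 days after creation.
rule = TieringRule(prefix="Meetings/2015/", days_after_creation=30)
notes = ObjectInfo("Meetings/2015/q1-notes.docx", date(2015, 3, 1), date(2015, 3, 2))
print(is_due_for_tiering(rule, notes, today=date(2015, 4, 15)))  # prints True
```

In a real deployment the schedule is evaluated by the storage system itself; the point of the sketch is simply that a rule is a scope (all objects or a prefix) plus one schedule.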

When a data object becomes a candidate for tiering, a small stub object is retained on the HyperStore cluster. The stub acts as a pointer to the actual data object, so the data object still appears as if it’s stored in the local cluster. To the end user, there is no change to the action of accessing data, but the object does display a special icon denoting the fact that the data object has been moved.

For auto-tiering to a Cloud provider such as Amazon or Google, an account is required along with associated account access credentials.

Accessing Data After Auto-Tiering

To access objects after they’ve been auto-tiered to public cloud services, the objects can be accessed either directly through a public cloud platform (using the applicable account and credentials) or via the local HyperStore system. There are three options for retrieving tiered data:

  1.      Restoring objects –   When a user accesses a data file, they are directed to the local stub file held on HyperStore which then redirects the user request to the actual location of the data object (tiered target platform).

A copy of the data object is restored back to a local HyperStore bucket from the tiered storage and the user request will be performed on the data object once copied back. A time limit can be set for how long to retain the retrieved object locally, before returning to the secondary tier.

This is considered the best option to use when accessing data relatively frequently and you want to avoid any performance impact incurred by traversing the internet and any access costs applied by service providers for data access/retrieval. Storage capacity must be managed on the local HyperStore cluster to ensure that there is sufficient “cache” for object retrievals.

  2.      Streaming objects – Streams data directly to the client without first restoring it to the local HyperStore cluster. When the file is closed, any modifications are made to the object in situ on the tiered location. Any metadata modifications are updated both in the local HyperStore database and on the tiered platform.

This is considered the best option when data is accessed relatively infrequently and storage capacity on the local HyperStore cluster is a concern. Performance will be lower, however, because data requests traverse the internet, and the service provider may apply access costs every time the file is read.

  3.      Direct access – Objects auto-tiered to public cloud services can be accessed directly by another application or via your standard public cloud interface, such as the AWS Management Console. This method fully bypasses the HyperStore cluster. Because objects are written to the cloud using the standard S3 API, and include a copy of the object’s metadata, they can be referenced directly.

Storing objects in this openly accessible manner — with co-located rich metadata — is useful in several instances:

  1. A disaster recovery scenario where the HyperStore cluster is not available
  2. Facilitating data migration to another platform
  3. Enabling access from a separate cloud-based application, such as content distribution
  4. Providing open access to data, without reliance on a separate database to provide indexing
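To make the difference between restoring and streaming concrete, here is a toy in-memory model of a bucket that leaves a stub behind when an object is tiered. `Stub` and `TieredBucket` are hypothetical names used purely for illustration; the sketch does not model the retention time limit for restored objects, and it is not how HyperStore is implemented internally.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Stub:
    """Local placeholder left behind after an object is tiered (hypothetical shape)."""
    remote_key: str


class TieredBucket:
    """Toy model: `local` is the on-premises bucket, `remote` is the destination tier."""

    def __init__(self) -> None:
        self.local: Dict[str, Union[bytes, Stub]] = {}
        self.remote: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.local[key] = data

    def tier(self, key: str) -> None:
        """Move the data to the remote tier and keep only a stub locally."""
        entry = self.local[key]
        if isinstance(entry, Stub):
            return  # already tiered
        self.remote[key] = entry
        self.local[key] = Stub(remote_key=key)

    def get(self, key: str, mode: str = "restore") -> bytes:
        """Read an object; `mode` only matters when the object has been tiered."""
        if mode not in ("restore", "stream"):
            raise ValueError("mode must be 'restore' or 'stream'")
        entry = self.local[key]
        if not isinstance(entry, Stub):
            return entry  # data is still local
        data = self.remote[entry.remote_key]
        if mode == "restore":
            self.local[key] = data  # copy back, so later reads are served locally
        return data  # in "stream" mode the stub stays in place


bucket = TieredBucket()
bucket.put("video.mp4", b"frames")
bucket.tier("video.mp4")
bucket.get("video.mp4", mode="stream")   # served from the remote tier, stub remains
bucket.get("video.mp4", mode="restore")  # served remotely once, then cached locally
```

Direct access corresponds to reading `bucket.remote` without going through `get` at all, which is why it keeps working even when the local cluster is unavailable.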

HyperStore provides great flexibility for leveraging hybrid cloud deployments where you get to set the policy on which data is stored in a public or private cloud. Learn more about HyperStore here.

 

How to offload your NAS and reclaim capacity, with zero disruption

Running low on NAS capacity? Over 60% of NAS data is typically cold and infrequently accessed. Common examples include old project information, engineering files, historical data, and media that rarely gets used. All of these sit there, consuming capacity and data backup resources. Cloudian and Komprise let you offload that data to on-premises Cloudian storage and immediately reclaim 60% of your Tier 1 NAS capacity.

View this on-demand webinar with Komprise and Cloudian, “How to Delay Your Next NAS Expansion,” to learn more.

 

Transparently Tier Data to Cloudian, at 70% Less Cost

Cloudian/Komprise lets you find and reclaim that costly NAS capacity without user disruption. So you can defer that next NAS purchase.

One of the key benefits of the Cloudian/Komprise solution is that users will see no change in data access. Komprise’s software transparently tiers old or dormant CIFS and NFS files from any filer or server to Cloudian. That data is stored at 70% less cost and is still immediately accessible when requested by users. There are no delays, and no access charges. To the user, nothing has changed.

 

On-Prem Control, Public Cloud Prices

Cloudian gives you on-prem storage at the cost of public cloud. With Cloudian, the storage is in your data center, under your control, at costs down to ½ cent per GB per month. And there are no cloud access charges.

Save on Backup Licenses and Capacity

Unused data costs more than just NAS capacity: you’re also paying for the backup software and data copies. These can more than double your costs. When you migrate that data to Cloudian, it’s protected with nine nines data durability without the cost of a backup license. If you need more protection, you can tier data to a public cloud (such as Amazon S3) for offsite storage. Cloudian’s built-in management tools make that transparent as well.

With Cloudian/Komprise, you save on Tier 1 NAS, save on backup, and get full data protection.

Free Storage Assessment

Contact Cloudian for your free storage assessment. We will analyze the data on your NAS and show you what data is actually being used, and what data hasn’t been touched in months. And then we will provide you a written report and analysis of your potential savings of tiering that dormant data to Cloudian.

 

Reclaim costly NAS capacity and put off that costly expansion. Contact Cloudian today to get started. It’s quick and it’s free.

View the Cloudian/Komprise solution brief for more information.

 

An Introduction to Data Tiering

Not all data is equal: frequency of access, security needs, and cost considerations vary from one data set to the next, so data storage architectures need to provide different storage tiers to address these requirements. Storage tiers differ in disk drive types, RAID configurations, or even completely different storage sub-systems, each offering a different I/O profile and cost impact.

Data tiering allows the movement of data between different storage tiers, which allows an organization to ensure that the appropriate data resides on the appropriate storage technology. In modern storage architectures, this data movement is invisible to the end-user application and is typically controlled and automated by storage policies. Typical data tiers may include:

  1. Flash storage – High value, high-performance requirements, usually smaller data sets; cost is less important compared to the performance Service Level Agreement (SLA) required
  2. Traditional SAN/NAS Storage arrays – Medium value, medium performance, medium cost sensitivity
  3. Object Storage – Less frequently accessed data with larger data sets. Cost is an important consideration
  4. Public Cloud –  Long-term archival for data that is never accessed

Typically, structured data sets belonging to applications/data sources such as OLTP databases, CRM, email systems and virtual machines will be stored on data tiers 1 and 2 as above. Unstructured data is more commonly moving to tiers 3 and 4 as these are typically much larger data sets where performance is not as critical and cost becomes a more significant factor in management and purchasing decisions.

Some Shortcomings of Data Tiering to Public Cloud

Public cloud services have become an attractive data tiering solution, especially for unstructured data, but there are considerations around public cloud use:

  1. Performance – Public network access will typically be a bottleneck when reading and writing data to public cloud platforms, along with data retrieval times (based on the SLA provided by the cloud service). Backup and recovery windows remain critically important for backup data in particular, so it is worth holding the most relevant backup sets onsite and archiving only older backup data to the cloud.
  2. Security – Certain data sets/industries have regulations stipulating that data must not be stored in the cloud. Being able to control what data is sent to the cloud is of major importance.
  3. Access patterns – Data that is re-read frequently may incur additional network bandwidth costs imposed by the public cloud service provider. Understanding your use of data is vital to control the costs associated with data downloads.
  4. Cost – As well as bandwidth costs associated with reading data, storing large quantities of data in the cloud may not make the most economical sense, especially when compared to the economics of on-premise cloud storage. Evaluations should be made.
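The access-pattern and cost points above lend themselves to a quick back-of-the-envelope evaluation. The sketch below assumes a deliberately simple model: the public cloud charges per GB stored plus per GB read back, while on-premises storage is a flat per-GB rate with no access charge. All function names and all prices are placeholders chosen for illustration, not quotes from any provider.

```python
def monthly_cloud_cost(stored_gb, read_gb, storage_price, egress_price):
    """Monthly public cloud bill: capacity charge plus a charge for data read back."""
    return stored_gb * storage_price + read_gb * egress_price


def monthly_onprem_cost(stored_gb, storage_price):
    """Monthly on-premises cost, modeled as a flat per-GB rate with no access charge."""
    return stored_gb * storage_price


def break_even_read_gb(stored_gb, cloud_storage_price, egress_price, onprem_price):
    """GB read per month above which the cloud bill exceeds the on-premises cost."""
    saving = (onprem_price - cloud_storage_price) * stored_gb
    return max(saving, 0.0) / egress_price


# Placeholder prices in dollars per GB (assumptions, not real price lists).
STORED_GB = 100_000
CLOUD_STORAGE_PRICE = 0.004
CLOUD_EGRESS_PRICE = 0.09
ONPREM_PRICE = 0.005

cloud = monthly_cloud_cost(STORED_GB, 2_000, CLOUD_STORAGE_PRICE, CLOUD_EGRESS_PRICE)
onprem = monthly_onprem_cost(STORED_GB, ONPREM_PRICE)
limit = break_even_read_gb(STORED_GB, CLOUD_STORAGE_PRICE, CLOUD_EGRESS_PRICE, ONPREM_PRICE)
print(f"cloud: ${cloud:.2f}  on-prem: ${onprem:.2f}  break-even reads: {limit:.0f} GB/month")
```

With these placeholder numbers the cheaper per-GB cloud rate is wiped out once a little over 1 TB is read back per month, which is the kind of result that makes understanding your access patterns worthwhile before choosing a tier.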

Using Hybrid Cloud for a Balanced Data Tier Strategy

For unstructured data, a hybrid approach to data management is key, with an automation engine, data classification, and granular control of data all necessary to really deliver on this promise.

With a hybrid cloud approach, you can push any data to the public cloud while retaining the control that comes with on-premises storage. For any data storage system, granularity of control and management is extremely important, as different data sets have different management requirements and need different SLAs applied according to the value of the data to an organization.

Cloudian HyperStore is a solution that gives you that flexibility for easily moving between data tiers 3 and 4 listed earlier in this post. Not only do you get the control and security from your data center, you can integrate HyperStore with many different destination cloud storage platforms, including Amazon S3/Glacier, Google Cloud Platform, and any other cloud service offering S3 API connectivity.

Learn more about our solutions today.

Learn more about NAS backup here.

 

SNL and Object Storage: Archiving Media Assets

Picture all of your media assets today. How much space does it take up and how well does your current storage solution work? Now what if you had over 40 years of assets? Will the same solution work just as efficiently?

Tape storage is currently the preferred method for archiving media assets, but tape is a limited-life solution with many different ways it can be compromised. When thinking long-term, tape becomes less and less viable.

For a prime example on why we need to move away from tape storage, let’s look at Saturday Night Live. One of the longest running network programs in the US, SNL has generated 42 seasons of content consisting of 826 episodes and 2,966 cast members. In terms of data, that’s 42 years worth of archive data made up of multiple petabytes across 2 data centers.

That’s a lot of data, and for SNL, having a huge archive is useless unless they can easily access it. That’s why SNL utilized object storage to help digitize and store their 42 years of assets. Each asset can be tagged with as many metadata tags as needed, making it easy and fast to find, organize, and assemble clips from the show’s long history.

If your media assets are just sitting in cold storage, it may be time to rethink your strategy. By creating an efficient archival solution today, you can accelerate your workflows and continue to monetize those assets 40 years from now, just as SNL is doing today.

We’ll be delving further into this topic at NAB along with Matt Yonks, who is the Post Production Supervisor for Saturday Night Live. The session will take place on April 25 at 3:30pm and will include a drawing for a 4K video drone. Register early for extra chances to win!

New HyperStore 4000: Highest density storage

Rack space and budget. Most data centers are short on both. Yet somehow, you’re expected to accommodate a 50% increase in unstructured data volume annually. That’s a problem.

The new solution is the HyperStore 4000. With 700TB in just 4U of rack height, it’s nearly 2X the density of our earlier models. And it delivers storage in your data center at prices on par with the public cloud: about ½ cent per GB per month. Space savings and cost savings in one.

The HyperStore 4000, by the numbers

The HyperStore 4000 appliance was built to handle massive amounts of data. It’s housed in a 7” high 4U enclosure, with 2 nodes and a maximum capacity of 700TB. Drive sizes range from 4TB to 10TB, and the appliance has 256GB of memory (128GB per node).

Better yet, the appliance reduces storage costs by 40% (versus other Cloudian solutions) – data management now costs half a cent per GB per month. Even with the reduced cost, there is no drop in data availability – with a three-appliance cluster, you’ll still see 99.999999% data durability.
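For a rough feel of what these numbers mean together, the arithmetic can be worked through in a few lines. This is a sketch under stated assumptions: decimal units (1TB = 1,000GB), the half-cent price applied to the full 700TB, and the 50% annual data growth figure quoted earlier in this post. The function names and the 100TB starting point are made up for the example.

```python
from decimal import Decimal

CAPACITY_TB = 700                        # maximum capacity of one appliance
RACK_UNITS = 4                           # 4U enclosure
PRICE_PER_GB_MONTH = Decimal("0.005")    # half a cent per GB per month
GB_PER_TB = 1000                         # assumption: decimal units


def density_tb_per_u():
    """Storage density in TB per rack unit."""
    return CAPACITY_TB / RACK_UNITS


def monthly_cost_at_full_capacity():
    """Monthly cost in dollars if the quoted price applied to all 700TB."""
    return CAPACITY_TB * GB_PER_TB * PRICE_PER_GB_MONTH


def years_until_full(start_tb, capacity_tb=CAPACITY_TB, annual_growth=0.5):
    """Whole years of compound growth after which the data no longer fits."""
    if start_tb <= 0 or annual_growth <= 0:
        raise ValueError("start_tb and annual_growth must be positive")
    years, size = 0, float(start_tb)
    while size <= capacity_tb:
        size *= 1 + annual_growth
        years += 1
    return years


print(density_tb_per_u())               # 175.0 TB per rack unit
print(monthly_cost_at_full_capacity())  # 3500.000 dollars per month
print(years_until_full(100))            # 5 years for a hypothetical 100TB data set
```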

For the full list of specs, check out our datasheet.

Save time, your scarcest commodity

With most storage systems, as your needs grow, your management headaches grow with them. But the HyperStore 4000 grows painlessly. You just add nodes. No disruption and no data migration.

We specifically designed the HyperStore 4000 appliance to help customers in industries with huge data needs such as life sciences, healthcare, and entertainment. These are the industries where data growth is exploding every year and where modern data centers feel the most burden, as they need high-density storage with peak performance for data protection, video surveillance, research, archival, and more. Now you can meet these growing needs without growing pains.

Finally, the HyperStore 4000 has a 100% native S3 API, and has the industry’s highest level of S3 API compatibility. In fact, we guarantee it to work with your S3-enabled applications.

Be sure to also take a look at Cloudian’s other solutions to see which one is right for you.

Cloudian Customer Receives Commvault Innovation Award

Cloudian Customer Receives Commvault Innovation Award for Data Protection with Object Storage

A Cloudian customer, Schuberg Philis, has been recognized by Commvault for their innovation in a data protection deployment with Cloudian object storage. Several aspects of this award-winning solution illustrate advancements that make backup a very exciting topic right now:

  • Object storage as a target: On-premises S3-compatible storage is the backup target in this solution
  • Backup as a service: 3800 clients employ this environment
  • Local and remote backup: Clients being protected are both local (within the Schuberg facility) and remote

Data protection is alive with innovation, and this illustrates why. Data center managers now have more options than ever to reduce headaches, cut costs, and increase service levels.

Object storage helps by providing a seamlessly scalable backup target that a) works with most backup solutions, including Commvault, b) delivers disk performance at costs approaching tape, and c) includes a broad range of capabilities including compression, encryption, and deduplication.

Backup as a service is now more practical than ever, thanks to the S3 protocol that enhances data delivery over network connections.

Schuberg Philis brought these innovations together to offer Data Management as a Service (DMS). This is a multi-tenant data protection solution that’s based on Commvault software and Cloudian storage. It runs within Schuberg Philis’ Mission Critical Cloud Infrastructure.

As a centralized backup and restore platform, DMS includes a wide swath of features such as object storage, SQL AlwaysOn, clustering, and encryption. These features make it easier for customers to manage data protection options without sacrificing data integrity. Commvault took notice and awarded Schuberg Philis a global Service Provider Innovation Award.

We’re very proud that we could be a part of this great solution!

To learn more about the economics of object storage, read this Object Storage Buyer’s Guide. Learn how you too can save a bundle, and beat your SLAs, all with the backup software you already have.

AWS re:Invent Attendees See Benefits of Hybrid Storage

AWS re:Invent was a fantastic show this year. The show has seen phenomenal growth, with over 32,000 attendees, up from 18,000 attendees last year.

Many visitors were looking for solutions to let them integrate their on-premises operations with the cloud. By adopting a hybrid cloud storage approach, they would be able to capitalize on the scalability and cost of cloud storage when appropriate, while also maintaining the cost predictability and control of on-prem storage.

For these visitors, Cloudian proved to be the perfect fit. We provide 100% native Amazon S3 object storage, with automated tiering between the data center and the cloud. Our HyperStore solution is also available directly from AWS Marketplace, which means users can get all their usage and billing data within a single monthly invoice from AWS.

Steve Varner, Principal Data Engineer at Motorola Solutions, visited our booth and had this to say afterwards:

Steve Varner

Interested in learning more about Cloudian? Contact us or try it out for yourself.

 

Cloudian and Thoughts About the Future of Storage

Evan Powell

The enterprise storage industry is going through a massive transformation, and over the last several years I’ve had the good fortune of being on the front lines. As founding CEO of Nexenta, I helped that company disrupt the storage industry by creating and leading the open storage market. These days I’m having a blast as a senior advisor and investor at companies including Cloudian, which is taking off as a leader in what is typically called “object storage”.

In this blog I’d like to share what I’m seeing – across the IT industry – and why change is only accelerating.  The sources of this acceleration are much larger than any one technology vendor or, indeed, than the technology itself.

Let’s start at the top – the top of the stack, where developers and their users reside. From there we will dive into the details before summarizing the implications for the storage industry.

Software eats everything

What does “software eats everything” really mean?  To me it means that more than ever start-ups are successfully targeting entire industries and transforming them through technology-enabled “full stack” companies.  The canonical example is a few guys that thought about selling better software to taxi companies… and instead became Uber.

Look around and you’ll see multiple examples where software has consumed an industry. And today, Silicon Valley’s appetite is larger than it ever has been.

So why now?  Why is software eating everything?  A few reasons:

  1. Cloud and AWS – When I started Clarus back in the early 2000s, it cost us at least $7 million to get to what we now would call a minimum viable product.  These days, it costs perhaps 10% of that, largely thanks to the shift to the cloud.  Maybe more importantly, thanks to SaaS and AWS, many users now see that cloud-hosted software is often safer than on-premises software.
  2. SaaS and Cloud have enabled a profound trend: DevOps –  DevOps first emerged in technology companies that deliver software via the cloud.  Companies such as Netflix, Facebook, and GitHub achieve developer productivity that is 50-60x that of older non-DevOps approaches.  Highly automated end-to-end deployment and operations pipelines allow innovation to occur massively faster – with countless low risk changes being made and reverted as needed to meet end user needs.
  3. Pocket sized supercomputers – Let’s not forget that smartphones enable ubiquitous user interactions and also smart-sensing of the world – a trend that IoT only extends.
  4. Open source and a deep fear of lock-in – Open source now touches every piece of the technology stack. There are a variety of reasons for this including the role that open source plays as a way for developers to build new skills and relationships.  Another reason for the rise of open source is a desire to avoid lock-in.  Enterprises such as Bank of America and others are saying they simply will *not* be locked in again.
  5. Machine learning – Last but not least, we are seeing the emergence of software that teaches itself. For technology investors, this builds confidence since it implies a fundamental method of sustaining differentiation. Machine learning is turning out to be the killer app for big data. This has massive second-order effects that have yet to be fully considered. For example, how will the world change as weather prediction continues to improve? Or will self-driving cars finally lead to pedestrian-friendly suburban environments in the US?

Ok, so those are at least a few of the trends…let’s get more concrete now.  What does software eating everything – and scaring the heck out of corporate America wrestling with a whole new batch of competitors – mean for storage?

Macro trends drive new storage requirements

Let’s hit each trend quickly in turn.

1) Shift to AWS

By now you probably know that Cloudian is by far the most compliant Amazon S3 storage.  And this S3 compliance is not just about data path commands – it is also about the management experience such as establishing buckets.

What’s more, doubling down on this differentiation, Cloudian and Amazon recently announced a relationship whereby you can bill via Amazon for your on-premise Cloudian storage.   In both cases Cloudian is the first solution with this level of integration and partnership.

2) DevOps

If you’re an enterprise doing DevOps, you should look at Cloudian. That’s because the automation that serves as the foundation for DevOps is greatly simplified by the API consistency that Cloudian delivers.

If your developers are on the front lines of responding to new full stack competitors, you don’t want them hacking together their own storage infrastructure. To deliver on the promise of “just like Amazon S3, on premise and hybrid”, Cloudian has to make distributed system management simple. This is insanely difficult.

In a recent A16Z podcast, Marc Andreessen commented that there are only a few dozen great distributed systems architects and operators in the world today.  If you already employ a few of them, and they have time on their hands, then maybe you should just grab Ceph and attempt to roll your own version of what Cloudian delivers.  Otherwise, you should be a Cloudian user.

3) Mobility

Architectures have changed with mobility in mind. User experience is now further abstracted from the underlying infrastructure.

In the old scale-up storage world, we worried a lot about IOPS for particular read/write workloads. But when RF is your bottleneck, storage latency is less of a concern. Instead, you need easy-to-use, massively scalable, geographically dispersed systems like object storage, S3, and Cloudian.

4) Open source and a fear of lock-in

Enterprises want to minimize their lock-in to specific service providers. The emergence of a de-facto standard, Amazon S3, now allows providers and ISVs to compete on a level playing field. Google is one example. They now offer S3 APIs on their storage service offerings.  If your teams need to learn a new API or even a new set of GUIs to go with a new storage vendor, then you are getting gradually locked in.

5) Machine learning

Machine learning may be the killer-app for big data. In general, there is one practical problem with training machine learning: That is, how do we get the compute to the data rather than the other way around?

The data is big and hard to move. The compute is much more mobile. But even then, you typically require advanced schedulers at the compute layer – which is the focus of entire projects and companies.

The effectiveness of moving the compute to the data is improved if information about the data is widely available as metadata.  Employing metadata, however, leads to a new problem: it’s hard to store, serve, and index this metadata to make it useful at scale. It requires an architecture that is built to scale and to serve emerging use cases such as machine learning. Cloudian is literally years ahead of competitors and open source projects in this area.

For a real-world example, look no further than Cloudian’s work with advertising giant Dentsu to deliver customized ads to Tokyo drivers. Here, Cloudian demonstrates the kind of breakthrough applications that can be delivered, due in part to a rich metadata layer. Read more here, and see what is possible today with machine learning and IoT.

As I’ve written elsewhere, there is a lot to consider when investing in technology. You need companies that understand and can exploit relevant trends. But even more so, you need a great team. In Cloudian you’ve got a proven group that emphasizes product quality and customer success over big booths and 5 star parties.

Nonetheless, I thought it worth putting Cloudian’s accelerating growth into the context of five major themes.  I hope you found this useful.  I’d welcome any feedback in the comments below or via Twitter.  I’m at @epowell101 and I’ll try to catch comments aimed at @CloudianStorage as well.

Embracing Hybrid Storage

It’s no surprise that Amazon Web Services (AWS) is a dominant force when it comes to the public cloud – it’s a $10B a year business, with nearly 10% of Amazon’s Q2 net sales attributed to AWS.

While AWS has been touting public cloud since its inception, only recently has it started to acknowledge the need for hybrid storage solutions. Why? Because it’s simply not realistic for many companies to move all their data to the public cloud.

Private vs. Public Cloud

 

A company may choose to stay with private, on-premises storage solutions if they have existing data centers already in place. Or they may prefer the enhanced performance and extra measure of control that comes with on-premises storage.

Nonetheless, public cloud storage has significant advantages. It’s easy to implement, scales on demand, and automates many of the data management chores.

Neither option is clearly better than the other – in fact, customers are spending more than ever on both private and public cloud solutions. IDC forecasts that total IT spending on cloud infrastructure will increase by 15.5% in 2016 to reach $37.1B. The bottom line is that companies need both on-prem and cloud solutions.

The Best of Both Worlds: Hybrid Storage

 

What’s needed is a solution that allows you to enjoy the advantages of both: the speed and control of on-prem and the on-demand scalability of cloud. And ideally, you’d get both within a single, simple management model.

That’s what Cloudian HyperStore is. It’s S3 cloud storage that physically sits in your data center. And, it looks and behaves exactly like Amazon S3 cloud storage, so your apps that work with Amazon will work with Cloudian. Best of all, you can manage the combined Cloudian + Amazon S3 storage pool as a single, limitlessly scalable storage environment.

Amazon Makes It Easy

 

Fortune summed up Amazon’s need for a hybrid compute model in their recent article, stating:

It’s become clear that AWS, which is the leader in public cloud, will have to address this issue of dealing with, if not embrace, customers’ on-premises computing.

Thankfully, in the storage world they’ve already addressed this by adding Cloudian HyperStore directly to the AWS Marketplace. We announced this last month, but it bears repeating because it’s an important step in AWS’s evolution.

The advantages in moving towards hybrid storage are numerous. Everything folds up to AWS, so even usage and billing from private cloud will be centralized in the monthly AWS invoices. More importantly, Cloudian HyperStore was built from day one to be fully S3 compatible, which ensures complete investment protection.

So if you’re debating between public and private cloud options for your company, remember that you can still get the best of both worlds. Check out Cloudian HyperStore for a better hybrid storage solution with AWS and Amazon S3.

Object Storage vs. File Storage: What’s the Difference?

Object storage has only been around since the mid-90s. As the relatively new kid on the block, there can be some confusion as to how it differs from other storage types, such as block or file storage. This post is the first in a series looking at these key differences, focusing on Object Storage vs. File Storage.

What is Object Storage?

Object-based storage essentially bundles the data itself along with metadata tags and a unique identifier. The metadata is customizable, which means you can input a lot more identifying information for each piece of data. These objects are stored in a flat address space, which makes it easier to locate and retrieve your data across regions.

This flat address space also helps with scalability. By simply adding in additional nodes, you can scale to petabytes and beyond.
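To make the idea concrete, here is a minimal in-memory sketch of an object store: a single flat dictionary keyed by a unique identifier, where each entry bundles the data with free-form metadata. The class name `FlatObjectStore` and the sample metadata are invented for illustration and say nothing about how HyperStore is implemented.

```python
import uuid


class FlatObjectStore:
    """Toy in-memory object store: one flat namespace, no directories."""

    def __init__(self):
        # A single dictionary plays the role of the flat address space.
        self._objects = {}

    def put(self, data, metadata=None):
        """Store data with custom metadata and return a unique identifier."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = {"data": data, "metadata": dict(metadata or {})}
        return object_id

    def get(self, object_id):
        """Fetch an object directly by ID; there is no path to walk."""
        return self._objects[object_id]


store = FlatObjectStore()
# Hypothetical sample object and tags.
oid = store.put(b"quarterly report", {"department": "finance", "year": "2016"})
print(oid, store.get(oid)["metadata"])
```

Because every object is addressed by its ID alone, adding capacity in a real system is a matter of spreading those IDs across more nodes rather than reorganizing a directory tree.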

A Primer on File Storage

File storage has been around for considerably longer than object storage and is something most people are familiar with. You name your files/data, place them in folders, and can nest them under more folders to form a set path. In this way, files are organized into a hierarchy, with directories and sub-directories. Each file also has a limited set of metadata associated with it, such as the file name, the date it was created, and the date it was last modified.

This works very well up to a point, but as capacity grows the file model becomes burdensome for two reasons, both of which hurt performance. First, the NAS system itself has limited processing power, so the processor becomes a bottleneck beyond a certain capacity. Second, the file lookup tables grow into a massive database as capacity grows, which slows every lookup.
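As a small illustration of the hierarchy and the fixed metadata described in this section, the sketch below builds a nested folder path in a temporary directory and reads back the attributes the file system keeps for a file. The folder and file names are invented for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Nest folders to form a set path, as in a typical file system.
    folder = Path(root) / "projects" / "2016" / "reports"
    folder.mkdir(parents=True)
    report = folder / "summary.txt"
    report.write_text("quarterly summary")

    # The file system tracks a fixed, limited set of attributes per file.
    info = report.stat()
    file_metadata = {
        "name": report.name,
        "size_bytes": info.st_size,
        "modified": info.st_mtime,
    }
    relative_path = report.relative_to(root).as_posix()

print(relative_path)
print(file_metadata)
```

Note that there is nowhere in this model to record anything beyond those built-in attributes; richer descriptions have to live in the file name or in a separate database.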

 

Object Storage vs. File Storage

 

Now that you know the basics of both object-based storage and file storage, let’s look at some of the key differences separating the two.

To start, object storage overcomes many of the limitations that file storage faces. Think of file storage as a warehouse. When you first put a box of files in there, it seems like you have plenty of space. But as your data needs grow, you’ll fill up the warehouse to capacity before you know it. Object storage, on the other hand, is like the warehouse, except with no roof. You can keep adding data infinitely – the sky’s the limit.

If you’re primarily retrieving smaller or individual files, then file storage shines with performance, especially with relatively low amounts of data. Once you start scaling, though, you may start wondering, “How am I going to find the file I need?”

In this case, you can think of object storage as valet parking while file storage is more like self-parking (yes, another analogy, but bear with me!). When you pull your car into a small lot, you know exactly where your car is. However, imagine that lot was a thousand times larger – it’d be harder to find your car, right?

Because object storage has customizable metadata and all the objects live on a flat address space, it’s similar to handing your keys over to a valet. Your car will be stored somewhere, and when you need it, the valet will get the car for you. It might take a little longer to retrieve your car, but you don’t have to worry about wandering around looking for it. All of these features and advantages also extend to object storage in the cloud.

Object Storage Metadata

For a real-life example of why metadata makes a difference, we can look at X-rays. An X-ray file would have limited metadata associated with it, such as created date, owner, location, and size. An X-ray object, on the other hand, could have a rich variety of metadata information.

The metadata could include patient name, date of birth, injury details, which area of the body was X-rayed – in addition to the same tags that the file had. This makes it incredibly useful for doctors to pull up the relevant information for reference.
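The X-ray example can be sketched in a few lines of Python. The object IDs, patient names, and tag names below are all made up for illustration, and `find_objects` is a hypothetical helper rather than part of any S3 or Cloudian API; it simply shows how rich metadata lets you select objects by what they describe instead of where they are stored.

```python
def find_objects(objects, **criteria):
    """Return IDs of objects whose metadata matches every given key/value."""
    return [
        object_id
        for object_id, obj in objects.items()
        if all(obj["metadata"].get(key) == value for key, value in criteria.items())
    ]


# Hypothetical X-ray objects: the payloads are placeholders, the tags are invented.
xrays = {
    "xr-001": {"data": b"...", "metadata": {"patient": "A. Example", "body_area": "wrist", "injury": "fracture"}},
    "xr-002": {"data": b"...", "metadata": {"patient": "B. Sample", "body_area": "knee", "injury": "sprain"}},
    "xr-003": {"data": b"...", "metadata": {"patient": "A. Example", "body_area": "wrist", "injury": "follow-up"}},
}

print(find_objects(xrays, body_area="wrist"))
print(find_objects(xrays, patient="A. Example", injury="fracture"))
```

A file system would only be able to answer the same questions if the information had been encoded into file names or folder structure ahead of time.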

If you want a more straightforward side-by-side comparison, take a look at this table that compares object-based storage vs file storage:

 

| | OBJECT STORAGE | FILE STORAGE |
| PERFORMANCE | Performs best for big content and high stream throughput | Performs best for smaller files |
| GEOGRAPHY | Data can be stored across multiple regions | Data typically needs to be shared locally |
| SCALABILITY | Scales infinitely to petabytes and beyond | Potentially scales up to millions of files, but can’t handle more |
| ANALYTICS | Customizable metadata, not limited to number of tags | Limited number of set metadata tags |

 

This was only a general overview of the differences between object storage and file storage, but it should give you a clearer idea of the advantages of each type.

Object Storage and File Storage Together

Now Cloudian offers a way to get the goodness of object-based storage for your files: Cloudian HyperFile, a scale-out file storage system that provides NAS features together with the scalability and cost of object-based storage.

For more, download the Object Storage Buyer’s Guide.

AWS CLI with S3-Compatible Storage

There has been a lot of discussion about Amazon’s Simple Storage Service (S3) and Amazon Web Services (AWS). It seems that every vendor claims to be Amazon S3-compatible or to work with S3 storage. That makes me wonder: what is the best way to validate a solution and see whether it will meet my object storage needs? Well, why not just use Amazon’s own S3 APIs and the AWS Command Line Interface (CLI)?

The AWS CLI is a unified tool developed to help manage AWS services. I believe it is the best way to test any solution that claims to be S3-compatible storage, such as Cloudian HyperStore. So let’s get started. The following steps show how to install and use the AWS CLI with Cloudian HyperStore on your Linux server.

Prerequisite:

You will need to install pip to simplify your AWS CLI installation. You can copy the following Python script to your Linux server, and it will install pip and awscli for you. The script is provided as-is, but feel free to copy, modify, and improve it to your liking.

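import os
import sys
import urllib.request

PIP = "get-pip.py"

# Download the pip bootstrap script (Python 3 moved urlretrieve to urllib.request).
urllib.request.urlretrieve("https://bootstrap.pypa.io/get-pip.py", PIP)

# Install pip using the same interpreter that is running this script.
os.system(sys.executable + " " + PIP)

# Show the working directory where get-pip.py was saved.
os.system("pwd")

# Install the AWS CLI from PyPI.
os.system("pip install awscli")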

Process:

  1. Save the script shown above as dc_getpip.py on your Linux server. The script has been tested on RHEL and CentOS. The Cloudian S3 region used in this example is s3-region.addomain.local
  2. Run python dc_getpip.py. This script will download pip and install AWS CLI for you.
  3. When the AWS CLI is successfully installed, continue with configuring AWS CLI with Cloudian HyperStore.
  4. Execute aws configure and provide your Cloudian credentials along with the Cloudian S3 region information.
  5. Run cd ~/.aws, because the config and credentials files for the AWS CLI are located in your user directory. In this example, that is the root user’s home directory.
  6. There are 2 files in .aws directory:
    1. config
    2. credentials
  7. Update the config file with the Cloudian region information. Include [cloudian] in your update.
  8. Update the credentials file with the Cloudian information. Include [cloudian] in your update.
  9. Run the following aws command to validate connectivity to your Cloudian HyperStore cluster. Using s3 ls will list the buckets of the tenant that was configured.
    1. aws --profile=cloudian --endpoint-url=http://s3-region1.addomain.local s3 ls
    2. Replace s3-region1.addomain.local with your Cloudian region.
    3. You can use aws --profile=cloudian --endpoint-url=http://s3-region1.addomain.local s3 cp file s3://bucket to test an upload to your S3 bucket.
  10. Your AWS CLI is successfully configured with Cloudian HyperStore S3.
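The steps above can be sketched in Python as follows. This is a hedged illustration, not output from a real cluster: the region name `region1` and the two key values are placeholders, and `render_ini` and `aws_command` are helper names invented here. The AWS CLI documentation names profile sections `[profile name]` in the config file and `[name]` in the credentials file, so the config heading below is an assumption to check against your own CLI version, since the steps mention `[cloudian]` for both files.

```python
import configparser
import io


def render_ini(section, values):
    """Render one INI section as text, in the style of the AWS CLI files."""
    parser = configparser.ConfigParser()
    parser[section] = values
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


def aws_command(profile, endpoint_url, *args):
    """Build the argument list for an AWS CLI call against a custom endpoint."""
    return ["aws", "--profile", profile, "--endpoint-url", endpoint_url, *args]


# Placeholder values only: substitute your own region name and keys.
config_text = render_ini("profile cloudian", {"region": "region1"})
credentials_text = render_ini(
    "cloudian",
    {
        "aws_access_key_id": "YOUR_ACCESS_KEY",
        "aws_secret_access_key": "YOUR_SECRET_KEY",
    },
)

# Equivalent of the "s3 ls" connectivity check from step 9.
list_buckets = aws_command("cloudian", "http://s3-region1.addomain.local", "s3", "ls")

print(config_text)
print(credentials_text)
print(" ".join(list_buckets))
```

The resulting argument list can be passed to `subprocess.run` on a machine where the AWS CLI is installed. Nothing here writes to ~/.aws, so you can compare the printed text with your existing files before changing them.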

 

If you are curious to learn more about S3, download Cloudian HyperStore’s community edition and validate the solution for yourself.

Learn more about hybrid cloud management here.

Draw my Cloudian: What is Object Storage?

A few months ago, I was hesitant about applying for an internship at a technology company. Unlike many of my peers who view Silicon Valley as the perfect gateway for fueling their careers and interests, I was never quite drawn to the tech scene I had grown up with.

At the same time, much of my hesitation could be attributed to intimidation: I had neither a technical background nor a real understanding of the sorts of professions and companies that existed in Silicon Valley. But given the exciting opportunity to intern at Cloudian these past few months, I got the chance not only to explore the cloud computing industry, but also to immerse myself in an environment I was once too scared to venture into.

Along with the support of my manager and peers at Cloudian, one of the major projects I worked on as a marketing intern was a “draw my life” style video about object storage. Although I had absolutely no clue what object storage was prior to my internship, my co-workers were always there for me to turn to for help and guidance. After all, being given the opportunity to work on a topic I was previously unacquainted with translated into an opportunity to learn everything I stumbled upon. From there, the video began to unfold – from hours upon hours of research to creating a script, incessant doodling, and many dry-erase marker stains, here’s the finished product!

Thanks to my team at Cloudian for supporting me the entire way. Having worked on this project has definitely instilled in me a new confidence to take a leap into the unknown. As I spend my last few days here, I am proud to have been able to spend some time familiarizing myself with the core of Cloudian’s product and leave a piece of something I created before I go!

– Lesley