New Use Cases for Smart Data and Deep Learning

In case you missed it, we recently announced a project with advertising giant Dentsu, QCT (Quanta Cloud Technology) Japan, and Intel Japan. Using deep learning analysis and Cloudian HyperStore’s smart data storage, we’re launching a billboard that can automatically recognize vehicles and display relevant ads.

The system has ‘seen’ 3,000-5,000 images per car so that it can distinguish all the various features of a particular car and identify the make, model, and year with an average 94% accuracy. For example, if someone is driving an older Mercedes, the billboard could advertise the latest luxury car. Or, if someone is driving a Prius, then the billboard could show eco-friendly products. It’s important to note that none of this data is stored – it is simply processed and then relayed into a relevant ad.

Cloudian and Dentsu use smart data for billboards

Our smart data system sifts through thousands of images to accurately identify vehicles

You can also turn to this piece from CNN Money to learn a bit more about the project. The first billboard will be up and running later this year in Tokyo.

Broader Potential for Innovative Technology

One reason this technology is possible is metadata. Typically, big data is stored passively for future analysis. Because that data is unorganized and untagged, it takes considerable effort to discover and pull out specific information.

Objects in object storage, on the other hand, can have metadata tags attached to them. We run the data through real-time classification and auto-recognition/discrimination, so these metadata tags are attached on the fly. In this way, we use ‘deep learning’ to turn big data into smart data.
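As a rough illustration of tagging on the fly, the sketch below attaches classifier output to an object as metadata at write time, so later queries can filter on tags instead of re-reading every object. Everything in it is hypothetical: `classify_vehicle` stands in for a deep learning model, and `TaggedObjectStore` is an in-memory stand-in rather than the HyperStore API.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


def classify_vehicle(image: bytes) -> dict:
    """Hypothetical stand-in for a deep learning classifier."""
    # A real model would inspect the pixels; this stub returns fixed tags.
    return {"make": "Toyota", "model": "Prius"}


class TaggedObjectStore:
    """In-memory stand-in for an object store that keeps metadata per object."""

    def __init__(self, classifier):
        self._classifier = classifier
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        # Tags are attached at write time, so nothing is re-scanned later.
        self._objects[key] = StoredObject(data, self._classifier(data))

    def find(self, **tags) -> list:
        # Query by metadata instead of opening every object.
        return sorted(
            key for key, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in tags.items())
        )


store = TaggedObjectStore(classify_vehicle)
store.put("frames/0001.jpg", b"...image bytes...")
matches = store.find(make="Toyota")
```

Here `matches` contains the one stored key, while a query for a tag the classifier never produced returns an empty list.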

How IoT and deep learning combine to make smart data

So what are the implications of this technology beyond advertising? There is potential for tremendous applications of deep learning in other fields, such as improved object recognition for self-driving cars, higher quality screening for manufacturing equipment, or even better tumor detection in MRIs.

Still skeptical? Sign up for a free trial and test out our smart data storage for yourself.

Shifting Technology Habits and the Growth of Object Storage

Technology is, for many of us, a vital and inextricable part of our lives. We rely on technology to look up information, keep in touch with friends and family, monitor our health, entertain ourselves, and much more.

However, technology wasn’t always so ubiquitous – it wasn’t too long ago that our wireless phones had limited features and even fewer users actually using these features. Here’s the breakdown from 2004, according to a study from the Yankee Group:

This means that just over 10 years ago, less than 50% of cell phones had internet access and less than 10% had cameras. Even with 50% of phones having internet access, only 15% of users took advantage of this feature.

By contrast, look at this survey conducted by Pew Research in 2014:

Among the 18-29 age group, text messaging and internet are more frequently used features than phone calls, which is indicative of the tremendous shift in technology use over the past few years. This study doesn’t even cover a major feature that many users use their phones for: pictures. As younger users turn almost exclusively to smartphone cameras for their photos (and, of course, #selfies), they turn to photo-sharing sites to host and display their images.

Photos are just one type of the ever-growing deluge of unstructured data, though. For enterprises, unstructured data also includes emails, documents, videos, audio files, and more. In order for companies to cost-effectively store this data (while keeping it protected and backed up for end-users), many of them are starting to turn to object storage over traditional network-attached storage (NAS).

Some of the benefits of object storage include a lower total cost of ownership (TCO) and the ability to easily scale up as data needs grow. That by itself is not enough, though. With a solution like our very own HyperStore, in addition to the affordable price (as low as 1c per GB per month) and infinite scalability (from tens of terabytes to hundreds of petabytes), we offer easy management and access control, plus strong data protection with both erasure coding and replication settings. You can read about all of HyperStore’s features and benefits here.

Unstructured data use is only going to continue to grow. Smartphones and other data-intensive technologies will only become more prevalent, and you’ll want to be prepared to meet that growth. Learn more about Cloudian’s hardware and software solutions today.

Lenovo Solves Data Storage Needs with a New Appliance

As our lives become increasingly digital, we’ll generate more and more data. By current estimates, storage needs are doubling in size every two years. That means that by 2020, we will reach 44 zettabytes – or 44 trillion gigabytes – of data, with most of that growth as unstructured data for backups, archives, cloud storage, multimedia content, and file data. This growth in data is quickly outpacing IT budgets. It’s clear we need a new storage approach if we hope to keep up with this deluge of data.

Introducing a New Appliance by Lenovo and Cloudian

Lenovo, together with Cloudian, is attacking the $40B storage market with a new, innovative capacity storage appliance for low-cost, scalable storage that addresses 80% of customers’ data needs. We are proud to introduce the Lenovo DX8200C powered by Cloudian as the storage building block that can scale to this challenge and further drive datacenter efficiency and investment protection.

Lenovo DX8200C powered by Cloudian

The Lenovo DX8200C powered by Cloudian is an affordable and scalable object storage solution.

Offered as part of Lenovo’s StorSelect software-defined storage program, this factory integrated appliance is built upon Lenovo’s industry-leading servers and features:

  • S3: S3 is the de facto cloud storage standard as stated by Gartner. Cloudian is the only native S3-compatible mass capacity storage solution on the market, enabling customers and partners to take advantage of the $38B AWS ecosystem
  • Affordability: Lower the total cost of ownership (TCO) to $0.01 per GB per month
  • Scalability: The flexible design allows you to start small and scale up to 112 TB of storage capacity per node
  • Security: Utilize always-on erasure coding and replication to ensure your data is protected
  • Simplicity: Single SKU for full appliance and support

The Lenovo DX8200C powered by Cloudian delivers a fully-integrated and ready-to-deploy capacity storage system, reducing risks and variables in the datacenter. Global support is provided by Lenovo’s top-rated support team.

Additionally, what sets this appliance apart from others is the use of Cloudian’s HyperStore storage platform, bringing with it a full host of key features, including:

In a news announcement today, David Lincoln, GM of the Data Center Group at Lenovo, stated that “the Cloudian HyperStore solution enables us to deliver leading innovative, software-defined storage capabilities to enterprises and service providers worldwide.”

Michael Tso, CEO and co-founder of Cloudian, reiterated this point by stating that “enterprises and value-added resellers (VARs) can maximize their business investment and revenue opportunities with this fully turnkey, channel-ready, 100 percent S3 object storage solution.”

With more and more industries requiring massive amounts of data to be stored, this partnership with Lenovo represents a vital next step – one where pre-loaded appliances make it easy for companies to both integrate with existing infrastructure and scale out for large deployments.

The Lenovo DX8200C powered by Cloudian will be available worldwide in the third quarter of 2016, but Lenovo and Cloudian are working closely together to address all customer needs in the meantime.

Start Small and Grow with Unlimited Scale

It seems that much of the current conversation around data revolves around how much of it there is and how much there will be in the coming years. While this macro level perspective is important and should help inform how data is stored, it’s also important to focus in on the micro level use cases.

Many companies tout that they can start big and go bigger. The issue with this approach is that it ignores a large swath of customer needs. What if you don’t need hundreds of TBs of storage immediately? What if you want to start small, but anticipate growth down the line?

Cloudian HyperStore 6.0

Scale as you grow with Cloudian HyperStore

Cloudian offers the flexibility to start small without sacrificing any of the robust features in our HyperStore operating environment. We offer both software and hardware solutions so you can start with as little as tens of TB of storage and scale up to hundreds of PBs.

Cloudian HyperStore can be deployed on off-the-shelf commodity hardware for 1c per GB per month, making it both easy and affordable to scale out as your data grows. As you add more data, HyperStore automatically diverts new writes from heavily used disks to less used disks to avoid imbalance. Of course, as you scale, security and data resiliency become more and more vital, which is why this smart disk balancing is only one part of the wider array of protection features in HyperStore.
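To make the idea of disk balancing concrete, here is a minimal sketch of one possible placement rule: send each new write to the least-utilized disk that has room. This is an illustration only, not HyperStore's actual algorithm; the `Disk` class, `place_write` function, and disk names are all made up.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    capacity_gb: float
    used_gb: float = 0.0

    @property
    def utilization(self) -> float:
        return self.used_gb / self.capacity_gb


def place_write(disks: list, size_gb: float) -> Disk:
    """Send a new write to the least-utilized disk that has room for it."""
    candidates = [d for d in disks if d.capacity_gb - d.used_gb >= size_gb]
    if not candidates:
        raise RuntimeError("no disk has enough free space")
    target = min(candidates, key=lambda d: d.utilization)
    target.used_gb += size_gb
    return target


# One nearly full disk and one nearly empty disk (hypothetical numbers).
disks = [Disk("disk-a", 1000, used_gb=900), Disk("disk-b", 1000, used_gb=100)]
chosen = [place_write(disks, 50).name for _ in range(4)]
```

With these numbers, all four writes land on `disk-b`, which stays far less utilized than `disk-a` throughout.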

Big protection for all your data

No matter how much data you’re storing, we’ve built in some of the most robust security features possible to protect your data. On a read request on your data, all replicas are checked and missing or out-of-date replicas are automatically updated or replaced. As a result, you don’t have to worry about restoring to outdated data.
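The read-time check described above can be sketched roughly as follows. This is a simplified model with integer versions and in-memory replicas, assumed here purely for illustration; it is not how HyperStore represents replicas internally, and the `Replica` class and `read_with_repair` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    node: str
    version: Optional[int] = None   # None means the replica is missing
    data: Optional[bytes] = None


def read_with_repair(replicas: list) -> bytes:
    """Return the newest copy and bring stale or missing replicas up to date."""
    present = [r for r in replicas if r.version is not None]
    if not present:
        raise KeyError("object not found on any replica")
    newest = max(present, key=lambda r: r.version)
    for r in replicas:
        if r.version != newest.version:
            # Overwrite out-of-date copies and recreate missing ones.
            r.version, r.data = newest.version, newest.data
    return newest.data


replicas = [
    Replica("node-1", version=2, data=b"v2"),
    Replica("node-2", version=1, data=b"v1"),   # out of date
    Replica("node-3"),                          # missing
]
result = read_with_repair(replicas)
```

After the read, the caller gets the newest data and all three replicas hold the same version.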

The Cloudian Management Console lets you monitor your system’s health and get alerts when things are off. Be proactive by utilizing replication or erasure coding (or both!) to properly protect your data. Plus, spread your data out among geographically independent data centers as an added contingency against data loss. If you need to conduct a more granular check-up on your system, we’ve implemented an “object GPS” so you can quickly and easily locate any specific object within a given bucket.

As your organization grows, your access needs will change as well. HyperStore gives you multi-tenancy controls so that you can give role-based access to administrators and users.

From the very beginning, we believed strongly in providing customers with all the tools they needed to create the storage platform that works for them. In addition to the HyperStore software, we also have turnkey appliances that enable small deployments with the potential to scale up to many PBs.

Cloudian HyperStore Appliance 1500

The Cloudian HyperStore 1500 Appliance offers hot-swappable hardware, automated data tiering, and unlimited scale.

If you’d like to try Cloudian HyperStore for yourself, sign up for a free trial today.

Simplifying Enterprise Data Protection with Rubrik

Data center sprawl can be a real pain to manage, and it’s only made worse when dealing with legacy architecture. As data needs continue to grow exponentially, having an efficient and cost-effective data protection solution in place is more necessary than ever. If you’re still dealing with outdated hardware and software, then it will only become increasingly complex (and frustrating) to deal with data migration, backup, and recovery.

Working with Rubrik for Better Data Protection

To help address these pain points, Cloudian has partnered with Rubrik to bring simple, seamless, and secure backup and long-term data retention solutions to enterprises.

Rubrik acts as a sort of ‘time machine’ for VMs. Backup software, catalog management, replication, and deduplicated storage are all brought into a single appliance. As a result, you get incredible ease of use and infinite scalability. Rubrik is able to deliver near-zero recovery times without the need for rehydration.

How Rubrik and Cloudian work together

How Rubrik and Cloudian HyperStore work hand-in-hand to provide efficient and affordable data backup and protection.

How Cloudian Fits In

Rubrik serves as a smart on-ramp to Cloudian HyperStore – no additional software installations or plugins necessary to connect the two. This, in turn, lets IT automate backup, replication, and archival via a policy-based engine. By sending the deduplicated data to HyperStore, IT can save money on data transfer and storage costs.

Furthermore, Cloudian HyperStore is fully software-defined with no affinity to hardware. It is a scale-out, 100% native S3 object storage platform designed for large but flexible storage solutions – ideal for storing unstructured data and content. HyperStore is robust and durable, but also flexible, and we kept ease of management and usability in mind from the start. Additionally, HyperStore provides seamless tiering and replication to public cloud providers such as AWS, Glacier, and other S3-compatible endpoints, including tape.

You can read more about Cloudian HyperStore’s features and benefits here.

Having ironclad data protection in place is vital. Read more about how Rubrik and Cloudian can help in our joint solution brief.

S3 API & Extensions for Enterprise Object Storage

Amazon’s S3 API is the de facto standard for object storage APIs. Having multiple service providers, software providers, and applications standardize on S3 has made it easier to switch between them and to rapidly stand up new uses for object storage. But there are different grades of S3 compatibility. Some software and solutions provide only the basic CRUD (create, read, update, delete) functions. At the other end is Cloudian’s HyperStore, which is committed to providing the highest-fidelity S3 compatibility, backed by a guarantee.

The S3 API is an HTTP/S REST API where all operations are via HTTP PUT, POST, GET, DELETE, and HEAD requests. Each object is stored in a bucket. Beyond the basic object CRUD operations provided by S3, there are many advanced APIs such as versioning, multi-part upload, access control lists, and location constraints. There are multiple options for encryption, including (1) server-side encryption where the server manages encryption keys, (2) server-side encryption with customer keys, and (3) client-side encryption where the data is encrypted and decrypted at the client side. Though no single S3 user is likely to use all of the advanced APIs, the union of APIs used by different users quickly covers them all. The table below highlights some advanced object storage APIs supported by S3:

S3 Feature                                   Azure   Google Cloud   OpenStack Swift
Object versioning                            No      Yes            Yes
Object ACL                                   No      Yes            No
Bucket Lifecycle Expiry                      No      Yes            Yes
Multi-object delete                          No      Yes            Yes
Server-side encryption                       No      Yes            Yes
Server-side encryption with customer keys    No      No             No
Cross-region replication                     Yes     No             Yes
Website                                      No      No             No
Bucket logging                               No      No             No
POST object                                  No      No             No

Table 1 – Comparison of some S3 advanced object storage APIs[1]
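To show what the first two encryption options look like on the wire, here is a minimal sketch that builds the request headers Amazon S3 documents for server-managed keys and for customer-provided keys. The helper function names are made up for illustration, and whether a particular S3-compatible store honors these headers is exactly the compatibility question discussed here.

```python
import base64
import hashlib
import os


def sse_s3_headers() -> dict:
    """Option 1: the server manages the encryption keys."""
    return {"x-amz-server-side-encryption": "AES256"}


def sse_c_headers(key: bytes) -> dict:
    """Option 2: the server encrypts with a key the customer sends per request."""
    if len(key) != 32:
        raise ValueError("SSE-C expects a 256-bit (32-byte) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key":
            base64.b64encode(key).decode(),
        # The MD5 digest lets the server check the key arrived intact.
        "x-amz-server-side-encryption-customer-key-MD5":
            base64.b64encode(hashlib.md5(key).digest()).decode(),
    }


customer_key = os.urandom(32)   # in practice, keep this key somewhere safe
headers = sse_c_headers(customer_key)
```

Option 3, client-side encryption, needs no special headers: the client encrypts the bytes before the PUT and decrypts them after the GET.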

S3 API compatibility is a prerequisite, but not sufficient to provide object storage for enterprises. There are 4 additional areas that Cloudian has added to make S3 object storage enterprise-ready.

  1. Software or Appliance, not a service. The software-only package includes a Puppet-based installer with a wizard-style interface. It runs on commodity software (CentOS/Red Hat) and commodity hardware. The appliances come in a few fixed models ranging from 1U (24TB) to the PB-scale FL3000 series in an 8U form factor.
  2. APIs for all functions
    • Configuration
    • Multi-Tenancy: User/Tenant provisioning
    • Quality of Service (QoS)
    • Reporting
    • S3 Extensions: Compression, Metadata APIs, Per-bucket Protection Policies.

    To highlight the per-bucket protection policies feature: each bucket can have its own protection policy. For example, a “UK3US2” policy can be defined as UK DC with 3 replicas and US DC with 2 replicas. Another example is an “ECk6m2” policy, defined as DC1 with erasure coding using 6 data and 2 coding fragments. As buckets are created, they can be assigned a policy.

Figure 1 – Per-bucket protection policies example

  3. O&M tools to install, monitor, and manage. In addition to the installer, a single-pane, web-based Cloudian Management Console (CMC) handles system administration from the perspective of the system operator, a tenant/group administrator, and a regular user. It’s used to provision groups and users, view reports, and manage and monitor the cluster.

Cloudian Management Console

Figure 2 – CMC dashboard

  4. Integration with Other Products
    • NFS/CIFS file interface
    • OpenStack, CloudPlatform
    • Tiering to any S3 system (public or private).
    • Active Directory, LDAP
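Returning to the per-bucket protection policies described above, the sketch below models the two example policies and estimates how much raw storage each one consumes per byte of user data. The classes, field names, and bucket names are hypothetical and are not the HyperStore API; the erasure coding figure assumes a standard k+m scheme in which each object is split into k data fragments plus m coding fragments.

```python
from dataclasses import dataclass


@dataclass
class Replication:
    replicas_per_dc: dict          # e.g. {"UK": 3, "US": 2}

    def raw_bytes_per_byte(self) -> float:
        # Every replica is a full copy of the object.
        return float(sum(self.replicas_per_dc.values()))


@dataclass
class ErasureCoding:
    dc: str
    k: int                         # data fragments
    m: int                         # coding fragments

    def raw_bytes_per_byte(self) -> float:
        # k + m fragments are stored, each 1/k the size of the object.
        return (self.k + self.m) / self.k


# Policy names are labels chosen by the operator.
policies = {
    "UK3US2": Replication({"UK": 3, "US": 2}),
    "ECk6m2": ErasureCoding("DC1", k=6, m=2),
}

# Each bucket is assigned one policy when it is created (names are made up).
bucket_policy = {"backups": "UK3US2", "archive": "ECk6m2"}

overhead = {b: policies[p].raw_bytes_per_byte() for b, p in bucket_policy.items()}
```

Under these assumptions, the replicated bucket stores five raw bytes for every byte of data, while the 6+2 erasure-coded bucket stores about 1.33.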

The opportunity and use case for enterprises and object storage has never been more compelling. Amazon S3 API compatibility ensures full portability of already working applications. Using Cloudian’s HyperStore platform instead of AWS, enterprise data can be brought on-premise for better data security and manageability at lower cost. For STaaS providers, S3 API compatibility, backed by a full guarantee, provides the same benefits of a fully controlled storage platform, and opens up a large range of compatible applications. Beyond the S3 API, Cloudian is committed to providing all operations by API and has added APIs to make the platform enterprise-ready, including multi-tenancy.

If you would like a technical overview, you can check out this webinar I recently presented, “S3 Technical Deep Dive” and make sure to check out more information on our S3 Guarantee…we’ll run all your S3 Apps anytime and anywhere – Guaranteed!

– Gary


[1] References:
http://docs.openstack.org/developer/swift/#object-storage-v1-rest-api-documentation
https://cloud.google.com/storage/docs/xml-api-overview
https://msdn.microsoft.com/en-us/library/azure/dd135733.aspx

Cloudian HyperStore Integration with Symantec NetBackup

Starting with Symantec NetBackup 7.7, administrators will find an exciting new feature for cloud storage backup: Cloudian HyperStore®. The NetBackup Cloud Storage Connector enables the NetBackup software to back up data to and from Cloudian HyperStore straight out of the box without additional software installations or plugins. HyperStore is an option in the “Cloud Storage Server Configuration Wizard”. Users can simply add their S3 account information such as endpoint, access key, and secret key to begin the process of backing up their data to Cloudian HyperStore storage.

Cloudian HyperStore and Symantec NetBackup together deliver the following benefits:

  • Enterprise-level backup
  • Complete integrated data center solution: computing, networking, and storage
  • Reduced total cost of ownership (TCO) that continues to improve as the solution scales out
  • Operational efficiency
  • Agility and scalability with the scale-out architectures of Cloudian HyperStore
  • Complete Amazon Simple Storage Service (S3) API–compatible geographically federated object storage platform
  • Enterprise-class features: multi-tenancy, quality of service (QoS), and dynamic data placement in a completely software-defined package
  • Policy-based tiering between on-premises hybrid cloud storage platform and any S3 API–compliant private or public cloud
  • Investment protection: mix and match different generations and densities of computing platforms to build your storage environment; more than 400 application vendors support S3

The seamless integration allows IT departments to manage cloud storage for backup and recovery as easily as on-premises storage, but at lower cost. This integrated solution also helps deliver an automated, policy-based backup and recovery solution. Organizations can also leverage the cloud as a new storage tier or as a secondary off-site location for disaster recovery.

For more information, please see the Symantec NetBackup and Cloudian HyperStore Solution Brief.

Next Generation Storage: integration, scale & performance

Guest Blog Post by Colm Keegan from Storage Switzerland

Various industry sources estimate that data is doubling approximately every two years and the largest subset of that growth is coming from unstructured data. User files, images, rich multimedia, machine sensor data and anything that lives outside of a database application can be referred to collectively as unstructured data.

Storage Scaling Dilemma

The challenge is that traditional storage systems, which rely on “scale-up” architectures (populating disk drives behind a dual controller system) to increase storage capacity, typically don’t scale well to meet the multi-PB data growth that is now occurring within most enterprise data centers. On the other hand, while some “scale-out” NAS systems can scale to support multiple PBs of storage within a single filesystem, they are often not a viable option, since adding storage capacity to these systems often requires adding CPU and memory resources at the same time – resulting in a high total cost of ownership.

Commoditized Storage Scaling

Businesses need a way to cost-effectively store and protect their unstructured data repositories utilizing commodity, off-the-shelf storage resources and/or low-cost cloud storage capacity. In addition, these repositories need to be capable of scaling massively to support multiple PBs of data and enable businesses to seamlessly share this information across wide geographical locations. Beyond storage scale and economy, these resources should also be easy to integrate with existing business applications. And ideally, they should be performance-optimized for unstructured data files.

Software Driven Capacity

Software defined storage (SDS) technologies are storage hardware agnostic solutions which allow businesses to use any form of storage to build-out a low cost storage infrastructure. Internal server disk, conventional hard disk drives inside a commodity disk array or even a generic disk enclosure populated with high density disk can be used. Likewise, with some SDS offerings, disk resources in the data center can be pooled with storage in secondary data center facilities located anywhere in the world and be combined with cloud storage to give businesses a virtually unlimited pool of low-cost storage capacity.

Plug-and-Play Integration

From an integration perspective, some of these solutions provide seamless integration between existing business applications and cloud storage by providing native support for NFS and CIFS protocols. So instead of going through the inconvenience and expense of re-coding applications for cloud storage APIs like REST, Swift, or Amazon’s S3, these technologies essentially make a private or hybrid cloud data center object storage deployment a plug-and-play implementation but still provide the option to go “native” in the future.

Tapping Into Cloud Apps

But storage integration isn’t limited to on-premises applications; it also applies to cloud-based applications. Today there is a large ecosystem of Amazon S3 compatible applications that businesses may want to leverage. Examples include backup and recovery, archiving, file sync and share, etc. Gaining access to these software offerings through an S3 compatible object storage framework gives businesses even more use cases and value for leveraging low-cost hybrid cloud storage.

Data Anywhere Access

Now businesses can provision object storage resources on-premises and/or out across public cloud infrastructure to give their end-users ubiquitous access to data regardless of their physical location. This enables greater data mobility and can serve to enhance collaborative activities amongst end-users working across all corners of the globe. Furthermore, by replicating data across geographically dispersed object storage systems, businesses can automatically back up data residing in remote offices/branch offices to further enhance data resiliency.

With data intensive applications like big data analytic systems and data mining applications clamoring for high speed access to information, object storage repositories need to be capable of providing good performance as well. Ideally, the storage solution should be tuned to read, write and store large objects very efficiently while still providing high performance.

Stem The Data Tide

Businesses today need a seamless way to grow out low-cost abundant, hybrid cloud storage resources across the enterprise to meet the unstructured data tsunami that is flooding their data center environments. In addition to providing virtually unlimited scaling from a storage capacity perspective, these resources need to easily integrate into existing application environments and provide optimal performance access to large unstructured data objects. Cloudian’s HyperStore solution provides all of these capabilities through a software defined storage approach which gives businesses the flexibility to choose amongst existing commodity disk assets in the data center and/or low cost object storage in the cloud, to help stem the unstructured data tide.

About Author

Colm Keegan is a 23-year IT veteran. His focus at Storage Switzerland is enterprise storage, backup, and disaster recovery solutions.

What is Hybrid Cloud Storage and What Can it Do For You?

IT departments today are forced to come up with innovative ways to deal with the amount of unstructured data that they must manage. Many of these departments are now looking to merge the flexibility and scale of the cloud with the security and control of their on-premises IT environment by using a hybrid cloud storage solution. However, until recently, hybrid cloud storage was a mere dream.

With the release of HyperStore 4.0, Cloudian has transcended this split data system and created a cost-effective data storage solution that begins in the on-premises IT environment, but also integrates with the Amazon cloud infrastructure.

Basic Benefits of Using Hybrid Cloud Storage

By moving to a hybrid cloud storage system such as the newly developed Cloudian HyperStore, companies can now reap a number of previously unattainable benefits, including:

–          Reducing storage acquisition costs

–          Reducing cost of managing storage environment

–          Enjoying Amazon S3 compatible and complementary applications

–          Easily expanding from small (Terabytes) to Petabytes as unstructured data grows

–          Balancing SLAs (service level agreements) using S3 bucket lifecycle policies
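As an illustration of the last benefit, the sketch below shows a bucket lifecycle configuration in the shape the AWS SDK for Python accepts, together with a deliberately simplified function that evaluates it. The rule values, the `logs/` prefix, and the `action_for` helper are all made up for this example, and whether a given HyperStore release accepts every field shown is an assumption.

```python
# Lifecycle rules: tier matching objects after 30 days, delete them after a year.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "tier-then-expire",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},
        }
    ]
}


def action_for(key: str, age_days: int, config: dict) -> str:
    """Simplified evaluation: which action applies to an object of this age?"""
    for rule in config["Rules"]:
        if rule["Status"] != "Enabled":
            continue
        if not key.startswith(rule["Filter"]["Prefix"]):
            continue
        if age_days >= rule["Expiration"]["Days"]:
            return "expire"
        for transition in rule["Transitions"]:
            if age_days >= transition["Days"]:
                return "transition to " + transition["StorageClass"]
    return "keep"


recent = action_for("logs/app.log", 10, lifecycle_configuration)
aging = action_for("logs/app.log", 45, lifecycle_configuration)
old = action_for("logs/app.log", 400, lifecycle_configuration)
```

Fresh data stays on the faster on-premises tier, older data moves to cheaper storage, and data past its retention period is removed, which is how a lifecycle policy trades cost against service level.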

Cloudian HyperStore Infrastructure

Cloudian Hybrid Cloud

Cloudian HyperStore is a hybrid cloud storage software solution for service providers and enterprises. It is fully Amazon S3 compliant and can be seamlessly integrated. It begins with an on-premises data solution and integrates with a cloud data solution for the best of both worlds, particularly for those with large and growing amounts of unstructured data.

Hybrid Cloud Storage with Cloudian HyperStore and Amazon S3

Simone Morellato, Director of Technical & Solutions Marketing

Enterprise Storage – Stop the Madness

Guest Blog Post by John Bennett

Recently I was visiting my favorite co-location data center in Tokyo when I saw two young technologists attempting to push a heavily laden cart of brand new gear, still in neonatal ESD bags, over a door jamb. Their ID badges revealed them as employees of a well-known global investment bank. In a thinly veiled maneuver to satisfy my curiosity, I offered to help. After a few tedious moments we had surmounted the obstacle. Panting a bit, the two young men thanked me profusely for lending an extra back to their burden. It was then that I realized what I had been lifting. A brand new disk array with Fibre Channel storage processors.

Fibre Channel… in 2014.

Well, I thought, perhaps they were adding storage to a mainframe, or it was an upgrade to an existing solution. My curiosity piqued, I asked.

No, they said. It was storage for a new component of a customer facing web application.

The exchange bothered me for the rest of the afternoon. When I arrived at the office the next day, I penned some rough specifications, put in a request for a budgetary quotation and scribbled out a high-level WBS and rough order of magnitude estimate for a project to deliver 100TB of replicated, geographically diverse disk using a similar technology to what I had seen the day before.

A couple of days later the numbers came back from the storage vendor. When I put it all together, what I discovered was shocking. The effective all-in 5-year cost of ownership for the disk array I had pushed over a 1 cm piece of aluminum the day before was somewhere around $2.3 million USD. This includes the cost of the array, networking, installation and project labor, cabling, rack space, power and maintenance.
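As a back-of-the-envelope check on that figure, the sketch below spreads the all-in cost evenly over capacity and time. It assumes decimal units (1 TB = 1,000 GB) and that all 100TB is usable for the full five years; the function name is made up for illustration.

```python
def cost_per_gb_month(total_cost_usd: float, capacity_tb: float, years: int) -> float:
    """Spread an all-in cost evenly over capacity and time (1 TB = 1000 GB)."""
    gigabytes = capacity_tb * 1000
    months = years * 12
    return total_cost_usd / (gigabytes * months)


# Figures from the estimate above: $2.3 million, 100TB, 5 years.
array_rate = cost_per_gb_month(2_300_000, capacity_tb=100, years=5)
```

Under those assumptions, the array works out to roughly $0.38 per GB per month.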

Most of us have had to help a business executive through technology sticker shock before. I’m sure this project had been no exception. These conversations typically contain catchphrases like “investing in scalability” and “enterprise-grade availability and fault tolerance” and typically last as long as it takes for the person holding the purse strings to glaze over and open their wallet. But at this point we’ve been preaching about the cost savings of virtualization and private clouds for well over a decade. How many of us are still spending megabucks on old legacy or, even worse, new legacy disk arrays and SAN fabric switches? When will our adherence to now ancient technologies become an existential risk to the enterprise technologist as a species? A reasonable argument could be made that we’ve all been made obsolete and we just don’t know it yet.

We have to stop the storage madness before it’s too late.

The narrative of the counter argument goes something like this: Infrastructure, properly run, is a utility service. As such it is largely defined by the requirements of the layers of the technology stack that depend on it. Infrastructure technologists can only independently change their part of the investment insofar as that change doesn’t impact the layers above it. Put another way, no matter how much shiny new whiz-bang object store capacity I can offer in my data center, it does absolutely no good if the business I support runs on applications that were written when monolithic RDBMSs dominated the earth. In this context, it’s understandable why some might think enterprise storage a lost cause. I’d like to argue that enterprise storage presents a ripe opportunity to add value.

The fact of the matter is that now is the time to be vocal about next-generation infrastructure technologies of all stripes. “Big Data” is no longer just a buzz word. It’s a reality, and often an unwelcome one for established firms. Pressure from without as more agile cloud-native ventures close in on the market share of more mature firms is converging with pressure from within to add features and capacity to legacy BI and CRM systems.  Legacy platforms the world over are straining under impossible loads and technology departments are straining under demands to meet bottom line targets that simply can’t be met with technology architectures from 1988.

As it was when mainframes gave way to midrange UNIX and when the Internet changed everything forever, the big winners will be the ones who can optimize technological transformations for their stakeholders. For enterprise storage a similar change is happening right now. The past has shown us that leading a revolution is much preferable to being swept away by it.

About Author

John is a technologist with 20 years of experience on the front lines of technology infrastructure and operations. John’s focus is the application of scientific, data-driven quality management techniques to high-risk technology operations. He is currently located in Tokyo.

The Right S3-Compatible Storage System: Choose Wisely

Applications are driving the escalating need for cloud storage, and the public cloud providers saw it coming years ago. Amazon’s foresight dates back to early 2006, when it launched the Amazon Simple Storage Service (S3), a massively scalable and cost-effective cloud storage solution developed specifically to house the massive influx of data being created by organizations worldwide. We have since witnessed several milestones for the public cloud giant, including the number of objects stored in S3 growing to over 2 trillion, 40 price drops on the service, and the development of a burgeoning ecosystem of more than 350 compatible applications. It’s clear that Amazon has established itself as the dominant leader in public cloud storage. Now hybrid cloud storage is getting its turn in the limelight. For the enterprise, flexibility, control, and resilience are at the forefront of their concerns, and hybrid cloud is rapidly shaping up to be the solution of choice for their storage needs. In fact, a recent Rackspace survey indicated that over 60% of the enterprise IT departments surveyed planned to deploy hybrid cloud over the next 3 years.

Amazon S3 holds twice the market share of all its closest competitors combined, so it’s likely that the storage platform of choice for on-premise hybrid or private cloud deployments largely depends on its compatibility with S3. With no enforced standard for claiming S3 compatibility, choosing the right storage platform can be tricky.

So what does it mean to be S3 compatible? And why does it matter?

Why it matters…

If you have applications that speak to S3 and you are looking to deploy a hybrid cloud storage solution that allows for applications to simply switch between storage targets, then compatibility matters. If you utilize any of the 350+ applications that speak S3 and you want them to operate seamlessly in an on-premise or hybrid cloud environment, compatibility matters. If you have written or plan to write applications that utilize S3 and want them to continue to run without having to rewrite their APIs, then compatibility matters. If ease of migration, cost efficiency, and TCO are important to you, then compatibility matters.
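To make “simply switch between storage targets” a little more concrete, here is a minimal sketch, not anything taken from a particular product. It assumes path-style addressing (endpoint, then bucket, then key) and uses a made-up on-premise hostname; the names `TARGETS` and `object_url` are hypothetical. Real requests would also need to be signed, which is left out here.

```python
from urllib.parse import quote

# Hypothetical storage targets. The on-premise hostname is a placeholder,
# not a real endpoint.
TARGETS = {
    "aws": "https://s3.amazonaws.com",
    "on_prem": "https://s3.storage.example.internal",
}


def object_url(target: str, bucket: str, key: str) -> str:
    """Build a path-style object URL for the chosen storage target."""
    endpoint = TARGETS[target].rstrip("/")
    # Percent-encode the key but keep "/" so key prefixes stay readable.
    return f"{endpoint}/{quote(bucket)}/{quote(key, safe='/')}"


# The application code stays the same; only the target name changes.
for name in TARGETS:
    print(name, object_url(name, "reports", "2014/q1 summary.pdf"))
```

The point of the sketch is that the bucket and key logic never changes. If the on-premise platform honors the same operations, switching targets is a configuration change rather than a rewrite.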

Compatibility with the S3 API can mean the difference between a good decision and a costly, time-intensive mistake. Understanding the S3 API and the varying levels of compatibility that storage platforms offer can seriously impact the outcome for those of you who plan to stand up hybrid or on-premise clouds.

The S3 API

To make S3 simple for applications to access, Amazon built and continues to refine the operations available to applications through the S3 API. The API enables operations to be performed on the S3 Service, S3 Buckets, and S3 Objects. There are 51 total operations available through it. Compatibility is based on a storage platform’s ability to perform some, many, or all of the 51 operations available through the S3 API.
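As a rough illustration of the Service, Bucket, and Object split, the sketch below sorts a path-style request path into one of those three scopes. It assumes path-style requests with any query string already stripped, and the function name `operation_scope` is hypothetical.

```python
def operation_scope(path: str) -> str:
    """Return 'service', 'bucket', or 'object' for a path-style request path."""
    # Split off the first path segment (the bucket) from the rest (the key).
    parts = path.lstrip("/").split("/", 1)
    if parts[0] == "":
        return "service"  # "/" addresses the service itself
    if len(parts) == 1 or parts[1] == "":
        return "bucket"   # "/photos" or "/photos/" addresses a bucket
    return "object"       # "/photos/2014/cat.jpg" addresses an object


for example in ["/", "/photos", "/photos/2014/cat.jpg"]:
    print(example, "->", operation_scope(example))
```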

[Figure: S3 compatibility]

Simple Compatibility

There are 9 simple operations available through the S3 API. The “Simple” subset of operations should act as the barebones set of operations that a storage platform must perform in order to claim compatibility. These 9 operations allow for very basic manipulation of data through the API, though it is important to remember that 42 additional operations still remain that are not being executed by a storage platform falling under this category, even though it may boast “S3 compatibility”. The chart below shows the 9 “Simple” operations:

[Chart: the 9 “Simple” S3 API operations]

An example of a storage platform that can claim simple compatibility is SwiftStack. Using middleware called Swift3, SwiftStack can perform the above operations through the S3 API.

Moderate Compatibility

If an application requires just one more operation to be performed through the S3 API beyond the 9 “Simple” operations listed above AND you don’t want to have to rewrite your application APIs, then you need to look to storage platforms that offer somewhat more robust compatibility. The chart below shows the 9 simple operations (in red) with the addition of 18 moderately complex operations (in yellow). In order to be considered moderately compatible, a storage platform should be able to perform the majority of the 27 operations listed below:

[Chart: the 9 simple operations (red) and the 18 moderate operations (yellow)]

One such storage platform is the open-source unified storage platform, Ceph. Based on the above chart, it can boast “Moderate” S3 compatibility, though, robust as it is, even it cannot perform 100% of the operations listed above.

Advanced Compatibility

If you want the peace of mind of knowing that the applications you have written that speak S3, and/or all 350+ S3-compatible applications, will continue to work seamlessly with your hybrid or on-premise cloud, then choosing a storage platform that boasts advanced compatibility with the S3 API is vital. Of the 51 operations available through the S3 API, 24 of them are considered advanced. Below is the total set of operations available through the API including simple (red), moderate (yellow), and advanced (green):

Cloudian is the only storage platform that can boast full “Advanced Compatibility”. Additionally, Cloudian is the only storage platform in the “Advanced Compatibility” tier that allows developers continued use of Amazon’s S3 SDK, which significantly eases their workload. Finally, Cloudian is the only storage platform that automatically tiers data between on-premise deployments and Amazon’s S3 public cloud while representing it under a single name space. With this set of advanced functionality, Cloudian stands by its claim as a bug-for-bug match with S3 for on-premise and hybrid cloud deployments.
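The three tiers described above can be sketched as a small classifier. This is only an illustration of the arithmetic (9 + 18 + 24 = 51), working from counts of supported operations rather than operation names. The thresholds are assumptions drawn from the wording in this post: “simple” requires all 9 simple operations, “moderate” requires a majority of the 27 simple and moderate operations, and “advanced” is assumed here to mean all 51. The function name `compatibility_tier` is hypothetical.

```python
# Tier sizes as stated in this post: 9 simple + 18 moderate + 24 advanced.
TIER_SIZES = {"simple": 9, "moderate": 18, "advanced": 24}
TOTAL_OPERATIONS = sum(TIER_SIZES.values())  # 51


def compatibility_tier(supported: dict) -> str:
    """Classify a platform from the number of operations it supports per tier."""
    for tier, size in TIER_SIZES.items():
        count = supported.get(tier, 0)
        if not 0 <= count <= size:
            raise ValueError(f"{tier} count must be between 0 and {size}")

    simple = supported.get("simple", 0)
    moderate = supported.get("moderate", 0)
    advanced = supported.get("advanced", 0)

    # Assumption: every one of the 9 simple operations is the bare minimum.
    if simple < TIER_SIZES["simple"]:
        return "below simple"
    # Assumption: "advanced" means all 51 operations.
    if simple + moderate + advanced == TOTAL_OPERATIONS:
        return "advanced"
    # "Moderate" means a majority of the 27 simple and moderate operations.
    combined = TIER_SIZES["simple"] + TIER_SIZES["moderate"]
    if 2 * (simple + moderate) > combined:
        return "moderate"
    return "simple"


print(compatibility_tier({"simple": 9}))
print(compatibility_tier({"simple": 9, "moderate": 15}))
print(compatibility_tier({"simple": 9, "moderate": 18, "advanced": 24}))
```

Counting operations is a coarse measure, of course. Two platforms with the same count can still differ in which operations they cover, so it is worth checking the specific operations your applications call.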

Too Long; Didn’t Read…

When looking at deploying an open-hybrid cloud and/or moving data between S3 and your private cloud, it is of utmost importance that you understand the level of compatibility your storage platform claims versus its compatibility in reality. S3 is not going anywhere and if you’re reading this, there is a good chance you currently use or plan to use it, either exclusively or in a hybrid cloud environment. Choosing the right storage platform for your hybrid or private cloud can save you tons of money and shave months off of your time to deploy. Compatibility matters.

So, choose wisely.

Steven Walchek, Business Development