6 Best Practices for Object Storage Deployment

IT managers face new storage challenges as companies generate growing volumes of unstructured data. Whether it’s high-res images, backup data, or IoT-generated information, this data needs to be searchable and instantly accessible to facilitate analysis and data mining.

Here are 6 tips for getting the most from your object storage project.

By Neil Stobart
VP of WW Sales Engineering, Cloudian 

(Re-post of an SDxCentral article)

IT managers face new storage challenges as companies generate growing volumes of unstructured data. Whether it’s high-res images, backup data, or IoT-generated information, this data needs to be searchable and instantly accessible to facilitate analysis and data mining. IT professionals increasingly find that object storage addresses these challenges in a simple and cost-effective way. But object storage is new in many data centers and presents questions about how best to manage it. Here are six best practices that will help you get the most from object storage.

Best Practice No. 1

Identify workloads that make sense for object storage.

With multi-petabyte scalability, object storage is best for data-intensive applications. Consider object storage for applications that require streaming throughput (Gb/s) rather than high transaction rates (IOPS). Examples are backup, data archiving, IoT, CCTV, voice recordings, log files, and media files. As one option, consider tiered storage infrastructures that let you transparently move data from high-performance storage to object storage.

Object storage offers compelling economies for large data sets, so you can keep more data online and available on demand. But storage is not one-size-fits-all. Review your applications and determine where object storage makes more sense than other storage types. Traditional storage arrays or all-flash systems will continue to make sense for high-IOPS, low-latency applications and for smaller data set sizes (think Oracle, SQL databases, email servers, ESX server farms, and VDI).
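
The distinction between throughput and transaction rate can be made concrete with a little arithmetic: throughput is roughly the number of operations per second multiplied by the size of each operation. The sketch below is only an illustration; the two workload profiles and their numbers are assumptions, not measurements from any product.

```python
def throughput_mib_per_s(iops: float, io_size_kib: float) -> float:
    """Throughput in MiB/s = operations per second x size of each operation."""
    return iops * io_size_kib / 1024


# Hypothetical database-style workload: many small random I/Os.
database = throughput_mib_per_s(iops=50_000, io_size_kib=8)

# Hypothetical backup stream: far fewer, much larger sequential I/Os.
backup = throughput_mib_per_s(iops=500, io_size_kib=4096)

print(f"database: {database:.0f} MiB/s at 50,000 IOPS")
print(f"backup:   {backup:.0f} MiB/s at 500 IOPS")
```

In this made-up comparison the backup stream moves about five times more data per second while issuing one hundredth of the operations, which is the kind of profile the article suggests placing on object storage.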

Best Practice No. 2

Beware of the 1PB failure domain. 

High-density storage servers now offer capacities nearing 1PB in a single device. With such high storage density, these devices can be very attractive from a cost standpoint. But make sure you’ve thought through the implications of managing this much storage in a single device. Even if you’re protected from data loss, you might still be looking at a long rebuild time in the event of device failure. To reduce rebuild times, logically divide large servers into multiple independent nodes. Also, use erasure coding to build cluster configurations that are resilient to multiple device failures. That way, if you were to encounter a second failure during a rebuild, you’re still protected.
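
To see why the size of the failure domain matters, here is a rough back-of-the-envelope sketch. The sustained rebuild rate of 1 GB/s is an assumption chosen for illustration (real rebuilds depend on the network, the cluster layout, and the competing workload), and the 4+2 erasure coding scheme is just one example configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErasureScheme:
    data_fragments: int     # k: fragments that hold the original data
    parity_fragments: int   # m: extra fragments used for recovery

    @property
    def tolerated_failures(self) -> int:
        # Any k of the k + m fragments can rebuild the object,
        # so up to m fragments (devices) may be lost at once.
        return self.parity_fragments

    @property
    def storage_overhead(self) -> float:
        # Raw capacity consumed per unit of usable data.
        return (self.data_fragments + self.parity_fragments) / self.data_fragments


def rebuild_hours(capacity_tb: float, rebuild_rate_gb_per_s: float) -> float:
    """Hours needed to re-create `capacity_tb` of data at a sustained rate."""
    seconds = capacity_tb * 1000 / rebuild_rate_gb_per_s
    return seconds / 3600


scheme = ErasureScheme(data_fragments=4, parity_fragments=2)
print("failures tolerated:", scheme.tolerated_failures)
print("storage overhead:  ", scheme.storage_overhead)

# Assumed rebuild rate of 1 GB/s: one 1 PB device versus one of four 250 TB nodes.
print(f"1 PB domain:   {rebuild_hours(1000, 1.0):.1f} hours")
print(f"250 TB domain: {rebuild_hours(250, 1.0):.1f} hours")
```

Under these assumptions, dividing the server into four nodes cuts the data to rebuild after a single node failure to a quarter, and a scheme with two parity fragments keeps data protected if a second failure occurs during that window.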

Best Practice No. 3

Use QoS and multi-tenancy to consolidate different workloads on a single platform. 

A key benefit of object storage is its great scalability, which lets you simplify management by consolidating users and applications onto a single system. Within that shared environment, however, the system must deliver service levels that meet each user’s needs: each requires storage capacity, security, and predictable performance. To achieve this, make sure your system is configured with isolated storage domains plus quality-of-service controls. The combination will eliminate the two main challenges of shared storage: the nosy neighbor and the noisy neighbor problems. Your users will thank you.
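
One common way to implement a quality-of-service control is a token bucket per tenant. The sketch below is a generic illustration of that technique, not a description of how any particular object storage product enforces QoS; the tenant names, rates, and burst sizes are hypothetical, and a fake clock is injected so the demo is repeatable.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: float, clock: Callable[[], float]):
        self.rate = rate
        self.burst = burst
        self.tokens = burst          # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantQoS:
    """One independent bucket per tenant, so one tenant cannot drain another's budget."""

    def __init__(self, limits: Dict[str, Tuple[float, float]],
                 clock: Callable[[], float] = time.monotonic):
        self.buckets = {name: TokenBucket(rate, burst, clock)
                        for name, (rate, burst) in limits.items()}

    def allow(self, tenant: str) -> bool:
        return self.buckets[tenant].allow()


# Hypothetical tenants and limits: (requests per second, burst size).
now = [0.0]                                  # fake clock for a repeatable demo
qos = TenantQoS({"backup": (10, 5), "analytics": (10, 5)}, clock=lambda: now[0])

noisy = [qos.allow("analytics") for _ in range(7)]   # 7 requests at the same instant
print(noisy)                  # the first 5 pass, the last 2 are throttled
print(qos.allow("backup"))    # the other tenant is unaffected
```

Because each tenant has its own bucket, a burst from the noisy tenant is throttled without reducing what the quiet tenant is allowed to do. Isolation of the data itself (the nosy neighbor problem) is a separate access-control concern that this sketch does not cover.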

Best Practice No. 4

Consider integrating data management into your application to deliver workflow automation.

The most common “language” of object storage is the S3 API (the Amazon Web Services Simple Storage Service API), which is revolutionizing how applications can control data. To see why, compare the S3 API with traditional storage protocols such as FC, iSCSI, NFS or SMB. Those protocols only support two basic commands: read and write data. By contrast, the S3 API supports over 400 different verbs that facilitate management, reporting, and seamless integration with public cloud services. Application owners should be aware of the possibilities built into the S3 API and work with app developers and vendors to capitalize on these advanced services.
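
As one example of data management expressed through the API rather than through manual administration, the S3 API lets an application attach a lifecycle configuration to a bucket. The sketch below only builds the configuration document in the shape that the S3 lifecycle API expects; the rule ID, prefix, and day counts are hypothetical, and the `GLACIER` storage class is an AWS name that an S3-compatible system may or may not support.

```python
import json


def lifecycle_rule(rule_id: str, prefix: str, transition_days: int,
                   storage_class: str, expire_days: int) -> dict:
    """Build one S3 lifecycle rule: move objects to another tier, then delete them."""
    if expire_days <= transition_days:
        raise ValueError("objects must expire after they transition")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},      # applies only to keys under this prefix
        "Status": "Enabled",
        "Transitions": [{"Days": transition_days, "StorageClass": storage_class}],
        "Expiration": {"Days": expire_days},
    }


# Hypothetical policy: tier log objects after 30 days, delete them after a year.
config = {"Rules": [lifecycle_rule("archive-logs", "logs/", 30, "GLACIER", 365)]}
print(json.dumps(config, indent=2))

# With the boto3 SDK installed and credentials configured, the document could be
# applied roughly like this (bucket name is a placeholder):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

Once such a rule is in place, the storage system carries out the tiering and expiry itself, which is the kind of workflow automation this best practice has in mind.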

Best Practice No. 5

Leverage metadata capabilities.

Rich metadata — or data about data — is simply a user-defined tag associated with each object. But that tag’s implications are profound. Object storage has rich metadata tag features built-in, unlike network-attached storage (NAS), which has very limited metadata, or SAN, which has none. Simple as they are, these tags will have a significant impact on data management. They can be readily searched with Google-like tools, and they can be changed over time by applications that analyze your data and extract insights, such as, “What is the name of the person in this image?” By recording that finding in a tag that’s forever connected with the data, wherever that data may be stored, business information can be found and leveraged in seconds. Imagine all data sets being searchable, across all storage pools, with a single Google-like search query.

Application owners should consider their opportunities. Can your data be described in ways that make it more searchable? Could you potentially use tools — either on-prem or in the cloud — to enrich your metadata, thereby adding value to your search process? If so, consider ways to capitalize on the power of metadata.
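
To illustrate how tags make data searchable, here is a minimal in-memory sketch of a metadata index. It is a toy model under stated assumptions: the class, the object keys, and the tag names are all hypothetical, and a real deployment would rely on the storage platform's own metadata and search features (in the S3 API, user-defined metadata travels with the object as `x-amz-meta-` headers).

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class MetadataIndex:
    """Maps (tag, value) pairs to object keys so objects can be found by tag."""

    def __init__(self) -> None:
        self._tags: Dict[str, Dict[str, str]] = {}
        self._index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def tag(self, key: str, **tags: str) -> None:
        """Add or update tags on an object, keeping the index consistent."""
        current = self._tags.setdefault(key, {})
        for name, value in tags.items():
            old = current.get(name)
            if old is not None:
                self._index[(name, old)].discard(key)   # drop the stale entry
            current[name] = value
            self._index[(name, value)].add(key)

    def search(self, **criteria: str) -> List[str]:
        """Return keys whose tags match every criterion."""
        if not criteria:
            return []
        matches = [self._index.get(pair, set()) for pair in criteria.items()]
        return sorted(set.intersection(*matches))


index = MetadataIndex()
index.tag("images/0001.jpg", kind="image", camera="lobby")
index.tag("images/0002.jpg", kind="image", camera="garage")

# Later, an analysis tool enriches one object with what it found.
index.tag("images/0001.jpg", person="alice")

print(index.search(kind="image"))
print(index.search(kind="image", person="alice"))
```

The second search returns only the enriched object, showing how a tag added after the fact by an analysis step becomes a search term without touching the object data itself.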

Best Practice No. 6

Conduct a Proof-of-Concept.

Not all object storage platforms are created equal, and some careful analysis is required to make sure your needs are met. A simple way to eliminate risk is by conducting a proof-of-concept. Document your requirements and share them with your vendor. Undertake what testing is needed to validate both your needs and the vendor’s claims. In many cases, this can be completed quickly and non-disruptively using virtual machines as the test platform. The knowledge you will gain – about your needs, the product, and the vendor’s capabilities – will ensure your project’s success.

Learn more at www.cloudian.com.

This is part of a series of articles about object storage.

Cloudian and Thoughts About the Future of Storage

The enterprise storage industry is going through a massive transformation, and over the last several years I’ve had the good fortune of being on the front lines. As founding CEO of Nexenta, I helped that company disrupt the storage industry by creating and leading the open storage market. These days I’m having a blast as a senior advisor and investor at companies including Cloudian, which is taking off as a leader in what is typically called “object storage”.

In this blog I’d like to share what I’m seeing – across the IT industry – and why change is only accelerating.  The sources of this acceleration are much larger than any one technology vendor or, indeed, than the technology itself.

Let’s start at the top – the top of the stack, where developers and their users reside. From there we will dive into the details before summarizing the implications for the storage industry.

Software eats everything

What does “software eats everything” really mean?  To me it means that more than ever start-ups are successfully targeting entire industries and transforming them through technology-enabled “full stack” companies.  The canonical example is a few guys that thought about selling better software to taxi companies… and instead became Uber.

Look around and you’ll see multiple examples where software has consumed an industry. And today, Silicon Valley’s appetite is larger than it ever has been.

So why now?  Why is software eating everything?  A few reasons:

  1. Cloud and AWS – When I started Clarus back in the early 2000s, it cost us at least $7 million to get to what we now would call a minimum viable product.  These days, it costs perhaps 10% of that, largely thanks to the shift to the cloud.  Maybe more importantly, thanks to SaaS and AWS, many users now see that cloud-hosted software is often safer than on-premises software.
  2. SaaS and Cloud have enabled a profound trend: DevOps –  DevOps first emerged in technology companies that deliver software via the cloud.  Companies such as Netflix, Facebook, and GitHub achieve developer productivity that is 50-60x that of older non-DevOps approaches.  Highly automated end-to-end deployment and operations pipelines allow innovation to occur massively faster – with countless low risk changes being made and reverted as needed to meet end user needs.
  3. Pocket sized supercomputers – Let’s not forget that smartphones enable ubiquitous user interactions and also smart-sensing of the world – a trend that IoT only extends.
  4. Open source and a deep fear of lock-in – Open source now touches every piece of the technology stack. There are a variety of reasons for this including the role that open source plays as a way for developers to build new skills and relationships.  Another reason for the rise of open source is a desire to avoid lock-in.  Enterprises such as Bank of America and others are saying they simply will *not* be locked in again.
  5. Machine learning – Last but not least, we are seeing the emergence of software that teaches itself. For technology investors, this builds confidence since it implies a fundamental method of sustaining differentiation. Machine learning is turning out to be the killer app for big data. This has massive second-order effects that have yet to be fully considered. For example, how will the world change as weather prediction continues to improve? Or will self-driving cars finally lead to pedestrian-friendly suburban environments in the US?

Ok, so those are at least a few of the trends…let’s get more concrete now.  What does software eating everything – and scaring the heck out of corporate America wrestling with a whole new batch of competitors – mean for storage?

Macro trends drive new storage requirements

Let’s hit each trend quickly in turn.

1) Shift to AWS

By now you probably know that Cloudian is by far the most compliant Amazon S3 storage.  And this S3 compliance is not just about data path commands – it is also about the management experience such as establishing buckets.

What’s more, doubling down on this differentiation, Cloudian and Amazon recently announced a relationship whereby you can bill via Amazon for your on-premise Cloudian storage.   In both cases Cloudian is the first solution with this level of integration and partnership.

2) DevOps

If you’re an enterprise doing DevOps, you should look at Cloudian. That’s because the automation that serves as the foundation for DevOps is greatly simplified by the API consistency that Cloudian delivers.

If your developers are on the front lines of responding to new full stack competitors, you don’t want them hacking together their own storage infrastructure. To deliver on the promise of “just like Amazon S3, on premise and hybrid”, Cloudian has to make distributed system management simple. This is insanely difficult.

In a recent A16Z podcast, Marc Andreessen commented that there are only a few dozen great distributed systems architects and operators in the world today.  If you already employ a few of them, and they have time on their hands, then maybe you should just grab Ceph and attempt to roll your own version of what Cloudian delivers.  Otherwise, you should be a Cloudian user.

3) Mobility

Architectures have changed with mobility in mind. User experience is now further abstracted from the underlying infrastructure.

In the old scale-up storage world, we worried a lot about IOPS for particular read/write workloads. But when RF is your bottleneck, storage latency is less of a concern. Instead, you need easy-to-use, massively scalable, geographically dispersed systems like object storage, S3, and Cloudian.

4) Open source and a fear of lock-in

Enterprises want to minimize their lock-in to specific service providers. The emergence of a de-facto standard, Amazon S3, now allows providers and ISVs to compete on a level playing field. Google is one example. They now offer S3 APIs on their storage service offerings.  If your teams need to learn a new API or even a new set of GUIs to go with a new storage vendor, then you are getting gradually locked in.

5) Machine learning

Machine learning may be the killer app for big data. In general, there is one practical problem with training machine learning models: how do we get the compute to the data rather than the other way around?

The data is big and hard to move. The compute is much more mobile. But even then, you typically require advanced schedulers at the compute layer – which is the focus of entire projects and companies.

The effectiveness of moving the compute to the data is improved if information about the data is widely available as metadata.  Employing metadata, however, leads to a new problem: it’s hard to store, serve, and index this metadata to make it useful at scale. It requires an architecture that is built to scale and to serve emerging use cases such as machine learning. Cloudian is literally years ahead of competitors and open source projects in this area.

For a real-world example, look no further than Cloudian’s work with advertising giant Dentsu to deliver customized ads to Tokyo drivers. Here, Cloudian demonstrates the kind of breakthrough applications that can be delivered, due in part to a rich metadata layer. The project shows what is possible today with machine learning and IoT.

As I’ve written elsewhere, there is a lot to consider when investing in technology. You need companies that understand and can exploit relevant trends. But even more so, you need a great team. In Cloudian you’ve got a proven group that emphasizes product quality and customer success over big booths and 5 star parties.

Nonetheless, I thought it worth putting Cloudian’s accelerating growth into the context of five major themes.  I hope you found this useful.  I’d welcome any feedback in the comments below or via Twitter.  I’m at @epowell101 and I’ll try to catch comments aimed at @CloudianStorage as well.

New Use Cases for Smart Data and Deep Learning

In case you missed it, we recently announced a project with advertising giant Dentsu, QCT (Quanta Cloud Technology) Japan, and Intel Japan. Using deep learning analysis and Cloudian HyperStore’s smart data storage, we’re launching a billboard that can automatically recognize vehicles and display relevant ads.

The system has ‘seen’ 3,000-5,000 images per car so that it can distinguish all the various features of a particular car and identify the make, model, and year with an average 94% accuracy. For example, if someone is driving an older Mercedes, the billboard could advertise the latest luxury car. Or, if someone is driving a Prius, then the billboard could show eco-friendly products. It’s important to note that none of this data is stored – it is simply processed and then relayed into a relevant ad.

Our smart data system sifts through thousands of images to accurately identify vehicles

You can also turn to this piece from CNN Money to learn a bit more about the project. The first billboard will be up and running later this year in Tokyo.

Broader Potential for Innovative Technology

 

One of the reasons why this technology is possible is through the use of metadata. Typically, big data is just stored passively for future analysis. Because this data is unorganized and untagged, it requires a good amount of effort in order to discover and pull out specific information.

Objects in object storage, on the other hand, can have metadata tags attached to them. We run the data through real-time classification and auto-recognition/discrimination, which means these metadata tags are attached on the fly. As a result, we use this ‘deep learning’ to turn big data into smart data.
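
The idea of attaching tags on the fly can be sketched as an ingest step that runs each incoming object through a classifier and stores the result as metadata alongside the data. Everything here is a stand-in chosen for illustration: the dictionary plays the role of the object store, and the stub classifier replaces a trained deep learning model; neither reflects the actual HyperStore implementation.

```python
from typing import Callable, Dict

Tags = Dict[str, str]


def ingest(store: Dict[str, dict], key: str, payload: bytes,
           classify: Callable[[bytes], Tags]) -> Tags:
    """Classify the payload at write time and store the tags with the object."""
    tags = classify(payload)
    store[key] = {"data": payload, "metadata": tags}
    return tags


def stub_classifier(payload: bytes) -> Tags:
    # Stand-in for a trained model: a real classifier would inspect image content.
    label = "vehicle" if payload.startswith(b"CAR") else "unknown"
    return {"label": label, "size_bytes": str(len(payload))}


store: Dict[str, dict] = {}
ingest(store, "frames/0001", b"CAR-frame-bytes", stub_classifier)
ingest(store, "frames/0002", b"blurry-frame", stub_classifier)

for key, obj in store.items():
    print(key, obj["metadata"])
```

Because the tags are written at ingest time, later queries can filter on them without re-running the classification.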

How IoT and deep learning combine to make smart data

So what are the implications of this technology beyond advertising? There is potential for tremendous applications of deep learning in other fields, such as improved object recognition for self-driving cars, higher quality screening for manufacturing equipment, or even better tumor detection in MRIs.

Still skeptical? Sign up for a free trial and test out our smart data storage for yourself.