Scale-Up vs. Scale-Out Storage: What’s the Difference?

In the evolving landscape of enterprise storage, the distinction between scale-up and scale-out storage architectures remains a focal point. As organizations face exponential data growth, understanding the nuances of these architectures is crucial for efficient storage management and expansion.

Storage capacity is the primary benchmark for evaluating storage devices, closely followed by the ease of capacity expansion. How quickly and easily a system can grow is a critical concern for storage administrators, and it often comes down to a choice between adding hardware to an existing system or expanding across additional systems. The former approach is known as scale-up, the latter as scale-out, and the two are differentiated by their underlying architectural designs.


The Traditional Scale-Up Model

Scale-up storage has been the traditional approach. It typically involves a central pair of controllers overseeing multiple shelves of drives. Expansion is linear and limited; when space runs out, additional shelves of drives are integrated. The limitation of this model lies in the finite scalability of the storage controllers themselves.

As storage demands increase, the scale-up model encounters bottlenecks. New systems must be introduced to manage additional data, leading to increased complexity and isolated storage silos. This architecture also struggles with resource allocation inefficiency, as determining the optimal location for workloads becomes increasingly challenging.

RAID technology underpins drive failure protection in scale-up systems. However, RAID does not extend across multiple storage controllers, anchoring the drives to a specific controller and consequently cementing the scalability challenge of this architecture.

Figure 1 – Modular/Scale-up Storage Architecture


Figure 2 shows the potential for storage system sprawl.

Figure 2 – Modular/Scale-up Storage Silos

The Modern Scale-Out Strategy

In contrast, scale-out storage architectures, particularly those utilizing object storage, offer a dynamic alternative. Built from industry-standard servers, each node has its own attached storage, reminiscent of Direct Attached Storage (DAS). Object storage software on each node unifies the nodes into a single cluster, creating a pooled storage resource with a unified namespace accessible to users and applications.

Protection against drive failure in a scale-out environment is not reliant on RAID but on RAIN (Redundant Array of Independent Nodes), which offers data resilience across nodes. RAIN supports several data protection methods, including replicas and erasure coding, which mirror RAID’s data safeguarding principles but are optimized for multi-node environments.
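To make erasure coding concrete, here is a minimal single-parity sketch in Python. It illustrates the principle only and is not Cloudian’s implementation: this XOR scheme survives the loss of any one chunk, while production systems typically use k+m Reed-Solomon codes that survive multiple node failures.

```python
# Illustrative single-parity erasure coding (not Cloudian's implementation).
# Data is split into k chunks plus one XOR parity chunk; any ONE lost
# chunk can be rebuilt from the survivors.

def encode(data: bytes, k: int) -> list:
    """Split data into k equal chunks and append an XOR parity chunk."""
    size = -(-len(data) // k)                  # ceiling division
    padded = data.ljust(size * k, b"\x00")     # pad so chunks divide evenly
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = bytearray(size)
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return chunks + [bytes(parity)]

def recover(chunks: list) -> list:
    """Rebuild the single missing chunk (marked None) by XOR-ing survivors."""
    missing = chunks.index(None)
    size = len(next(c for c in chunks if c is not None))
    rebuilt = bytearray(size)
    for chunk in chunks:
        if chunk is not None:
            for i, byte in enumerate(chunk):
                rebuilt[i] ^= byte
    chunks[missing] = bytes(rebuilt)
    return chunks

pieces = encode(b"object data spread across nodes", k=4)  # 4 data + 1 parity
pieces[2] = None                                          # simulate a lost node
assert recover(pieces) == encode(b"object data spread across nodes", k=4)
```

Replicas trade more raw capacity for simpler rebuilds; erasure coding stores far less redundant data for the same protection level, which is why it dominates at scale.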

Figure 3 – Object/Scale-out Storage Architecture

Scale-Out with Cloudian HyperStore

Cloudian HyperStore exemplifies the scale-out storage solution. HyperStore utilizes object storage technology to enable seamless scalability, providing a storage platform that expands horizontally by adding nodes. Each node addition enhances storage capacity, as well as compute and networking capabilities, ensuring that performance scales with capacity.

HyperStore’s architecture allows for simple integration of new nodes, which the system then incorporates into the existing cluster. Data is intelligently distributed across the new configuration, maintaining performance and reliability without the limitations of traditional scale-up architectures.
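Consistent hashing is the classic technique for this kind of distribution, because adding a node reassigns only a small slice of the key space rather than reshuffling everything. The sketch below is a generic illustration of that idea, not HyperStore’s actual placement logic; the node names, virtual-node count, and use of MD5 are all hypothetical.

```python
# Generic consistent-hashing ring (illustrative; not HyperStore's
# actual placement algorithm). Each node owns the arc of the ring up
# to its position; adding a node moves only the keys on its new arc.
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=64):
        # vnodes: virtual nodes per physical node, for smoother balance
        self.ring = sorted(
            (self._hash(f"{node}#{v}"), node)
            for node in nodes for v in range(vnodes)
        )
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, obj_key: str) -> str:
        """Return the node responsible for storing an object key."""
        i = bisect.bisect(self.points, self._hash(obj_key)) % len(self.points)
        return self.ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("videos/clip-0042.mp4"))  # deterministic node choice
```

Growing the cluster is then just rebuilding the ring with the new node list; only the objects whose arcs changed need to migrate.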

In a multi-data center setup, Cloudian HyperStore’s geo-distributed capabilities shine. Nodes can be deployed across various geographical locations, and thanks to HyperStore’s geo-awareness, data can be strategically placed to optimize access speeds. Users access storage through a virtual address, with the system directing requests to the closest or most optimal node. This ensures fast response times and consistent data availability, irrespective of the user’s location.

HyperStore’s innovative approach not only addresses the immediate scalability challenges but also provides a future-proof solution that accommodates the ever-increasing volume and complexity of enterprise data. Its efficient use of resources, simplified management, and robust data protection mechanisms make it a compelling choice for enterprises looking to overcome the traditional hurdles of storage expansion.

In summary, the evolution from scale-up to scale-out storage, epitomized by solutions like Cloudian HyperStore, marks a significant transition in enterprise storage. Organizations can now address their data growth challenges more effectively, with architectures designed for the demands of modern data management.

For more information, watch our overview video on object storage or read the first part of our series on object storage vs. file storage.



Mobile Video Surveillance Solution for Montebello Bus Lines

Mobile video surveillance can do a lot to ensure safety on transit systems. After all, bus and train operators must focus on operating their vehicles, not on policing riders.

Real-time mobile video surveillance would allow one staff member to monitor multiple vehicles, which could save cost and increase safety.

The problem is this: traditional technologies record video on the vehicle for retrieval after the vehicle returns home, so there is no real-time view. When an incident occurs, you can only see what happened after the fact.

Also, when an incident occurs, finding the relevant clip takes a long time: the manual process consumes expensive resources and slows the response.

The City of Montebello devised a better video surveillance answer. View this video to learn more.

The Challenges in Storing Video Surveillance

Montebello Bus Lines (MBL) currently operates 72 buses that serve over 8 million passengers a year, and each bus houses five cameras and a recording system. All video was recorded only locally on the buses, and transferring the data to the operations center at the end of the day took time.

Then, MBL had to manually locate clips using time codes. This made it difficult to follow up on reported incidents in a timely manner.

Another storage issue was budget. Budget limitations meant MBL couldn’t keep the video data for more than a few days. If someone filed a complaint after the video was deleted, the City of Montebello would face financial risk.

Finding the Answer in Object Storage

What MBL needed was the ability to wirelessly upload video in addition to storing the data locally. This would allow for immediate review by transit staff or law enforcement and would serve as an additional layer of backup to prevent data loss.

MBL first tried using a Network Attached Storage (NAS) system, but entry-level NAS simply isn’t fast enough, while better-performing systems are cost-prohibitive. Another challenge was the file structure, which did not transfer gracefully over a wireless network: an interrupted transfer meant restarting the process from the beginning. Finally, NAS systems offered limited metadata tagging, capturing only the most basic information.

Backing Up Video Surveillance with Cloudian

This is where Cloudian steps in. With Cloudian and Transportation Security Systems (TSS) IRIS, MBL is now able to add metadata tags to its videos. Metadata search also makes it easier to locate videos based on parameters such as time, location, vehicle, and more.

Large clips are broken into smaller pieces and transferred concurrently, resulting in better reliability and successful wireless data transfers. Additionally, object storage is more cost-efficient, meaning it’s easy (and affordable) to scale as more videos are stored.
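This chunk-and-retry behavior is what the standard S3 multipart upload API provides, and metadata tags ride along with the object. Below is a minimal boto3 sketch; the endpoint, credentials, bucket, file name, and metadata values are hypothetical placeholders.

```python
# Sketch: chunked, concurrent upload of a video clip with searchable
# user metadata via the standard S3 multipart upload API.
# Endpoint, credentials, bucket, and tag values are placeholders.
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example-onprem.local",  # S3-compatible store
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Split uploads into 8 MB parts with 4 in flight at once; a failed part
# is retried on its own instead of restarting the whole transfer.
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,
    multipart_chunksize=8 * 1024 * 1024,
    max_concurrency=4,
)

s3.upload_file(
    "bus17_cam3_20170412.mp4",          # local clip
    "surveillance-video",               # bucket
    "bus17/cam3/20170412.mp4",          # object key
    ExtraArgs={"Metadata": {            # user metadata for later search
        "vehicle": "bus-17",
        "camera": "3",
        "route": "montebello-10",
    }},
    Config=config,
)
```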

David Tsuen, IT Manager for the City of Montebello, stated that “Cloudian and TSS together allowed us to solve a very challenging problem. We now have a path to significant cost savings for the City and a safer experience for our riders. That’s a genuine win-win.”

You can learn more about how we solved MBL’s challenges by reading our case study, or you can try Cloudian out for yourself with our free trial.


Data Management Partners Unite to Provide Comprehensive Object Storage

We just announced our Data Management Partners program to help our customers solve more capacity management problems in less time. The program combines technology, testing, and support to make it easy to put object storage to work. Inaugural members of this program are Rubrik, Komprise, Evolphin, and CTERA Networks.

Here’s why this program is exciting: object storage has the potential to solve many capacity management problems in the data center. It’s two-thirds less costly than traditional storage and infinitely scalable. In a recent survey, Gartner found that capacity management was the #1 concern of Infrastructure and Operations managers, so these are important benefits.

The question is how to get started with object storage. You can piece together a solution on your own, but that can be risky. We’ve done the homework for you and proven out these solutions.

The Solution for Unstructured Data Consolidation

These solutions solve capacity-intensive challenges where Cloudian’s scalability and cost benefits deliver huge savings. Cloudian consolidates data into one big storage pool, so you can add as many nodes as you want. With one set of users, groups, permissions, file structures, etc., storage managers still see only one thing to manage. This cuts management workloads by 90% and makes it possible to grow with less headache and cost.

Solution areas in this program include:

  • Data protection: Rubrik and Cloudian together unify and automate backup, instant recovery, replication, global indexed search, archival, compliance, and copy data management into a single scale-out fabric across the data center and public cloud.
  • Data lifecycle management: Komprise and Cloudian tackle one of the biggest challenges in the data center industry, unstructured data lifecycle management, with solutions that offload non-critical data (typically 70%+ of the footprint) from costly Tier-1 NAS to a limitlessly scalable storage pool.
  • Media active archiving: Evolphin and Cloudian help media professionals address capacity-intensive formats (e.g., 4K, 8K, VR/360) with the performance to handle time-pressed workflows.
  • File sync and share: CTERA Networks and Cloudian provide enterprises with tools for collaboration in capacity-rich environments.

Reducing Risk with Proven Partners

This program consists of 100% proven solutions: all are deployed with customers in live production data centers right now. They solve real capacity management problems without creating new ones along the way.

Object storage is seeing rapid adoption. It costs significantly less than traditional storage and fixes the capacity problem with infinite scalability. If you’re looking into object storage, make sure you’re getting a complete solution, though. Learn more about our Data Management Partners today.


An Introduction to Data Tiering

Not all data is equal: frequency of access, security needs, and cost considerations vary, so data storage architectures need to provide different storage tiers to address these requirements. Storage tiers differ by disk drive type, RAID configuration, or even completely different storage sub-systems, each offering a different I/O profile and cost impact.

Data tiering moves data between these storage tiers, allowing an organization to ensure that the appropriate data resides on the appropriate storage technology. In modern storage architectures, this data movement is invisible to the end-user application and is typically controlled and automated by storage policies. Typical data tiers may include:

  1. Flash storage – High value, high-performance requirements, usually smaller data sets; cost is less important compared to the performance Service Level Agreement (SLA) required
  2. Traditional SAN/NAS Storage arrays – Medium value, medium performance, medium cost sensitivity
  3. Object Storage – Less frequently accessed data with larger data sets. Cost is an important consideration
  4. Public Cloud – Long-term archival for data that is rarely or never accessed

Typically, structured data sets belonging to applications/data sources such as OLTP databases, CRM, email systems, and virtual machines will be stored on tiers 1 and 2 above. Unstructured data is increasingly moving to tiers 3 and 4, as these are typically much larger data sets where performance is not as critical and cost becomes a more significant factor in management and purchasing decisions.
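Policies that automate this movement can be expressed directly through the S3 API. The sketch below uses boto3 to attach a lifecycle rule to an Amazon S3 bucket; the bucket name, prefix, and day thresholds are illustrative assumptions, and an on-premises platform would configure tiering through its own policy engine.

```python
# Sketch: an S3 lifecycle rule that automates tiering -- objects under
# archive/ move to an archival storage class after 90 days and are
# deleted after ~7 years. Names and thresholds are illustrative.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="corp-unstructured-data",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-then-expire",
            "Status": "Enabled",
            "Filter": {"Prefix": "archive/"},
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"},  # cold archive tier
            ],
            "Expiration": {"Days": 2555},                 # ~7-year retention
        }]
    },
)
```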

Some Shortcomings of Data Tiering to Public Cloud

Public cloud services have become an attractive data tiering solution, especially for unstructured data, but there are considerations around public cloud use:

  1. Performance – Public network access will typically be a bottleneck when reading and writing data to public cloud platforms, along with data retrieval times (based on the SLA provided by the cloud service). For backup data especially, backup and recovery windows remain critically important, so it is worth holding the most relevant backup sets onsite and archiving only older backup data to the cloud.
  2. Security – Certain data sets/industries have regulations stipulating that data must not be stored in the cloud. Being able to control what data is sent to the cloud is of major importance.
  3. Access patterns – Data that is re-read frequently may incur additional network bandwidth costs imposed by the public cloud service provider. Understanding your use of data is vital to control the costs associated with data downloads.
  4. Cost – As well as the bandwidth costs associated with reading data, storing large quantities of data in the cloud may not make the most economical sense, especially when compared to the economics of on-premises cloud storage. Evaluate both options before committing.

Using Hybrid Cloud for a Balanced Data Tier Strategy

For unstructured data, a hybrid approach to data management is key; an automation engine, data classification, and granular control of data are the requirements necessary to really deliver on this premise.

A hybrid cloud approach lets you push any data to the public cloud while also affording you the control that comes with on-premises storage. For any data storage system, granularity of control and management is extremely important: different data sets have different management requirements, and different SLAs must be applied as appropriate to the value of the data to an organization.

Cloudian HyperStore is a solution that gives you that flexibility, making it easy to move data between tiers 3 and 4 listed earlier in this post. Not only do you get the control and security of your own data center, you can also integrate HyperStore with many different destination cloud storage platforms, including Amazon S3/Glacier, Google Cloud Platform, and any other cloud service offering S3 API connectivity.

Learn more about our solutions today.

Learn more about NAS backup here.


Why Internet Unie Chose Cloudian for Hybrid Cloud Storage

Internet Unie, a service provider in the Netherlands, has recently deployed an innovative hybrid cloud service, combining Cloudian object storage in their data center together with Amazon S3 storage.

The new service allows their colocation customers to employ local S3 storage in their data center, with additional capacity available in the AWS public cloud.

Why would a service provider launch a service that employs another service provider (in this case Amazon)?

The answer is simple: it fills a real business need and gives Internet Unie a competitive advantage.

By offering their customers this hybrid service, Internet Unie addresses multiple requirements:

    • Performance: Local storage provides cloud-compatible capacity without the latency of a long network hop
    • Data governance: Locally stored data does not leave the data center
    • Capacity flexibility: Data can be tiered off to the cloud when desired, meaning capacity is always there
    • Disaster recovery: Backup information can be moved off site at any time
    • Cost: Locally stored information costs nothing to access, meaning that cloud storage invoices become far more predictable
    • Archival storage: Cloud archival services are very cost effective for rarely accessed information
    • Business simplicity: One invoice for both on-premises and cloud storage, thanks to the Amazon Marketplace metered-by-use program

Internet Unie summed it up this way:

“This hybrid service opens up enormous possibilities for those using the AWS service cloud offerings and need to store certain data types in a private cloud, for reasons such as data governance policies. With Cloudian’s new offering on AWS, our customers can point their applications to either cloud storage or on-premises storage, and it’s completely transparent,” said Arvid Cauwels, Sales Director at Internet Unie. “With AWS metering now available for Cloudian storage, customers get one AWS invoice for both their public and private cloud storage usage.”

Cloudian is a natural fit due to our native support for the Amazon S3 API, which makes it easy to tier between a Cloudian storage system in the Internet Unie data center and AWS cloud storage. Additionally, Cloudian supports AWS metering, which pulls all usage and billing (for both public and private cloud) into a single monthly AWS invoice.
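In practice, “pointing an application at either cloud” usually amounts to changing the endpoint, since both sides speak the same S3 API. Here is a minimal sketch, with endpoints, credentials, and bucket names as placeholders:

```python
# Sketch: identical application code targets public or private cloud
# purely by configuration, because both expose the S3 API.
import boto3

def make_client(target: str):
    if target == "aws":
        return boto3.client("s3")  # public cloud: default AWS endpoint
    return boto3.client(
        "s3",
        endpoint_url="https://s3.datacenter.example",  # on-premises S3 store
        aws_access_key_id="LOCAL_KEY",
        aws_secret_access_key="LOCAL_SECRET",
    )

for target in ("aws", "onprem"):
    client = make_client(target)
    # The calls are identical either way -- only the endpoint differs.
    client.put_object(Bucket="demo", Key="hello.txt", Body=b"hybrid cloud")
```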

Hybrid cloud represents a ‘best of both worlds’ solution, giving customers extra flexibility and control while providing limitless scalability. Read our blog post to learn more about why you should consider a hybrid cloud solution.


Cloudian and Thoughts About the Future of Storage

The enterprise storage industry is going through a massive transformation, and over the last several years I’ve had the good fortune of being on the front lines. As founding CEO of Nexenta, I helped that company disrupt the storage industry by creating and leading the open storage market. These days I’m having a blast as a senior advisor and investor at companies including Cloudian, which is taking off as a leader in what is typically called “object storage”.

In this blog I’d like to share what I’m seeing – across the IT industry – and why change is only accelerating.  The sources of this acceleration are much larger than any one technology vendor or, indeed, than the technology itself.

Let’s start at the top – the top of the stack, where developers and their users reside. From there we will dive into the details before summarizing the implications for the storage industry.

Software eats everything

What does “software eats everything” really mean?  To me it means that more than ever start-ups are successfully targeting entire industries and transforming them through technology-enabled “full stack” companies.  The canonical example is a few guys who thought about selling better software to taxi companies… and instead became Uber.

Look around and you’ll see multiple examples where software has consumed an industry. And today, Silicon Valley’s appetite is larger than it ever has been.

So why now?  Why is software eating everything?  A few reasons:

  1. Cloud and AWS – When I started Clarus back in the early 2000s, it cost us at least $7 million to get to what we now would call a minimum viable product.  These days, it costs perhaps 10% of that, largely thanks to the shift to the cloud.  Maybe more importantly, thanks to SaaS and AWS, many users now see that cloud-hosted software is often safer than on-premises software.
  2. SaaS and Cloud have enabled a profound trend: DevOps –  DevOps first emerged in technology companies that deliver software via the cloud.  Companies such as Netflix, Facebook, and GitHub achieve developer productivity that is 50-60x that of older non-DevOps approaches.  Highly automated end-to-end deployment and operations pipelines allow innovation to occur massively faster – with countless low risk changes being made and reverted as needed to meet end user needs.
  3. Pocket sized supercomputers – Let’s not forget that smartphones enable ubiquitous user interactions and also smart-sensing of the world – a trend that IoT only extends.
  4. Open source and a deep fear of lock-in – Open source now touches every piece of the technology stack. There are a variety of reasons for this including the role that open source plays as a way for developers to build new skills and relationships.  Another reason for the rise of open source is a desire to avoid lock-in.  Enterprises such as Bank of America and others are saying they simply will *not* be locked in again.
  5. Machine learning – Last but not least, we are seeing the emergence of software that teaches itself. For technology investors, this builds confidence since it implies a fundamental method of sustaining differentiation. Machine learning is turning out to be the killer app for big data. This has massive second-order effects that have yet to be fully considered. For example, how will the world change as weather prediction continues to improve? Or will self-driving cars finally lead to pedestrian-friendly suburban environments in the US?

Ok, so those are at least a few of the trends…let’s get more concrete now.  What does software eating everything – and scaring the heck out of corporate America wrestling with a whole new batch of competitors – mean for storage?

Macro trends drive new storage requirements

Let’s hit each trend quickly in turn.

1) Shift to AWS

By now you probably know that Cloudian is by far the most compliant Amazon S3 storage.  And this S3 compliance is not just about data path commands – it is also about the management experience, such as creating buckets.

What’s more, doubling down on this differentiation, Cloudian and Amazon recently announced a relationship whereby you can bill via Amazon for your on-premises Cloudian storage. In both cases Cloudian is the first solution with this level of integration and partnership.

2) DevOps

If you’re an enterprise doing DevOps, you should look at Cloudian. That’s because the automation that serves as the foundation for DevOps is greatly simplified by the API consistency that Cloudian delivers.

If your developers are on the front lines of responding to new full stack competitors, you don’t want them hacking together their own storage infrastructure. To deliver on the promise of “just like Amazon S3, on premise and hybrid”, Cloudian has to make distributed system management simple. This is insanely difficult.

In a recent A16Z podcast, Marc Andreessen commented that there are only a few dozen great distributed systems architects and operators in the world today.  If you already employ a few of them, and they have time on their hands, then maybe you should just grab Ceph and attempt to roll your own version of what Cloudian delivers.  Otherwise, you should be a Cloudian user.

3) Mobility

Architectures have changed with mobility in mind. User experience is now further abstracted from the underlying infrastructure.

In the old scale-up storage world, we worried a lot about IOPS for particular read/write workloads. But when RF, the wireless link, is your bottleneck, storage latency is less of a concern. Instead, you need easy-to-use, massively scalable, geographically dispersed systems like object storage, S3, and Cloudian.

4) Open source and a fear of lock-in

Enterprises want to minimize their lock-in to specific service providers. The emergence of a de-facto standard, Amazon S3, now allows providers and ISVs to compete on a level playing field. Google is one example: it now offers S3-compatible APIs on its storage service offerings. If your teams need to learn a new API or even a new set of GUIs to go with a new storage vendor, then you are gradually getting locked in.

5) Machine learning

Machine learning may be the killer app for big data. But there is one practical problem with training machine learning models: how do we get the compute to the data, rather than the other way around?

The data is big and hard to move. The compute is much more mobile. But even then, you typically require advanced schedulers at the compute layer – which is the focus of entire projects and companies.

The effectiveness of moving the compute to the data is improved if information about the data is widely available as metadata.  Employing metadata, however, leads to a new problem: it’s hard to store, serve, and index this metadata to make it useful at scale. It requires an architecture that is built to scale and to serve emerging use cases such as machine learning. Cloudian is literally years ahead of competitors and open source projects in this area.

For a real-world example, look no further than Cloudian’s work with advertising giant Dentsu to deliver customized ads to Tokyo drivers. Here, Cloudian demonstrates the kind of breakthrough applications that can be delivered, due in part to a rich metadata layer. Read more here, and see what is possible today with machine learning and IoT.

As I’ve written elsewhere, there is a lot to consider when investing in technology. You need companies that understand and can exploit relevant trends. But even more so, you need a great team. In Cloudian you’ve got a proven group that emphasizes product quality and customer success over big booths and 5-star parties.

Nonetheless, I thought it worth putting Cloudian’s accelerating growth into the context of five major themes.  I hope you found this useful.  I’d welcome any feedback in the comments below or via Twitter.  I’m at @epowell101 and I’ll try to catch comments aimed at @CloudianStorage as well.