Modern businesses constantly create and modify data, much of which is used only briefly but must be retained for compliance reasons or for historical analysis. When data must be kept long term, archiving it can save costs and resources. This article clarifies your options so you can build an effective archive strategy.
What Is a Storage Archive?
A storage archive is used to preserve data that is rarely, if ever, accessed, often for long periods of time. It costs less than regular primary storage and is frequently used for compliance or auditing data, log data, historical data, and data generated by retired applications.
Types of Data Archives
There are three main types of data archives:
Governance Data Archive
Governance archives are designed in response to regulatory and audit requirements and typically fall under records management, risk management, or compliance readiness. They primarily contain communications data, such as emails or instant messages, but can also include documents, images, websites, or social media content. These archives must be easily searchable, and their data must be quickly retrievable in case of eDiscovery or an audit.
Active Data Archive
Active archives suit data that is accessed infrequently but still needs to be available. The data they hold usually isn't read-write intensive and is often static, which allows the use of lower-performance media such as tape. Active solutions tend to be user-centric and sometimes include software that simplifies searching for and retrieving records. Data in an active archive is often replicated to other archive systems as well.
Cold Data Archive
Cold data archives are useful for data that is rarely or never accessed, such as backups or data from legacy applications, with the goal of storing it as cheaply as possible. These archives typically have very slow retrieval times and no integrated user access. Those limitations can become a liability during eDiscovery or an audit, and they often lead organizations to spend more money developing or buying a UI to simplify use.
Storage Archive Media
To find the solution that best suits your needs, weigh the benefits and drawbacks of each available medium and choose accordingly. Many strategies combine multiple media types to accommodate user needs and data priority.
Tape Storage
Tape is an inexpensive, reliable medium with a long history of use. Because tapes are kept offline, they are especially useful for protecting data from cyber threats and malware.
Pros:
- Large storage capacity at good transfer speeds
- Compact to store, with a long shelf life
- Reliable error detection and correction, with built-in read-after-write verification
- Backward compatibility with earlier drive generations (LTO drives have historically read tapes up to two generations back)
Cons:
- Sequential access makes retrieval and searching slower
- Requires a dedicated tape drive or tape library to read or write data
- Prone to wear with use and sensitive to environmental conditions
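Tape drives perform read-after-write verification in hardware, reading each block back right after it is written. The Python sketch below approximates the same idea in software for any file-based target: it writes the data, reads it back sequentially, and compares SHA-256 digests. The function name `write_and_verify` is hypothetical, and on systems with a page cache the read-back may be served from memory rather than from the device, so treat this as an illustration of the principle, not a substitute for the drive's own verification.

```python
import hashlib
import os


class VerificationError(Exception):
    """Raised when data read back does not match what was written."""


def write_and_verify(path: str, data: bytes, chunk_size: int = 64 * 1024) -> str:
    """Write data, read it back, and compare SHA-256 digests.

    Returns the hex digest on success; raises VerificationError on mismatch.
    """
    expected = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the bytes to the device

    actual = hashlib.sha256()
    with open(path, "rb") as f:
        # Read back sequentially in chunks, the way a tape is read.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            actual.update(chunk)

    if actual.hexdigest() != expected:
        raise VerificationError(f"{path}: checksum mismatch after write")
    return expected
```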
Optical Media Storage
Write-once optical discs, such as CD-R and DVD-R discs, are a form of write once, read many (WORM) storage. They are useful when you need highly portable storage that cannot be overwritten.
Pros:
- Very long shelf life
- Less vulnerable to wear, and the discs themselves have no moving parts that can fail
- Compact size makes discs highly portable
Cons:
- Low storage capacity
- Slow read and write performance
- Requires an optical drive to read data, and a drive with recording capability to write it
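Write once, read many semantics can also be enforced in software. This minimal Python sketch uses a hypothetical in-memory `WriteOnceStore` class that accepts each key exactly once and rejects any later attempt to overwrite it. Object storage platforms offer comparable guarantees through features such as Amazon S3 Object Lock; this toy version only shows the rule itself.

```python
class WriteOnceStore:
    """Toy WORM store: each key can be written once and then only read."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        if key in self._objects:
            # Overwrites are refused, just as a burned disc cannot be re-recorded.
            raise PermissionError(f"{key!r} is write-once and already exists")
        self._objects[key] = bytes(data)  # keep an immutable copy

    def read(self, key: str) -> bytes:
        return self._objects[key]
```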
Disk Storage
Disk storage offers a good storage-to-cost ratio and can include features for local and remote replication, data deduplication, and faster search.
Pros:
- Random access allows faster reads and writes
- Redundant RAID levels protect against data loss when a single disk fails
- Can be paired with indexing engines for faster searching
Cons:
- Expensive to purchase, maintain, house, and upgrade
- Relatively short lifespan and high failure rate
- Energy-intensive operation requires environmental controls such as cooling and air filtering
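The RAID protection mentioned above relies, for single-parity levels such as RAID 5, on an XOR parity block per stripe: if any one data block is lost, XOR-ing the surviving blocks with the parity reproduces it. The Python sketch below shows that arithmetic for a single stripe. The function names are illustrative; real controllers also rotate parity across disks and work with much larger blocks. Single parity survives only one disk failure at a time, which is why RAID 6 adds a second parity block.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """Parity block for one stripe, as in single-parity RAID."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Recover the one lost data block from the survivors plus the parity."""
    return xor_blocks(surviving_blocks + [parity])
```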
Removable Disk Storage
Removable disk storage, such as thumb drives or external hard drives, is used primarily by individuals and small to medium-sized businesses because it trades capacity for portability.
Pros:
- Random access allows faster reads and writes
- Available in multi-disk units
- Portable and allows offline storage
Cons:
- Poor cost-to-capacity ratio
- Requires manual media handling, which increases the risk of damage
Cloud Storage
Cloud storage is a good option for businesses of all sizes, particularly those that operate in a decentralized fashion. Because the data is stored remotely, it is easier to access globally and is protected from localized disasters.
Pros:
- Reduced costs, since you don't need to purchase, house, or maintain equipment
- Highly flexible, with good scalability and application integration capabilities
- Data is remotely accessible, and providers typically offer built-in encryption
Cons:
- Requires network or internet access
- Requires specialized software to transfer and access data
- Reliance on a single provider can create vendor lock-in
Features of a Good Archiving Solution
If an archiving solution lacks certain key features, the time and effort required to use it can outweigh its benefits.
Solutions must include efficient search capabilities. You should be able to search for data by type (document, PDF, email, etc.), source (server, application, device, etc.), author, and patterns in the content itself (Social Security numbers, bank routing numbers, credit card numbers, etc.).
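Searching by the structure of the content usually means pattern matching plus validation. The Python sketch below, with hypothetical names and deliberately simplified regular expressions, flags U.S. Social Security numbers by format, credit card numbers by format plus the Luhn checksum, and nine-digit bank routing numbers by the ABA checksum. Treat it as an illustration of the approach; production classifiers use stricter patterns, surrounding context, and an index rather than rescanning text on every query.

```python
import re

# Simplified, illustrative patterns; real classifiers are stricter.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")    # e.g. 123-45-6789
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")  # 13-19 digits, optional separators
ROUTING_RE = re.compile(r"\b\d{9}\b")            # nine-digit ABA routing number


def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return 13 <= len(digits) <= 19 and total % 10 == 0


def routing_valid(number: str) -> bool:
    """ABA checksum: weights 3, 7, 1 repeated; the weighted sum must divide by 10."""
    weights = (3, 7, 1) * 3
    return sum(w * int(c) for w, c in zip(weights, number)) % 10 == 0


def classify(text: str) -> dict[str, list[str]]:
    """Return candidate sensitive values found in `text`, grouped by type."""
    found: dict[str, list[str]] = {"ssn": SSN_RE.findall(text), "card": [], "routing": []}
    for m in CARD_RE.finditer(text):
        candidate = m.group().strip(" -")
        if luhn_valid(candidate):
            found["card"].append(candidate)
    for m in ROUTING_RE.finditer(text):
        if routing_valid(m.group()):
            found["routing"].append(m.group())
    return found
```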
Audit tracking features are essential: solutions should provide audit trails that record who accessed data, when they accessed it, and what specifically was accessed.
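One way to make an audit trail tamper-evident is to chain entries by hash, so that each record includes the hash of the one before it. The sketch below assumes a hypothetical in-memory `AuditLog` class that records who performed which action on which object and when, and can verify the chain afterward. Editing or removing any entry other than the most recent one breaks verification; protecting the newest entry requires storing the latest hash somewhere outside the log.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class AuditLog:
    """Append-only, hash-chained audit trail of who accessed what, and when."""
    entries: list = field(default_factory=list)

    def record(self, user: str, action: str, object_key: str,
               when: Optional[datetime] = None) -> dict:
        body = {
            "user": user,
            "action": action,
            "object": object_key,
            "time": (when or datetime.now(timezone.utc)).isoformat(),
            "prev": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        # The hash covers this entry's fields plus the previous entry's hash.
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check each entry points at its predecessor."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def accesses_by(self, user: str) -> list:
        return [e for e in self.entries if e["user"] == user]
```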
Data deduplication is key to keeping archives small and costs low. Deduplication stores each unique piece of data only once and replaces repeated copies with references to the stored original, so only new or changed data consumes additional space. It can operate at the file, block, or byte level; finer granularity removes more redundancy, at the cost of more processing and larger indexes.
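Block-level deduplication can be sketched in a few lines of Python. This hypothetical `DedupStore` splits data into fixed-size blocks, keys each block by its SHA-256 digest, stores each unique block once, and keeps an ordered list of digests per file so the file can be reassembled. Fixed-size chunking is the simplest variant; because an insertion shifts every later block boundary, many products use content-defined (variable-size) chunking instead.

```python
import hashlib


class DedupStore:
    """Fixed-size block deduplication: each unique block is stored once."""

    def __init__(self, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}     # digest -> block contents, stored once
        self.files: dict[str, list[str]] = {}  # file name -> ordered block digests

    def put(self, name: str, data: bytes) -> None:
        recipe = []
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # duplicates become references
            recipe.append(digest)
        self.files[name] = recipe

    def get(self, name: str) -> bytes:
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())
```

Storing `b"AAAABBBBCCCC"` and `b"AAAABBBBDDDD"` with a 4-byte block size keeps 16 bytes instead of 24, because the shared `AAAA` and `BBBB` blocks are stored only once.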
Good solutions are flexible and prevent media or vendor lock-in. They allow multiple data platforms to be used for both writing and retrieving data, making it easier to change or update systems as needed. They should also handle multiple data types, from application logs to archives of social networking sites.
Automation is vital to reduce the time spent creating, auditing, and modifying archives. Good solutions let you define policies that schedule when data is archived, govern its lifecycle, and manage access permissions. They should also log these processes and raise alerts when a write fails.
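Archiving policies typically boil down to rules that map an object's location and age to an action. The Python sketch below uses a hypothetical `Rule` type keyed by name prefix, returns `keep`, `archive`, or `delete` for each object, and applies archive actions through a caller-supplied function, logging and collecting any write failures so they can drive alerts. The rule format is an assumption for illustration, not the policy syntax of any particular product.

```python
import logging
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Optional

logger = logging.getLogger("archive-policy")


@dataclass
class Rule:
    """Hypothetical rule that applies to object keys starting with `prefix`."""
    prefix: str
    archive_after_days: Optional[int] = None
    delete_after_days: Optional[int] = None


def action_for(key: str, created: date, today: date, rules: List[Rule]) -> str:
    """Return 'delete', 'archive', or 'keep' using the first rule whose prefix matches."""
    age = (today - created).days
    for rule in rules:
        if not key.startswith(rule.prefix):
            continue
        if rule.delete_after_days is not None and age >= rule.delete_after_days:
            return "delete"
        if rule.archive_after_days is not None and age >= rule.archive_after_days:
            return "archive"
        return "keep"
    return "keep"


def run_policies(objects: Dict[str, date], today: date, rules: List[Rule],
                 archive: Callable[[str], None]) -> List[str]:
    """Archive objects whose rule says so; log and return keys whose write failed."""
    failed = []
    for key, created in objects.items():
        if action_for(key, created, today, rules) != "archive":
            continue
        try:
            archive(key)
        except OSError as exc:
            logger.error("archive write failed for %s: %s", key, exc)  # alert hook
            failed.append(key)
    return failed
```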
Archiving with Cloudian
Over time, you will likely accumulate data that still holds value for your business but doesn't need to be available instantly. Archiving this data keeps it safe without consuming expensive resources. The variety of archive options lets you build a solution that suits your needs, and choosing strategically now can simplify archiving and retrieval later.
You can simplify archiving with solutions like Cloudian HyperStore, an on-premises object storage platform available as an appliance or as software. It is scalable and can be integrated with cloud and third-party migration services, so it can adapt to your needs.
HyperStore is fully S3 API compliant and includes automatic data verification and encryption. It lets you tag data with custom metadata for intelligent search and analytics, and manage stored data with bucket-level policies that determine replication schedules and lifecycle timing. You can also create policies that set erasure coding and replication according to data type. HyperStore can help you store data securely and efficiently while keeping it accessible to your broader storage systems.
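Because HyperStore exposes the S3 API, standard S3 requests should in principle work against it. The Python sketch below assumes that: it builds a lifecycle configuration in the S3 `PutBucketLifecycleConfiguration` format that expires objects under a prefix, and stores an object with custom metadata and tags through `put_object`. Both helpers take any S3 client object, such as one created with boto3. The helper names, bucket names, prefixes, and tag values are hypothetical, and you should confirm which lifecycle actions your HyperStore version supports.

```python
from urllib.parse import urlencode


def build_lifecycle_config(prefix: str, expire_after_days: int, rule_id: str) -> dict:
    """S3 lifecycle configuration that expires objects under `prefix`."""
    if expire_after_days < 1:
        raise ValueError("expiration must be at least one day")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


def apply_lifecycle(s3_client, bucket: str, config: dict) -> None:
    """Send the configuration with the standard S3 lifecycle call."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


def archive_object(s3_client, bucket: str, key: str, body: bytes,
                   metadata: dict, tags: dict) -> None:
    """Upload an object with custom metadata and URL-encoded tags."""
    s3_client.put_object(
        Bucket=bucket, Key=key, Body=body,
        Metadata=metadata,       # stored as x-amz-meta-* headers
        Tagging=urlencode(tags), # e.g. "retention=7y&dept=ops"
    )
```

With boto3, the client would be created roughly as `boto3.client("s3", endpoint_url="https://hyperstore.example.com")`, where the endpoint URL is a placeholder for your own deployment, and then passed to `apply_lifecycle` and `archive_object`.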