Splunk provides big data solutions for cloud, on-premises, and hybrid environments. Splunk management capabilities include data collection, querying, indexing, and visualization. To help you prioritize data backup, Splunk architecture categorizes data according to lifecycle stages. The result is a system that includes hot, warm, cold, and frozen buckets.
There are two primary strategies for protecting your data. You can back up Splunk index data to an on-premises storage device using the Splunk data lifecycle stages, or you can use the SmartStore indexer capability to back up data to cloud storage such as Amazon S3 or to local S3-compatible storage devices.
In this article you will learn:
- What is Splunk
- Backing up Splunk index data on-premises
- Using SmartStore indexer for backup
- Backup configuration for Splunk deployment
This is part of a series of articles about Splunk Architecture.
What is Splunk?
Splunk is a highly scalable distributed system that indexes and searches log files. It can collect huge volumes of log data, parse it, and analyze it to provide operational intelligence. Splunk's main benefit is that it does not require external databases or data management. It uses its own indexes and distributed storage clusters and can handle log data at any scale.
Backing up Splunk Index Data On-Premises
Like any enterprise system, Splunk must be supported by a data backup plan. However, you will probably not need to back up all Splunk data, because much of it may have low value. For this reason, Splunk provides a system for transitioning data between four types of storage buckets, representing different stages in the data lifecycle.
Splunk Data Lifecycle Stages: Hot, Warm, Cold, and Frozen Buckets
Splunk indexed data is located in database directories, divided into subdirectories called buckets. As time goes by, Splunk performs storage tiering, moving data through several types of buckets, which represent four tiers—hot, warm, cold and frozen.
Here is the simplified process, but note that Splunk allows you to customize almost every aspect of the data lifecycle, so your process may be different:
- When the indexer indexes data for the first time, it stores it in a “hot” bucket. Hot buckets cannot be backed up because the indexer is actively writing to them, but snapshots are allowed.
- You can define a policy for data to be moved from the hot bucket to a “warm” bucket, which the indexer does not actively write to. This is called “rolling” data to a warm bucket. The policy can be based on the hot bucket's size or age, and a roll also occurs when splunkd is restarted. It is safe to back up warm buckets.
- Splunk configuration defines several indexing limits. When the system hits a limit, the oldest warm bucket becomes a cold bucket. The indexer then moves the bucket to the colddb directory. Splunk sets the default number of warm buckets to 300; when a 301st warm bucket appears, the oldest warm bucket is automatically rolled to cold.
- At a time based on your policy, a cold bucket transitions to “frozen”. The indexer then deletes the frozen bucket, but you can choose to preserve the data by configuring the indexer to move it to a data archive. To learn more about archiving in Splunk, see the official documentation.
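To summarize, your backup strategy should focus primarily on warm buckets. Cold buckets may also be backed up in some circumstances. Hot buckets cannot be backed up directly while the indexer is writing to them (use a snapshot if you must capture them), and frozen buckets are deleted unless you configure archiving.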
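The lifecycle described above can be sketched as a small toy model. This is only an illustration of the tiering logic, not how the indexer is implemented: the class name, method names, and the integer bucket IDs are all hypothetical, and the only value taken from the text is the default warm-bucket limit of 300.

```python
from collections import deque

MAX_WARM_BUCKETS = 300  # default warm-bucket limit cited in the text above


class IndexLifecycle:
    """Toy model of bucket tiering: hot -> warm -> cold -> frozen."""

    def __init__(self, max_warm=MAX_WARM_BUCKETS, archive_frozen=False):
        self.max_warm = max_warm
        self.archive_frozen = archive_frozen
        self.hot = []        # buckets the indexer is still writing to
        self.warm = deque()  # oldest warm bucket sits at the left
        self.cold = deque()  # oldest cold bucket sits at the left
        self.archive = []    # frozen buckets, kept only if archiving is on

    def index_new_bucket(self, bucket_id):
        self.hot.append(bucket_id)

    def roll_hot_to_warm(self, bucket_id):
        self.hot.remove(bucket_id)
        self.warm.append(bucket_id)
        # Once the warm limit is exceeded, the oldest warm bucket rolls to cold.
        while len(self.warm) > self.max_warm:
            self.cold.append(self.warm.popleft())

    def freeze_oldest_cold(self):
        bucket_id = self.cold.popleft()
        if self.archive_frozen:
            self.archive.append(bucket_id)  # otherwise the data is dropped
        return bucket_id

    def backup_candidates(self):
        """Warm buckets first, then cold; hot buckets are excluded."""
        return list(self.warm) + list(self.cold)
```

With `max_warm=2`, rolling three buckets to warm leaves the two newest in the warm tier and pushes the oldest to cold, which mirrors what happens at the 300-bucket default.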
On-Premises Splunk Backup Strategies
There are two ways to perform a backup of a Splunk bucket:
Incremental backups
Splunk recommends scheduling regular backups of any new warm buckets, using a third-party incremental backup utility. If your policy rolls buckets frequently, also include the colddb directory in your backup schedule, to ensure you don’t miss any buckets that rolled from warm to cold before they were backed up.
If there is a need to back up hot buckets, take a snapshot of their files, using a tool like Windows VSS or ZFS snapshots. You can also manually roll a hot bucket to a warm bucket and then configure the backup, but this is not recommended.
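As a rough illustration of the incremental approach, the sketch below copies only rolled buckets that are not yet present in the backup target. It assumes that rolled (warm and cold) bucket directories have names starting with `db_` and that hot bucket directories start with `hot_`; confirm this against your own index directories before relying on it. The function names and the directory layout of the backup target are hypothetical, and a dedicated backup utility remains the recommended tool.

```python
import shutil
from pathlib import Path


def is_rolled_bucket(path: Path) -> bool:
    # Assumption: warm/cold buckets are directories named db_<...>,
    # while hot buckets are named hot_<...>. Verify on your deployment.
    return path.is_dir() and path.name.startswith("db_")


def backup_new_buckets(index_dirs, backup_root):
    """Copy rolled buckets that are not yet present under backup_root."""
    backup_root = Path(backup_root)
    copied = []
    for index_dir in map(Path, index_dirs):
        if not index_dir.is_dir():
            continue
        for bucket in sorted(index_dir.iterdir()):
            if not is_rolled_bucket(bucket):
                continue  # skip hot buckets and unrelated files
            target = backup_root / index_dir.name / bucket.name
            if target.exists():
                continue  # already backed up on an earlier run
            shutil.copytree(bucket, target)
            copied.append(target)
    return copied
```

Passing both the warm (`db`) and `colddb` directories as `index_dirs` covers buckets that rolled to cold between two runs.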
Back up all data
Splunk strongly recommends backing up all data, including hot, warm, and cold buckets, before you upgrade the indexer. There are several ways to do this, depending on the size of your dataset and how much downtime is acceptable for your Splunk deployment.
- For smaller datasets, shut down the indexer and simply make a copy of your database directories before performing the upgrade.
- For larger datasets, create snapshots of your hot buckets and back them up before you upgrade.
If you already have incremental backups of warm buckets, you only need to worry about hot buckets when you perform an indexer upgrade.
Backing Up Configuration for Your Splunk Deployment
Follow these best practices to safely back up your Splunk deployment.
Back up Splunk configuration files
It is important to ensure you have regular backups of your Splunk configuration files, including saved searches, user accounts, tags, and custom sources. Splunk data buckets will not be useful without your custom configuration. Configuration files are stored in the $SPLUNK_HOME/etc/ directory and its subdirectories.
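A minimal sketch of a configuration backup, assuming the configuration lives in an `etc` directory under the Splunk home directory as described above. The function name, the archive naming scheme, and the `now` parameter (used only to make the timestamp predictable) are hypothetical choices for illustration.

```python
import tarfile
import time
from pathlib import Path


def backup_config(splunk_home, backup_dir, now=None):
    """Write <backup_dir>/splunk-etc-<UTC timestamp>.tar.gz containing etc/."""
    etc_dir = Path(splunk_home) / "etc"
    if not etc_dir.is_dir():
        raise FileNotFoundError(f"no etc directory under {splunk_home}")
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # now=None means "current time"; pass epoch seconds to override it.
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime(now))
    archive = backup_dir / f"splunk-etc-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(etc_dir, arcname="etc")  # store paths relative to etc/
    return archive
```

Because the timestamp is part of the file name, repeated runs produce separate archives rather than overwriting the previous one.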
Back up to a remote location
To ensure durability in case of complete site failure, copy your configuration to a remote location. If this is not possible, at least back up the files to a different part of your data center, a different machine, or a different physical disk, to reduce single points of failure.
Back up single points of failure
If your Splunk deployment has one indexer, one search head, or critical utility resources like a deployment server, license server or master node, ensure they are backed up. Test your restore procedure to ensure you can quickly restore the system in case of disaster.
Back up at least one search head cluster (SHC)
Periodically back up the state of your SHC, to ensure you can re-establish knowledge items in their current state in case of disaster.
Use version control
It’s extremely important to be able to save multiple versions of your configuration and other data, so you can roll back to a specific previous version if needed. There are three ways to do this:
- Scripted input that generates a diagnostic file, and cleans old copies to prevent filling up the file system on the target system.
- Scripted input that checks backups into a source control system, such as Git.
- A custom solution using source control with controlled recovery.
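The "cleans old copies" part of the first option can be sketched as a simple retention rule that keeps only the newest archives. This is an illustration only: the function name, the default of five retained copies, and the `splunk-etc-*.tar.gz` naming pattern are assumptions, and the sketch relies on timestamped file names sorting in chronological order.

```python
from pathlib import Path


def prune_old_backups(backup_dir, keep=5, pattern="splunk-etc-*.tar.gz"):
    """Delete all but the `keep` newest archives matching `pattern`."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Assumption: names embed a sortable timestamp, so sorting by name
    # orders the archives from oldest to newest.
    archives = sorted(Path(backup_dir).glob(pattern))
    stale = archives[:-keep]
    for path in stale:
        path.unlink()
    return stale
```

Running this after each backup keeps the target file system from filling up while still leaving several versions to roll back to.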
Using SmartStore Indexer to Back Up Splunk Data Buckets to Cloud Storage
Splunk SmartStore is an indexer capability that lets you use remote object stores to store indexed data. This includes Amazon S3, other cloud services that support the S3 API, and on-premise S3-compatible storage devices like Cloudian’s private cloud storage.
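To make this more concrete, the sketch below renders the kind of indexes.conf stanzas that point indexes at a remote S3-compatible volume. Treat it as a hedged illustration: the setting names (`storageType`, `path`, `remote.s3.endpoint`, `remotePath`) are given as recalled from Splunk's SmartStore documentation and should be checked against the indexes.conf reference for your version. The function name, volume name, bucket path, and endpoint are hypothetical, and credentials are deliberately left out.

```python
def render_smartstore_conf(volume, bucket_path, endpoint):
    """Return indexes.conf text that points all indexes at a remote volume."""
    lines = [
        f"[volume:{volume}]",
        "storageType = remote",
        f"path = s3://{bucket_path}",
        f"remote.s3.endpoint = {endpoint}",
        "",
        "[default]",
        # $_index_name is left literal so each index gets its own prefix.
        f"remotePath = volume:{volume}/$_index_name",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical values; replace with your own object store details.
    print(render_smartstore_conf(
        "remote_store", "example-bucket/splunk", "https://s3.example.com"
    ))
```

The same stanzas work for Amazon S3 and for on-premises S3-compatible stores; only the endpoint and bucket path change.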
SmartStore has several advantages compared to traditional on-premises Splunk backup, as described above:
- Rapid recovery from peer failure and fast data rebalancing, because recovering warm data requires only metadata fixups.
- Lower local storage requirements, since the system only needs to keep a single permanent copy of each warm bucket.
- Higher reliability, ensuring complete recovery even in case of site failure or large local failure affecting a large number of peer nodes.
- Indexer upgrades are easier to perform since data is continuously backed up to the cloud.
The SmartStore indexer is especially useful for enterprise Splunk operations that process massive amounts of data. Splunk’s SmartStore provides enterprises with more control and scaling options.
Reduce Splunk TCO by 60% and Increase Storage Scalability with Cloudian HyperStore
- Control—Growth is modular, and you can independently scale resources according to present needs.
- Security—Cloudian HyperStore supports AES-256 server-side encryption for data at rest and SSL for data in transit (HTTPS). Cloudian also protects your data with fine-grained storage policies, secure shell, integrated firewall and RBAC/IAM access controls.
- Multi-tenancy—Cloudian HyperStore is a multi-tenant storage system that isolates storage using local and remote authentication methods. Admins can leverage granular access control and audit logging to manage the operation, and even control quality of service (QoS).
You can find more information about the cooperation between Splunk and Cloudian here.