The Right Way to Splunk with Cloudian

Amit Rawlani, Director of Solutions & Alliances, Cloudian

In late 2018, Splunk introduced a feature called SmartStore, which offers enhanced storage management functionality. SmartStore allows you to move warm data buckets to S3-compatible object stores such as Cloudian HyperStore, the industry’s most S3-compatible on-prem object storage platform. Moving the data from expensive indexer storage achieves many benefits, including:

  • Decoupling of storage and compute layers for independent scaling of those resources to best serve workload demands
  • Elastic scaling of compute on-demand for search and indexing workloads
  • Growing storage independently to accommodate retention requirements
  • Cost savings with more flexible storage options

However, migrating existing Splunk indexes to SmartStore is a one-time operation. It is also an intensive activity with no undo button. Based on extensive internal testing and validation efforts, the Cloudian team has developed best practices for ensuring a smooth deployment of Splunk SmartStore with Cloudian HyperStore. These range from provisioning all the way to safeguards to consider during the migration process itself. The guidelines below summarize these best practices.

Provisioning

For correct operation, the recommended deployment uses HyperStore nodes with network interfaces larger than the network interfaces on the Splunk indexers.


Tuning Splunk SmartStore

  • Concurrent upload setting: 8 threads (the default value)
  • Concurrent download setting: 18 threads (maximum tested: 24 threads)
  • For high ingest rates, bucket size should be set to 10GB (setting = Auto_high)
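As a rough sketch, these recommendations map onto Splunk's configuration files along the following lines. The stanza and setting names below are taken from Splunk's server.conf and indexes.conf documentation, not from this article, so verify them against your Splunk version before applying:

```ini
# server.conf — cache manager concurrency (assumed setting names;
# check your Splunk version's server.conf reference)
[cachemanager]
max_concurrent_uploads = 8       # default value
max_concurrent_downloads = 18    # maximum tested: 24

# indexes.conf — target ~10GB buckets for high ingest rates
[default]
maxDataSize = auto_high_volume
```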

Tuning Cloudian HyperStore

  • Consistency setting should be kept at Quorum; any other setting will have a performance impact
  • Link MTU should be set to 1500 to ensure optimal performance

Storage Sizing

Storage sizing guidelines should be followed. These will vary based on compression ratios, metadata size, and indexer volume sizing. Below is one specific guideline.

[Storage sizing guideline table]

Cache Sizing

Cache sizing is arguably the most important decision for proper SmartStore operation. The following parameters need to be considered when sizing the SmartStore cache:

  • Daily ingestion rate (I)
  • Search time span for the majority of your searches
  • Cache retention (C), e.g. 1 day, 10 days, 30 days, or more
  • Available disk space (D) on your indexers (assuming homogeneous disk space)
  • Replication factor (R) = 2
  • Minimum required cache size = I*R + (C-1)*I
  • Minimum required indexers = minimum required cache size / D, rounded up
  • Also factor in ingestion throughput requirements (~300GB/day/indexer) when determining the number of indexers
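The arithmetic above is simple enough to capture in a short helper. The sketch below is illustrative (the function name and structure are ours, not Splunk's or Cloudian's); it applies the cache formula and the ~300GB/day/indexer throughput rule from the bullets above:

```python
import math

# Approximate ingestion throughput per indexer, per the guideline above.
INGEST_PER_INDEXER_GB = 300

def smartstore_sizing(ingest_gb_per_day, cache_retention_days,
                      disk_per_indexer_gb, replication_factor=2):
    """Return (min required cache in GB, min number of indexers)."""
    # Min cache = I*R for the most recent day, plus (C-1)*I for the
    # remaining days of cache retention.
    min_cache_gb = (ingest_gb_per_day * replication_factor
                    + (cache_retention_days - 1) * ingest_gb_per_day)
    # Indexers needed to hold the cache, rounded up.
    indexers_for_cache = math.ceil(min_cache_gb / disk_per_indexer_gb)
    # Indexers needed to sustain the daily ingest rate.
    indexers_for_ingest = math.ceil(ingest_gb_per_day / INGEST_PER_INDEXER_GB)
    return min_cache_gb, max(indexers_for_cache, indexers_for_ingest)

# Example: 1TB/day ingest, 7-day cache, 2TB of disk per indexer
print(smartstore_sizing(1000, 7, 2000))   # → (8000, 4)
```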

Based on these parameters, the table below shows how to arrive at the recommended SmartStore cache sizing.

                         1TB/Day      1TB/Day       1TB/Day       10TB/Day      10TB/Day
                         7-Day Cache  10-Day Cache  30-Day Cache  10-Day Cache  30-Day Cache
Ingest/Day (GB)          1,000        1,000         1,000         10,000        10,000
Storage/Indexer (GB)     2,000        2,000         2,000         2,000         2,000
Cache Retention (days)   7            10            30            10            30
Replication Factor       2            2             2             2             2
Min Required Cache (GB)  8,000        11,000        31,000        110,000       310,000
Min Required #Indexers   4            6             16            55            155

Migration to SmartStore

As mentioned, the migration to SmartStore is a one-way street. For existing indexes, the following steps are recommended for a smooth migration:

  1. Setup SmartStore target S3 bucket on HyperStore.
  2. Upload a file to the S3 bucket via CMC or S3 client.
  3. Set up the Volumes on Splunk Indexers without setting RemotePath for Indexes.
  4. Push the changes with the new Splunk Volume to the Splunk Index cluster.
  5. Use the Splunk RFS command to validate each Indexer is able to connect to the volume.

/opt/splunk/bin/splunk cmd splunkd rfs -- ls --starts-with volume:NewVolume

  6. Once all indexers report connectivity, migration can begin.
  7. Set RemotePath for one index at a time. This makes index migrations easier to manage, and limits each indexer's workload compared with moving the warm buckets of every index at once.
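The volume and RemotePath settings referenced in the steps above live in indexes.conf. A minimal sketch follows; the volume name NewVolume is taken from the article's validation command, while the bucket name, endpoint, and credentials are placeholders, and the setting syntax follows Splunk's indexes.conf documentation (verify against your version):

```ini
# indexes.conf — remote volume definition (step 3: volume without
# RemotePath on any index yet); endpoint and bucket are illustrative
[volume:NewVolume]
storageType = remote
path = s3://splunk-smartstore
remote.s3.endpoint = https://s3.hyperstore.example.com
remote.s3.access_key = <access_key>
remote.s3.secret_key = <secret_key>

# Step 7: migrate one index at a time by adding remotePath to it
[main]
remotePath = volume:NewVolume/main
```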

For more information on how to deploy Splunk SmartStore with Cloudian HyperStore, see the detailed deployment guide here.

Splunk SmartStore Enables Scalable, Cost-Effective Data Lake


Splunk SmartStore and Cloudian together address two of the biggest challenges in Big Data: cost and scalability. Today’s larger data sets and longer retention requirements drive up the cost of storage, making it harder to accommodate the needed growth within the available budget, data center footprint, and staffing.

This is part of a series of articles about Splunk Architecture.

The Object Storage Solution

Splunk SmartStore solves these problems by providing a way for Splunk to leverage Cloudian HyperStore S3-compatible storage as a highly scalable object store for index data. This lets you grow Splunk’s indexer storage and compute resources in a cost-effective manner by scaling those resources separately.

Scalable Storage for Splunk Machine Data

With Splunk SmartStore, Splunk Indexers retain data only in hot buckets that contain newly indexed data. Older data resides in the warm buckets and is stored within the scalable and highly cost-effective Cloudian cluster. SmartStore manages where data is stored, moving data among indexers and the external Cloudian storage based on data age, priority and users’ access patterns.

Unlike conventional storage, Cloudian offers modular growth, letting you expand from terabytes to an exabyte without disruption. Embedded data redundancy features provide up to 14 nines data durability, removing the necessity of a separate data backup process. Compared with traditional enterprise storage — or with storage on compute-intensive servers — Cloudian saves up to 60% on TCO.

Cloudian saves on space, too, with the industry’s highest density: up to 840TB capacity in each 4U high chassis.

Need to know your current Splunk storage costs? Calculate them using this Splunk storage calculator.

Cloud Enabled

Cloudian is on-prem storage, but it integrates directly with public cloud storage services. Employ policy-based tools to replicate or tier data to AWS, GCP or Azure for offsite DR, capacity expansion or data analysis in the cloud. This built-in capability requires no additional software or licenses.

Secure

To ensure data security, Cloudian HyperStore provides AES-256 server-side encryption for data at rest, SSL for data in transit (HTTPS), role-based access controls, and storage policies that can be applied at the object and bucket level.

Multi-Purpose Scalable Storage

Cloudian offers the industry’s most compatible S3 API, so it integrates seamlessly with S3-compatible applications. In addition to Splunk SmartStore, it also provides scalable storage for data management applications from Rubrik, Veeam, Commvault, Veritas, Pure Storage, Quantum, Komprise, and others.

Deploy as Software or Appliances

Cloudian is available as preconfigured appliances, with capacities from 96TB to 840TB, and as software for either bare-metal servers or VMs.

Download the Splunk TCO Report.

Read more about Cloudian here, or contact us for more information.

 
