Anatomy of a Splunk Data Model

Splunk is a scalable system for indexing and searching log files. To make indexed data easier to search and report on without writing searches by hand, you can define a data model. A Splunk data model is a hierarchy of datasets that defines the structure of your data. Your data model should reflect the base structure of your data and the Pivot reports your end users need.

Read on to understand how Splunk data models and datasets work, how to define a data model using the Splunk editor, and important best practices for efficient data model design.

This is part of a series of articles about Splunk Architecture.

What Is a Splunk Data Model?

A data model is a structured, hierarchical mapping of semantic knowledge of a collection of datasets. It outlines the details necessary to enable searches of dataset information. Within a data model, datasets are typically arranged categorically into parent and child datasets. This ordering makes it easier for users to search specific parts of a dataset.
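The parent and child arrangement can be sketched in a few lines of Python. This is only an illustration, not Splunk's implementation: the `Dataset` class, the dataset names, and the constraint strings are all hypothetical. It assumes that a child dataset narrows its parent by adding its own constraint and fields on top of the ones it inherits.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    """Hypothetical stand-in for one dataset in a data model."""
    name: str
    constraint: str                       # filter that narrows the parent's events
    parent: Optional["Dataset"] = None
    fields: List[str] = field(default_factory=list)

    def effective_constraint(self) -> str:
        """Combine this dataset's constraint with every ancestor's."""
        parts = []
        node: Optional["Dataset"] = self
        while node is not None:
            parts.append(f"({node.constraint})")
            node = node.parent
        # Root constraint first, most specific constraint last.
        return " ".join(reversed(parts))

    def effective_fields(self) -> List[str]:
        """Fields inherited from ancestors, followed by this dataset's own."""
        inherited = self.parent.effective_fields() if self.parent else []
        return inherited + self.fields


# Illustrative names only: a root dataset and one child that narrows it.
web = Dataset("Web", "sourcetype=access_combined", fields=["status", "uri_path"])
errors = Dataset("Web_Errors", "status>=400", parent=web, fields=["error_class"])

print(errors.effective_constraint())
print(errors.effective_fields())
```

Searching the child dataset therefore covers a subset of the parent's events, which is what makes the hierarchy useful for targeting specific parts of the data.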

In Splunk, data models, and the searches they enable, are used to generate Pivot reports for users. Pivot reports are visualizations, tables, or charts that display the results of a dataset search. Pivot is also the name of the tool used to create these reports in Splunk.

In Pivot, users select the data model they want to use according to the data they want to work with. Within that model, they select the dataset specific to the data they want to report on. Once these parameters are entered, users can create charts, statistical tables, and visualizations according to selected row and column configurations.
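Conceptually, a row and column configuration is a cross-tabulation of events. The sketch below shows that idea in plain Python. It is not how Pivot is implemented, and the `pivot_count` function and the sample `host` and `status` fields are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def pivot_count(
    events: Iterable[Mapping[str, str]], row: str, column: str
) -> Dict[str, Dict[str, int]]:
    """Count events for each (row value, column value) pair."""
    counts = Counter((event[row], event[column]) for event in events)
    table: Dict[str, Dict[str, int]] = {}
    for (row_value, column_value), n in counts.items():
        table.setdefault(row_value, {})[column_value] = n
    return table


# Hypothetical events from a dataset, already reduced to two fields.
events = [
    {"host": "web01", "status": "200"},
    {"host": "web01", "status": "404"},
    {"host": "web02", "status": "200"},
    {"host": "web01", "status": "200"},
]

table = pivot_count(events, row="host", column="status")
for host, by_status in table.items():
    print(host, by_status)
```

Swapping the `row` and `column` arguments produces the transposed report, which mirrors how changing the row and column selections changes the resulting table.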

Creating a Data Model

Creating data models requires understanding your data sources and semantics. Within your model, you need to apply this understanding when defining how datasets are ordered.

The architecture of your Splunk data model should be determined by your data types and sources. For example, if your data comes from system logs, you may need several root datasets, such as root datasets for searches, events, or transactions. By contrast, if your data comes from a table-based format, such as a CSV file, you can create a flat model with a single root dataset. Within each root dataset, you can create child datasets as needed.

For more information about Splunk for big data, check out our guide: Splunk Big Data: a Beginner’s Guide

Splunk Datasets

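Understanding data models requires understanding the datasets that compose the model. A data model can contain root event datasets, root search datasets, and root transaction datasets. Each root dataset can have child datasets, which inherit the constraints and fields of their parent and add their own.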

How to Create a Data Model

Once you are ready to create a Splunk data model, the process is relatively straightforward. Before you begin, make sure you have the correct permissions: your role must have write access to at least one app. Without this permission, you will not see the New Data Model button.

To create a new data model:

  1. On the Data Models management page, select New Data Model.
  2. Enter a title for your model.
  3. If desired, enter a model description or change the app value.
  4. Once your information is entered, click Create. This opens the model in the editor, where you can add and define datasets.
  5. Click Add Dataset to begin. You can then select your dataset type and begin adding constraints and fields.

Tips for Data Model Design

Designing effective Splunk data models can be a challenge and may take multiple attempts to get right. When creating and refining your models, consider the following tips. These can help you get started faster and can speed the refinement process.

To ensure billing efficiency, you should estimate your storage needs. There is no automatic or official Splunk storage calculator, but you can run some calculations. For more information, check out our guide: Splunk Storage Calculator: Learn to Estimate Your Storage Costs.
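As a rough illustration of such a calculation, the sketch below multiplies daily ingest by retention and by an assumed on-disk ratio. The default ratios (compressed raw data at about 15% of ingested volume, index files at about 35%) are a commonly quoted rule of thumb, not figures from this article, and real ratios vary widely by data type. The function name and parameters are hypothetical.

```python
def estimate_storage_gb(
    daily_ingest_gb: float,
    retention_days: int,
    compression_ratio: float = 0.15,     # assumed: compressed raw data / ingested data
    index_overhead_ratio: float = 0.35,  # assumed: index files / ingested data
) -> float:
    """Rough disk estimate for one copy of the data; not an official formula."""
    if daily_ingest_gb < 0 or retention_days < 0:
        raise ValueError("ingest volume and retention must be non-negative")
    on_disk_per_day = daily_ingest_gb * (compression_ratio + index_overhead_ratio)
    return on_disk_per_day * retention_days


# Example: 100 GB/day kept for 90 days under the assumed ratios.
estimate = estimate_storage_gb(daily_ingest_gb=100, retention_days=90)
print(f"{estimate:.0f} GB")
```

Measuring the actual on-disk size of a sample of your own indexed data and substituting those ratios should give a more trustworthy estimate than the defaults.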

Reduce the Cost of Splunk Storage and Increase Scalability with Cloudian HyperStore

If you are managing large volumes of data in Splunk, you can use Splunk SmartStore and Cloudian HyperStore to create an on-premises storage pool. The storage pool is separate from the Splunk indexers and can scale to data stores that reach exabytes.

 

Read the 7 benefits of using Cloudian Object Storage with Splunk SmartStore and how you can reduce storage complexity and save costs.

 

 
