Anatomy of a Splunk Data Model
Splunk is a scalable system for indexing and searching log files. Splunk can index and search your data without a data model, but to make that data easy to explore in Pivot, you need to define one. A Splunk data model is a hierarchy of datasets that defines the structure of your data. Your data model should reflect the underlying structure of your data and the Pivot reports your end users require.
Read on to understand how Splunk data models and datasets work, how to define a data model using the Splunk editor, and best practices for efficient data model design.
In this article, you will learn:
- What is a Splunk data model
- Understanding Splunk datasets
- How to create a data model
- Tips for model design
- Splunk storage at scale with Cloudian HyperStore
What Is a Splunk Data Model?
A data model is a structured, hierarchical mapping of semantic knowledge about one or more datasets. It encodes the details necessary to search the dataset information. Within a data model, datasets are typically arranged into parent and child datasets, which makes it easier for users to search specific parts of the data.
In Splunk, data models and the searches they enable are used to generate pivot reports for users. Pivot reports are visualizations, tables, or charts displaying information from a dataset search. Pivot is also the name of the tool used to create pivot reports in Splunk.
In Pivot, users select the data model they want to use according to the data they want to work with. Within that model, they select the dataset specific to the data they want to report on. Once these parameters are entered, users can create charts, statistical tables, and visualizations according to selected row and column configurations.
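Pivot builds these searches for you behind the scenes, but the same result can be produced directly with the `pivot` search command. Here is a minimal sketch, assuming a data model named `Tutorial` with a dataset named `HTTP_requests` (both hypothetical names; substitute your own model and dataset):

```
| pivot Tutorial HTTP_requests count(HTTP_requests) AS "Request count" SPLITROW host AS host
```

This is roughly what Pivot generates when you choose a row split on `host` and a count column in the UI.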
Creating a Data Model
Creating data models requires understanding your data sources and semantics. Within your model, you need to apply this understanding when defining how datasets are ordered.
The architecture of your Splunk data model should be determined by your data types and sources. For example, if your data comes from system logs, you may need several root datasets, such as separate root datasets for events, searches, or transactions. By contrast, if your data comes from a table-based format, such as a CSV file, you can create a flat, single root dataset. Within each root dataset you can then create child datasets as needed.
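As a sketch of the log-based case above, a root event dataset for web-access logs is defined by a constraint search such as the following (the index and sourcetype names here are assumptions; use your own):

```
index=web sourcetype=access_combined
```

For the table-based case, a root search dataset could instead be defined by a generating search such as `| inputlookup web_inventory.csv` (again, a hypothetical lookup file name).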
For more information about Splunk for big data, check out our guide: Splunk Big Data: a Beginner’s Guide
Understanding data models requires understanding the datasets that compose the model:
- Datasets correspond to a set of data in an index—Splunk data models define how a dataset is constructed based on the indexes selected. As stated previously, datasets are subsections of data.
- Datasets are categorized into four types—event, search, transaction, child.
- Datasets are defined by fields and constraints—fields correspond to the columns of a dataset and define the name, data type, and properties of the column. Constraints are searches that filter a dataset down to just the events it should contain; each child dataset's constraints further narrow the data it inherits from its parent.
- Datasets are hierarchical—can either be a root, parent, or child dataset. Depending on your hierarchy, a child dataset may also be a parent if it has children of its own. Root datasets are the top-level event, search, or transaction datasets in your model.
- Child datasets inherit from parents—each child set you create inherits the fields of its parent but can also include additional fields as needed. A dataset can gain additional fields from custom field extractions, regular-expression-based extractions, lookups, or eval expressions.
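To illustrate inheritance, suppose a root event dataset constrained to `index=web sourcetype=access_combined` (hypothetical names as before). A child dataset for client errors keeps the parent's constraint implicitly and only adds its own:

```
status>=400 status<500
```

Splunk combines the two, so the child effectively matches `index=web sourcetype=access_combined status>=400 status<500`. An eval-expression field such as `if(status>=500, "server", "client")` could then add a derived `error_class` column on top of the inherited fields (field name assumed for illustration).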
How to Create a Data Model
Once you are ready to create a Splunk data model, the process is relatively straightforward. Before you begin, make sure you have the correct permissions: creating a model requires write access to at least one app. If you do not have this permission, you will not see the New Data Model button.
To create a new data model:
- On the Data Models management page, select New Data Model.
- Configure a title for your model:
- Titles can include any character except asterisks (*).
- When you create your title, the ID field fills in automatically. Avoid changing this field unless you have to; if you do, keep in mind that IDs can only contain letters, numbers, and underscores. You cannot change this value once the data model is created.
- If desired, you can enter a model description or change the app value.
- Once your information is entered, click Create. This opens the model in the editor, where you can add and define datasets.
- Click Add Dataset to begin. You can then select your dataset type and begin adding constraints and fields.
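After saving the model, you can sanity-check a dataset outside of Pivot with the `datamodel` search command. A sketch, assuming a model with the ID `web_traffic` containing a dataset `All_requests` (both hypothetical):

```
| datamodel web_traffic All_requests search
| search All_requests.status=404
```

Note that fields in `datamodel` results are prefixed with the dataset name (here `All_requests.status`), which is why the follow-on `search` uses the prefixed form.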
Tips for Data Model Design
Designing effective Splunk data models can be a challenge and may take multiple attempts to get right. When creating and refining your models, consider the following tips. These can help you get started faster and can speed the refinement process.
- Base models on Pivot user needs—it is not efficient to create models and then try to adapt to user needs. Start with an understanding of what your users want and work from there.
- Base models on existing searches and dashboards—this is the data most relevant to you, so it makes sense to incorporate it into models first. Additionally, dashboards built on pivot reports are often easier to maintain.
- Use streaming commands—data model acceleration applies to root event datasets and to root search datasets whose searches use only streaming commands, so prefer streaming commands when defining root searches.
- Include indexes—when defining constraints and searches for accelerated root datasets, including indexes improves accuracy. If you do not specify indexes, your model will search all available indexes.
- Minimize hierarchy depth—filtering by constraints becomes less efficient with each additional hierarchy level, so keep dataset trees shallow.
- Use field flags selectively—field flags enable you to expose or hide fields within a dataset. You can use this feature to reduce the number of fields visible to Pivot users, making reporting easier.
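The acceleration tip above matters because accelerated data models can be queried with `tstats`, which reads precomputed summaries instead of raw events and is typically much faster. A sketch, assuming a model and root event dataset both named `Web` with `status` and `host` fields (hypothetical names following the common Splunk CIM pattern):

```
| tstats count FROM datamodel=Web WHERE Web.status>=500 BY Web.host
```

If the model is not accelerated, `tstats` still works but falls back to searching the raw events.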
To keep costs predictable, you should estimate your storage needs. There is no official Splunk storage calculator, but you can run the calculations yourself. For more information, check out our guide: Splunk Storage Calculator: Learn to Estimate Your Storage Costs.
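Splunk's own internal logs can feed that estimate. As a sketch, this search over the built-in `_internal` index sums daily license usage, which approximates daily raw ingest; multiply by your retention period, and note that on-disk size is typically well below raw size because Splunk compresses raw data (the exact ratio depends on your data):

```
index=_internal source=*license_usage.log* type="Usage"
| timechart span=1d sum(b) AS bytes
| eval GB=round(bytes/1024/1024/1024, 2)
```

The `b` field in `license_usage.log` records usage in bytes; the `eval` converts the daily total to gigabytes.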
Reduce the Cost of Splunk Storage and Increase Scalability with Cloudian HyperStore
If you are managing large volumes of data in Splunk, you can use Splunk SmartStore and Cloudian HyperStore to create an on-prem storage pool. The storage pool is separate from Splunk indexers, and is scalable for huge data stores that reach exabytes. Here’s what you get when you combine Splunk and Cloudian:
- Control—Growth is modular, and you can independently scale resources according to present needs.
- Security—Cloudian HyperStore supports AES-256 server-side encryption for data at rest and SSL for data in transit (HTTPS). Cloudian HyperStore protects your data with fine-grained storage policies, secure shell, integrated firewall and RBAC/IAM access controls.
- Multi-tenancy—Cloudian HyperStore is a multi-tenant storage system that isolates storage using local and remote authentication methods. Admins can leverage granular access control and audit logging to manage the operation, and even control quality of service (QoS).