Splunk – Null Thinking

It takes GOOD data hygiene to build a mature Splunk environment. This starts with field parsing and mapping fields to the Common Information Model (CIM).

One thing people often overlook in their Splunk data hygiene is when NOT to include a field in the data payload indexed into Splunk. This happens a lot when the logs are internally developed and sent through the Splunk HTTP Event Collector.

Sample JSON event fields:

Notice how the field killer has an empty value?

Why does this matter?

1. It consumes license for the field name string in every event. This can get real bad at volume.
2. It throws off the Splunk auto statistics for field vs event coverage.
3. It makes certain search techniques harder to use.

License Impact:
If every event in your payload carries empty fields, the cost adds up. It impacts not only the network traffic to get the data to Splunk but also license usage and storage within Splunk. Splunk does compress indexed data, but at very large volumes the empty fields are still a measurable overhead.
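To put a rough number on it, here is a back-of-the-envelope sketch. It assumes compact JSON with no extra whitespace and a hypothetical volume of one billion events per day. It only counts raw bytes for the field name, quotes, colon, empty value and separating comma, so treat it as an estimate and not as how Splunk meters your license.

```python
import json


def empty_field_overhead_bytes(field_name: str, event_count: int) -> int:
    """Estimate raw bytes spent on an empty-string field across many events.

    Assumes compact JSON with no extra whitespace, and that the field is not
    the only one in the event (so it also costs one comma separator).
    """
    # '{"killer":""}' with the braces stripped, plus the separating comma
    fragment = json.dumps({field_name: ""}, separators=(",", ":"))[1:-1] + ","
    return len(fragment.encode("utf-8")) * event_count


# Hypothetical volume: one billion events per day
per_day = empty_field_overhead_bytes("killer", 1_000_000_000)
print(f"{per_day / 1e9:.1f} GB/day of raw data for one empty field")
```

Under those assumptions a single empty field named killer costs 12 bytes per event. Multiply that by every empty field you send and the total is no longer trivial.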

Event Coverage:

Splunk makes it handy to click on an “interesting” field in the search interface to see basic stats about the field and its values. If an empty-valued field is present in every event, you end up with 100% coverage, because from a search perspective the field exists in all events. This is misleading if you want to know what percentage of events have actual useful values.
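The effect is easy to reproduce outside of Splunk. The sketch below is plain Python, not how Splunk computes its field summary, and the events and the coverage helper are made up for illustration. It compares "the field exists" coverage with "the field has a useful value" coverage.

```python
def coverage(events, field, ignore_empty=False):
    """Percentage of events that carry the field.

    With ignore_empty=True, None and empty-string values do not count.
    """
    if not events:
        return 0.0
    if ignore_empty:
        hits = sum(1 for e in events if e.get(field) not in (None, ""))
    else:
        hits = sum(1 for e in events if field in e)
    return 100.0 * hits / len(events)


# Hypothetical events: every one carries killer, only one has a value
events = [
    {"action": "login", "killer": ""},
    {"action": "logout", "killer": ""},
    {"action": "kill", "killer": "admin"},
    {"action": "login", "killer": ""},
]

print(coverage(events, "killer"))                     # field is present
print(coverage(events, "killer", ignore_empty=True))  # field is useful
```

Here the field is present in 100% of the events but useful in only 25% of them. Dropping the empty fields before indexing makes the first number match the second.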


NULL Usage:

Null is a STATE, not a value. A field is either NULL or it is NOT NULL. That state says nothing about the content of the field value.

If you search like this

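Empty strings will break easy-to-use null-based functions like coalesce, isnull, and isnotnull. A field holding an empty string is NOT NULL, so these functions treat it as if it had a real value.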
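If you think in Python, None plays the role of null and "" is just another value. The sketch below is a Python analogy, not SPL. The coalesce helper is written here only to mimic "return the first non-null argument", and the user field is a made-up fallback.

```python
def coalesce(*values):
    """Return the first argument that is not None, or None if all are."""
    for value in values:
        if value is not None:
            return value
    return None


# Hypothetical events: same data, with and without the empty field
with_empty = {"killer": "", "user": "backup_svc"}
without_empty = {"user": "backup_svc"}

# The empty string is "not null", so it wins and the fallback is never used
print(repr(coalesce(with_empty.get("killer"), with_empty.get("user"))))

# With the field dropped, the lookup is None and the fallback kicks in
print(repr(coalesce(without_empty.get("killer"), without_empty.get("user"))))
```

The first call returns the empty string and the second returns backup_svc. The same thing happens to your null tests in a search when empty strings get indexed.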

By not sending empty string fields, you can easily find events that are missing a field you expected to be there. You can use the event coverage mentioned above or a simple search like:

Lookups and KVStore:

Another place making sure you have popped off empty string fields matters is when updating KVStore lookup tables via the Splunk REST API. If you leave the field on the data payload you get a row in your lookup with an empty string for that column.

If you are programmatically maintaining a lookup table which you use as an event filter, this can break the common search pattern below. This is the pattern to use when you simply want events whose field value is, or is not, in a lookup table, regardless of the other information in the table.


Save bandwidth, reduce field count, and make null testing easy by dropping all fields that have empty values from your events before indexing.

A quick Python example:

Let’s say we have some API that returns this JSON dict for an event we want to index.

This Python example gives us a stripped-down dict we can use instead.

This gives us:
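A minimal sketch of the whole round trip, for illustration. The field names test and killer come from this post. The other fields, their values, and the drop_empty_fields helper name are made up.

```python
def drop_empty_fields(event: dict) -> dict:
    """Return a copy of the event without None or empty-string values."""
    return {k: v for k, v in event.items() if v is not None and v != ""}


# Hypothetical API response
event = {
    "time": 1700000000,
    "host": "web01",
    "action": "login",
    "test": None,
    "killer": "",
}

clean_event = drop_empty_fields(event)
print(clean_event)
# {'time': 1700000000, 'host': 'web01', 'action': 'login'}
```

The test is explicit about None and "" on purpose. A shorter `if v` would also throw away legitimate values like 0 and False, which you probably want to keep.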

You can see that test and killer, the None and empty fields, are removed. Now we have a nice JSON dict to use as our event payload to index. Real handy if you are using my Splunk HTTP Event Collector class for data ingestion.