Splunk – Redis Based Lookups

Welcome to my first guest post about Splunk. I am Donald McCarthy, a fellow Splunk enthusiast of George’s.

Lookup is the feature I have grown to use most frequently in Splunk.  It can save you huge amounts of indexing license and make your searches more meaningful by providing non-indexed information for context.  If you are unfamiliar with the lookup feature, head over here (http://docs.splunk.com/Documentation/Splunk/6.1.2/SearchReference/lookup) to see its official documentation and here (http://www.georgestarcher.com/splunk-a-dns-lookup-for-abuse-contacts/) for a quick tutorial/example of what it can do for you.

Recently, I found myself staring down the barrel of a problem that required a lookup to contain more than 84 million values.  At this scale Splunk was happy to ingest, index and then look up data based upon a massive 2 GB CSV.  Remember, however, that lookups in excess of 100MB are automatically indexed by Splunk to increase their speed.  That is a large impact when most initial Splunk licenses are in the 2GB/day range, and it can affect even terabyte+ size licenses if the lookup is updated often.

In this instance the data was indexed, but I wasn’t sure that I was really getting all the performance I needed to solve this particular problem.  After some reading and a few conversations with some Splunk engineers, I settled on Redis (Redis.io).

What does Redis do?

Redis is a data structure server that is capable of some really cool things and some amazing speed.  There is one big requirement that users of Redis need to be aware of upfront: RAM usage.  Redis loads everything you give it into RAM, and this is part of how it achieves its incredible speed boost.  In my case, this was not going to be a problem because all of the machines slotted to run Redis had a minimum of 128 GB of RAM.  You will also need to install the Redis Python module (redis-py).  We will cover one way to install Redis.  You will still need to consider the RAM limitations and be comfortable with setting up Python scripts in Splunk.

Example Use Case:

Massive data sets that need to be searched quickly are where Redis excels.  If your data set is a sizable portion of your license and it changes frequently, you might consider Redis.  My use case is running huge lists of phone numbers against huge lists of numbers of interest.  The numbers of interest could change frequently depending on who is running this lookup.  If it is an intelligence agency, the targeting changes frequently.  If the organization is a telecom, it may be to investigate fraud patterns.  Whatever the motives may be, when it comes to massive input data matched against massive enrichment data, Redis can be an excellent way to gain speed.  In this example, there is a long list of telephone numbers in Belize.  One of them in particular is of interest, and the rest of them are not.  Each number is going to be enriched with a decision of interesting or not.  If the number were tagged as interesting, that field could be used as a selector for subsequent correlation or analysis.

Installing Redis:

You will have to install Redis on the Splunk search head(s) where you want the lookup to function.  If you use distributed search and indexing, the data is of significant (terabytes and larger) size, and you have the free RAM, pushing Redis to both the search heads and indexers can improve performance significantly.  These steps have been tested on a typical Ubuntu Splunk server.

To install and add it as a service to auto-start, execute these commands and accept all defaults:

We will also need the redis-py module.  You can use a method of your choosing to install this module.  I chose pip.

Redis and Lookup Data:

In this case, I am going to mass insert the data into a set named “is_interesting” from a redis command file “phone_numbers.add” generated by the following perl script.

Generating the test data file:

Edit a perl script and call it testdata.pl:

Once you have made the script:

This will create the file phone_numbers.add.  It will be in newline-delimited format, with a Redis command prepended to each value.
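For readers who would rather not use Perl, here is a minimal Python sketch that produces a file of the same general shape. It is a stand-in, not the original testdata.pl: the starting number and the one-`SADD`-per-line layout are assumptions chosen so that the file holds 9000 unique values and includes 5012204024.

```python
# Sketch: write a Redis command file of SADD lines (a Python stand-in for testdata.pl).
SET_NAME = "is_interesting"        # set name taken from the text
OUTPUT_FILE = "phone_numbers.add"  # file name taken from the text
FIRST_NUMBER = 5012200000          # assumption: start of an illustrative number range
COUNT = 9000                       # matches the count used in the later check


def build_commands(first=FIRST_NUMBER, count=COUNT, set_name=SET_NAME):
    """Return one 'SADD <set> <number>' command per phone number."""
    return [f"SADD {set_name} {first + i}" for i in range(count)]


def write_command_file(path=OUTPUT_FILE):
    """Write the commands to disk, one per line."""
    with open(path, "w", newline="\n") as handle:
        handle.write("\n".join(build_commands()) + "\n")


if __name__ == "__main__":
    write_command_file()
```

Running the file directly writes phone_numbers.add to the current directory.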

Loading sample data into Redis:

Note: the next command may return an error if Redis is already running.  You may ignore the error.

Redis persists the data to disk (for example in an .aof file when append-only mode is enabled) and reloads this set into memory upon restarting the Redis server.  If you do not want this behavior, you can change it in the Redis configuration file.
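If you prefer to load the set from Python rather than from the shell, the following sketch replays the lines of phone_numbers.add through a redis-py pipeline. It assumes each line has the form `SADD <set> <member>` and that Redis listens on localhost:6379; the function name `load_commands` and the batch size are hypothetical choices for illustration.

```python
def load_commands(client, lines, batch_size=1000):
    """Send 'SADD <set> <member>' lines through a redis-py style pipeline.

    Returns the number of SADD commands sent (duplicates included).
    """
    pipe = client.pipeline()
    queued = 0
    total = 0
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or parts[0].upper() != "SADD":
            continue  # skip blank or unexpected lines
        pipe.sadd(parts[1], parts[2])
        queued += 1
        total += 1
        if queued >= batch_size:
            pipe.execute()  # flush one batch to the server
            queued = 0
    if queued:
        pipe.execute()  # flush the remainder
    return total


if __name__ == "__main__":
    import redis  # redis-py; imported here so the function can be tested without it

    # Assumption: Redis is on the default localhost:6379, database 0.
    with open("phone_numbers.add") as handle:
        print(load_commands(redis.Redis(host="localhost", port=6379, db=0), handle))
```

Batching through a pipeline avoids one network round trip per value, which matters once the set grows into the millions.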

We will pretend that we have millions of values in our set similar to the following: 5012204024 which is the contact number for HP Belize Adventures.  By the way, if you want a fabulous dive experience on your trip to Belize, Hugh Parkey’s (http://www.hpbelizeadventures.com/) is truly a class act.   Who knows, there might be good national security value in knowing who dives with this particular organization. A couple of quick checks:

This should return an integer equal to the number of unique values in your input, in this case 9000.

This should return an integer value of 1 if the number is in the data set and 0 if it is not.  In my case, this should and does return the value of 1.
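The same two checks can be sketched from Python with redis-py. The return values described above match the Redis commands SCARD (set size) and SISMEMBER (membership), so that is what this example assumes; the helper name `check_set` is hypothetical.

```python
def check_set(client, number, set_name="is_interesting"):
    """Return (set size, 1 or 0 for membership), mirroring SCARD and SISMEMBER."""
    size = client.scard(set_name)
    present = int(client.sismember(set_name, number))
    return size, present


if __name__ == "__main__":
    import redis  # redis-py; imported here so the function can be tested without it

    # Assumption: Redis is on the default localhost:6379, database 0.
    client = redis.Redis(host="localhost", port=6379, db=0)
    print(check_set(client, "5012204024"))
```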

Splunk Redis Lookup:

So the easy part is done.  We have installed Redis, transformed our data and loaded it into Redis.  We now have to prep Splunk to play nicely with Redis.  This is going to require a dynamic external lookup (in the form of a Python script).  The Python script is a fairly straightforward piece of code.

Redis Lookup Python Script:

Create the following Python script. This will be the external lookup python script to talk to Redis.

Paste in the following script. Make sure you update the hashbang path to your Python installation if you need to change it.
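What follows is a minimal sketch of what such a script could look like, not necessarily the original. It assumes Splunk hands the script a CSV with a header row on stdin and reads the enriched CSV back from stdout, that the two field names arrive as command-line arguments, that Redis is on localhost:6379, and that the redis module is importable by the Python interpreter that runs the script. The TRUE/FALSE strings are also an assumption.

```python
#!/usr/bin/env python
# Sketch of an external lookup script; the interpreter path above is an assumption.
import csv
import sys


def enrich(infile, outfile, client, number_field, result_field,
           set_name="is_interesting"):
    """Copy CSV rows from infile to outfile, filling result_field from Redis."""
    reader = csv.DictReader(infile)
    if not reader.fieldnames:
        return  # nothing to do on empty input
    fieldnames = list(reader.fieldnames)
    if result_field not in fieldnames:
        fieldnames.append(result_field)
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        number = row.get(number_field)
        if number:
            # SISMEMBER answers whether the number is in the set
            found = client.sismember(set_name, number)
            row[result_field] = "TRUE" if found else "FALSE"
        writer.writerow(row)


if __name__ == "__main__":
    import redis  # redis-py; imported here so enrich() can be tested without it

    if len(sys.argv) != 3:
        sys.exit("Usage: is_interesting.py <number field> <result field>")
    # Assumption: Redis is on the default localhost:6379, database 0.
    connection = redis.Redis(host="localhost", port=6379, db=0)
    enrich(sys.stdin, sys.stdout, connection, sys.argv[1], sys.argv[2])
```

Keeping the Redis client as a parameter of `enrich` makes it possible to test the CSV handling with a stand-in object before wiring the script into Splunk.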

Setting up the Lookup in Splunk:

Next we must define an external lookup.

Type:                External
Command:             is_interesting.py ph_number is_it_interesting
Supported fields:    ph_number,is_it_interesting

Note: is_it_interesting is going to be a new field tied to your data upon lookup. In this case, it will have a value of true or false.

Paste in the following external lookup stanza.

Testing the Lookup:

Try the following search in Splunk to emulate having an index of data with the phone number of interest. You should get back a new field is_it_interesting and it should show TRUE.

Try changing the ph_number value in the eval to simulate different phone numbers. Trying one that is not in the Redis test data should give a FALSE value in is_it_interesting.

Wrap Up:

For all the ridiculousness of this scenario (and it is by no means the best or only use of Redis in Splunk>), Redis is an incredible tool that can significantly speed up slow lookup processes in Splunk>, particularly the older versions of Splunk> if you happen to be on one.  It also has the benefit of allowing you to search lookup datasets larger than 100MB without eating away at valuable index license.

Good luck and enjoy!


One Reply to “Splunk – Redis Based Lookups”

  1. Good Stuff. If you are going to use this lookup across Splunk apps, you will need a little extra in your local.meta file.

    [transforms]
    export = system

    [searchscripts]
    export = system

    If you only export the transforms, the lookup won’t work. Exporting searchscripts allows the scripted lookup to be used across apps.
