Splunk DNS Lookup Performance and Caching with dnsmasq

I use dnsmasq, a lightweight DNS caching server, at home on a Raspberry Pi to log DNS traffic when testing things (just uncomment the log-queries option and pull the logs into Splunk). But what about using it to help the performance of DNS-related activity in Splunk itself?

It is very common to do a LOT of DNS lookups when using Splunk for security purposes. This can create a metric ton of lookup requests to the DNS servers your Splunk server normally points at, and that traffic load can cause unforeseen issues at times. I like to set up dnsmasq for local DNS caching on my Splunk search head to help reduce that load when IP lookups are repetitive.

Let’s say your normal network DNS server is 10.0.0.1. Rather than have your Splunk server query it directly or go to the root servers, we will set up dnsmasq on the Splunk server and point it at that normal DNS server. Splunk then gets its DNS answers locally whenever they have been recently cached. We will also lock dnsmasq down to serve only the local machine, so other systems on your network do not try to use your Splunk server as a DNS server. In my testing, dnsmasq seems to re-forward requests in fairly short order despite a large cache size (the number of host names it holds). It should still provide some protection if you run a poorly placed dnsLookup command in a Splunk search. Just think of how many times the same IP can come up in searches with large numbers of events. This has to be done on each Splunk search head where the DNS lookups occur.

This is an example of a search that will potentially generate repetitive lookups for the same IP address:

tag=authentication action=failure | lookup dnsLookup ip AS src_ip

This is a better placement of the lookup, after stats has reduced the results to one row per address, so you only get one lookup per IP address:

tag=authentication action=failure | stats count by src_ip | lookup dnsLookup ip AS src_ip
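
To get a feel for the difference between the two placements, here is a rough sketch that counts how many lookups each one could trigger. The list of addresses is made-up sample data, and the sketch assumes the worst case of one DNS request per row passed to the lookup:

```shell
# Hypothetical sample: one src_ip per failed-authentication event.
events='10.0.0.5
10.0.0.7
10.0.0.5
10.0.0.5
10.0.0.7'

# Lookup placed before stats: up to one DNS lookup per event.
per_event=$(printf '%s\n' "$events" | wc -l | tr -d ' ')

# Lookup placed after "stats count by src_ip": one lookup per distinct address.
per_unique=$(printf '%s\n' "$events" | sort -u | wc -l | tr -d ' ')

echo "lookups without stats first: $per_event"
echo "lookups with stats first:    $per_unique"
```

With real authentication data the gap is usually far larger, since a handful of noisy sources tend to account for most of the events.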

Let’s walk through adding dnsmasq to help reduce the traffic caused by the first search and lookup example.

First, install dnsmasq with your distribution's package manager and stop the service.

Create /etc/resolv.dnsmasq and use it to tell dnsmasq how to talk to your normal DNS server:

nameserver 10.0.0.1

Write and exit resolv.dnsmasq.

Prepare a non-privileged user account and group, both named dnsmasq, for the service to run as.

Now we edit the actual configuration of dnsmasq, the dnsmasq.conf file (typically /etc/dnsmasq.conf):

Find and edit the user and group fields to make sure we are not running it as root:
user=dnsmasq
group=dnsmasq

Find and edit resolv-file to the below:
resolv-file=/etc/resolv.dnsmasq

Since we only want the local system using the cache find and edit the listen-address:
listen-address=127.0.0.1

Remove the comment marker from no-dhcp-interface to ensure the DHCP service does not start on any interfaces:
no-dhcp-interface=

We sometimes hit a lot of DNS lookups in Splunk. Find and edit cache-size to 10000 hostnames worth of cache:
cache-size=10000

We should also make sure to stretch out time to live for negative lookup replies without a TTL set. We will give it a day.
neg-ttl=86400

Write and exit the dnsmasq.conf file.
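
For a quick review, the settings from the steps above can be collected in one place. The sketch below writes them to a scratch directory (so nothing under /etc is touched) and then prints only the active, non-comment lines. The scratch location and the variable name `stage` are just for illustration; on the real system these lines live in dnsmasq.conf alongside whatever else your distribution ships there.

```shell
# Scratch directory so this can be tried without root (illustrative only).
stage=$(mktemp -d)

# The options edited in the steps above, exactly as given in this post.
cat > "$stage/dnsmasq.conf" <<'EOF'
user=dnsmasq
group=dnsmasq
resolv-file=/etc/resolv.dnsmasq
listen-address=127.0.0.1
no-dhcp-interface=
cache-size=10000
neg-ttl=86400
EOF

# Show only active settings: skip blank lines and comments.
grep -Ev '^[[:space:]]*(#|$)' "$stage/dnsmasq.conf"
```

The same grep pointed at your real dnsmasq.conf is a handy way to confirm that only the options you intended are uncommented. If dnsmasq is installed, its --test option can also syntax-check a configuration file without starting the service.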

Next we force the Splunk server to use its own local dnsmasq caching service. Edit /etc/resolv.conf and change the nameserver line to:
nameserver 127.0.0.1

Write and exit the resolv.conf file, then start the dnsmasq service.
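
The exact commands for the privileged steps in this walkthrough (install, stop, account creation, start) depend on your distribution. As a rough sketch, on a Debian-style system they might look like the following. The use of apt-get, the service wrapper and the useradd flags are my assumptions, not something this setup requires, and the `run` helper is a hypothetical wrapper that only prints each command unless you set DRY_RUN=0 and run it as root.

```shell
# Dry run by default: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Assumption: Debian-style system with apt-get and the `service` wrapper.
run apt-get install -y dnsmasq
run service dnsmasq stop

# Create the unprivileged account only if it does not already exist
# (some packages create a dnsmasq user for you).
if ! id dnsmasq >/dev/null 2>&1; then
  run useradd --system --user-group --no-create-home --shell /usr/sbin/nologin dnsmasq
fi

# ... edit /etc/resolv.dnsmasq, dnsmasq.conf and /etc/resolv.conf as described ...

run service dnsmasq start
```

On systemd-based or RPM-based systems, substitute the equivalent package and service commands.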

That is it.

If you try a DNS lookup at the command line (with dig, for example), you should get back the server as 127.0.0.1. Do it twice in succession and note the difference in Query time between the first and second lookup. (Yeah, I know my test VM is a bit laggy.)
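
One way to run that check, sketched with dig: the loop below does the same lookup twice and keeps only the Query time and SERVER lines from the output. The hostname example.com and the helper name `summarize` are placeholders of mine; any resolvable name will do.

```shell
# Keep only the two lines of dig output that matter for this check.
summarize() {
  grep -E '^;; (Query time|SERVER):'
}

host_to_check="example.com"   # assumption: any resolvable name works here

if command -v dig >/dev/null 2>&1; then
  for attempt in first second; do
    echo "== $attempt lookup =="
    # +time/+tries keep the command from hanging if the resolver is unreachable.
    dig +time=2 +tries=1 "$host_to_check" | summarize || true
  done
else
  echo "dig not found; install your distribution's DNS utilities package"
fi
```

If the cache is working, the SERVER line should show 127.0.0.1 both times and the second Query time should be noticeably lower than the first.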

The initial lookup:

 

The cached lookup:

 
