There is one issue that comes up all the time for IT folks who are new to Splunk. Syslog is something most IT organizations are already dealing with. It is the easiest log source to get started with, and the easiest one to make a mess of.
Here are the common scenarios:
Syslog is already collected from network devices and other appliances such as spam filter systems. It is sent to a Linux syslog server running rsyslog or syslog-ng. On rare occasions it is something on Windows, e.g. Kiwi Syslog.
Syslog is not used yet. But HEY, it is a quick, easy example of collecting the logs the auditors told us we were not collecting. So someone testing Splunk googles around, finds the example of making a network input for syslog, and like magic, logs show up in Splunk. NEAT!! Instant ROI.
I will go ahead and get this out of the way now. NEVER… EVER… just don’t send syslog straight to Splunk if you want to avoid a lot of headaches. This isn’t Splunk’s fault; it is just the nature of the issue, and it would apply to most log collection products.
Why not to send straight to Splunk?
Disruption of Data Collection:
a. If you restart the Splunk indexer you are sending to, you lose syslog data. And yes, you will be applying Splunk updates, doing rolling restarts if you get into index clustering, etc. You will restart Splunk far more often than you would a syslog service on a dedicated server, and restarting the syslog service is also substantially faster than restarting Splunk.
b. You lose the ability to load balance incoming data across multiple indexers (e.g. Index Clustering)
c. If you send different device types’ syslog streams to the same network input on Splunk, you will have a horrible time setting sourcetype and destination index.
Syslog Administration and Configuration:
a. You get MUCH more flexibility in data handling, routing and filtering with rsyslog or syslog-ng than with a Splunk network port. Maybe you want to drop noisy events before they hit your Indexers.
b. You likely already have network ACLs in place and syslog configuration done on source devices. You won’t have to change that.
c. If you use something like Puppet, then re-deploying a failed syslog server with its fairly static configuration is easy, which is good from the business continuity/disaster recovery planning perspective.
d. If your syslog server has plenty of storage you can have an extra backup of the original log data by archiving it to compressed files automatically.
Security:
a. If you want Splunk to listen on port 514, the whole splunkd process and its child processes will need elevated privileges. You will have a much smaller attack surface with a dedicated syslog service. Sure, iptables etc. can be used to trick your way around this, but those are topics not covered here.
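As a concrete illustration of the routing and filtering flexibility, here is a minimal rsyslog sketch. The match string, directory layout and file name are my assumptions for illustration, not settings from any particular environment:

```
# Drop a noisy event before it ever reaches the indexers
# ("teardown" is just an example match string):
:msg, contains, "teardown" stop

# Write each sending host to its own file, so Splunk can set
# sourcetype and index per path later:
$template PerHostFile,"/var/log/remote/%HOSTNAME%/messages.log"
*.* ?PerHostFile
```

Per-host files like this also set up the wildcard forwarder pickup discussed below.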
How do you prepare for success?
Here is the secret sauce to success, something a large number of IT groups do not implement: the PTR DNS record for reverse DNS.
By default, Splunk will try to resolve the host field on network inputs using reverse DNS. Syslog-ng and rsyslog will do this as well, so you want to make sure DNS records are configured. One other item you may need to consider is DNS caching servers; DNS performance and the volume of lookups can become an issue. You can read more on the topic in Splunk DNS Caching and dnsmasq.
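If lookup volume becomes a concern, a small caching layer in front of your real resolvers helps. A minimal dnsmasq sketch, where the cache size and upstream resolver address are placeholder values:

```
# /etc/dnsmasq.conf - cache lookups locally on the syslog box
cache-size=10000      # number of entries to hold in memory
server=10.0.0.53      # your real upstream DNS server (example IP)
```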
A bonus note: if you use the Splunk SNMP Modular Input, there is now an option to perform a reverse DNS lookup to get the host field information. An FQDN is way better than an IP. See my post on Running SNMP Modular Input on a Universal Forwarder. The code I contributed has been implemented in the available version.
Summing up. Before you do anything with Splunk, prepare your syslog sources by doing three things.
Decide on a good device naming scheme. Perhaps asa8.nyc2.myorg.com represents a firewall in the second New York City office location.
Implement BOTH A and PTR records for the network devices to match that naming scheme. The reverse record is just as important as the naming convention.
Make sure your syslog sources are using NTP, and preferably GMT for their timezone.
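To sketch what steps one and two look like together, here are matching forward and reverse records in BIND zone file syntax, using the hypothetical firewall name and an example address from the documentation range:

```
; forward zone for myorg.com
asa8.nyc2.myorg.com.      IN A    192.0.2.10

; reverse zone 2.0.192.in-addr.arpa
10.2.0.192.in-addr.arpa.  IN PTR  asa8.nyc2.myorg.com.
```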
This is going to give you three key benefits.
You will be able to use wildcards in the Splunk forwarder pickup configuration. So if your network team adds a new switch, then as long as they named it, made BOTH A and PTR records, and pointed it at the official syslog server, logs will magically just flow into Splunk. No changes required by the Splunk admin. It just WORKS for the types of devices you have already configured.
You will easily be able to control which sourcetype and index each device type goes into in Splunk. The host field will be a useful, human readable name.
You will be able to add automatic metadata in Splunk based on the device naming convention. Such as geolocation information.
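To illustrate the wildcard pickup idea, here is a hedged inputs.conf sketch for a forwarder reading the syslog server’s files. The directory layout (one folder per sending host), file name, sourcetype and index are all assumptions for illustration:

```
# inputs.conf on the forwarder running on the syslog server
[monitor:///var/log/remote/*/messages.log]
sourcetype = syslog
index = network
# the 4th path segment (/var/log/remote/<host>/) becomes the host field
host_segment = 4
```

With per-host directories and good PTR-driven names, a new device’s logs are picked up with no Splunk-side change.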
Recently, I found myself staring down the barrel of a problem that required a lookup containing more than 84 million values. At this scale Splunk was happy to ingest, index and then look up data based upon a massive 2 GB CSV. Remember, however, that lookups in excess of 100MB are automatically indexed by Splunk to increase their speed. That is a large impact when most initial Splunk licenses are in the 2GB/day range, and it can impact even terabyte+ size licenses if the lookup is updated often.
In this instance the data was indexed, I just wasn’t sure that I was really getting all the performance I needed to solve this particular problem. After some reading and a few conversations with some Splunk engineers, I settled on Redis (Redis.io).
Damien Dallimore of Splunk wrote a great Modular Input for SNMP on Splunkbase. It is written in such a way that you install it on your Splunk server (hopefully that is Unix based). Then you set up an inputs.conf in the app like this:
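Something along these lines. Treat the option names as assumptions and check the app’s README for the authoritative list; the stanza name, IP, community string and index are placeholders:

```
[snmp://asa_traps]
snmp_mode = traps
trap_host = 10.0.0.5
trap_port = 162
communitystring = public
sourcetype = snmp_traps
index = network
```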
SNMP Modular Input
What if you don’t want traps going directly to your Splunk server?
Why, yes, you can indeed use the snmp_ta on a Universal Forwarder. It needs to have pysnmp installed, so you are usually going to be OK on most Linux systems.
You just have to make a couple of changes to snmp_ta/bin/snmp.py:
1. You absolutely must change the hash bang at the top of the file, replacing the existing path to the Splunk python instance. Depending on your system, you might change it to something like the following: #!/usr/bin/python
2. If you do as I do and make copies of TAs using a naming convention such as TA_app_snmp_cal01, then you have to edit two other lines in the snmp.py file: change the paths indicated in egg_dir and mib_egg_dir to match the renamed app folder.
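Both edits can be scripted with sed. The block below demonstrates them on a stand-in file, since the exact hash bang line and path strings in your copy of snmp.py may differ; point the edits at your real .../TA_app_snmp_cal01/bin/snmp.py instead:

```shell
# Stand-in snmp.py with hypothetical original lines, for demonstration
snmp_py=/tmp/snmp_py_demo
cat > "$snmp_py" <<'EOF'
#!/opt/splunk/bin/python
egg_dir = "/opt/splunk/etc/apps/snmp_ta/bin/"
mib_egg_dir = "/opt/splunk/etc/apps/snmp_ta/bin/mibs/"
EOF

# 1. Swap the hash bang for the system python
sed -i '1s|^#!.*|#!/usr/bin/python|' "$snmp_py"

# 2. Point egg_dir and mib_egg_dir at the renamed app folder
sed -i 's|/snmp_ta/|/TA_app_snmp_cal01/|g' "$snmp_py"

cat "$snmp_py"
```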
That should do the trick. Now the Universal Forwarder you put the app onto should start listening on UDP 162 for SNMP traps. Just be sure to change the community string and the trap_host to your settings. The trap_host should be the IP of the forwarder you are putting this onto.
Do keep in mind that the parsing of the traps happens at the time they are received and indexed, so you need to install the right MIBs into the app’s bin/mibs folder. It will honestly drive you to drink; it is a painful process. You can read more about it in my two part series on SNMP polling using the Modular Input.
The way the snmp_ta works, the host field ends up being the IP address of the system that sent the trap. I prefer my host field to be FQDNs that complement my earlier post on auto lookup of location by host. I modified the TA’s code to allow a new inputs.conf option in the stanza, called trap_rnds. I should be submitting a pull request to Damien soon to contribute the feature back to him. Be watching for the updated app. Keep in mind that if you use this feature, you will generate a reverse DNS lookup against your infrastructure for each trap event that comes in, so you may need to consider whether that will impact the DNS servers that system uses.
Some weekends I just pick a couple of Lego blocks of technology and click them together to see what happens. I was thinking over the concept of TOR hidden services. It turns out you can run a Splunk Universal Forwarder (UF) with an outputs.conf pointing to your indexer while it listens for inputs from other UFs as a TOR hidden service. You can then make a UF running on something like a Raspberry Pi send its logs back over TOR, like a dynamic VPN.
Why would you want to? Because it was neat to do. Here is how to repeat the proof of concept.
Set up $SPLUNK_HOME/etc/system/local/outputs.conf to send logs to localhost:9998.
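A minimal outputs.conf for that step (the group name tor_out is my choice):

```
[tcpout]
defaultGroup = tor_out

[tcpout:tor_out]
server = 127.0.0.1:9998
```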
Ensure socat is running to bounce 9998 to 9997. This is how we torify the Splunk forwarder to indexer traffic; we use it to tunnel the Splunk TCP traffic through TOR. You will want to work out how to make it auto start on reboot and run in the background, but here is the command you can run manually to test it. Note that in this command you have to know the .onion address of the UF we will use as our TOR to Splunk indexer gateway on the receiving end.
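The command looks something like this; youronionaddress.onion is a placeholder for your hidden service name, and I assume TOR’s SOCKS proxy is listening on its default local port 9050:

```
socat TCP4-LISTEN:9998,reuseaddr,fork \
  SOCKS4A:127.0.0.1:youronionaddress.onion:9997,socksport=9050
```

SOCKS4A (rather than SOCKS4) matters here, because it lets the TOR proxy resolve the .onion name itself instead of socat attempting a DNS lookup.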
Set Splunk to pick up logs etc. via the normal inputs.conf methods.
That is it: you have torified Splunk forwarder to indexer traffic. It would let you collect data from remote sources without exposing the actual destination address of your indexing system to them.
Keep in mind that TOR itself encrypts the traffic, so you could stick with the unencrypted “9997” outputs.conf style setup. Or you could still go all out and generate a new SSL Certificate Authority with ECC certificates, and do all the normal certificate root and name validation that you should when setting up SSL for Splunk. If you want to learn more about how to do that, come see the talk I am giving with a friend at Splunk .conf 2014 this year.
I am often asked how to start looking at Splunk when someone gets interested. This is the same thing I do for myself.
Get the latest build of Splunk and install it on a machine you can test with. Usually this is your daily use laptop or desktop.
Consider your license options. Splunk licensing is based on how much data per day you index into Splunk for searching. The free license will let you index up to 500MB per day. One thing many Splunk administrators do is to get a development license for their personal workstation. This will let you index up to 10GB per day and unlock all the enterprise features. This is great for prototyping and testing your parsing, apps etc on your workstation before moving it to your production system.
Change your default admin password on Splunk once you log in for the first time. The last thing you want is to be in a coffee shop and have someone poking into data you have indexed into Splunk that you might not want to share.
Change the web interface to use HTTPS. Sure, it is the default Splunk SSL certificate, but it is better than no encryption at all. Just enable it under Settings->System Settings->General Settings.
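If you prefer editing config files, the same toggle lives in web.conf; this goes in $SPLUNK_HOME/etc/system/local/web.conf:

```
[settings]
enableSplunkWebSSL = true
```

Restart Splunk afterward for the change to take effect.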
If you do not end up using a development license, or your demo license runs out, be sure to firewall Splunk off from access outside your local machine. Refer back to my someone-in-a-coffee-shop-digging-through-your-data comment.
Here are my slides and the tutorial I made for Rolling your own logging vm. Between the slides and the tutorial you can find all the links I referenced.
The VM tutorial uses Ubuntu Linux, syslog-ng and Splunk. I go over how to use syslog-ng with FIFO queues to handle multiple sources, and even how to rewrite forwarded syslog events coming from Kiwi Syslog before indexing in Splunk. The tutorial zip has both PDF and EPUB formats in it.
*update* I was asked some questions today during my presentation about MS Log Parser, so I added my post on it to the link list below. Also, for those downloading my actual logging VM from the link I gave those who attended my talk: the URL does redirect to Dropbox, so do not be surprised.
*second update* A question came up today on a forensics mailing list about searching some evtx event log files. I suggested using MS Log Parser to replay the output to syslog, the target being Splunk, say, as in my logging VM tutorial. Then the logs are easily searchable.
This is sort of a follow up to my SSH screencast series on remote access to your Mac. Maybe you are paranoid like me and want to know when a connection has been made to your Mac, when a wrong user name has been tried, or even when a login failed on a good username. You also want to know this no matter where you are.
I was inspired by the script written by Whitson Gordon over at Macworld on automating turning off your wireless AirPort interface. Note that what I have below has only been tested on my Snow Leopard setup; I leave it up to you if you are on Leopard or even Tiger. BTW, update your system if you are as far back as Tiger. C’mon, join the modern world.
You will have to have Growl installed. Also install growlnotify, and last you need a Growl-to-push notification service like Prowl. Then have the Prowl app on your iPhone or iPad.
Read on for the scripts and how to get it all working.
Perhaps you have made yourself a logging VM, or even a logging machine out of an old laptop, using my PDF instructions. At home I actually turned a really old IBM Thinkpad A22m into an Ubuntu logging machine. Just like my directions, only no VMware.
I send all my network hardware logs via syslog to that machine. BUT I also made one simple change to the syslog.conf on every Mac in my house. Now all my Mac logs collect on the machine for searching in Splunk.
Just open Terminal on your Mac.
sudo vi /etc/syslog.conf
Edit the file and add the following line, substituting your own logging machine IP address.
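A line such as this; the catch-all *.* selector is my assumption (narrow it if you only want certain facilities), and classic syslog.conf wants a tab between the selector and the action:

```
*.*	@loggingmachineipaddress
```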
Make sure to use an actual IP address in place of loggingmachineipaddress. I tried using the Bonjour/mDNS name like logger.local, and my Macs never consistently sent logs; after changing to the IP address it worked.
Next, if you are on Leopard, you can run the following two terminal commands to restart syslogd and pick up the config change. Otherwise you could also just reboot your Mac.
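The commands are along these lines; the plist path is the Leopard-era location, so verify it on your system:

```
sudo launchctl unload /System/Library/LaunchDaemons/com.apple.syslogd.plist
sudo launchctl load /System/Library/LaunchDaemons/com.apple.syslogd.plist
```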