There is one issue that comes up all the time for IT folks who are new to Splunk. Syslog is something most IT organizations are already dealing with. It is the easiest log source to get started with and the easiest one to make a mess of.
Here are the common scenarios:
- Syslog is already collected from network devices and other appliances such as spam filter systems. It is sent to a Linux syslog server running rsyslog or syslog-ng. On rare occasions it is something on Windows, e.g. Kiwi Syslog.
- Syslog is not used. But HEY, it is a quick and easy example of collecting the logs the auditors told us we were not collecting. So someone testing Splunk googles around, finds the example of creating a network input for syslog, and like magic, logs show up in Splunk. NEAT!! Instant ROI.
I will go ahead and get this out of the way now. NEVER... EVER... just don't send syslog straight to Splunk if you want to avoid a lot of headaches. This isn't Splunk's fault; it is just the nature of the problem, and it applies to most log collection products.
Why not to send straight to Splunk?
- Disruption of Data Collection:
a. If you restart the Splunk indexer you are sending to, you lose syslog data. And yes, you will be applying Splunk updates, doing rolling restarts if you get into index clustering, and so on. You will restart Splunk far more often than you would a syslog service on a dedicated server, and restarting the syslog service is also substantially faster than restarting Splunk.
b. You lose the ability to load balance incoming data across multiple indexers (e.g. Index Clustering)
- Splunk Metadata:
a. If you send syslog streams from different types of devices to the same network input on Splunk, you will have a horrible time setting the sourcetype and destination index.
- Syslog Administration and Configuration:
a. You get MUCH more flexibility in data handling, routing and filtering with rsyslog or syslog-ng than with a Splunk network port. Maybe you want to drop noisy events before they hit your Indexers.
b. You likely already have network ACLs in place and syslog configuration done on source devices. You won’t have to change that.
c. If you use something like Puppet, then re-deploying a failed syslog server with its fairly static configuration is easier, and good from the business continuity/disaster recovery planning perspective.
d. If your syslog server has plenty of storage you can have an extra backup of the original log data by archiving it to compressed files automatically.
e. If you want Splunk to listen on port 514, the whole splunkd process and its child processes will need elevated privileges. You will have a much smaller attack surface with a dedicated syslog service. Sure, iptables etc. can be used to work around this, but those are topics not covered here.
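As a concrete illustration of the filtering and routing flexibility mentioned above, here is a minimal rsyslog (RainerScript) sketch. The IP addresses, file paths, and severity threshold are hypothetical examples, not a recommendation:

```
# Drop noisy debug-severity events from a chatty device before they
# ever reach your indexers (severity 7 = debug)
if ($fromhost-ip == '10.1.2.3' and $syslogseverity >= 7) then stop

# Write each remote host's messages to its own directory, ready for
# pickup by a Splunk Universal Forwarder
template(name="PerHostFile" type="string"
         string="/var/log/remote/%HOSTNAME%/messages.log")
if ($fromhost-ip startswith '10.') then {
    action(type="omfile" dynaFile="PerHostFile")
    stop
}
```

A syslog-ng configuration can express the same thing with its own filter and destination blocks.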
How do you prepare for success?
Here is the secret sauce to success, something a large number of IT groups do not implement: the PTR DNS record for reverse DNS.
By default, Splunk will try to set the host field on network inputs using reverse DNS. Syslog-ng and rsyslog will do this as well, so you want to make sure DNS records are configured. One other item you may need to consider is DNS caching servers, since DNS performance and the volume of lookups can become an issue. You can read more on the topic in Splunk DNS Caching and dnsmasq.
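The reverse lookup these tools perform boils down to the following. This is a sketch using Python's standard library; `resolve_host` is a hypothetical helper name, and the actual result depends on your DNS and /etc/hosts configuration:

```python
import socket

def resolve_host(ip):
    """Reverse-resolve an IP to its PTR hostname, the way Splunk and
    rsyslog/syslog-ng populate the host field on network inputs.
    Falls back to the bare IP when no PTR record exists."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:
        # No PTR record: the host field degrades to an unfriendly IP
        return ip

# The loopback address usually reverse-resolves via /etc/hosts
print(resolve_host("127.0.0.1"))
```

If the lookup fails, you end up searching on raw IPs in Splunk, which is exactly the mess the PTR records prevent.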
A bonus note: if you use the Splunk SNMP Modular Input, there is now an option to perform the reverse DNS lookup to obtain the host field information. An FQDN is way better than an IP. See my post on Running SNMP Modular Input on a Universal Forwarder; the code I contributed was merged into the released version.
Summing up. Before you do anything with Splunk, prepare your syslog sources by doing three things.
- Decide on a good device naming scheme. Perhaps asa8.nyc2.myorg.com represents a firewall in your second New York City office location.
- Implement BOTH A and PTR records for the network devices to match that naming scheme. The reverse record is going to be just as important as the naming convention.
- Make sure your syslog sources are using NTP, and preferably GMT/UTC as the timezone.
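To show why a consistent naming convention pays off, here is a small Python sketch that pulls the device type and site out of a hostname following the hypothetical asa8.nyc2.myorg.com pattern above. The `parse_host` helper and its regex are illustrative assumptions, not part of any Splunk product:

```python
import re

# Hypothetical convention: <devicetype><n>.<site>.myorg.com
# e.g. asa8.nyc2.myorg.com = a Cisco ASA firewall at the second NYC site
HOST_PATTERN = re.compile(
    r"^(?P<device>[a-z]+)\d*\.(?P<site>[a-z]+\d*)\.myorg\.com$"
)

def parse_host(host):
    """Extract (device_type, site) from a convention-following FQDN,
    or (None, None) when the host does not match the scheme."""
    m = HOST_PATTERN.match(host)
    return (m.group("device"), m.group("site")) if m else (None, None)

print(parse_host("asa8.nyc2.myorg.com"))  # → ('asa', 'nyc2')
```

This is the same parsing you can later do inside Splunk itself with lookups or field extractions to derive metadata such as location.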
This is going to give you three key benefits.
- You will be able to use wildcards in the Splunk forwarder pickup configuration. So if your network team adds a new switch, then as long as they name it, create BOTH A and PTR records, and point it at the official syslog server, logs will magically just flow into Splunk with no changes required by the Splunk admin. It just WORKS for the types of devices you have already configured.
- You will easily be able to control which sourcetype and index each device type goes into in Splunk, and the host field will be a useful human-readable name.
- You will be able to add automatic metadata in Splunk, such as geolocation information, based on the device naming convention.
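The wildcard pickup in the first benefit might look something like this inputs.conf sketch for a Universal Forwarder running on the syslog server. The directory layout, index, and sourcetype names are assumptions based on the per-host file pattern discussed earlier:

```
# Each source host writes to its own directory, e.g.
# /var/log/remote/asa8.nyc2.myorg.com/messages.log
[monitor:///var/log/remote/asa*.*.myorg.com/*.log]
sourcetype = cisco:asa
index = network
# Take the host field from the 4th path segment: the FQDN directory
host_segment = 4
```

A new firewall that follows the naming convention and logs to the official syslog server gets picked up without touching this stanza at all.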