Splunk Dry Harder – Splunking the Laundry 2

I was originally going to call this revisit of my old Splunking the Laundry post Heavy Duty Cycle. My former coworker Sean Maher instead suggested Dry Harder, and I could not pass that up as the sequel title. So we return to playing with Laundryview data. This is a fun service used on campuses and in apartment buildings to let residents track which machines are available, check the status of their wash and get alerts when it is done.

The original code was a very primitive Python script that scraped a specific Laundryview page for my apartment building’s laundry room, formatted the data like syslog and sent it on to Splunk.

I decided remaking the code as a modular input was in order, provided I could make it scrape all of the machines shown on a page automatically. It works, and you can find TA-laundryview on my GitHub account. The readme does point out that you need to know the laundry room (lr) code found in the URL you normally visit to see a room’s status.

Splunk can pull in any textual information you feed it, whether that is data generated by small devices like a Raspberry Pi 2 or data scraped from a site like Laundryview, letting you benefit from machine data that already exists. Let’s explore a day’s data from UoA.

So here is the example data I have been collecting from the University of Alabama laundry rooms. Note that I have defined an input for each laundry room on campus, set to a 15 minute interval and sent to index=laundry, sourcetype=laundry, host=laundryview.com. I have found that 15 minute resolution is enough to be useful without hammering the site too hard.
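
For reference, an input stanza in the spirit of the TA looks roughly like the sketch below. The scheme and attribute names here are illustrative assumptions, so check the TA-laundryview readme for the exact ones; the index, sourcetype, host and 15 minute (900 second) interval are the values described above.

    [laundryview://some_laundry_room]
    lr = <laundry room code from the Laundryview URL>
    index = laundry
    sourcetype = laundry
    host = laundryview.com
    interval = 900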

UoA Laundry Rooms

A pair of stacked column graphs gives us a fun trend of washers and dryers in use for the entire campus population.

> index=laundry site_name="UNIVERSITY OF ALABAMA" type=washer | timechart span=15m count by inUse

UoA Washers

> index=laundry site_name="UNIVERSITY OF ALABAMA" type=dryer | timechart span=15m count by inUse

UoA Dryers

Next we make a bubble chart panel to bring out the machines in an error status, which we define as Laundryview reporting an offline or out of service status. You will find I defined an eventtype for that.
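
The eventtype itself is just a saved search fragment in eventtypes.conf. The status field name and values below are assumptions about how the TA labels machines, so adjust them to match your own data:

    [machine_error]
    search = index=laundry sourcetype=laundry (status="offline" OR status="out of service")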

> index=laundry | stats dc(uniqueMachineID) AS totalMachines by room_name, type | append [search index=laundry eventtype=machine_error | stats dc(uniqueMachineID) AS inError by room_name, type] |  stats sum(totalMachines) AS totalMachines sum(inError) AS inError by room_name, type | eval failure=inError/totalMachines*100

UoA Machine Errors

Here we show it again, but this time with a stats table below it to help spot the laundry rooms with the most unavailable machines.

UoA Machine Error Table

You can see we could run all sorts of trends on the data. Want to bet laundry room usage plummets around the football game schedule? How about throwing machine errors on a map? I actually did make a lookup table of laundry room name to lat/long information. That is when I found out the default map tiles in Splunk do not have enough resolution to get down to a campus level. They only get you down to about the city level, Tuscaloosa in this case, so it was not worth showing.

Other questions you could answer from the data might be:

  1. Do we have any laundry rooms with functioning washers or dryers but none of the other type? Imagine how ticked students would be, stuck with a wet load of clothes they have to carry to the next closest laundry room to dry.

  2. How about alerting when the ratio of machines in an error state hits a certain level compared to the number of machines available in a given laundry room? (See the search sketch after this list.)

  3. Could the data help you pick which housing area you want to live in at school?

  4. How long and how often do machines sit in an Idle status? This maps to a machine that has finished its cycle but no one has opened the machine door to handle the finished load. (eventtype=laundry_waiting_pickup)
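
For item 2, an alert search that builds on the same append/stats pattern as the bubble chart could look something like this; the 25 percent threshold is just an arbitrary example.

> index=laundry | stats dc(uniqueMachineID) AS totalMachines by room_name | append [search index=laundry eventtype=machine_error | stats dc(uniqueMachineID) AS inError by room_name] | stats sum(totalMachines) AS totalMachines sum(inError) AS inError by room_name | fillnull value=0 inError | eval failure=round(inError/totalMachines*100,1) | where failure>=25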

The possibilities are quite fun to play with. Enjoy!

 


Splunk and Apple Swift for Alert Scripting

This week, I attended my first Nashville Cocoaheads meeting in a few years. Cocoaheads is basically an Apple programming club. The topic was “Scripting with Swift” and it was a great presentation by Blake Merryman. Blake works for Griffin Technology, and they were our awesome hosts for the meeting. Griffin is one of my favorite accessory makers. Plus they are right here in Nashville.

Of course I immediately wondered: if I can treat Apple Swift as a shell scripting language… Splunk alerts written in Apple Swift! Oh yeah, why not?!?

And… the short answer is that it works for Splunk running on OSX. This means you could send data from Splunk alerts to OSX code that has full native library access to anything on the Mac.

Here is a very simple example, thanks again to Blake’s example code.

Alert Script Setup

You must have Xcode installed on the Mac as well as Splunk. Xcode provides the Swift language installation.

You need to make a shell wrapper script in the $SPLUNK_HOME/bin/scripts/ folder, because Splunk won’t be able to call a .swift script directly. You will need to add the executable flag to both script files. Then you simply set up the desired Splunk alert to call alertSwift.sh, which is just a shell wrapper that calls the Swift script. The key in the Swift script is the hashbang that tells the system how to execute Swift.

Our Swift code example below simply calls AppleScript to make the Mac say “Splunk Alert.” It is a very simple example, but it shows that if you use all the normal tricks of pulling in the alert script arguments, you could pull in a search results zip file and take actions in Mac OSX based on the data. It could be anything, including Notification Center alerts. Enjoy the possibilities.

Alert Script Code

Example alertSwift.sh: (Make sure you do a chmod +x alertSwift.sh)
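
A minimal wrapper along these lines should do it. This is a sketch rather than Blake’s original, and it assumes the Swift script sits in the same scripts folder:

    #!/bin/sh
    # Shell wrapper so Splunk can launch the Swift script,
    # passing along the standard Splunk alert script arguments.
    "$SPLUNK_HOME/bin/scripts/alertSwift.swift" "$@"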

Example alertSwift.swift: (Make sure you do a chmod +x alertSwift.swift)
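
And a minimal sketch of the Swift side, using NSAppleScript to run the say command described above (again, a sketch rather than the original):

    #!/usr/bin/swift

    import Foundation

    // Run a tiny piece of AppleScript so the Mac announces the alert out loud.
    var errorInfo: NSDictionary?
    if let script = NSAppleScript(source: "say \"Splunk Alert\"") {
        _ = script.executeAndReturnError(&errorInfo)
    }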


Splunk sessionKeys and Python Indentions

On sessionKeys to the Kingdom:

I started making a scripted input app to pull in logs from the LastPass Enterprise API. Everything was progressing nicely until I found I could not retrieve the encrypted API key from Splunk where I had saved it. I was going to be smart and re-use my credentialsFromSplunk.py class that I created for alert scripts. That is when I beat my head on the wall. Scripted inputs get sent a sessionKey if you set the passAuth value for your script in the inputs.conf stanza. Stdin is also how sessionKeys are sent to alert scripts, so I figured my existing code would work great.

I kept getting authentication failures on the API. It turns out I had not put in a logDebug event for the original sessionKey as it came in, so I had not noticed an inconsistency. SessionKeys sent to scripted inputs do NOT have the “sessionKey=” string at the front that Splunk puts on the key sent to alert scripts. Thus re-using my existing code, which clips off those eleven characters, broke the sessionKey value. I share it here in case you are learning new Splunk features that depend on the sessionKey: check your original values if you get authentication errors on the API. The sessionKey could contain extra text you have to remove or be URL encoded.
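
A defensive way to normalize the key, in the Python 2 that Splunk shipped at the time, handles both cases. This is a sketch, not the code from my class:

    import sys
    import urllib

    # Alert scripts receive "sessionKey=<key>" on stdin; scripted inputs receive the bare key.
    session_key = sys.stdin.readline().strip()
    if session_key.startswith('sessionKey='):
        session_key = session_key[len('sessionKey='):]
    # The key may also arrive URL encoded, so decode it before using it against the REST API.
    session_key = urllib.unquote(session_key)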

Remember that you can submit feature enhancement requests through the Splunk support portal. It is important when you find inconsistencies like this to submit them so the Splunk developers can eventually fix them. I did on this one.

On Indention in Python Scripts:

I used to give no thought to using tabs in my Python scripts as long as the code indented right. Now I am firmly in the “four spaces in place of a tab” camp. I have gone through and updated all the code in my git repos to be four-space based. I found I was getting unexplained bugs where code might or might not execute when expected. It turned out to be all indentation related: code was running only when an IF stanza above it was true, and a pesky hidden tab was causing it. So if you edit in vi/vim, make sure to change your config to use four spaces. I found two great links on doing that and putting it into your .vimrc file for persistence.
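
For reference, the relevant .vimrc settings boil down to these:

    " expand tabs to four spaces for consistent Python indentation
    set expandtab
    set tabstop=4
    set softtabstop=4
    set shiftwidth=4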


Splunk Success with Syslog

There is one issue that comes up all the time for IT folks who are new to Splunk: syslog. It is something most IT organizations are already dealing with. It is the easiest log source to get started with and the easiest one to make a mess of.

Here are the common scenarios:

  1. Syslog is already collected from network devices and other appliances such as spam filter systems. It is sent to a Linux syslog server such as rsyslog or syslog-ng. On rare occasions it is something on Windows e.g. Kiwisyslog.
  2. Syslog is not used. But HEY, it is a quick, easy example of collecting the logs the auditors told us we were not collecting. So someone testing Splunk googles around, finds the example of making a network input for syslog and, like magic, logs show up in Splunk. NEAT!! Instant ROI.

I will go ahead and get this out of the way now. NEVER.. EVER… just don’t send syslog straight to Splunk if you want to avoid a lot of headaches. This isn’t Splunk’s fault; it is just the nature of the issue, and it would apply to most log collection products.

Why not to send straight to Splunk?

  1. Disruption of Data Collection:
    a. If you restart the Splunk indexer you are sending to, you lose syslog data. And yes, you will be applying Splunk updates, doing rolling restarts if you get into Index Clustering etc. You will restart Splunk way more often than you would a syslog service on a dedicated server. Restarting the syslog service is also substantially faster than restarting Splunk.
    b. You lose the ability to load balance incoming data across multiple indexers (e.g. Index Clustering)
  2. Splunk Metadata:
    a. If you send different device types’ syslog streams to the same network input on Splunk, you will have a horrible time setting sourcetype and destination index.
  3. Syslog Administration and Configuration:
    a. You get MUCH more flexibility in data handling, routing and filtering with rsyslog or syslog-ng than with a Splunk network port. Maybe you want to drop noisy events before they ever hit your indexers. (See the syslog-ng sketch after this list.)
    b. You likely already have network ACLs in place and syslog configuration done on source devices. You won’t have to change that.
    c. If you use something like Puppet, then re-deploying a failed syslog server with its fairly static configuration is easier and good from the business continuity/disaster recovery plan perspective.
    d. If your syslog server has plenty of storage you can have an extra backup of the original log data by archiving it to compressed files automatically.
  4. Security:
    a. If you want to make Splunk listen on port 514, it will need elevated privileges for the whole splunkd process and its child processes. You will have a much smaller attack surface with a dedicated syslog service. Sure, iptables etc. can be used to trick your way around this, but those are topics not covered here.
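
To illustrate the flexibility in point 3, a syslog-ng snippet along these lines listens on UDP 514 and writes each sending host to its own directory. The paths and names are examples rather than a drop-in config, but this per-host layout is what makes the wildcard forwarder pickup described below possible:

    source s_network {
        udp(ip(0.0.0.0) port(514));
    };

    destination d_per_host {
        file("/var/log/remote/$HOST/$YEAR-$MONTH-$DAY.log" create_dirs(yes));
    };

    log { source(s_network); destination(d_per_host); };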

How do you prepare for success?

Here is the secret sauce to success, something a large number of IT groups do not implement: the PTR DNS record for reverse DNS.

Splunk will try to populate the host field on network inputs using reverse DNS by default. Syslog-ng and rsyslog will do this as well, so you want to make sure DNS records are configured. One other item you may need to consider is DNS caching servers, since DNS performance and the volume of lookups could potentially become an issue. You can read more on the topic in Splunk DNS Caching and dnsmasq.

A bonus note: if you use the Splunk SNMP Modular Input, there is now an option to perform the reverse DNS lookup to get the host field information. An FQDN is way better than an IP. See my post on Running SNMP Modular Input on a Universal Forwarder. The code I contributed got implemented into the available version.

Summing up. Before you do anything with Splunk, prepare your syslog sources by doing three things.

  1. Decide on a good device naming scheme. Perhaps asa8.nyc2.myorg.com represents a firewall in a New York City second office location.
  2. Implement BOTH A and PTR records for the network devices to match that naming scheme. The reverse record is going to be as important as the naming convention.
  3. Make sure your syslog sources are using NTP and preferably GMT for timezone.

This is going to give you three key benefits.

  1. You will be able to use wildcards in the Splunk forwarder pickup configuration (see the inputs sketch after this list). If your network team adds a new switch, then as long as they named it, made BOTH A and PTR records and pointed it at the official syslog server, logs will magically just flow into Splunk. No changes required by the Splunk admin. It just WORKS for the types of devices you have already configured.
  2. You will easily be able to control which sourcetype and index each device type goes into Splunk with. The host field will be a useful human readable name.
  3. You will be able to add automatic metadata in Splunk based on the device naming convention. Such as geolocation information.
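
As an example of benefit 1, a monitor stanza in this spirit on the syslog server’s forwarder picks up any new host directory automatically. The path, index and sourcetype here are placeholders for your own choices:

    # host_segment = 4 pulls the host name from /var/log/remote/<host>/...
    [monitor:///var/log/remote]
    host_segment = 4
    index = network
    sourcetype = syslog
    disabled = false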



Splunk – Redis Based Lookups

Welcome to my first guest post about Splunk. I am Donald McCarthy, a fellow Splunk enthusiast and friend of George’s.

Lookup is the feature I have grown to use most frequently in Splunk.  It is a feature that can save you huge amounts of indexing license, as well as make your searches more meaningful by providing non indexed information for context.  If you are unfamiliar with the lookup feature, head over here (http://docs.splunk.com/Documentation/Splunk/6.1.2/SearchReference/lookup) to see its official documentation and here (http://www.georgestarcher.com/splunk-a-dns-lookup-for-abuse-contacts/) for a quick tutorial/example of what it can do for you.

Recently, I found myself staring down the barrel of a problem that required a lookup to contain more than 84 million values. At this scale Splunk was happy to ingest, index and then look up data based upon a massive 2 GB CSV. Remember, however, that lookups in excess of 100MB are automatically indexed by Splunk to increase their speed. That is a large impact when most initial Splunk licenses are in the 2GB/day range, and it can impact even terabyte+ size licenses if the lookup is updated often.

In this instance the data was indexed, I just wasn’t sure that I was really getting all the performance I needed to solve this particular problem.  After some reading and a few conversations with some Splunk engineers, I settled on Redis (Redis.io).



Splunk .conf 2014 Slides

** Update Oct 15, 2014 – Poodle SSLv3 Issue**
The talk was given the week before the SSLv3 issue was released. Please remove all references to supportSSLV3Only = true from the configs when you use them. You can also find more from Splunk on the SSLv3 issue and how to mitigate it at http://www.splunk.com/view/SP-CAAANKE

——-

.conf 2014 was a great time this year. Duane and I enjoyed giving the talk “Avoid the SSLippery Slope of Default SSL” with great questions from the audience. I was surprised at the solid turnout for a Thursday 9am talk. My own talk was “From Tool to Team Member: Controlling Systems with Splunk Alert Scripts”

Here are the PDF copies of the slides for both talks:

Increasingly, production security requires more than using default SSL certificates. This session will cover best practices for implementing your own SSL certificates on all Splunk channels. The right configuration and steps can provide both encryption and authentication needed for today’s due diligence requirements.

We will go in depth into setting up alert scripts that can make web services calls to other devices such as intrusion prevention systems. This gives Splunk the ability to actively control such systems. Code samples will be provided that include being able to save login credentials encrypted within Splunk. Using alert scripts we can change Splunk from just a tool into an IT team member taking actions on your behalf!


Splunk Capturing SNMP Traps on a Universal Forwarder

Damien Dallimore of Splunk wrote a great Modular Input for SNMP on Splunkbase. It is written in such a way that you install it on your Splunk server (hopefully that is unix based). Then you set up an inputs.conf in the app like this:
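
A trap-listening stanza looks roughly like the sketch below. The attribute names are from memory of the app’s README and may differ in your version, so treat this as an assumption-laden sketch rather than a drop-in config; the community string, trap_host and UDP 162 port are the pieces discussed later in this post.

    # snmp_ta trap listener (attribute names may vary by app version)
    [snmp://campus_traps]
    # listen for traps rather than poll attributes
    snmp_mode = traps
    # IP of the box running this input, and the standard trap port
    trap_host = 10.0.0.5
    trap_port = 162
    communitystring = public
    index = main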

What if you don’t want traps going directly to your Splunk server?

Why, yes you can indeed use the snmp_ta on a Universal Forwarder. It needs to have pysnmp installed, so usually you are going to be OK on most Linux systems.

You just have to make a couple of changes to snmp_ta/bin/snmp.py:

1. You absolutely must change the hash bang at the top of the file rather than keeping the existing path to the Splunk python instance. You might need to change it to something like the following, depending on your system.
#!/usr/bin/python

2. If you do as I do and make copies of TAs using a naming convention such as TA_app_snmp_cal01, then you have to edit two other lines in the snmp.py file: change the paths indicated in egg_dir and mib_egg_dir to something like:
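
The exact assignments vary by version of snmp.py, but the edit boils down to pointing both paths at the renamed app folder, roughly along these lines:

    # point the bundled egg and MIB paths at the renamed app folder
    egg_dir = os.path.join(os.environ['SPLUNK_HOME'], 'etc', 'apps', 'TA_app_snmp_cal01', 'bin')
    mib_egg_dir = os.path.join(egg_dir, 'mibs')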

That should do the trick. Now the Universal Forwarder you put the app onto should start listening on UDP 162 for SNMP traps. Just be sure to change the community string and the trap_host to your settings. The trap_host should be the IP of the forwarder you are putting this onto.

MIBs:

Do keep in mind that the parsing of the traps happens at the time they are received and indexed. So you need to install the right MIBs into the app’s bin/mibs folder. It will honestly drive you to drink; it is a painful process. You can read more on that process in a two part series on SNMP polling using the Modular Input.

Host Field and SNMP Traps:

The way the snmp_ta works, the host field ends up being the IP address of the system that sent the trap. I prefer my host field to be FQDN names that complement my earlier post on auto lookup of location by host. I modified the TA’s code to allow a new inputs.conf option in the stanza. It is called trap_rnds. I should be submitting a pull request to Damien soon and submitting the feature back to him. Be watching for the updated app. Keep in mind that if you use this feature, you will generate a reverse DNS lookup to your infrastructure for each trap event that comes in, so you may need to consider whether that will impact the DNS servers that system uses.
