Splunk – Metrics and a Poor Man’s mcollect

Strangely, the metrics feature in Splunk v7 is missing a command you would expect: mcollect, a command to take search results and collect them into a metrics store index. It would be similar to, but much better than, using the normal stats and collect commands together for tracking things like license or index size over time.

Duane and I were hanging out and decided to make a poor man’s mcollect.

  1. Make a metrics index; mine is called testmetrics
  2. Using inputs.conf in the app context of your choice, set up a batch input
  3. Build a search that formats results to be compatible with the metrics_csv sourcetype
  4. Graph your mstats for fun!

Inputs.conf Example:
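Something along these lines; the path assumes outputcsv's default output directory of $SPLUNK_HOME/var/run/splunk/csv, and the index and sourcetype match the steps above:

[batch://$SPLUNK_HOME/var/run/splunk/csv/metrics_testing*.csv]
move_policy = sinkhole
sourcetype = metrics_csv
index = testmetrics
disabled = 0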

Note we used a naming convention of metrics_testing* so that we can easily target only the CSV files we will be exporting with outputcsv shortly: start with the word metrics, then a second word related to the index we are going to put the data into, then a wildcard. You can see we use the sinkhole policy to ensure the file is removed after indexing, to avoid filling the disk if you do a lot of metrics this way.

Note that indexing data into a metrics index is counted as 150 bytes per metric event against your Splunk License.

Metrics Gen Search:

In this search we gather the total size per index and reformat the fields to match the metrics_csv format.

Then we use an evil subsearch trick to generate a timestamped filename for outputcsv to write to.
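A sketch of what such a search can look like; the REST endpoint supplies the per-index size, the metric name index.size_mb is made up, and the column layout follows the metrics_csv docs (metric_timestamp, metric_name, _value, plus dimensions):

| rest splunk_server=local /services/data/indexes
| eval metric_timestamp=now(), metric_name="index.size_mb", _value=round(currentDBSizeMB,2)
| rename title as index_name
| table metric_timestamp metric_name _value index_name
| outputcsv [| makeresults | eval filename="metrics_testing_".strftime(now(),"%Y%m%d%H%M%S") | return $filename]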

You could easily schedule this search to collect your stats hourly, daily etc. And adapt all this for license usage.

Viewing your New Stats:

The graphing search:
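Something like this, using the Splunk 7.0 mstats syntax and the hypothetical metric name from the sketch above:

| mstats prestats=true avg(_value) WHERE index=testmetrics AND metric_name="index.size_mb" span=1h BY index_name
| timechart avg(_value) span=1h BY index_name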

And a sample from our nice new graph!


Splunk – Metrics: Getting Data In with my HEC Class

If you are using Splunk v7, you may be looking into the new metrics store. It is a specialized index-time summary index.

If you want to play with it, you can use my Splunk HTTP Event Collector Python class. Set up a HEC token and metrics index as outlined in the Splunk docs; look at the section “Get metrics in from clients over HTTP or HTTPS”.

Then write a little Python to make a HEC connection object and set the payload to match the requirements in the Metrics docs.

Here is an example block of code, adapted from the example.py in my class git repo.
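A sketch of what that can look like; the class and method names are based on the repo (check example.py for the exact constructor), and the token, host, and metric values here are placeholders:

from splunk_http_event_collector import http_event_collector

# Placeholders: use your own HEC token and Splunk host
metric_event = http_event_collector("YOUR-HEC-TOKEN", "splunk.example.com")

# Metrics via HEC: "event" must be the literal string "metric";
# metric_name and _value go under "fields", along with any dimensions
payload = {}
payload.update({"index": "testmetrics"})
payload.update({"source": "iot.weather"})
payload.update({"host": "sensor01"})
payload.update({"event": "metric"})
payload.update({"fields": {"metric_name": "temperature.outdoor", "_value": 72.5, "location": "back_porch"}})

metric_event.sendEvent(payload)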

A couple of comments. First, note that the "event" part of the payload, which normally holds your event dict, is just the word "metric". This is required by Splunk for metrics via HEC. Next, put your metric payload into the "fields" part of the HEC payload. This is a dict that HEC turns into index-time extractions. Normally you want to minimize HEC index-time extractions; however, metrics are solely index time. You have to set the metric_name and _value for that measurement in the fields dict. Additionally, any other fields under the fields dict become the dimension fields the Metrics docs talk about.

Enjoy exploring the metrics store in Splunk.


Splunk – Enterprise Security Enhancing Incident Review

I see folks ask a lot about adding fields that were not originally in a notable event to Incident Review in Splunk ES. The initial idea people have is that if they make an adaptive response action, it will magically go get more data and have it show up for that notable event. The seed of the idea is there, but it does not work on its own.

Let’s go backwards through this.

Follow the ES Docs on adding a field to display in incident review *IF* it is in the notable and has a value. http://docs.splunk.com/Documentation/ES/5.0.0/Admin/CustomizeIR

That means the field normally has to already be present in the notable to be displayed. Bear with me; this is the first step, and we can actually make fields show up that are NOT already in the notable event. Let's say we added the field support_ticket to Incident Review Settings.

Next, a magic trick that has nothing to do with adaptive responses: you CAN make new fields that are not in the notable show up, if you have added them as above AND you use a lookup. This is what most folks do not know.

Prior to ES 4.6, the content display for Incident Review was driven by a macro called "Incident Review – Main". This became a search in v4.6.

The default search looks like:

We can cleanly insert our own lookup by making a macro for it and shimming that into the Incident Review – Main search. Just make sure the permissions on your lookup and macro are set so everyone in ES can read them.

Presume we have a lookup that somehow has event_id to support_ticket mapped. Since we have event_id for a notable event the lookup can return the field support_ticket.

We define a macro for the lookup command: get_support_ticket
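For example, a macros.conf entry along these lines (the lookup name support_ticket_lookup is made up; point it at whatever lookup holds your event_id to support_ticket mapping):

[get_support_ticket]
definition = lookup support_ticket_lookup event_id OUTPUTNEW support_ticket
iseval = 0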

We shim that macro into the Incident Review – Main search:
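Conceptually the result looks like this; the stock search text stays as shipped (elided here) and our macro is appended after the suppression portion:

<stock Incident Review - Main search, through its suppression filter> | `get_support_ticket`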

Note I put it after the suppression filter.

Now magically if a notable’s event_id value is added to the lookup, next time someone views that notable in Incident Review they will see the new field support_ticket. If you are clever you can add a workflow action on the support_ticket field to open that ticket via URL in the ticketing system.

The above is helpful already. Think of having a lookup of your cloud account IDs for AWS, Azure, etc., with all sorts of contact and context information. Have the field cloud_account_id show in your notables, and the auto lookup above can surface more context for your SOC team.

Now you can go further by writing a custom alert action using the Splunk Add-on Builder, set up with Adaptive Response support. The code can get data from an external system based on data in the notable event. You could then index that returned data and have a search run across it to maintain a lookup table using outputlookup. Alternatively, you can update KVStore-based lookups directly via the Splunk API. If you are willing to dig into the code, you can see something similar in the modular alert of my TA-SyncKVStore app for Splunk.

Combining all that would let an adaptive response fetch data, store it in Splunk and have it display automatically on existing Notable Events within Incident Review.

Good Luck!




Splunk – Null Thinking

It takes GOOD data hygiene to become a mature Splunk environment. This starts with field parsing and mapping fields to the Common Information Model.

There is one thing some people overlook with their data hygiene and Splunk: when NOT to include a field in the data payload indexed into Splunk. This happens a lot when the logs are internally developed and sent in with the Splunk HTTP Event Collector.

Sample JSON event fields:
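Something like this, with illustrative field names:

{
    "action": "success",
    "user": "gstarcher",
    "src_ip": "10.10.10.10",
    "message": "user login",
    "killer": ""
}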

Notice how the field killer has an empty value?

Why does this matter?

1. It consumes license for the field name string in every event. This can be really bad at volume.
2. It throws off the Splunk automatic statistics for field vs. event coverage.
3. It makes certain search techniques harder to use.

License Impact:
Empty fields in every event payload add up. They impact not only the network traffic needed to get the data to Splunk but also license usage and storage within Splunk. Splunk does compress indexed data, but at very large volumes empty fields across your events still matter.

Event Coverage:

Splunk makes it handy to click on an “interesting” field in the search interface to see basic stats about the field and its values. If an empty-value field is present in every event, you can end up with 100% coverage; the field exists in all events from a search perspective. This is misleading if you want to know what percentage of events have actual useful values.


NULL Usage:

Null is a STATE, not a value: something either is NULL or is NOT NULL. It says nothing about the content of the field value.

If you search like this:
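For example (the sourcetype and field names are illustrative):

index=main sourcetype=mygame | where isnull(killer)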

Empty strings will break easy-to-use null-based functions like coalesce, isnull, and isnotnull.

By not sending in empty-string fields, you can easily find events that are missing a field you expected to be there. You can use the event coverage mentioned above or a simple search like the one below.
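For instance, to find events that are missing the field entirely:

index=main sourcetype=mygame NOT killer=*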

Lookups and KVStore:

Another place where making sure you have stripped off empty-string fields matters is when updating KVStore lookup tables via the Splunk REST API. If you leave the empty field in the data payload, you get a row in your lookup with an empty string for that column.

If you are programmatically maintaining a lookup table that you use as an event filter, this can break the common search pattern below. It is the pattern you use when you simply want events whose field values are (or are not) in a lookup table, regardless of the other information in the table.
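The pattern looks something like this (lookup and field names are made up):

index=main sourcetype=authentication NOT [| inputlookup terminated_users.csv | fields user]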


Save bandwidth, reduce field count, and make null testing easy by dropping all fields that have empty values from your events before indexing.

A quick Python example:

Let’s say we have some API that returns this JSON dict for an event we want to index.
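Something along these lines, with illustrative field names:

event = {
    "action": "success",
    "user": "gstarcher",
    "src_ip": "10.10.10.10",
    "test": None,
    "killer": ""
}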

This Python example gives us a stripped-down dict we can use instead.
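A minimal sketch using a dict comprehension to drop None and empty-string values:

# Keep only the fields that carry a real value
stripped_event = {key: value for key, value in event.items() if value not in (None, "")}
print(stripped_event)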

This gives us:
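Assuming the sketch above, the printed output would be:

{'action': 'success', 'user': 'gstarcher', 'src_ip': '10.10.10.10'}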

You can see the None and empty fields test and killer are removed. Now we have a nice JSON dict to use as our event payload to index. Real handy if you are using my Splunk HTTP Event Collector class for data ingestion.


Splunk New Technology Add-ons: SyncKVStore and SendToHEC

I recently updated and replaced older repositories from my GitHub account that were hand-made modular alerts for sending search results to other Splunk instances. The first one sends the search results to a Splunk HTTP Event Collector receiver. The second one came from our Splunk .conf 2016 talk on KVStore; it was useful for sending search results (typically an inputlookup of a table) to a remote KVStore lookup table.


You can find the updated Send To HEC TA on Splunkbase: TA-Send_to_HEC or in my GitHub repository: TA-Send_to_HEC.

This is useful for taking search results and sending them to another Splunk instance using HEC. If you choose JSON mode, it will send the results as a JSON payload of all the fields after stripping any hidden fields (hidden fields start with an underscore). RAW mode is a new option that takes the _raw field and sends ONLY that field to the remote HEC receiver.


The KVStore sync TA has been completely redone. I have submitted it to Splunkbase, but for the moment you can get it from my GitHub repository: TA-SyncKVStore

Originally it only sent search results to a remote KVStore. Now it also has two modular inputs. The first pulls a remote KVStore collection (table) and puts it into a local KVStore collection. The second pulls the remote KVStore collection but indexes it locally in JSON format. It will strip the hidden fields before forming the JSON payload to index. You are responsible for making sure all the appropriate and matching KVStore collections exist.

If you look in the code you will notice an unusual hybrid: the Splunk SDK for Python handles the KVStore actions, and my own Python class does the batch saving of data to the collection. I could not get the batch_save method from the SDK to work at all. My own class already existed and was threaded for performance from my old version of the modular input, so I just used the SDK to clear data if you want the replace option, and my own code for saving the new or updated data.

I rebuilt both of these TAs using the awesome Splunk Add-on Builder. This makes it easy in the SyncKVStore TA to store the credentials in Splunk's internal encrypted storage. One update to my previous post on credential storage: the Add-on Builder was recently updated and now gives much better multiple-credential management, with a “global account” pull-down selector you can use in your inputs and alert actions.


Splunk Stored Encrypted Credentials

I wrote about automating control of other systems from Splunk back in 2014. Things are very different now in the framework and SDK support Splunk provides. I have been looking to update some of the existing stuff in my git repo using the Splunk Add-on Builder, which handles a lot of the work for you when integrating with Splunk.

We now have modular alerts, which are the evolution of the alert script stuff we were doing in 2014. Splunk also now has modular inputs, the old-style custom search commands, and the new-style custom search commands. In all cases, you may want to use credentials for a system that you do not want hard coded or left unencrypted in the code.

The Storage Passwords REST Endpoint

You will typically find two blog posts when you look into storing passwords in Splunk: mine from 2014 and the one from Splunk in 2011, which I referenced in my detailed post with code. Both posts mention a critical point: the access permissions needed to make it work.

Knowledge objects in Splunk run as the user that owns them. I am talking about the Splunk application user context, not the OS account you start Splunk under. If I run a search, save it as an alert, and attach an alert action, the code that executes in the alert action runs with my Splunk user permissions, as the owner of the search that triggered it at the time.

This is a critical point because you used to have to have a user capability known as ‘admin_all_objects’. Yes, that is as godlike as it sounds. It is normally assigned to the admin user role. That changed with Splunk 6.5.0. There is a new capability you can assign to a Splunk user role called ‘list_storage_passwords’. This lets your user account fetch from the storage passwords endpoint without being full admin over Splunk. It still suffers one downside: it is all-or-nothing access. If you have this permission you can pull ALL encrypted stored passwords. Still, it is an improvement. Yes, it can be misused by Splunk users with the permission if they figure out how to directly pull the entire storage. You have to decide who your adversary is: the known Splunk user who could pull it out, or an attacker or red team person who finds credentials stored in scripts, either directly on the system or in a code repository. I vote for using the storage as the better of the two choices.

Stored Credentials:

Where are they actually stored? On that point I am not going to bother with old versions of Splunk. You should be life cycle maintaining your deployment so I am going to just talk about 6.5.0+.

You need a username, the password, a realm, and the app context you want to put it in. Realm? Yeah, that is a fancy name for “what is this credential for,” because you might actually have five different accounts named admin. How do you know which admin you want for a given use? Let's say I have the username gstarcher on the service adafruit.io. I want to store that credential so I can send IoT data to my account there. I also have an account named gstarcher on another service, and I want Splunk to be able to talk to both services using different alerts or inputs or whatever. So I use the realm to say adafruitio, gstarcher, password to define that credential. The other might be ifttt, gstarcher, apikey. I can tell them apart because of the realm.

Wait, what about app context? If you have been around Splunk long, you know that all configurations and knowledge objects exist within “applications”, aka their app context. If you make a new credential via the API and do not tell the command what application you want it stored under, then it will use the one your user defaults to. That is most often the Search and Reporting app, aka search. That means if you look in $SPLUNK_HOME/etc/apps/search/local/passwords.conf you will find the credentials you stored.

Example passwords.conf entry:
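It looks something like this, using the realm and username from the example above (the hash shown here is a fake placeholder):

[credential:adafruitio:gstarcher:]
password = $1$jdJEgcSfX92oGZyfsPrSgQ==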

Do you notice it is encrypted? Yeah, it will be encrypted ONLY if you add the password using the API calls. If you do it by hand in the .conf text file then it will remain unencrypted. Even after a Splunk restart. This is odd behavior considering it uses splunk.secret to auto encrypt passwords in files like server.conf on a restart. So don’t do that.

How is it encrypted? It is encrypted using the splunk.secret private key for the Splunk install itself on that particular system. You can find that in $SPLUNK_HOME/etc/auth. That is why you tightly control who has access to your Splunk system at the OS level; audit it, alert on SSH into it, etc. This file is needed because the software must have a way to know its own private key to decrypt things. Duane and I once wrote something in 30 minutes on a Saturday to decrypt passwords if you have the splunk.secret and conf files with encrypted passwords. So protect the private key.

Let me say this again. The app context ONLY matters for where the password lands from a passwords.conf perspective. The actual storage_passwords REST endpoint has no care in the world about app permissions for the user. It only checks whether you have the capability list_storage_passwords. It will happily return every stored password to a GET call. It will ONLY filter results if you set the app name when you make the API connection back to the Splunk REST interface; if you don't specify the app as a filter, it will return ALL credentials stored. Other than that, it is up to you to use username and realm to grab just the credential you need in your code. Don't like that? Then please, please log a Splunk support ticket of type Enhancement Request against the core Splunk product asking for it to be updated to be more granular and respect app context permissions. Be sure to give a nice paragraph about your particular use case; that helps their developer stories.

Splunk Add-on Builder:

There are two ways the Splunk Add-on Builder handles “password” fields. First, if you place a password field in the Alert Actions Inputs panel for your alert, the Splunk GUI will obscure the password. The problem is that it is NOT encrypted. Let’s say you made this alert action. You attach your new alert action to a search. The password gets stored unencrypted in savedsearches.conf of the app where the search is saved.

The Add-on Builder provides an alternative solution that does encrypt credentials. You have to use the Add-on Setup Parameters panel and check the Add Account box. This lets you build a setup page you can enter credentials into for the TA. Those credentials will be stored in passwords.conf for the TA's app context. There is one other issue: currently the Add-on Builder internal libraries hard code the realm to be the app name. That is not great if you are making an Adaptive Response for Splunk Enterprise Security and want to reference credentials stored using the ES Credential Manager GUI. If you are making a TA that will never have multiple credentials sharing the same username, then this is still OK.

Patterns for Retrieval:

This is where everyone has the hardest time: finding code examples for actually getting your credential back out. It varies based on what you are making, so I am going to show an example for each type. Adapting it is up to you.

Splunklib Python SDK:

You will need to include the splunklib folder from the Splunk Python SDK in your app's bin folder for the newer non-InterSplunk style patterns. Yeah, I know, why should you have to keep putting copies of the SDK in an app on a full install of Splunk that should already have it? Well, there are reasons. I don't get them all, but it has to do with design decisions and issues around paths, static vs. dynamic linking concepts, etc. All best left to the Splunk dev teams. Splunk admins hate the resulting larger application bundles, but it is what it is.

Adding a Cred Script:

This is just a quick script that assumes it sits in a folder with splunklib one folder level up, which is why the sys.path.append is what it is in this example. This is handy if you are a business with a central password control system: you could use it as a template for how to reach into Splunk and keep the credentials Splunk needs in sync with the centrally managed credential.
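A minimal sketch, assuming splunklib sits one folder level up and that the host, admin credentials, and realm/username values below are placeholders you replace with your own:

import os
import sys

# splunklib lives one folder level up in this example
sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
import splunklib.client as client

# Placeholder connection and credential values
SPLUNK_HOST = "localhost"
SPLUNK_ADMIN = "admin"
SPLUNK_ADMIN_PASSWORD = "changeme"

NEW_REALM = "adafruitio"
NEW_USERNAME = "gstarcher"
NEW_PASSWORD = "super-secret-api-key"

# Connect with an account that has rights to storage_passwords,
# scoped to the app context we want the credential stored under
service = client.connect(host=SPLUNK_HOST, port=8089,
                         username=SPLUNK_ADMIN, password=SPLUNK_ADMIN_PASSWORD,
                         app="search")

# Remove any existing credential for this realm/username, then store the new one
for cred in service.storage_passwords:
    if cred.realm == NEW_REALM and cred.username == NEW_USERNAME:
        service.storage_passwords.delete(username=NEW_USERNAME, realm=NEW_REALM)
        break

service.storage_passwords.create(NEW_PASSWORD, NEW_USERNAME, NEW_REALM)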

Modular Alert: Manual Style

The trick is always how you get the session_key to work with. Traditional modular alerts send the information into the executed script via stdin. So here we grab stdin, parse it as JSON, and pull off our session_key. Using that, we can make a simple connection back to Splunk with the session_key and fetch the realm/username that are assumed to be set up in the modular alert configuration, which is also sent in that payload of information.
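A sketch of the pattern, assuming your alert configuration carries realm and username parameters (the parameter names are illustrative):

import json
import sys

import splunklib.client as client

# Splunk sends the alert payload to the script on stdin as JSON
payload = json.loads(sys.stdin.read())
session_key = payload['session_key']
config = payload.get('configuration', {})

# The realm/username we expect the alert configuration to carry
wanted_realm = config.get('realm')
wanted_username = config.get('username')

# Connect back to the local Splunk REST API with the session key
service = client.connect(token=session_key, host='localhost', port=8089)

credential_password = None
for cred in service.storage_passwords:
    if cred.realm == wanted_realm and cred.username == wanted_username:
        credential_password = cred.clear_password
        break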

Add-on Builder: Alert Action: Fetch realm other than app

Again it comes down to how do you obtain the session_key of the user that fires the knowledge object. The app builder has this great helper object and session_key is just a method hanging off it. We do not even have to grab stdin and parse it.
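A sketch of what that can look like inside the builder-generated alert action code, assuming the helper exposes the session key as helper.session_key (treat that attribute and the parameter names as assumptions and confirm against your generated code):

import splunklib.client as client

def get_credential(helper, wanted_realm, wanted_username):
    # Assumption: the Add-on Builder helper exposes the session key
    # of the user that fired the knowledge object
    session_key = helper.session_key

    service = client.connect(token=session_key, host='localhost', port=8089)

    # Walk the stored credentials and return the one we want
    for cred in service.storage_passwords:
        if cred.realm == wanted_realm and cred.username == wanted_username:
            return cred.clear_password
    return None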

Add-on builder: Alert Action: App as realm

Just call their existing method; you only specify the username because the realm is hardcoded to the app name.
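If memory serves, the call looks something like the line below; the exact helper function name can differ between Add-on Builder versions, so treat it as an assumption and confirm it against the code the builder generates for you:

# Assumption: the helper looks up the stored credential by username only,
# with the realm hardcoded to the app name by the Add-on Builder
credential = helper.get_user_credential("gstarcher")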

Custom Search Command: Old InterSplunk Style

In an old-style custom search command, the easiest pattern is to leverage the Intersplunk library to grab the sent “settings”, which include the sessionKey field. After we have that, we are back to our normal Splunk SDK client pattern. You can see we are just returning all credentials. You could use arguments on your custom search command to pass in the desired realm and username and borrow the credential-matching if pattern from the modular alert above.
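A sketch of the old InterSplunk pattern (this simply returns every stored credential as search results):

import splunk.Intersplunk as intersplunk
import splunklib.client as client

# Intersplunk hands us the results plus a settings dict that includes sessionKey
results, dummy_results, settings = intersplunk.getOrganizedResults()
session_key = settings.get('sessionKey')

service = client.connect(token=session_key, host='localhost', port=8089)

# Return every stored credential (realm, username, clear password)
output = []
for cred in service.storage_passwords:
    output.append({'realm': cred.realm, 'username': cred.username, 'password': cred.clear_password})

intersplunk.outputResults(output)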

Custom Search Command: New v2 Chunked Protocol Style

The new v2 chunked style of search command gives us an already authenticated session connection via the self object. Here we do not even need to find and handle the session_key; we just call the self.service.storage_passwords method to get all the credentials and use our usual SDK pattern to get the credential we want. The pattern below does not show it, but you could pass realm and username in via arguments on your custom search command and then use the credential-matching if pattern from the modular alert example above to grab just the desired credential.
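A sketch of the v2 pattern, using a generating command purely for illustration:

import sys

from splunklib.searchcommands import dispatch, GeneratingCommand, Configuration

@Configuration()
class GetCredsCommand(GeneratingCommand):

    def generate(self):
        # self.service is an already authenticated splunklib Service object
        for cred in self.service.storage_passwords:
            yield {'realm': cred.realm, 'username': cred.username, 'password': cred.clear_password}

dispatch(GetCredsCommand, sys.argv, sys.stdin, sys.stdout, __name__)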

Modular Input: Manual Style

I honestly recommend using the Add-on Builder these days. But if you want to use credentials with a manually built input, Splunk has documentation here: http://dev.splunk.com/view/SP-CAAAE9B#creds. Keep in mind you have to set up which user to send a session_key for by specifying the name in passAuth in the inputs.conf definition.
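For reference, a hand-built scripted input stanza names the user whose session key gets handed to your script, something like this (the app and script names are made up):

[script://$SPLUNK_HOME/etc/apps/my_app/bin/my_input.py]
passAuth = splunk-system-user
interval = 300
disabled = 0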

Modular Input: Add-on Builder

This works the same as our alert actions because of the helper object and the wrapping the Add-on Builder does for us. See the other Add-on Builder examples above. It is much easier to use and could be made to use the GUI and named user credentials.


Splunk Getting Extreme Part Eight

Extreme Search ships with some other commands. One of them, xsGetDistance, implements the Haversine formula for calculating physical distance. We can couple that with the Splunk iplocation command to find user login attempts whose implied travel speed is too fast to be realistic.

Context Gen:

Class: default

First we will create a default context with a maximum speed of 500mph. Note how we do not specify the class argument.

| xsCreateUDContext name=speed container=travel app=search scope=app terms="normal,fast,improbable,ludicrous" type=domain min=0 max=500 count=4 uom=mph

Class: all

Second we will create a context for the class all with the same maximum speed of 500mph. We could use a different maximum if we wanted here.

| xsCreateUDContext name=speed container=travel app=search scope=app terms="normal,fast,improbable,ludicrous" type=domain min=0 max=500 count=4 uom=mph class=all

Class: foot

Last we will create a context for the class foot with a maximum speed of 27.8mph. This is approximately the maximum foot speed of a human. This could be useful if measuring speed across a place like a college campus.

| xsCreateUDContext name=speed container=travel app=search scope=app terms="normal,fast,improbable,ludicrous" type=domain min=0 max=27.8 count=4 uom=mph class=foot


We will pretend my ssh authentication failures are actually successes. This is just because it is the data I have easily available.

Class: all

tag=authentication action=failure user=* src_ip=* app=sshd | iplocation prefix=src_ src_ip | sort + _time | streamstats current=t window=2 earliest(src_lat) as prev_lat, earliest(src_lon) as prev_lon, earliest(_time) as prev_time, earliest(src_City) as prev_city, earliest(src_Country) as prev_country, earliest(src_Region) as prev_region, earliest(src) as prev_src by user | eval timeDiff=(_time - prev_time) | xsGetDistance from prev_lat prev_lon to src_lat src_lon | eval speed=round((distance/(timeDiff/3600)),2) | table user, src, prev_src, src_Country, src_Region, src_City, prev_country, prev_region, prev_city, speed | eval travel_method="all" | xswhere speed from speed by travel_method in travel is above improbable | convert ctime(prev_time)

Class: foot

tag=authentication action=failure user=* src_ip=* app=sshd | iplocation prefix=src_ src_ip | sort + _time | streamstats current=t window=2 earliest(src_lat) as prev_lat, earliest(src_lon) as prev_lon, earliest(_time) as prev_time, earliest(src_City) as prev_city, earliest(src_Country) as prev_country, earliest(src_Region) as prev_region, earliest(src) as prev_src by user | eval timeDiff=(_time - prev_time) | xsGetDistance from prev_lat prev_lon to src_lat src_lon | eval speed=round((distance/(timeDiff/3600)),2) | table user, src, prev_src, src_Country, src_Region, src_City, prev_country, prev_region, prev_city, speed | eval travel_method="foot" | xswhere speed from speed by travel_method in travel is above improbable | convert ctime(prev_time)


We combined a User Driven context with another XS command to provide ourselves an interesting tool. We also saw how we could use different classes within that UD context to answer the question on a different scale. Try adding another class like automobile with a max=100 to find speeds that are beyond safe local travel speeds.

This would be really fun when checking webmail logs to find compromised user accounts, especially if you combine it with Levenshtein checks for look-alike domains sent to users to build the list of whom to check.