Splunk Getting Extreme Part Two

Part one gave us a walkthrough of a simple anomalous search. Now we need to go over some foundational knowledge about search construction for building Extreme Search contexts.

Comparing Search Methods

Traditional Search

This is what we did in part one. We ran a normal SPL search across regular events then used a bucket by _time and stats combination to get our statistics trend over time. This is handy when your event data is not tied to an accelerated Data Model.

Context Gen Search Pattern:

search events action=failure | bucket _time span=1h | stats count by _time, src | stats min, max etc | XS Create/Update
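The bucket and stats steps in that pattern can be sketched outside of Splunk. This is only a rough illustration in Python, assuming made-up failed login events with an epoch `_time` and a `src` field. It is not how Splunk implements the commands.

```python
from collections import Counter

def bucket_hour(epoch_seconds):
    """Floor an epoch timestamp to the start of its hour, like `bucket _time span=1h`."""
    return epoch_seconds - (epoch_seconds % 3600)

def count_by_hour_and_src(events):
    """Roughly what `stats count by _time, src` produces after bucketing."""
    counts = Counter()
    for event in events:
        counts[(bucket_hour(event["_time"]), event["src"])] += 1
    return counts

def overall_stats(counts):
    """Collapse the per-hour counts into the min and max a context needs."""
    values = list(counts.values())
    return {"min": min(values), "max": max(values)}

# Hypothetical failed login events (epoch seconds, source address).
events = [
    {"_time": 3600 * 10 + 5, "src": "10.0.0.1"},
    {"_time": 3600 * 10 + 900, "src": "10.0.0.1"},
    {"_time": 3600 * 10 + 30, "src": "10.0.0.2"},
    {"_time": 3600 * 11 + 1, "src": "10.0.0.1"},
]
counts = count_by_hour_and_src(events)
stats = overall_stats(counts)
```

The point is the shape of the data: one count per hour per source first, then overall statistics computed across those counts.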

Search Speed:

tag=authentication action=failure

“This search has completed and has returned 8,348 results by scanning 14,842 events in 7.181 seconds”

tstats Search

Splunk is great at the “dynamic schema”, aka search time extractions. This flexibility comes at the cost of speed when searching. An Accelerated Data Model gives a step up in performance by building an indexed map of a limited set of fields based on that data. This is much faster to search, at the trade off of only being able to specify fields that are mapped in the Data Model. Tstats means tsidx stats. It works on the tsidx index files of the raw data, plus it runs the equivalent of “ | datamodel X | stats Z” to catch data that is not accelerated already. This makes it a middle ground between searching only accelerated data and searching only non accelerated data.

Context Gen Search Pattern:

| tstats count from datamodel=…. by _time… span=1h | stats min, max etc | XS Create/Update

Search Speed:

| tstats count from datamodel=Authentication where nodename=Authentication.Failed_Authentication

“This search has completed and has returned 1 results by scanning 12,698 events in 1.331 seconds”

tstats summariesonly=true Search

Using summaries only with tstats tells Splunk to search ONLY the data buckets that have had their Data Model map acceleration build completed. It leaves off the attempt to even check for non accelerated data to return. This does mean you can miss data that has not yet been accelerated. Or you can miss data if something happens where acceleration data has to be rebuilt. This often happens in an index cluster after a rolling restart.

As a ballpark, the accelerated data copy is going to consume extra storage on the order of 3.4x the size of the indexed data. We are trading that storage for search speed. So keep that in mind when you decide how much data to accelerate.

Context Gen Search Pattern:

| tstats summariesonly=true count from datamodel=…. by _time… span=1h | stats min, max etc | XS Create/Update

Search Speed:

| tstats summariesonly=true count from datamodel=Authentication where nodename=Authentication.Failed_Authentication

“This search has completed and has returned 1 results by scanning 10,081 events in 0.394 seconds”


We can see significant speed increases in the progression across how we constructed the searches.

  1. Traditional Search took 7.2 seconds

  2. tstats took 1.3 seconds

  3. tstats summariesonly=true took 0.4 seconds.

This tells us that when we want to generate stats trends for Extreme Search contexts over large data sets we should use tstats, and with summariesonly=true where we can. That often makes it trivial even in multi TB/day deployments to generate and update our XS search contexts quickly, even over months of data. That is handy when you are trying to “define normal” based on the existing data. All the above speeds are just using Splunk on my late 2012 MacBook Pro. Real indexers etc will perform even better. The point is to show you the gains between the base search methods when building your XS contexts.
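To put those timings side by side, here is a small Python calculation of the speedup of each method relative to the traditional search. It only uses the three run times quoted above. Keep in mind the three searches scanned different event counts, so treat the ratios as a rough guide.

```python
# Run times in seconds, taken from the three search results quoted above.
timings = {
    "traditional": 7.181,
    "tstats": 1.331,
    "tstats summariesonly=true": 0.394,
}

baseline = timings["traditional"]
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

That works out to roughly 5.4x for tstats and roughly 18.2x for tstats with summariesonly=true.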

The next posts in our series will focus on actual search use cases and the different XS context types.

Splunk Getting Extreme Part One

This is going to be a series of posts on Splunk Extreme Search (XS). It is for folks with Splunk Enterprise Security (ES). At this time that is the only way to get Extreme Search. It comes as an included Supporting Addon. You can get the Extreme Search Visualization (XSV) app from Splunkbase, but it does not have all the commands to fully use Extreme Search (XS).

Extreme Search is a tool for avoiding making searches that rely on static thresholds. The Splunk documentation example walks through this based on a simple Authentication failure example. http://docs.splunk.com/Documentation/ES/4.5.1/User/ExtremeSearchExample

I like to explain XS this way. You still are filtering search results on a range of values. The difference is rather than hard coding a simple threshold for all data like failureCount > 6 you build a “context” profiled per “class” such as src, user, app so you can search filter on a word like “anomalous”, “medium”, or “extreme”. Those terms are range mapped to values that got calculated based on your existing data. This is way better than simple thresholds.

Contexts are nothing but fancy lookup tables. They actually get created in the desired app context’s lookup folder.

To use XS we need to make the “context” and ideally freshen it up on some time interval using a scheduled search. Then we have our search that uses that context to filter for what we are looking for.

Construction of a context generating search:

  1. We need a search that gets the data we want bucketed into time chunks that are meaningful to us.

  2. Next we generate the statistics that XS needs to generate our context lookup table for us based on the data.

  3. We calculate/handle the depth of our context by working with the values such as max, min, and what are called cross over points. We will talk more about those shortly.

  4. We add on the context create/update statement.


This example needs both the XS and XSV apps installed. XSV adds a command called xsCreateADContext that we will need. This stands for Extreme Search Create Anomaly Driven Context. All these XS commands are just custom search commands from a Splunk perspective.

We are interested in event per second spikes beyond “normal” for a sending host. We will take advantage of Splunk’s own internal metrics logs to do this.

Context Generation:

This search will give us all metrics events bucketed into 5 minute averages for a host by day of week and hour of day.

index= _internal source=*metrics.log group=per_host_thruput | bucket _time span=5m | stats max(eps) as eps by _time, series, date_hour, date_wday

Next we expand that to generate the overall statistics.

index= _internal source=*metrics.log group=per_host_thruput | bucket _time span=5m | stats max(eps) as eps by _time, series, date_hour, date_wday | stats avg(eps) as average, stdev(eps) as stddev, count by series, date_hour, date_wday
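For anyone who thinks better in code, the second stats step is just a group by with an average, a standard deviation, and a count. Below is a minimal Python sketch with made-up rows. Treating a single sample as a standard deviation of zero is my assumption for the sketch. That zero case is the one the padding values used later in this post are there to handle.

```python
from collections import defaultdict
from statistics import mean, stdev

def profile(rows):
    """Group eps values by (series, date_hour, date_wday) and summarize each group."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["series"], row["date_hour"], row["date_wday"])
        groups[key].append(row["eps"])
    result = {}
    for key, values in groups.items():
        result[key] = {
            "average": mean(values),
            # statistics.stdev needs at least two samples; use 0.0 for one (assumption).
            "stddev": stdev(values) if len(values) > 1 else 0.0,
            "count": len(values),
        }
    return result

# Hypothetical 5 minute eps values for two hosts.
rows = [
    {"series": "web01", "date_hour": 9, "date_wday": "monday", "eps": 2.0},
    {"series": "web01", "date_hour": 9, "date_wday": "monday", "eps": 4.0},
    {"series": "web01", "date_hour": 9, "date_wday": "monday", "eps": 6.0},
    {"series": "db01", "date_hour": 9, "date_wday": "monday", "eps": 1.5},
]
profiles = profile(rows)
```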


We want to find EPS values that are anomalous to their normal levels. We will be using xsCreateADContext from the XSV app. That command needs the fields min, max, anomalous_normal, and normal_anomalous.

index= _internal source=*metrics.log group=per_host_thruput | bucket _time span=5m | stats max(eps) as eps by _time, series, date_hour, date_wday | stats avg(eps) as average, stdev(eps) as stddev, count by series, date_hour, date_wday | eval min=(average-3*stddev-3), max=(average+3*stddev+3), anomalous_normal=(average-2*stddev-1), normal_anomalous=(average+2*stddev+1)


Last we add the command to create the context file.

index= _internal source=*metrics.log group=per_host_thruput | bucket _time span=5m | stats max(eps) as eps by _time, series, date_hour, date_wday | stats avg(eps) as average, stdev(eps) as stddev, count by series, date_hour, date_wday | eval min=(average-3*stddev-3), max=(average+3*stddev+3), anomalous_normal=(average-2*stddev-1), normal_anomalous=(average+2*stddev+1) | xsCreateADContext name=eps_by_series_5m app=search container=splunk_metrics scope=app terms="anomalous,normal,anomalous" notes="eps by host by 5m" uom="eps" class="series, date_wday, date_hour"


Fields and Depth:

Min: We calculate min to be the average EPS minus 3 times the standard deviation minus 3. We have to subtract off that last 3 in case the standard deviation is zero. If we did not do this we would get a min=max situation when it was zero. XS has to have ranges to work with.

Max: We calculate max to be the average EPS plus 3 times the standard deviation plus 3. We have to add on that last 3 in case the standard deviation is zero. If we did not do this we would get a min=max situation when it was zero. XS has to have ranges to work with.

Anomalous_Normal: This is the cross over point between a low (left side) anomalous section. So it is similar to calculating Min. But we pull it in some from Min by only using 2 times standard deviation and tacking on a 1 to handle the standard deviation being zero.

Normal_Anomalous: This is the cross over point between a high (right side) anomalous section. So it is similar to calculating Max. But we pull it in some from Max by only using 2 times standard deviation and tacking on a 1 to handle the standard deviation being zero.

In my experience so far, the computation of min, max and the cross over points is an experiment. In large volume authentication data I have used 5 times standard deviation for min/max and 3 times for the cross over points. What you use will take some trial and error to fit your data and environment. But you have to create a spread or none of your results will have a depth, and then you might as well search for all raw events rather than looking for “abnormal” conditions.
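The arithmetic from the eval statement is easy to play with outside of Splunk. Here is a small Python sketch of it. The function name and the parameter names (outer, inner and the two pads) are mine, not anything from XS. The defaults match the eval in the search above.

```python
def ad_context_fields(average, stddev, outer=3, inner=2, outer_pad=3, inner_pad=1):
    """Compute the four fields xsCreateADContext expects from an average and stddev."""
    return {
        "min": average - outer * stddev - outer_pad,
        "anomalous_normal": average - inner * stddev - inner_pad,
        "normal_anomalous": average + inner * stddev + inner_pad,
        "max": average + outer * stddev + outer_pad,
    }

# A standard deviation of zero still yields a spread thanks to the padding.
flat = ad_context_fields(10.0, 0.0)

# The multipliers used in the eval above.
spread = ad_context_fields(10.0, 2.0)

# The wider multipliers mentioned for large volume authentication data.
wide = ad_context_fields(10.0, 2.0, outer=5, inner=3)
```

With an average of 10 and a standard deviation of zero this gives min=7, anomalous_normal=9, normal_anomalous=11 and max=13, so there is still a range to work with.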

Breaking down the xsCreateADContext command:

Name: that is the name of our data context. In this case we called it eps_by_series_5m to represent it is events per second by the series field values in 5 minute averages.

App: this is the app context we want our stuff to exist in within Splunk. In this case we have it make the context file in the search/lookup folder.

Container: this is the name of the csv file that is created in the lookup folder location. The trick to remember here is that the entire csv “container” is loaded into RAM when Splunk uses it. So you want to consider putting contexts that have very large numbers of rows into their own containers rather than putting multiple named contexts into the same file.

Scope: this is how to scope access permissions. Normally I just keep it to the app that I am making the context in using the word “app”.

Terms: Since we are making an AD context we need to set “anomalous, normal, anomalous” You can understand why when you look at the graphic below. We are saying that the left low side has the word mapped to the ranges as anomalous, the middle range is normal values, then the right high side is anomalous. This is important because when we use this context to search we will say something like “eps is anomalous” which will match any values in the ranges to the left or right of “normal”. This is what I meant by we range map values to words.

Notes and uom: the notes and units of measure fields are just optional. They only matter when you look at the contexts in something like the XSV app GUI.

Class: this is critical as this is saying we are profiling the values BY series, date_wday and date_hour. This is exactly the same as the split by in a stats command in Splunk.

In this chart notice how the light blue middle region is “normal” and to the left and right we have the “anomalous” zones. This helps you visualize what areas you will match when you make compatibility statements like “is normal”, “is anomalous”, “is above normal”.
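The idea of range mapping values to words, profiled per class, can be sketched in a few lines of Python. This is a deliberately crisp simplification of my own. XS works with a graded compatibility fit rather than a hard cutoff at the cross over points, so treat this only as a mental model. The context values are made up.

```python
# Hypothetical context rows keyed by class (series, date_wday, date_hour).
contexts = {
    ("web01", "monday", 9): {
        "min": 1.0,
        "anomalous_normal": 5.0,
        "normal_anomalous": 15.0,
        "max": 19.0,
    },
}

def term_for(value, context):
    """Map a value to a word using hard cutoffs at the two cross over points."""
    if value < context["anomalous_normal"] or value > context["normal_anomalous"]:
        return "anomalous"
    return "normal"

def classify(row, contexts):
    """Look up the context for the row's class, then map its eps to a term."""
    key = (row["series"], row["date_wday"], row["date_hour"])
    context = contexts.get(key)
    if context is None:
        return None  # no profile was built for this class
    return term_for(row["eps"], context)

quiet = classify({"series": "web01", "date_wday": "monday", "date_hour": 9, "eps": 8.0}, contexts)
spike = classify({"series": "web01", "date_wday": "monday", "date_hour": 9, "eps": 17.5}, contexts)
```

Note how both the low side and the high side map to the same word, which is why the terms list repeats “anomalous” on each end.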


Using our Context:

The key to using the context is making sure we search for data with the same time bucketing and split by fields. Otherwise the context value model won’t line up with our real data very well.

There are several XS commands we should be familiar with in getting ready to use the context.

  1. xsFindBestConcept: this takes our search and compares it to our context and gives us a guide on what “term” we should use for that result line if we wanted to get it from the filter.

  2. xsgetwherecix: this shows us all the results without filtering them but gives us the CIX or compatibility fit value based on the compatibility statement we make. Aka “is anomalous”

  3. xswhere: this is the filtering command we will actually use when we are done.


index= _internal source=*metrics.log group=per_host_thruput | bucket _time span=5m | stats max(eps) as eps by _time, series, date_wday, date_hour | xsFindBestConcept eps from eps_by_series_5m by series, date_wday, date_hour in splunk_metrics



index= _internal source=*metrics.log group=per_host_thruput | bucket _time span=5m | stats max(eps) as eps by _time, series, date_wday, date_hour | xsgetwherecix eps from eps_by_series_5m by series, date_wday, date_hour in splunk_metrics is anomalous



index= _internal source=*metrics.log group=per_host_thruput | bucket _time span=5m | stats max(eps) as eps by _time, series, date_wday, date_hour | xswhere eps from eps_by_series_5m by series, date_wday, date_hour in splunk_metrics is anomalous


There, out of our data for yesterday, only two hours were classified as “anomalous.” We did not have to hard code in specific limiting values. Our own existing data helped create an XS context and we then applied that to data going forward.

Next in our series we will start going through different use cases related to security. We will also cover the other types of contexts than just simply “anomalous.”

Fun with Bitcoin

Just for fun I have been getting reacquainted with bitcoin. Not the mining side of things. I know that it is not cost effective these days to try to mine at home. My experimentation is simply playing with buying small amounts on Coinbase when the price dips.

Next, I ordered the Shift Visa card and tied it to my Coinbase account. It just required linking it to a specific wallet within my Coinbase account. I like that feature since I don’t have to keep a balance in that specific wallet until I am ready to use the card. That reduces the risk of the card being used fraudulently and impacting my balance. The card even comes with the current generation of chips as you would expect for a normal Visa card. The card showed up within several business days and activated with no fuss.

I found I could use the Shift card with my Square Cash app as a bank source. I have not tried receiving to the card via my Cash account because it won’t let you send yourself money. I expect it should work. The Shift Visa card was not recognized by Apple Pay.

Offline Wallet:

I have a kit that I got last year, and just started using as part of this exercise. You can see it at bitcoinpaperwallet.com. I like the design because it obscures the private key when folded and secured with the tamper resistant labels. You can order via paypal or by paying in bitcoin if you want to support the creator.

I researched hardware wallets. The Trezor is interesting to me. It has the aspects of an interactive hardware wallet while being offline regarding the private key. Some sites and wallets can interact with the device for approving transactions. They also just recently added Universal Two Factor, U2F support. The only issue is that the physical size is unwieldy for U2F compared to my Yubikey. I just keep my Yubikey 4 on my keys. I still might ask for a Trezor for Christmas just to have one to play with. For now I will just use a paper wallet in a safe deposit box.

There is another honorable mention. The OpenDime. You can read a great review of it over on CoinJournal. It sounds pretty cool but you need to have the use case of transferring a large amount of bitcoin for a purchase in person. It is a physical electronic wallet where the bearer does not know the private key until he/she breaks it open. So it is well suited for loading up a balance and handing it over to someone else as payment with integrity. This is the closest thing to a ShadowRun credstick that I have seen.

Secondary Wallet:

Coinbase does not support sweeping in a paperwallet/private key. They did a long time ago but removed the feature. So I went with Blockchain.info as my second wallet. They offer MANY security verification features to control the account. I turned them all on.

Another feature I like about the Blockchain.info wallet is that you can setup watch only addresses. That let me enter the public key of each of the paper wallets I made so I can track the active balance as I add to them. All without exposing their private keys.

I really wish Coinbase would add the following things:

  • Trezor support. This would make a great auto “vault” destination for a Coinbase account.
  • Google Authenticator support
  • Private key sweeps for paper wallets
  • Watch only addresses

I know a lot of the hard core crypto currency folks are not fond of places like Coinbase. They adhere to the normal financial laws most banks do. For someone like me that is perfectly ok.

Payment Use Cases:

One interesting use case is the Brave web browser. They recently added a method for charging a wallet specific to your web browser. They call it Brave Payments. That then gets paid out to the sites you frequent. Brave escrows the accumulated payments from Brave users as they try to reach the site owners. According to the Brave documentation the payout threshold triggers at $10 USD in bitcoin. I did toss $5 worth into Brave and will make the effort to hit my normal reading blog sites until it is spent.

Another use case is some vendors actually take direct bitcoin via their payment processors. Adafruit for maker electronics is one of my favorites. If you are into silver/gold bullion, JM Bullion takes bitcoin at the cash discount price.


All in all it was a fun exercise. Still, keeping everything in Fiat currency and bank cards is the simplest for normal activities. It also avoids the possible loss of value due to the high volatility of bitcoin to USD. I will keep playing with small amounts of bitcoin but, as an average person it’s not for my day to day use. I don’t even see requests for payment in bitcoin at information security/hacking conference dealer rooms. They all just have Square credit card readers.

Splunk uLimits and You

Most folks are familiar with the concept of file descriptors in Unix/Linux. It gets mentioned in the Splunk docs for system requirements under the section “Considerations regarding file descriptor limits (FDs) on *nix systems” and for troubleshooting.

I run a very high volume index cluster on a daily basis. Complete with Splunk Enterprise Security. One thing I have seen is if you have timestamps off you can get a VERY LARGE number of buckets for low overall raw data size. If you see nearly 10000 buckets for only several hundred GB of data then you have that problem. Keep in mind that is a lot of file descriptors potentially in use. You should check your incoming logs and you will likely find some nasty multi line log file having a line breaking issue where some large integer is getting parsed as an epoch time and causing buckets with timestamps way back in time.
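To see why a stray integer is so damaging, here is a short Python sketch of what happens when a number in a log line is read as epoch seconds. The sample integer, the fixed “now” value and the one year cutoff are all made up for illustration.

```python
from datetime import datetime, timezone

def as_utc(epoch_seconds):
    """Render epoch seconds as a timezone aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

def looks_misparsed(event_epoch, now_epoch, max_age_days=365):
    """Flag a timestamp that is implausibly old compared to the current time."""
    return (now_epoch - event_epoch) > max_age_days * 86400

now = 1_480_000_000          # hypothetical current time, late November 2016
stray_integer = 123_456_789  # some counter or ID that got parsed as a timestamp

parsed = as_utc(stray_integer)               # lands in 1973
flagged = looks_misparsed(stray_integer, now)
```

An event that lands decades in the past needs its own bucket, which is how you end up with thousands of tiny buckets.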

It got me thinking about the number of open files though. Especially, when also being concerned with all the buckets for data model accelerations to be built for supporting the Enterprise Security application. Maybe FD limits have been interfering with my data model acceleration bucket builds.

Then we had a couple of indexers spontaneously crash their splunkd processes. With an error indicating file descriptor limit problems.

I discussed it with my main Splunk partner in crime, Duane Waddle. He explained that if a process starts on its own without a user session, Linux might not honor ulimits from limits.conf. So even though we had done the right things accounting for ulimits, Transparent Huge Pages etc, we were still likely getting hosed.

Such as this example from /etc/security/limits.conf using a section like below for a high volume indexer in a cluster:

You might be getting the 4096 default if Splunk is kicking off via the enable boot-start option.

You can test this by logging into your server then do the following:

Check the results looking for the Max open files.
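On Linux, the effective limits of a running process are exposed in /proc/<pid>/limits. Below is a minimal Python sketch for pulling the Max open files row out of that file, assuming its usual column layout. The helper names are mine, and you would pass in the PID of your own splunkd process.

```python
def max_open_files(limits_text):
    """Return (soft, hard) for the 'Max open files' row, or None if it is missing."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ['Max', 'open', 'files', soft, hard, 'files']
            return fields[3], fields[4]
    return None

def read_limits(pid):
    """Read the limits file for a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        return handle.read()

# Sample text in the layout of /proc/<pid>/limits, trimmed to a few rows.
sample = """Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            4096                 4096                 files
Max locked memory         65536                65536                bytes
"""
soft, hard = max_open_files(sample)
```

The values are returned as strings because a limit can also read “unlimited”. On a real indexer you would call `max_open_files(read_limits(pid))`.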

Duane suggested editing the Splunk init file. My coworker Matt Uebel ran with that and came up with the following quick commands to make that edit. Substitute your desired limits values.

Now when your system fully reboots and Splunk starts via enable boot-start without a user session you should still get the desired ulimits values.

Review – M3D Micro 3D Printer

I wanted a 3D printer for a while. So, I have been watching Noe and Pedro Ruiz with Adafruit. They have a great show on the Adafruit YouTube channel called 3D Thursdays. Originally, I was holding out for the Flash Forge Creator Pro. Then Adafruit added the retail version of the M3D to their store. At just under $500, that fit a gift card I had been holding onto. It was also a simpler printer for someone getting started.

I am a digital guy. So this whole real world 3D printing thing is new to me beyond watching the Adafruit team. What follows are the things I ran into from the point of view of a complete rookie in this area. I had to have concepts and terms click that experienced folks with printers take for granted.

Buying the printer

I mentioned I had a gift card. It was the typical Visa type card. It was enough to cover the printer. But, I wanted some other items from Adafruit when I ordered. The purchasing system will not let you specify multiple credit cards and how much to apply to which card. Some creativity let me work around that limitation. I purchased an Adafruit gift certificate with the Visa gift card. It promptly came via email. I then applied that to my Adafruit account, creating a store balance. That allowed the printer to deduct from that balance and overflow costs to my credit card for the extra items like filament, etc just like I wanted. One of those items was a good digital calipers tool. Critical later when you want to print your own items.

I also made sure to wait till a Thursday to order. This let me use the 3D Thursday discount code they give out on the show good to midnight. Awesome that it saves me money, but it also lets Adafruit know the sale is because of Noe and Pedro’s hard work. PS saving me money really meant I ordered more to compensate. I had a budget I had set so I used it all.

Out of the box

There are plenty of unboxing videos out there for the micro 3D. It was well packaged. Just be sure to follow their directions step by step. Do not forget to remove all the tape, foam inserts and gantry clips before hooking it to power and USB.

I made sure to have a flat stable table with room for a filament spool stand next to the printer.

Videos I found useful:

Mac vs Windows

I am a Mac person. The current version works but keep in mind the Windows version is ahead of the Mac version in features and firmware. Whenever you start up the M3D software, regardless of platform, it will check the firmware version. The software and firmware versions are intertwined best I can tell. If I update firmware to print from the Windows version, then when I go back to the Mac side the firmware must downgrade before I can use the Mac M3D software. Same going back the other way.


My particular printer does not print center of the bed when on the Mac firmware (2015-10-23-03) despite what the M3D software shows before printing. The print head can go to the center when told. I had even tried the full system recalibrate. The problem goes away when doing the exact same print from Windows with newer firmware (Beta 2016-01-08-12).

You can get scared you bricked your printer if the update gets interrupted. So far I have been able to just go back to the Mac side force a fresh downgrade to recover. There is a tech note on firmware updating in the tech support pages.

Filament and feeding

The biggest problem I had with printing was getting my head around good filament feeding to the printer. Most of the time the internal feed path from under the print bed worked reliably. At times it would still catch. You know when you have filament binding/friction issues because your print will skew as it builds. Drag causes higher layers to be off compared to where they should be.

Remember I said I’m a digital guy? Yeah.. I was dumb and just put the external PLA spool on a hatch box spool holder I got from Amazon. Without what is called a spool bearing. Meaning that it didn’t fit centered and thus did not rotate feeding filament when gently pulled. That gave me most of my skew problems. When Pedro via Twitter pointed out skew means friction I went after fixing that. Did I mention these guys are great about sharing their knowledge? And without making me feel dumb for not seeing the obvious.

I customized and printed two of this bearing from thingiverse. Remember those digital calipers? Came in handy here. But only after I read this Make article on how to use them. When I first took them out, I had an image of Noe and Pedro dressed as wizards waving around digital calipers like a magic wand. I had to measure the hole in the PLA spool and the tube on the holder then customize the bearing print accordingly. They are not a perfect fit because I’m a noob. They probably need to be a hair bigger or have some sort of locking washer to hold them in. Still good enough for me and now I can gently tug with two fingers on the filament and see it turn the spool without catching.

Another thing I learned. Not to be afraid of the emergency stop or abort print buttons. Several times I had not calibrated after changing filament or bumped the print head taking out a print. I could tell I was getting skew or bad layer bonding early. Just be sure if you use emergency stop to use the set bed clear button before you print again. Calibrating the bed position again isn’t a bad idea either. It is better than wasted print time and filament. And this unit is SLOW, but seriously what did you expect for such an easy to use printer for $500?

PLA vs Flex (tough 3D)

So far the best prints I have gotten from the M3D have been with their own PLA 3D ink filament. I have some blue PLA I got from Adafruit and it works, but not as well when comparing my best prints.

Flexible aka tough filament can be Ninjaflex that I got from Adafruit or the new “tough 3d ink” that I got from Micro3D directly. I haven’t opened my Ninjaflex roll yet. But, I have tried the tough ink. You will get absolutely miserable layer bonding if there is any skew at all due to filament binding. It’s obvious because it ends up a big spaghetti mess instead of the object you expected.

I seem to get way better results printing the tough ink filament from Windows with the updated software and firmware that “knows” the filament cheat codes for the new tough ink. On the Mac version, you have to trick it and setup a custom filament profile. That is another reason I wish they would keep the Mac and Windows in sync.


I have been around IT a long time. The concept of a printer language was not new to me. So the slicing/gcode thing didn’t throw me for a loop.

Slicing is where software takes our 3D object and turns it into printer language (gcode). That gcode is the set of actual actions the printer takes to put the filament where it needs to go to create our object. The M3D software does a good starting job at this. I did buy a copy of Simplify3D to get more efficient prints with better support structure. The only downside to using Simplify3D is that you cannot just hit print when ready and have the printer start up. The M3D uses some special serial port communications protocol that prevents Simplify3D from talking directly to the printer. So you have to print “tool path to file” then use the “add spool job” in the M3D spooler engine to print that file. Similar to what you see in this gcode to M3D YouTube video. I found that I have better control over support structures, and overall printing speed seems better due to Simplify3D being smarter than the M3D software itself. Another great feature of Simplify3D is that it lets you animate a preview of how the object will print, so you can look for problems before spending an hour or more on a print.
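If gcode is new to you, it helps to see that each line is just a command followed by lettered parameters. Here is a tiny Python sketch that splits a generic gcode line apart. The sample lines are common generic gcode that I wrote for illustration, not output from the M3D software or Simplify3D, and the parser ignores anything fancier than simple letter and number words.

```python
def parse_gcode_line(line):
    """Split a gcode line into (command, {letter: value}); None for blank or comment lines."""
    code = line.split(";", 1)[0].strip()  # drop any trailing ; comment
    if not code:
        return None
    words = code.split()
    command = words[0].upper()
    params = {word[0].upper(): float(word[1:]) for word in words[1:]}
    return command, params

# A linear move that extrudes filament while travelling to X/Y at a feed rate.
move = parse_gcode_line("G1 X10.5 Y20 E0.8 F1800 ; move and extrude")

# A command that sets the hot end temperature.
heat = parse_gcode_line("M104 S215")

comment_only = parse_gcode_line("; layer 1")
```

A sliced object is simply thousands of lines like these, one after another.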

I do need to spend more time setting up established print profiles in Simplify3D for quality and filament types I want to use often.


This is not an option in M3D software, but is something you can have enabled in Simplify3D. The benefit to me so far is that it gives the printer a chance to purge out filament as it gets warmed up to print my object. That leaves the excess filament off to the side instead of on my object or throwing the raft out of whack.


Raft? I almost always print one on the M3D. At first filament adhesion to the print bed was not an issue. It did get worse over time with many prints. So the printing of a raft gives the print a more level footing. The downside is that often the raft is harder to break away at higher print resolutions for me. I could probably improve this if I get my head around what all the numbers mean in the Simplify3D settings. That is again something you have little to no control over in the M3D software natively.

The rafts on my first M3D software based prints when the printer was new broke away great. Seems both Simplify3D and M3D generated rafts have fused more with the objects than they did at first. I suspect either operator error or all the knocking around. Or maybe it’s print quality related. The higher the setting, the more heat that gets to a small area on this printer.


The Adafruit guys love Octoprint. The idea of using an idle Raspberry Pi2 for a print server is certainly attractive. It would save me from leaving my laptop attached to the printer for hours when I’d rather take it with me to Starbucks. You can even turn on mobile interface for your phone or tablet.

I tried using Octoprint from my Raspberry Pi 2. It was unbearably slow on my B+ so I just stuck with my Pi2. I simply could not get it working with my M3D when starting from the Mac firmware. Octoprint wanted its own firmware update of the M3D to let it communicate. Even after letting the firmware update my printer, nothing would work. I kept having to downgrade back to the Mac firmware version.

Next, I tried using Octoprint with the beta windows firmware. It let me communicate to the printer and did not prompt to upgrade firmware via Octoprint. I could move the head around. When I tried to print a gcode file that I previously ran with M3D spooler; the print head tried to go up out the top of the printer. So I figured I needed to calibrate from within Octoprint. That bought me a small burn hole into the front left as it moved the printhead outside the bed area. I would NOT be messing with Octoprint and M3D if you are a rookie like me. I am giving up on it until better step by step tutorials are out by experienced folks.

Updated There is a M3D-Octoprint tutorial on Adafruit that mentions leveling each corner manually. It is all on me for not reading that tutorial over again before messing with Octoprint.


I will make one comment about M3D support. In the first few days, when I first started having issues that I worried indicated a printer hardware alignment problem, I sent in a support ticket. They are either so busy or so understaffed that I have only received an automated ticket email on it, even days later. I have emailed back asking them to close the ticket. If fast technical support on the retail version is a concern, you should take that into consideration before buying.

I love my M3D as someone new to 3D printing. I have learned a lot and made some mistakes. Hopefully, if you are as new as I am to 3D printing you can learn from my experience so far. It will continue to be good for portable printing and small lower detail parts. I expect in a few months I’ll graduate to the Flash Forge Creator Pro unless something better comes out for 2016.

Splunk, Adafruit.io, and MQTT

I have been enjoying the Splunk HTTP Event Collector (HEC) since its introduction in Splunk v6.3. You can check out a python class I made for it over on the Splunk Blog. That got me started back on data collection from my Raspberry Pi. I can just send data straight into Splunk using the HEC. But what if I wanted data from a remote Raspberry Pi?


That brought me back to messing around with my Beta Adafruit.io account. This is a data bus service being made by Adafruit perfect for your DIY Internet of Things projects. You can find a lot of their learning tutorials on it in the Adafruit LMS. I did some minor playing over the holiday. Then Lady Ada went and made a tutorial specifically on MQTT.

MQTT and Splunk:

I remember seeing a modular input for MQTT in Splunkbase. Why not try it out with Adafruit.io? Well the answer was… it’s Java dependent. I love Damien’s work, which is awesome as always. But, the Splunk admin hat side of me cannot stand having to install Java to make a feature work. He is trying to convince me to make a Python based version myself. We shall see if I can make the time. Was there an alternative? Why… yes there is. That is how we come back full circle to the HTTP Event Collector and my python class.

Mixing Chocolate and Peanut Butter:

I took the Adafruit Python class for adafruit.io and its example code. Just import my HEC class and mod the Adafruit code just a little. Now we have a bridge between the Adafruit MQTT client example and sending it into Splunk via the HEC. This let me take the feed value posted to a given MQTT feed on Adafruit.io and send it into Splunk with a single listening Raspberry Pi running a python script local to my Splunk instance.

The code I used was the MQTT Client example. Just add import and creation of an HEC object at the top of the script right before the Adafruit_IO import section

Next we add the following to the bottom of the message method in the Adafruit code.

That is it. Now as long as the script is running it takes the value from a monitored Adafruit.io MQTT feed and kicks it over into Splunk via the HEC. Enjoy!
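For a feel of what the bridge does, here is a rough standalone sketch. It does not use my HEC class. It builds the HTTP request with the Python standard library instead, so the URL, token, sourcetype and helper name are all placeholders I made up. The message function is assumed to take the same arguments as the on_message callback in the Adafruit MQTT client example.

```python
import json
import urllib.request

# Placeholder values; substitute your own HEC endpoint and token.
HEC_URL = "https://localhost:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

def build_hec_request(feed_id, payload, url=HEC_URL, token=HEC_TOKEN):
    """Wrap an MQTT feed value in an HEC event and return the HTTP request for it."""
    body = json.dumps({
        "sourcetype": "adafruit:io:mqtt",  # hypothetical sourcetype name
        "event": {"feed": feed_id, "value": payload},
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
    )

def message(client, feed_id, payload):
    """Called for each new feed value; forwards it to Splunk via the HEC."""
    request = build_hec_request(feed_id, payload)
    # A self signed certificate on the HEC port needs extra SSL handling not shown here.
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

In the Adafruit example you would assign this function as the client's on_message handler, so every value published to the monitored feed turns into one Splunk event.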

Splunk TA-Openphish

Perhaps I should have waited till Friday to release something related to Phishing. Yeah bad humor, Phish Fryday…

I want to test things a little more before putting this on apps.splunk.com. However, you can find the TA-Openphish over on my Git Repo. It indexes the feed that Openphish provides you. The readme gives you all the items to consider and set up. I provided a way to filter what gets indexed based on ASN or Brand. You can even combine them for an AND type filter. However, the Openphish feed is fairly small so I recommend at least starting out by indexing the whole thing unfiltered.
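The filtering idea is simple enough to sketch in Python. This is not the code from the TA. The field names asn and brand and the sample entries are made up for illustration. An entry must pass every filter that is set, and leaving a filter unset lets everything through.

```python
def keep_entry(entry, asns=None, brands=None):
    """Return True when the entry passes every filter that has been configured."""
    asn_ok = asns is None or entry.get("asn") in asns
    brand_ok = brands is None or entry.get("brand") in brands
    return asn_ok and brand_ok

# Hypothetical feed entries.
feed = [
    {"url": "http://phish.example/a", "asn": "AS64500", "brand": "ExampleBank"},
    {"url": "http://phish.example/b", "asn": "AS64501", "brand": "ExampleBank"},
    {"url": "http://phish.example/c", "asn": "AS64500", "brand": "ExampleMail"},
]

unfiltered = [e for e in feed if keep_entry(e)]
by_brand = [e for e in feed if keep_entry(e, brands={"ExampleBank"})]
combined = [e for e in feed if keep_entry(e, asns={"AS64500"}, brands={"ExampleBank"})]
```

With both filters set, only the entry matching the ASN and the brand at the same time survives.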

I also provided Splunk Enterprise App modular inputs for threatlist correlation integration. As I do not have ES here at home I have not recently tested that.  Jack Coates of Splunk did test my initial threat list for the IPs over the weekend and said it worked fine. Big thanks Jack! I appreciate getting a slice of your very busy time.

I also want to look at expanding this to Critical Stack processed feeds. Maybe, I can normalize Phishtank and Openphish feeds together through it for more coverage on brand protection information going into Splunk.

 ** Note March 5, 2014: corrected Critical Stack link from Threat Stack link.**