Part five brings another use case. This time we build the context from values in the raw data rather than from a calculated value, and then match it against the raw events without bucketing them.
First, we have glossed over an important question. What does data match when you use xsWhere and there is no matching class in the context? It uses the “default” class. Default is the weighted average of all the existing classes within the context. If you look in the csv for the container, you will find lines for your context where the class is an empty string “”. That is default. Default is also the class that is created when no class is specified.
You get a message like the following when a class value is not in the context you are trying to use.
xsWhere-I-111: There is no context 'urllen_by_src' with class 'Not Found' from container '18.104.22.168' in scope 'none', using default context urllen_by_src
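To make the "weighted average" idea concrete, here is a minimal Python sketch of the arithmetic. It is not how Extreme Search is implemented. The three src values and their numbers are made up, and weighting each class by its event count is an assumption for illustration.

```python
# Hypothetical per-class statistics: (class value, event count, average url_length).
# Assumption: the default class weights each class by its event count.
classes = [
    ("10.0.0.1", 100, 40.0),
    ("10.0.0.2", 50, 60.0),
    ("10.0.0.3", 10, 200.0),
]


def weighted_average(rows):
    """Count-weighted mean of the per-class averages."""
    total = sum(count for _, count, _ in rows)
    return sum(count * avg for _, count, avg in rows) / total


def plain_average(rows):
    """Unweighted mean, shown only for comparison."""
    return sum(avg for _, _, avg in rows) / len(rows)


default_avg = weighted_average(classes)
simple_avg = plain_average(classes)
print(default_avg, simple_avg)  # 56.25 100.0
```

With these sample numbers the busy sources dominate, so the weighted value (56.25) sits far below the plain mean (100.0). That is why a default class can look quite different from any single noisy source.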
Use Case: Finding long urls of interest
Just the longer URLs:
Let’s try just creating a context of all our url_length data from the Web Data Model. This version of the search will not break this up by class. We will just see if we can find “extreme” length urls in our logs based on the log data itself.
| tstats dc(Web.url_length) as count, avg(Web.url_length) as average, min(Web.url_length) as min, max(Web.url_length) as max from datamodel=Web where Web.src!="unknown" | rename Web.* as * | xsCreateDDContext name=urllen container=web_stats app=search scope=app type=domain terms="minimal,low,medium,high,extreme" notes="urllen" uom="length"
The table that is displayed when xsCreateDDContext finishes is interesting. Below we sort for extreme and see the urllen value is 678. This tells us that in my data the high end of url_length is around 678 characters. If we search the logs using this context, we find that “is extreme” is not a magic filter that returns only the bad URLs. All the interesting URLs are down in the low/medium ranges along with all the good URLs. You have to come up with another way to slice the data when the signal and noise are so close to each other. This approach might work for some other use case, but not for this particular data set.
index=weblogs | xswhere url_length from urllen in web_stats is low | stats count by url
We get an overwhelming number of matches since most of our URLs are in the low range.
index=weblogs | xswhere url_length from urllen in web_stats is extreme | stats count by url
We get a manageable 5 events from extreme, but they are not interesting URLs.
URL Length by Src:
We get a different url_length distribution if we break it out by src. Remember, when a context uses classes, default is the weighted average of all the classes in the context. The table displayed when the context generation finishes shows that default class.
Notice that in our by-src version the urllen for extreme in the default context is around 133. That comes from the weighted average of the per-source classes.
| tstats avg(Web.url_length) as average, min(Web.url_length) as min, max(Web.url_length) as max from datamodel=Web where Web.src!="unknown" by _time, Web.src span=1m | rename Web.* as * | stats count, min(min) as min, max(max) as max, avg(average) as average by src | eval max=if(min=max, max+average, max) | eval max=if(max-min < average , max+average, max) | xsCreateDDContext name=urllen_by_src container=web_stats app=search scope=app type=domain terms="minimal,low,medium,high,extreme" notes="url len by src" uom="length" class="src"
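The two eval steps in the search above adjust max before the context is built. The post does not say why. One reading is that they keep a src with a single URL length, or a very narrow spread, from producing a zero-width or tiny range. The Python sketch below mirrors that logic on made-up numbers. The function name widen_max is hypothetical and exists only for this illustration.

```python
def widen_max(min_len, max_len, average):
    """Mirror the two eval steps that pad max before the context is built."""
    # eval max=if(min=max, max+average, max)
    if min_len == max_len:
        max_len = max_len + average
    # eval max=if(max-min < average, max+average, max)
    if max_len - min_len < average:
        max_len = max_len + average
    return max_len


# Hypothetical per-src rows: (min, max, average)
print(widen_max(40, 40, 40))    # 80: min equals max, so max is padded by the average
print(widen_max(20, 50, 35))    # 85: range of 30 is narrower than the average of 35
print(widen_max(10, 300, 90))   # 300: range is already wide, so max is unchanged
```

Only sources with a flat or narrow range are changed. A src that already shows a wide spread of URL lengths keeps its observed max.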
index=weblogs | xsWhere url_length from urllen_by_src in web_stats by src is very very extreme | stats values(src) AS sources, dc(src) as sourceCount by url, status_description
Even using “is very very extreme” we get a lot of results. However, the URLs are much more interesting. Granted, none of the searches here in Part Five are super awesome. They do show a workable example of using an XS context directly against raw events. We also get a good comparison of a classless context, which does what it is supposed to, against a context with a class, which helps draw out more interesting events. How you formulate your XS context and your search questions is very important, so you really have to think about what question you are trying to answer and experiment with variations against your own data.
In my data I find interesting URLs trying to redirect through my site, but they land on a useless WordPress page.