Splunk Stats

Calculates aggregate statistics over the results set, such as average, count, and sum. This is similar to SQL aggregation. If stats are used without a by clause only one row is returned, which is the aggregation over the entire incoming result set. If you use a by clause one row is returned for each distinct value specified in the by clause. The stats command calculates statistics based on the fields in your events. 

Accelerate Your career with splunk Training and become expertise in splunk Enroll For Free Splunk Training Demo!

Syntax

Simple: stats (stats-function(field) [AS field])... [BY field-list] 
Complete: stats [partitions=] [allnum=] [delim=] ( ... | ... ) [] 

stats-agg-term

  • Syntax: ( | ) [AS ]
  • Description:  statistical aggregation function. The function can be applied to an eval expression, or to a field or set of fields. Use the AS clause to place the result into a new field with a name that you specify. You can use wildcard characters in field names. 

sparkline-agg-term

  • Syntax: [AS ]
  • Description:  sparkline aggregation function. Use the AS clause to place the result into a new field with a name that you specify. You can use wildcard characters in the field name.

Optional arguments

allnum

  • syntax: allnum=
  • Description: If true, computes numerical statistics on each field if and only if all of the values of that field are numerical.
  • Default: false

delim

  • Syntax: delim=
  • Description: Specifies how the values in the list() or values() aggregation are delimited. 
  • Default: a single space

by-clause

  • Syntax: BY
  • Description: The name of one or more fields to group by. You cannot use a wildcard character to specify multiple fields with similar names. You must specify each field separately.

partitions

  • Syntax: partitions=
  • Description: If specified, partitions the input data based on the split-by fields for multithreaded reduce. 
  • Default: 1

Stats function options

stats-function

  • Syntax: avg() | c() | count() | dc() | distinct_count() | earliest() | estdc() | estdc_error() | exactperc() | first() | last() | latest() | list() | max() | median() | min() | mode() | p() | perc() | range() | stdev() | stdevp() | sum() | sumsq() | upperperc() | values() | var() | varp()
  • Description: Functions used with the stats command. Each time you invoke the stats command, you can use more than one function. However, you can use only one BY clause. 

Frequently Asked Splunk Interview Questions

Sparkline function options

Sparklines are inline charts that appear within table cells in search results to display time-based trends associated with the primary key of each row. 

sparkline-agg

  • Syntax: sparkline (count(), ) | sparkline ((), )
  • Description: A sparkline specifier, which takes the first argument of an aggregation function on a field and an optional timespan specifier. If no timespan specifier is used, an appropriate timespan is chosen based on the time range of the search. If the sparkline is not scoped to a field, only the count aggregator is permitted. You can use wildcard characters in the field name. See the Usage section. 

sparkline-func

  • Syntax: c() | count() | dc() | mean() | avg() | stdev() | stdevp() | var() | varp() | sum() | sumsq() | min() | max() | range()
  • Description: Aggregation function to use to generate sparkline values. Each sparkline value is produced by applying this aggregation to the events that fall into each particular time bin.

MindMajix YouTube Channel

Usage

Eval expressions with statistical functions

When you use the stats command, you must specify either a statistical function or a sparkline function. When you use a statistical function, you can use an eval expression as part of the statistical function. For example: 
index=* | stats count(eval(status="404")) AS count_status BY sourcetype

Related Page: Splunk Eval Commands With Examples

Functions and memory usage

Some functions are inherently more expensive, from a memory standpoint, than other functions. For example, the distinct_count function requires far more memory than the count function. The values and list functions also can consume a lot of memory. 

If you are using the distinct_count function without a split-by field or with a low-cardinality split-by by field, consider replacing the distinct_count function with the estdc function (estimated distinct count). The estdc function might result in significantly lower memory usage and run times. 

Event order functions

Using the first and last functions when searching based on time does not produce accurate results. 

  • To locate the first value based on time order, use the earliest function, instead of the first function.
  • To locate the last value based on time order, use the latest function, instead of the last function.

For example, consider the following search. 

index=test sourcetype=testDb | eventstats first(LastPass) as LastPass, last(_time) as mostRecentTestTime BY testCaseId | where startTime==LastPass OR _time==mostRecentTestTime | stats first(startTime) AS startTime, first(status) AS status, first(histID) AS currentHistId, last(histID) AS lastPassHistId BY testCaseId

Replace the first and last functions when you use the stats and eventstats commands for ordering events based on time. The following search shows the function changes. 

index=test sourcetype=testDb | eventstats latest(LastPass) AS LastPass, earliest(_time) AS mostRecentTestTime BY testCaseId | where startTime==LastPass OR _time==mostRecentTestTime | stats latest(startTime) AS startTime, latest(status) AS status, latest(histID) AS currentHistId, earliest(histID) AS lastPassHistId BY testCaseId

Checkout Splunk Sample Resumes

Wildcards in BY clauses

The stats command does not support wildcard characters in field values in BY clauses. 
For example, you cannot specify | stats count BY source*. 

Renaming fields

You cannot rename one field with multiple names. For example, if you have field A, you cannot rename A as B, A as C. The following example is not valid. 
... | stats first(host) AS site, first(host) AS report

Examples

1. Return the average transfer rate for each host

sourcetype=access* | stats avg(kbps) BY host

2. Search the access logs, and return the total number of hits from the top 100 values of "referer_domain"

Search the access logs, and return the total number of hits from the top 100 values of "referer_domain". The "top" command returns a count and percent value for each "referer_domain". 
sourcetype=access_combined | top limit=100 referer_domain | stats sum(count) AS total

Related Page: Splunk Enterprise Security

Detailed Examples.

1. Count the number of events by HTTP status and host

Count the number of events for a combination of HTTP status code values and host: 
sourcetype=access_* | chart count BY status, host
This creates the following table

Count the number of events by HTTP status and host

2. Determine how much email comes from each domain

This example will show how much mail coming from which domain.

sourcetype="cisco_esa" mailfrom=* | eval accountname=split(mailfrom,"@") | eval from_domain=mvindex(accountname,-1) | stats count(eval(match(from_domain, "[^nrs]+.com"))) AS ".com", count(eval(match(from_domain, "[^nrs]+.net"))) AS ".net", count(eval(match(from_domain, "[^nrs]+.org"))) AS ".org", count(eval(NOT match(from_domain, "[^nrs]+.(com|net|org)"))) AS "other"

The first half of this search uses eval to break up the email address in the mail from the field and define the from_domain as the portion of the mail from the field after the @ symbol. 

The results are then piped into the stats command. The count() function is used to count the results of the eval expression. Here, eval uses the match() function to compare the from_domain to a regular expression that looks for the different suffixes in the domain. If the value of from_domain matches the regular expression, the count is updated for each suffix, .com, .net, and .org. Other domain suffixes are counted as other. 

Related Page: Splunk Streamstats Command

This produces the following results table: 

Determine how much email comes from each domain

 

Course Schedule
NameDates
Splunk TrainingSep 14 to Sep 29View Details
Splunk TrainingSep 17 to Oct 02View Details
Splunk TrainingSep 21 to Oct 06View Details
Splunk TrainingSep 24 to Oct 09View Details
Last updated: 23 Feb 2024
About Author

 

Madhuri is a Senior Content Creator at MindMajix. She has written about a range of different topics on various technologies, which include, Splunk, Tensorflow, Selenium, and CEH. She spends most of her time researching on technology, and startups. Connect with her via LinkedIn and Twitter .

read less