• (4.0)

Calculates aggregate statistics over the results set, such as average, count, and sum. This is similar to SQL aggregation. If stats is used without a by clause only one row is returned, which is the aggregation over the entire incoming result set. If you use a by clause one row is returned for each distinct value specified in the by clause. The stats command calculates statistics based on fields in your events. 

These core tutorials will help you to learn the fundamentals of Splunk. For an in-depth understanding and practical experience, explore Online Splunk Training.


Simple: stats (stats-function(field) [AS field])... [BY field-list] 
Complete: stats [partitions=] [allnum=] [delim=] ( ... | ... ) [] 


Syntax: ( | ) [AS ]
Description:  statistical aggregation function. The function can be applied to an eval expression, or to a field or set of fields. Use the AS clause to place the result into a new field with a name that you specify. You can use wild card characters in field names. 


Syntax: [AS ]
Description:  sparkline aggregation function. Use the AS clause to place the result into a new field with a name that you specify. You can use wild card characters in the field name.

Optional arguments


syntax: allnum=
Description: If true, computes numerical statistics on each field if and only if all of the values of that field are numerical.
Default: false


Syntax: delim=
Description: Specifies how the values in the list() or values() aggregation are delimited. 
Default: a single space


Syntax: BY
Description: The name of one or more fields to group by. You cannot use a wildcard character to specify multiple fields with similar names. You must specify each field separately.


Syntax: partitions=
Description: If specified, partitions the input data based on the split-by fields for multithreaded reduce. 
Default: 1

Stats function options


Syntax: avg() | c() | count() | dc() | distinct_count() | earliest() | estdc() | estdc_error() | exactperc() | first() | last() | latest() | list() | max() | median() | min() | mode() | p() | perc() | range() | stdev() | stdevp() | sum() | sumsq() | upperperc() | values() | var() | varp()
Description: Functions used with the stats command. Each time you invoke the stats command, you can use more than one function. However, you can use only one BY clause. 

Learn more about Splunk Interview Questions in this blog post.

Sparkline function options

Sparklines are inline charts that appear within table cells in search results to display time-based trends associated with the primary key of each row. 


Syntax: sparkline (count(), ) | sparkline ((), )
Description: A sparkline specifier, which takes the first argument of a aggregation function on a field and an optional timespan specifier. If no timespan specifier is used, an appropriate timespan is chosen based on the time range of the search. If the sparkline is not scoped to a field, only the count aggregator is permitted. You can use wildcard characters in the field name. See the Usage section. 


Syntax: c() | count() | dc() | mean() | avg() | stdev() | stdevp() | var() | varp() | sum() | sumsq() | min() | max() | range()
Description: Aggregation function to use to generate sparkline values. Each sparkline value is produced by applying this aggregation to the events that fall into each particular time bin.


Eval expressions with statistical functions

When you use the stats command, you must specify either a statistical function or a sparkline function. When you use a statistical function, you can use an eval expression as part of the statistical function. For example: 
index=* | stats count(eval(status="404")) AS count_status BY sourcetype

Related Page:: Learn Splunk Eval Commands With Examples

Functions and memory usage

Some functions are inherently more expensive, from a memory standpoint, than other functions. For example, the distinct_count function requires far more memory than the count function. The values and list functions also can consume a lot of memory. 

If you are using the distinct_count function without a split-by field or with a low-cardinality split-by by field, consider replacing the distinct_count function with the the estdc function (estimated distinct count). The estdc function might result in significantly lower memory usage and run times. 

Event order functions

Using the first and last functions when searching based on time does not produce accurate results. 

  • To locate the first value based on time order, use the earliest function, instead of the first function.
  • To locate the last value based on time order, use the latest function, instead of the last function.

For example, consider the following search. 

index=test sourcetype=testDb | eventstats first(LastPass) as LastPass, last(_time) as mostRecentTestTime BY testCaseId | where startTime==LastPass OR _time==mostRecentTestTime | stats first(startTime) AS startTime, first(status) AS status, first(histID) AS currentHistId, last(histID) AS lastPassHistId BY testCaseId

Replace the first and last functions when you use the stats and eventstats commands for ordering events based on time. The following search shows the function changes. 

index=test sourcetype=testDb | eventstats latest(LastPass) AS LastPass, earliest(_time) AS mostRecentTestTime BY testCaseId | where startTime==LastPass OR _time==mostRecentTestTime | stats latest(startTime) AS startTime, latest(status) AS status, latest(histID) AS currentHistId, earliest(histID) AS lastPassHistId BY testCaseId

Download Free Splunk Samples Resumes - Prepared by Industry Experts

Wildcards in BY clauses

The stats command does not support wildcard characters in field values in BY clauses. 
For example, you cannot specify | stats count BY source*. 

Renaming fields

You cannot rename one field with multiple names. For example, if you have field A, you cannot rename A as B, A as C. The following example is not valid. 
... | stats first(host) AS site, first(host) AS report


1. Return the average transfer rate for each host

sourcetype=access* | stats avg(kbps) BY host

2. Search the access logs, and return the total number of hits from the top 100 values of "referer_domain"

Search the access logs, and return the total number of hits from the top 100 values of "referer_domain". The "top" command returns a count and percent value for each "referer_domain". 
sourcetype=access_combined | top limit=100 referer_domain | stats sum(count) AS total

Related Page:: Learn Splunk Enterprise Security

Detailed Examples.

1. Count the number of events by HTTP status and host

Count the number of events for a combination of HTTP status code values and host: 
sourcetype=access_* | chart count BY status, host
This creates the following table

2. Determine how much email comes from each domain

This example will show how much mail coming from which domain.

sourcetype="cisco_esa" mailfrom=* | eval accountname=split(mailfrom,"@") | eval from_domain=mvindex(accountname,-1) | stats count(eval(match(from_domain, "[^nrs]"))) AS ".com", count(eval(match(from_domain, "[^nrs]"))) AS ".net", count(eval(match(from_domain, "[^nrs]"))) AS ".org", count(eval(NOT match(from_domain, "[^nrs]+.(com|net|org)"))) AS "other"

The first half of this search uses eval to break up the email address in the mailfrom field and define the from_domain as the portion of the mailfrom field after the @ symbol. 

The results are then piped into the stats command. The count() function is used to count the results of the eval expression. Here, eval uses the match() function to compare the from_domain to a regular expression that looks for the different suffixes in the domain. If the value of from_domain matches the regular expression, the count is updated for each suffix, .com, .net, and .org. Other domain suffixes are counted as other. 

Related Page:: Splunk Streamstats Command

This produces the following results table: 

Popular Courses in 2018

Get Updates on Tech posts, Interview & Certification questions and training schedules