This functionality is experimental and may be changed or removed completely in a future release.
A parent pipeline aggregation that executes a script to perform per-bucket computations on specified metrics in the parent multi-bucket aggregation. The specified metrics must be numeric, and the script must return a numeric value.
A bucket_script aggregation looks like this in isolation:
{
"bucket_script": {
"buckets_path": {
"my_var1": "the_sum",
"my_var2": "the_value_count"
},
"script": "my_var1 / my_var2"
}
}
Here, my_var1 is the name of the script variable bound to this buckets path, and the_sum is the path to the metric whose value that variable holds.
bucket_script Parameters
Parameter Name | Description | Required | Default Value |
script | The script to run for this aggregation. The script can be inline, file or indexed. | Required | |
buckets_path | A map of script variables and their associated path to the buckets we wish to use for the variable | Required | |
gap_policy | The policy to apply when gaps are found in the data | Optional | skip |
format | Format to apply to the output value of this aggregation | Optional | null |
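As a rough illustration of gap_policy, the Python sketch below shows how missing metric values might be handled before the script runs. It assumes the two policies named in the Elasticsearch pipeline aggregation documentation, skip and insert_zeros. The helper names is_gap and apply_gap_policy are hypothetical and are not part of any Elasticsearch API.

```python
import math

def is_gap(value):
    """Treat None and NaN as a gap (a bucket with no usable metric value)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def apply_gap_policy(values, gap_policy="skip"):
    """Return the variables to hand to the script, or None if the bucket is skipped.

    `values` maps script variable names to the metric values resolved for one bucket.
    """
    if gap_policy == "skip":
        # Assumed behaviour: a bucket with any missing value produces no output.
        if any(is_gap(v) for v in values.values()):
            return None
        return dict(values)
    if gap_policy == "insert_zeros":
        # Assumed behaviour: missing values are replaced with zero.
        return {name: (0.0 if is_gap(v) else v) for name, v in values.items()}
    raise ValueError("unknown gap_policy: " + gap_policy)

skipped = apply_gap_policy({"my_var1": 12.0, "my_var2": None}, "skip")
zero_filled = apply_gap_policy({"my_var1": 12.0, "my_var2": None}, "insert_zeros")
print(skipped)
print(zero_filled)
```

With insert_zeros, a script that divides by a variable, as my_var1 / my_var2 does, would then be dividing by zero, which is one reason skip is a sensible default.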
The following snippet calculates t-shirt sales as a percentage of total sales for each month:
{
"aggs" : {
"sales_per_month" : {
"date_histogram" : {
"field" : "date",
"interval" : "month"
},
"aggs": {
"total_sales": {
"sum": {
"field": "price"
}
},
"t-shirts": {
"filter": {
"term": {
"type": "t-shirt"
}
},
"aggs": {
"sales": {
"sum": {
"field": "price"
}
}
}
},
"t-shirt-percentage": {
"bucket_script": {
"buckets_path": {
"tShirtSales": "t-shirts>sales",
"totalSales": "total_sales"
},
"script": "tShirtSales / totalSales * 100"
}
}
}
}
}
}
The response might look like this:
{
"aggregations": {
"sales_per_month": {
"buckets": [
{
"key_as_string": "2015/01/01 00:00:00",
"key": 1420070400000,
"doc_count": 3,
"total_sales": {
"value": 50
},
"t-shirts": {
"doc_count": 2,
"sales": {
"value": 10
}
},
"t-shirt-percentage": {
"value": 20
}
},
{
"key_as_string": "2015/02/01 00:00:00",
"key": 1422748800000,
"doc_count": 2,
"total_sales": {
"value": 60
},
"t-shirts": {
"doc_count": 1,
"sales": {
"value": 15
}
},
"t-shirt-percentage": {
"value": 25
}
},
{
"key_as_string": "2015/03/01 00:00:00",
"key": 1425168000000,
"doc_count": 2,
"total_sales": {
"value": 40
},
"t-shirts": {
"doc_count": 1,
"sales": {
"value": 20
}
},
"t-shirt-percentage": {
"value": 50
}
}
]
}
}
}
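To make the per-bucket evaluation concrete, here is a minimal Python sketch that recomputes t-shirt-percentage from the sample response above. It is not how Elasticsearch implements the aggregation. The helpers resolve_path and apply_bucket_script are hypothetical names, and the script is modelled as a plain Python function.

```python
def resolve_path(bucket, path):
    """Follow a buckets_path such as 't-shirts>sales' down to a metric value."""
    node = bucket
    for part in path.split(">"):
        node = node[part]
    return node["value"]

def apply_bucket_script(buckets, buckets_path, script):
    """Evaluate `script` once per bucket, binding each variable via its path."""
    results = []
    for bucket in buckets:
        variables = {
            name: resolve_path(bucket, path)
            for name, path in buckets_path.items()
        }
        results.append(script(**variables))
    return results

# Metric values copied from the three monthly buckets in the sample response.
buckets = [
    {"total_sales": {"value": 50}, "t-shirts": {"doc_count": 2, "sales": {"value": 10}}},
    {"total_sales": {"value": 60}, "t-shirts": {"doc_count": 1, "sales": {"value": 15}}},
    {"total_sales": {"value": 40}, "t-shirts": {"doc_count": 1, "sales": {"value": 20}}},
]

percentages = apply_bucket_script(
    buckets,
    {"tShirtSales": "t-shirts>sales", "totalSales": "total_sales"},
    lambda tShirtSales, totalSales: tShirtSales / totalSales * 100,
)
print(percentages)
```

The three results correspond to the t-shirt-percentage values of 20, 25 and 50 in the response.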
This functionality is experimental and may be changed or removed completely in a future release.
The scripted_metric aggregation is a metric aggregation that executes scripts to produce a metric output.
Example:
{
"query" : {
"match_all" : {}
},
"aggs": {
"profit": {
"scripted_metric": {
"init_script" : "_agg['transactions'] = []",
"map_script" : "if (doc['type'].value == \"sale\") { _agg.transactions.add(doc['amount'].value) } else { _agg.transactions.add(-1 * doc['amount'].value) }",
"combine_script" : "profit = 0; for (t in _agg.transactions) { profit += t }; return profit",
"reduce_script" : "profit = 0; for (a in _aggs) { profit += a }; return profit"
}
}
}
}
map_script is the only required parameter.
The above aggregation demonstrates how one would use the scripted_metric aggregation to compute the total profit from sale and cost transactions.
The response for the above aggregation:
{
...
"aggregations": {
"profit": {
"value": 170
}
}
}
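The four scripts run at different stages: init_script once per shard before any documents are collected, map_script once per matching document, combine_script once per shard after collection, and reduce_script once on the coordinating node over the per-shard results. The following Python sketch mimics that flow outside Elasticsearch. The shard layout and the sample documents are hypothetical, chosen so that the total matches the 170 in the response above. The function names simply mirror the script parameters.

```python
# Hypothetical sample data: two "shards", each holding a few documents.
shards = [
    [{"type": "sale", "amount": 80}, {"type": "cost", "amount": 10}],
    [{"type": "sale", "amount": 130}, {"type": "cost", "amount": 30}],
]

def init_script():
    """Create the per-shard state object (the role of _agg in the example)."""
    return {"transactions": []}

def map_script(agg, doc):
    """Record one document: sales count as positive, everything else as negative."""
    if doc["type"] == "sale":
        agg["transactions"].append(doc["amount"])
    else:
        agg["transactions"].append(-1 * doc["amount"])

def combine_script(agg):
    """Collapse one shard's state into a single partial profit."""
    return sum(agg["transactions"])

def reduce_script(aggs):
    """Merge the partial profits from all shards (the role of _aggs)."""
    return sum(aggs)

def scripted_metric(shards):
    """Run the four phases in order over every shard."""
    per_shard = []
    for docs in shards:
        agg = init_script()
        for doc in docs:
            map_script(agg, doc)
        per_shard.append(combine_script(agg))
    return reduce_script(per_shard)

profit = scripted_metric(shards)
print(profit)
```

Keeping the combine step on each shard means only one number per shard has to travel to the coordinating node, rather than the full list of transactions.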
The above example can also be specified using file scripts as follows:
{
"query" : {
"match_all" : {}
},
"aggs": {
"profit": {
"scripted_metric": {
"init_script" : {
"file": "my_init_script"
},
"map_script" : {
"file": "my_map_script"
},
"combine_script" : {
"file": "my_combine_script"
},
"params": {
"field": "amount"
},
"reduce_script" : {
"file": "my_reduce_script"
}
}
}
}
}
Script parameters for the init, map and combine scripts must be specified in a global params object so that they can be shared between the scripts.
By now it should, hopefully, be clear that aggregations are generated values based on the documents that match a search request.
There are many use cases for aggregations. If we use Elasticsearch to analyze logs or statistical data, we can use aggregations to extract information from the data, such as the number of HTTP requests per URL, the average call time to a call center per day of the week, or the number of restaurants that are open on Sundays in different geographical areas.
One especially powerful and interesting aggregation type when analyzing data is the significant_terms aggregation. This aggregation type allows us to find terms that are unusually common in a foreground set compared to a background set (such as all support tickets).
Another use case for aggregations is navigation. In such cases, we may use aggregations to generate a list of categories based on the content on a website to build a menu, or we may aggregate values from many different fields from documents that match a search query to allow users to narrow their search. The screenshot below from Amazon illustrates the latter:
An example of how facets/aggregations are used to filter search results on Amazon.com.
Yamuna Karumuri is a content writer at Mindmajix.com. Her passion lies in writing articles on IT platforms including Machine learning, PowerShell, DevOps, Data Science, Artificial Intelligence, Selenium, MSBI, and so on. You can connect with her via LinkedIn.