Mindmajix

ElasticSearch Bucket Script and Scripted Metric Aggregations

Bucket Script Aggregation

This functionality is experimental and may be changed or removed completely in a future release.

A parent pipeline aggregation that executes a script to perform per-bucket computations on specified metrics in the parent multi-bucket aggregation. The specified metrics must be numeric, and the script must return a numeric value.

Syntax

A bucket_script aggregation looks like this in isolation:

{
    "bucket_script": {
        "buckets_path": {
            "my_var1": "the_sum", 
            "my_var2": "the_value_count"
        },
        "script": "my_var1 / my_var2"
    }
}
Here, my_var1 is the name of the script variable bound to this buckets path, and the_sum is the path to the metric to use for that variable.
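
To make the variable binding concrete, here is a minimal Python sketch of what happens for a single bucket. It is an illustration under assumptions, not Elasticsearch's implementation: the bucket contents and the helper name evaluate_bucket_script are hypothetical, and the script is modeled as a plain Python function.

```python
# Hypothetical model of one bucket produced by the parent aggregation;
# the metric names match the buckets_path values in the snippet above.
bucket = {
    "the_sum": {"value": 120.0},
    "the_value_count": {"value": 4},
}

buckets_path = {"my_var1": "the_sum", "my_var2": "the_value_count"}


def evaluate_bucket_script(bucket, buckets_path, script):
    """Bind each script variable to the metric value at its path, then run the script."""
    variables = {name: bucket[path]["value"] for name, path in buckets_path.items()}
    return script(**variables)


# Stand-in for the script "my_var1 / my_var2" (sum divided by count, i.e. an average).
result = evaluate_bucket_script(
    bucket, buckets_path, lambda my_var1, my_var2: my_var1 / my_var2
)
print(result)  # 30.0
```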

bucket_script Parameters

Parameter Name | Description | Required | Default Value
script | The script to run for this aggregation. The script can be inline, file or indexed. | Required |
buckets_path | A map of script variables and their associated path to the buckets we wish to use for the variable | Required |
gap_policy | The policy to apply when gaps are found in the data | Optional | skip
format | Format to apply to the output value of this aggregation | Optional | null
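
The effect of gap_policy can be sketched in Python as follows. This assumes the two policies that Elasticsearch documents for pipeline aggregations, skip and insert_zeros; the function name apply_gap_policy and the sample values are hypothetical, and the real implementation differs in detail.

```python
def apply_gap_policy(values, gap_policy="skip"):
    """Return the variables to pass to the script, or None if the bucket is skipped.

    values maps script variable names to metric values; None marks a gap
    (a bucket where the metric has no value).
    """
    if gap_policy == "skip":
        # Treat the bucket as if it did not exist: no script is run for it.
        if any(v is None for v in values.values()):
            return None
        return dict(values)
    if gap_policy == "insert_zeros":
        # Replace every gap with zero and carry on as normal.
        return {k: (0 if v is None else v) for k, v in values.items()}
    raise ValueError(f"unknown gap_policy: {gap_policy}")


complete = {"my_var1": 10.0, "my_var2": 4}
with_gap = {"my_var1": None, "my_var2": 4}

print(apply_gap_policy(complete))                  # {'my_var1': 10.0, 'my_var2': 4}
print(apply_gap_policy(with_gap))                  # None
print(apply_gap_policy(with_gap, "insert_zeros"))  # {'my_var1': 0, 'my_var2': 4}
```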

The following snippet calculates t-shirt sales as a percentage of total sales for each month:

{
    "aggs": {
        "sales_per_month": {
            "date_histogram": {
                "field": "date",
                "interval": "month"
            },
            "aggs": {
                "total_sales": {
                    "sum": {
                        "field": "price"
                    }
                },
                "t-shirts": {
                    "filter": {
                        "term": {
                            "type": "t-shirt"
                        }
                    },
                    "aggs": {
                        "sales": {
                            "sum": {
                                "field": "price"
                            }
                        }
                    }
                },
                "t-shirt-percentage": {
                    "bucket_script": {
                        "buckets_path": {
                            "tShirtSales": "t-shirts>sales",
                            "totalSales": "total_sales"
                        },
                        "script": "tShirtSales / totalSales * 100"
                    }
                }
            }
        }
    }
}

The response might look like this:

{
    "aggregations": {
        "sales_per_month": {
            "buckets": [
                {
                    "key_as_string": "2015/01/01 00:00:00",
                    "key": 1420070400000,
                    "doc_count": 3,
                    "total_sales": {
                        "value": 50
                    },
                    "t-shirts": {
                        "doc_count": 2,
                        "sales": {
                            "value": 10
                        }
                    },
                    "t-shirt-percentage": {
                        "value": 20
                    }
                },
                {
                    "key_as_string": "2015/02/01 00:00:00",
                    "key": 1422748800000,
                    "doc_count": 2,
                    "total_sales": {
                        "value": 60
                    },
                    "t-shirts": {
                        "doc_count": 1,
                        "sales": {
                            "value": 15
                        }
                    },
                    "t-shirt-percentage": {
                        "value": 25
                    }
                },
                {
                    "key_as_string": "2015/03/01 00:00:00",
                    "key": 1425168000000,
                    "doc_count": 2,
                    "total_sales": {
                        "value": 40
                    },
                    "t-shirts": {
                        "doc_count": 1,
                        "sales": {
                            "value": 20
                        }
                    },
                    "t-shirt-percentage": {
                        "value": 50
                    }
                }
            ]
        }
    }
}
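
As a sanity check, the percentages in this response can be recomputed by hand. The Python sketch below copies the metric values from the response and follows the two buckets_path entries; the helper name resolve is hypothetical, and the way it splits the path on ">" is only a simplified model of how Elasticsearch walks from an aggregation to a sub-aggregation metric.

```python
# Metric values copied from the three monthly buckets in the response above.
buckets = [
    {"total_sales": {"value": 50}, "t-shirts": {"doc_count": 2, "sales": {"value": 10}}},
    {"total_sales": {"value": 60}, "t-shirts": {"doc_count": 1, "sales": {"value": 15}}},
    {"total_sales": {"value": 40}, "t-shirts": {"doc_count": 1, "sales": {"value": 20}}},
]


def resolve(bucket, path):
    """Follow a buckets_path such as 't-shirts>sales' down to its metric value."""
    node = bucket
    for part in path.split(">"):
        node = node[part]
    return node["value"]


# Same expression as the script: tShirtSales / totalSales * 100
percentages = [
    resolve(b, "t-shirts>sales") / resolve(b, "total_sales") * 100 for b in buckets
]
print(percentages)  # [20.0, 25.0, 50.0]
```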

Scripted Metric Aggregation

This functionality is experimental and may be changed or removed completely in a future release.

A metric aggregation that uses scripts to compute its metric output.

Example:

{
  "query" : {
      "match_all" : {}
  },
  "aggs": {
      "profit": {
          "scripted_metric": {
              "init_script" : "_agg['transactions'] = []",
              "map_script" : "if (doc['type'].value == \"sale\") { _agg.transactions.add(doc['amount'].value) } else { _agg.transactions.add(-1 * doc['amount'].value) }", 
              "combine_script" : "profit = 0; for (t in _agg.transactions) { profit += t }; return profit",
              "reduce_script" : "profit = 0; for (a in _aggs) { profit += a }; return profit"
          }
      }
  }
}
map_script is the only required parameter.

The above aggregation demonstrates how to use the scripted metric aggregation to compute the total profit from sale and cost transactions.

The response for the above aggregation:

{
    ...

    "aggregations": {
        "profit": {
            "value": 170
        }
    }
}
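
The four script phases can be simulated in plain Python to show how a value such as 170 comes about. This is a sketch under assumptions: the documents and their split across two shards are hypothetical (chosen only so that the total matches the response above), and the functions mirror the inline scripts from the example rather than any Elasticsearch API.

```python
# Hypothetical documents on two shards; not taken from the article.
shards = [
    [{"type": "sale", "amount": 80}, {"type": "cost", "amount": 10}],
    [{"type": "cost", "amount": 30}, {"type": "sale", "amount": 130}],
]


def init_script():
    # init_script: create the per-shard state object (_agg).
    return {"transactions": []}


def map_script(agg, doc):
    # map_script: runs once per matching document on the shard.
    if doc["type"] == "sale":
        agg["transactions"].append(doc["amount"])
    else:
        agg["transactions"].append(-1 * doc["amount"])


def combine_script(agg):
    # combine_script: runs once per shard after all documents are collected.
    return sum(agg["transactions"])


def reduce_script(aggs):
    # reduce_script: runs once over the results returned by every shard.
    return sum(aggs)


shard_results = []
for docs in shards:
    agg = init_script()
    for doc in docs:
        map_script(agg, doc)
    shard_results.append(combine_script(agg))

profit = reduce_script(shard_results)
print(shard_results, profit)  # [70, 100] 170
```

Keeping the per-shard sum in combine_script means each shard returns a single number instead of its full list of transactions.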

The above example can also be specified using file scripts as follows:

{
    "query" : {
        "match_all" : {}
    },
    "aggs": {
        "profit": {
            "scripted_metric": {
                "init_script" : {
                    "file": "my_init_script"
                },
                "map_script" : {
                    "file": "my_map_script"
                },
                "combine_script" : {
                    "file": "my_combine_script"
                },
                "params": {
                    "field": "amount" 
                },
                "reduce_script" : {
                    "file": "my_reduce_script"
                }
            }
        }
    }
}

Script parameters for the init, map and combine scripts must be specified in a global params object so that they can be shared between the scripts.

What are aggregations good for?

By now it should, hopefully, be clear that aggregations are generated values based on the documents that match a search request.

There are a ton of use cases for aggregations. If we use ElasticSearch to analyze logs or statistical data, we can use aggregations to extract information from the data, such as the number of HTTP requests per URL, average call time to a call center per day of the week or number of restaurants that are open on Sundays in different geographical areas.

One especially powerful and interesting aggregation type when analyzing data is the significant_terms aggregation. This aggregation type allows us to find terms that are unusually common in a foreground set compared to a background set (such as all support tickets).

Another use case for aggregations is navigation. In such cases, we may use aggregations to generate a list of categories based on the content of a website to build a menu, or we may aggregate values from many different fields of the documents that match a search query to allow users to narrow their search. The screenshot below from Amazon illustrates an example of the latter:

[Screenshot: an example of how facets/aggregations are used to filter search results on Amazon.com.]


 
