Mindmajix

Elasticsearch Post Filter Aggregation

Post filter

The post_filter is applied to the search hits at the very end of a search request, after aggregations have already been calculated. Its purpose is best explained by example:

Imagine that you are selling shirts, and the user has specified two filters: color:red and brand:gucci. You only want to show them red shirts made by Gucci in the search results. Normally you would do this with a bool query:

curl -XGET localhost:9200/shirts/_search -H 'Content-Type: application/json' -d '
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "color": "red"   }},
        { "term": { "brand": "gucci" }}
      ]
    }
  }
}
'

However, you would also like to use faceted navigation to display a list of other options that the user could click on. Perhaps you have a model field that would allow the user to limit their search results to red Gucci t-shirts or dress-shirts.

This can be done with a terms aggregation:

curl -XGET localhost:9200/shirts/_search -H 'Content-Type: application/json' -d '
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "color": "red"   }},
        { "term": { "brand": "gucci" }}
      ]
    }
  },
  "aggs": {
    "models": {
      "terms": { "field": "model" } 
    }
  }
}
'
The models aggregation returns the most popular models of red shirts by Gucci.

But perhaps you would also like to tell the user how many Gucci shirts are available in other colors. If you just add a terms aggregation on the color field, you will only get back the color red, because your query returns only red shirts by Gucci.

Instead, you want to include shirts of all colors during aggregation, then apply the colors filter only to the search results. This is the purpose of the post_filter:

curl -XGET localhost:9200/shirts/_search -H 'Content-Type: application/json' -d '
{
  "query": {
    "bool": {
      "filter": {
        "term": { "brand": "gucci" }
      }
    }
  },
  "aggs": {
    "colors": {
      "terms": { "field": "color" } 
    },
    "color_red": {
      "filter": {
        "term": { "color": "red" } 
      },
      "aggs": {
        "models": {
          "terms": { "field": "model" } 
        }
      }
    }
  },
  "post_filter": { 
    "term": { "color": "red" }
  }
}
'
The main query now finds all shirts by Gucci, regardless of color.
The colors agg returns popular colors for shirts by Gucci.
The color_red agg limits the models sub-aggregation to red Gucci shirts.

Finally, the post_filter removes colors other than red from the search hits.
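
To make the order of operations concrete, here is a minimal in-memory sketch of the same request. It is not how Elasticsearch is implemented; the sample documents and the helper names term and search are hypothetical, chosen only to show that aggregations see everything the query matched while the post filter trims only the hits.

```python
from collections import Counter

# Hypothetical sample documents standing in for the `shirts` index.
SHIRTS = [
    {"brand": "gucci", "color": "red",  "model": "slim"},
    {"brand": "gucci", "color": "red",  "model": "slim"},
    {"brand": "gucci", "color": "red",  "model": "dress"},
    {"brand": "gucci", "color": "blue", "model": "slim"},
    {"brand": "prada", "color": "red",  "model": "dress"},
]


def term(field, value):
    """Return a predicate that mimics a `term` filter on one field."""
    return lambda doc: doc[field] == value


def search(docs, query, post_filter):
    """Simulate query -> aggregations -> post_filter on a list of dicts."""
    # 1. The main query selects the documents that aggregations will see.
    matched = [d for d in docs if query(d)]

    # 2. Aggregations run over everything the query matched.
    colors = Counter(d["color"] for d in matched)
    is_red = term("color", "red")
    color_red_models = Counter(d["model"] for d in matched if is_red(d))

    # 3. The post filter narrows only the hits, after aggregation.
    hits = [d for d in matched if post_filter(d)]

    return {
        "hits": hits,
        "colors": dict(colors),
        "color_red_models": dict(color_red_models),
    }


result = search(SHIRTS, term("brand", "gucci"), term("color", "red"))
print(result["colors"])            # counts cover every Gucci color
print(result["color_red_models"])  # counts cover red Gucci shirts only
print(len(result["hits"]))         # hits contain red Gucci shirts only
```

With this sample data the colors counts still include blue, even though no blue shirt appears in the hits.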

Performance consideration

Use a post_filter only if you need to filter search results and aggregations differently. Sometimes people use post_filter for regular searches, which is a mistake.

The nature of the post_filter means it runs after the query, so any performance benefit of filtering (such as caches) is lost completely.

The post_filter should be used only in combination with aggregations, and only when you need differential filtering.
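
One way to apply this rule in client code is to rewrite a request body before sending it. The sketch below is an illustration under stated assumptions, not part of any Elasticsearch client: fold_post_filter is a hypothetical helper that moves post_filter into the filter clause of a bool query whenever the request carries no aggregations, so the filter runs as part of the query instead of afterwards.

```python
import copy


def fold_post_filter(body):
    """Return a request body with `post_filter` folded into the query.

    Hypothetical helper: if the body has aggregations, or no post_filter,
    it is returned unchanged, because differential filtering may be intended.
    """
    if "post_filter" not in body or "aggs" in body or "aggregations" in body:
        return body

    new_body = copy.deepcopy(body)  # leave the caller's dict untouched
    post = new_body.pop("post_filter")
    original_query = new_body.get("query", {"match_all": {}})

    # Keep the original query for matching and add the former post filter
    # as a regular bool filter clause.
    new_body["query"] = {"bool": {"must": [original_query], "filter": [post]}}
    return new_body


request = {
    "query": {"term": {"brand": "gucci"}},
    "post_filter": {"term": {"color": "red"}},
}
rewritten = fold_post_filter(request)
print(rewritten)
```

A request that does contain aggs passes through untouched, since in that case the post_filter may be doing exactly the job it was designed for.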

Only use post_filter when needed

In older versions of Elasticsearch, the post_filter parameter has an alias, filter. The alias exists for backwards compatibility, because post_filter used to be named filter in early versions of Elasticsearch. The name was changed for a reason. While it’s certainly possible, and more convenient, to use post_filter instead of the query parameter when a request only needs to filter the results, it does not perform as well as putting the filter in the query. So feel free to use post_filter for convenience while debugging, but against a production cluster use it only when you actually need it.

