Combine Aggregations & Filters in Elasticsearch


A natural extension to aggregation scoping is filtering. Because the aggregation operates in the context of the query scope, any filter applied to the query will also apply to the aggregation.

COMBINING AGGREGATIONS AND FILTERS

Aggregations can be used for visualizing aggregated values from the search results and allowing users to filter by them. If we were to do something similar for our movies, it might look something like this:

[Image: Aggregations and filters]

To create a page such as the one above, we’d use a search request such as this:
A search request for all movies and terms aggregations for directors and genres.


curl -XPOST "https://localhost:9200/movies/movie/_search" -d'
{
  "aggregations": {
    "directors": {
      "terms": {
        "field": "director.original"
      }
    },
    "genres": {
      "terms": {
        "field": "genres.original"
      }
    }
  }
}'
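
To turn the response to a request like this into the filter lists on the page, the server reads the buckets of each aggregation. The sketch below assumes the usual terms-aggregation response shape, in which each named aggregation holds a `buckets` list of `key` and `doc_count` pairs. The helper name `facet_items` and the sample response are made up for illustration; they are not real output from the movies index.

```python
from typing import Any


def facet_items(response: dict[str, Any], name: str) -> list[tuple[str, int]]:
    """Return (term, document count) pairs for one terms aggregation."""
    buckets = response.get("aggregations", {}).get(name, {}).get("buckets", [])
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# Hand-written sample response (illustrative values only).
sample_response = {
    "hits": {"total": 3, "hits": []},
    "aggregations": {
        "directors": {
            "buckets": [
                {"key": "Francis Ford Coppola", "doc_count": 2},
                {"key": "David Lean", "doc_count": 1},
            ]
        },
        "genres": {
            "buckets": [
                {"key": "Drama", "doc_count": 3},
                {"key": "Adventure", "doc_count": 1},
                {"key": "Crime", "doc_count": 1},
                {"key": "War", "doc_count": 1},
            ]
        },
    },
}

# Each pair can be rendered as one clickable filter entry on the page.
for name in ("directors", "genres"):
    print(name, facet_items(sample_response, name))
```

An aggregation name that is missing from the response yields an empty list, so the page can simply render no entries for it.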

Now, what if a user wants to filter by a director? On the web development side of things, we’d send the director name back to the server as a request parameter. Once on the server, we’d need to modify our request to Elasticsearch to add a filter, like this:
The same request as the previous one, only this time with filtering for movies by a specific director.

curl -XPOST "https://localhost:9200/movies/movie/_search" -d'
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "director.original": "Francis Ford Coppola"
        }
      }
    }
  },
  "aggregations": {
    "directors": {
      "terms": {
        "field": "director.original"
      }
    },
    "genres": {
      "terms": {
        "field": "genres.original"
      }
    }
  }
}'
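
On the server, the step of adding a filter only when the user has picked a director could look roughly like the following. This is a minimal sketch, not a definitive implementation: the function name `build_search_body` and the `AGGREGATIONS` constant are hypothetical, and the code only builds the JSON body that would be sent to the `_search` endpoint shown above.

```python
import copy
import json
from typing import Any, Optional

# The two terms aggregations used throughout the article.
AGGREGATIONS = {
    "directors": {"terms": {"field": "director.original"}},
    "genres": {"terms": {"field": "genres.original"}},
}


def build_search_body(director: Optional[str] = None) -> dict[str, Any]:
    """Build the search body; add a director filter only when one is given."""
    # Deep copy so callers cannot mutate the shared constant.
    body: dict[str, Any] = {"aggregations": copy.deepcopy(AGGREGATIONS)}
    if director is not None:
        # Same "filtered" query structure as the curl request in the article.
        body["query"] = {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {"director.original": director}},
            }
        }
    return body


# Print the JSON that would be passed to curl via -d.
print(json.dumps(build_search_body("Francis Ford Coppola"), indent=2))
```

The `filtered` query used here follows the article. It was deprecated in Elasticsearch 2.0 and removed in 5.0, where a `bool` query with a `filter` clause takes its place, so on newer clusters the `query` part would need to be adapted accordingly.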

With the filtered response from Elasticsearch, we rebuild the web page:

[Image: Rebuild the web page]

As the search response now only contains the two movies directed by Francis Ford Coppola, only two hits will be shown. Also, because aggregations are calculated over the document set that the query generates, the filters in the left part of the page have changed as well. Only the genres and directors found in the movies by Francis Ford Coppola are shown.

Often this is the desired behavior, letting the aggregations reflect the result of applied queries and filters. However, sometimes it’s not. For instance, what if we want to allow users to filter by multiple directors? In such cases, we’d still want buckets for the other directors, even though there are no documents with them in the director field that match the current query.

In such cases, we can add a min_doc_count parameter with a value of zero to our aggregations, like this:
Including empty buckets by setting the min_doc_count parameter in the aggregations.

curl -XPOST "https://localhost:9200/movies/movie/_search" -d'
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "director.original": "Francis Ford Coppola"
        }
      }
    }
  },
  "aggregations": {
    "directors": {
      "terms": {
        "field": "director.original",
        "min_doc_count": 0
      }
    },
    "genres": {
      "terms": {
        "field": "genres.original",
        "min_doc_count": 0
      }
    }
  }
}'

The min_doc_count parameter controls the minimum number of documents that must match a term in order for a terms aggregation to create a bucket for it. The default value is one. By setting it to zero, buckets will be created for terms even if no document in the search results has that term. For our page, this would mean that other genres and directors would still be listed:

[Image: Post filter aggregation]
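
The effect of min_doc_count can be mimicked in a few lines of plain Python, without a running cluster. The sketch below is a simplified model rather than how Elasticsearch computes buckets internally: the three-movie `MOVIES` list is made-up sample data, `terms_buckets` is a hypothetical helper, and buckets are ordered by count and then by term purely for readability.

```python
from collections import Counter

# Made-up sample index; the article's real data set is not shown.
MOVIES = [
    {"title": "The Godfather", "director": "Francis Ford Coppola",
     "genres": ["Crime", "Drama"]},
    {"title": "Apocalypse Now", "director": "Francis Ford Coppola",
     "genres": ["Drama", "War"]},
    {"title": "Lawrence of Arabia", "director": "David Lean",
     "genres": ["Adventure", "Drama"]},
]


def terms_of(doc, field):
    """Return the field's values as a list (fields may be single or multi-valued)."""
    value = doc[field]
    return value if isinstance(value, list) else [value]


def terms_buckets(index, hits, field, min_doc_count=1):
    """Count terms over the hits; keep terms with at least min_doc_count documents."""
    counts = Counter(term for doc in hits for term in terms_of(doc, field))
    # Every term present in the whole index is a candidate bucket.
    candidates = {term for doc in index for term in terms_of(doc, field)}
    kept = {t: counts[t] for t in candidates if counts[t] >= min_doc_count}
    return dict(sorted(kept.items(), key=lambda item: (-item[1], item[0])))


# The "query": only movies by one director are hits.
hits = [m for m in MOVIES if m["director"] == "Francis Ford Coppola"]

print(terms_buckets(MOVIES, hits, "director"))                   # default of one
print(terms_buckets(MOVIES, hits, "director", min_doc_count=0))  # empty buckets kept
print(terms_buckets(MOVIES, hits, "genres", min_doc_count=0))
```

With the default, only the filtered director gets a bucket. With a value of zero, the other director and the genre that appears only in the unmatched movie come back with a count of zero, which is what keeps them visible on the page.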

Last updated: 03 Apr 2023
About Author

Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.
