A natural extension to aggregation scoping is filtering. Because the aggregation operates in the context of the query scope, any filter applied to the query will also apply to the aggregation.
Aggregations can be used for visualizing aggregated values from the search results and for letting users filter by them. For our movies, that could be a page that lists the search hits next to the directors and genres found in them, with each director and genre usable as a filter.
To build such a page, we’d use a search request like this:
A search request for all movies and terms aggregations for directors and genres.
curl -XPOST "https://localhost:9200/movies/movie/_search" -d'
{
  "aggregations": {
    "directors": {
      "terms": {
        "field": "director.original"
      }
    },
    "genres": {
      "terms": {
        "field": "genres.original"
      }
    }
  }
}'
Now, what if a user wants to filter by a director? On the web development side of things, we’d send the director name back to the server as a parameter of some sort. Once on the server, we’d need to modify our request to Elasticsearch to add a filter, like this:
The same request as the previous one, only this time with filtering for movies by a specific director.
curl -XPOST "https://localhost:9200/movies/movie/_search" -d'
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "director.original": "Francis Ford Coppola"
        }
      }
    }
  },
  "aggregations": {
    "directors": {
      "terms": {
        "field": "director.original"
      }
    },
    "genres": {
      "terms": {
        "field": "genres.original"
      }
    }
  }
}'
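The server-side step described above, adding a filter to the request when a director parameter arrives, could be sketched roughly as follows. This is only an illustration: the function name `build_search_body` is hypothetical, and the sketch keeps the `filtered` query used in this article. Elasticsearch 5.0 and later removed `filtered` in favor of a `bool` query with a `filter` clause, so the query part would need adapting there.

```python
import json


def build_search_body(director=None):
    """Build the request body for the movie search page.

    `director` is the (hypothetical) request parameter sent back by the
    web page when a user clicks a director; None means no filter.
    """
    body = {
        "aggregations": {
            "directors": {"terms": {"field": "director.original"}},
            "genres": {"terms": {"field": "genres.original"}},
        }
    }
    if director is not None:
        # Same structure as the curl example: match everything, then
        # keep only the documents whose director matches exactly.
        body["query"] = {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"term": {"director.original": director}},
            }
        }
    return body


if __name__ == "__main__":
    # Prints JSON that can be passed to curl with -d.
    print(json.dumps(build_search_body("Francis Ford Coppola"), indent=2))
```

Because the aggregations sit next to the query in the same body, they are computed over whatever documents the query and filter let through, which is why the same function serves both the unfiltered and the filtered page.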
With the filtered response from Elasticsearch, we rebuild the web page based on the new response.
As the search response now only contains the two movies directed by Francis Ford Coppola, only two hits are shown. Also, because aggregations are calculated over the document set that the query generates, the filters in the left part of the page have changed as well. Only the genres and directors found in the movies by Francis Ford Coppola are shown.
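To make the rebuilding step concrete, here is a minimal sketch of how the filter entries could be read out of a search response. A terms aggregation returns its results as a list of buckets, each with a `key` and a `doc_count`. The helper name `facet_entries` is hypothetical, and the sample response below is hand-written for illustration (its genres and counts are invented, not actual output from the movies index).

```python
def facet_entries(response, agg_name):
    """Return (term, document count) pairs for one terms aggregation."""
    aggregation = response.get("aggregations", {}).get(agg_name, {})
    return [(b["key"], b["doc_count"]) for b in aggregation.get("buckets", [])]


# Hand-written sample in the shape of a terms aggregation response.
# The values are illustrative assumptions, not real results.
sample_response = {
    "hits": {"hits": []},
    "aggregations": {
        "directors": {
            "buckets": [{"key": "Francis Ford Coppola", "doc_count": 2}]
        },
        "genres": {
            "buckets": [
                {"key": "Drama", "doc_count": 2},
                {"key": "Crime", "doc_count": 1},
            ]
        },
    },
}

if __name__ == "__main__":
    for name in ("directors", "genres"):
        for term, count in facet_entries(sample_response, name):
            print(f"{name}: {term} ({count})")
```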
Often this is the desired behavior: the aggregations reflect the result of the applied queries and filters. However, sometimes it’s not. For instance, what if we want to allow users to filter by multiple directors? Then we’d still want buckets for the other directors, even though no document matching the current query has them in the director field.
In such cases, we can add a min_doc_count parameter with a value of zero to our aggregations, like this:
Including empty buckets by setting the min_doc_count parameter in the aggregations.
curl -XPOST "https://localhost:9200/movies/movie/_search" -d'
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "director.original": "Francis Ford Coppola"
        }
      }
    }
  },
  "aggregations": {
    "directors": {
      "terms": {
        "field": "director.original",
        "min_doc_count": 0
      }
    },
    "genres": {
      "terms": {
        "field": "genres.original",
        "min_doc_count": 0
      }
    }
  }
}'
The min_doc_count parameter controls the minimum number of documents that must match a term in order for a terms aggregation to create a bucket for it. The default value is one. By setting it to zero, buckets will be created for terms even though no document in the search results has that term; those buckets simply have a document count of zero. For our page, this would mean that other genres and directors would still be listed.
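As a rough sketch of the multiple-director case that motivates this setting, the request body could be built as below. The function name `build_facet_search` and its parameters are hypothetical. The sketch also swaps the single-value `term` filter from the examples for a `terms` filter, which accepts a list of values; that substitution is an assumption about how one might support several selected directors, not something the examples above show.

```python
def build_facet_search(directors=(), include_empty_buckets=True):
    """Build a search body that can keep empty buckets in its aggregations.

    `directors` is a (hypothetical) list of director names selected by the
    user. With `include_empty_buckets`, both terms aggregations get
    min_doc_count set to 0 so unselected values still produce buckets.
    """
    options = {"min_doc_count": 0} if include_empty_buckets else {}
    body = {
        "aggregations": {
            "directors": {"terms": {"field": "director.original", **options}},
            "genres": {"terms": {"field": "genres.original", **options}},
        }
    }
    if directors:
        # Assumption: a `terms` filter matches documents whose field
        # equals any one of the listed values.
        body["query"] = {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"terms": {"director.original": list(directors)}},
            }
        }
    return body
```

When rendering the page from such a response, buckets with a `doc_count` of zero can be shown as additional choices rather than as matches in the current result set.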
Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.