Mindmajix

The Elasticsearch Nested Type Mapping

Elasticsearch Nested Mapping

Documents in Elasticsearch can contain properties with arrays or other JSON objects as values. In most cases this just works, but sometimes it doesn’t. Let’s index a movie again, only this time we’ll add an array of actors to it and let each actor be a JSON object:

Indexing a movie with a ‘cast’ property.

curl -XPUT "http://localhost:9200/movies/movie/7" -d '
{
  "title": "The Matrix",
  "cast": [
    {
      "firstName": "Keanu",
      "lastName": "Reeves"
    },
    {
      "firstName": "Laurence",
      "lastName": "Fishburne"
    }
  ]
}'

Now, with the movie indexed, we will get a hit for it if we search for movies with an actor whose first name is “Keanu”. Or rather its lowercased version, “keanu”, because we filter on a field mapped with the standard analyzer, which lowercases terms at index time.

Searching for movies, filtering on the cast.firstName fields.

curl -XPOST "http://localhost:9200/movies/movie/_search" -d '
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "cast.firstName": "keanu"
        }
      }
    }
  }
}'

Running the above query indeed returns The Matrix. The same is true if we try to find movies that have an actor with the first name “Keanu” and last name “Reeves”:

Filtering on both the cast.firstName and cast.lastName fields.

curl -XPOST "http://localhost:9200/movies/movie/_search" -d '
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "cast.firstName": "keanu"
              }
            },
            {
              "term": {
                "cast.lastName": "reeves"
              }
            }
          ]
        }
      }
    }
  }
}'

The above request does indeed also result in a hit for The Matrix. All is well. Or, is it? Let’s see what happens if we search for movies with an actor with “Keanu” as first name and “Fishburne” as last name.

Again filtering on both cast fields, but this time with a different value for cast.lastName.

curl -XPOST "http://localhost:9200/movies/movie/_search" -d '
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "cast.firstName": "keanu"
              }
            },
            {
              "term": {
                "cast.lastName": "fishburne"
              }
            }
          ]
        }
      }
    }
  }
}'

At first glance, the above request should clearly not return The Matrix, as there’s no such actor amongst its cast. However, Elasticsearch will return The Matrix for the above query. After all, the movie does contain an actor with “Keanu” as first name and, albeit a different one, an actor with “Fishburne” as last name. Based on the above query, it has no way of knowing that we want the two term filters to match the same object in the list of actors. And even if it did, the way the data is indexed means it wouldn’t be able to handle that requirement.

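When Elasticsearch indexes fields from JSON objects in an array, the relationship of belonging to the same object is lost. In other words, the document will have fields and values like this:

cast.firstName: keanu, laurence
cast.lastName: reeves, fishburne

Nothing in these flattened fields records that “keanu” and “reeves” came from the same object.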

So, what to do? Often the simplest solution is the best. We could prepare our index for this use case by adding a property with both the first name and last name to the actors, like this:

{
  "firstName": "Keanu",
  "lastName": "Reeves",
  "fullName": "Keanu Reeves"
}
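
With a fullName property in place, a search for a specific actor can target that single field. The following is only a sketch: it assumes the movie has been reindexed with fullName added to every cast entry and that the field is analyzed with the standard analyzer. It uses a match_phrase query rather than a term filter, because a term filter compares against individual indexed terms such as "keanu" and would not match the two-word value. The QUERY variable is just a convenience for this example.

```shell
# Sketch only: assumes every cast entry was reindexed with a fullName property
# and that the field is analyzed with the standard analyzer.
QUERY='{
  "query": {
    "match_phrase": {
      "cast.fullName": "Keanu Reeves"
    }
  }
}'

# Search the original movies index for the phrase in cast.fullName.
curl -XPOST "http://localhost:9200/movies/movie/_search" -d "$QUERY"
```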

Using this approach we could simply filter on the fullName property to find all movies starring an actor named Keanu Reeves, and no others. However, sometimes such a simple approach doesn’t cut it. Luckily, Elasticsearch provides a way to filter on multiple fields within the same object in an array: mapping such fields as the nested type. To try this out, let’s create a new index with the “cast” field mapped as nested.

Creating a new index with the cast field mapped as nested.

curl -XPUT "http://localhost:9200/movies-2" -d '
{
  "mappings": {
    "movie": {
      "properties": {
        "cast": {
          "type": "nested"
        }
      }
    }
  }
}'
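
With the mapping in place, the movie has to be indexed again, this time into movies-2. A minimal sketch of that step, assuming the same document, the same ID (7) and the same node on localhost:9200 as before; the DOC variable is just a convenience for this example:

```shell
# Same document as earlier, held in a variable (a convenience for this sketch).
DOC='{
  "title": "The Matrix",
  "cast": [
    { "firstName": "Keanu", "lastName": "Reeves" },
    { "firstName": "Laurence", "lastName": "Fishburne" }
  ]
}'

# Index it into movies-2, where the cast field is mapped as nested.
curl -XPUT "http://localhost:9200/movies-2/movie/7" -d "$DOC"
```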

After indexing the same movie document into the new index we can now find movies based on multiple properties of each actor by using a nested filter. Here’s how we would search for movies starring an actor named “Keanu Fishburne”:

Using a nested filter in order to filter on both the cast.firstName and cast.lastName fields within the same objects in the cast field.

curl -XPOST "http://localhost:9200/movies-2/movie/_search" -d '
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "nested": {
          "path": "cast",
          "filter": {
            "bool": {
              "must": [
                {
                  "term": {
                    "cast.firstName": "keanu"
                  }
                },
                {
                  "term": {
                    "cast.lastName": "fishburne"
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}'

As you can see we’ve wrapped our initial bool filter in a nested filter. The nested filter contains a path property where we specify that the filter applies to the cast property of the searched document. It also contains a filter (or a query) which will be applied to each value within the nested property.

As intended, running the above query doesn’t return The Matrix, while modifying it to match “reeves” as last name instead will make it match The Matrix. However, there’s one caveat.

Including nested values in parent documents

If we go back to our very first query, filtering only on actors’ first names without using a nested filter, like the request below, we won’t get any hits.

curl -XPOST "http://localhost:9200/movies-2/movie/_search" -d '
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "cast.firstName": "keanu"
        }
      }
    }
  }
}'

This happens because movie documents no longer have cast.firstName fields. Instead, each element in the cast array is indexed internally by Elasticsearch as a separate document. We can of course still search for movies based only on first names amongst the cast by using a nested filter, like this:

Searching for movies, filtering on the cast.firstName fields, in the index with the cast field mapped as nested.

curl -XPOST "http://localhost:9200/movies-2/movie/_search" -d '
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "nested": {
          "path": "cast",
          "filter": {
            "term": {
              "cast.firstName": "keanu"
            }
          }
        }
      }
    }
  }
}'

The above request returns The Matrix. However, having to use nested filters or queries when all we want to do is filter on a single property is a bit tedious. To keep the power of nested filters for complex criteria while still being able to filter on values in arrays as if the property weren’t mapped as nested, we can modify our mapping so that the nested values are also included in the parent document. This is done using the include_in_parent parameter, like this:

Creating a new index with the cast field mapped as nested and with include_in_parent set to true.
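
The request itself is not shown here. A minimal sketch of what it could look like, assuming the same movie type and cast field as in the earlier mapping; the index name movies-3 is a placeholder chosen for this example, not a name from the article, and the MAPPING variable is only a convenience:

```shell
# Sketch only: "movies-3" is a placeholder index name.
MAPPING='{
  "mappings": {
    "movie": {
      "properties": {
        "cast": {
          "type": "nested",
          "include_in_parent": true
        }
      }
    }
  }
}'

# Create the index with cast mapped as nested and also copied into the parent document.
curl -XPUT "http://localhost:9200/movies-3" -d "$MAPPING"
```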

In an index such as the one created with the above request, we can both filter on combinations of values within the same objects in the cast array using nested filters and still filter on single fields without nested filters. However, we now need to carefully consider where to use, and where not to use, nested filters in our queries: a query for “keanu fishburne” will match The Matrix when using a regular bool filter, while it won’t when that filter is wrapped in a nested filter. In other words, with include_in_parent we may get unexpected results from queries matching documents that they shouldn’t if we forget to use nested filters.

