Mindmajix

Introduction to Elasticsearch Mapping

Mapping

Mapping is the process of defining how a document, and the fields it contains, are stored and indexed. For instance, you can use mappings to define:

  • which string fields should be treated as full text fields.
  • which fields contain numbers, dates, or geolocations.
  • whether the values of all fields in the document should be indexed into the catch-all _all field.
  • the format of date values.
  • custom rules to control the mapping for dynamically added fields.

In order to be able to treat date fields as dates, numeric fields as numbers, and string fields as full-text or exact-value strings, Elasticsearch needs to know what type of data each field contains. This information is contained in the mapping.

Every type has its own mapping, or schema definition. A mapping defines the fields within a type, the datatype for each field, and how the field should be handled by Elasticsearch. A mapping is also used to configure metadata associated with the type.
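For example, a mapping for the movie type could declare a numeric field and a date field with an explicit date format. The body below follows the same pre-5.0 syntax the rest of this article uses. Apart from director, the field names and the date format are hypothetical. The snippet only builds and prints the body, and the curl call to the _mapping endpoint is left as a comment.

```shell
#!/usr/bin/env bash
# Hypothetical mapping for the movie type (pre-5.0 syntax, as in this article).
# Only "director" appears in the article; the other fields are illustrative.
MOVIE_MAPPING='{
  "movie": {
    "properties": {
      "title":        { "type": "string" },
      "director":     { "type": "string" },
      "year":         { "type": "integer" },
      "release_date": { "type": "date", "format": "yyyy-MM-dd" }
    }
  }
}'

printf '%s\n' "$MOVIE_MAPPING"

# To apply it (requires a running cluster on localhost:9200), do so before any
# movies are indexed, since existing field mappings generally cannot be changed:
# curl -XPUT "http://localhost:9200/movies/movie/_mapping" -d "$MOVIE_MAPPING"
```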

Elasticsearch Mapping

A mapping consists of the properties associated with the documents in a specific index type, such as the datatype (string, date, integer and so on) of each field in a document. Defining the mapping when we create an index therefore plays an important role, as an inappropriate mapping can make things difficult for us later, for example when searching.

Mappings can be applied in several ways: to the types of an index or to particular fields, and both can be defined statically (explicitly) or dynamically.

Let’s take a look at a search request for movies, this time filtering by the director’s name:

A search request filtering using the director’s full name.

curl -XPOST "http://localhost:9200/movies/_search" -d'
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "director": "Francis Ford Coppola" }
      }
    }
  }
}'

As we have two movies directed by Francis Ford Coppola in our index, it doesn’t seem too far-fetched that this request should result in two hits, right? That’s not the case, however.

The search request and its result, without a single hit, in Sense.

We’ve clearly indexed two movies with “Francis Ford Coppola” as director, and that’s what we see in search results as well. However, while Elasticsearch keeps a JSON object with that data, which it returns to us in search results as the _source property, that’s not what it has in its index.

When we index a document, Elasticsearch (simplifying somewhat) does two things: it stores the original data untouched, as _source, for later retrieval, and it indexes each JSON property into one or more fields in a Lucene index. During indexing, it processes each field according to how that field is mapped. If a field isn’t mapped, a default mapping based on the value’s type (string, number, etc.) is used.

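As we haven’t supplied any mappings for our index, Elasticsearch uses the default mapping for strings for the director field, which analyzes the value (among other things, splitting it into words and lowercasing them) before indexing it. This means that in the index, the director field’s value isn’t “Francis Ford Coppola”. Instead, it’s something more like [“francis”, “ford”, “coppola”]. The term filter, on the other hand, does not analyze its input, so it looks for the exact term “Francis Ford Coppola” and finds none. We can verify this by modifying our filter to match “francis” (or “ford” or “coppola”) instead: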

The search request after modifying it so that the term filter looks for “francis” instead of the full name, and its result with two hits.
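The analysis step can be approximated in a few lines of shell. The approx_analyze function below is a rough, hypothetical stand-in, not part of Elasticsearch and not its actual analyzer: it only lowercases the value and splits it on anything that is not a letter or digit, which is close enough for a plain ASCII name like this one.

```shell
#!/usr/bin/env bash
# Rough, hypothetical approximation of default string analysis for plain ASCII
# text: lowercase the value, then split it on every run of characters that are
# not letters or digits. Elasticsearch's real analyzer is more sophisticated.
approx_analyze() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs '[:alnum:]' '\n' \
    | sed '/^$/d'
}

# Prints one term per line: francis, ford, coppola
approx_analyze "Francis Ford Coppola"
```

The full string “Francis Ford Coppola” is not among the resulting terms, so an exact-value term filter for it finds nothing, while a filter for “francis” matches.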

So, what should we do if we want to filter by the exact name of the director? We modify how the field is mapped. There are a number of ways to add mappings to Elasticsearch: through a configuration file, as part of the HTTP request that creates an index, and by calling the _mapping endpoint.

Using the last approach, we could in theory fix the issue above by adding a mapping for the “director” field that instructs Elasticsearch not to analyze (tokenize, etc.) the field at all when indexing it, like this:

Explicitly mapping the director field as not_analyzed, meaning it will be indexed exactly as it is (not tokenized etc.).

curl -XPUT "http://localhost:9200/movies/movie/_mapping" -d'
{
  "movie": {
    "properties": {
      "director": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}'

There are, however, a couple of problems with doing this. First of all, it won’t work, as there already is a mapping for the field: the default one Elasticsearch created when we indexed our movies. Try the request above and you’ll get an error message like this:

Response from Elasticsearch, which doesn’t allow us to change the mapping of an already mapped field.

{
  "error": "MergeMappingException[Merge failed with failures {[mapper [director] has different index values, mapper [director] has different tokenize values, mapper [director] has different index_analyzer]}]",
  "status": 400
}

In many cases, such as this one, it’s not possible to modify existing mappings. Often the easiest workaround is to create a new index with the desired mappings and re-index all of the data into the new index.
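A sketch of that workaround, in the same pre-5.0 syntax as the rest of the article: the mapping is supplied in the body of the request that creates the new index, so there is no existing mapping to conflict with. The index name movies_v2 is hypothetical. Copying the documents across is left as a comment, since how you do it depends on your Elasticsearch version and tooling (version 2.3 and later provide a _reindex API for it).

```shell
#!/usr/bin/env bash
# Hypothetical index name; the mapping is the one the _mapping call above was
# refused for, this time supplied when the index is created (pre-5.0 syntax).
NEW_INDEX="movies_v2"
CREATE_BODY='{
  "mappings": {
    "movie": {
      "properties": {
        "director": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'

echo "PUT /$NEW_INDEX"
printf '%s\n' "$CREATE_BODY"

# Requires a running cluster on localhost:9200; uncomment to send the request:
# curl -XPUT "http://localhost:9200/$NEW_INDEX" -d "$CREATE_BODY"
# Then copy every document from "movies" into "$NEW_INDEX" (how depends on
# the Elasticsearch version; 2.3 and later offer a _reindex API).
```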

The second problem with adding the above mapping is that, even if we could add it, we would have limited our ability to search in the director field. That is, while a search for the exact value in the field would match, we wouldn’t be able to search for single words in the field any more.

Luckily, there’s a simple solution to our problem: we add a mapping that upgrades the field to a “multi field”. This means the field is mapped, and indexed, multiple times. As long as one of those mappings matches the existing mapping in both name and settings, Elasticsearch accepts the change and we won’t have to create a new index. Here’s a request that does that:

Upgrading the director field to a multi_field so that it will be indexed twice during indexing.

curl -XPUT "http://localhost:9200/movies/movie/_mapping" -d'
{
  "movie": {
    "properties": {
      "director": {
        "type": "multi_field",
        "fields": {
          "director": {
            "type": "string"
          },
          "original": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}'

This time Elasticsearch accepts the request, as we don’t modify the existing mapping but only add a ‘sub field’:

{
  "acknowledged": true
}

We told Elasticsearch that whenever it sees a property named director in a movie document that is about to be indexed in the movies index, it should index it twice: once into a field with the same name (director), and once into a field named director.original. The latter field is not analyzed, so it keeps the original value and allows us to filter by the exact director name.
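The effect of the multi field can be simulated outside Elasticsearch. The sketch below is hypothetical and greatly simplified: index_director and term_filter are not Elasticsearch functions. index_director emits the terms stored for each of the two fields (an approximate lowercase-and-split analysis for director, the untouched value for director.original), and term_filter succeeds only if the query value exactly equals one of a field’s terms, as a term filter does.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that imitate, very roughly, what the multi field mapping does.

# Print one "field<TAB>term" line per term indexed for a director value.
index_director() {
  # director: approximated default analysis (lowercase, split on non-alphanumerics)
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -cs '[:alnum:]' '\n' \
    | sed '/^$/d' \
    | awk '{ print "director\t" $0 }'
  # director.original: not_analyzed, so the whole value is one term, unchanged
  printf 'director.original\t%s\n' "$1"
}

# Read "field<TAB>term" lines on stdin and succeed only if field $1 has a term
# exactly equal to $2. Like a term filter, the value itself is not analyzed.
term_filter() {
  awk -F '\t' -v f="$1" -v v="$2" \
    '$1 == f && $2 == v { found = 1 } END { exit !found }'
}

terms=$(index_director "Francis Ford Coppola")

for query in "director|Francis Ford Coppola" \
             "director|francis" \
             "director.original|Francis Ford Coppola"; do
  field=${query%%|*}
  value=${query#*|}
  if printf '%s\n' "$terms" | term_filter "$field" "$value"; then
    echo "match:    $field = \"$value\""
  else
    echo "no match: $field = \"$value\""
  fi
done
```

Only the last two queries report a match: the analyzed director field still supports single-word searches such as “francis” but not the full name, while director.original holds the full name exactly as indexed.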

With our shiny new mapping in place, we can re-index one or both of the movies directed by Francis Ford Coppola (documents indexed before the change have no director.original field until they are indexed again) and run the search request that filters by director again. This time, however, we don’t filter on the director field, which is indexed the same way as before, but on the director.original field:

A search request filtering using the director’s full name, this time on the director.original field.

curl -XPOST "http://localhost:9200/movies/_search" -d'
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "director.original": "Francis Ford Coppola" }
      }
    }
  }
}'
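The requests in this article use the syntax of early Elasticsearch releases. As a hedged pointer for newer clusters: from 5.0 the string type was replaced by text (analyzed) and keyword (indexed as is), and multi fields are declared with a fields parameter on the field itself rather than a multi_field type. Version 6.0 and later require a Content-Type: application/json header on requests with a body, and 7.0 and later expect mappings without a type name. The default dynamic mapping in 5.0 and later typically already adds a keyword sub field named director.keyword. A sketch of an explicit equivalent, creating a new index whose name, movies_v2, is hypothetical (as are the variable names), might look like this:

```shell
#!/usr/bin/env bash
# Sketch for Elasticsearch 7.x and later (no mapping types).
# "movies_v2" is a hypothetical index name.
ES="http://localhost:9200"

# director is analyzed as text; director.original is stored as an exact keyword.
CREATE_BODY='{
  "mappings": {
    "properties": {
      "director": {
        "type": "text",
        "fields": {
          "original": { "type": "keyword" }
        }
      }
    }
  }
}'

# The same exact-value filter as above, now against the keyword sub field.
SEARCH_BODY='{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "director.original": "Francis Ford Coppola" }
      }
    }
  }
}'

printf '%s\n' "$CREATE_BODY" "$SEARCH_BODY"

# Requires a running cluster; uncomment to send the requests.
# curl -XPUT  "$ES/movies_v2" -H 'Content-Type: application/json' -d "$CREATE_BODY"
# curl -XPOST "$ES/movies_v2/_search" -H 'Content-Type: application/json' -d "$SEARCH_BODY"
```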


Copyright © Mindmajix.com. All rights reserved.