Mindmajix

Dynamic Mapping Overview | Elasticsearch

Mapping

A mapping defines the properties of the documents in a specific type of an index, such as the datatype (string, date, integer, and so on) of each field in those documents. Defining the mapping when we create an index therefore plays an important role, since an inappropriate mapping can make things difficult for us later.

Mappings can be applied at different levels, to the types of an index or to particular fields, and in both cases they can be defined statically or dynamically.

Default mappings allow generic mapping definitions to be automatically applied to types that do not have mappings predefined.

When Elasticsearch encounters a previously unknown field in a document, it uses dynamic mapping to determine the datatype for the field and automatically adds the new field to the type mapping.

Dynamic mapping

Upgrading the mapping for the director field to a multi field and re-indexing the documents solves the problem of being able to filter on the exact director name. However, doing this explicitly for multiple fields can be tedious, and it hardly makes using Elasticsearch seem friction-free. Luckily, Elasticsearch has a solution for this: the concept of dynamic mapping.

During indexing, when Elasticsearch encounters an unmapped field (a field for which we haven’t provided any explicit mappings) that contains a value (not null or an empty array), it uses the name of the field and the value’s type to look for a template. So far, in the examples that we’ve looked at, the default templates built into Elasticsearch have been used. However, we can provide templates of our own.

For instance, we can provide a template that maps all new string fields (existing ones we’ll have to update ourselves, as we did with the director field) the way that we mapped the director field. Here’s a request that adds a dynamic template for strings that does just that for the movie type:

curl -XPUT "http://localhost:9200/movies/_mapping/movie" -d'
{
  "movie": {
    "dynamic_templates": [
      {
        "strings": {
          "match_mapping_type": "string",
          "path_match": "*",
          "mapping": {
            "type": "string",
            "fields": {
              "original": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          }
        }
      }
    ]
  }
}'

Now, whenever a new string field is added to movies, we can query and filter on both <name of field> and <name of field>.original. In most cases, however, it’s convenient to provide default templates not only for a specific type but for all types in the index. To do so, we can explicitly create an index (using PUT <index name>) and provide mappings for the special _default_ type.
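To make the point about <name of field>.original concrete, here is a minimal Python sketch that builds the body of a search request filtering on the exact, unanalyzed value of the director field. The helper name exact_match_filter, the director value, and the use of the filtered query (which fits the Elasticsearch versions that still have the string type) are assumptions made for illustration only.

```python
import json


def exact_match_filter(field, value, subfield="original"):
    """Build a search body that filters on the not_analyzed sub-field.

    Assumption: the 'filtered' query with a 'term' filter, as used by the
    Elasticsearch versions that still support the 'string' type.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                # e.g. "director.original" holds the untouched original value
                "filter": {"term": {f"{field}.{subfield}": value}},
            }
        }
    }


# Hypothetical director name, used only as sample data.
body = exact_match_filter("director", "Francis Ford Coppola")
print(json.dumps(body, indent=2))
```

The printed JSON could be sent to the _search endpoint of the movies index. Filtering on director instead of director.original would compare against the analyzed tokens rather than the full name.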

First, we need to delete the existing movies index. We haven’t covered how to do that yet, but it’s quite easy. Simply make a DELETE request to a URL matching the index name. Like this:

Deleting the movies index. Note that both mappings and documents will be gone. Forever.

curl -XDELETE "http://localhost:9200/movies"

Now, to create the index again we switch the HTTP verb to PUT. As the request body, we send a JSON object with a property named mappings in which we can provide mappings.

Creating the movies index again. This time with mappings for the _default_ type.

curl -XPUT "http://localhost:9200/movies" -d'
{
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "path_match": "*",
            "mapping": {
              "type": "string",
              "fields": {
                "original": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      ]
    }
  }
}'

Now, whenever a new field with a string value is encountered during indexing, no matter what the document type is, it will be mapped using our template. Index the movies again and then inspect the mappings for the movie type (curl -XGET "http://localhost:9200/movies/movie/_mapping") to verify this.

The result should look like the JSON object below. Note that the movie type has “inherited” the dynamic templates from the _default_ type and that each string field has been mapped using our string template.

The mappings for the movie type after having indexed the movie documents.

{
  "movies": {
    "mappings": {
      "movie": {
        "dynamic_templates": [
          {
            "strings": {
              "mapping": {
                "type": "string",
                "fields": {
                  "original": {
                    "index": "not_analyzed",
                    "type": "string"
                  }
                }
              },
              "match_mapping_type": "string",
              "path_match": "*"
            }
          }
        ],
        "properties": {
          "director": {
            "type": "string",
            "fields": {
              "original": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "genres": {
            "type": "string",
            "fields": {
              "original": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "title": {
            "type": "string",
            "fields": {
              "original": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "year": {
            "type": "long"
          }
        }
      }
    }
  }
}
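The response above can be reproduced with a simplified model of how a dynamic template is chosen for a new field. The Python sketch below is an illustration under assumptions, not Elasticsearch's actual implementation: detected_type and map_new_field are hypothetical helpers, the sample movie document is made up, and only the match_mapping_type and path_match conditions are modeled.

```python
from fnmatch import fnmatchcase

# The "strings" template from the index creation request.
STRINGS_TEMPLATE = {
    "strings": {
        "match_mapping_type": "string",
        "path_match": "*",
        "mapping": {
            "type": "string",
            "fields": {"original": {"type": "string", "index": "not_analyzed"}},
        },
    }
}


def detected_type(value):
    """Rough JSON-value-to-datatype rules (simplified assumption)."""
    if isinstance(value, list):
        value = value[0]  # arrays take the type of their elements
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def map_new_field(path, value, templates):
    """Return the mapping of the first matching template, else a default."""
    json_type = detected_type(value)
    for template in templates:
        rule = next(iter(template.values()))
        type_ok = rule.get("match_mapping_type", json_type) == json_type
        path_ok = fnmatchcase(path, rule.get("path_match", "*"))
        if type_ok and path_ok:
            return rule["mapping"]
    return {"type": json_type}


# Hypothetical sample document.
movie = {
    "title": "The Godfather",
    "director": "Francis Ford Coppola",
    "year": 1972,
    "genres": ["Crime", "Drama"],
}

# Null values and empty arrays do not trigger dynamic mapping.
properties = {
    field: map_new_field(field, value, [STRINGS_TEMPLATE])
    for field, value in movie.items()
    if value is not None and value != []
}
print(properties)
```

In this model the three string fields pick up the template's mapping with the original sub-field, while year falls through to the default long mapping, which mirrors the properties shown in the response.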

Elasticsearch lets us control what happens when a previously unknown field is encountered through the “dynamic” setting, which can be set to any of the following three values:

  • true: add new fields automatically. This is the default setting.
  • false: ignore the new fields
  • strict: throw an exception when a previously unknown field is detected
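The three values listed above can be pictured with a small Python model. This is only a sketch of the behavior as described here: index_fields and StrictDynamicMappingError are hypothetical names, and every new field is given a placeholder string mapping instead of going through real type detection.

```python
class StrictDynamicMappingError(Exception):
    """Stand-in for the exception raised when dynamic is 'strict'."""


def index_fields(properties, document, dynamic="true"):
    """Return the fields that would be indexed.

    `properties` is the existing field mapping and is extended in place
    when dynamic is "true".
    """
    accepted = {}
    for field, value in document.items():
        if field in properties:
            accepted[field] = value
        elif dynamic == "true":
            # Placeholder mapping; real type detection is not modeled here.
            properties[field] = {"type": "string"}
            accepted[field] = value
        elif dynamic == "false":
            continue  # unknown field is ignored: no mapping, not indexed
        elif dynamic == "strict":
            raise StrictDynamicMappingError(field)
    return accepted


doc = {"title": "Alien", "tagline": "In space..."}

props_true = {"title": {"type": "string"}}
indexed_true = index_fields(props_true, doc, dynamic="true")

props_false = {"title": {"type": "string"}}
indexed_false = index_fields(props_false, doc, dynamic="false")

try:
    index_fields({"title": {"type": "string"}}, doc, dynamic="strict")
    strict_failed = False
except StrictDynamicMappingError:
    strict_failed = True
```

With "true" the mapping grows by one field, with "false" the mapping is left untouched and the unknown field is skipped, and with "strict" the whole document is rejected.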

More on dynamic mapping

As we’ve seen, dynamic mapping is a powerful feature that we can use to ensure that types and fields in an index are indexed the way we need, even when we don’t know in advance whether they will exist. To delve further into dynamic mapping, let’s look at another example that illustrates several useful features.

Indexing a simple ‘tweet’ object into an index named ‘myindex’.

curl -XPOST "http://localhost:9200/myindex/tweet/" -d'
 {
 "content": "Hello World!",
 "postDate": "2009-11-15T14:12:12"
 }'

Given that there isn’t already an index named “myindex”, the above request will cause a number of things to happen in the Elasticsearch cluster.

  1. An index named “myindex” will be created.
  2. Mappings for a type named tweet will be created for the index. The mappings will contain two properties, content and postDate.
  3. The JSON object in the request body will be indexed.

After having made the above request, we can inspect the mappings that have been automatically created (using curl -XGET "http://localhost:9200/myindex/_mapping"). The result looks like this:

The mappings that have been automatically created for the tweet type.

{
  "myindex": {
    "mappings": {
      "tweet": {
        "properties": {
          "content": {
            "type": "string"
          },
          "postDate": {
            "type": "date",
            "format": "dateOptionalTime"
          }
        }
      }
    }
  }
}

As we can see in the above response, Elasticsearch has mapped the content property as a string and the postDate property as a date. All is well. However, let’s look at what happens if we delete the index and modify our indexing request to instead look like this:

Indexing another tweet object with a different value for the ‘content’ property.

curl -XPOST "http://localhost:9200/myindex/tweet/" -d'
 {
 "content": "1985-12-24",
 "postDate": "2009-11-15T14:12:12"
 }'

In the above request, the content property is still a string, but the only content of the string is a date. Retrieving the mappings now gives us a different result.

The automatically generated mappings for the tweet type, again.

{
  "myindex": {
    "mappings": {
      "tweet": {
        "properties": {
          "content": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "postDate": {
            "type": "date",
            "format": "dateOptionalTime"
          }
        }
      }
    }
  }
}

Elasticsearch has now inferred that the content property is also a date. After all, JSON doesn’t have a specific date type, so we can hardly expect it to know that we intend a string that looks like a date to be mapped as a string. If we now try to index our original JSON object, we’ll get an exception in our faces:

The response from Elasticsearch when indexing a tweet with a value for the content property that can’t be parsed as a date.

{
  "error": "MapperParsingException[failed to parse [content]]; nested: MapperParsingException[failed to parse date field [Hello World!], tried both date format [dateOptionalTime], and timestamp number with locale []]; nested: IllegalArgumentException[Invalid format: \"Hello World!\"]; ",
  "status": 400
}

We’re trying to insert a string value into a field which is mapped as a date. Naturally, Elasticsearch won’t allow us to do that. While this scenario isn’t very likely to happen, when it does, it can be quite annoying and cause problems that can only be fixed by re-indexing everything into a new index. Luckily, there are a number of possible solutions.

Customizing Dynamic Mapping

If you know that you are going to be adding new fields on the fly, you probably want to leave dynamic mapping enabled. At times, though, the dynamic mapping “rules” can be a bit blunt. Fortunately, there are settings that you can use to customize these rules to better suit your data.

Disabling date detection

When Elasticsearch encounters a new string field, it checks to see if the string contains a recognizable date, like 2014-01-01. If it looks like a date, the field is added as type date. Otherwise, it is added as type string.
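The check described above can be approximated in a few lines of Python. This is a rough sketch rather than Elasticsearch's real logic: the two strptime layouts are an assumed stand-in for the dateOptionalTime format, detect_string_type is a hypothetical helper, and its date_detection parameter mirrors the mapping setting of the same name covered in this section.

```python
from datetime import datetime

# Assumption: two ISO-like layouts standing in for dateOptionalTime.
_DATE_LAYOUTS = ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%S")


def looks_like_date(text):
    """Return True if the string parses with one of the assumed layouts."""
    for layout in _DATE_LAYOUTS:
        try:
            datetime.strptime(text, layout)
            return True
        except ValueError:
            continue
    return False


def detect_string_type(text, date_detection=True):
    """Pick the datatype for a new field whose first value is a string."""
    if date_detection and looks_like_date(text):
        return "date"
    return "string"


for sample in ("Hello World!", "1985-12-24", "2009-11-15T14:12:12"):
    print(sample, "->", detect_string_type(sample))
```

The model makes the underlying problem visible: the decision depends only on the first value seen for a field, so a content field that happens to start out as "1985-12-24" ends up as a date.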

As a first step, we can disable date detection for dynamic mapping. Here’s how we would do that explicitly for documents of type tweet when creating the index:

Creating an index with automatic date detection disabled for the tweet type.

curl -XPUT "http://localhost:9200/myindex" -d'
{
  "mappings": {
    "tweet": {
      "date_detection": false
    }
  }
}'

Let’s index the “problematic” tweet with the date in the content property again and inspect the mappings that have been dynamically created. This time we see a different result:

The automatically generated mappings for the tweet type after having disabled date detection.

{
  "myindex": {
    "mappings": {
      "tweet": {
        "date_detection": false,
        "properties": {
          "content": {
            "type": "string"
          },
          "postDate": {
            "type": "string"
          }
        }
      }
    }
  }
}

Now, both fields have been mapped as strings, which they indeed are, even though they contain values that can be parsed as dates. However, this isn’t good either as we’d like the postDate field to be mapped as a date so that we can use range filters and the like on it.

Explicitly mapping date fields

We can explicitly map the postDate field as a date by re-creating the index and including a property mapping, like this:

Creating an index with automatic date detection disabled and a specific mapping for the postDate property for the tweet type.

curl -XPUT "http://localhost:9200/myindex" -d'
{
  "mappings": {
    "tweet": {
      "date_detection": false,
      "properties": {
        "postDate": {
          "type": "date"
        }
      }
    }
  }
}'

If we now index the “problematic” tweet with a date in the content field, we’ll get the desired mappings: the content field mapped as a string and the postDate field mapped as a date. However, this approach can be cumbersome when dealing with many types, or with types that we don’t know about before documents of those types are indexed.

Mapping date fields using naming conventions

An alternative to disabling date detection and explicitly mapping specific fields as dates is to instruct Elasticsearch’s dynamic mapping functionality to adhere to naming conventions for dates. Take a look at the request below, which (again) creates an index.

Creating an index with automatic date detection disabled for all types (unless overridden with specific mappings) and a template for mapping fields matching a regular expression as dates.

curl -XPUT "http://localhost:9200/myindex" -d'
{
  "mappings": {
    "_default_": {
      "date_detection": false,
      "dynamic_templates": [
        {
          "dates": {
            "match": ".*Date|date",
            "match_pattern": "regex",
            "mapping": {
              "type": "date"
            }
          }
        }
      ]
    }
  }
}'

Compared to the previous requests that we used to create an index with mappings, this one is quite different. First of all, we no longer provide mappings for the tweet type. Instead, we provide mappings for the _default_ type. As we’ve already discussed, this is a special type whose mappings will be used as the default “template” for all other types.

As before, we start by disabling date detection in the mappings. However, after that, we no longer provide mappings for properties but instead provide a dynamic template named dates. Within the dates template we provide a pattern and specify that the pattern should be interpreted as a regular expression. Using this, the template will be applied to all fields whose names either end with “Date” or are exactly “date”. For such fields the template instructs the dynamic mapping functionality to map them as dates.

Using this approach, all string fields, no matter whether their values can be parsed as dates or not, will be mapped as strings unless the field name is something like “postDate”, “updateDate” or simply “date”. Fields with such names will be mapped as dates instead.
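To check which field names the pattern from the dates template would pick up, we can try it out in Python. This is a sketch under one assumption: that the pattern has to match the whole field name, which is what the description above implies and what re.fullmatch models here. The helper is_date_field is a hypothetical name.

```python
import re

# Same pattern as the "match" value of the dates template.
DATE_FIELD_PATTERN = re.compile(r".*Date|date")


def is_date_field(name):
    """True if the whole field name matches the naming convention."""
    return DATE_FIELD_PATTERN.fullmatch(name) is not None


for name in ("postDate", "updateDate", "date", "content", "birthdate", "dateOfBirth"):
    print(name, "->", "date" if is_date_field(name) else "string")
```

Note that the convention is case-sensitive: a name such as birthdate or dateOfBirth does not match and would be mapped as a string.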

While this is nice, there’s one caveat. Indexing a JSON object with a property matching the naming convention for date fields but whose value can’t be parsed as a date will cause an exception. Still, adhering to naming conventions for dates may be a small price to pay compared to the headaches of seemingly randomly having string fields mapped as dates simply because the first document to be indexed of a specific type happened to contain a string value that could be parsed as a date.
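One way to soften this caveat is to check documents on the client side before indexing them. The Python sketch below is only an illustration: unparseable_date_fields is a hypothetical helper, and datetime.fromisoformat is an assumed stand-in for Elasticsearch's own date parsing, which accepts a different set of formats.

```python
import re
from datetime import datetime

# Naming convention from the dates template.
DATE_NAME = re.compile(r".*Date|date")


def unparseable_date_fields(document):
    """List fields that follow the date naming convention but whose
    values cannot be parsed as ISO dates (and would be rejected)."""
    bad = []
    for name, value in document.items():
        if value is None or not DATE_NAME.fullmatch(name):
            continue
        try:
            datetime.fromisoformat(value)
        except (TypeError, ValueError):
            bad.append(name)
    return bad


ok_tweet = {"content": "1985-12-24", "postDate": "2009-11-15T14:12:12"}
bad_tweet = {"content": "Hello World!", "postDate": "yesterday"}

print(unparseable_date_fields(ok_tweet))
print(unparseable_date_fields(bad_tweet))
```

A document for which the helper returns a non-empty list can be corrected or rejected before it ever reaches the cluster.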

