Introduction to Elasticsearch Mapping

This article looks at Elasticsearch mapping and the concepts related to it. As a starting point, it helps to compare Elasticsearch with a traditional RDBMS: an Index in Elasticsearch corresponds roughly to a Database, a Type to a Table, a Document to a Row, and a Field to a Column.

Elasticsearch Mapping

A mapping defines how documents and their fields are indexed and stored. It is comparable to a database schema: it describes the fields that documents hold and the data type with which each field is stored. The following sections discuss this in more detail.

What is Mapping?

A mapping can be defined as the outline of the documents that are stored in an index. It defines the data type of each field, the format of the fields present in the documents, and the rules that govern how fields are mapped.

Data-type fields

Elasticsearch supports a variety of data types for the fields of a document saved in an index. The available data types fall into the following categories:

  • Core data type: Core data types are the basic data types that most systems support, such as integer, double, long, short, byte, float, string, boolean, date, and binary.
  • Complex data type: Complex data types are combinations of the core data types, such as arrays, JSON objects, and nested data types.
  • Geo data type: Geo data types hold details such as the geographical location of a place. For example, geo_point stores a latitude and longitude pair.
  • Specialized data type: Specialized data types hold values with a special structure or purpose, such as IP addresses, auto-complete suggestions, and the number of tokens in a string.
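
To make the categories above concrete, here is a minimal sketch of a mapping body written as a Python dictionary. The field names are hypothetical and chosen only for illustration; the type names (long, date, boolean, nested, geo_point, ip) are Elasticsearch field types. The helper types_used is not part of any Elasticsearch client, it simply reads the dictionary.

```python
import json

# Hypothetical field names, one or more per data-type category.
MAPPING = {
    "properties": {
        "amount": {"type": "long"},                # core: numeric
        "created": {"type": "date"},               # core: date
        "active": {"type": "boolean"},             # core: boolean
        "owner": {                                 # complex: JSON object
            "properties": {"name": {"type": "text"}}
        },
        "transfers": {"type": "nested"},           # complex: nested
        "branch_location": {"type": "geo_point"},  # geo
        "client_ip": {"type": "ip"},               # specialized
    }
}


def types_used(mapping_body):
    """Return the declared type of each top-level field.

    A field that only declares sub-properties is reported as "object".
    """
    return {
        name: spec.get("type", "object")
        for name, spec in mapping_body["properties"].items()
    }


if __name__ == "__main__":
    print(json.dumps(types_used(MAPPING), indent=2))
```

The dictionary can be serialized with json.dumps and sent as the mapping part of an index-creation request; how it is nested in that request depends on the Elasticsearch version.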

Meta fields with examples:

The mappings in an index are made up of fields with various data types. Besides the fields defined by the user, every document has meta-fields that carry information about the document itself and the objects associated with it, such as the _type, _id, _source, and _index fields. Let us go through these meta-fields in detail:

  • Identity meta-fields: Out of the available meta-fields, identity meta-fields are as follows:
    • _index : Denotes the index to which the document belongs
    • _uid : A composite key with the combination of _type and _id
    • _type: Denotes the document’s mapping type
    • _id : Denotes the document’s ID
  • Document source meta-fields: Out of the available meta-fields, document source meta-fields are as follows:
    • _source : Denotes the original JSON object which represents the document body
    • _size : It denotes the size of the _source field in bytes
  • Indexing meta-fields: Out of the available meta-fields, indexing meta-fields are as follows:
    • _all : This field takes the responsibility of indexing the values of all other fields.
    • _field_names : Denotes all fields in a given document that contain non-null values.
    • _timestamp : A manually supplied or automatically generated timestamp associated with each document.
    • _ttl : Denotes how long the document should be kept alive, after which it is deleted.
  • Routing meta-fields: Out of the available meta-fields, routing meta-fields are as follows:
    • _parent : Used to create a parent-child relationship between documents of different types.
    • _routing : A custom value that routes the given document to a specific shard.
  • Other meta-fields:
    • _meta : Denotes application specific metadata.
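
As a small illustration of how the identity and document source meta-fields appear alongside a document, the sketch below separates them from the document body. The hit dictionary is hand-written for this example (the _id value "1" is hypothetical) and is not output captured from a real cluster; split_hit is a helper defined here, not an Elasticsearch API.

```python
# A hand-written example of what a single returned document can look like.
HIT = {
    "_index": "debitaccountdetails",
    "_type": "transactionreport",
    "_id": "1",  # hypothetical ID
    "_source": {"from": "789XXXX12", "to": "756XXXX17", "amount": 1000001},
}


def split_hit(hit):
    """Separate the meta-fields of a hit from the original JSON body."""
    meta = {
        key: value
        for key, value in hit.items()
        if key.startswith("_") and key != "_source"
    }
    body = hit.get("_source", {})
    return meta, body


if __name__ == "__main__":
    meta, body = split_hit(HIT)
    print("meta-fields:", meta)
    print("document body:", body)
```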

Mapping Types

Every index has one or more mapping types, which divide the documents of the index into logical groups. Each mapping type has its own set of meta-fields and its own set of fields.

Static Mapping:

In most scenarios, we know beforehand what kind of data will be saved in the documents. In such cases, we can define the fields and their types explicitly at the time the index is created.
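
A static mapping of this kind can be sketched as follows, using the index name, type name, and field names from the sample transaction in the Dynamic Mapping section. The chosen field types and the date format are assumptions made for illustration, and build_index_body is a helper defined here rather than part of any client library. Releases before 7.0 nest the properties under a type name, while 7.0 and later omit it, so the helper accepts an optional type_name.

```python
import json

# Field names come from the sample transaction document; the types and the
# date format ("M/d/yyyy") are assumptions made for this sketch.
PROPERTIES = {
    "from": {"type": "keyword"},
    "to": {"type": "keyword"},
    "date": {"type": "date", "format": "M/d/yyyy"},
    "amount": {"type": "long"},
}


def build_index_body(properties, type_name=None):
    """Build the request body for creating an index with an explicit mapping.

    With type_name set, the properties are nested under that mapping type
    (pre-7.0 layout); without it, they sit directly under "mappings".
    """
    fields = {"properties": properties}
    if type_name is not None:
        return {"mappings": {type_name: fields}}
    return {"mappings": fields}


if __name__ == "__main__":
    # Body that would accompany a request such as PUT /debitaccountdetails
    print(json.dumps(build_index_body(PROPERTIES, "transactionreport"), indent=2))
```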

Dynamic Mapping:

Elasticsearch can also create the mapping for an index automatically. When a user indexes a document containing fields that have no mapping yet, Elasticsearch derives the mapping from the data, which is referred to as “Dynamic Mapping”. Let us take a look at a sample request and response to understand this better:

Request:
{
   "from":"789XXXX12", "to":"756XXXX17",
   "date":"8/1/2019", "amount":1000001
}
Response:
{
   "_index":"debitaccountdetails", "_type":"transactionreport",
   "_id":"AVI3FaeH0dicjGepNBgI4aqke", "_version":1,
   "_shards":{"total":2, "successful":1, "failed":0},
   "created":true
}
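
The idea behind dynamic mapping can be sketched with a deliberately simplified type-inference function. This is not Elasticsearch's implementation: it leaves out date detection and numeric detection for strings, reports every string as "text" (the 5.0 and later name), and ignores the keyword sub-field that newer releases add. The function names are hypothetical.

```python
def infer_field_type(value):
    """Guess a field type from a JSON value, loosely mimicking dynamic mapping."""
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null element.
        for item in value:
            if item is not None:
                return infer_field_type(item)
    # None and empty arrays give no hint, so no field is mapped.
    return None


def infer_mapping(document):
    """Infer a type for every field of a document that carries a usable value."""
    inferred = {}
    for name, value in document.items():
        field_type = infer_field_type(value)
        if field_type is not None:
            inferred[name] = field_type
    return inferred


if __name__ == "__main__":
    sample = {"from": "789XXXX12", "to": "756XXXX17",
              "date": "8/1/2019", "amount": 1000001}
    print(infer_mapping(sample))
```

Because date detection is left out, the sketch reports the date string as text. This also shows why a static mapping is preferable when the intended types are known in advance.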

Mapping Parameters:

Mapping parameters define how a field is mapped: how it is indexed and stored, and how its values are analyzed when they are searched. The following mapping parameters are available in Elasticsearch:

  • analyzer
  • coerce
  • boost
  • doc_values
  • copy_to
  • dynamic
  • enabled
  • geohash
  • geohash_prefix
  • geohash_precision
  • fielddata
  • format
  • ignore_above
  • ignore_malformed
  • index
  • index_options
  • include_in_all
  • lat_lon
  • fields
  • null_value
  • norms
  • properties
  • position_increment_gap
  • search_analyzer
  • store
  • term_vector
  • similarity
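
The sketch below shows a few of these parameters applied to fields, together with a small check that flags parameter names outside a known set. The field names are hypothetical, KNOWN_PARAMETERS is only a subset of the list above plus "type", and unknown_parameters is a helper written for this example, not an Elasticsearch feature. It inspects top-level fields only and does not descend into "fields" or "properties".

```python
# Subset of the mapping parameters listed above, plus "type".
KNOWN_PARAMETERS = {
    "type", "analyzer", "search_analyzer", "fields", "format",
    "ignore_above", "ignore_malformed", "null_value", "index", "doc_values",
}

# Hypothetical fields showing some parameters in use.
PROPERTIES = {
    "description": {
        "type": "text",
        "analyzer": "english",
        # Multi-field: the same value is also indexed as an exact keyword.
        "fields": {"raw": {"type": "keyword", "ignore_above": 256}},
    },
    "status": {"type": "keyword", "null_value": "UNKNOWN"},
    "created": {"type": "date", "format": "yyyy-MM-dd"},
    "amount": {"type": "long", "ignore_malformed": True},
}


def unknown_parameters(properties, known=KNOWN_PARAMETERS):
    """Return {field: [parameter, ...]} for parameters not in the known set."""
    problems = {}
    for name, spec in properties.items():
        extra = sorted(set(spec) - known)
        if extra:
            problems[name] = extra
    return problems


if __name__ == "__main__":
    print(unknown_parameters(PROPERTIES))
    print(unknown_parameters({"title": {"type": "text", "analyser": "english"}}))
```

A check like this can catch misspelled parameter names before a mapping is sent to the cluster.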

Mapping Types in Elasticsearch 5.0:

In earlier versions of Elasticsearch, such as Elasticsearch 2.0, a single “string” data type was used both for full-text search and for keyword identifiers. Full-text search finds relevant text within documents, while keyword identifiers support features such as aggregations, filtering, and sorting. In Elasticsearch 2.0, the data type itself did not say which of the two uses was intended; the distinction was expressed only through the “index” setting of the string field (analyzed or not_analyzed). This was revamped in Elasticsearch 5.0, which introduced two new data types, “text” and “keyword”, to replace the “string” data type.

When using the “text” and “keyword” data types in Elasticsearch 5.0, the following rule of thumb applies:

  • The “text” data type is used for full-text search on fields such as an email body or a product description.
  • The “keyword” data type is used for exact-value searches, and for filtering, sorting, or aggregating. It suits fields such as email addresses, status codes, hostnames, tags, or zip codes.

A point to remember here is that the “string” data type was replaced by the “text” and “keyword” data types in Elasticsearch 5.0, and it was removed completely starting with the Elasticsearch 6.0 release.
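
The replacement rule can be sketched as a small conversion function for legacy field definitions. It covers only the two common cases, "index": "analyzed" (or no index setting) becoming “text” and "index": "not_analyzed" becoming “keyword”, and raises an error for anything else. upgrade_string_field is a hypothetical helper, not an Elasticsearch API.

```python
def upgrade_string_field(spec):
    """Convert a legacy "string" field definition to "text" or "keyword".

    Definitions of any other type are returned as an unchanged copy.
    """
    if spec.get("type") != "string":
        return dict(spec)

    index_setting = spec.get("index", "analyzed")
    # Carry over every other parameter; "type" and "index" are rewritten.
    upgraded = {k: v for k, v in spec.items() if k not in ("type", "index")}

    if index_setting == "analyzed":
        upgraded["type"] = "text"      # analyzed strings: full-text search
    elif index_setting == "not_analyzed":
        upgraded["type"] = "keyword"   # exact values: filter, sort, aggregate
    else:
        raise ValueError(f"unsupported index setting: {index_setting!r}")
    return upgraded


if __name__ == "__main__":
    print(upgrade_string_field({"type": "string"}))
    print(upgrade_string_field({"type": "string", "index": "not_analyzed"}))
```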

Mapping Types in Elasticsearch 6.0:

The removal of mapping types started with the Elasticsearch 6.0 release, which marks the beginning of the end for this concept. Given how widely mapping types were used, the change leaves a noticeable gap and has been controversial. There is, however, a strong reason for it from a performance standpoint. It boils down to how Lucene handles fields that are empty in many documents, which is termed data sparsity. Types have not disappeared entirely from Elasticsearch 6.0, though; they are restricted rather than removed.

Indices created in Elasticsearch 6.0 are allowed only one mapping type, whereas indices created in Elasticsearch 5.x continue to work with multiple mapping types. Such indices can be converted to single-type indices using the Elasticsearch Reindex API. Instead of keeping several types in one index, you can have one index per document type. Alternatively, you can add a custom field to each document that works the way the _type meta-field used to.
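
The custom-field approach can be sketched as a small transformation applied to each document while reindexing. The field name "type" and the ID scheme (type name, a hyphen, then the old ID) are one possible convention assumed here, and to_single_type is a hypothetical helper. Prefixing the ID keeps documents of different former types from colliding when they share an ID.

```python
def to_single_type(doc_type, doc_id, source, type_field="type"):
    """Fold a legacy (_type, _id, _source) triple into a single-type document.

    Returns the new document ID and a copy of the source that records the
    former mapping type in a custom field.
    """
    new_id = f"{doc_type}-{doc_id}"
    new_source = dict(source)          # copy, so the caller's dict is untouched
    new_source[type_field] = doc_type
    return new_id, new_source


if __name__ == "__main__":
    new_id, new_source = to_single_type(
        "transactionreport", "42", {"amount": 1000001}
    )
    print(new_id, new_source)
```

Queries that used to target one type can then filter on the custom field instead.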

The removal of mapping types, which was prepared in the Elasticsearch 5.x line and became visible with the single-type restriction in Elasticsearch 6.0, continued in later releases: types were deprecated in Elasticsearch 7.0 and removed completely in Elasticsearch 8.0. Though a change to such fundamental functionality is painful, the brighter side is more performant searches and greater ease of use. The change encourages structuring indices in a way that suits the underlying data structure, which in turn speeds up searches.

Conclusion:

In this article, we discussed Elasticsearch mapping concepts in detail, including the available data types, meta-fields, and mapping types. We also saw how mapping types were used in Elasticsearch 5.0 and how they were restricted in Elasticsearch 6.0, and we looked at how these features have changed across releases of the product.

Ravindra Savaram
About The Author

Ravindra Savaram is a Content Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.

