Elasticsearch is an open-source, highly scalable, full-text search and analytics engine. It lets you search through large volumes of data quickly, and it is generally used in applications that require complex search. It is developed in Java and was originally released under the Apache License, version 2.0 (later releases changed the licensing terms). Many large companies around the world now use it.
First, let us understand why Elasticsearch was created. Consider customers looking for product information in a very large product catalog. If the system takes too long to retrieve that information because of the data volume, the user experience suffers and potential customers may be lost. A relational database management system (RDBMS) tends to be slow at this kind of search over large amounts of data. Elasticsearch was created to overcome this problem.
Elasticsearch is a document-based system that stores, manages, and retrieves document-oriented or semi-structured data. Data is stored as JSON documents. It is also schema-less, in the sense that documents can be indexed without defining a schema first. It is a NoSQL database built on the Lucene search engine.
Elasticsearch uses Query DSL (Domain Specific Language) to interact with data. Queries are written in JSON, and Query DSL is designed so that complex, real-world search logic can be expressed in a single query.
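As a minimal sketch of what such a query can look like, the snippet below builds a Query DSL request body as a Python dictionary and serializes it to JSON. The field names (name, category, price) and their values are hypothetical and chosen only for illustration; the snippet does not contact a cluster.

```python
import json

# A "bool" query that combines several conditions in one request:
# - must:   full-text match on the (hypothetical) "name" field
# - filter: exact category plus an upper bound on price
query = {
    "query": {
        "bool": {
            "must": [{"match": {"name": "wireless headphones"}}],
            "filter": [
                {"term": {"category": "electronics"}},
                {"range": {"price": {"lte": 100}}},
            ],
        }
    },
    "size": 10,  # return at most 10 hits
}

# This JSON string is what would be sent as the body of a search request.
request_body = json.dumps(query, indent=2)
print(request_body)
```

The full-text condition, the exact-match filter, and the range filter all travel in one JSON document, which is the point the paragraph above makes about handling complex logic in a single query.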
Let us explore Elasticsearch features to understand what it offers.
Below are features offered by Elasticsearch:
Elasticsearch is not primarily a data store, although technically it can be used as one. It stores documents along with a version for each. If two processes write to the same document at the same time, the latest version is kept. Unlike a relational database, it does not support ACID (Atomicity, Consistency, Isolation, and Durability) properties.
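The versioning behaviour can be pictured with a small in-memory model. This is only an illustrative sketch, not how Elasticsearch is implemented: the VersionedStore class, its methods, and the optional expected_version check are all hypothetical names introduced here.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale version of a document."""


class VersionedStore:
    """Toy document store that bumps a version number on every write."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, source, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # Optional check: refuse the write if the caller saw an older version.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, source)
        return new_version


store = VersionedStore()
store.put("1", {"stock": 10})  # first write, version becomes 1
store.put("1", {"stock": 9})   # unconditional write, latest version is kept

try:
    # A writer that still believes the document is at version 1 is rejected.
    store.put("1", {"stock": 8}, expected_version=1)
    conflict = False
except VersionConflict:
    conflict = True
```

Without the version check the last writer simply wins; with it, a writer holding stale data gets an error instead of silently overwriting a newer version.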
Let us understand its architecture by exploring the concepts below.
Nodes and Clusters
A node is a single running instance of Elasticsearch; usually, one instance runs per machine. A cluster is a collection of nodes that communicate with each other to read from and write to an index. Each cluster and each node has a unique name, and the cluster name keeps unintended nodes from joining the cluster. One node acts as the master node and manages the whole cluster: it is responsible for cluster-wide changes such as adding or removing a node and creating or deleting indices.
Each node in a cluster contributes to the cluster's searching and indexing capabilities. For example, when we run a search query, each node executes it against the data that node stores. Each node supports searching, indexing, and manipulating existing data.
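The idea that every node searches only its own data, with the results then combined, can be sketched as follows. This is a toy model under stated assumptions: the node names, the documents, and the functions search_node and search_cluster are hypothetical, and simple substring matching stands in for real full-text search.

```python
def search_node(docs, term):
    """Return the documents held by one node whose text contains the term."""
    return [doc for doc in docs if term.lower() in doc["text"].lower()]


def search_cluster(nodes, term):
    """Run the same search on every node and merge the partial results."""
    hits = []
    for node_name, docs in nodes.items():
        for doc in search_node(docs, term):
            hits.append({"node": node_name, **doc})
    return sorted(hits, key=lambda hit: hit["id"])


# Two hypothetical nodes, each holding part of the data.
nodes = {
    "node-1": [
        {"id": 1, "text": "Red running shoes"},
        {"id": 2, "text": "Blue jacket"},
    ],
    "node-2": [
        {"id": 3, "text": "Trail shoes"},
        {"id": 4, "text": "Wool socks"},
    ],
}

results = search_cluster(nodes, "shoes")
for hit in results:
    print(hit["node"], hit["id"], hit["text"])
```

Each node contributes only the matches from its own share of the data, so adding nodes spreads both the storage and the search work.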
Documents and Indices
Every data item stored in the cluster is a document. A document is a JSON object, comparable to a row in database terminology. For example, to store a student, you add one object with name and standard as its properties. Since data is spread across all the nodes, how is it organized? Documents are stored in indices. An index is a collection of documents that have similar properties or are logically related, for instance, one index each for orders, products, and customers.
Each document has a unique ID, which can be assigned by Elasticsearch or supplied by the user when the document is added to an index. A document is uniquely identified by its ID and its index. There is no limit on the number of documents that can be added to an index.
Indices are identified by name, and that name is used when searching for documents.
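The relationship between documents, IDs, and indices can be sketched with a small in-memory model. It is a simplified illustration only: the Index class, its methods, and the sample student records are hypothetical, and a random UUID stands in for whatever ID Elasticsearch would generate.

```python
import uuid


class Index:
    """Toy index: a named collection of JSON-like documents keyed by ID."""

    def __init__(self, name):
        self.name = name
        self._docs = {}

    def add(self, source, doc_id=None):
        # Use the caller's ID if given; otherwise generate one.
        if doc_id is None:
            doc_id = uuid.uuid4().hex
        self._docs[doc_id] = source
        return doc_id

    def get(self, doc_id):
        return self._docs[doc_id]

    def count(self):
        return len(self._docs)


students = Index("students")

# ID supplied by the user.
students.add({"name": "Asha", "standard": 8}, doc_id="1")

# ID generated automatically.
auto_id = students.add({"name": "Ravi", "standard": 9})

print(students.name, "1", students.get("1"))
print(students.name, auto_id, students.get(auto_id))
```

A document is addressed by the pair of index name and ID, which is why the same ID can appear in two different indices without conflict.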
Shards and Replicas
Elasticsearch builds on Lucene for fast data retrieval, using Lucene indexes in a distributed system. A shard is an individual instance of a Lucene index. As data volume grows, the performance of a single index degrades. To overcome this, Elasticsearch uses shards to divide an index into multiple pieces. Shards are important for the following two reasons.
Replicas guard against unexpected failures, such as a network failure. Replica shards, as their name implies, are copies of an index's shards. Replicas are important in the Elasticsearch architecture for the following two reasons.
When creating an index, we can choose the number of shards and the number of replicas. The number of replicas can be changed dynamically at any time, whereas the number of shards is set when the index is created.
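The sketch below illustrates these settings and how a document might be assigned to a shard. The setting names number_of_shards and number_of_replicas are the ones Elasticsearch uses; the index name in the comments, the document ID, and the helper functions are hypothetical. CRC32 is used here only as a simple, stable stand-in for the hash function Elasticsearch actually applies.

```python
import json
import zlib

# Body for creating an index, e.g. sent with: PUT /products
create_index_body = {
    "settings": {"number_of_shards": 3, "number_of_replicas": 1}
}

# Body for changing the replica count later, e.g. sent with: PUT /products/_settings
update_replicas_body = {"index": {"number_of_replicas": 2}}


def pick_shard(doc_id, number_of_shards):
    """Map a document ID to a primary shard number.

    Assumption: CRC32 replaces the real hash function for illustration.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def total_shard_copies(number_of_shards, number_of_replicas):
    """Count primary shards plus all of their replica copies."""
    return number_of_shards * (1 + number_of_replicas)


shards = create_index_body["settings"]["number_of_shards"]
replicas = create_index_body["settings"]["number_of_replicas"]

print(json.dumps(create_index_body))
print("shard for order-42:", pick_shard("order-42", shards))
print("total shard copies:", total_shard_copies(shards, replicas))
```

Because the target shard is derived from the shard count, changing that count would send existing IDs to different shards, which gives some intuition for why it is chosen up front while the replica count can be adjusted freely.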
Below are a few advantages of Elasticsearch:
Below are a few use cases for Elasticsearch:
Elasticsearch is a NoSQL database. It has no joins, relations, constraints, or transactional behaviour, and it is easier to scale than an RDBMS. The comparison below shows in detail how Elasticsearch differs from an RDBMS.
|Elasticsearch||RDBMS|
|Semi-structured or unorganized data||Structured and organized data|
|Eventual Consistency||Tight Consistency|
|BASE transactions||ACID transactions|
|No Predefined Schema||Data and relationships stored in tables.|
|Everything is indexed||Index|
Like Elasticsearch, MongoDB is a document-oriented database management system. The two have many features in common, such as a document-oriented model, schema-free storage, sharding, replicas, and high availability, but they cater to different sets of users. The following table compares Elasticsearch and MongoDB.
|Feature||Elasticsearch||MongoDB|
|Speed||Speed remains constant irrespective of volume of data||Speed can be increased by adding more shards, but speed will drop as the volume of data increases|
|Security||Paid plug-in is required to manage access rights||User management by roles|
|Scalability||Simplified scalability||Horizontal scalability better than RDBMS|
|Consistency||Eventual Consistency||Eventual Consistency|
|Replication Methods||Master-slave replication||Yes|
Solr is also a text search engine built on top of Apache Lucene, like Elasticsearch. Because they share the same platform, they have many similar features, but they differ in ease of deployment, scalability, and other functionality. Below is a comparison of Elasticsearch and Solr.
|Feature||Elasticsearch||Solr|
|License||Open Source||Open Source|
|Data Schema||Schema Free||Yes|
|OS||All OS with JVM||All OS with JVM and servlet container|
|MapReduce||With Hadoop Integration||No|
|Consistency||Eventual Consistency||Eventual Consistency|
|Transaction Concepts||No||Optimistic Locking|
|APIs||Java, RESTful, HTTP/JSON API||Java, RESTful, HTTP API|
|Indexing/Searching||Better performance of analytical queries||Text-oriented|
|Documentation||Documentation is lacking||Very well documented|
|Installation and Configuration||More intuitive||Detailed documentation|
Elasticsearch is one of the most popular open-source, distributed, cross-platform, and scalable search engines. It has grown rapidly since 2010 and has made a strong impression across the IT industry. As a result, professionals with Elasticsearch skills are in high demand and tend to be well paid. Its ability to handle large amounts of data and search them quickly suggests that this demand will continue.
Elasticsearch stands out from its competitors because it is highly scalable and distributed by nature. If you have a large volume of data and need fast search, it is hard to find a better fit than Elasticsearch.