Elasticsearch Tutorial

Elasticsearch is an open-source, highly scalable full-text search and analytics engine. It lets you search large volumes of data quickly, and it is generally used in applications that require complex search. It is written in Java and was originally released under the Apache License 2.0; later versions are distributed under different license terms. Many large companies around the world use it today. In this Elasticsearch tutorial, we start with the basics and then cover all the major concepts of Elasticsearch.

Elasticsearch Tutorial for Beginners

What is Elasticsearch?

First, let us understand why Elasticsearch was created. Consider an online store where customers look for product information in a huge catalog. If the system takes too long to retrieve that information because of the large volume of data, the user experience suffers, and the store may lose potential customers. A relational database management system (RDBMS) can become slow at this kind of full-text search over large amounts of data. Elasticsearch was built to solve this problem.

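Elasticsearch is a document-based system that stores, manages, and retrieves document-oriented or semi-structured data. Data is stored in Elasticsearch as JSON documents. It is also schema-less, in the sense that you do not have to define a schema before adding data: field types are detected automatically when a document is indexed. It is a NoSQL database built on the Apache Lucene search library.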
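Here is a rough idea of what "schema-less" means in practice. The minimal Python sketch below approximates Elasticsearch's default dynamic mapping rules for common JSON values. Strings become "text" with a "keyword" sub-field, integers become "long", floating-point numbers become "float", and booleans become "boolean". The helpers infer_field_type and infer_mapping are hypothetical names, not part of any Elasticsearch client. The sketch also ignores date and numeric detection for strings.

```python
import json

def infer_field_type(value):
    """Approximate Elasticsearch's default dynamic mapping for one JSON value."""
    # bool must be checked before int, because bool is a subclass of int in Python
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        # Strings become analyzed "text" plus a "keyword" sub-field for exact matching
        return {"type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    if isinstance(value, dict):
        # Nested JSON objects get their own "properties"
        return {"properties": {k: infer_field_type(v) for k, v in value.items()}}
    if isinstance(value, list) and value:
        # Arrays take the type of their elements
        return infer_field_type(value[0])
    raise ValueError(f"unsupported value in this sketch: {value!r}")

def infer_mapping(document):
    """Build a mapping for a whole (hypothetical) document."""
    return {"properties": {k: infer_field_type(v) for k, v in document.items()}}

student = {"name": "Asha", "standard": 8, "average": 91.5, "active": True}
print(json.dumps(infer_mapping(student), indent=2))
```

No schema was declared for the student document above, yet each field still ends up with a concrete type.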

Elasticsearch uses the Query DSL (Domain Specific Language) to interact with data. Queries are written in JSON format, and the Query DSL is designed so that complex, real-world search logic can be expressed in a single query.
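As an illustration of packing several conditions into one query, the sketch below builds a Query DSL "bool" query as a Python dictionary and prints it as JSON. The field names ("name", "category", "price", "in_stock") are assumptions chosen for the example. The resulting body is what you would send to a hypothetical index with GET /products/_search.

```python
import json

search_body = {
    "query": {
        "bool": {
            # "must" clauses are full-text and contribute to the relevance score
            "must": [
                {"match": {"name": "wireless headphones"}}
            ],
            # "filter" clauses must match but do not affect the score
            "filter": [
                {"term": {"category": "electronics"}},
                {"range": {"price": {"gte": 50, "lte": 200}}}
            ],
            # "must_not" excludes matching documents
            "must_not": [
                {"term": {"in_stock": False}}
            ]
        }
    },
    "size": 10
}

print(json.dumps(search_body, indent=2))
```

Splitting conditions into "must" and "filter" is a common design choice. Only the free-text part influences ranking, while exact conditions such as category and price range simply narrow the result set.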

Let us explore Elasticsearch features to understand what it offers.

Elasticsearch Features

Below are features offered by Elasticsearch:

  • Elasticsearch is well suited to both structured and unstructured data.
  • Elasticsearch can be used as an alternative document store to MongoDB and RavenDB.
  • Elasticsearch favors denormalized data (no joins at query time) to improve search performance.
  • Many big organizations, such as Wikipedia, GitHub, and Stack Overflow, use Elasticsearch for their search engines.
  • It is an open-source technology.
  • It is easy to use and developer-friendly.
  • The Elasticsearch community is very active and maintains a wide range of client libraries and integrations.

Elasticsearch Architecture

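Elasticsearch is not primarily a data store, although technically it can be used as one. Elasticsearch stores each document together with a version number that increases every time the document changes. If two processes write to the same document at the same time, the last write wins by default. To prevent this, a write can be made conditional on the version it last read (optimistic concurrency control). Unlike a traditional database, Elasticsearch does not provide ACID (Atomicity, Consistency, Isolation, and Durability) transactions.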
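To see how a conditional write avoids lost updates, here is a minimal in-memory simulation. Real Elasticsearch tracks a sequence number and primary term per shard, and it accepts a write only if the if_seq_no and if_primary_term values still match, answering with HTTP 409 otherwise. The DocumentStore and VersionConflict classes are hypothetical. They model only the sequence number and leave out primary terms and shards.

```python
class VersionConflict(Exception):
    """Stands in for Elasticsearch's HTTP 409 version conflict response."""

class DocumentStore:
    """Hypothetical in-memory model of per-document sequence numbers."""

    def __init__(self):
        self.docs = {}          # doc_id -> (source, seq_no)
        self.next_seq_no = 0

    def get(self, doc_id):
        return self.docs[doc_id]

    def index(self, doc_id, source, if_seq_no=None):
        # Without a condition, the last write simply wins.
        if if_seq_no is not None:
            current = self.docs.get(doc_id)
            if current is None or current[1] != if_seq_no:
                raise VersionConflict(f"document {doc_id} was modified concurrently")
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        self.docs[doc_id] = (source, seq_no)
        return seq_no

store = DocumentStore()
store.index("1", {"stock": 10})

# Two writers read the same version of the document...
_, seen_a = store.get("1")
_, seen_b = store.get("1")

# ...writer A updates first, conditioned on the sequence number it read.
store.index("1", {"stock": 9}, if_seq_no=seen_a)

# Writer B's conditional write is now rejected instead of silently overwriting A.
try:
    store.index("1", {"stock": 8}, if_seq_no=seen_b)
except VersionConflict as err:
    print("rejected:", err)
```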

Let us understand its architecture by exploring the concepts below.

Nodes and Clusters

A node is a single running instance of Elasticsearch. Usually, one node runs on each machine. A cluster is a collection of nodes that communicate with each other to read from and write to indices. Each cluster needs a unique name so that nodes do not accidentally join the wrong cluster. A master node manages the whole cluster: it is responsible for cluster-wide changes such as adding or removing nodes and creating or deleting indices. Each cluster and each node has a unique name.

Each node in a cluster contributes to the cluster's search and indexing capabilities. For example, when we run a search query, each node executes it against the data it stores, and the partial results are combined into a single response. Each node supports searching, indexing, and updating existing data.
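The "each node searches its own data, then results are combined" step is a scatter-gather pattern. In Elasticsearch it actually happens per shard, in a query phase followed by a fetch phase. The sketch below shows only the merging idea. The node names, document IDs and scores are made-up data, and the functions search_node and coordinate_search are hypothetical.

```python
import heapq

# Hypothetical per-node hits for one query: (document id, relevance score).
# In Elasticsearch, scores come from Lucene (BM25 by default).
node_results = {
    "node-1": [("doc-3", 4.2), ("doc-7", 1.1)],
    "node-2": [("doc-1", 3.9), ("doc-9", 2.5)],
    "node-3": [("doc-4", 0.7)],
}

def search_node(hits, k):
    """Query phase on one node: return its local top-k hits by score."""
    return heapq.nlargest(k, hits, key=lambda hit: hit[1])

def coordinate_search(nodes, k):
    """Coordinating node: gather each node's top-k and merge into a global top-k."""
    local_tops = [search_node(hits, k) for hits in nodes.values()]
    merged = [hit for top in local_tops for hit in top]
    return heapq.nlargest(k, merged, key=lambda hit: hit[1])

print(coordinate_search(node_results, k=3))
# [('doc-3', 4.2), ('doc-1', 3.9), ('doc-9', 2.5)]
```

Each node returns only its local top k. This keeps the amount of data sent to the coordinating node small, no matter how many documents each node holds.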

Documents and Indices

Every data item we store in the cluster is a document. A document is a JSON object, comparable to a row in database terminology. For example, to store a student, you add one object with name and standard as its properties. Since data is spread across all the nodes, how is it organized? Documents are stored in indices. An index is a collection of documents with similar properties, that is, documents that are logically related. For instance, you might have one index for orders, one for products, and one for customers.

Each document has a unique ID, which is either assigned by Elasticsearch or supplied by the user when the document is added to the index. A document is uniquely identified by its ID together with its index. There is no fixed limit on the number of documents in an index, although a single shard can hold at most about two billion documents (a Lucene limit).

Each index is identified by its name, which is used when adding, retrieving, or searching for documents in it.
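To show how an index name and a document ID appear in practice, the sketch below builds REST requests with Python's standard library without sending them. It assumes a node at http://localhost:9200 (the default port) and a hypothetical index called "students". The requests follow the documented forms PUT /<index>/_doc/<id> (user-supplied ID), POST /<index>/_doc (ID generated by Elasticsearch) and GET /<index>/_doc/<id>.

```python
import json
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumption: a local node on the default port

student = {"name": "Asha", "standard": 8}

# Index (create or replace) a document with an explicit ID: PUT /<index>/_doc/<id>
put_request = Request(
    f"{ES_URL}/students/_doc/1",
    data=json.dumps(student).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)

# Let Elasticsearch generate the ID instead: POST /<index>/_doc
post_request = Request(
    f"{ES_URL}/students/_doc",
    data=json.dumps({"name": "Ravi", "standard": 9}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Retrieve a document by index name and ID: GET /<index>/_doc/<id>
get_request = Request(f"{ES_URL}/students/_doc/1", method="GET")

for req in (put_request, post_request, get_request):
    print(req.get_method(), req.full_url)
    # To send it to a running cluster: urllib.request.urlopen(req)
```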

Shards and replicas

Elasticsearch uses Lucene for fast retrieval of data, applying the power of Lucene indices in a distributed system. Each shard is a self-contained Lucene index. As the amount of data in an index grows, a single Lucene index becomes slower and may no longer fit on one node. To overcome this, Elasticsearch divides each index into multiple pieces called shards. Shards are important for two reasons:

  • Shards let us split an index's content horizontally.
  • Shards allow operations to run in parallel across multiple nodes, which increases performance.

Replicas protect against unexpected failures such as a crashed node or a lost network connection. Replica shards, as the name implies, are copies of the primary shards. Replicas are important in the Elasticsearch architecture for two reasons:

  • If a shard or node fails, a replica acts as a backup and can be promoted to primary, so no data is lost. A replica shard is never allocated to the same node as its primary shard.
  • Because searches can also run on replica shards in parallel, replicas increase search throughput and performance.
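The rule that a replica never shares a node with its primary can be illustrated with a simple round-robin allocator. This is a hypothetical sketch, not Elasticsearch's actual allocation algorithm, which also weighs disk usage, shard balance and allocation awareness. When there are too few nodes, Elasticsearch leaves the extra replicas unassigned and reports yellow cluster health. The sketch raises an error in that case instead.

```python
def allocate(num_primaries, num_replicas, nodes):
    """Assign every copy of every shard to a node so that no node holds
    two copies of the same shard (a primary and its replicas stay apart)."""
    if num_replicas + 1 > len(nodes):
        # Elasticsearch would leave these replicas unassigned (yellow health).
        raise ValueError("not enough nodes: some replicas would stay unassigned")
    allocation = {node: [] for node in nodes}
    for shard in range(num_primaries):
        for copy in range(num_replicas + 1):
            # Offsetting by the shard number spreads primaries evenly, and
            # consecutive offsets guarantee distinct nodes for each copy.
            node = nodes[(shard + copy) % len(nodes)]
            role = "primary" if copy == 0 else "replica"
            allocation[node].append((shard, role))
    return allocation

allocation_example = allocate(3, 1, ["node-1", "node-2", "node-3"])
for node, copies in allocation_example.items():
    print(node, copies)
```

If any single node is lost, every shard it held still has a copy on another node, which is exactly why the primary and replica are kept apart.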

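When creating an index, we choose the number of primary shards and the number of replicas. The number of replicas can be changed dynamically at any time. The number of primary shards, however, is fixed once the index is created; changing it requires reindexing the data or using the split or shrink APIs.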
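The reason the primary shard count is fixed is routing. In its simplified, documented form, Elasticsearch picks a document's shard with shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document ID. The sketch below uses zlib.crc32 as a stand-in for the Murmur3 hash Elasticsearch actually uses, and the order IDs are made up. It shows that changing the number of primary shards would send most existing documents to a different shard.

```python
import zlib

def route(doc_id, num_primary_shards):
    """Simplified routing: shard = hash(_routing) % number_of_primary_shards.
    zlib.crc32 stands in for Elasticsearch's Murmur3 hash."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

doc_ids = [f"order-{i}" for i in range(1000)]
before = {d: route(d, 5) for d in doc_ids}   # index created with 5 primaries
after = {d: route(d, 6) for d in doc_ids}    # what 6 primaries would compute

moved = sum(before[d] != after[d] for d in doc_ids)
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Because a lookup by ID recomputes the same formula, documents that mapped to a different shard could no longer be found. That is why changing the shard count goes through reindexing (or split/shrink) rather than an in-place setting change.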

Elasticsearch Advantages

Below are a few advantages of Elasticsearch:

  • Elasticsearch is built on Lucene, a full-featured information retrieval library, so it offers powerful full-text search capabilities in an open-source product. Lucene is also widely known among developers.
  • Elasticsearch implements many features, such as faceted search, custom stemming, and custom tokenization (splitting text into words).
  • Elasticsearch supports fuzzy search, so documents can be found even when the search text contains spelling mistakes.
  • Elasticsearch supports autocomplete (for example, through its completion suggester), which predicts and completes the text a user is typing, similar to Google search suggestions.
  • Because Elasticsearch is API-driven, any action can be performed through its RESTful API.
  • Elasticsearch records every change to the data in a transaction log (translog), which reduces the risk of data loss.
  • Because Elasticsearch is distributed by nature, it is easy to scale by adding nodes and easy to integrate into any organization.
  • Elasticsearch supports faceted search, which is like applying multiple filters to the data together with a classification system over them. This is more powerful than plain text search.
  • Elasticsearch supports multi-tenancy: data for different tenants can be kept in separate indices, and a single search can span several indices as if they were one large index.
  • Using the Elasticsearch Query DSL, it is easy to write complex queries and tune them precisely. The Query DSL also provides ways to rank results by relevance and to group them (through aggregations).
  • Because Elasticsearch exchanges JSON over HTTP, it is easy to use from many different programming languages.
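Two of the features above, fuzzy search and faceted search, can be illustrated together. The sketch below does three things:

  • It computes the plain Levenshtein edit distance. Elasticsearch's fuzzy matching uses Damerau-Levenshtein by default, which also counts a swap of two adjacent letters as one edit.
  • It reproduces the edit budget of "fuzziness": "AUTO" with its default thresholds: 0 edits for terms of 1-2 characters, 1 edit for 3-5 characters, and 2 edits for longer terms.
  • It builds a query body that pairs a fuzzy match with a terms aggregation acting as a facet.

The field names "name" and "brand.keyword" are assumptions for illustration.

```python
import json

def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]

def auto_fuzziness(term):
    """Edits allowed by "fuzziness": "AUTO" with the default thresholds 3 and 6."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2

# A misspelled search term with fuzzy matching, plus a facet over brands.
search_body = {
    "query": {"match": {"name": {"query": "labtop", "fuzziness": "AUTO"}}},
    "aggs": {"by_brand": {"terms": {"field": "brand.keyword"}}},
}

print(levenshtein("labtop", "laptop"), auto_fuzziness("labtop"))  # 1 2
print(json.dumps(search_body, indent=2))
```

Here "labtop" is one edit away from "laptop", well within the two edits AUTO allows for a six-letter term. A document containing "laptop" can therefore still match, and the "by_brand" aggregation returns counts per brand that a UI can show as filters.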

Elasticsearch Use-cases

Below are a few use cases for Elasticsearch:

  • An online store that lets customers explore all the products it sells. Here, you can use Elasticsearch to store the whole product inventory and catalog, and to provide search and autocomplete for users.
  • A scenario where you need to store logs or transaction data to analyze trends, summaries, anomalies, or statistics. Here, you can use Logstash, part of the ELK Stack (Elasticsearch, Logstash, Kibana), to collect and parse your data and feed it into Elasticsearch, where it is stored.
  • Have you seen buttons such as “Notify me when this item is back in stock” or “Notify me if the price of this item drops” on e-commerce sites? This feature can be built with Elasticsearch's reverse search (the percolator). Customers' alert conditions are stored as queries, each price or stock update is checked against them, and an alert is sent to the customer once a condition is satisfied.
  • A requirement to analyze data quickly and visualize it. Here, Kibana works best with Elasticsearch: Elasticsearch stores the data, and Kibana visualizes it in custom dashboards. Kibana is also part of the ELK Stack.
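The "notify me" use case inverts normal search. Queries are stored, and each new document is matched against them. The sketch below builds the three request bodies involved:

  • a mapping that uses the percolator field type;
  • one stored alert query;
  • a percolate search that carries an updated product as the document.

The index name "price-alerts", the field names and the product data are hypothetical. The matches() helper is only a tiny local stand-in for percolation that understands "term" and "range" filters; it is not how Elasticsearch evaluates stored queries.

```python
import json

# 1. Mapping for an index of stored alert queries ("price-alerts" is hypothetical).
#    Fields referenced by stored queries must also be mapped.
alerts_mapping = {
    "mappings": {
        "properties": {
            "query": {"type": "percolator"},
            "product_id": {"type": "keyword"},
            "price": {"type": "float"},
        }
    }
}

# 2. A customer's alert: "notify me when product p-42 costs 150 or less".
alert = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"product_id": "p-42"}},
                {"range": {"price": {"lte": 150}}},
            ]
        }
    }
}

# 3. On a price change, search with the updated product as the document;
#    Elasticsearch returns the stored alert queries that match it.
percolate_search = {
    "query": {
        "percolate": {
            "field": "query",
            "document": {"product_id": "p-42", "price": 139.0},
        }
    }
}

def matches(query, doc):
    """Local illustration only: a bool query made of term and range filters."""
    for clause in query["bool"]["filter"]:
        (kind, condition), = clause.items()
        (field, expected), = condition.items()
        if field not in doc:
            return False
        if kind == "term" and doc[field] != expected:
            return False
        if kind == "range":
            if "lte" in expected and doc[field] > expected["lte"]:
                return False
            if "gte" in expected and doc[field] < expected["gte"]:
                return False
    return True

print(json.dumps(percolate_search))
print(matches(alert["query"], percolate_search["query"]["percolate"]["document"]))  # True
```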

Elasticsearch Vs. RDBMS

Elasticsearch is a NoSQL database. It has no SQL-style joins, foreign-key constraints, or multi-document transactions, and it is easier to scale than an RDBMS. The following table compares Elasticsearch with an RDBMS in detail.

Elasticsearch | RDBMS
Semi-structured or unstructured data | Structured, organized data
Eventual consistency | Strong consistency
BASE transactions | ACID transactions
No predefined schema (dynamic mapping) | Data and relationships stored in tables with a predefined schema
Index | Database
Shard | Partition
Type (deprecated in 7.x, removed in 8.x) | Table
Document | Row
Field | Column
Mapping | Schema
All fields indexed by default | Indexes created explicitly on chosen columns
Query DSL | SQL

Elasticsearch Vs. MongoDB

Like Elasticsearch, MongoDB is a document-oriented database management system. The two have many features in common, such as a document-oriented model, schema flexibility, sharding, replicas, and high availability, but they cater to different sets of users. The following table compares Elasticsearch and MongoDB.

Feature | Elasticsearch | MongoDB
Flexibility | Schema-precise | Schema-flexible
Speed | Designed to keep search fast as data grows, by spreading it across shards | Speed can be increased by adding more shards, but may drop as data volume grows
Security | Role-based access control (free in the default distribution since 6.8/7.1; earlier versions required a paid plug-in) | User management by roles
Scalability | Simplified horizontal scalability | Horizontal scalability, better than an RDBMS
Concurrency | Yes | Yes
Consistency | Eventual consistency | Eventual or immediate consistency (configurable per operation)
Replication methods | Primary-replica replication | Replica sets (primary-secondary replication)
Partitioning methods | Sharding | Sharding
Transaction concepts | No | Multi-document ACID transactions (since 4.0)

Elasticsearch Vs. Solr

Solr is also a full-text search engine built on top of Apache Lucene, like Elasticsearch. Because they share the same foundation, they have many similar features, but they differ in ease of deployment, scalability, and other functionality. Below is a comparison between Elasticsearch and Solr.

Feature | Elasticsearch | Solr
License | Open source | Open source
Implementation language | Java | Java
Data schema | Schema-free | Schema required (a schemaless mode is available)
OS | All OS with a JVM | All OS with a JVM (older versions required a servlet container)
Secondary indices | Yes | Yes
Partitioning methods | Sharding | Sharding
MapReduce | With Hadoop integration | No
Consistency | Eventual consistency | Eventual consistency
Transaction concepts | No | Optimistic locking
Concurrency | Yes | Yes
APIs | Java, RESTful HTTP/JSON API | Java, RESTful HTTP API
Supported programming languages | .NET, Java, JavaScript, Perl, Scala, PHP, Python, Ruby, Erlang | .NET, Java, JavaScript, Perl, Scala, PHP, Python, Ruby, Erlang, XML
Indexing/searching | Better performance for analytical queries | Text-oriented
Documentation | Less extensive documentation | Very well documented
Installation and configuration | More intuitive | Detailed documentation

Current Demand and Future of Elasticsearch

Elasticsearch is one of the most popular open-source, distributed, cross-platform, and scalable search engines. Since its first release in 2010, its adoption has grown rapidly across the IT industry. As a result, there is strong demand for professionals with Elasticsearch skills, and such professionals are well valued. Its ability to handle large amounts of data and deliver fast search points to a bright future for the technology.

Conclusion

Elasticsearch stands out from its competitors because it is highly scalable and distributed by design. If you have a large volume of data and need fast search, Elasticsearch is one of the strongest options available.

Last updated: 08 Jan 2024
About Author

Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.
