Mindmajix

Learn About Searching Data in Elasticsearch

Search

The search API allows you to execute a search query and get back search hits that match the query. The query can either be provided using a simple query string as a parameter, or using a request body.

Searching

As with everything else, Elasticsearch can be searched using HTTP.

It’s time to move on to more exciting things: searching. We’ll first need some sample data. Below are a number of indexing requests that we’ll use.

Indexing request for sample data.

curl -XPUT "http://localhost:9200/movies/movie/1" -d'
{
  "title": "The Godfather",
  "director": "Francis Ford Coppola",
  "year": 1972,
  "genres": ["Crime", "Drama"]
}'

curl -XPUT "http://localhost:9200/movies/movie/2" -d'
{
  "title": "To Kill a Mockingbird",
  "director": "Robert Mulligan",
  "year": 1962,
  "genres": ["Crime", "Drama", "Mystery"]
}'

curl -XPUT "http://localhost:9200/movies/movie/3" -d'
{
  "title": "Lawrence of Arabia",
  "director": "David Lean",
  "year": 1962,
  "genres": ["Adventure", "Biography", "Drama"]
}'

curl -XPUT "http://localhost:9200/movies/movie/4" -d'
{
  "title": "Apocalypse Now",
  "director": "Francis Ford Coppola",
  "year": 1979,
  "genres": ["Drama", "War"]
}'

curl -XPUT "http://localhost:9200/movies/movie/5" -d'
{
  "title": "Kill Bill: Vol. 1",
  "director": "Quentin Tarantino",
  "year": 2003,
  "genres": ["Action", "Crime", "Thriller"]
}'

curl -XPUT "http://localhost:9200/movies/movie/6" -d'
{
  "title": "The Assassination of Jesse James by the Coward Robert Ford",
  "director": "Andrew Dominik",
  "year": 2007,
  "genres": ["Biography", "Crime", "Drama"]
}'

It’s worth pointing out that Elasticsearch has an endpoint (_bulk) for indexing multiple documents with a single request. Here we simply use six separate requests.
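For comparison, here is a minimal sketch of how the first two movies could be indexed with a single _bulk request. It assumes the same local node at localhost:9200, the same index and type names, and the same Elasticsearch version as the rest of this article. The helper names bulk_body and index_movies_bulk are made up for this illustration. The body is newline-delimited JSON: one action line followed by one document line per movie. It is sent with --data-binary because curl's -d option strips newlines when reading from a file or standard input.

```shell
# Print a newline-delimited bulk body: an action line, then the document itself.
bulk_body() {
  cat <<'EOF'
{"index": {"_index": "movies", "_type": "movie", "_id": "1"}}
{"title": "The Godfather", "director": "Francis Ford Coppola", "year": 1972, "genres": ["Crime", "Drama"]}
{"index": {"_index": "movies", "_type": "movie", "_id": "2"}}
{"title": "To Kill a Mockingbird", "director": "Robert Mulligan", "year": 1962, "genres": ["Crime", "Drama", "Mystery"]}
EOF
}

# Send the body to the _bulk endpoint (hypothetical helper name).
# --data-binary keeps the newlines that separate the lines of the body.
index_movies_bulk() {
  bulk_body | curl -s -XPOST "http://localhost:9200/_bulk" --data-binary @-
}
```

Calling index_movies_bulk sends both documents in one round trip. The remaining four movies could be appended to the body in the same way.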

The _search endpoint

Now that we have some movies in our index, let’s see if we can find them again by searching. In order to search with Elasticsearch, we use the _search endpoint, optionally with an index and type. That is, we make requests to a URL following the pattern <index>/<type>/_search, where index and type are both optional. In other words, in order to search for our movies we can make GET or POST requests to any of the following URLs:

  • http://localhost:9200/_search – Search across all indexes and all types.
  • http://localhost:9200/movies/_search – Search across all types in the movies index.
  • http://localhost:9200/movies/movie/_search – Search explicitly for documents of type movie within the movies index.

If you use the first URL, searching across all indexes, and you have Marvel installed, you’ll probably also get hits from indexes other than the “movies” index. This is because Marvel, by default, indexes various metrics to the same cluster that it’s running on.

A search request limited to the ‘movies’ index but without any other criteria.

curl -XGET "http://localhost:9200/movies/_search"

The result should look something like the image below.

[Image: Elasticsearch search response]

The search request body and ElasticSearch’s query DSL

If we simply make a GET request to the _search endpoint like we did above, we’ll get all of the movies back. In order to make a more useful search request, we also need to supply a request body with a query.

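The query element within the search request body allows you to define a query using the Query DSL. For example, the following body uses a term query to match documents whose user field contains the term kimchy (a field that is not part of our sample data):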

{
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

The request body should be a JSON object which, among other things, can contain a property named “query”, in which we can use Elasticsearch’s query DSL.

{
  "query": {
    //Query DSL here
  }
}

The query DSL is Elasticsearch’s own domain-specific language based on JSON, in which queries and filters can be expressed. Think of it as Elasticsearch’s equivalent of SQL for a relational database.
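To make the template concrete, here is a small sketch that fills the query property with a term query against our sample data. It assumes that the year field was indexed as a number by the default dynamic mapping. The helper names year_query_body and search_by_year are illustrative only.

```shell
# Request body with a term query: match documents whose year is exactly 1962.
year_query_body() {
  cat <<'EOF'
{
  "query": {
    "term": { "year": 1962 }
  }
}
EOF
}

# POST the body to the movies index (hypothetical helper name).
search_by_year() {
  year_query_body | curl -s -XPOST "http://localhost:9200/movies/_search" -d @-
}
```

With the six sample movies indexed, running search_by_year should return “To Kill a Mockingbird” and “Lawrence of Arabia”, the two movies from 1962.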

Basic free text search

The query DSL features a long list of query types that we can use. For “ordinary” free text search, we’ll most likely want to use one called “query string query”.

A query string query is an advanced query with a lot of different options that ElasticSearch will parse and transform into a tree of simpler queries. Still, it can be very easy to use if we ignore all of its optional parameters and simply feed it a string to search for. Let’s try a search for the word “Robert”:

A search request, this time using POST, with a query_string query in the POST data.

curl -XPOST "http://localhost:9200/movies/_search" -d'
{
  "query": {
    "query_string": {
      "query": "Robert"
    }
  }
}'

Given that you have indexed the six movies listed at the start, the above request should result in two hits: the movies “The Assassination of Jesse James by the Coward Robert Ford” and “To Kill a Mockingbird”. The reason is that a query_string query by default searches in a special field named “_all”. This field is automatically created during indexing and is, by default, made up of text extracted from each of the document’s fields. As such, both documents contain the word “Robert” in their _all fields, the first from its title property and the second from its director property.

Relevance scoring

The relevance score of each document is represented by a positive floating-point number called the _score. The higher the _score, the more relevant the document.

If you run the above search request, the two hits in the result will both have a _score property. This property indicates how relevant the hit is with regard to the query, and it’s what Elasticsearch sorts the hits by unless we’ve told it otherwise.

The value of the _score property is not confined to a fixed range such as 0 to 1; its magnitude depends on the query and on the indexed data. It is therefore useful for comparing the relevance of two hits in the same result, but not for comparing hits from two different search results.
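As an illustration of telling Elasticsearch to sort by something other than _score, the sketch below adds a sort property to the earlier search for “Robert”. It assumes the year field from our sample data. The helper names sorted_search_body and search_sorted_by_year are hypothetical.

```shell
# Request body that searches for "Robert" but orders hits by year, newest first.
sorted_search_body() {
  cat <<'EOF'
{
  "query": {
    "query_string": { "query": "Robert" }
  },
  "sort": [
    { "year": { "order": "desc" } }
  ]
}
EOF
}

# POST the body to the movies index (hypothetical helper name).
search_sorted_by_year() {
  sorted_search_body | curl -s -XPOST "http://localhost:9200/movies/_search" -d @-
}
```

With this body, the hits should be listed by year rather than by relevance, so the movie from 2007 comes before the one from 1962. When sorting on a field, Elasticsearch does not compute scores by default, so _score is returned as null unless track_scores is set to true in the request body.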

Fine tuning query string queries

In the previous example, we used a query_string query with no other properties than query, in which we specified the search term. As mentioned before, the query string query has a number of settings that we can specify and if we don’t, it will use sensible default values.

One such setting is called “fields” and that can be used to specify a list of fields to search in. Let’s use that to only search in the title field.

The search request extended with a fields property.

curl -XPOST "http://localhost:9200/movies/_search" -d'
{
  "query": {
    "query_string": {
      "query": "Robert",
      "fields": ["title"]
    }
  }
}'

Run this request and this time you’ll only get a single hit.
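The fields setting also accepts several fields at once, and a field name can carry a boost using the caret notation. The sketch below searches both title and director and weights title matches twice as heavily. The boost value of 2 and the helper names are arbitrary choices for this illustration.

```shell
# Request body that searches two fields and boosts matches in the title.
multi_field_body() {
  cat <<'EOF'
{
  "query": {
    "query_string": {
      "query": "Robert",
      "fields": ["title^2", "director"]
    }
  }
}
EOF
}

# POST the body to the movies index (hypothetical helper name).
search_title_and_director() {
  multi_field_body | curl -s -XPOST "http://localhost:9200/movies/_search" -d @-
}
```

Against the sample data this should again return two hits, one matching in the title and one in the director field.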

The query_string query type also supports a wide range of other settings. For instance, take a look at the below request:

Searching for three words.

curl -XPOST "http://localhost:9200/movies/_search" -d'
{
  "query": {
    "query_string": {
      "query": "Francis Ford Coppola"
    }
  }
}'

The above request will result in three hits. Two of them will be the movies directed by Francis Ford Coppola. The third will be The Assassination of Jesse James by the Coward Robert Ford as that contains the word “Ford”. This is because the query property will be parsed as “Francis OR Ford OR Coppola”. Should we want to change this behavior, we can do so by setting the default_operator of the query to “AND”, like this:

Using the default_operator parameter to customize how the query parameter is parsed.

curl -XPOST "http://localhost:9200/movies/_search" -d'
{
  "query": {
    "query_string": {
      "query": "Francis Ford Coppola",
      "default_operator": "AND"
    }
  }
}'

The result of the above request will be limited to documents whose _all field contains all of the words “Francis”, “Ford” and “Coppola”. However, they don’t have to appear in that order. Should we want that, we can simply wrap the query in escaped double quotes ("query": "\"Francis Ford Coppola\"").
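Written out in full, such a phrase search might look like the sketch below. The inner double quotes are escaped with backslashes so that they stay part of the JSON string. The helper names phrase_query_body and search_phrase are made up for this example.

```shell
# Request body for a phrase search: the words must appear together, in order.
phrase_query_body() {
  cat <<'EOF'
{
  "query": {
    "query_string": {
      "query": "\"Francis Ford Coppola\""
    }
  }
}
EOF
}

# POST the body to the movies index (hypothetical helper name).
search_phrase() {
  phrase_query_body | curl -s -XPOST "http://localhost:9200/movies/_search" -d @-
}
```

With the sample data, this should match only the two movies directed by Francis Ford Coppola.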

Highlighting

A common feature when building free text search functionality is highlighting: making matching words within the hits stand out visually to the user. Elasticsearch can highlight search results on one or more fields. The implementation uses either the Lucene highlighter, the fast-vector-highlighter or the postings-highlighter.

In order to retrieve highlights with Elasticsearch, we can add an additional property to the search request body named highlight. The value of this property should be a JSON object that describes which fields we want highlights from as well as, optionally, details about how the highlights should work. Below is an example request that includes a highlight property.

Searching for movies with Robert in the title and requesting highlights for the title field.

curl -XPOST "http://localhost:9200/movies/_search" -d'
{
  "query": {
    "query_string": {
      "query": "Robert",
      "fields": ["title"]
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}'

The response to the above request looks something like this:

The response to a search request including highlights.

{
  "took": 28,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.3125,
    "hits": [
      {
        "_index": "movies",
        "_type": "movie",
        "_id": "6",
        "_score": 0.3125,
        "_source": {
          //Omitted for brevity
        },
        "highlight": {
          "title": [
            "The Assassination of Jesse James by the Coward <em>Robert</em> Ford"
          ]
        }
      }
    ]
  }
}

There are two interesting things to note in the above response. First of all, the hit object includes a property we haven’t seen before, the highlight property. Unsurprisingly, this contains one (in this case) or more highlighted extracts from the title field. Second, within the returned highlight, the word Robert has been enclosed in em tags.

Highlighting can be customized in a number of ways. We can choose how many fragments should be extracted from each field and how long they should be, customize the tags that enclose highlighted terms, use different highlighting implementations and even use a separate query for highlighting.
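As a rough sketch of such customization, the request body below replaces the default em tags with strong tags and limits the title field to a single fragment of about 40 characters. The tag choice, the numbers and the helper names are arbitrary values picked for this illustration.

```shell
# Request body with customized highlighting: custom tags and fragment settings.
custom_highlight_body() {
  cat <<'EOF'
{
  "query": {
    "query_string": { "query": "Robert", "fields": ["title"] }
  },
  "highlight": {
    "pre_tags": ["<strong>"],
    "post_tags": ["</strong>"],
    "fields": {
      "title": { "fragment_size": 40, "number_of_fragments": 1 }
    }
  }
}
EOF
}

# POST the body to the movies index (hypothetical helper name).
search_with_custom_highlight() {
  custom_highlight_body | curl -s -XPOST "http://localhost:9200/movies/_search" -d @-
}
```

In the response, the matching word should then be wrapped in strong tags instead of em tags, and the returned fragment may be a shortened extract of the title rather than the whole string.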

