ElasticSearch interview questions

Elasticsearch is a real-time, distributed, RESTful search and analytics engine built on top of Apache Lucene, a full-text search library. You can think of Elasticsearch as a distributed document store that also offers real-time analytics. It is document-oriented: it stores objects as documents and indexes them so that their content is searchable.

A Few Known Facts about Elasticsearch

  • Built on top of Lucene (a full-text search library from Apache)
  • Document-oriented (stores data as structured JSON documents)
  • Full-text search (indexes text for full-text search, which gives faster result retrieval)
  • Schema-free (a NoSQL store; documents can be indexed without defining a schema first)
  • RESTful API (supports RESTful APIs for storage and retrieval of records)
  • Supports autocompletion and instant search

The following is a list of the top 25 Elasticsearch interview questions with their answers.

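By default, Elasticsearch serves HTTP on port 9200 (it picks the first free port in the range 9200-9300) and uses ports from 9300 upward for communication between nodes.
So, to check whether it is running on your server, open the host name followed by the HTTP port number.
Ex: mysitename.com:9200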
Some of the companies that use Elasticsearch along with Logstash and Kibana are:
  • Wikipedia
  • Netflix
  • Accenture
  • Stack Overflow
  • Fujitsu
  • Tripwire
  • Medium
  • Swat.io
  • HipChat
  • IFTTT
Elasticsearch is a search engine based on Lucene. It offers a distributed, multitenant-capable full-text search engine with an HTTP (Hypertext Transfer Protocol) web interface and schema-free JSON (JavaScript Object Notation) documents. It is developed in Java and was originally released as open source under the Apache License.
Run the following commands in your terminal to start the Elasticsearch server:
cd elasticsearch
./bin/elasticsearch

The command curl 'http://localhost:9200/?pretty' is used to check whether the Elasticsearch server is running.
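The same check can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function name `server_info` and its `opener` parameter are hypothetical, and the default URL assumes a local node on port 9200.

```python
import json
from urllib.request import urlopen


def server_info(base_url="http://localhost:9200", opener=urlopen, timeout=2):
    """Return the root endpoint's JSON as a dict, or None if the node is unreachable.

    `opener` is injectable (an assumption of this sketch) so the function
    can be exercised without a live server.
    """
    try:
        with opener(base_url + "/?pretty", timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts (URLError is a subclass);
        # ValueError covers a body that is not valid JSON.
        return None
```

Calling `server_info()` against a running node should return the cluster banner as a dictionary; `None` means nothing answered on that address.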

A type in Elasticsearch is a logical category within an index whose semantics are completely up to the user. (Types were deprecated in Elasticsearch 7 and removed in version 8.)
Elasticsearch automatically creates the mapping from the data the user provides in the request body. Its bulk functionality can be used to add more than one JSON object to the index in a single request.

Ex: POST website/_bulk
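The bulk endpoint expects newline-delimited JSON: an action line followed by a source line for each document, with a trailing newline at the end. The helper below is a small sketch of how such a body could be assembled; `build_bulk_body` is a hypothetical name, and the document IDs and fields in the usage note are made up for illustration.

```python
import json


def build_bulk_body(documents):
    """Build an NDJSON payload for POST <index>/_bulk.

    `documents` is an iterable of (doc_id, source_dict) pairs.
    Each pair becomes two lines: an "index" action and the document itself.
    """
    lines = []
    for doc_id, source in documents:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"
```

For example, `build_bulk_body([("1", {"title": "First post"}), ("2", {"title": "Second post"})])` yields four lines that can be sent to `website/_bulk` with the `Content-Type: application/x-ndjson` header.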

Lucene uses the Boolean model to find matching documents, and a formula called the practical scoring function to calculate relevance.
This formula borrows concepts from term frequency/inverse document frequency (TF/IDF) and the vector space model, and adds more modern features such as the coordination factor and field-length normalization.
score(q, d) is the relevance score of document “d” for query “q”.
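To make the ingredients of the score concrete, here is a simplified sketch of the classic TF/IDF pieces as they are usually described for Lucene: term frequency as a square root, inverse document frequency as a logarithm, a field-length norm, and a coordination factor. It deliberately leaves out query normalization and boosts, so it is not the full practical scoring function, and all function names are hypothetical.

```python
import math


def tf(freq):
    """Term frequency: square root of how often the term appears in the field."""
    return math.sqrt(freq)


def idf(num_docs, doc_freq):
    """Inverse document frequency: rarer terms get a higher weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_norm(num_terms):
    """Field-length normalization: shorter fields weigh more."""
    return 1.0 / math.sqrt(num_terms)


def score(query_terms, doc_terms, doc_freqs, num_docs):
    """Simplified score(q, d): coord * sum(tf * idf^2 * norm) over matching terms."""
    matched = [t for t in query_terms if t in doc_terms]
    if not matched:
        return 0.0
    # Coordination factor: fraction of query terms the document contains.
    coord = len(matched) / len(query_terms)
    norm = field_norm(len(doc_terms))
    total = sum(
        tf(doc_terms.count(t)) * idf(num_docs, doc_freqs[t]) ** 2 * norm
        for t in matched
    )
    return coord * total
```

Here `doc_terms` is the list of analyzed terms in one field and `doc_freqs` maps each term to the number of documents containing it; both are assumptions about how the inputs would be prepared.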
A document in Elasticsearch is similar to a row in a relational database. The difference is that every document in an index can have a different structure or set of fields, although a field that is common to several documents must have the same data type in all of them. Each field, whatever its data type, can occur multiple times in a document.
Fields can also contain other documents.
A single machine has resource limits such as RAM and vCPUs, so to scale out, applications run multiple instances of Elasticsearch on separate machines.
The data in an index can be partitioned into multiple portions, each managed by a separate node or instance of Elasticsearch. Each such portion is called a shard. An Elasticsearch index has 5 primary shards by default in versions before 7.0; from 7.0 onward the default is 1.
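How a document ends up on a particular shard can be illustrated with the simplified routing rule, shard = hash(routing) % number_of_primary_shards, where the routing value defaults to the document ID. The sketch below uses CRC32 purely as a stand-in hash; Elasticsearch uses its own Murmur3-based hash, so the shard numbers produced here will not match a real cluster. `pick_shard` is a hypothetical name.

```python
import zlib


def pick_shard(routing_value, num_primary_shards):
    """Map a routing value (by default the document ID) to a shard number.

    CRC32 is a stand-in for illustration only; a real cluster hashes differently.
    """
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % num_primary_shards
```

The rule also suggests why the number of primary shards is fixed when an index is created: changing the divisor would send existing document IDs to different shards.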
We can perform the following searches in Elasticsearch:
  • Multi-index, multi-type search: All search APIs can be applied across multiple indices, with support for the multi-index syntax.
    For example, we can search for documents with a certain tag across several indices, or across all indices and all types.
  • URI search: A search request is executed purely through a URI by providing request parameters.
  • Request body search: A search request is executed with a search DSL, which includes the query DSL, in the request body.
  • Term-based queries: Queries like the term query or fuzzy query are low-level queries that have no analysis phase. A term query for the term Foo looks up that exact term in the inverted index and calculates the TF/IDF relevance score for every document that contains it.
  • Full-text queries: Queries like the match query or query string query are high-level queries that understand the mapping of a field. Once the query has assembled the complete list of terms, it executes the appropriate low-level query for every term and finally combines their results to produce the relevance score of every document.
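The difference between these search styles is easiest to see in the requests themselves. The sketch below builds a URI search path and two request bodies, one term query and one match query. The helper names and the `website` index and `title` field in the usage note are assumptions for illustration; nothing is sent to a server.

```python
import json
from urllib.parse import urlencode


def uri_search(index, field, value):
    """URI search: the whole query travels in the q= request parameter."""
    return "/" + index + "/_search?" + urlencode({"q": field + ":" + value})


def term_query(field, value):
    """Term-level query body: the value is looked up as-is, with no analysis."""
    return {"query": {"term": {field: value}}}


def match_query(field, text):
    """Full-text query body: the text is analyzed using the field's mapping."""
    return {"query": {"match": {field: text}}}
```

For instance, `json.dumps(match_query("title", "quick brown fox"))` gives a body that could be posted to `/website/_search`, while `uri_search("website", "title", "foo")` gives the equivalent of a request-parameter search.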
To create an index, send a PUT request followed by the index name. To add another index, repeat the request with a different name (some older releases also accepted POST here, but PUT is the documented verb).
Ex: PUT website

An index named website is created.
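As a rough illustration, the same PUT request can be prepared with Python's standard library. The function name `create_index_request` is hypothetical and the base URL assumes a local node; the sketch only constructs the request, so it can be inspected before anything is sent.

```python
from urllib.request import Request


def create_index_request(index, base_url="http://localhost:9200"):
    """Prepare (but do not send) the PUT request that creates an index."""
    return Request(base_url + "/" + index, method="PUT")
```

Passing the result to `urllib.request.urlopen` would send it; a node that accepts the request normally answers with a JSON body acknowledging the new index.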

By using GET /_cat/indices we can get the list of indices present in the cluster.
Queries are divided into two types, with multiple queries categorized under each.
  • Full-text queries: match query, match phrase query, multi-match query, match phrase prefix query, common terms query, query string query, simple query string query.
  • Term-level queries: term query, terms set query, terms query, range query, prefix query, wildcard query, regexp query, fuzzy query, exists query, type query, ids query.
To retrieve a document in Elasticsearch, we use the GET verb followed by the _index, _type, and _id.
Ex: GET /computer/blog/123?pretty
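A short sketch of how a client might build that path and read the result: the response to a document GET is a JSON object with a `found` flag and, when the document exists, its original body under `_source`. The helper names are hypothetical and the sample response in the usage note is made up for illustration.

```python
import json


def document_path(index, doc_type, doc_id):
    """Build the path for retrieving one document by _index, _type and _id."""
    return f"/{index}/{doc_type}/{doc_id}?pretty"


def extract_source(response_text):
    """Return the stored document from a GET response, or None if it was not found."""
    body = json.loads(response_text)
    return body["_source"] if body.get("found") else None
```

For example, `document_path("computer", "blog", 123)` gives `/computer/blog/123?pretty`, and `extract_source('{"found": true, "_source": {"title": "Hello"}}')` returns the `{"title": "Hello"}` dictionary.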