A cluster is a collection of one or more nodes (servers) that together hold your complete data and offer federated indexing and search capabilities across all the nodes. It is identified by a unique name, which is “elasticsearch” by default. This name is important because a node can be part of a cluster only if it is set up to join the cluster by that name.
An index in Elasticsearch is similar to a table in a relational database. The main difference is that a relational database always stores the actual values, whereas in Elasticsearch storing them is optional. An index can store the actual values, the analyzed values, or both.
Mapping is the process that defines how a document and its fields are stored and indexed by the search engine, including searchable characteristics such as which fields are tokenized and which are searchable. In Elasticsearch versions before 6.0, a single index could contain documents of several “mapping types”; mapping types were deprecated in later versions and then removed.
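As an illustration, an explicit mapping can be written as a plain JSON structure. The sketch below assumes the typeless mapping format of recent Elasticsearch versions; the field names (title, tag, views, internal_note) and the helper tokenized_fields are hypothetical and exist only for this example.

```python
import json

# Hypothetical mapping for a blog-post index; all field names are illustrative.
mapping_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},      # analyzed (tokenized) and searchable
            "tag": {"type": "keyword"},     # indexed as one exact term, not tokenized
            "views": {"type": "integer"},
            # Kept in the stored source but not indexed, so not searchable.
            "internal_note": {"type": "text", "index": False},
        }
    }
}


def tokenized_fields(body):
    """Return the names of fields that are both analyzed and searchable."""
    props = body["mappings"]["properties"]
    return sorted(
        name
        for name, spec in props.items()
        if spec["type"] == "text" and spec.get("index", True)
    )


print(json.dumps(mapping_body, indent=2))
print(tokenized_fields(mapping_body))
```

A body like this would typically be sent when the index is created; the helper only inspects the dictionary and does not talk to a server.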
A document in Elasticsearch is similar to a row in a relational database. The difference is that every document in an index can have a different structure or set of fields, but fields that are common to several documents must have the same data type. A field can occur multiple times in a document, that is, hold more than one value, and a field can also contain other documents.
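The rule about common fields can be pictured with a small check. This is only a sketch: it uses Python types as a stand-in for Elasticsearch field types (the real engine also coerces some values), and the documents and the helper conflicting_fields are invented for the example.

```python
def conflicting_fields(doc_a, doc_b):
    """Return common top-level fields whose values have different Python types."""
    common = doc_a.keys() & doc_b.keys()
    return sorted(f for f in common if type(doc_a[f]) is not type(doc_b[f]))


# Documents in one index may have different shapes.
doc1 = {"name": "Alice", "age": 30, "tags": ["admin", "dev"]}  # multi-valued field
doc2 = {"name": "Bob", "age": 41, "address": {"city": "Pune"}}  # field holding a document
doc3 = {"name": "Carol", "age": "unknown"}                      # "age" is a string here

print(conflicting_fields(doc1, doc2))  # no shared field disagrees on type
print(conflicting_fields(doc1, doc3))  # "age" disagrees: int vs. str
```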
A single machine has limited resources such as RAM and vCPUs, so applications scale out by running multiple instances of Elasticsearch on separate machines. The data in an index can be partitioned into multiple portions, each of which is managed by a separate node or instance of Elasticsearch. Each such portion is called a shard. In versions before 7.0 an Elasticsearch index has 5 primary shards by default; from 7.0 onward the default is 1.
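To make the partitioning concrete, here is a minimal sketch of how documents could be spread over shards by hashing a routing key and taking the remainder modulo the number of primary shards. It assumes the document ID is the routing key and uses zlib.crc32 as a stand-in hash; Elasticsearch uses its own hash function, so the actual shard numbers will differ.

```python
import zlib


def shard_for(routing_key: str, number_of_shards: int) -> int:
    """Pick a shard: hash the routing key, then take it modulo the shard count."""
    # crc32 is only a stand-in for the hash Elasticsearch really uses.
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_shards


def distribute(doc_ids, number_of_shards):
    """Group document IDs by the shard they would be routed to."""
    shards = {n: [] for n in range(number_of_shards)}
    for doc_id in doc_ids:
        shards[shard_for(doc_id, number_of_shards)].append(doc_id)
    return shards


doc_ids = [f"doc-{i}" for i in range(20)]
for shard, ids in distribute(doc_ids, 5).items():
    print(shard, ids)
```

Because the shard is derived from the shard count, the same document would land elsewhere if the count changed, which is one way to see why the number of primary shards is chosen when the index is created.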
Elasticsearch automatically creates the mapping according to the data the user provides in the request body. Its bulk functionality can be used to add more than one JSON object to the index in a single request.
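Both ideas can be sketched without a running server. The first helper is a deliberately rough imitation of automatic mapping (the real detection rules are richer, for example for dates); the second builds a bulk request body, which is newline-delimited JSON with one action line followed by one document line per object. The index name "articles" and the sample documents are assumptions made for the example.

```python
import json


def infer_field_type(value):
    """Rough stand-in for automatic mapping; not the real detection rules."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list) and value:
        return infer_field_type(value[0])  # arrays take the type of their items
    return "text"


def build_bulk_body(index_name, docs):
    """Build a newline-delimited bulk body: an action line, then the document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


docs = [
    ("1", {"title": "Intro", "views": 10}),
    ("2", {"title": "Shards", "views": 25, "draft": True}),
]
print({field: infer_field_type(v) for field, v in docs[1][1].items()})
print(build_bulk_body("articles", docs))
```

The resulting string would be posted to the bulk endpoint with a newline-delimited JSON content type.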
Lucene uses the Boolean model to find matching documents, and a formula called the practical scoring function to calculate relevance. This formula borrows concepts from term frequency/inverse document frequency (TF/IDF) and the vector space model, and adds more modern features such as the coordination factor and field-length normalization. score(q, d) is the relevance score of document “d” for query “q”.
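A simplified version of score(q, d) can be sketched as follows. It assumes the classic Lucene TF/IDF definitions (square root of term frequency, logarithmic inverse document frequency, inverse square root of field length) and leaves out the query normalization factor and boosts, so the numbers are illustrative and will not match what a real cluster reports. The tiny corpus is invented, and query terms are assumed to be distinct.

```python
import math


def tf(freq):
    """Term frequency: more occurrences in the document raise the score."""
    return math.sqrt(freq)


def idf(num_docs, doc_freq):
    """Inverse document frequency: rarer terms weigh more."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_norm(num_terms):
    """Field-length normalization: shorter fields weigh more."""
    return 1.0 / math.sqrt(num_terms)


def score(query_terms, doc_terms, corpus):
    """Simplified score(q, d); query normalization and boosts are omitted."""
    matched = [t for t in query_terms if t in doc_terms]
    if not matched:
        return 0.0  # the Boolean model filters out non-matching documents
    coord = len(matched) / len(query_terms)  # coordination factor
    total = 0.0
    for term in matched:
        doc_freq = sum(1 for d in corpus if term in d)
        total += (
            tf(doc_terms.count(term))
            * idf(len(corpus), doc_freq) ** 2
            * field_norm(len(doc_terms))
        )
    return coord * total


corpus = [
    ["quick", "brown", "fox"],
    ["quick", "quick", "dog"],
    ["lazy", "dog"],
]
query = ["quick", "fox"]
for doc in corpus:
    print(doc, round(score(query, doc, corpus), 3))
```

The first document matches both query terms, so it benefits from the coordination factor and from the rarer term "fox", and ranks above the second document even though "quick" occurs twice there.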
We can perform the following searches in Elasticsearch:
Multi-index, multi-type search: All search APIs can be applied across multiple indices, thanks to support for the multi-index system. For example, we can search for certain tags across several named indices, across all indices, or across all indices and all types.
URI search: A search request is executed purely using a URI by providing request parameters.
Request body search: A search request can be executed with the search DSL, which includes the query DSL, within the request body.
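The three kinds of search listed above differ mainly in where the search criteria live. The sketch below only builds the requests and sends nothing; the host http://localhost:9200, the index names, and the field names are assumptions made for the example.

```python
import json
from urllib.parse import urlencode


def uri_search_url(base_url, indices, params):
    """URI search: the criteria travel as query-string parameters."""
    # Several index names are joined with commas; "_all" targets every index.
    target = ",".join(indices) if indices else "_all"
    return f"{base_url}/{target}/_search?{urlencode(params)}"


def request_body_search(field, text, size=10):
    """Request body search: the criteria travel as query DSL in a JSON body."""
    return json.dumps({"query": {"match": {field: text}}, "size": size})


base = "http://localhost:9200"  # hypothetical local node

# Multi-index URI search over two named indices.
print(uri_search_url(base, ["logs-2023", "logs-2024"], {"q": "tag:error", "size": 5}))

# URI search over all indices.
print(uri_search_url(base, [], {"q": "tag:error"}))

# Body for a request body search, to be sent to a _search endpoint.
print(request_body_search("message", "disk full"))
```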
Term-based queries: Queries like the term query or the fuzzy query are low-level queries that have no analysis phase. A term query for the term Foo searches for that exact term in the inverted index and calculates the TF/IDF relevance score for every document that contains the term.
Full-text queries: Queries like the match query or the query string query are high-level queries that understand the mapping of a field. Once the query has assembled the complete list of terms, it executes the appropriate low-level query for every term, and finally combines their results to produce the relevance score of every document.
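The difference between the two families can be shown with a toy inverted index. This is a sketch under simplifying assumptions: the analyzer only lowercases and splits on whitespace, scoring is left out, and the match query combines per-term hits with OR. All names and documents are invented for the example.

```python
from collections import defaultdict


def analyze(text):
    """Very simplified analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each analyzed token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def term_query(index, term):
    """Low-level query: look the term up exactly as given, with no analysis."""
    return set(index.get(term, set()))


def match_query(index, text):
    """High-level query: analyze the text, run a term query per token, combine."""
    hits = set()
    for token in analyze(text):
        hits |= term_query(index, token)
    return hits


docs = {1: "Foo bar", 2: "foo baz", 3: "Qux"}
index = build_inverted_index(docs)

print(term_query(index, "Foo"))       # nothing: the index only holds "foo"
print(term_query(index, "foo"))       # documents 1 and 2
print(match_query(index, "Foo Qux"))  # analyzed to "foo" and "qux" first
```

The term query for Foo finds nothing because the indexed text was lowercased at index time, which is the practical consequence of skipping the analysis phase.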
The aggregation framework provides aggregated data based on a search query. An aggregation can be seen as a unit of work that builds analytic information over a set of documents. There are different types of aggregations, each with a different purpose and output.
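As a rough picture, the request below combines a query with two aggregations, one that groups documents into buckets and one that computes a single metric. The two helper functions then imitate locally what such aggregations compute. The field names (status, country, price), the aggregation names, and the sample orders are assumptions made for the example.

```python
from collections import Counter

# Hypothetical search body: a query plus a bucket and a metric aggregation.
agg_request = {
    "size": 0,  # return only aggregation results, no individual hits
    "query": {"match": {"status": "shipped"}},
    "aggs": {
        "orders_per_country": {"terms": {"field": "country"}},
        "average_price": {"avg": {"field": "price"}},
    },
}


def terms_buckets(docs, field):
    """Imitate a terms aggregation: one bucket per distinct value, with a count."""
    counts = Counter(doc[field] for doc in docs)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common()]


def avg_metric(docs, field):
    """Imitate an avg aggregation: a single number over the matching documents."""
    values = [doc[field] for doc in docs]
    return sum(values) / len(values) if values else None


orders = [
    {"country": "DE", "price": 10.0},
    {"country": "IN", "price": 30.0},
    {"country": "DE", "price": 20.0},
]
print(terms_buckets(orders, "country"))
print(avg_metric(orders, "price"))
```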
Yes, Elasticsearch can be used as a replacement for a database, as it is very powerful. It offers features such as multitenancy, sharding and replication, distribution and cloud support, real-time get, refresh, commit, versioning, re-indexing, and many more, which make it an apt replacement for a database.