Unlike my other articles, this one discusses the theoretical background of Elasticsearch in the form of questions and answers. I hope this format is a good way to explain what Elasticsearch is and why it matters. So let's start.
1. What is Elasticsearch?
Elasticsearch is a distributed full text search engine based on Apache Lucene. It is also a kind of NoSQL database that stores data as JSON documents. Lucene is the underlying technology that Elasticsearch uses for extremely fast data retrieval.
In Elasticsearch, every field of a document is indexed by default to make searching faster. As a result, it provides fast search operations across large data sets. Elasticsearch is commonly used to build search engines because it is optimized for that purpose.
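To make the idea of "everything is indexed" more concrete, here is a minimal sketch of an inverted index, the kind of structure Lucene-style engines use to map each word to the documents that contain it. The function names (tokenize, build_inverted_index, search) and the sample documents are made up for illustration; the real implementation is far more sophisticated (analyzers, relevance scoring, on-disk segments).

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map every token of every field to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for value in doc.values():
            for token in tokenize(str(value)):
                index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample documents, stored as JSON-like dictionaries.
documents = {
    1: {"title": "Intro to search engines", "body": "Lucene powers fast search"},
    2: {"title": "Cooking pasta", "body": "Boil water and add salt"},
    3: {"title": "Scaling search", "body": "Shards spread an index across nodes"},
}
index = build_inverted_index(documents)
print(search(index, "search"))       # {1, 3}
print(search(index, "fast search"))  # {1}
```

A lookup only touches the entries for the query words instead of scanning every document, which is why searching stays fast as the data set grows. The price is paid at write time, when the index has to be built.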
Elasticsearch is built with speed and scalability in mind.
SPEED : Elasticsearch is fast, really fast! Since everything is indexed, you can search and retrieve data quickly thanks to the built-in indexing support.
SCALABILITY : The distributed architecture of Elasticsearch allows it to scale horizontally.
2. What are the advantages of Elasticsearch?
Elasticsearch is really fast because everything is indexed. Therefore complex search queries can be run against large data sets and still return results quickly.
Horizontally Scalable Architecture
Elasticsearch has a distributed architecture and supports horizontal scaling by adding more nodes. An Elasticsearch cluster is not limited to a single machine, so you can keep adding nodes to handle higher traffic and larger data sets.
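Horizontal scaling works because an index is split into shards that can live on different nodes, and a document's shard is chosen by hashing its routing value (by default the document id). The sketch below is a simplified illustration under stated assumptions: it uses MD5 as a stand-in hash and a round-robin placement, and the names shard_for and assign_shards are hypothetical. Real clusters use a different hash function and a much smarter allocator.

```python
import hashlib


def shard_for(doc_id, num_primary_shards):
    """Pick a shard from a stable hash of the document id (simplified routing)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_primary_shards


def assign_shards(num_primary_shards, nodes):
    """Spread the shards over the available nodes round-robin (an assumption)."""
    placement = {node: [] for node in nodes}
    for shard in range(num_primary_shards):
        placement[nodes[shard % len(nodes)]].append(shard)
    return placement


NUM_SHARDS = 6
# The same id always maps to the same shard.
print(shard_for("employee-42", NUM_SHARDS))

# Adding a third node leaves fewer shards, and so less data, on each node.
print(assign_shards(NUM_SHARDS, ["node-a", "node-b"]))
print(assign_shards(NUM_SHARDS, ["node-a", "node-b", "node-c"]))
```

With two nodes each one holds three shards; with three nodes each one holds two. That is the basic mechanism that lets a cluster absorb more data and traffic by adding machines.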
Easy development and integration
Elasticsearch client libraries are available for many programming languages, such as Java, PHP, Python, .NET and many more. In addition, many frameworks provide their own libraries and integrations for accessing Elasticsearch. Therefore it is easy to develop applications that use Elasticsearch for search and data retrieval. It also provides a REST API for accessing data and running search queries.
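As a rough illustration of the REST API, the sketch below builds a full text search request using only the Python standard library. The host (localhost:9200), the index name "articles" and the field "title" are assumptions made for the example, and build_search_request is a hypothetical helper. The request is only constructed here, not sent, so no running cluster is needed.

```python
import json
import urllib.request


def build_search_request(base_url, index_name, field, text):
    """Build (but do not send) a 'match' query against the _search endpoint."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{base_url}/{index_name}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local cluster, index and field names.
request = build_search_request(
    "http://localhost:9200", "articles", "title", "search engine"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To run it against a live cluster, something like this should work:
# with urllib.request.urlopen(request) as response:
#     hits = json.load(response)["hits"]["hits"]
```

Because the interface is plain HTTP and JSON, any language with an HTTP client can talk to Elasticsearch, even without a dedicated client library.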
3. What are the limitations of Elasticsearch compared to other NoSQL databases?
No transaction support
Elasticsearch is not a transactional database and does not support transactions. A multi-step update that fails midway cannot be rolled back, so data can be lost or left inconsistent.
Slow at inserting new data
Adding new data requires the indexes to be created or updated. Therefore insert operations may take considerably longer than in other NoSQL databases.
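Lack of authentication and authorization
Historically there was no default (built-in, free) authentication and authorization module, and you had to buy plugins (like X-Pack) if you needed that support. Since versions 6.8 and 7.1, core security features such as authentication and role-based access control are included in the free distribution, so this limitation mainly applies to older versions.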
4. Can we use Elasticsearch as the main application database?
There is no direct answer to this question. It depends on the type of application that you are building. Elasticsearch is not a transactional database and does not support transactions. It is also slower at inserting data, as the indexes need to be updated.
If you are building a search engine that does not need any transactional support, you can go ahead with Elasticsearch as the primary data storage. But in the real world, such applications are hard to find. Most applications have some sort of transactional behaviour and expect transactional support from the database. Therefore Elasticsearch is usually used as a secondary storage that speeds up data search and retrieval.
In most cases, the recommendation is to use Elasticsearch as a secondary storage alongside a transactional persistent storage. Your primary storage can be any transactional database, either relational or NoSQL, while Elasticsearch handles the fast search operations in the application. So if your application has a large data set that needs fast search, you can migrate or replicate the relevant data to Elasticsearch. The remaining data can stay in the primary data storage, which is your main persistent storage.
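The primary plus secondary storage arrangement can be sketched as follows. This is only an illustration under assumptions: SQLite plays the transactional primary store, and SearchIndexStub is a hypothetical in-memory stand-in for an Elasticsearch index (a real setup would call the Elasticsearch API there). The table and field names are invented for the example.

```python
import sqlite3


class SearchIndexStub:
    """Hypothetical in-memory stand-in for an Elasticsearch index."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, document):
        self.docs[doc_id] = document

    def search(self, field, term):
        term = term.lower()
        return [doc_id for doc_id, doc in self.docs.items()
                if term in str(doc.get(field, "")).lower()]


def create_product(conn, search_index, name, description, price):
    """Write to the transactional primary first, then copy the searchable fields."""
    with conn:  # commits on success, rolls back if an exception is raised
        cursor = conn.execute(
            "INSERT INTO products (name, description, price) VALUES (?, ?, ?)",
            (name, description, price),
        )
        product_id = cursor.lastrowid
    # Only the fields needed for searching are replicated; price stays in SQL.
    search_index.index(product_id, {"name": name, "description": description})
    return product_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products "
    "(id INTEGER PRIMARY KEY, name TEXT, description TEXT, price REAL)"
)
search_index = SearchIndexStub()
create_product(conn, search_index, "Trail shoes", "Lightweight running shoes", 89.0)
create_product(conn, search_index, "Rain jacket", "Waterproof shell", 120.0)

# Search in the secondary store, then load the full rows from the primary.
matching_ids = search_index.search("description", "running")
rows = [
    conn.execute("SELECT name, price FROM products WHERE id = ?", (i,)).fetchone()
    for i in matching_ids
]
print(rows)  # [('Trail shoes', 89.0)]
```

The primary database remains the source of truth, so if the search index is lost or falls out of sync it can be rebuilt from the primary data.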
If you already have a working schema in SQL but searches are slow (search is not necessarily SQL's strength), you can copy the searchable data to Elasticsearch and use it to perform the searches.
5. What are the similarities between Elasticsearch and other NoSQL databases (like MongoDB)?
- Both store data as JSON documents.
- Both allow querying the body of the JSON documents.
6. Why is transactional support so important in NoSQL databases?
In NoSQL databases, data is denormalized rather than normalized. In a relational database, the data is normalized and split up into multiple tables, and related data is connected through foreign key constraints.
In the NoSQL world, this is completely different. Data is denormalized, and related reference data is managed as nested documents.
E.g. think about the relationship between employees and departments. Each employee document contains a nested department document that holds the information about the employee's department.
In NoSQL, data is therefore duplicated across documents. Whenever the data changes, the change needs to be reflected in all the affected documents (multiple documents need to be updated). Since Elasticsearch does not support transactions, we cannot guarantee the integrity of the data after such operations: if the update fails partway through, the documents that were already updated are not rolled back. This leads to data integrity issues.
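The employee and department example can be simulated in a few lines. The sketch below uses plain Python dictionaries in place of a document store, with a deliberately injected failure; the function names and sample data are hypothetical. It contrasts a document-by-document update with a transaction-like variant that applies all changes or none.

```python
import copy

# Each employee document carries its own nested copy of the department.
employees = {
    "e1": {"name": "Asha", "department": {"id": "d1", "name": "Sales"}},
    "e2": {"name": "Ben", "department": {"id": "d1", "name": "Sales"}},
    "e3": {"name": "Chen", "department": {"id": "d1", "name": "Sales"}},
}


def rename_department(store, dept_id, new_name, fail_on=None):
    """Update each copy one by one; a failure leaves earlier writes in place."""
    for emp_id, doc in store.items():
        if emp_id == fail_on:
            raise RuntimeError(f"simulated failure while updating {emp_id}")
        if doc["department"]["id"] == dept_id:
            doc["department"]["name"] = new_name


def rename_department_atomic(store, dept_id, new_name, fail_on=None):
    """Transaction-like variant: work on a copy and swap it in only on success."""
    draft = copy.deepcopy(store)
    rename_department(draft, dept_id, new_name, fail_on)
    store.clear()
    store.update(draft)


try:
    rename_department(employees, "d1", "Revenue", fail_on="e3")
except RuntimeError:
    pass  # nothing rolls back the documents that were already changed

names = {doc["department"]["name"] for doc in employees.values()}
print(names)  # both 'Revenue' and 'Sales' are present: the copies disagree
```

After the failed update, two employees say the department is called Revenue while the third still says Sales. With real transactions, as in the atomic variant, a failure would leave all three documents unchanged.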
7. What is the difference between MongoDB and Elasticsearch?
MongoDB is a general purpose database that can be used as the main persistent storage. It provides transactional support. If we need to speed up data retrieval, we can create indexes on the collections, but this is optional and is decided by the developer.
Elasticsearch is a full text search engine and does not provide transactional support. It performs fast searches because every field in a document is indexed by default. As a result, inserts are slower than in MongoDB, because the index needs to be created or updated whenever a new entry is added. Indexing happens by default rather than being opted into by the developer, although it can be turned off for individual fields through an explicit mapping.
8. Why do we need Elasticsearch? Why can't we use the full text search feature of MongoDB?
MongoDB also provides a full text search feature. It is fine to use it when you need basic full text search on a medium-scale data set. If you plan to proceed with MongoDB, make sure that text indexes are created on the relevant fields.
If the data set is too large and the searches are complex, it is better to go with Elasticsearch. It has been developed and optimized to work as a full text search engine with fast data retrieval.
I hope this gives you a proper introduction to and overview of Elasticsearch and its importance.